Project Monocle


Monocle helps developers and platform engineers who build or manage GenAI apps monitor them in production, by making it easy to instrument their code to capture traces that are compliant with the open-source, cloud-native observability ecosystem.


This cookbook provides recipes for various instrumentation solutions with Monocle.

Generate out-of-the-box telemetry, without any code change

If you have a Python app that runs locally, i.e. `python my-app.py [args]` (as opposed to being hosted in a cloud serverless container like AWS Lambda or Azure Functions), you can use the Monocle package to enable telemetry without any code change:

    python -m monocle_apptrace my-app.py [args]

This will generate trace files named `monocle_trace_*.json` in the local directory.
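As an example, a plain OpenAI script like the one below needs no Monocle-specific code at all; the file name `my-app.py` comes from the command above, while the model name and prompt are just illustrative.

```python
# my-app.py -- no Monocle-specific code; run it as:
#   python -m monocle_apptrace my-app.py
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer("What is Monocle?"))
```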

Instrument your app to enable Monocle telemetry

Combining multiple APIs under a single traceID

By default, Monocle instrumentation will generate separate traces for every chain or API call that your application makes. If you want to combine the traces for multiple API calls under a single traceID, you can use one of the approaches described below.

Track application business logic coded in a top-level application method/API

Consider a chatbot application with a method called conversation() that implements a chat conversation thread with the end user. This method in turn calls other APIs, like OpenAI and LangChain, to use LLMs and generate responses.

    ...
    def conversation():
        ...
        message = input("How can I help you:")
        response = openai.chat.completions.create(message)     # GenAI code
        result = rag_chat_chain.invoke(response)                # GenAI code

By default, Monocle instrumentation will generate a unique trace ID for every chain or API call that your application makes. This is very useful for tracking how your app uses GenAI services, but it is often not sufficient. As an app developer or owner, you might want the bigger picture from the logic or business context; for example, you may want to look at prompts or latency at the conversation level rather than at the API level. Monocle has a notion of scopes, which lets you tie multiple traces/spans to a unique ID so you can group them.
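If your Monocle version exposes a scope decorator, the top-level method can be wrapped so that everything it calls shares one scope. The sketch below is a hedged example: the decorator name `monocle_trace_scope_method` and the top-level import are assumptions, so check the Monocle API reference for the exact scope API.

```python
# Sketch only: the scope decorator name and import path are assumptions,
# not confirmed API -- consult the Monocle reference for the exact names.
from openai import OpenAI
from monocle_apptrace import setup_monocle_telemetry, monocle_trace_scope_method

setup_monocle_telemetry(workflow_name="chatbot")
client = OpenAI()

@monocle_trace_scope_method(scope_name="conversation")
def conversation(question: str) -> str:
    # Both LLM calls below are tagged with the same scope.conversation value,
    # so they can be grouped even though each call produces its own spans.
    draft = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    final = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Refine this answer: {draft}"}],
    ).choices[0].message.content
    return final
```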

Build on existing application logic to capture scope

Imagine you have a chatbot where the frontend app runs in a browser and the backend GenAI code runs in a REST framework like Flask, or is hosted in a serverless cloud service like Azure Functions or AWS Lambda. Let's say the application has a notion of conversations, i.e. chat threads between the end user and the chatbot. A conversation ID is generated in the frontend to track each conversation and is sent as a REST header to the stateless backend to retrieve the right context. Monocle enables you to track this conversation ID as a scope, so all the GenAI APIs called during a conversation are marked with this unique conversation ID.

    web_app = Flask(__name__)
    setup_monocle_telemetry(workflow_name="my-chatbot-webapp")

    def main():
        web_app.run(host="0.0.0.0", port=8096, debug=False)

    @web_app.route('/chat', methods=["POST"])
    def chat():
        conversation_id = request.headers["conversation-id"]
        question = request.args["question"]
        response = answer(question, conversation_id)  # the app's GenAI logic (illustrative handler, not shown)
        return response

Save the following `monocle_scope.json` in the folder where you run the Flask application:
```json
    {
        "http_header": "conversation-id",
        "scope_name": "conversation"
    }
```
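On the client side, the frontend just needs to send the conversation ID as that header on every request in the thread. A minimal sketch (the URL, port, and use of the requests library are illustrative):

```python
import uuid
import requests  # illustrative client; any HTTP client works

# One conversation ID for the whole chat thread (the value here is made up).
conversation_id = str(uuid.uuid4())

for question in ["What is Monocle?", "How do scopes work?"]:
    resp = requests.post(
        "http://localhost:8096/chat",
        params={"question": question},
        headers={"conversation-id": conversation_id},
    )
    print(resp.text)
```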

The above code will generate two traces (one per chain invocation). All the spans in these traces will have a `scope.conversation` attribute with a unique value:

"attributes": {
    "span.type": "inference",
    ...
    "scope.conversation": "conversion-id: 0xcb80e6f772968ed50ead80657b09cf52",