Monocle helps developers and platform engineers who build or manage GenAI apps monitor them in production by making it easy to instrument their code and capture traces that are compatible with the open-source, cloud-native observability ecosystem.
If you have a Python app that runs locally, i.e. `python my-app.py [args]` (as opposed to being hosted in a cloud serverless container like AWS Lambda or Azure Functions), you can use the Monocle package to enable telemetry without any code change:

python -m monocle_apptrace my-app.py [args]

This will generate trace files named `monocle_trace_*.json` in the local directory.
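If you want a quick look at what was captured, a small script like the one below can summarize the trace files. This is only a sketch: it assumes each `monocle_trace_*.json` file contains a JSON list of spans with `name` and `attributes` fields, which may differ by Monocle version.

```python
import glob
import json

# Summarize the Monocle trace files in the current directory.
# Assumption: each file holds a JSON list of spans with "name" and
# "attributes" keys; adjust to what your Monocle version actually emits.
for path in glob.glob("monocle_trace_*.json"):
    with open(path) as f:
        spans = json.load(f)
    print(f"{path}: {len(spans)} spans")
    for span in spans:
        name = span.get("name")
        span_type = span.get("attributes", {}).get("span.type")
        print(f"  {name} ({span_type})")
```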
To enable telemetry from within your code, add `monocle_telemetry` to your `requirements.txt` file, or install it directly with pip:

pip install monocle_telemetry
Import the package and add a single line of code to enable Monocle telemetry:
```python
from monocle_apptrace import setup_monocle_telemetry

setup_monocle_telemetry(workflow_name="your-app-name")
```
Now when you run the application, it will generate the trace files `monocle_trace_*.json` in the directory where the application is run.
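As a minimal end-to-end sketch, the snippet below enables telemetry and then makes a single OpenAI call. The model name and the OpenAI client usage are only illustrative; any supported GenAI framework call would be traced the same way.

```python
from openai import OpenAI
from monocle_apptrace import setup_monocle_telemetry

# Enable Monocle telemetry once, at startup, before any GenAI calls.
setup_monocle_telemetry(workflow_name="your-app-name")

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# This inference call is captured as a span in monocle_trace_*.json
# in the directory where the script is run.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```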
By default, Monocle instrumentation generates traces for every chain or API call that your application makes. If you want to combine the traces for multiple API calls under a single trace ID, use the start_trace() and stop_trace() APIs:
```python
token = start_trace()
try:
    embedding_api()
    inference_api()
finally:
    stop_trace(token)
```
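For example, here is a hedged sketch that groups an embedding call and a chat completion under one trace. The import path for `start_trace`/`stop_trace` is an assumption (check your installed `monocle_apptrace` version), and the OpenAI calls simply stand in for your own GenAI code.

```python
from openai import OpenAI

# Assumption: start_trace/stop_trace are importable from the package root;
# verify the exact path in your monocle_apptrace version.
from monocle_apptrace import start_trace, stop_trace

client = OpenAI()

# Both GenAI calls below are recorded under the same trace ID.
token = start_trace()
try:
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input="What is Monocle?"
    )
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is Monocle?"}],
    )
finally:
    stop_trace(token)
```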
You can also use the `monocle_trace()` context manager to do the same thing:

```python
with monocle_trace():
    embedding_api()
    inference_api()
```

Consider a chatbot application with a method called `conversation()` that implements a chat conversation thread with the end user. This method in turn calls other APIs, like OpenAI and LangChain, to use LLMs and generate responses.
```python
...
def conversation():
    ...
    message = input("How can I help you:")
    cleaned_message = clean(message)                              # pre-processing (placeholder helper)
    response = openai.chat.completions.create(cleaned_message)    # GenAI code
    result = rag_chat_chain.invoke(cleaned_message)               # GenAI code
```
By default, Monocle instrumentation generates a unique trace ID for every chain or API call that your application makes. This is very useful to track how your app is using the GenAI services, but it is often not sufficient. As an app developer or owner, you might want to look at the bigger picture from a logic or business context. For example, you may want to look at prompts or latency at the conversation level rather than at the API level. Monocle has a notion of scopes, which allows you to tie multiple traces/spans to a unique ID so you can group them.
Wrap the conversation logic in the `monocle_trace_scope()` context manager:

```python
with monocle_trace_scope("conversation"):
    message = input("How can I help you:")
    cleaned_message = clean(message)                              # pre-processing (placeholder helper)
    response = openai.chat.completions.create(cleaned_message)    # GenAI code
    result = rag_chat_chain.invoke(cleaned_message)               # GenAI code
```
You can also add the `monocle_trace_scope_method` decorator to this `conversation()` method:
```python
@monocle_trace_scope_method("conversation")
def conversation():
    ...
```
Alternatively, you can specify the scope in a `monocle_scope.json` file that's placed in the working directory of the application:
```json
{
    "package": "myapp.bot",
    "object": "chat",
    "method": "conversation",
    "scope_name": "conversation"
}
```
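To illustrate what the fields refer to, the configuration above would match an application laid out roughly like this; the module and class names are hypothetical and only show how `package`, `object`, and `method` map onto code.

```python
# myapp/bot.py  (hypothetical layout matching the monocle_scope.json above)
class chat:
    def conversation(self):
        # Spans created while this method runs are tagged with the
        # "conversation" scope, without a decorator or context manager.
        ...
```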
With any of the approaches above, the code will generate two traces (one per chain invocation). All the spans in these traces will have an attribute called `scope.conversation` with a unique value.
"attributes": {
"span.type": "inference",
...
"scope.conversation": "0xcb80e6f772968ed50ead80657b09cf52",
Imagine you have a chatbot where the frontend app runs in the browser and the backend GenAI code runs in a REST framework like Flask, or is hosted in a serverless cloud service like Azure Functions or AWS Lambda. Let's say that the application has a notion of conversations, i.e. chat threads that go back and forth between the end user and the chatbot. A conversation ID is generated in the frontend to track each conversation and is sent as a REST header to the stateless backend to retrieve the right context. Monocle enables you to track this conversation ID as a scope, so all the GenAI APIs called during a conversation are marked with this unique conversation ID.
Call `setup_monocle_telemetry()` in your Flask app and specify the HTTP headers you want to track in the `monocle_scope.json` file:
```python
from flask import Flask, request, jsonify
from monocle_apptrace.instrumentation.common.instrumentor import setup_monocle_telemetry

web_app = Flask(__name__)
setup_monocle_telemetry(workflow_name="my-chatbot-webapp")

@web_app.route('/chat', methods=["POST"])
def handle_chat():
    try:
        conversation_id = request.headers["conversation-id"]
        question = request.args["question"]
        response = chat(question, conversation_id)  # the app's GenAI chat logic
        return response
    except Exception as e:
        return jsonify({"error": str(e)}), 500

def main():
    web_app.run(host="0.0.0.0", port=8096, debug=False)
```
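On the frontend side, the conversation ID just travels as a request header. A minimal client sketch (using the `requests` library; the endpoint, port, and header name follow the example above) could look like this:

```python
import uuid
import requests

# One conversation ID per chat thread, generated by the frontend.
conversation_id = str(uuid.uuid4())

# Every question in the thread carries the same conversation-id header,
# so the backend spans for this thread share the same scope value.
resp = requests.post(
    "http://localhost:8096/chat",
    params={"question": "How can I track my order?"},
    headers={"conversation-id": conversation_id},
)
print(resp.text)
```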
Save the `monocle_scope.json` in the folder where you run the Flask application
```json
{
    "http_header": "client-id",
    "scope_name": "conversation"
}
```
The above code will generate traces for each chat request. All the spans in these traces will have an attribute called `scope.conversation` carrying the conversation ID passed in the HTTP header.
"attributes": {
"span.type": "inference",
...
"scope.conversation": "conversion-id: 0xcb80e6f772968ed50ead80657b09cf52",