Monocle helps developers and platform engineers who build or manage GenAI apps monitor them in production by making it easy to instrument their code and capture traces that are compatible with the open-source, cloud-native observability ecosystem.
This guide demonstrates how to use Monocle to instrument OpenAI and Vector DB interactions, collecting telemetry data to analyze and monitor their performance.
The example includes the following components:
- OpenAI client (openai_client.py): A client for interacting with OpenAI's Chat API
- Vector database (vector_db.py): An in-memory vector database with OpenAI embeddings

OpenAIClient is a wrapper around the OpenAI API that provides methods for:

- Sending chat requests via the chat() method
- Building message lists via format_messages()
# Initialize client
client = OpenAIClient()

# Format messages and send to OpenAI
messages = client.format_messages(
    system_prompts=["You are a helpful assistant."],
    user_prompts=["Tell me a joke about programming."]
)
response = client.chat(messages=messages, model="gpt-3.5-turbo")
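The repository contains the full implementation of this wrapper. Purely for orientation, here is a minimal sketch of what openai_client.py might look like, assuming the openai v1 Python SDK; everything beyond the method names shown above is an assumption:

```python
# Hypothetical sketch of openai_client.py -- the actual file may differ.
from openai import OpenAI

class OpenAIClient:
    def __init__(self):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def format_messages(self, system_prompts, user_prompts):
        # Build the standard chat message list from plain strings
        messages = [{"role": "system", "content": p} for p in system_prompts]
        messages += [{"role": "user", "content": p} for p in user_prompts]
        return messages

    def chat(self, messages, model="gpt-3.5-turbo"):
        # Delegate to the Chat Completions API and return the raw response
        return self.client.chat.completions.create(model=model, messages=messages)
```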
InMemoryVectorDB is a simple vector database implementation that stores text with OpenAI embeddings and supports similarity search:
# Initialize vector database
vector_db = InMemoryVectorDB()
# Store documents
vector_db.store_text("doc1", "Python is a programming language", {"source": "docs"})
# Search for similar documents
results = vector_db.search_by_text("programming languages", top_k=2)
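Again, the example ships its own implementation. As a hedged sketch (the embedding model and the similarity metric below are assumptions, not taken from the repository), such a class could combine OpenAI embeddings with cosine similarity:

```python
# Hypothetical sketch of vector_db.py -- the real implementation may differ.
import math
from openai import OpenAI

class InMemoryVectorDB:
    def __init__(self, embedding_model="text-embedding-3-small"):  # assumed model
        self.client = OpenAI()
        self.embedding_model = embedding_model
        self.records = {}  # doc_id -> (embedding, text, metadata)

    def _embed(self, text):
        resp = self.client.embeddings.create(model=self.embedding_model, input=text)
        return resp.data[0].embedding

    def store_text(self, doc_id, text, metadata=None):
        self.records[doc_id] = (self._embed(text), text, metadata or {})

    def search_by_text(self, query, top_k=5):
        q = self._embed(query)

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb)

        scored = [
            {"id": doc_id, "text": text, "metadata": meta, "score": cosine(q, emb)}
            for doc_id, (emb, text, meta) in self.records.items()
        ]
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]
```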
Output processors define what data to extract from your methods. Two examples are provided:
output_processor_inference.py defines how to extract data from OpenAI chat completions:
INFERENCE_OUTPUT_PROCESSOR = {
    "type": "inference",
    "attributes": [
        [
            # Entity attributes for the provider
            {
                "attribute": "type",
                "accessor": lambda arguments: "openai"
            },
            {
                "attribute": "deployment",
                "accessor": lambda arguments: arguments['kwargs'].get('model', 'unknown')
            },
            # More attributes...
        ]
    ],
    "events": [
        {
            "name": "data.input",
            "attributes": [
                {
                    "attribute": "input",
                    "accessor": lambda arguments: [
                        msg["content"]
                        for msg in arguments['kwargs'].get('messages', [])
                    ] if isinstance(arguments['kwargs'].get('messages'), list) else []
                }
            ]
        },
        # More events...
    ]
}
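The elided events typically cover outputs and usage. As an illustration only (the attribute names are hypothetical, and the accessors assume arguments["result"] is an openai v1 ChatCompletion object), additional events in the same format might look like this:

```python
# Hypothetical extra events for INFERENCE_OUTPUT_PROCESSOR -- not from the example file
EXTRA_INFERENCE_EVENTS = [
    {
        "name": "data.output",
        "attributes": [
            {
                "attribute": "response",
                # arguments["result"] holds the return value of the wrapped chat() call
                "accessor": lambda arguments: arguments["result"].choices[0].message.content
            }
        ]
    },
    {
        "name": "metadata",
        "attributes": [
            {
                "attribute": "total_tokens",
                "accessor": lambda arguments: arguments["result"].usage.total_tokens
            }
        ]
    }
]
```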
output_processor_vector.py defines how to extract data from vector database operations:
VECTOR_OUTPUT_PROCESSOR = {
    "type": "retrieval",
    "attributes": [
        [
            # Vector store attributes
            {
                "attribute": "name",
                "accessor": lambda arguments: type(arguments["instance"]).__name__,
            },
            # More attributes...
        ]
    ],
    "events": [
        {
            "name": "data.input",
            "attributes": [
                {
                    "attribute": "input",
                    "accessor": lambda arguments: arguments["args"][0] if arguments["args"] else None
                }
            ]
        },
        # More events...
    ]
}
The key to instrumentation is the accessor function, which extracts data from method calls:

- arguments["instance"]: The object instance (e.g., the OpenAIClient or InMemoryVectorDB)
- arguments["args"]: Positional arguments passed to the method
- arguments["kwargs"]: Keyword arguments passed to the method
- arguments["result"]: The return value from the method call

These give you access to all inputs, outputs, and context of the instrumented methods.
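Because an accessor is just a function of this arguments dictionary, you can sanity-check one in isolation, with no instrumentation running. A quick example using the deployment accessor from above (the dictionary here is hand-built for the test):

```python
# Hand-built stand-in for the arguments dict that accessors receive
arguments = {
    "instance": None,  # would be the OpenAIClient instance
    "args": (),
    "kwargs": {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
    },
    "result": None,  # would be the method's return value
}

deployment = lambda arguments: arguments['kwargs'].get('model', 'unknown')
print(deployment(arguments))  # -> gpt-3.5-turbo
```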
Set up Monocle’s telemetry system with your output processors:
from monocle_apptrace.instrumentation.common.wrapper_method import WrapperMethod
from monocle_apptrace.instrumentation.common.instrumentor import setup_monocle_telemetry

setup_monocle_telemetry(
    workflow_name="openai.app",
    wrapper_methods=[
        WrapperMethod(
            package="openai_client",          # Module name
            object_name="OpenAIClient",       # Class name
            method="chat",                    # Method to instrument
            span_name="openai_client.chat",   # Span name in telemetry
            output_processor=INFERENCE_OUTPUT_PROCESSOR
        ),
        # More method wrappers...
    ]
)
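The elided entries follow the same pattern. For instance, the vector search from earlier could plausibly be wrapped like this (the span name here is illustrative, not prescribed):

```python
# Hypothetical second wrapper, appended to the wrapper_methods list above
from output_processor_vector import VECTOR_OUTPUT_PROCESSOR

vector_wrapper = WrapperMethod(
    package="vector_db",                   # Module name
    object_name="InMemoryVectorDB",        # Class name
    method="search_by_text",               # Method to instrument
    span_name="vector_db.search_by_text",  # Illustrative span name
    output_processor=VECTOR_OUTPUT_PROCESSOR
)
```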
Set your OpenAI API key, install the dependencies, and run the example:

export OPENAI_API_KEY=your_api_key_here
pip install -r requirements.txt
python example.py
# Or use the provided shell script
./run_example.sh
Monocle generates JSON trace files in your directory with names like:
monocle_trace_openai.app_<trace_id>_<timestamp>.json
The trace files contain structured telemetry data:
{
  "name": "openai_client.chat",
  "context": { /* trace context */ },
  "attributes": {
    "entity.2.type": "openai",
    "entity.2.provider_name": "OpenAI",
    "entity.2.deployment": "gpt-3.5-turbo",
    "entity.2.inference_endpoint": "https://api.openai.com/v1",
    "entity.3.name": "gpt-3.5-turbo",
    "entity.3.type": "model.llm.gpt-3.5-turbo"
  },
  "events": [
    {
      "name": "data.input",
      "timestamp": "2025-02-27T10:36:49.985586Z",
      "attributes": {
        "input": [
          "You are a helpful AI assistant.",
          "Tell me a short joke about programming."
        ]
      }
    },
    {
      "name": "data.output",
      "attributes": {
        "response": "Why do programmers prefer dark mode? Because the light attracts bugs!"
      }
    },
    {
      "name": "metadata",
      "attributes": {
        "prompt_tokens": 26,
        "completion_tokens": 14,
        "total_tokens": 40
      }
    }
  ]
}
Each span records three kinds of events:

- data.input: The inputs provided to the method
- data.output: The response or results from the method
- metadata: Additional information like token usage

To instrument your own code:

1. Define an output processor describing the attributes and events to capture
2. Create a WrapperMethod for each method you want to trace
3. Pass the wrappers to setup_monocle_telemetry at startup
By customizing the output processors, you can collect exactly the telemetry data you need from any Python method.
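Since traces land on disk as plain JSON in the format shown above, they are also easy to post-process. For example, a small script (hypothetical; it assumes the file naming pattern and span layout shown earlier) that totals token usage across all traces in the current directory:

```python
# Sum total_tokens across Monocle trace files in the current directory
import glob
import json

total = 0
for path in glob.glob("monocle_trace_openai.app_*.json"):
    with open(path) as f:
        trace = json.load(f)
    # Handle either a single span object or a list of spans per file
    spans = trace if isinstance(trace, list) else [trace]
    for span in spans:
        for event in span.get("events", []):
            if event.get("name") == "metadata":
                total += event.get("attributes", {}).get("total_tokens", 0)

print(f"Total tokens across traces: {total}")
```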