Monocle helps developers and platform engineers who build or manage GenAI apps monitor them in production. It makes it easy to instrument application code to capture traces that are compliant with the open-source, cloud-native observability ecosystem.
A span is an observation of a code/method execution. Each span has a unique ID. It records the start and end time of the code's execution, along with additional information relevant to that operation.

Before the code execution starts, a span object is created in the memory of the host process executing the code, capturing the current time as the span's start time. At this stage the span is considered active, and it stays active until the code execution ends. Once the execution is complete, the span records the current time as its end time and captures any additional relevant information (e.g. arguments, return value, environment settings). The span is then marked closed and queued to be saved to the configured storage.

Note that the code that generated this span could in turn call other instrumented methods, which generate spans of their own. These are "child" spans, and they refer to the span ID of the calling code as their "parent" span. An initial span with no parent is referred to as the "root" span.
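Conceptually, this is the same span lifecycle as in the OpenTelemetry SDK that Monocle builds on. A minimal sketch using the plain OpenTelemetry Python API (shown for illustration only; Monocle creates spans like these for you via auto-instrumentation):

```python
# Span lifecycle illustrated with the OpenTelemetry Python API.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("f1") as parent:    # root span starts (active)
    parent.set_attribute("arg.count", 1)              # extra info captured on the span
    with tracer.start_as_current_span("f2") as child: # child span; its parent is f1
        pass                                          # child closes when the block exits
# leaving the outer block records the end time and queues the span for export
```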
A trace is a collection of spans that share a common ID called the traceID. When the first active span is created, a new unique traceID is generated and assigned to it. All child spans generated by executing other instrumented code/methods share the same traceID. When this top-level span ends, the trace ends. This way, all the code executed as part of the top-level instrumented code carries a common traceID that groups the spans together. For example, consider the following sequence, where f1() is the first instrumented method executed and it calls other instrumented methods f2(), f3(), f4() and f5():
```
f1() --> f2() --> f3()
     --> f4() --> f5()
```
In the above sequence, each method execution generates a span, and all the spans share a common traceID. If a new instrumented method is executed after f1() finishes, it becomes the first active span in the process's execution context and gets a new traceID.
Each child span inherits the parent's trace ID. When spans run in the same process, the trace ID is picked up from process memory/context. But consider the above example again, where the f4() --> f5() code is not part of the process executing f1(): it's a remote call, say over REST. From the overall application's point of view, the work done in f4() and f5() is part of f1(), and you want the same traceID associated with all the spans. You want the instrumentation to seamlessly pass the traceID over such remote calls and continue it instead of generating a new one. It's Monocle's responsibility to provide a mechanism that makes this trace ID propagation transparent to the application logic and architecture.
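Under the hood this works like standard OpenTelemetry context propagation. A hedged sketch of the mechanics using the plain OpenTelemetry propagation API (for illustration only; Monocle handles this transparently):

```python
# Trace ID propagation over a REST hop, illustrated with OpenTelemetry.
from opentelemetry.propagate import inject, extract

# Caller side (process running f1): serialize the current trace context
# into HTTP headers before making the remote call.
headers = {}
inject(headers)  # adds e.g. a W3C 'traceparent' header
# requests.post("https://remote/f4", headers=headers, ...)

# Callee side (process running f4/f5): restore the caller's context so
# new spans continue the same traceID instead of starting a new one.
incoming_headers = headers  # in a real server, read these from the request
ctx = extract(incoming_headers)
```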
Monocle extends these generic span types by enriching them with additional attributes/data for genAI-specific operations.
Core spans capture the details of genAI component operations, such as a call to an LLM or a vector store. The purpose of these spans is to capture the details of the application's interaction with core genAI components. These spans are triggered by pre-instrumented methods that handle such operations.
Anchor spans are created by a top-level method that anchors a higher level of abstraction over the underlying core genAI APIs, for example a langchain.invoke() which under the covers calls langchain.llm_invoke() or langchain.vector_retrieval(). Consider the following pseudo code of a langchain RAG-pattern API:
```
response = rag_chain.invoke(prompt)
    --> cleaned_prompt = llm1.chat(prompt)
    --> context = vector_store.retrieve(cleaned_prompt)
    --> response = llm2.chat(system_prompt + context + cleaned_prompt)
    --> return response
```
If we only instrument the top-level invoke call, we'll trace the top-level prompt and response interaction between the application and langchain, but we'll miss details like how a system prompt was added and sent to multiple LLMs, and what context was extracted from the vector store. On the other hand, if we only instrument the low-level calls to the LLM and vector store, we'll miss the fact that they are part of the same RAG invocation. Hence we instrument all of them. This example would generate an anchor span for the invoke() method, a retrieval span for the retrieve() method, and an inference span for each chat() method. All of these will share a common traceID.
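For illustration, a minimal sketch of how this looks in practice, assuming a LangChain-based app; the stand-in chain below is hypothetical, and a real app would compose a prompt, retriever and LLM:

```python
# A minimal sketch, assuming a LangChain app. Once telemetry is set up,
# pre-instrumented invoke/retrieve/chat calls emit the anchor, retrieval
# and inference spans described above with a shared traceID.
from monocle_apptrace import setup_monocle_telemetry
from langchain_core.runnables import RunnableLambda

setup_monocle_telemetry(workflow_name="rag-app")

# Hypothetical stand-in chain; a real app would compose
# prompt | retriever | llm here.
rag_chain = RunnableLambda(lambda prompt: f"answer to: {prompt}")
response = rag_chain.invoke("What is an americano?")
```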
The anchor span also provides an observation window into your application's interaction with a high-level SDK or service. It illustrates facts such as how much time was taken by the genAI service invocation compared to other local logic.
Workflow spans are synthetic spans created to frame the full trace. A workflow span captures a summary of the full trace, including the time window, the process running the code (set as workflow_name in the API call that enables Monocle instrumentation) and runtime environment details such as the hosting service (Azure Functions, Lambda functions etc.). A workflow span is generated when a new trace starts or when a trace is propagated. Workflow spans provide the baseline observation window for the entire trace, or for the fragment of a trace executed in a process.
Consider the following example:
```
setup_monocle_telemetry(workflow='bot')
rag_chain.invoke()
    --> context = retrieval()
    --> new_prompt = REST --> azure.func.chat(prompt) -->
            setup_monocle_telemetry(workflow='moderator')
            return llm(moderator_system_prompt + prompt)
    --> response = llm(new_prompt)
```
This will generate the following spans:
```
Span{name='workflow.bot',       type=workflow,  traceID=xx1, spanID=yy0, parentID=None} ==> workflow span for new trace start
Span{name='chain.invoke',       type=anchor,    traceID=xx1, spanID=yy1, parentID=yy0}  ==> anchor span for chain invoke
Span{name='chain.retrieval',    type=retrieval, traceID=xx1, spanID=yy2, parentID=yy1}  ==> retrieval API span
Span{name='workflow.moderator', type=workflow,  traceID=xx1, spanID=zz1, parentID=yy1}  ==> workflow span for propagated trace fragment
Span{name='az.func.chat',       type=anchor,    traceID=xx1, spanID=zz2, parentID=zz1}  ==> anchor span for Azure function invoke
Span{name='chain.infer',        type=inference, traceID=xx1, spanID=zz3, parentID=zz2}  ==> inference span (moderator)
Span{name='chain.infer',        type=inference, traceID=xx1, spanID=yy3, parentID=yy1}  ==> inference span (bot)
```
The Monocle metamodel is how standardization is managed across all supported GenAI component stacks. It includes the list of components that Monocle can identify and extract metadata from. This helps in understanding and analyzing traces from applications that include multiple components and evolve over time. It is one of the core values Monocle provides to its user community.
The spans generated by Monocle need to be stored for future analysis. An exporter is the mechanism that writes spans to a configured destination; Monocle provides multiple exporters (see the Exporters section below).
While a trace is a physical/technical tracking of the APIs invoked by your application, a scope is a logical stage of your application that can be tracked with Monocle. For example, an OpenAI inference API invocation maps to a trace, while a series of inference and vector store API calls that facilitate one conversation in a chatbot app maps to a scope. Monocle provides programmatic and declarative mechanisms to track scopes across traces.
Monocle supports tracing GenAI applications coded in Python and TypeScript.
TypeScript:

```
npm install --save monocle2ai
```

```typescript
const { setupMonocle } = require("monocle2ai");
setupMonocle("your-app-name");
```
Python:

```
pip install monocle_apptrace
```

Then call setup_monocle_telemetry in your application's main() function:

```python
from monocle_apptrace import setup_monocle_telemetry

setup_monocle_telemetry(workflow_name="your-app-name")
```
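A minimal end-to-end sketch, assuming an app that uses the openai Python client; the model name and prompt are illustrative:

```python
# Set up Monocle before creating clients or making genAI calls;
# subsequent calls to pre-instrumented libraries are traced
# automatically with no per-call changes.
from monocle_apptrace import setup_monocle_telemetry
from openai import OpenAI

setup_monocle_telemetry(workflow_name="my-chatbot")

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is an americano?"}],
)
print(reply.choices[0].message.content)
```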
Monocle spans provide details of each genAI operation executed by your application in a consistent metamodel format. A Monocle trace is an OpenTelemetry-compatible collection of spans with a common trace ID. Each span is a JSON structure that includes a traceID, a unique span ID and timestamps, as per the OpenTelemetry spec. Monocle generates three types of spans:

- `inference`: when an API is called to generate a model inference
- `retrieval`: when an API is called to generate embeddings and communicate with a vector store
- `workflow`: a summary span for the trace
The genAI-related information captured by Monocle is in the `attributes` and `events` sections of the span JSON. The `attributes` section lists the entities that were part of the operation/API that generated the span, e.g. Azure OpenAI as a model inference provider, gpt-4o-mini as an LLM etc. The `events` section includes the data and metadata from the operation, for example the prompt to the LLM, the response from the LLM and token details. Here's a complete example of traces generated by this sample Python application instrumented with Monocle.
The following span headers are included in every span:
"context": {
"trace_id": "0x62672060b60c246e5c7bfdf46d93e2b3", ==> Trace id common to all spans of this trace
"span_id": "0xfbd245d1509ef554", ==> Span id, unique to this span
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x34fc562203a4a926",
"start_time": "2025-03-12T17:05:57.256058Z", ==> timestamp of span start
"end_time": "2025-03-12T17:05:57.720410Z", ==> timestamp of span end
The inference span includes details of the genAI components used in the inference operation. The information is divided into two sections, attributes and events. A given trace can have multiple inference spans, one per inference call.

The attributes section of the span provides details of components like the model and the model hosting service:
"attributes": {
"monocle_apptrace.version": "0.3.0b6",
"span.type": "inference",
"entity.1.type": "inference.azure_openai", ==> Inference service type
"entity.1.deployment": "gpt-4o-mini",
"entity.1.inference_endpoint": "https://my-az-openai.openai.azure.com/",
"entity.2.name": "gpt-35-turbo", ==> ILLM
"entity.2.type": "model.llm.gpt-35-turbo",
"entity.count": 2
}
"events": [
{
"name": "data.input", ==> Inputs to LLM
"timestamp": "2025-03-12T17:05:59.165628Z",
"attributes": {
"input": [
"{'system': \"You are an expert Q&A system that is trusted around the world.\\nAlways answer the query using the provided context information, and not prior knowledge.\\nSome rules to follow:\\n1. Never directly reference the given context in your answer.\\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.\"}",
"{'user': 'What is an americano?'}",
"[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text=\"You are an expert Q&A system that is trusted around the world.\\nAlways answer the query using the provided context information, and not prior knowledge.\\nSome rules to follow:\\n1. Never directly reference the given context in your answer.\\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.\")]), ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Context information is below.\\n---------------------\\nfile_path: coffee.txt\\n\\nCoffee is a hot drink made from the roasted and ground seeds (coffee beans) of a tropical shrub\\nA latte consists of one or more shots of espresso, served in a glass (or sometimes a cup), into which hot steamed milk is added\\nAmericano is a type of coffee drink prepared by diluting an espresso shot with hot water at a 1:3 to 1:4 ratio, resulting in a drink that retains the complex flavors of espresso, but in a lighter way\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: What is an americano?\\nAnswer: ')])]"
]
}
},
{
"name": "data.output", ==> Responses from LLM
"timestamp": "2025-03-12T17:05:59.165655Z",
"attributes": {
"response": [
"An Americano is a type of coffee drink prepared by diluting an espresso shot with hot water at a ratio of 1:3 to 1:4. This process results in a drink that retains the complex flavors of espresso while being lighter in taste."
]
}
},
{
"name": "metadata", ==> Token metadata from LLM
"timestamp": "2025-03-12T17:05:59.165675Z",
"attributes": {
"temperature": 0.1,
"completion_tokens": 52,
"prompt_tokens": 220,
"total_tokens": 272
}
}
]
The retrieval span includes details of the genAI components used in the retrieval operation. The information is divided into two sections, attributes and events. A given trace can have multiple retrieval spans.

The attributes describe the embedding model and vector store used:
"attributes": {
"monocle_apptrace.version": "0.3.0b6",
"span.type": "retrieval",
"entity.1.name": "ChromaVectorStore", ==> Vector store
"entity.1.type": "vectorstore.ChromaVectorStore",
"entity.2.name": "text-embedding-3-large", ==> Embedding model
"entity.2.type": "model.embedding.text-embedding-3-large",
"entity.count": 2
}
The events capture the search and retrieval of vector data
"events": [
{
"name": "data.input", ==> prompts to search
"timestamp": "2025-03-12T17:05:57.720379Z",
"attributes": {
"input": "What is an americano?"
}
},
{
"name": "data.output", ==> Context retrieved
"timestamp": "2025-03-12T17:05:57.720398Z",
"attributes": {
"response": "Coffee is a hot drink made from the roasted and ground seeds (coffee beans) of a tropical shrub\nA la..."
}
}
]
A workflow span captures a summary of the trace, such as the start and end of the full trace, the type of client tools etc. Note that there's only one workflow span for a given trace.
"attributes": {
"monocle_apptrace.version": "0.3.0b6",
"span.type": "workflow",entity.1.name": "my-chatbot", ==> workflow name set in setup_monocle_telemetry()
"entity.1.type": "workflow.llamaindex", ==> Type of framework
"entity.2.type": "app_hosting.github_codespace", ==> Application hosting environment
"entity.2.name": "my-chatbot-container-xyz",
"entity.count": 2
}
Monocle exporters handle storing traces for future analysis. By default, each trace is stored as a JSON file in the directory where the app runs. You can configure exporters by setting the environment variable `MONOCLE_EXPORTER` to one or more of the exporter settings listed below: `MONOCLE_EXPORTER=<comma-separated-list>`.

By default, Monocle flushes traces in batches of 10. Note that traces are written to their destination asynchronously, so exporting doesn't impact the application's response time. The following exporters are supported:
|Exporter Name|Exporter Setting|Description|Format|Trace destination|Additional configuration|
|-|-|-|-|-|-|
|File (default)|`file`|Export to local file system|JSON|Local directory||
|Console|`console`|Export to console|Text|Console/stdout||
|Memory|`memory`|Keep in memory|String|Process memory||
|S3|`s3`|Export to AWS S3 bucket|ND JSON|S3 bucket|Install the Monocle AWS package dependencies: `pip install monocle_apptrace[aws]`<br>Env variables for the s3 exporter:<br>`MONOCLE_AWS_ACCESS_KEY_ID` or `AWS_ACCESS_KEY_ID`: AWS access key<br>`MONOCLE_AWS_SECRET_ACCESS_KEY` or `AWS_SECRET_ACCESS_KEY`: AWS secret<br>`MONOCLE_S3BUCKET_NAME`: S3 bucket where traces will be stored<br>`MONOCLE_S3_KEY_PREFIX`: ND JSON file name prefix (default: `monocle_trace`)|
|Blob|`blob`|Export to Azure blob store|ND JSON|Blob container|Install the Monocle Azure package dependencies: `pip install monocle_apptrace[azure]`<br>Env variables for the blob exporter:<br>`MONOCLE_BLOB_CONNECTION_STRING`: Connection string for the Azure blob store<br>`MONOCLE_BLOB_CONTAINER_NAME`: Blob container to store the trace ndjson files|
|Okahu|`okahu`|Export to Okahu.ai service|JSON|Okahu tenant|Env variable for the Okahu exporter:<br>`OKAHU_API_KEY`: API key for the Okahu tenant|
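A hedged sketch of switching exporters via the environment variable; the exporter list is illustrative, and the variable must be set before telemetry setup:

```python
# Route traces to both the local file system and the console.
import os
os.environ["MONOCLE_EXPORTER"] = "file,console"  # set before setup

from monocle_apptrace import setup_monocle_telemetry
setup_monocle_telemetry(workflow_name="my-app")
```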
|Framework|Python|TypeScript|
|-|-|-|
|Langchain|✅|✅|
|Llama Index|✅|✅|
|HayStack|✅|Not Applicable|

|API|Python|TypeScript|
|-|-|-|
|OpenAI|✅|✅|
|AWS Boto|✅|✅|
|Anthropic|✅|✅|

|Service|Python|TypeScript|
|-|-|-|
|OpenAI|✅|✅|
|Azure OpenAI|✅|✅|
|AWS SageMaker|✅|✅|
|AWS Bedrock|✅|✅|
|Anthropic|✅|✅|
|NVIDIA Triton|✅|❌|

|Service|Python|TypeScript|
|-|-|-|
|Chroma|✅|✅|
|OpenSearch|✅|✅|
Imagine you have a chatbot application that supports a long conversation, i.e. multiple question/answer exchanges between the end user and the bot. It uses various genAI tech components/services like LLMs and vector stores. A simple instrumentation will generate a trace per genAI API call (e.g. invocation of a framework chat method or a direct OpenAI API call). As the app developer or owner, you are more interested in tracking conversations than individual API calls. Scopes in Monocle enable that use case. You can set the scope in the application either programmatically or declaratively, as sketched below. You can specify a value for the scope, or Monocle will generate a unique value (GUID), giving you the option to choose what's best suited for your use case. Please see the Monocle Python cookbook for details and examples.
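A minimal programmatic sketch, assuming `start_scope`/`stop_scope` are importable from the package root as in the API reference below; the scope name and value are illustrative:

```python
# Tag every span emitted while handling one conversation with the same
# scope, even across multiple traces.
from monocle_apptrace import setup_monocle_telemetry, start_scope, stop_scope

setup_monocle_telemetry(workflow_name="chatbot")

token = start_scope("conversation", "conv-1234")  # illustrative name/value
try:
    pass  # handle the user/bot exchanges of this conversation here
finally:
    stop_scope(token)
```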
If you are using a genAI technology that's not yet supported by Monocle out of the box, or you have your own proprietary code, you can extend Monocle to generate traces in the Monocle format.
setup_monocle_telemetry

```python
def setup_monocle_telemetry(
    workflow_name: str,
    span_processors: List[opentelemetry.sdk.trace.SpanProcessor] = None,
    span_handlers: Dict[str, monocle_apptrace.instrumentation.common.span_handler.SpanHandler] = None,
    wrapper_methods: List[Union[dict, monocle_apptrace.instrumentation.common.wrapper_method.WrapperMethod]] = None,
    monocle_exporters_list: str = None,
    union_with_default_methods: bool = True
) -> None
```
Set up Monocle telemetry for the application.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
workflow_name | str | The name of the workflow, used as the service name in telemetry. | None |
span_processors | List[SpanProcessor] | Custom span processors to use instead of the default ones. If None, BatchSpanProcessors with Monocle exporters will be used. | None |
span_handlers | Dict[str, SpanHandler] | Dictionary of span handlers to be used by the instrumentor, mapping handler names to handler objects. | None |
wrapper_methods | List[Union[dict, WrapperMethod]] | Custom wrapper methods for instrumentation. If None, default methods will be used. | None |
monocle_exporters_list | str, optional | Comma-separated list of exporters to use. This overrides the env setting MONOCLE_EXPORTER. Supported exporters: s3, blob, okahu, file, memory, console. Cannot be combined with span_processors. | None |
union_with_default_methods | bool | If True, combine the provided wrapper_methods with the default methods. If False, only use the provided wrapper_methods. | True |
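A hedged usage sketch of the parameters above; the workflow name and exporter list are illustrative:

```python
# Minimal setup: everything not passed falls back to Monocle defaults.
from monocle_apptrace import setup_monocle_telemetry

setup_monocle_telemetry(
    workflow_name="my-chatbot",
    monocle_exporters_list="file,console",  # overrides MONOCLE_EXPORTER
)
```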
start_trace

```python
def start_trace()
```
Starts a new trace. All the spans created after this call will be part of the same trace.
Returns:
Type | Description |
---|---|
Token | A token representing the attached context for the workflow span. This token is to be used later to stop the current trace. Returns None if tracing fails. |
Raises:
Type | Description |
---|---|
Exception | The function catches all exceptions internally and logs a warning. |
stop_trace

```python
def stop_trace(
    token: object
) -> None
```

Stop the current trace. Spans created after this call will not be part of that trace.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
token | object | The token that was returned when the trace was started. | None |

Returns:

Type | Description |
---|---|
None | None |
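A minimal sketch pairing the two calls, assuming both are importable from the package root; the workflow name is illustrative:

```python
# Group several otherwise-independent operations under one trace.
from monocle_apptrace import setup_monocle_telemetry, start_trace, stop_trace

setup_monocle_telemetry(workflow_name="batch-job")

token = start_trace()
try:
    pass  # all genAI calls made here share a single traceID
finally:
    stop_trace(token)
```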
start_scope

```python
def start_scope(
    scope_name: str,
    scope_value: str = None
) -> object
```
Start a new scope with the given name and an optional value. If no value is provided, a random UUID will be generated. All the spans, across traces, created after this call will have the scope attached until the scope is stopped.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
scope_name | str | The name of the scope. | None |
scope_value | str | Optional value of the scope. If None, a random UUID will be generated. | None |
Returns:
Type | Description |
---|---|
Token | A token representing the attached context for the scope. This token is to be used later to stop the current scope. |
stop_scope

```python
def stop_scope(
    token: object
) -> None
```

Stop the active scope. All the spans created after this will not have the scope attached.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
token | object | The token that was returned when the scope was started. | None |
Returns:
Type | Description |
---|---|
None | None |
monocle_trace_scope

```python
def monocle_trace_scope(
    scope_name: str,
    scope_value: str = None
)
```
Context manager to start and stop a scope. All the spans, across traces created within the encapsulated code will have the scope attached.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
scope_name | str | The name of the scope. | None |
scope_value | str | Optional value of the scope. If None, a random UUID will be generated. | None |
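A minimal sketch, assuming `monocle_trace_scope` is importable from the package root; the scope name/value are illustrative:

```python
from monocle_apptrace import monocle_trace_scope

# Spans created inside the block carry the conversation scope.
with monocle_trace_scope("conversation", "conv-1234"):
    pass  # genAI calls for this conversation go here
```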
monocle_trace_http_route

```python
def monocle_trace_http_route(
    func
)
```

Decorator to continue traces and scopes across an HTTP route. It will also initiate new scopes from the HTTP headers if configured in monocle_scopes.json. All the spans, across traces, created in the route will have the scope attached.
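A minimal sketch, assuming a Flask app and that the decorator is importable from the package root; the route path and payload are illustrative:

```python
# Apply the decorator to a route handler so that trace/scope context
# arriving in HTTP headers is continued inside the handler.
from flask import Flask, request
from monocle_apptrace import setup_monocle_telemetry, monocle_trace_http_route

app = Flask(__name__)
setup_monocle_telemetry(workflow_name="moderator-service")

@app.route("/chat", methods=["POST"])
@monocle_trace_http_route
def chat():
    prompt = request.json.get("prompt", "")
    return {"answer": f"(moderated) {prompt}"}
```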