Monocle helps developers and platform engineers who build or manage GenAI apps monitor them in production. It makes it easy to instrument application code to capture traces that are compliant with the open-source, cloud-native observability ecosystem.
A span is an observation of a code/method execution. Each span has a unique ID. It records the start and end time of the code's execution, along with additional information relevant to that operation.

Before the code execution starts, a span object is created in the memory of the host process executing the code, capturing the current time as the span's start time. At this stage the span is considered active, and it stays active until the code execution ends. Once the execution is complete, the span records the current time as its end time and captures any additional relevant information (e.g. arguments, return value, environment settings). The span is then marked closed and queued to be saved to the configured storage.

Note that the code that generated this span could in turn call other instrumented methods, which generate spans of their own. These are "child" spans, and they refer to the span ID of the calling code as their "parent" span. An initial span with no parent is referred to as the "root" span.
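Conceptually, this is the same span lifecycle as in the OpenTelemetry SDK that Monocle builds on. A minimal sketch using the plain OpenTelemetry Python API (shown for illustration only; Monocle creates spans like these for you via auto-instrumentation):

```python
# Span lifecycle illustrated with the OpenTelemetry Python API.
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("f1") as parent:    # root span starts (active)
    parent.set_attribute("arg.count", 1)              # extra info captured on the span
    with tracer.start_as_current_span("f2") as child: # child span; its parent is f1
        pass                                          # child closes when the block exits
# leaving the outer block records the end time and queues the span for export
```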
A trace is a collection of spans that share a common ID called the traceID. When the first active span is created, a new unique traceID is generated and assigned to it. All child spans generated by executing other instrumented code/methods share the same traceID. When this top-level span ends, the trace ends. This way, all the code executed as part of the top-level instrumented code carries a common traceID that groups the spans together. For example, consider the following sequence, where f1() is the first instrumented method executed and it calls other instrumented methods f2(), f3(), f4() and f5():
```
f1() --> f2() --> f3()
     --> f4() --> f5()
```
In the above sequence, each method execution generates a span, and all the spans share a common traceID. If a new instrumented method is executed after f1() finishes, it becomes the first active span in the process's execution context and gets a new traceID.
Each child span inherits the parent's trace ID. When spans run in the same process, the trace ID is picked up from process memory/context. But consider the above example again, where the f4() --> f5() code is not part of the process executing f1(): it's a remote call, say over REST. From the overall application's point of view, the work done in f4() and f5() is part of f1(), and you want the same traceID associated with all the spans. You want the instrumentation to seamlessly pass the traceID over such remote calls and continue it instead of generating a new one. It's Monocle's responsibility to provide a mechanism that makes this trace ID propagation transparent to the application logic and architecture.
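Under the hood this works like standard OpenTelemetry context propagation. A hedged sketch of the mechanics using the plain OpenTelemetry propagation API (for illustration only; Monocle handles this transparently):

```python
# Trace ID propagation over a REST hop, illustrated with OpenTelemetry.
from opentelemetry.propagate import inject, extract

# Caller side (process running f1): serialize the current trace context
# into HTTP headers before making the remote call.
headers = {}
inject(headers)  # adds e.g. a W3C 'traceparent' header
# requests.post("https://remote/f4", headers=headers, ...)

# Callee side (process running f4/f5): restore the caller's context so
# new spans continue the same traceID instead of starting a new one.
incoming_headers = headers  # in a real server, read these from the request
ctx = extract(incoming_headers)
```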
Monocle extends these generic span types by enriching them with additional attributes/data for genAI-specific operations.
Core spans capture the details of genAI component operations, such as a call to an LLM or a vector store. The purpose of these spans is to capture the details of the application's interaction with core genAI components. These spans are triggered by pre-instrumented methods that handle such operations.
Anchor spans are created by a top-level method that anchors a higher level of abstraction over the underlying core genAI APIs, for example a langchain.invoke() which under the covers calls langchain.llm_invoke() or langchain.vector_retrieval(). Consider the following pseudo code of a langchain RAG-pattern API:
```
response = rag_chain.invoke(prompt)
    --> cleaned_prompt = llm1.chat(prompt)
    --> context = vector_store.retrieve(cleaned_prompt)
    --> response = llm2.chat(system_prompt + context + cleaned_prompt)
    --> return response
```
If we only instrument the top-level invoke call, we'll trace the top-level prompt and response interaction between the application and langchain, but we'll miss details like how a system prompt was added and sent to multiple LLMs, and what context was extracted from the vector store. On the other hand, if we only instrument the low-level calls to the LLM and vector store, we'll miss the fact that they are part of the same RAG invocation. Hence we instrument all of them. This example would generate an anchor span for the invoke() method, a retrieval span for the retrieve() method, and an inference span for each chat() method. All of these will share a common traceID.
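For illustration, a minimal sketch of how this looks in practice, assuming a LangChain-based app; the stand-in chain below is hypothetical, and a real app would compose a prompt, retriever and LLM:

```python
# A minimal sketch, assuming a LangChain app. Once telemetry is set up,
# pre-instrumented invoke/retrieve/chat calls emit the anchor, retrieval
# and inference spans described above with a shared traceID.
from monocle_apptrace import setup_monocle_telemetry
from langchain_core.runnables import RunnableLambda

setup_monocle_telemetry(workflow_name="rag-app")

# Hypothetical stand-in chain; a real app would compose
# prompt | retriever | llm here.
rag_chain = RunnableLambda(lambda prompt: f"answer to: {prompt}")
response = rag_chain.invoke("What is an americano?")
```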
The anchor span also provides an observation window into your application's interaction with a high-level SDK or service. It illustrates facts such as how much time was taken by the genAI service invocation compared to other local logic.
Workflow spans are synthetic spans created to frame the full trace. A workflow span captures a summary of the full trace, including the time window, the process running the code (set as workflow_name in the API call that enables Monocle instrumentation) and runtime environment details such as the hosting service (Azure Functions, Lambda functions etc.). A workflow span is generated when a new trace starts or when a trace is propagated. Workflow spans provide the baseline observation window for the entire trace, or for the fragment of a trace executed in a process.
Consider the following example:
```
setup_monocle_telemetry(workflow='bot')
rag_chain.invoke()
    --> context = retrieval()
    --> new_prompt = REST --> azure.func.chat(prompt) -->
            setup_monocle_telemetry(workflow='moderator')
            return llm(moderator_system_prompt + prompt)
    --> response = llm(new_prompt)
```
This will generate the following spans:
```
Span{name='workflow.bot',       type=workflow,  traceID=xx1, spanID=yy0, parentID=None} ==> workflow span for new trace start
Span{name='chain.invoke',       type=anchor,    traceID=xx1, spanID=yy1, parentID=yy0}  ==> anchor span for chain invoke
Span{name='chain.retrieval',    type=retrieval, traceID=xx1, spanID=yy2, parentID=yy1}  ==> retrieval API span
Span{name='workflow.moderator', type=workflow,  traceID=xx1, spanID=zz1, parentID=yy1}  ==> workflow span for propagated trace fragment
Span{name='az.func.chat',       type=anchor,    traceID=xx1, spanID=zz2, parentID=zz1}  ==> anchor span for Azure function invoke
Span{name='chain.infer',        type=inference, traceID=xx1, spanID=zz3, parentID=zz2}  ==> inference span (moderator)
Span{name='chain.infer',        type=inference, traceID=xx1, spanID=yy3, parentID=yy1}  ==> inference span (bot)
```
The Monocle metamodel is how standardization is managed across all supported GenAI component stacks. It includes the list of components that Monocle can identify and extract metadata from. This helps in understanding and analyzing traces from applications that include multiple components and evolve over time. It is one of the core values Monocle provides to its user community.
The spans generated by Monocle need to be stored for future analysis. An exporter is the mechanism that writes spans to a configured destination; Monocle provides multiple exporters (see the Exporters section below).
While a trace is a physical/technical tracking of the APIs invoked by your application, a scope is a logical stage of your application that can be tracked with Monocle. For example, an OpenAI inference API invocation maps to a trace, while a series of inference and vector store API calls that facilitate one conversation in a chatbot app maps to a scope. Monocle provides programmatic and declarative mechanisms to track scopes across traces.
Monocle supports tracing GenAI applications coded in Python and TypeScript.
TypeScript:

```
npm install --save monocle2ai
```

```typescript
const { setupMonocle } = require("monocle2ai");
setupMonocle("your-app-name");
```
Python:

```
pip install monocle_apptrace
```

Then call setup_monocle_telemetry in your application's main() function:

```python
from monocle_apptrace import setup_monocle_telemetry

setup_monocle_telemetry(workflow_name="your-app-name")
```
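A minimal end-to-end sketch, assuming an app that uses the openai Python client; the model name and prompt are illustrative:

```python
# Set up Monocle before creating clients or making genAI calls;
# subsequent calls to pre-instrumented libraries are traced
# automatically with no per-call changes.
from monocle_apptrace import setup_monocle_telemetry
from openai import OpenAI

setup_monocle_telemetry(workflow_name="my-chatbot")

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is an americano?"}],
)
print(reply.choices[0].message.content)
```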
Monocle spans provide details of each genAI operation executed by your application in a consistent metamodel format. A Monocle trace is an OpenTelemetry-compatible collection of spans with a common trace ID. Each span is a JSON structure that includes a traceID, a unique span ID and timestamps, as per the OpenTelemetry spec. Monocle generates three types of spans:

- `inference`: when an API is called to generate a model inference
- `retrieval`: when an API is called to generate embeddings and communicate with a vector store
- `workflow`: a summary span for the trace
The genAI-related information captured by Monocle is in the `attributes` and `events` sections of the span JSON. The `attributes` section lists the entities that were part of the operation/API that generated the span, e.g. Azure OpenAI as a model inference provider, gpt-4o-mini as an LLM etc. The `events` section includes the data and metadata from the operation, for example the prompt to the LLM, the response from the LLM and token details. Here's a complete example of traces generated by this sample Python application instrumented with Monocle.
The following span headers are included in every span:
"context": {
"trace_id": "0x62672060b60c246e5c7bfdf46d93e2b3", ==> Trace id common to all spans of this trace
"span_id": "0xfbd245d1509ef554", ==> Span id, unique to this span
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x34fc562203a4a926",
"start_time": "2025-03-12T17:05:57.256058Z", ==> timestamp of span start
"end_time": "2025-03-12T17:05:57.720410Z", ==> timestamp of span end
The inference span includes details of the genAI components used in the inference operation. The information is divided into two sections, attributes and events. A given trace can have multiple inference spans, one per inference call.

The attributes section of the span provides details of components like the model and the model hosting service:
"attributes": {
"monocle_apptrace.version": "0.3.0b6",
"span.type": "inference",
"entity.1.type": "inference.azure_openai", ==> Inference service type
"entity.1.deployment": "gpt-4o-mini",
"entity.1.inference_endpoint": "https://my-az-openai.openai.azure.com/",
"entity.2.name": "gpt-35-turbo", ==> ILLM
"entity.2.type": "model.llm.gpt-35-turbo",
"entity.count": 2
}
"events": [
{
"name": "data.input", ==> Inputs to LLM
"timestamp": "2025-03-12T17:05:59.165628Z",
"attributes": {
"input": [
"{'system': \"You are an expert Q&A system that is trusted around the world.\\nAlways answer the query using the provided context information, and not prior knowledge.\\nSome rules to follow:\\n1. Never directly reference the given context in your answer.\\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.\"}",
"{'user': 'What is an americano?'}",
"[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text=\"You are an expert Q&A system that is trusted around the world.\\nAlways answer the query using the provided context information, and not prior knowledge.\\nSome rules to follow:\\n1. Never directly reference the given context in your answer.\\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.\")]), ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Context information is below.\\n---------------------\\nfile_path: coffee.txt\\n\\nCoffee is a hot drink made from the roasted and ground seeds (coffee beans) of a tropical shrub\\nA latte consists of one or more shots of espresso, served in a glass (or sometimes a cup), into which hot steamed milk is added\\nAmericano is a type of coffee drink prepared by diluting an espresso shot with hot water at a 1:3 to 1:4 ratio, resulting in a drink that retains the complex flavors of espresso, but in a lighter way\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: What is an americano?\\nAnswer: ')])]"
]
}
},
{
"name": "data.output", ==> Responses from LLM
"timestamp": "2025-03-12T17:05:59.165655Z",
"attributes": {
"response": [
"An Americano is a type of coffee drink prepared by diluting an espresso shot with hot water at a ratio of 1:3 to 1:4. This process results in a drink that retains the complex flavors of espresso while being lighter in taste."
]
}
},
{
"name": "metadata", ==> Token metadata from LLM
"timestamp": "2025-03-12T17:05:59.165675Z",
"attributes": {
"temperature": 0.1,
"completion_tokens": 52,
"prompt_tokens": 220,
"total_tokens": 272
}
}
]
The retrieval span includes details of the genAI components used in the retrieval operation. The information is divided into two sections, attributes and events. A given trace can have multiple retrieval spans.

The attributes describe the embedding model and vector store used:
"attributes": {
"monocle_apptrace.version": "0.3.0b6",
"span.type": "retrieval",
"entity.1.name": "ChromaVectorStore", ==> Vector store
"entity.1.type": "vectorstore.ChromaVectorStore",
"entity.2.name": "text-embedding-3-large", ==> Embedding model
"entity.2.type": "model.embedding.text-embedding-3-large",
"entity.count": 2
}
The events capture the search and retrieval of vector data
"events": [
{
"name": "data.input", ==> prompts to search
"timestamp": "2025-03-12T17:05:57.720379Z",
"attributes": {
"input": "What is an americano?"
}
},
{
"name": "data.output", ==> Context retrieved
"timestamp": "2025-03-12T17:05:57.720398Z",
"attributes": {
"response": "Coffee is a hot drink made from the roasted and ground seeds (coffee beans) of a tropical shrub\nA la..."
}
}
]
A workflow span captures a summary of the trace, such as the start and end of the full trace, the type of client tools etc. Note that there's only one workflow span for a given trace.
"attributes": {
"monocle_apptrace.version": "0.3.0b6",
"span.type": "workflow",entity.1.name": "my-chatbot", ==> workflow name set in setup_monocle_telemetry()
"entity.1.type": "workflow.llamaindex", ==> Type of framework
"entity.2.type": "app_hosting.github_codespace", ==> Application hosting environment
"entity.2.name": "my-chatbot-container-xyz",
"entity.count": 2
}
Monocle exporters handle storing traces for future analysis. By default, each trace is stored as a JSON file in the directory where the app runs. You can configure exporters by setting the environment variable `MONOCLE_EXPORTER` to one or more of the exporter settings listed below: `MONOCLE_EXPORTER=<comma-separated-list>`.

By default, Monocle flushes traces in batches of 10. Note that traces are written to their destination asynchronously, so exporting doesn't impact the application's response time. The following exporters are supported:
|Exporter Name|Exporter Setting|Description|Format|Trace destination|Additional configuration|
|-|-|-|-|-|-|
|File (default)|`file`|Export to local file system|JSON|Local directory||
|Console|`console`|Export to console|Text|Console/stdout||
|Memory|`memory`|Keep in memory|String|Process memory||
|S3|`s3`|Export to AWS S3 bucket|ND JSON|S3 bucket|Install the Monocle AWS package dependencies: `pip install monocle_apptrace[aws]`<br>Env variables for the s3 exporter:<br>`MONOCLE_AWS_ACCESS_KEY_ID` or `AWS_ACCESS_KEY_ID`: AWS access key<br>`MONOCLE_AWS_SECRET_ACCESS_KEY` or `AWS_SECRET_ACCESS_KEY`: AWS secret<br>`MONOCLE_S3BUCKET_NAME`: S3 bucket where traces will be stored<br>`MONOCLE_S3_KEY_PREFIX`: ND JSON file name prefix (default: `monocle_trace`)|
|Blob|`blob`|Export to Azure blob store|ND JSON|Blob container|Install the Monocle Azure package dependencies: `pip install monocle_apptrace[azure]`<br>Env variables for the blob exporter:<br>`MONOCLE_BLOB_CONNECTION_STRING`: Connection string for the Azure blob store<br>`MONOCLE_BLOB_CONTAINER_NAME`: Blob container to store the trace ndjson files|
|Okahu|`okahu`|Export to Okahu.ai service|JSON|Okahu tenant|Env variable for the Okahu exporter:<br>`OKAHU_API_KEY`: API key for the Okahu tenant|
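A hedged sketch of switching exporters via the environment variable; the exporter list is illustrative, and the variable must be set before telemetry setup:

```python
# Route traces to both the local file system and the console.
import os
os.environ["MONOCLE_EXPORTER"] = "file,console"  # set before setup

from monocle_apptrace import setup_monocle_telemetry
setup_monocle_telemetry(workflow_name="my-app")
```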
|Framework|Python|TypeScript|
|-|-|-|
|Langchain|✅|✅|
|Llama Index|✅|✅|
|HayStack|✅|Not Applicable|

|API|Python|TypeScript|
|-|-|-|
|OpenAI|✅|✅|
|AWS Boto|✅|✅|
|Anthropic|✅|✅|

|Service|Python|TypeScript|
|-|-|-|
|OpenAI|✅|✅|
|Azure OpenAI|✅|✅|
|AWS SageMaker|✅|✅|
|AWS Bedrock|✅|✅|
|Anthropic|✅|✅|
|NVIDIA Triton|✅|❌|

|Service|Python|TypeScript|
|-|-|-|
|Chroma|✅|✅|
|OpenSearch|✅|✅|
Imagine you have a chatbot application that supports a long conversation, i.e. multiple question/answer exchanges between the end user and the bot. It uses various genAI tech components/services like LLMs and vector stores. A simple instrumentation will generate a trace per genAI API call (e.g. invocation of a framework chat method or a direct OpenAI API call). As the app developer or owner, you are more interested in tracking conversations than individual API calls. Scopes in Monocle enable that use case. You can set the scope in the application either programmatically or declaratively, as sketched below. You can specify a value for the scope, or Monocle will generate a unique value (GUID), giving you the option to choose what's best suited for your use case. Please see the Monocle Python cookbook for details and examples.
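A minimal programmatic sketch, assuming `start_scope`/`stop_scope` are importable from the package root as in the API reference below; the scope name and value are illustrative:

```python
# Tag every span emitted while handling one conversation with the same
# scope, even across multiple traces.
from monocle_apptrace import setup_monocle_telemetry, start_scope, stop_scope

setup_monocle_telemetry(workflow_name="chatbot")

token = start_scope("conversation", "conv-1234")  # illustrative name/value
try:
    pass  # handle the user/bot exchanges of this conversation here
finally:
    stop_scope(token)
```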
If you are using a genAI technology that's not yet supported by Monocle out of the box, or you have your own proprietary code, you can extend Monocle to generate traces in the Monocle format.
setup_monocle_telemetry

```python
def setup_monocle_telemetry(
    workflow_name: str,
    span_processors: List[opentelemetry.sdk.trace.SpanProcessor] = None,
    span_handlers: Dict[str, monocle_apptrace.instrumentation.common.span_handler.SpanHandler] = None,
    wrapper_methods: List[Union[dict, monocle_apptrace.instrumentation.common.wrapper_method.WrapperMethod]] = None,
    monocle_exporters_list: str = None,
    union_with_default_methods: bool = True
) -> None
```
Set up Monocle telemetry for the application.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
workflow_name | str | The name of the workflow, used as the service name in telemetry. | None |
span_processors | List[SpanProcessor] | Custom span processors to use instead of the default ones. If None, BatchSpanProcessors with Monocle exporters will be used. | None |
span_handlers | Dict[str, SpanHandler] | Dictionary of span handlers to be used by the instrumentor, mapping handler names to handler objects. | None |
wrapper_methods | List[Union[dict, WrapperMethod]] | Custom wrapper methods for instrumentation. If None, default methods will be used. | None |
monocle_exporters_list | str, optional | Comma-separated list of exporters to use. This overrides the env setting MONOCLE_EXPORTER. Supported exporters: s3, blob, okahu, file, memory, console. Cannot be combined with span_processors. | None |
union_with_default_methods | bool | If True, combine the provided wrapper_methods with the default methods. If False, only use the provided wrapper_methods. | True |
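A hedged usage sketch of the parameters above; the workflow name and exporter list are illustrative:

```python
# Minimal setup: everything not passed falls back to Monocle defaults.
from monocle_apptrace import setup_monocle_telemetry

setup_monocle_telemetry(
    workflow_name="my-chatbot",
    monocle_exporters_list="file,console",  # overrides MONOCLE_EXPORTER
)
```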
start_trace

```python
def start_trace()
```
Starts a new trace. All the spans created after this call will be part of the same trace.
Returns:
Type | Description |
---|---|
Token | A token representing the attached context for the workflow span. This token is to be used later to stop the current trace. Returns None if tracing fails. |
Raises:
Type | Description |
---|---|
Exception | The function catches all exceptions internally and logs a warning. |
stop_trace

```python
def stop_trace(
    token: object
) -> None
```

Stop the current trace. Spans created after this call will not be part of that trace.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
token | object | The token that was returned when the trace was started. | None |

Returns:

Type | Description |
---|---|
None | None |
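A minimal sketch pairing the two calls, assuming both are importable from the package root; the workflow name is illustrative:

```python
# Group several otherwise-independent operations under one trace.
from monocle_apptrace import setup_monocle_telemetry, start_trace, stop_trace

setup_monocle_telemetry(workflow_name="batch-job")

token = start_trace()
try:
    pass  # all genAI calls made here share a single traceID
finally:
    stop_trace(token)
```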
start_scope

```python
def start_scope(
    scope_name: str,
    scope_value: str = None
) -> object
```
Start a new scope with the given name and an optional value. If no value is provided, a random UUID will be generated. All the spans, across traces, created after this call will have the scope attached until the scope is stopped.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
scope_name | str | The name of the scope. | None |
scope_value | str | Optional value of the scope. If None, a random UUID will be generated. | None |
Returns:
Type | Description |
---|---|
Token | A token representing the attached context for the scope. This token is to be used later to stop the current scope. |
stop_scope

```python
def stop_scope(
    token: object
) -> None
```

Stop the active scope. All the spans created after this will not have the scope attached.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
token | object | The token that was returned when the scope was started. | None |
Returns:
Type | Description |
---|---|
None | None |
monocle_trace_scope

```python
def monocle_trace_scope(
    scope_name: str,
    scope_value: str = None
)
```
Context manager to start and stop a scope. All the spans, across traces created within the encapsulated code will have the scope attached.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
scope_name | str | The name of the scope. | None |
scope_value | str | Optional value of the scope. If None, a random UUID will be generated. | None |
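A minimal sketch, assuming `monocle_trace_scope` is importable from the package root; the scope name/value are illustrative:

```python
from monocle_apptrace import monocle_trace_scope

# Spans created inside the block carry the conversation scope.
with monocle_trace_scope("conversation", "conv-1234"):
    pass  # genAI calls for this conversation go here
```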
monocle_trace_http_route

```python
def monocle_trace_http_route(
    func
)
```

Decorator to continue traces and scopes across an HTTP route. It will also initiate new scopes from the HTTP headers if configured in monocle_scopes.json. All the spans, across traces, created in the route will have the scope attached.
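A minimal sketch, assuming a Flask app and that the decorator is importable from the package root; the route path and payload are illustrative:

```python
# Apply the decorator to a route handler so that trace/scope context
# arriving in HTTP headers is continued inside the handler.
from flask import Flask, request
from monocle_apptrace import setup_monocle_telemetry, monocle_trace_http_route

app = Flask(__name__)
setup_monocle_telemetry(workflow_name="moderator-service")

@app.route("/chat", methods=["POST"])
@monocle_trace_http_route
def chat():
    prompt = request.json.get("prompt", "")
    return {"answer": f"(moderated) {prompt}"}
```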