Get started with distributed business tracing in the context of OTel (OpenTelemetry).
Guanchen.Monitor is a component library that facilitates business tracing. Currently supported usages are:
- Console Apps, sample via IntegrationTest.Console
- Function Apps, sample via IntegrationTest.Function
The main features (extended on top of a default OpenTelemetry setup) are:
Functionality | Description |
---|---|
StartBusinessActivity (Parent/Child/Linked) methods | Starts a standardized business activity (parent, child, or linked). |
NewBusinessEvent method | Creates a standardized business event. |
LogBusiness (Information/Error) methods | Creates a standardized business log. |
Standardization of OTel Baggage | All business activities, events and logs carry the same business identifiers, such as the BusinessTrace tag and the 💼 prefix. These identifiers (along with other OTel Baggage) are continuously set on business activities, events and logs. |
Standardization of KQL analysis | Standardized Azure Monitor KQL-queries for business activities, events and logs. |
Reliable processing of activities and events | StartBusinessActivity and NewBusinessEvent are reliably handled via the AutoFlushActivityProcessor and suitable for auditing purposes. |
Business tracing in this project sits between your app and the Azure Monitor OpenTelemetry Distro, as shown below:
flowchart TB
func["Azure Function App"]
asp["ASP.NET Core App"]
cons["Console App"]
gm["Business tracing with Guanchen.Monitor"]
subgraph azm[Azure Monitor]
azmdistro[OpenTelemetry Distro]
azmappi[Application Insights]
azmtables[Traces/Requests/Exceptions Tables]
end
func --> gm
asp --> gm
cons --> gm
gm -- implements --> azmdistro
azmdistro -- exports with OpenTelemetry Protocol (OTLP), to --> azmappi
azmappi -- persists in --> azmtables
ℹ️ OTel Spans are Activities in .NET and end up in the requests table.
ℹ️ OTel Trace IDs are ActivityTraceIds in .NET and result in operation IDs.
Use StartParentBusinessActivity() to start a business span and StartLinkedBusinessActivity() to start a business span linked to another span. Both of these methods generate new operation IDs. These operation IDs are an important tool for analyzing business logging and can effectively be seen as a trace of business operations.
Use StartChildBusinessActivity() to start a child business span. This method does not generate a new operation ID and is generally suited for sub-processes within a business operation.
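The difference between parent, child and linked activities can be sketched with plain System.Diagnostics APIs. This is not the implementation behind the Guanchen.Monitor methods: the source name Sketch.BusinessTracing and the activity name Auditing Tomato are hypothetical, and clearing Activity.Current to force a new root trace is an assumption about one possible approach.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name, used only for this sketch.
var source = new ActivitySource("Sketch.BusinessTracing");

// Without a listener (or an OpenTelemetry TracerProvider), StartActivity returns null.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Sketch.BusinessTracing",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// Parent: no ambient activity exists yet, so this starts a new trace (operation ID).
using var parent = source.StartActivity("Splitting Tomato Batch")!;

// Child: inherits the trace ID of the ambient parent activity.
ActivityTraceId childTraceId;
using (var child = source.StartActivity("Auditing Tomato")!)
{
    childTraceId = child.TraceId;
}

// Linked: clear the ambient activity so a new root trace is created,
// and attach a link that points back to the parent.
ActivityTraceId linkedTraceId;
Activity.Current = null;
using (var linked = source.StartActivity(
    "Evaluating Tomato",
    ActivityKind.Internal,
    parentContext: default,
    links: new[] { new ActivityLink(parent.Context) })!)
{
    linkedTraceId = linked.TraceId;
}
Activity.Current = parent; // restore the ambient activity

Console.WriteLine($"Child shares the parent trace ID: {childTraceId == parent.TraceId}");
Console.WriteLine($"Linked has its own trace ID: {linkedTraceId != parent.TraceId}");
```

Both comparisons should evaluate to true: the child stays within the parent's trace, while the linked activity gets a trace ID of its own and only references the parent through its link.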
ℹ️ OTel Span Events are ActivityEvents in .NET and end up in the traces table.
Use NewBusinessEvent() to create business events within a span. These events are stored in the same table as Logs but are directly associated with a span, ensuring more reliable delivery. For more details, refer to the Reliability notes.
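At the .NET level, a span event is an ActivityEvent added to an Activity. The sketch below uses only System.Diagnostics and does not show what NewBusinessEvent() does internally; the source name and event name are hypothetical, and the tag keys are borrowed from the custom properties used in the KQL queries of this document.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical source name, used only for this sketch.
var source = new ActivitySource("Sketch.BusinessTracing");

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Sketch.BusinessTracing",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

using var activity = source.StartActivity("Evaluating Tomato")!;

// The event travels with the span itself, so it is exported together with it.
activity.AddEvent(new ActivityEvent(
    "Tomato evaluated", // hypothetical event name
    tags: new ActivityTagsCollection
    {
        { "Business Trace", "Information" },
        { "Tomato ID", "1c946279-04b9-4c25-8846-0dabf1d59533" },
    }));

foreach (var e in activity.Events)
{
    Console.WriteLine($"{e.Timestamp:O} {e.Name} ({e.Tags.Count()} tags)");
}
```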
ℹ️ OTel Logs are ILogger logs in .NET and end up in the traces table.
Use LogBusinessInformation() or LogBusinessError() to create business logs within a span.
ℹ️ OTel Baggage keeps contextual information and propagates it (which currently has its limits, as explained in the Caveats paragraph). Technically, on .NET Activities they are Tags and for .NET ILogger logs they are set on OpenTelemetry.Logs.LogRecord.Attributes. For both, they end up in the Custom properties column of their related Azure Monitor table (requests and traces respectively).
Use Baggage.SetBaggage() to set business context information on the root span. It persists throughout other spans (and their logs and events) that use the same root span, unless overwritten by a more recent Baggage.SetBaggage().
Use yourActivity.SetBaggage() to set business context information on an Activity. It persists throughout its child spans (and their logs and events).
This project makes sure that the baggage is continuously being set on business logs, business spans and business span events to make this information available in Azure Monitor.
Log, span and span event functions implicitly yield a Business Trace baggage key with the level (Information, Error etc.) as value.
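Making baggage visible in Azure Monitor comes down to copying baggage items onto the telemetry as tags. A minimal sketch of that idea with plain System.Diagnostics APIs follows; it is an illustration and not the mechanism used by this project, and the source name is hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name, used only for this sketch.
var source = new ActivitySource("Sketch.BusinessTracing");

using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Sketch.BusinessTracing",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

using var root = source.StartActivity("Splitting Tomato Batch")!;
root.SetBaggage("Tomato ID", "1c946279-04b9-4c25-8846-0dabf1d59533");

// The child is created while root is the ambient activity, so root is its parent.
using var child = source.StartActivity("Evaluating Tomato")!;

// Activity.Baggage enumerates the activity's own items and those of its parents.
// Copying them to tags is what makes them show up as custom properties.
foreach (var item in child.Baggage)
{
    child.SetTag(item.Key, item.Value);
}

Console.WriteLine(child.GetTagItem("Tomato ID"));
```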
The integration test has the following OTel Span/Activity setup. There are 4 unique activities in total: 1 for the complete batch (Splitting Tomato Batch) and 3 for each tomato in the batch (Evaluating Tomato, Auditing HTTP Tomato and Auditing Queue Tomato). All activities are related to each other, either as link or as child.
In the following diagram the activities are related to the operation IDs:
flowchart TD
act_evaluating["Evaluating Tomato"]
act_audit_http["Auditing HTTP Tomato"]
act_audit_queue["Auditing Queue Tomato"]
subgraph op_parent["Unique batch operation ID"]
act_splitting["Splitting Tomato Batch"]
end
subgraph op_child["Unique child operation ID"]
act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating
act_evaluating -- has child activity --> act_audit_http
act_evaluating -- has child activity --> act_audit_queue
end
In the following diagram the activities are related to the hosting resources:
flowchart TB
act_evaluating["Evaluating Tomato"]
act_audit_http["Auditing HTTP Tomato"]
act_audit_queue["Auditing Queue Tomato"]
subgraph op_parent["Console App"]
act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating
act_splitting["Splitting Tomato Batch"]
end
subgraph resource_sb["Service Bus"]
queue[📬]
act_evaluating -- passing through --> queue
end
subgraph op_child["Function App"]
queue -- has child activity --> act_audit_queue
act_evaluating -- has child activity --> act_audit_http
end
Deploy and configure an Azure Service Bus, as it is used to send messages from the Console App to the Function App.
For IntegrationTest.Console, prepare the following:
- Set the Environment Variables on your machine:
  setx APPLICATIONINSIGHTS_CONNECTION_STRING "InstrumentationKey=...;IngestionEndpoint=...;LiveEndpoint=...;ApplicationId=..."
  setx AZURE_TENANT_ID "..."
  setx AZURE_SERVICEBUS_FULLYQUALIFIEDNAMESPACE "....servicebus.windows.net"
- Set the permissions for the identity running the Console App (likely yourself) on the Azure Service Bus to write messages.
For IntegrationTest.Function, prepare the following:
- Provision a Function App resource and deploy the Function App.
- Set the permissions for the identity running the Function App (likely the Managed Identity) on the Azure Service Bus to read messages.
- Configure the App Insights connection and the Service Bus connection (ServiceBusConnection). The environment variables inside your Function App resource should look similar to this:
  [
    { "name": "APPLICATIONINSIGHTS_CONNECTION_STRING", "value": "InstrumentationKey=...;IngestionEndpoint=...;LiveEndpoint=...;ApplicationId=...", "slotSetting": false },
    { "name": "AzureWebJobsStorage", "value": "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net", "slotSetting": false },
    { "name": "DEPLOYMENT_STORAGE_CONNECTION_STRING", "value": "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net", "slotSetting": false },
    { "name": "ServiceBusConnection__credential", "value": "managedidentity", "slotSetting": false },
    { "name": "ServiceBusConnection__fullyQualifiedNamespace", "value": "....servicebus.windows.net", "slotSetting": false }
  ]
Use Transaction Diagnostics to find the business traces and requests intertwined with the full technical diagnostics.
Use a KQL query to get all business information. To replicate such a query result, do the following:
Create function AppRequestsTomatoScope with all the distributed request tables:
union withsource= SourceApp
workspace("6d489906-8b6a-4eba-b697-373f29a1a98b").AppRequests
Create function AppTracesTomatoScope with all the distributed trace tables:
union withsource= SourceApp
workspace("6d489906-8b6a-4eba-b697-373f29a1a98b").AppTraces
Run the following query to figure out what operation IDs are related to specific Tomato IDs:
let TomatoIds = dynamic([
"1c946279-04b9-4c25-8846-0dabf1d59533"
]);
AppRequestsTomatoScope
| extend
TomatoId = tostring(Properties["Tomato ID"])
| where TomatoId in (TomatoIds)
| distinct OperationId, TomatoId
Run the following query to analyze specific operation IDs:
let OperationIds = dynamic([
"db58273aca2e0e93a3ca4897a99ab36a"
]);
let Requests = AppRequestsTomatoScope
| where OperationId in (OperationIds)
| project BusinessServiceSpanId = Id, BusinessService = Name;
AppTracesTomatoScope
| where OperationId in (OperationIds)
| extend
BusinessTrace = tostring(Properties["Business Trace"]),
TomatoId = tostring(Properties["Tomato ID"]),
TomatoPartId = tostring(Properties["Tomato Part ID"])
| where isnotempty(BusinessTrace)
| project
ParentId,
TimeGenerated,
BusinessTrace,
Message,
TomatoId,
TomatoPartId
| join kind=leftouter (Requests) on $left.ParentId == $right.BusinessServiceSpanId
| project
TimeGenerated,
BusinessTrace,
BusinessService,
BusinessServiceSpanId,
Message,
TomatoId,
TomatoPartId
| order by BusinessService asc, TimeGenerated asc
The following table mappings are relevant in this project:
Azure Monitor Table | OpenTelemetry DataType | .NET Implementation |
---|---|---|
customMetrics | Metrics | System.Diagnostics.Metrics.Meter |
exceptions | Exceptions | System.Exception |
requests | Spans (Server, Producer) | System.Diagnostics.Activity |
traces | Logs | Microsoft.Extensions.Logging.ILogger |
traces | Span Events | System.Diagnostics.ActivityEvent |
For a complete overview, see How do Application Insights telemetry types map to OpenTelemetry?.
CreateLoggerFactory contains SetMinimumLevel.
host.json contains logLevel.default.
❌ Using logs for auditing purposes is not supported.
Eventual export of logs to Azure Monitor is made possible by Offline Storage and Automatic Retries, which are enabled by default in the Azure Monitor OpenTelemetry Distro but are not yet a feature of the base .NET OpenTelemetry implementation.
- This does not reliably resend messages after a failure on the client side; it only covers an outage on the Azure Monitor side.
- Logs could be automatically converted to span events via AttachLogsToActivityEvent to get the same reliability as (events in) spans, but this changes the event message and properties too much and thus requires some more tweaking via LogToActivityEventConversionOptions.
✔️ Using spans for auditing purposes is supported. This includes span events via ActivityEvent, added to (completed) spans via Activity.AddEvent.
Timely export of spans to Azure Monitor is made possible by ForceFlush, implemented via AutoFlushActivityProcessor. This processor is in OpenTelemetry.Extensions.
- To confirm that it reliably exports completed spans, please see the FailFastTest in IntegrationTest.
- See open-telemetry/opentelemetry-specification#2944 (comment) for more context.
- It doesn't check the flush status but does wait until the flush is done; please see open-telemetry/opentelemetry-dotnet-contrib#2721.
- What is guaranteed on a completed Activity?
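The flush-on-end idea can be sketched as a custom processor built on the OpenTelemetry .NET SDK. This is not the source of AutoFlushActivityProcessor: the class name FlushOnEndProcessor and its timeout parameter are hypothetical, and the sketch assumes a reference to the OpenTelemetry NuGet package. It would presumably need to be registered after the exporting processor, so that an ended span has already been handed to the exporter before the flush is requested.

```csharp
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

// Hypothetical processor for illustration; not the OpenTelemetry.Extensions implementation.
public sealed class FlushOnEndProcessor : BaseProcessor<Activity>
{
    private readonly int timeoutMilliseconds;

    public FlushOnEndProcessor(int timeoutMilliseconds = 10_000)
    {
        this.timeoutMilliseconds = timeoutMilliseconds;
    }

    // Called by the SDK every time a recorded Activity ends.
    public override void OnEnd(Activity activity)
    {
        // ParentProvider is the TracerProvider this processor was added to.
        if (this.ParentProvider is TracerProvider provider)
        {
            // Blocks until the flush completes or the timeout elapses.
            // The boolean result is ignored here, so a failed flush goes unnoticed.
            provider.ForceFlush(this.timeoutMilliseconds);
        }
    }
}
```

Flushing on every ended span trades throughput for timeliness, so a real setup would likely restrict it to selected activities.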
- Baggage is in a rough state, see:
  - Conflicting implementations: open-telemetry/opentelemetry-dotnet#5667
  - Azure Function support not complete: Azure/azure-functions-host#11026
  - Streamlining attempt in .NET: dotnet/runtime#112803
  - Service Bus (messaging) support missing: Azure/azure-sdk#6959
- Creating new root traces is done via some (abstracted away) tricks in this project; learn more at open-telemetry/opentelemetry-dotnet#984
- OpenTelemetry .NET Logs
- OpenTelemetry .NET Contrib
- Go OTEL Audit, a package for auditing Go code for Microsoft compliance purposes
- Audit.NET
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.