
erwinkramer/otel-business

Business tracing with OTel 💼

Get started with distributed business tracing in context of OTel (OpenTelemetry).

Guanchen.Monitor is the component library to facilitate business tracing. Currently supported usages are:

  1. Console Apps, sample via IntegrationTest.Console
  2. Function Apps, sample via IntegrationTest.Function

The main features (extended on top of a default OpenTelemetry setup) are:

Functionality | Description
--- | ---
StartBusinessActivity (Parent/Child/Linked) methods | Starts a standardized business activity (parent, child, or linked).
NewBusinessEvent method | Creates a standardized business event.
LogBusiness (Information/Error) methods | Creates a standardized business log.
Standardization of OTel Baggage | All business activities, events and logs contain the same business identifiers, such as the BusinessTrace-tag and the 💼-prefix. These identifiers (along with other OTel Baggage) are continuously set on business activities, events and logs.
Standardization of KQL analysis | Standardized Azure Monitor KQL-queries for business activities, events and logs.
Reliable processing of activities and events | StartBusinessActivity and NewBusinessEvent are reliably handled via the AutoFlushActivityProcessor and are suitable for auditing purposes.

Library positioning

Business tracing in this project sits between your app and the Azure Monitor OpenTelemetry Distro, as shown below:

flowchart TB

func["Azure Function App"]
asp["ASP.NET Core App"]
cons["Console App"]

gm["Business tracing with Guanchen.Monitor"]

subgraph azm[Azure Monitor]
azmdistro[OpenTelemetry Distro]
azmappi[Application Insights]
azmtables[Traces/Requests/Exceptions Tables]
end

func --> gm
asp --> gm
cons --> gm
gm -- implements --> azmdistro
azmdistro -- exports with OpenTelemetry Protocol (OTLP), to --> azmappi
azmappi -- persists in --> azmtables


Usage

Spans

ℹ️ OTel Spans are Activities in .NET and end up in the requests table.

ℹ️ OTel Trace IDs are ActivityTraceId's in .NET and result in operation IDs.

Use StartParentBusinessActivity() to start a business span and StartLinkedBusinessActivity() to start a business span linked to another span. Both of these methods generate new operation IDs. These operation IDs are an important tool to analyze business logging and can effectively seen as a trace of business operations.

Use StartChildBusinessActivity() to start a child business span. This method will not generate a new operation ID and is generally suited for sub-processes within a business operation.
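
The difference between the three start methods can be sketched with a small language-neutral model. This is a toy illustration in Python, not the Guanchen.Monitor API: the `Span` class and the `start_parent`, `start_linked` and `start_child` helpers are hypothetical names, and only the ID behaviour described above is modelled.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Span:
    """Toy stand-in for a .NET Activity (hypothetical, for illustration only)."""
    name: str
    trace_id: str                       # surfaces as the operation ID
    span_id: str
    parent_span_id: Optional[str] = None
    # (trace_id, span_id) pairs of the spans this span is linked to
    links: List[Tuple[str, str]] = field(default_factory=list)


def _new_trace_id() -> str:
    return secrets.token_hex(16)        # 32 hex characters, like an OTel trace ID


def _new_span_id() -> str:
    return secrets.token_hex(8)         # 16 hex characters, like an OTel span ID


def start_parent(name: str) -> Span:
    """New root span: gets a fresh trace (operation) ID."""
    return Span(name, _new_trace_id(), _new_span_id())


def start_linked(name: str, linked_to: Span) -> Span:
    """New root span with a fresh trace ID that references another span via a link."""
    link = (linked_to.trace_id, linked_to.span_id)
    return Span(name, _new_trace_id(), _new_span_id(), links=[link])


def start_child(name: str, parent: Span) -> Span:
    """Child span: reuses the parent's trace (operation) ID."""
    return Span(name, parent.trace_id, _new_span_id(), parent_span_id=parent.span_id)


batch = start_parent("Splitting Tomato Batch")
evaluating = start_linked("Evaluating Tomato", batch)
auditing = start_child("Auditing HTTP Tomato", evaluating)

print(batch.trace_id != evaluating.trace_id)     # linked span has its own operation ID
print(auditing.trace_id == evaluating.trace_id)  # child shares the operation ID
```

In this model, filtering on a single operation ID returns a span together with its children, while linked spans have to be found through their link.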

Span Events

ℹ️ OTel Span Events are ActivityEvents in .NET and end up in the traces table.

Use NewBusinessEvent() to create business events within a span. These events are stored in the same table as Logs but are directly associated with a span, ensuring more reliable delivery. For more details, refer to the Reliability notes.

Logs

ℹ️ OTel Logs are ILogger logs in .NET and end up in the traces table.

Use LogBusinessInformation() or LogBusinessError() to create business logs within a span.

Baggage

ℹ️ OTel Baggage keeps contextual information and propagates the information (which currently has its limits, as explained at the Caveats paragraph). Technically, on .NET Activities they are Tags and for .NET ILogger logs they will be set on OpenTelemetry.Logs.LogRecord.Attributes. For both they will end up in the Custom properties column of their related Azure Monitor table (requests and traces respectively).

Use Baggage.SetBaggage() to set business context information to the root span, this persists throughout other spans (and their logs and events) that use the same root span, unless overwritten by a more recent Baggage.SetBaggage().

Use yourActivity.SetBaggage() to set business context information to an Activity, this persists throughout other child spans (and their logs and events).

This project makes sure that the baggage is continuously being set on business logs, business spans and business span events to make this information available in Azure Monitor.

Log, span and span event functions implicitly yield a Business Trace baggage key with the level (Information, Error etc.) als value.
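
The inheritance and overwrite behaviour described above can be pictured with a small model. This is a conceptual sketch in Python, not the OpenTelemetry or Guanchen.Monitor implementation: the `Scope` class and `business_log` function are hypothetical, the property keys are taken from the KQL queries in this README, and the values are made up.

```python
from typing import Dict, Optional


class Scope:
    """Toy span scope holding baggage; children see their ancestors' entries."""

    def __init__(self, name: str, parent: Optional["Scope"] = None) -> None:
        self.name = name
        self.parent = parent
        self.baggage: Dict[str, str] = {}

    def set_baggage(self, key: str, value: str) -> None:
        self.baggage[key] = value       # a later call with the same key overwrites

    def effective_baggage(self) -> Dict[str, str]:
        inherited = self.parent.effective_baggage() if self.parent else {}
        return {**inherited, **self.baggage}    # the nearest scope wins


def business_log(scope: Scope, level: str, message: str) -> Dict[str, object]:
    """Build a log record whose properties carry the baggage plus the level."""
    properties = {**scope.effective_baggage(), "Business Trace": level}
    return {"message": message, "properties": properties}


root = Scope("Splitting Tomato Batch")
root.set_baggage("Tomato ID", "tomato-1")            # hypothetical value

child = Scope("Evaluating Tomato", parent=root)
child.set_baggage("Tomato Part ID", "part-a")        # hypothetical value

record = business_log(child, "Information", "Tomato evaluated")
print(record["properties"])
```

The record written from the child scope carries both the inherited `Tomato ID` and its own `Tomato Part ID`, plus the implicit `Business Trace` level, which is the shape the KQL queries in this README rely on.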

Integration tests

The integration test uses the following OTel Span/Activity setup. There are 4 unique activities in total: 1 for the complete batch (Splitting Tomato Batch) and 3 for each tomato in the batch (Evaluating Tomato, Auditing HTTP Tomato and Auditing Queue Tomato). All activities are related to each other, either by link or as child.

In the following diagram the activities are related to the operation IDs:

flowchart TD

act_evaluating["Evaluating Tomato"]
act_audit_http["Auditing HTTP Tomato"]
act_audit_queue["Auditing Queue Tomato"]

subgraph op_parent["Unique batch operation ID"]

act_splitting["Splitting Tomato Batch"]

end

subgraph op_child["Unique child operation ID"]

act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating

act_evaluating -- has child activity --> act_audit_http
act_evaluating -- has child activity --> act_audit_queue

end

In the following diagram the activities are related to the hosting resources:

flowchart TB

act_evaluating["Evaluating Tomato"]
act_audit_http["Auditing HTTP Tomato"]
act_audit_queue["Auditing Queue Tomato"]

subgraph op_parent["Console App"]

act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating
act_splitting -- has linked activity --> act_evaluating

act_splitting["Splitting Tomato Batch"]

end

subgraph resource_sb["Service Bus"]
queue[📬]

act_evaluating -- passing through --> queue

end

subgraph op_child["Function App"]

queue -- has child activity --> act_audit_queue
act_evaluating -- has child activity --> act_audit_http

end

General prerequisites

Deploy and configure an Azure Service Bus, as it is used to send messages from the Console App to the Function App.

IntegrationTest.Console prerequisites

For IntegrationTest.Console, prepare the following:

  1. Set the Environment Variables on your machine:

    setx APPLICATIONINSIGHTS_CONNECTION_STRING "InstrumentationKey=...;IngestionEndpoint=...;LiveEndpoint=...;ApplicationId=..."
    setx AZURE_TENANT_ID "..."
    setx AZURE_SERVICEBUS_FULLYQUALIFIEDNAMESPACE "....servicebus.windows.net"
  2. Set the permissions for the identity running the Console App (likely yourself) on the Azure Service Bus to write messages.

IntegrationTest.Function prerequisites

For IntegrationTest.Function, prepare the following:

  1. Provision a Function App resource and deploy the Function App.

  2. Set the permissions for the identity running the Function App (likely the Managed Identity) on the Azure Service Bus to read messages.

  3. Configure the App Insights connection and the Service Bus connection (ServiceBusConnection). The environment variables inside your Function App resource should look similar to this:

    [
        {
            "name": "APPLICATIONINSIGHTS_CONNECTION_STRING",
            "value": "InstrumentationKey=...;IngestionEndpoint=...;LiveEndpoint=...;ApplicationId=...",
            "slotSetting": false
        },
        {
            "name": "AzureWebJobsStorage",
            "value": "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net",
            "slotSetting": false
        },
        {
            "name": "DEPLOYMENT_STORAGE_CONNECTION_STRING",
            "value": "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net",
            "slotSetting": false
        },
        {
            "name": "ServiceBusConnection__credential",
            "value": "managedidentity",
            "slotSetting": false
        },
        {
            "name": "ServiceBusConnection__fullyQualifiedNamespace",
            "value": "....servicebus.windows.net",
            "slotSetting": false
        }
    ]

Analyzing via Transaction view

Use Transaction Diagnostics to find the business traces and requests intertwined with the full technical diagnostics. It might look like this:

(Screenshot: Transaction View)

Analyzing via query

Use a KQL query to get all business information. It might look like this:

(Screenshot: query result)

To replicate this query result, do the following:

Create function AppRequestsTomatoScope with all the distributed request tables:

union withsource= SourceApp
workspace("6d489906-8b6a-4eba-b697-373f29a1a98b").AppRequests

Create function AppTracesTomatoScope with all the distributed trace tables:

union withsource= SourceApp
workspace("6d489906-8b6a-4eba-b697-373f29a1a98b").AppTraces

Run the following query to figure out what operation IDs are related to specific Tomato IDs:

let TomatoIds = dynamic([
    "1c946279-04b9-4c25-8846-0dabf1d59533"
]);
AppRequestsTomatoScope
| extend 
    TomatoId = tostring(Properties["Tomato ID"])
| where TomatoId in (TomatoIds)
| distinct OperationId, TomatoId

Run the following query to analyze specific operation IDs:

let OperationIds = dynamic([
    "db58273aca2e0e93a3ca4897a99ab36a"
]);
let Requests = AppRequestsTomatoScope
    | where OperationId in (OperationIds)
    | project BusinessServiceSpanId = Id, BusinessService = Name;
AppTracesTomatoScope
| where OperationId in (OperationIds)
| extend 
    BusinessTrace = tostring(Properties["Business Trace"]),
    TomatoId = tostring(Properties["Tomato ID"]),
    TomatoPartId = tostring(Properties["Tomato Part ID"])
| where isnotempty(BusinessTrace)
| project
    ParentId,
    TimeGenerated,
    BusinessTrace,
    Message,
    TomatoId,
    TomatoPartId
| join kind=leftouter (Requests) on $left.ParentId == $right.BusinessServiceSpanId
| project
    TimeGenerated,
    BusinessTrace,
    BusinessService,
    BusinessServiceSpanId,
    Message,
    TomatoId,
    TomatoPartId
| order by BusinessService asc, TimeGenerated asc
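
To make the logic of the query above easier to follow, here is a rough Python equivalent of its filter and left outer join. It is only an illustration on hypothetical in-memory rows: the column names follow the query, the sample values are invented, the TimeGenerated ordering is left out, and span IDs are assumed to be unique.

```python
from typing import Dict, List, Optional, Set

# Hypothetical sample rows; column names follow the KQL query.
requests: List[Dict] = [
    {"Id": "span-1", "Name": "Evaluating Tomato", "OperationId": "op-1"},
]
traces: List[Dict] = [
    {"ParentId": "span-1", "OperationId": "op-1", "Message": "Tomato evaluated",
     "Properties": {"Business Trace": "Information", "Tomato ID": "tomato-1"}},
    {"ParentId": "span-1", "OperationId": "op-1", "Message": "Technical detail",
     "Properties": {}},
    {"ParentId": "span-9", "OperationId": "op-1", "Message": "Unmatched business log",
     "Properties": {"Business Trace": "Error", "Tomato ID": "tomato-1"}},
]


def business_rows(operation_ids: Set[str],
                  requests: List[Dict],
                  traces: List[Dict]) -> List[Dict]:
    # Requests: where OperationId in (...) | project span ID and name
    span_names = {r["Id"]: r["Name"]
                  for r in requests if r["OperationId"] in operation_ids}
    rows = []
    for trace in traces:
        if trace["OperationId"] not in operation_ids:
            continue
        level = trace["Properties"].get("Business Trace", "")
        if not level:                   # where isnotempty(BusinessTrace)
            continue
        service: Optional[str] = span_names.get(trace["ParentId"])
        rows.append({
            "BusinessTrace": level,
            "BusinessService": service,  # None when no request matches (left outer join)
            "Message": trace["Message"],
            "TomatoId": trace["Properties"].get("Tomato ID", ""),
        })
    return rows


rows = business_rows({"op-1"}, requests, traces)
for row in rows:
    print(row)
```

The technical log without a Business Trace property is filtered out, while the business log whose parent span is missing from the requests is kept with an empty BusinessService, as a left outer join would do.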

Target tables

The following table mappings are relevant in this project:

Azure Monitor Table | OpenTelemetry DataType | .NET Implementation
--- | --- | ---
customMetrics | Metrics | System.Diagnostics.Metrics.Meter
exceptions | Exceptions | System.Exception
requests | Spans (Server, Producer) | System.Diagnostics.Activity
traces | Logs | Microsoft.Extensions.Logging.ILogger
traces | Span Events | System.Diagnostics.ActivityEvent

For a complete overview, see How do Application Insights telemetry types map to OpenTelemetry?.

Loglevel

IntegrationTest.Console

CreateLoggerFactory contains SetMinimumLevel.

IntegrationTest.Function

host.json contains logLevel.default.

Reliability notes

Logs

❌ Using logs for auditing purposes is not supported.

Eventual export of logs to Azure Monitor is made possible by Offline Storage and Automatic Retries, which are enabled by default in the Azure Monitor OpenTelemetry Distro, but isn't a feature of the base .NET OpenTelemetry implementation yet.

  1. This doesn't reliably send messages again in case of a disaster at the client side, only when there is an outage at the Azure Monitor side.
  2. Logs could be automatically converted to span events via AttachLogsToActivityEvent to get the same reliability as (events in) spans, but this changes the event message and properties too much and thus requires some more tweaking via LogToActivityEventConversionOptions.

Spans

✔️ Using spans for auditing purposes is supported. This includes span events via ActivityEvent, added to (completed) spans via Activity.AddEvent.

Timely export of spans to Azure Monitor is made possible by ForceFlush, implemented via AutoFlushActivityProcessor. This processor is in OpenTelemetry.Extensions.

  1. To confirm that it reliably exports completed spans, please see the FailFastTest in IntegrationTest.
  2. See open-telemetry/opentelemetry-specification#2944 (comment) for more context.
  3. It doesn't check the flush status but does wait until the flush is done. Please see: open-telemetry/opentelemetry-dotnet-contrib#2721.
  4. What is guaranteed on a completed Activity?
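
The idea behind flushing on span completion can be sketched as follows. This is a toy model in Python, not the AutoFlushActivityProcessor from OpenTelemetry.Extensions: `BufferingExporter` and `AutoFlushProcessor` are hypothetical classes that only illustrate why a completed span does not linger in a buffer.

```python
from typing import List


class BufferingExporter:
    """Toy exporter that holds spans in memory until flushed (hypothetical)."""

    def __init__(self) -> None:
        self.buffer: List[str] = []
        self.exported: List[str] = []

    def enqueue(self, span_name: str) -> None:
        self.buffer.append(span_name)

    def force_flush(self) -> bool:
        # Synchronous in this model: returns only after the buffer is drained.
        self.exported.extend(self.buffer)
        self.buffer.clear()
        return True


class AutoFlushProcessor:
    """On every span end, hand the span over and wait for the flush to finish."""

    def __init__(self, exporter: BufferingExporter) -> None:
        self.exporter = exporter

    def on_end(self, span_name: str) -> None:
        self.exporter.enqueue(span_name)
        # The result is ignored here, mirroring note 3 above: the call waits
        # for the flush but does not check its status.
        self.exporter.force_flush()


exporter = BufferingExporter()
processor = AutoFlushProcessor(exporter)
processor.on_end("Evaluating Tomato")

print(exporter.exported)   # the completed span has already left the buffer
print(exporter.buffer)
```

The trade-off in this model is that every completed span pays the cost of a flush, in exchange for not depending on a later batch export that a crash could prevent.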

Caveats

  1. Baggage is in a rough state, see:
  2. Creating new root traces is done via some (abstracted away) tricks in this project. Learn more at open-telemetry/opentelemetry-dotnet#984.

References

  1. OpenTelemetry .NET Logs
  2. OpenTelemetry .NET Contrib
  3. Go OTEL Audit, a package for auditing Go code for Microsoft compliance purposes
  4. Audit.NET

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
