feat: setup otel tracing by atanmarko · Pull Request #499 · agglayer/agglayer · GitHub

feat: setup otel tracing #499

Open

atanmarko wants to merge 10 commits into main from feat/otel-tracing

Conversation

@atanmarko (Contributor) commented Dec 23, 2024

Description

Consolidate log initialization.
Implement optional otlp tracing setup.

Fixes #498

PR Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added or updated tests that comprehensively prove my change is effective or that my feature works

@atanmarko atanmarko self-assigned this Dec 23, 2024
@atanmarko atanmarko added the enhancement New feature or request label Dec 23, 2024
@atanmarko atanmarko marked this pull request as ready for review December 23, 2024 15:01
@atanmarko atanmarko requested review from a team, Freyskeyd and 0xaatif and removed request for a team December 23, 2024 15:01
@atanmarko atanmarko force-pushed the feat/otel-tracing branch 3 times, most recently from 9de192c to 521075e on December 25, 2024 14:47
@atanmarko atanmarko force-pushed the feat/otel-tracing branch 2 times, most recently from 405ab11 to b9ece4c on January 6, 2025 09:08
Base automatically changed from feat/version-info to main January 9, 2025 05:38
@Freyskeyd Freyskeyd linked an issue Jan 11, 2025 that may be closed by this pull request
@atanmarko atanmarko force-pushed the feat/otel-tracing branch from b9ece4c to b88835a on May 28, 2025 13:52
@atanmarko atanmarko requested review from iljakuklic and Ekleog-Polygon and removed request for 0xaatif May 28, 2025 13:52
@atanmarko atanmarko marked this pull request as draft May 28, 2025 13:53
@atanmarko atanmarko marked this pull request as ready for review May 30, 2025 15:25
@Ekleog-Polygon (Contributor) left a comment

Very happy to see we're hopefully soon getting traces out of our system! I still have a few comments on the implementation below.

tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(
        EnvFilter::try_from_default_env().unwrap_or_else(|_| config.level.into()),
Contributor

Hmm, I'm not sure we actually want the same filter as for the logs? For OTLP traces I'd expect that most of the time we'd want to trace everything and just keep a short retention, whereas logs would usually be much more restricted.

@atanmarko (Author)

I think this is debatable. I'll leave it as is for now; we can change it if there is a need (or even introduce a separate filter for OTLP).

Member

I think that if we're not filtering properly, we could end up flooding the collector with a lot of traces coming from library crates (such as h2 or tonic), but that's something we can try.
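For illustration, a dedicated filter for the OTLP layer could look roughly like this (a sketch only; the AGGLAYER_OTLP_FILTER variable name and the directive list are hypothetical, not part of this PR):

use tracing_subscriber::EnvFilter;

// Build a filter just for the OTLP layer: honor a dedicated env var if set,
// otherwise fall back to the configured level while silencing noisy
// transport crates so the collector is not flooded.
fn otlp_filter(default_level: &str) -> EnvFilter {
    EnvFilter::try_from_env("AGGLAYER_OTLP_FILTER").unwrap_or_else(|_| {
        EnvFilter::new(default_level)
            .add_directive("h2=off".parse().expect("valid directive"))
            .add_directive("tonic=warn".parse().expect("valid directive"))
    })
}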

@atanmarko atanmarko requested a review from Ekleog-Polygon June 4, 2025 12:02
@Ekleog-Polygon (Contributor) left a comment

Thank you for the changes! I just have a few last comments 😄

@atanmarko atanmarko requested a review from Ekleog-Polygon June 4, 2025 16:11
@Ekleog-Polygon (Contributor) left a comment

LGTM! I'm just thinking the code might look slightly better with e.g. a tiny macro for the if env::var("") { if foo.parse() { ... } else { eprintln!(...) } } code that is repeated 8 times; but it's definitely not a blocker 😄

Something that'd look like this:

if_env_var!(
    OTLP_BATCH_SCHEDULED_DELAY,
    |m: u64| batch_processor_config = batch_processor_config.with_scheduled_delay(Duration::from_millis(m)),
);
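For reference, one possible definition of such a macro (a sketch; the exact shape would depend on the code that is actually repeated in the PR):

// If the named env var is set and parses into the closure's argument type,
// run the closure; otherwise report the invalid value and ignore it.
macro_rules! if_env_var {
    ($name:ident, $apply:expr $(,)?) => {
        if let Ok(raw) = std::env::var(stringify!($name)) {
            match raw.parse() {
                Ok(value) => ($apply)(value),
                Err(_) => eprintln!(
                    "Invalid value for {}: {raw:?}, ignoring it",
                    stringify!($name)
                ),
            }
        }
    };
}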

prometheus = "0.13.3"
thiserror.workspace = true
tokio = { workspace = true, features = ["full"] }
tokio-util = { workspace = true }
tracing.workspace = true
tracing-opentelemetry = "0.28"
tracing-subscriber = { workspace = true, features = ["env-filter", "json"] }

Member

Suggested change

crates/agglayer-telemetry/src/traces.rs
tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(
        EnvFilter::try_from_default_env().unwrap_or_else(|_| config.level.into()),

pub struct Log {
pub struct TracingConfig {
Member

Are you renaming this because we're using the tracing lib?

pub format: LogFormat,
pub format: TracingFormat,
Member

This format represents the format of the logs exposed by the agglayer-node; I don't understand why we rename this to TracingFormat.

#[default]
Stdout,
Stderr,
File(PathBuf),
Otlp,
Member

I'm not sure Otlp is an output format. If we're defining Otlp here, doesn't that mean we no longer have a way to define the log output?

Comment on lines +63 to +65
// Initialize the tracing
metrics_runtime
    .block_on(async { agglayer_telemetry::traces::setup_tracing(&config.log, version) })?;
Member

This isn't async, so you can execute setup_tracing at the same place as logging::tracing.
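In other words, something like this should work (keeping the same arguments as in the snippet above):

// setup_tracing is synchronous, so it can be called directly where the rest
// of the logging initialization happens, without metrics_runtime.block_on:
agglayer_telemetry::traces::setup_tracing(&config.log, version)?;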

for writer in &config.outputs {
    // Setup instrumentation if both otlp agent url and
    // otlp service name are provided as arguments
    if writer == &TracingOutput::Otlp {
Member

Maybe we should move the OTLP configuration out of this loop, or even have a dedicated TracesConfig struct with a method that builds and exposes the layer directly.
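A rough sketch of what that dedicated struct could look like (the name, fields, and build_layer method are all hypothetical; the body would reuse the exporter/tracer setup already in this PR):

use tracing::Subscriber;
use tracing_subscriber::{registry::LookupSpan, Layer};

// Hypothetical dedicated OTLP traces configuration, kept separate from the
// plain log outputs (serde derives omitted for brevity).
#[derive(Debug, Clone)]
pub struct TracesConfig {
    pub otlp_agent_url: String,
    pub otlp_service_name: String,
}

impl TracesConfig {
    // Build the OTLP layer directly from this config, so the outputs loop
    // only has to deal with stdout/stderr/file writers.
    pub fn build_layer<S>(
        &self,
        _version: &str,
    ) -> Result<Box<dyn Layer<S> + Send + Sync>, Box<dyn std::error::Error>>
    where
        S: Subscriber + for<'a> LookupSpan<'a>,
    {
        // Reuse the exporter / tracer-provider construction from setup_tracing
        // here, then: Ok(Box::new(tracing_opentelemetry::layer().with_tracer(tracer)))
        todo!("build the OTLP exporter and tracer for {}", self.otlp_agent_url)
    }
}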

)
.build();

let tracer = trace_provider.tracer("agglayer-otlp");
Member

Could this be a const?
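For example (a trivial sketch; the const name is made up):

// Hoist the instrumentation-scope name into a const instead of an inline literal.
const AGGLAYER_TRACER_NAME: &str = "agglayer-otlp";

let tracer = trace_provider.tracer(AGGLAYER_TRACER_NAME);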

tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(
        EnvFilter::try_from_default_env().unwrap_or_else(|_| config.level.into()),
Member

Maybe we could have a method on LogLevel for this?
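A sketch of what that helper could look like (assuming config.level is the crate's LogLevel type and the Into<EnvFilter> conversion used in the snippet above already exists; the method name is made up):

impl LogLevel {
    // Default filter for this level, honoring RUST_LOG when it is set.
    pub fn default_env_filter(self) -> tracing_subscriber::EnvFilter {
        tracing_subscriber::EnvFilter::try_from_default_env().unwrap_or_else(|_| self.into())
    }
}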

resources.push(KeyValue::new("service.name", otlp_service_name.to_string()));
resources.push(KeyValue::new("service.version", version.to_string()));

let custom_resources: Vec<_> = std::env::var("AGGLAYER_OTLP_TAGS")
Member

Should it be scoped as AGGLAYER_NODE_OTLP_TAGS?
Do we want those tags to be shared among other software?
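For illustration, parsing and scoping could look roughly like this (the env var name and the comma-separated key=value format are assumptions, not necessarily what this PR uses):

use opentelemetry::KeyValue;

// Parse tags such as "env=prod,region=eu" from the given variable into resources.
fn custom_resources_from_env(var_name: &str) -> Vec<KeyValue> {
    std::env::var(var_name)
        .unwrap_or_default()
        .split(',')
        .filter_map(|pair| {
            let (key, value) = pair.split_once('=')?;
            Some(KeyValue::new(key.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}

// Scoping the variable to the node would then just be:
// let custom_resources = custom_resources_from_env("AGGLAYER_NODE_OTLP_TAGS");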

@@ -96,8 +96,13 @@ insta = { git = "https://github.com/freyskeyd/insta", branch = "chore/updating-d
jsonrpsee = { version = "0.24.7", features = ["full"] }
lazy_static = "1.5"
mockall = "0.13.1"
opentelemetry = { version = "0.29.1", features = ["metrics"] }
opentelemetry-otlp = { version = "0.29.0", features = ["trace", "grpc-tonic"]}
opentelemetry-prometheus = "0.29.1"
Contributor

The opentelemetry-prometheus crate has not been updated for quite some time. The crate description states:

Warning: This crate is no longer recommended for use.

Development of the Prometheus exporter has been discontinued. See the related issue. This crate depends on the unmaintained protobuf crate and has unresolved security vulnerabilities. Version 0.29 will be the final release.

For Prometheus integration, we strongly recommend using the OTLP exporter instead. Prometheus natively supports OTLP, providing a more stable and actively maintained solution.

I see we pull in otlp too. What do we actually rely on opentelemetry-prometheus for? How difficult would it be to remove the dependency altogether?

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setup agglayer open telemetry
4 participants