feat: setup otel tracing #499
Conversation
Very happy to see we're hopefully soon getting traces out of our system! I still have a few comments on implementation below
tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(
        EnvFilter::try_from_default_env().unwrap_or_else(|_| config.level.into()),
Hmmm, I'm not sure we actually want the same filter as for the logs? For OTLP traces I'd expect that most of the time we'd want to trace everything and just not keep a long retention, whereas the logs would usually have far fewer targets enabled.
I think this is debatable. I'll leave it as is for now; we can change it if there is a need (or even introduce a separate filter for OTLP).
I think that if we're not filtering properly, we could end up flooding the collector with a lot of traces from library crates (such as h2 or tonic), but that's something we can try.
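For illustration, a minimal sketch of what a dedicated filter for the OTLP layer could look like; the AGGLAYER_OTLP_FILTER variable name and the default directives are made up for the example, not part of the PR:

```rust
use tracing_subscriber::EnvFilter;

// Hypothetical: a separate filter for traces, so noisy library targets
// (h2, tonic, ...) can be silenced without touching the log filter.
let otlp_filter = EnvFilter::try_from_env("AGGLAYER_OTLP_FILTER")
    .unwrap_or_else(|_| EnvFilter::new("info,h2=off,tonic=off"));

let otlp_layer = tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(otlp_filter);
```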
Thank you for the changes! I just have a few last comments 😄
LGTM! I'm just thinking the code might look slightly better with e.g. a tiny macro for the if env::var("") { if foo.parse { ... } else { eprintln } } code that is repeated 8 times; but it's definitely not a blocker 😄
Something that'd look like this:
if_env_var!(
    OTLP_BATCH_SCHEDULED_DELAY,
    |m: u64| batch_processor_config = batch_processor_config.with_scheduled_delay(Duration::from_millis(m)),
);
crates/agglayer-telemetry/Cargo.toml (Outdated)
prometheus = "0.13.3"
thiserror.workspace = true
tokio = { workspace = true, features = ["full"] }
tokio-util = { workspace = true }
tracing.workspace = true
tracing-opentelemetry = "0.28"
tracing-subscriber = { workspace = true, features = ["env-filter", "json"] }
-pub struct Log {
+pub struct TracingConfig {
Are you renaming this because we're using the tracing lib?
-pub format: LogFormat,
+pub format: TracingFormat,
This format represents the format of the logs exposed by the agglayer-node; I don't understand why we'd rename it to TracingFormat.
#[default]
Stdout,
Stderr,
File(PathBuf),
Otlp,
I'm not sure Otlp is an output format. If we define Otlp here, doesn't that mean we have no way to define the log output?
// Initialize the tracing
metrics_runtime
    .block_on(async { agglayer_telemetry::traces::setup_tracing(&config.log, version) })?;
This isn't async, so you can execute setup_tracing at the same place as logging::tracing.
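For example (same call, just without the runtime; this assumes setup_tracing stays synchronous):

```rust
// No block_on needed for a synchronous setup function.
agglayer_telemetry::traces::setup_tracing(&config.log, version)?;
```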
for writer in &config.outputs {
    // Setup instrumentation if both otlp agent url and
    // otlp service name are provided as arguments
    if writer == &TracingOutput::Otlp {
Maybe we should move the otlp configuration out of this loop, or even have a dedicated TracesConfig struct with a method that builds and exports the layer directly.
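A rough sketch of that shape, with hypothetical names (this is not the PR's actual config structure):

```rust
use std::path::PathBuf;

// Hypothetical: OTLP settings live in their own struct instead of being
// one variant of the output enum.
pub struct TracesConfig {
    pub otlp_agent_url: String,
    pub otlp_service_name: String,
}

pub enum LogOutput {
    Stdout,
    Stderr,
    File(PathBuf),
}

pub struct Config {
    pub outputs: Vec<LogOutput>,      // plain log writers only
    pub traces: Option<TracesConfig>, // OTLP is opt-in, outside the loop
}

fn setup(config: &Config) {
    for _writer in &config.outputs {
        // install stdout / stderr / file layers here
    }
    if let Some(_traces) = &config.traces {
        // build the exporter, trace provider, tracer and
        // tracing_opentelemetry layer from the dedicated config here
    }
}
```

That shape would also address the earlier comment about Otlp not really being a log output.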
    )
    .build();

let tracer = trace_provider.tracer("agglayer-otlp");
Could this be a const?
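For example (the constant name is just illustrative):

```rust
const OTLP_TRACER_NAME: &str = "agglayer-otlp";

let tracer = trace_provider.tracer(OTLP_TRACER_NAME);
```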
tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(
        EnvFilter::try_from_default_env().unwrap_or_else(|_| config.level.into()),
Maybe we can have a method on the LogLevel type for this?
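A minimal sketch, assuming LogLevel is Copy and already converts into an EnvFilter (as config.level.into() suggests); the method name is made up:

```rust
use tracing_subscriber::EnvFilter;

impl LogLevel {
    /// Use the filter from the environment when set,
    /// otherwise fall back to the configured level.
    pub fn env_filter(self) -> EnvFilter {
        EnvFilter::try_from_default_env().unwrap_or_else(|_| self.into())
    }
}
```

The layer setup would then read `.with_filter(config.level.env_filter())`.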
resources.push(KeyValue::new("service.name", otlp_service_name.to_string()));
resources.push(KeyValue::new("service.version", version.to_string()));

let custom_resources: Vec<_> = std::env::var("AGGLAYER_OTLP_TAGS")
Should it be scoped as AGGLAYER_NODE_OTLP_TAGS? Do we want those tags to be shared among other software?
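Just to illustrate the mechanism (this is not the truncated parsing code from the PR): a comma-separated key=value variable could be turned into extra resources along these lines:

```rust
use opentelemetry::KeyValue;

// Hypothetical parsing of e.g. AGGLAYER_OTLP_TAGS="team=agglayer,env=staging".
fn custom_resources(raw: &str) -> Vec<KeyValue> {
    raw.split(',')
        .filter_map(|pair| pair.split_once('='))
        .map(|(key, value)| KeyValue::new(key.trim().to_string(), value.trim().to_string()))
        .collect()
}
```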
@@ -96,8 +96,13 @@ insta = { git = "https://github.com/freyskeyd/insta", branch = "chore/updating-d
jsonrpsee = { version = "0.24.7", features = ["full"] }
lazy_static = "1.5"
mockall = "0.13.1"
opentelemetry = { version = "0.29.1", features = ["metrics"] }
opentelemetry-otlp = { version = "0.29.0", features = ["trace", "grpc-tonic"] }
opentelemetry-prometheus = "0.29.1"
The opentelemetry-prometheus crate has not been updated for quite some time. The crate description states:

Warning: This crate is no longer recommended for use.
Development of the Prometheus exporter has been discontinued. See the related issue. This crate depends on the unmaintained protobuf crate and has unresolved security vulnerabilities. Version 0.29 will be the final release.
For Prometheus integration, we strongly recommend using the OTLP exporter instead. Prometheus natively supports OTLP, providing a more stable and actively maintained solution.

I see we pull in otlp too. What do we actually rely on opentelemetry-prometheus for? How difficult would it be to remove the dependency altogether?
Description
Consolidate log initialization.
Implement optional otlp tracing setup.

Fixes #498
PR Checklist: