OTLP no longer works on 3.2.4 · Issue #9868 · fluent/fluent-bit · GitHub

OTLP no longer works on 3.2.4 #9868


Closed
s-ledyakhov opened this issue Jan 24, 2025 · 9 comments
Labels
Stale waiting-for-release This has been fixed/merged but it's waiting to be included in a release.

Comments

@s-ledyakhov
s-ledyakhov commented Jan 24, 2025

Bug Report

Describe the bug
The OTLP input stopped working on version 3.2.4; it works on version 3.1.10.

To Reproduce

  • Add version 3.2.4 as a sidecar to the manifests
  • Configure the input as:
[INPUT]
    Name        opentelemetry
    Listen      0.0.0.0     
    Port        4318
  • Get the debug message "skipping flush for event chunk with zero records".
  • Fail to get output regardless of the output settings (es/stdout/etc)
    logs:
[2025/01/24 06:58:06] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2025/01/24 06:58:06] [debug] [output] skipping flush for event chunk with zero records.
[2025/01/24 06:58:06] [debug] [out flush] cb_destroy coro_id=2
[2025/01/24 06:58:06] [debug] [task] destroy task=0x7f04d2a36500 (task_id=0)
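For anyone trying to reproduce this without the original .NET application, a minimal sketch of a test sender follows. It assumes the input accepts OTLP/HTTP JSON on the `/v1/logs` path of the configured port; the service name, scope name, and URL are made-up placeholders, not values from this report. Python's `urllib` speaks HTTP/1.1, so this does not exercise HTTP/2.

```python
import json
import time
import urllib.request


def build_logs_payload(message, service_name="demo-service"):
    """Build a one-record OTLP/JSON logs payload (placeholder names)."""
    return {
        "resourceLogs": [{
            "resource": {
                "attributes": [
                    {"key": "service.name",
                     "value": {"stringValue": service_name}},
                ],
            },
            "scopeLogs": [{
                "scope": {"name": "demo.scope"},  # hypothetical scope name
                "logRecords": [{
                    # OTLP/JSON encodes 64-bit integers as strings.
                    "timeUnixNano": str(time.time_ns()),
                    "severityNumber": 9,
                    "severityText": "Information",
                    "body": {"stringValue": message},
                }],
            }],
        }],
    }


def send_logs(payload, url="http://127.0.0.1:4318/v1/logs"):
    """POST the payload as JSON and return the HTTP status code.

    The URL is an assumption: a local instance using the Listen/Port
    values from the configuration above.
    """
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `send_logs(build_logs_payload("poll"))` against a running instance and then watching the stdout output should show whether the record arrives or whether only the "zero records" debug line appears.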

Expected behavior
Logging to stdout with this configuration (it works on version 3.1)

[INPUT]
    Name        opentelemetry
    Listen      0.0.0.0     
    Port        4318   

[OUTPUT]
    Name        stdout
    Match       *
[2025/01/24 12:43:00] [debug] [task] created task=0x7fd608a36640 id=0 OK
[2025/01/24 12:43:00] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2025/01/24 12:43:00] [debug] [output:tcp:tcp.1] task_id=0 assigned to thread #1
[0] v1_logs: [[-1.000000000, {"schema"=>"otlp", "resource_id"=>0, "scope_id"=>0}], {"resource"=>{"attributes"=>{"service.name"=>"XXX", "service.namespace"=>"XXX", "service.version"=>"XXXf", "service.instance.id"=>"ef23a5a9-e0c1-4c4a-1887dc0d0", "telemetry.sdk.name"=>"opentelemetry", "telemetry.sdk.language"=>"dotnet", "telemetry.sdk.version"=>"1.10.0"}}, "schema_url"=>"", "scope"=>{"name"=>"XXX.EventQueueService"}}]
[1] v1_logs: [[1737722580.1930176384, {"otlp"=>{"observed_timestamp"=>1737722580260860500, "timestamp"=>1737722580260860500, "severity_number"=>9, "severity_text"=>"Information", "attributes"=>{"{OriginalFormat}"=>"poll"}, "trace_flags"=>0}}], {"message"=>"poll"}]
[2] v1_logs: [[1737722580.1932268984, {"otlp"=>{"observed_timestamp"=>1737722580262953100, "timestamp"=>1737722580262953100, "severity_number"=>13, "severity_text"=>"Warning", "attributes"=>{"error"=>"", "{OriginalFormat}"=>"{error}"}, "trace_flags"=>0}}], {"message"=>""}]
[3] v1_logs: [[-2.000000000, {}], {}]
[2025/01/24 12:43:00] [debug] [out flush] cb_destroy coro_id=1
[2025/01/24 12:43:00] [debug] [upstream] KA connection #61 to XXX:5044 is connected
[2025/01/24 12:43:00] [debug] [upstream] KA connection #61 to XXX:5044 is now available
[2025/01/24 12:43:00] [debug] [out flush] cb_destroy coro_id=0
[2025/01/24 12:43:00] [debug] [task] destroy task=0x7fd608a36640 (task_id=0)
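In the working output above, the first and last records carry the timestamps -1 and -2 instead of real times, and they wrap the two ordinary log records. The sketch below reads them as group start and group end markers; that reading is an assumption based on this output, and the record values are abbreviated. It only illustrates the record layout and does not claim to reproduce the bug.

```python
GROUP_START = -1  # assumption: timestamp that opens a log group
GROUP_END = -2    # assumption: timestamp that closes a log group


def classify(record):
    """Return 'group_start', 'group_end', or 'log' for one decoded record.

    A record is laid out as ([timestamp, metadata], body), mirroring the
    stdout lines above.
    """
    (timestamp, _metadata), _body = record
    if timestamp == GROUP_START:
        return "group_start"
    if timestamp == GROUP_END:
        return "group_end"
    return "log"


# Records shaped like the stdout output above, with values abbreviated.
chunk = [
    ([-1.0, {"schema": "otlp", "resource_id": 0, "scope_id": 0}],
     {"resource": {}, "scope": {}}),
    ([1737722580.19, {"otlp": {"severity_number": 9}}], {"message": "poll"}),
    ([1737722580.19, {"otlp": {"severity_number": 13}}], {"message": ""}),
    ([-2.0, {}], {}),
]

kinds = [classify(record) for record in chunk]
log_count = kinds.count("log")
print(kinds, log_count)
```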

Your Environment

  • Version used: 3.2.4 / 3.1.10
  • Configuration:
[SERVICE]
   Flush        1
   Daemon       Off
   Log_Level    debug
   
[INPUT]
   Name        opentelemetry
   Listen      0.0.0.0     
   Port        4318   

[OUTPUT]
   Name        stdout
   Match       *
  • Environment name and version (e.g. Kubernetes? What version?): Kubernetes 1.28.2
  • Operating System and version: docker image fluent/fluent-bit:3.2.4 (I also tried this image without a sidecar on the local machine, the problem is the same)
  • Filters and plugins: opentelemetry
@patrick-stephens
Contributor
patrick-stephens commented Jan 24, 2025

Does it work with http2 off?

[INPUT]
   Name        opentelemetry
   Listen      0.0.0.0     
   Port        4318   
   Http2       off

The default was changed to http/2 in 3.2 and this has been seen to have an impact on some systems.

@s-ledyakhov
Author
s-ledyakhov commented Jan 24, 2025

Does it work with http2 off?

I suppose not. In this case, I do not receive any messages in stdout at all, although there are logs in the application

Fluent Bit v3.2.4
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  _____ 
|  ___| |                | |   | ___ (_) |         |____ |/ __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \  / /  
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)_____/


[2025/01/24 14:40:44] [ info] Configuration:
[2025/01/24 14:40:44] [ info]  flush time     | 1.000000 seconds
[2025/01/24 14:40:44] [ info]  grace          | 5 seconds
[2025/01/24 14:40:44] [ info]  daemon         | 0
[2025/01/24 14:40:44] [ info] ___________
[2025/01/24 14:40:44] [ info]  inputs:
[2025/01/24 14:40:44] [ info]      opentelemetry
[2025/01/24 14:40:44] [ info] ___________
[2025/01/24 14:40:44] [ info]  filters:
[2025/01/24 14:40:44] [ info] ___________
[2025/01/24 14:40:44] [ info]  outputs:
[2025/01/24 14:40:44] [ info]      stdout.0
[2025/01/24 14:40:44] [ info] ___________
[2025/01/24 14:40:44] [ info]  collectors:
[2025/01/24 14:40:44] [ info] [fluent bit] version=3.2.4, commit=5b0ff04120, pid=1
[2025/01/24 14:40:44] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2025/01/24 14:40:44] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/01/24 14:40:44] [ info] [simd    ] disabled
[2025/01/24 14:40:44] [ info] [cmetrics] version=0.9.9
[2025/01/24 14:40:44] [ info] [ctraces ] version=0.5.7
[2025/01/24 14:40:44] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2025/01/24 14:40:44] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2025/01/24 14:40:44] [debug] [opentelemetry:opentelemetry.0] created event channels: read=25 write=26
[2025/01/24 14:40:44] [debug] [downstream] listening on 0.0.0.0:4318
[2025/01/24 14:40:44] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
[2025/01/24 14:40:44] [debug] [stdout:stdout.0] created event channels: read=28 write=29
[2025/01/24 14:40:44] [ info] [sp] stream processor started
[2025/01/24 14:40:44] [ info] [output:stdout:stdout.0] worker #0 started

However, on version 3.1 I also get no output when the http2 off parameter is set. If I remove it, everything works again.

@nalcabio-tom
nalcabio-tom commented Jan 24, 2025

I mentioned the same problem back in December https://fluent-all.slack.com/archives/C0CTQGHKJ/p1734880937722159

It's been a while so my memory has faded, but I think the problem might be with the output sink. I seem to remember that fluent bit could send open telemetry data to another collector, but nothing is printed to stdout.

More details: https://fluent-all.slack.com/archives/C0CTQGHKJ/p1735049084785439?thread_ts=1734955974.377129&cid=C0CTQGHKJ I reported in this thread that the issue disappeared. I don't remember what I changed, but with 3.2.4 the problem is definitely back

I just checked our upstream collector, and it's not receiving any data, so ignore my initial theory above

@leonardo-albertovich
Collaborator

This issue was caused by the introduction of log group metadata. It only affects logs and has been identified and patched. I don't know if the patch will be included in 3.2.5 or if it will be released shortly after.

@nalcabio-tom

Regarding http2 off, #9613 (comment)

@patrick-stephens patrick-stephens added waiting-for-release This has been fixed/merged but it's waiting to be included in a release. and removed status: waiting-for-triage labels Jan 24, 2025
@s-ledyakhov
Author

Regarding http2 off, #9613 (comment)

I tried http2 off on versions 3.1.10, 3.2.2 and 3.2.4; it gave no results for either es or stdout.
But on versions 3.2.2 and 3.1.10, without the http2 off parameter, everything works with all output options.

I mentioned the same problem back in December https://fluent-all.slack.com/archives/C0CTQGHKJ/p1734880937722159

Unfortunately, I don't have access to Slack; the error is "doesn't have an account on this workspace".

I don't know if the patch will be included in 3.2.5 or if it will be released shortly after.

Thanks for the information, I can wait for the new version and check on it

@leonardo-albertovich
Collaborator

The http2 issue that was identified and fixed only affected fluent-bit when receiving OTLP data from another fluent-bit instance over HTTP/1.x with the http2 option enabled and gzip compression in use. That's because the http2 option not only enables HTTP/2 support when available but also enables the new HTTP client component. That component had a bug that caused it to repeat certain HTTP headers, which in turn exposed a flaw in Monkey's HTTP parser that caused some headers to be ignored.

That's all to say that, under certain specific conditions, sending and receiving OTLP data to and from other systems worked, but sending and receiving OTLP data between fluent-bit instances didn't.

I hope this helps clarify some of these questions.
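The conditions described above (HTTP/1.x plus gzip) can be approximated from a test client. The sketch below builds a gzip-compressed OTLP/JSON request with Python's `urllib`, which speaks HTTP/1.1. It does not reproduce the repeated-header bug itself, since that lived in fluent-bit's own client, and whether a given fluent-bit version accepts `Content-Encoding: gzip` on this endpoint is an assumption to check. The URL and payload values are placeholders.

```python
import gzip
import json
import urllib.request


def build_gzip_request(payload, url="http://127.0.0.1:4318/v1/logs"):
    """Return a POST request whose JSON body is gzip-compressed.

    The URL is an assumption: a local opentelemetry input on port 4318.
    """
    raw = json.dumps(payload).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
    }
    return urllib.request.Request(
        url, data=body, headers=headers, method="POST"
    )


# Placeholder payload with a single log record.
payload = {
    "resourceLogs": [{
        "scopeLogs": [{
            "logRecords": [{"body": {"stringValue": "gzip test"}}],
        }],
    }],
}

request = build_gzip_request(payload)
# To actually send it to a running instance:
#     urllib.request.urlopen(request, timeout=5)
```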

Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Apr 25, 2025
Contributor
github-actions bot commented May 1, 2025

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 1, 2025
4 participants