seg fault under high load whilst tailing a log #9864
Comments
@sorran could you paste in the flat config, just to save having to download, extract and open potentially malicious files? It looks like it is a tail input with loki and http outputs, but it would be good to get the full config as flat text. I wrote this a while back, when I had monstrous includes, to help: https://github.com/couchbase/couchbase-fluent-bit/blob/main/tools/flatten-config.sh
Still a problem for us. We are testing different outputs, normally one of opensearch, victorialogs or loki, and each has the same problem. It feels like the issue is in the tailing of the input file, which is being written and rolled over at a very high rate.
inputs.d/spitter.confg:
fluent-bit.conf:
Parsers.conf
We have reproduced it across various outputs. Below is one example output, but any output seems to have the same issue (the issue is probably in the tail input).
outputs.d/opensearch.conf:
Is this with the latest version (4.x)?
Hi @patrick-stephens, this was using Fluent Bit 3.2.10-1.
As far as I can see release Would you by chance be able to build your own fluent-bit 3.2 from source?
Bug Report
Describe the bug
When running under high load we encounter seg faults:
Indicates some memory corruption around:
fluent-bit/lib/msgpack-c/src/unpack.c
Line 372 in 81f62b9
fluent-bit/lib/msgpack-c/include/msgpack/sbuffer.h
Line 81 in 81f62b9
Valgrind logs:
valgrindx-1.log
valgrindx.log
Any advice on how to troubleshoot further would be well received.
To Reproduce
This seems to occur under high stress. We can hit it within 1-2 minutes on a c7g.large EC2 instance that is tailing a log being produced at 50k lines/s. At 25k lines/s, performance appears stable. The log is a Java log that rotates once it hits 500 MB, and old logs are deleted once there are more than 3. At 50k lines/s we would expect the log producer to be running at a higher rate than fluent-bit is able to consume.
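For anyone trying to reproduce this without the original Java application, the load pattern described above can be approximated with a small generator. This is only a sketch under assumptions: the function names `spit` and `rotate`, the line format, and the `.1`/`.2`/`.3` rename scheme are hypothetical stand-ins, not what the actual log producer does, and whether Python sustains 50k lines/s depends on the machine.

```python
import os
import time


def rotate(path, keep):
    """Rename path -> path.1 -> path.2 ..., keeping at most `keep` rotated files."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)  # drop the oldest rotated log
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.rename(src, f"{path}.{i + 1}")
    os.rename(path, f"{path}.1")


def spit(path, lines_per_sec, duration_sec, max_bytes, keep=3):
    """Append about lines_per_sec lines/s for duration_sec seconds.

    Rotates the file once it reaches max_bytes. Returns the number of lines written.
    """
    batch = max(1, lines_per_sec // 100)  # write in small batches between clock checks
    written = 0
    size = os.path.getsize(path) if os.path.exists(path) else 0
    start = time.monotonic()
    f = open(path, "ab")
    try:
        while True:
            elapsed = time.monotonic() - start
            if elapsed >= duration_sec:
                break
            # Number of lines that should exist by now at the target rate.
            target = int(elapsed * lines_per_sec)
            if written >= target:
                time.sleep(0.001)
                continue
            for _ in range(min(batch, target - written)):
                line = f"{time.time():.6f} INFO spitter line={written}\n".encode()
                f.write(line)
                size += len(line)
                written += 1
            f.flush()
            if size >= max_bytes:
                f.close()
                rotate(path, keep)
                f = open(path, "ab")
                size = 0
    finally:
        f.close()
    return written


if __name__ == "__main__":
    # Assumed values taken from the report: 50k lines/s, rotate at 500 MB.
    spit("spitter.log", 50_000, 120, 500 * 1024 * 1024, keep=3)
```

Pointing the tail input at `spitter.log*` while this runs should give a comparable write-and-rotate pattern; lowering `max_bytes` forces rotations more often, which may help narrow down whether rotation or raw throughput triggers the crash.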
Expected behavior
Performance should degrade without a seg fault crash.
Screenshots
Your Environment
config.zip
Additional context
We are stress testing fluent-bit to understand its performance limitations. Possibly we need to throttle fluent-bit, but it is unclear whether this would actually resolve the seg fault.