could not enqueue records into the ring buffer #9906

Open

pawel-lmcb opened this issue Feb 2, 2025 · 5 comments

Comments

@pawel-lmcb commented Feb 2, 2025

Bug Report

Describe the bug

We've got a fluent-bit aggregator VM running 3.2.5.

The node is behaving strangely. Throughput used to be about 80MB/s, 40 in and 40 out; now it's doing less than 10MB/s in and 0 out. The only time there is any output is right after I restart the process, even though resources are readily available.

It was working fine, but all of a sudden the traffic came crashing down and output went to 0, almost as if it hit a race condition.

After a process restart, network traffic spikes up to 70-80MB/s and then drops back down to about 10MB/s.

We're seeing the following errors pop up once every second:

[2025/02/02 03:13:53] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:13:54] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:13:55] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:13:56] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:13:57] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:13:58] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:13:59] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:14:00] [error] [input:forward:forward.0] could not enqueue records into the ring buffer
[2025/02/02 03:14:01] [error] [input:forward:forward.0] could not enqueue records into the ring buffer

To Reproduce

  • Steps to reproduce the problem:

Install fluent-bit 3.2.5 with the following config:

[root@localhost fluent-bit]# cat /etc/fluent-bit/fluent-bit.conf 
[SERVICE]
    Flush                   1
    Log_Level               info
    Log_File                /var/log/fluent-bit/fluentbit-kafka.log
    # https://docs.fluentbit.io/manual/administration/monitoring#health-check-for-fluent-bit
    # curl -s http://127.0.0.1:2020/api/v1/metrics/prometheus
    HTTP_Server             on
    HTTP_Listen             0.0.0.0
    HTTP_Port               2020
    storage.path            /var/log/fluent-bit/
    storage.sync            full
    storage.checksum        off
    Storage.metrics         on
    scheduler.base          1
    scheduler.cap           20

[INPUT]
    Name                    forward
    Listen                  0.0.0.0
    Port                    24224
    Buffer_Chunk_Size       64MB
    Buffer_Max_Size         256MB
    Threaded                true
    storage.type            filesystem

[OUTPUT]
    Name                    kafka
    Alias                   kafka-app.analytics_vmwaredatacenter.cloudadmin
    Match                   app.analytics_vmwaredatacenter.cloudadmin
    Brokers                 192.168.100.77:9092,192.168.100.87:9092,192.168.100.72:9092
    Topics                  analytics_development
    Retry_Limit             5
    rdkafka.compression.type gzip

[OUTPUT]
    Name                    kafka
    Alias                   kafka-app.aws_billing_vmwaredatacenter.cloudadmin
    Match                   app.aws_billing_vmwaredatacenter.cloudadmin
    Brokers                 192.168.100.77:9092,192.168.100.87:9092,192.168.100.72:9092
    Topics                  aws_billing_development
    Retry_Limit             False
    Workers                 8

We have 5 VMs acting as forwarders, each sending a 2.2M-line CSV file, which the aggregator ingests, writes out to disk, and sends to Redpanda (Kafka).

There is NO compression on the forwarders and NO compression going to Redpanda (Kafka).
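
For reference, a minimal forwarder-side sketch matching this setup (the file path, tag, and aggregator address below are placeholders, not the exact values used) would be:

[SERVICE]
    Flush                   1
    Log_Level               info

[INPUT]
    # tail the exported CSV file (placeholder path)
    Name                    tail
    Path                    /data/export.csv
    Tag                     app.analytics_vmwaredatacenter.cloudadmin

[OUTPUT]
    # forward everything to the aggregator's forward input on port 24224
    Name                    forward
    Match                   *
    Host                    192.168.100.10
    Port                    24224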

Update

So I realized that when fluent-bit's network performance decreases, all cores also seem to stop working and only one core stays busy, primarily on the forward input.

[Screenshots: per-core CPU utilization while in this state]

This can best be seen in the screenshots above: something puts the process into this odd single-core state, even though, as the config shows, the input is threaded and the outputs have multiple workers.
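
Since HTTP_Server and storage.metrics are already enabled in the config above, one way to confirm whether the forward input is dropping records while in this state is to poll the built-in monitoring endpoints (assuming the default 127.0.0.1:2020 address from the config):

# overall record/byte counters per input and output plugin
curl -s http://127.0.0.1:2020/api/v1/metrics
# filesystem chunk state per input (exposed when storage.metrics is on)
curl -s http://127.0.0.1:2020/api/v1/storage

If the input counters keep climbing while the output counters stay flat, records are being accepted but never flushed; if neither moves, the input itself has stalled.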

Expected behavior

Having run a dozen tests, we expect throughput of about 80MB/s (40 in and 40 out), with the disk sustaining 40MB/s writes.

Your Environment

VMware ESXi 7.0.3, build 21424296
Hardware is a Dell PowerEdge R720XD: 512GB of RAM, Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz, NVMe drives, and 10G networking.
RHEL 9.5 (Plow)

The VM for the fluent-bit aggregator has 16 vCPUs, 16GB of memory, and 300GB on NVMe with 78% free disk space.

@edsiper (Member) commented Feb 4, 2025

please attach your full Fluent Bit log file

@pawel-lmcb (Author) commented

@edsiper do you want me to enable a higher debug level than info and re-run this?
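
For reference, that change would just mean bumping Log_Level in the [SERVICE] section of the config above, e.g. (a sketch; the rest of the section stays as posted):

[SERVICE]
    Flush                   1
    # raise verbosity from info to debug
    Log_Level               debug
    Log_File                /var/log/fluent-bit/fluentbit-kafka.log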

@vpshibin commented Feb 26, 2025

I'm seeing the same issue in fluent-bit 3.2.1.

It happens with multiple inputs; see the log entries below for the TCP and prometheus_remote_write inputs. If I change the inputs to threaded: false, the error goes away. However, that puts all inputs in the main thread, which is not good for performance.
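
As a sketch in classic config syntax, the workaround just flips the threaded flag on the affected inputs (the port below is the plugin default, not necessarily the exact setup):

[INPUT]
    Name        tcp
    Listen      0.0.0.0
    Port        5170
    # run in the main event loop instead of a dedicated input thread (workaround)
    Threaded    false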

I have also seen the issue below, which mentions the same error; the root cause was supposed to have been fixed in an earlier version.
#7071

[2025/02/27 08:11:13] [error] [input:tcp:tcp.1] could not enqueue records into the ring buffer
[2025/02/27 08:12:52] [error] [input:tcp:tcp.1] could not enqueue records into the ring buffer
[2025/02/27 08:12:53] [error] [input:tcp:tcp.1] could not enqueue records into the ring buffer


[2025/02/27 08:55:23] [error] [input:prometheus_remote_write:prometheus_remote_write.2] could not enqueue records into the ring buffer
[2025/02/27 08:55:24] [error] [input:prometheus_remote_write:prometheus_remote_write.2] could not enqueue records into the ring buffer
[2025/02/27 08:55:25] [error] [input:prometheus_remote_write:prometheus_remote_write.2] could not enqueue records into the ring buffer
[2025/02/27 08:55:26] [error] [input:prometheus_remote_write:prometheus_remote_write.2] could not enqueue records into the ring buffer

@naegelin

Same issue in v3.1.7 here

@cdancy commented May 1, 2025

Issue is still happening on 4.0.1. Turning off threading for our tail plugins gets things working again. I put fluent-bit into debug mode, but no extra logs connected to this were produced, other than lots of:

[2025/05/01 19:34:08] [debug] [input:tail:tail.0] failed buffer write, retries=0
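
For anyone else hitting this with tail, the workaround amounts to something like the sketch below (the path is a placeholder):

[INPUT]
    Name        tail
    Path        /var/log/app/*.log
    # disable the dedicated input thread as a workaround for the ring buffer errors
    Threaded    off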
