When tail input plugin enable Threaded, simultaneous generation of multiple files can lead to offset update errors. · Issue #9924 · fluent/fluent-bit · GitHub


Closed
gwny opened this issue Feb 6, 2025 · 2 comments

Comments

gwny commented Feb 6, 2025

Bug Report

Describe the bug
fluent-bit version: the issue is still present in the latest version.
plugin: tail input plugin
If Threaded is set to true and multiple files are created at the same time, the offset may be updated incorrectly.

To Reproduce
fluent-bit.conf

[SERVICE]
  Flush         1
  Log_Level     info
  Daemon        off

[INPUT]
  Name              tail
  Tag               a
  Threaded          true
  Path              /usr/local/logs/*.a.log
  DB                /var/log/flb_kube.db
  DB.Sync           Off

[INPUT]
  Name              tail
  Tag               b
  Threaded          true
  Path              /usr/local/logs/*.b.log
  DB                /var/log/flb_kube.db
  DB.Sync           Off

[OUTPUT]
  Name  null
  Match *

Main points:

  1. Threaded is set to true.
  2. Two tail input plugins.
  3. Both inputs share the same DB file.

Reproduction steps:

  1. Execute ./fluent-bit -c fluent-bit.conf to start fluent-bit.
  2. Execute touch 1.a.log 1.b.log to create two files at the same time.
  3. Write something to both files.
  4. Query the records in the SQLite DB; the offset of one of the files always stays 0.

Expected behavior
The file offset should update correctly when the tail input plugin is running in threading mode.

Additional context
This issue may be caused by multi-threaded manipulation of the database when a new file is scanned, specifically within the method db_file_insert(). When a new file is scanned, this method performs two operations:

  1. Inserts a new record into the SQLite database.
  2. Retrieves the latest ID for subsequent update operations.

There is a time interval between these two operations, which can cause issues in multi-threaded scenarios. For example:

  1. A record representing file 1.a.log is inserted into the sqlite db, so the latest id in the db becomes 1.
  2. A record representing file 1.b.log is inserted into the sqlite db, so the latest id in the db becomes 2.
  3. When retrieving the latest id for 1.a.log, it incorrectly gets 2 instead of 1.
  4. When retrieving the latest id for 1.b.log, it correctly gets 2.
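The interleaving above can be sketched in Python with the standard sqlite3 module. This is an illustration, not fluent-bit's actual code: the table name, schema, and the use of a global MAX(id) query to stand in for the "retrieve the latest ID" step are all assumptions chosen to make the race visible; a barrier widens the window between the insert and the read so the failure is deterministic.

```python
import sqlite3
import tempfile
import threading

# One shared db file with a separate connection per thread, mirroring two
# Threaded tail inputs pointing at the same DB path. (Illustrative schema,
# not fluent-bit's actual table.)
db_path = tempfile.mktemp(suffix=".db")
init = sqlite3.connect(db_path)
init.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")
init.commit()
init.close()

barrier = threading.Barrier(2)   # widens the insert-then-read window
results = {}

def db_file_insert(name):
    conn = sqlite3.connect(db_path)       # per-thread connection
    conn.execute("INSERT INTO files (name) VALUES (?)", (name,))
    conn.commit()
    barrier.wait()                        # both inserts land before either read
    # Racy "get latest id": a global query also sees the other thread's insert.
    results[name] = conn.execute("SELECT MAX(id) FROM files").fetchone()[0]
    conn.close()

threads = [threading.Thread(target=db_file_insert, args=(n,))
           for n in ("1.a.log", "1.b.log")]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # both files end up with id 2; one of them is wrong
```

Both threads read back id 2, exactly matching the wrong-id behavior described in steps 3 and 4 above.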

To verify this guess, I added logging to the code and tested it:

  1. Added flb_plg_error(ctx->ins, "db instance address=%lu", ctx->db); to confirm that the two input threads use two different db instances.
  2. Added sleep(10); between the insert and the get-latest-id operation to exaggerate the problem.
  3. Added flb_plg_error(ctx->ins, "file %s last id=%d", file->name, last_id); to print the file name and the latest id.

Then I got the following logs:

[2025/02/06 08:10:55] [error] [input:tail:tail.0] db instance address=139646029844640
[2025/02/06 08:10:55] [error] [input:tail:tail.1] db instance address=139646096953504
[2025/02/06 08:11:05] [error] [input:tail:tail.0] file /usr/local/logs/1.a.log last id=2
[2025/02/06 08:11:05] [error] [input:tail:tail.1] file /usr/local/logs/1.b.log last id=2

sqlite db records:

1|/usr/local/logs/1.a.log|0|16929400|1738829455|0
2|/usr/local/logs/1.b.log|0|16929401|1738829455|0

The db id recorded for file 1.a.log does not match its actual id. As a result, no matter which file is written to, fluent-bit always updates the offset of file 1.b.log, which is wrong.
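One general way to close this kind of window is to retrieve the id through a connection-local mechanism rather than a global query: in SQLite, the last-insert rowid (sqlite3_last_insert_rowid() in C, cursor.lastrowid in Python) is scoped to a single connection, so a concurrent insert on another connection cannot leak in. The sketch below is illustrative only (hypothetical table name and schema, not fluent-bit's code); whether this maps directly onto db_file_insert() would need to be confirmed against the actual implementation.

```python
import sqlite3
import tempfile
import threading

db_path = tempfile.mktemp(suffix=".db")   # illustrative db, not flb_kube.db
init = sqlite3.connect(db_path)
init.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")
init.commit()
init.close()

barrier = threading.Barrier(2)
results = {}

def db_file_insert(name):
    conn = sqlite3.connect(db_path)
    cur = conn.execute("INSERT INTO files (name) VALUES (?)", (name,))
    conn.commit()
    barrier.wait()            # even with both inserts committed first...
    # ...lastrowid is connection-local, so each thread sees its own insert.
    results[name] = cur.lastrowid
    conn.close()

threads = [threading.Thread(target=db_file_insert, args=(n,))
           for n in ("1.a.log", "1.b.log")]
for t in threads: t.start()
for t in threads: t.join()
print(results)  # the two files get distinct ids
```

An alternative, if the retrieval must stay a query, would be to hold a lock (or a single write transaction) across both the insert and the read so no other thread can interleave between them.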

github-actions bot commented May 8, 2025

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label May 8, 2025

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 13, 2025