10000 fix: race condition between series creation and garbage collection by colega · Pull Request #16333 · prometheus/prometheus · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: race condition between series creation and garbage collection #16333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 17, 2025

Conversation

colega
Copy link
Contributor
@colega colega commented Mar 27, 2025

Description

This fixes a race condition between series creation and garbage collection that may cause first sample of the series being lost.

Context

This is similar to #8365, but IIRC that considers the case when Append() is retrieving an existing series that is about to be garbage-collected: that is less likely as series should have existed already and its last sample should be in that exact time window. This is something that can happen on every garbage collection.

Bug

Consider a following sequence of events:

  • Append() is called for a new labelset, a new memSeries is created.
    • Append() starts deciding whether the sample is appendable before setting s.pendingCommit = true.
  • While Append() decides whether samples are appendable to that new series, Head.gc() sees the series and since it is not s.pendingCommit, it removes the series.
  • Append() marks s.pendingCommit = true, but that series isn't referenced by the Head anymore.

Fix

When creating series from the appender, create them directly with the s.pendingCommit mark. (This doesn't affect WAL replay where garbage-collection still can't run, so in order to avoid having to unmark all series, we just don't set it).

We also need to un-mark it during Commit/Rollback explicitly because sometimes a series is created without samples, in order to do that we add a new slice of memSeries (the created ones) to the headAppender.

Impact

In the test that I added, from 100k new series appended we lose ~10 while doing 128 garbage collections (maybe less? I'm running more than needed).

This test consistently fails missing ~10 series.
If it doesn't fail on your machine, just increase totalSeries, that's
how race conditions work.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@colega colega requested a review from jesusvazquez as a code owner March 27, 2025 10:08
This fixes TestHead_RaceBetweenSeriesCreationAndGC.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
@jesusvazquez
Copy link
Member

/prombench start main

@prombot
Copy link
Contributor
prombot commented Mar 27, 2025

Incorrect /prombench syntax; command requires one argument that matches (master|main|v[0-9]+\.[0-9]+\.[0-9]+\S*) regex.

Available Commands:

  • To start benchmark: /prombench <branch or git tag to compare with>
  • To restart benchmark: /prombench <branch or git tag to compare with>
  • To stop benchmark: /prombench cancel
  • To print help: /prombench help

Advanced Flags for start and restart Commands:

  • --bench.directory=<sub-directory of github.com/prometheus/test-infra/prombench
    • See the details here, defaults to manifests/prombench.
  • --bench.version=<branch | @commit>
    • See the details here, defaults to master.

Examples:

  • /prombench v3.0.0
  • /prombench v3.0.0 --bench.version=@aca1803ccf5d795eee4b0848707eab26d05965cc --bench.directory=manifests/prombench

@jesusvazquez
Copy link
Member

/prombench main

@prombot
Copy link
Contributor
prombot commented Mar 27, 2025

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-16333 and main

After the successful deployment (check status here), the benchmarking results can be viewed at:

Available Commands:

  • To restart benchmark: /prombench restart main
  • To stop benchmark: /prombench cancel
  • To print help: /prombench help

@jesusvazquez
Copy link
Member

/prombench cancel

@prombot
Copy link
Contributor
prombot commented Mar 27, 2025

Benchmark cancel is in progress.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Copy link
Member
@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect most existing pendingCommit = true lines to disappear.
Like this one:

https://github.com/prometheus/prometheus/pull/16333/files#diff-c3ae701c35bb1de34870207029d1fb3390613fbba65e6f2f9bffe059bff89001L383

Similarly many pendingCommit = false lines like in commitSample() should be unnecessary given the unmarkCreatedSeriesAsPendingCommit call.

@colega
Copy link
Contributor Author
colega commented Apr 16, 2025

I would expect most existing pendingCommit = true lines to disappear.

I am only setting pendingCommit for new series, I'm trying to solve the race when series is created and GC-ed at the same time.

I am not solving the bug referenced in #8365 where series existed for a while and it gets GC-ed at the same time when we're going to add more samples to it.

Similarly many pendingCommit = false lines like in commitSample() should be unnecessary given the unmarkCreatedSeriesAsPendingCommit call.

For same reason, I'm only doing pendingCommit = false on series that this appender created, this isn't done for series that previously existed.

Copy link
Member
@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying. LGTM.

@bboreham bboreham merged commit a117722 into prometheus:main Apr 17, 2025
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0