-
Notifications
You must be signed in to change notification settings - Fork 9.6k
fix: race condition between series creation and garbage collection #16333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This test consistently fails missing ~10 series. If it doesn't fail on your machine, just increase totalSeries, that's how race conditions work. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
This fixes TestHead_RaceBetweenSeriesCreationAndGC. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
786f04f
to
e4fe8d8
Compare
/prombench start main |
Incorrect Available Commands:
Advanced Flags for
Examples:
|
/prombench main |
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: After the successful deployment (check status here), the benchmarking results can be viewed at: Available Commands:
|
/prombench cancel |
Benchmark cancel is in progress. |
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect most existing pendingCommit = true
lines to disappear.
Like this one:
Similarly many pendingCommit = false
lines like in commitSample()
should be unnecessary given the unmarkCreatedSeriesAsPendingCommit
call.
I am only setting I am not solving the bug referenced in #8365 where series existed for a while and it gets GC-ed at the same time when we're going to add more samples to it.
For same reason, I'm only doing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. LGTM.
Description
This fixes a race condition between series creation and garbage collection that may cause first sample of the series being lost.
Context
This is similar to #8365, but IIRC that considers the case when
Append()
is retrieving an existing series that is about to be garbage-collected: that is less likely as series should have existed already and its last sample should be in that exact time window. This is something that can happen on every garbage collection.Bug
Consider a following sequence of events:
Append()
is called for a new labelset, a newmemSeries
is created.Append()
starts deciding whether the sample is appendable before settings.pendingCommit = true
.Append()
decides whether samples are appendable to that new series,Head.gc()
sees the series and since it is nots.pendingCommit
, it removes the series.Append()
markss.pendingCommit = true
, but that series isn't referenced by the Head anymore.Fix
When creating series from the appender, create them directly with the
s.pendingCommit
mark. (This doesn't affect WAL replay where garbage-collection still can't run, so in order to avoid having to unmark all series, we just don't set it).We also need to un-mark it during
Commit
/Rollback
explicitly because sometimes a series is created without samples, in order to do that we add a new slice ofmemSeries
(the created ones) to theheadAppender
.Impact
In the test that I added, from 100k new series appended we lose ~10 while doing 128 garbage collections (maybe less? I'm running more than needed).