tsdb/agent: allow ingestion of OOO samples #12897
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
cc @codesome as well.
        _, err := app.AppendHistogram(0, lset, int64(i), nil, floatHistograms[i])
        require.NoError(t, err)
    }
}
@tpaschalis So far I see that we are checking that there are no errors on append, but we are not checking that the samples are actually there. Could the test replay the WAL and read back the samples to confirm they match what we inserted?
Hmm, the thing is that there's no way to access the samples by replaying the WAL (they're only used to recalculate lastTs).
What we can do is append a different number of samples and use the prometheus_agent_samples_appended_total metric to verify that the right number of datapoints was appended. I've implemented this in 242aa57; let me know how it looks!
That works fine 👍
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com> Signed-off-by: Sheikh-Abubaker <sheikhabubaker761@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com> Signed-off-by: Levi Harrison <git@leviharrison.dev>
@tpaschalis - How is this different from the experimental out-of-order ingestion feature that was already enabled? I'm trying to understand whether this is a new feature or a fix for an existing bug.
After a discussion in the CNCF Slack's #prometheus-dev channel, I'm opening this PR to enable OOO ingestion of samples for the Prometheus Agent.
I've also added a test for this new behaviour. The test is more complicated than expected, given that we have to replay the WAL to read the lastTs for a series. Let me know if you think we're better off without it.
Closes #12673.