8000 Fix incorrect lag reported in XINFO GROUPS by nesty92 · Pull Request #1952 · valkey-io/valkey · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix incorrect lag reported in XINFO GROUPS #1952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 17, 2025

Conversation

nesty92
Copy link
Contributor
@nesty92 nesty92 commented Apr 14, 2025

When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of XINFO GROUPS mystream.

Close: #1951

Copy link
codecov bot commented Apr 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.86%. Comparing base (51387dd) to head (5b4b82c).
Report is 11 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1952      +/-   ##
============================================
- Coverage     70.98%   70.86%   -0.13%     
=========
8000
===================================
  Files           123      123              
  Lines         65768    65941     +173     
============================================
+ Hits          46687    46730      +43     
- Misses        19081    19211     +130     
Files with missing lines Coverage Δ
src/t_stream.c 93.17% <100.00%> (+<0.01%) ⬆️

... and 27 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor
@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@hpatro Shall we backport it?

@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Apr 15, 2025
@hpatro
Copy link
Collaborator
hpatro commented Apr 15, 2025

Haven't read the code. Is this related to #685 ?

@hpatro Shall we backport it?

Do we backport or just make the change in latest version? If we do we would need to get it to 7.2.x, 8.0.x and 8.1.x I believe.

@zuiderkwast
Copy link
Contributor

Is this related to #685 ?

🤔

It looks very similar, yes. @artikell is it a duplicate? Can you take a look and decide which one we can merge?

@nesty92
Copy link
Contributor Author
nesty92 commented Apr 15, 2025

@zuiderkwast @hpatro #685 try to fix others issues as well and it's more dangerous to handle all of them in a single fix

@hpatro
Copy link
Collaborator
hpatro commented Apr 16, 2025

@zuiderkwast @hpatro #685 try to fix others issues as well and it's more dangerous to handle all of them in a single fix

Right. I agree we shouldn't conflate the PR. I just want to ensure the correctness. @artikell would you mind reviewing this ?

Copy link
Collaborator
@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall change looks good to me. @nesty92 Could you share the before and after behavior (preferrable valkey-cli output) ?

@artikell
Copy link
Contributor

@zuiderkwast @hpatro Yes, this PR is a simple fix that also aligns redis. There is another fix on redis, and this pr is also dealt with together? @nesty92

When using the XTRIM command to trim a stream, it does not update the maximal tombstone (max_deleted_entry_id). This leads to an issue where the lag calculation incorrectly assumes that there are no tombstones after the consumer group's last_id, resulting in an inaccurate lag.

@nesty92
Copy link
Contributor Author
nesty92 commented Apr 16, 2025

Could you share the before and after behavior (preferrable valkey-cli output) ?

@hpatro the full output is in #1951 description, here is just the last result

127.0.0.1:6379> XINFO GROUPS mystream
1)  1) "name"
    2) "g1"
    3) "consumers"
    4) (integer) 1
    5) "pending"
    6) (integer) 3
    7) "last-delivered-id"
    8) "4-0"
    9) "entries-read"
   10) (integer) 3
   11) "lag"
   12) (integer) 1 # it should be 0 the message was deleted and there is not lag

this is the new output with the fix

> ./src/valkey-cli
127.0.0.1:6379> xadd mystream 1-0 data a
"1-0"
127.0.0.1:6379> xadd mystream 2-0 data b
"2-0"
127.0.0.1:6379> xadd mystream 3-0 data c
"3-0"
127.0.0.1:6379> xadd mystream 4-0 data d
"4-0"
127.0.0.1:6379> XGROUP CREATE mystream g1 0
OK
127.0.0.1:6379> XREADGROUP GROUP g1 c1 COUNT 2 STREAMS mystream >
1) 1) "mystream"
   2) 1) 1) "1-0"
         2) 1) "data"
            2) "a"
      2) 1) "2-0"
         2) 1) "data"
            2) "b"
127.0.0.1:6379> XINFO GROUPS mystream
1)  1) "name"
    2) "g1"
    3) "consumers"
    4) (integer) 1
    5) "pending"
    6) (integer) 2
    7) "last-delivered-id"
    8) "2-0"
    9) "entries-read"
   10) (integer) 2
   11) "lag"
   12) (integer) 2
127.0.0.1:6379> xdel mystream 3-0
(integer) 1
127.0.0.1:6379> XINFO GROUPS mystream
1)  1) "name"
    2) "g1"
    3) "consumers"
    4) (integer) 1
    5) "pending"
    6) (integer) 2
    7) "last-delivered-id"
    8) "2-0"
    9) "entries-read"
   10) (integer) 2
   11) "lag"
   12) (nil)
127.0.0.1:6379> XREADGROUP GROUP g1 c1 COUNT 2 STREAMS mystream >
1) 1) "mystream"
   2) 1) 1) "4-0"
         2) 1) "data"
            2) "d"
127.0.0.1:6379> XINFO GROUPS mystream
1)  1) "name"
    2) "g1"
    3) "consumers"
    4) (integer) 1
    5) "pending"
    6) (integer) 3
    7) "last-delivered-id"
    8) "4-0"
    9) "entries-read"
   10) (integer) 4
   11) "lag"
   12) (integer) 0
127.0.0.1:6379> 

@nesty92
Copy link
Contributor Author
nesty92 commented Apr 16, 2025

When using the XTRIM command to trim a stream, it does not update the maximal tombstone (max_deleted_entry_id). This leads to an issue where the lag calculation incorrectly assumes that there are no tombstones after the consumer group's last_id, resulting in an inaccurate lag.

@artikell Oh, I didn't know about this one, but yes you are right I was able to replicate the issue with XTRIM but not sure if we should handle it together. What do you think @hpatro @zuiderkwast @artikell ?

It's my first time trying to contribute here so any feedback is more than welcome

@zuiderkwast
Copy link
Contributor

It's my first time trying to contribute here so any feedback is more than welcome

Yes, thank you, it's great!

I am not very familiar with consumer groups logic so I need some help and advice. I don't care which PR we will merge, as long as it fixes the problems.

If the max_deleted_entry_id problem can still cause an incorrect lag, then we can't say this PR fixes the incorrect lag... It's better if we can fix this too. Is it easy?

#685 doesn't affect max_deleted_entry_id but it solves these problems by calculating the lag in a different way: by correctly calculating entries-read and then calculates lag from entries-read and entries-added. Is there a risk that this new calculation of lag is wrong or is the concern that PR also changes the output for entries-read for empty streams from NULL to a correct number?

@nesty92
Copy link
Contributor Author
nesty92 commented Apr 16, 2025

@zuiderkwast @artikell I included a fix for the XTRIM, left the details also in the commit message

The issue arises when XTRIM or XADD removes entries from the beginning of the stream. If the trimming process removes the next entry after max_deleted_entry_id (and all entries before it), the stored ID becomes stale, it refers to a point before the stream's current start.

This fix updates the trimming logic to reset max_deleted_entry_id whenever the trim operation results in a new first entry ID that is later than the existing max_deleted_entry_id. This ensures the tombstone accurately reflects deletions within the stream's valid range.

In this way we are not changing the behavior of the lag all the existing test continue passing and a new one was added.

note: the problem with the XTRIM, could be also with XADD if trimming is set. That case is cover as well

@nesty92 nesty92 force-pushed the fix/stream_lag branch 3 times, most recently from 272f7dd to a2c0ff4 Compare April 16, 2025 12:50
@zuiderkwast zuiderkwast changed the title Fix incorrect lag in consumer groups Fix incorrect lag reported in XINFO GROUPS Apr 17, 2025
@zuiderkwast zuiderkwast merged commit c8bfd23 into valkey-io:unstable Apr 17, 2025
50 of 51 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to To be backported in Valkey 8.1 Apr 17, 2025
@github-project-automation github-project-automation bot moved this from In Progress to To be backported in Valkey 8.0 Apr 17, 2025
@github-project-automation github-project-automation bot moved this from In Progress to To be backported in Valkey 7.2 Apr 17, 2025
@nesty92 nesty92 deleted the fix/stream_lag branch April 17, 2025 18:05
@nesty92
Copy link
Contributor Author
nesty92 commented Apr 17, 2025

Thanks for the help @zuiderkwast @hpatro @artikell

murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 18, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 18, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 18, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 18, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 21, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request 6D40 Apr 21, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Nitai Caro <caronita@amazon.com>
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
nitaicaro pushed a commit to nitaicaro/valkey that referenced this pull request Apr 22, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
@zuiderkwast zuiderkwast moved this from To be backported to 8.0.3 in Valkey 8.0 Apr 22, 2025
@zuiderkwast zuiderkwast moved this from To be backported to 8.1.1 in Valkey 8.1 Apr 22, 2025
@zuiderkwast zuiderkwast moved this from T 9E88 o be backported to 7.2.9 in Valkey 7.2 Apr 22, 2025
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
zuiderkwast pushed a commit that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: #1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
murphyjacob4 pushed a commit to murphyjacob4/valkey that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
zuiderkwast pushed a commit that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: #1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
zuiderkwast pushed a commit that referenced this pull request Apr 23, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: #1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: Jacob Murphy <jkmurphy@google.com>
hwware pushed a commit to wuranxx/valkey that referenced this pull request Apr 24, 2025
When a tombstone is created after the last_id of a consumer group.

The consumer group lag is reported in the reply of `XINFO GROUPS mystream`.

Close: valkey-io#1951

Signed-off-by: Ernesto Alejandro Santana Hidalgo <ernesto.alejandrosantana@gmail.com>
Signed-off-by: hwware <wen.hui.ware@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes
Projects
Status: 7.2.9
Status: 8.0.3
Status: 8.1.1
Development

Successfully merging this pull request may close these issues.

[BUG] Stream consumer group lag incorrectly calculated when entries are deleted before being read
4 participants
0