Add CLUSTER FLUSHSLOT command by wuranxx · Pull Request #1384 · valkey-io/valkey

Add CLUSTER FLUSHSLOT command #1384


Merged 23 commits into unstable on Jun 2, 2025

Conversation

wuranxx
Contributor
@wuranxx wuranxx commented Dec 3, 2024

Add cluster flushslot command. Closes #1133.

Syntax:

CLUSTER FLUSHSLOT <slot> [ASYNC|SYNC]

Background

We want to implement the ability to free all data from a single slot, either synchronously (SYNC) or asynchronously (ASYNC). This is useful in two cases:

  1. When a primary loses ownership of a slot, it will temporarily freeze while it synchronously frees the slot.
  2. When doing resharding, you may prefer to forcibly delete all the data from a slot and assign it to another node.

Internal changes

Modify the function signature of delKeysInSlot, adding parameters lazy, propagate_del and send_del_event to control the behavior.

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 88.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 71.11%. Comparing base (374718b) to head (9a2839e).
Report is 18 commits behind head on unstable.

Files with missing lines Patch % Lines
src/cluster.c 81.81% 2 Missing ⚠️
src/cluster_legacy.c 92.85% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1384      +/-   ##
============================================
- Coverage     71.21%   71.11%   -0.11%     
============================================
  Files           122      122              
  Lines         66049    66067      +18     
============================================
- Hits          47038    46984      -54     
- Misses        19011    19083      +72     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/cluster_legacy.c 86.51% <92.85%> (+0.17%) ⬆️
src/cluster.c 90.12% <81.81%> (-0.13%) ⬇️

... and 12 files with indirect coverage changes


@wuranxx wuranxx changed the title from "WIP: add cluster flushslot command." to "add cluster flushslot command." Dec 5, 2024
@wuranxx
Contributor Author
wuranxx commented Dec 5, 2024

Please help review this PR. Thank you. @hwware @madolson @PingXie @zuiderkwast

@hwware
Member
hwware commented Dec 5, 2024

There is a lot of information in issue #1133. To save reviewers some energy, could you please update the top comment to describe the background of why we want this new command? You could reference PR #1392.

Thanks a lot

Member
@PingXie PingXie left a comment


LGTM overall. Thanks @wuranxx!

@wuranxx wuranxx force-pushed the cluster-flush-slot branch from c66cb6a to ab74efb on December 6, 2024
@wuranxx wuranxx changed the title from "add cluster flushslot command." to "WIP: add cluster flushslot command." Dec 6, 2024
@wuranxx
Contributor Author
wuranxx commented Dec 6, 2024

Before I rebased onto the latest unstable (old commit: 32f7541), the Tcl tests and manual command tests ran fine locally.

After I rebased onto the latest unstable (6df376d), the command failed when executed locally.

It is possible that code changes in the meantime caused the inconsistent behavior. Until I confirm the cause, I have marked this PR as WIP (work in progress).

@wuranxx wuranxx force-pushed the cluster-flush-slot branch from ab74efb to 4db74af on January 22, 2025
@wuranxx wuranxx changed the title from "WIP: add cluster flushslot command." to "add cluster flushslot command." Jan 22, 2025
@wuranxx
Contributor Author
wuranxx commented Jan 22, 2025


Fixed. For details, see #1399.

@zuiderkwast
Contributor

@enjoy-binbin @murphyjacob4 Do you want this command for atomic slot migration? For example, when a slot migration is cancelled, the importing node could send CLUSTER FLUSHSLOT to its replicas to clean up the slot.

@murphyjacob4
Contributor

I think we will have a protocol for cancelling the slot migration, and it will trigger the drop for all slot ranges being migrated locally. But we should be able to use the async delKeysInSlot functionality to implement this.

@zuiderkwast
Contributor

I think we will have a protocol for cancelling the slot migration, and this will trigger the drop for all slot ranges being migrated locally. But we should be able to use the async delKeysInSlot functionality to implement this

@murphyjacob4 The replicas are not involved in the slot migration protocol IIUC. So when a primary cleans up a cancelled slot import, do you just replicate a lot of DEL commands to the replicas? Replicating a single FLUSHSLOT may save some work and bandwidth...

@murphyjacob4
Contributor

do you just replicate a lot of DEL commands to the replicas? Replicating a single FLUSHSLOT may save some work and bandwidth...

Yeah that's a good idea. Currently, this is what delKeysInSlot does, and it could result in a lot of bandwidth for big slots. Disabling the replication of the individual keys in delKeysInSlot and replacing with a single FLUSHSLOT makes sense to me. This path is also executed in the dirty slot case, so it won't just be an improvement for atomic slot migration (although I anticipate it will be executed more frequently there).

Contributor
@zuiderkwast zuiderkwast left a comment


Hi @wuranxx and sorry for the delay.

It looks straightforward, but there are some details we should think about a bit more. I'm not actually sure what's best to do in all cases, but see my comments below.

if (lazy) {
    dbAsyncDelete(&server.db[0], key);
} else {
    dbSyncDelete(&server.db[0], key);
}
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, the deleted keys are replicated simply as multiple DEL commands. For comparison, FLUSHDB is replicated as a single FLUSHDB command, which is much less traffic than many DEL commands. If we want to replicate FLUSHSLOT as a single FLUSHSLOT command, then we should probably check that the replicas support this, though. We already have the replica's version stored in c->repl_data->replica_version (REPLCONF VERSION is sent by the replica since 8.0), so we could check that it's >= 8.1 for all connected replicas. It's also fine to skip this; we can do it later as a separate optimization.

Contributor
@zuiderkwast zuiderkwast Apr 3, 2025


Actually I think we don't have to worry about old replicas. This is a new feature and users should use new features only after all nodes have been upgraded.


For atomic slot migration, we will need to send some FLUSHSLOT commands implicitly:

  1. If an import doesn't succeed (it failed or was cancelled), we need to flush the contents of the staged slots to replicas.
  2. If there is an import to a target primary and FLUSHDB is run on that target, the target will need to break this out into FLUSHSLOT commands to avoid the importing slots being flushed.
  3. If there is an export happening from some source primary and FLUSHDB is run, we need to break this out into FLUSHSLOT commands (so that the target only flushes the impacted importing slots).

Plus, in the case that a dirty slot is found, we currently delete it by sending individual key deletions. It would make sense to create the FLUSHSLOT command implicitly to improve replication traffic.

Due to the cases of implicit FLUSHSLOT propagation, I would suggest that the delKeysInSlot helper include logic for deciding whether we should propagate individual deletions or a FLUSHSLOT based on the capabilities of the replicas. I'm sure many customers may be running a cross-version deployment for some time while they do rolling upgrades.

Contributor
@zuiderkwast zuiderkwast May 5, 2025


@murphyjacob4

we will need to send some FLUSHSLOT commands implicitly

Yes, so do you need any changes here? You can call this function and then explicitly replicate a FLUSHSLOT command. Or do you want this function to do it?

I think you may want to skip keyspace events and skip replication using DEL commands. With the is_cmd argument you can't disable both. See other comment.

include logic for deciding whether we should propagate individual deletions or a FLUSHSLOT based on the capabilities of the replicas. I'm sure many customers may be running a cross-version deployment for some time while they do rolling upgrades

I agree that we should support cross-version deployments, but in this case I'm not worried and I don't think we need to decide based on replica capabilities.

We require that all nodes are upgraded before you start using new features. For example, an old replica wouldn't understand a new datatype, or information like hash field expiration, etc. The FLUSHSLOT command and the atomic slot migration are both new features. If a node has ASM, it also has FLUSHSLOT. If you have an 8.1 replica in your cluster, you shouldn't start using these features just yet. Makes sense?

@madolson madolson moved this from Todo to Optional for next release candidate in Valkey 9.0 Mar 31, 2025
@wuranxx wuranxx force-pushed the cluster-flush-slot branch from 4db74af to 0d751a2 on April 12, 2025
@wuranxx
Contributor Author
wuranxx commented Apr 12, 2025

I added a command scenario to delKeysInSlot. In the command scenario, delKeysInSlot will not propagate DEL commands, but it will notify del events.

FLUSHSLOT now just propagates a FLUSHSLOT command to the AOF and replicas.

Please help review this PR, thank you! @zuiderkwast @PingXie

Member
@PingXie PingXie left a comment


Overall LGTM. We should also add official documentation for this new command.

R 0 CLUSTER FLUSHSLOT $key_slot ASYNC
assert_equal [R 0 CLUSTER COUNTKEYSINSLOT $key_slot] 0
}
}
Member


Can you add a test case to validate the command is indeed replicated as-is, as opposed to the key deletions getting replicated?

Contributor Author


Of course. Added.

@zuiderkwast zuiderkwast added needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. and removed documentation labels Apr 14, 2025
@hwware hwware force-pushed the cluster-flush-slot branch 2 times, most recently from c76db56 to 39d045b on April 14, 2025
@madolson
Member

Weekly meeting notes. We will keep the command as is, but we will document this as primarily an internal and debugging command.

@PingXie
Member
PingXie commented May 13, 2025

Weekly meeting notes. We will keep the command as is, but we will document this as primarily an internal and debugging command.

I think we discussed two decisions:

  1. Whether this should be a top-level command?

Based on the use case described in the issue, this is clearly not a "data access" command meant for application developers, hence the guidance of keeping it under CLUSTER.

  2. Whether it is a good idea to introduce a new command whose name differs from that of an existing command by just one character (cluster flushslot vs cluster flushslots)?

I don't remember if we reached a conclusion?

@zuiderkwast
Contributor

I don't remember if we reached a conclusion?

Thanks Ping for summarizing the discussion. We did reach the conclusion to stick to cluster flushslot, as Madelyn wrote.

Neither of these commands is frequently used, so the similarity between the two (cluster flushslot vs cluster flushslots) is not a major concern.

@hwware
Member
hwware commented May 13, 2025

I forgot whether you dropped off the meeting earlier; we reached one conclusion: the new command "cluster flushslot" is an internal command, like cluster-SYNCSLOTS (also a new internal command).

Contributor
@zuiderkwast zuiderkwast left a comment


LGTM.

@murphyjacob4 Flush one slot at a time, is that enough for cancelled atomic slot migration? If you have 1000 ongoing slot migrations that are cancelled, will you need to send 1000 flushslot commands to the replicas at the same time?

@murphyjacob4
Contributor
murphyjacob4 commented May 20, 2025

@murphyjacob4 Flush one slot at a time, is that enough for cancelled atomic slot migration? If you have 1000 ongoing slot migrations that are cancelled, will you need to send 1000 flushslot commands to the replicas at the same time?

I think it is fine. ASM actually has a few scenarios for FLUSHSLOT:

  1. If a slot migration is cancelled, each slot within this migration will be removed by the target primary, and FLUSHSLOT will be propagated to the target replica(s).
  2. If FLUSHSLOT is run on the source primary, it will be sent to the target primary (via the slot migration link) for each ongoing slot export on that node (if any). The target primary will forward this to the target replicas.
  3. If FLUSHSLOT is run on the target primary (by a user), it will be denied if the slot is being imported (should be covered by this CL I think; it will return -MOVED).
  4. If FLUSHDB is run on the source primary and there is a slot export occurring, it will replicate as a FLUSHSLOT command for each owned slot. For all target primaries and their replicas, the FLUSHSLOT commands that match the ongoing slot migration will be propagated via the path in 2.
  5. If FLUSHDB is run on the target primary and there is a slot import occurring, it will replicate as a FLUSHSLOT command for each owned slot. Locally, we will leave the importing data. We will not replicate a FLUSHSLOT for the ongoing slot import (since we want to keep that data).

In the worst case, it would be something like ~700 KiB of replication data to send 16384 FLUSHSLOT commands (each would be <= 43 bytes, *3\r\n$7\r\nCLUSTER\r\n$9\r\nFLUSHSLOT\r\n$5\r\n16383). But this seems like a very rare case. We have a path to optimizing with something like CLUSTER FLUSHSLOTSRANGE a la CLUSTER ADDSLOTSRANGE, but I don't think it is required, and just FLUSHSLOT should be better than propagating a DEL for each key (at least for 99% of use cases).

@zuiderkwast
Contributor
  • If FLUSHDB is run on the source primary, if there is a slot export occurring, it will replicate as a FLUSHSLOT command for each owned slot. For all target primaries and their replicas, the FLUSHSLOT commands that match the ongoing slot migration will be propagated via the path in 2
  • If FLUSHDB is run on the target primary, if there is a slot import occurring, it will replicate as a FLUSHSLOT command for each owned slot. Locally, we will leave the importing data. We will not replicate a FLUSHSLOT for the ongoing slot import (since we want to keep that data

@murphyjacob4 Now that we have multi-db in cluster mode, FLUSHSLOT will affect all databases. FLUSHDB should only flush the currently SELECT'ed database.

@hwware
Member
hwware commented May 23, 2025

We have consensus that FLUSHSLOT is an internal command only, so I think cases 2 and 3 above will not happen. What do you think?

@zuiderkwast
Contributor
zuiderkwast commented May 23, 2025

We have consensus that FLUSHSLOT is an internal command only, so I think cases 2 and 3 will not happen. What do you think?

Agree, but I'm not sure about cases 4 and 5. Maybe we need CLUSTER FLUSHSLOT 2333 DB 2 to handle FLUSHDB on one db during atomic slot migration. It can be extended later if we need it. Otherwise, we could change FLUSHDB so it only affects owned slots?

@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes to-be-merged Almost ready to merge labels May 25, 2025
@hwware
Member
hwware commented May 26, 2025

We have consensus that FLUSHSLOT is an internal command only, so I think cases 2 and 3 will not happen. What do you think?

Agree, but I'm not sure about cases 4 and 5. Maybe we need CLUSTER FLUSHSLOT 2333 DB 2 to handle FLUSHDB on one db during atomic slot migration. It can be extended later if we need it. Otherwise, we could change FLUSHDB so it only affects owned slots?

I think case 4 is a valid scenario. However, case 5 depends on what the operator does. One worst case is data inconsistency: the source primary is exporting data, a FLUSHDB command runs on the target primary, and then some data is lost.

@zuiderkwast zuiderkwast changed the title from "add cluster flushslot command." to "Add CLUSTER FLUSHSLOT command" May 27, 2025
@soloestoy
Member

For the rollback of atomic slot migration, using FLUSHSLOT to affect all databases is more appropriate. However, considering the replication of FLUSHDB during migration and the possibility that users might need to clear a specific slot in a particular database, I think CLUSTER FLUSHSLOT <slotid> [DB id] would be a good option.

@hwware
Member
hwware commented May 28, 2025

For the rollback of atomic slot migration, using FLUSHSLOT to affect all databases is more appropriate. However, considering the replication of FLUSHDB during migration and the possibility that users might need to clear a specific slot in a particular database, I think CLUSTER FLUSHSLOT <slotid> [DB id] would be a good option.

I think the proper command format is as below:

CLUSTER FLUSHSLOT <slot> [[DB id] | all] [ASYNC|SYNC]

If there is no argument after <slot>, the command applies only to the current db; this is aligned with FLUSHDB.
If the command is CLUSTER FLUSHSLOT <slot> DB id, it only removes the keys of the specific db.
If the command is CLUSTER FLUSHSLOT <slot> all, it removes keys from all dbs.

I will add this PR to our next Monday meeting agenda.

@hwware hwware removed the to-be-merged Almost ready to merge label Jun 2, 2025
@hwware hwware merged commit 992c5b9 into valkey-io:unstable Jun 2, 2025
52 checks passed
@github-project-automation github-project-automation bot moved this from Optional for next release candidate to Done in Valkey 9.0 Jun 2, 2025
@madolson
Member
madolson commented Jun 4, 2025

@valkey-io/core-team Shouldn't this have had a major decision tag? I remember we decided this was an internal command, but it doesn't look like it was really restricted to just be internal.

@zuiderkwast
Contributor

The issue has the major decision tag. Seemed enough...(?)

it doesn't look like it was really restricted to just be internal.

If we don't add a markdown file for it in the docs, it will not show up in the docs, but maybe it will show up somewhere else? How can we make it completely internal?

chzhoo pushed a commit to chzhoo/valkey that referenced this pull request Jun 12, 2025
shanwan1 pushed a commit to shanwan1/valkey that referenced this pull request Jun 13, 2025
Labels
needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. release-notes This issue should get a line item in the release notes