[resharding] Rewrite state column gc by shreyan-gupta · Pull Request #13598 · near/nearcore · GitHub

[resharding] Rewrite state column gc #13598


Merged
6 commits merged into master from shreyan/resharding/gc_new on May 22, 2025

Conversation

shreyan-gupta (Contributor) commented May 22, 2025

Update the state garbage collection code to take into account the new solution for the refcount issue in resharding.

I don't completely understand what the old solution was doing, but the new solution divides GC into two parts: one specific to resharding, where we simply delete the parent shard_uid state, and a second that deletes untracked shard_uids.

The logic to find untracked shard_uids is updated and simplified.

Don't worry if the old code and the new code don't make 100% sense; it took me several days to figure out what exactly is going on :)
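
For orientation, here is a rough sketch of the two-part split, assembled from the code snippets quoted later in this PR; treat the surrounding plumbing (how `store`, `shard_layout`, and the candidate set are obtained) as assumptions rather than the exact final code.

```rust
// Illustrative sketch only, pieced together from snippets in this diff.

// Part 1: resharding-specific cleanup. Once GC reaches the last block of the
// epoch in which the shard layout changed, delete the parent shard's State
// entries by their shard_uid prefix.
let mut trie_store_update = store.trie_store().store_update();
for parent_shard_uid in shard_layout.get_split_parent_shard_uids()? {
    tracing::debug!(target: "garbage_collection", ?parent_shard_uid, "resharding state cleanup");
    trie_store_update.delete_shard_uid_prefixed_state(parent_shard_uid);
}

// Part 2: generic cleanup of untracked shards. Start from all shard_uids of
// the epoch being GC'ed, drop every shard the node cares about this or next
// epoch (and any shard tracked in a newer, not-yet-GC'ed epoch), and delete
// the State of whatever remains.
```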

shreyan-gupta requested a review from a team as a code owner May 22, 2025 08:04
1 => {
let shards_split_map = vec![vec![ShardId::new(0), ShardId::new(1), ShardId::new(2)]];
#[allow(deprecated)]
ShardLayout::v1(boundary_accounts, Some(shards_split_map), 3)
shreyan-gupta (Author):

ShardLayout::v1 is neither used nor supported. We can remove these tests.

let (key, _) = kv.unwrap();
let shard_uid = ShardUId::try_from_slice(&key[0..8]).unwrap();
shard_uid_prefixes.insert(shard_uid);
}
// TODO(resharding): Handle GC after TrieStateResharder
shreyan-gupta (Author):

With the new implementation of the refcount fix, we no longer have the shard_uid mapping. We can, however, re-enable the state column GC check that we had commented out earlier. This is the main test addition of this PR.
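
For context, a rough sketch of how such a check can be built around the snippet above; the iteration over `DBCol::State` and the final assertion are my reconstruction from this diff, not a verbatim quote of the test, and `parent_shard_uid` stands for the pre-resharding parent, assumed to be available in the test.

```rust
// Sketch only: collect every ShardUId prefix still present in the State column.
let mut shard_uid_prefixes = std::collections::HashSet::new();
for kv in store.iter(DBCol::State) {
    let (key, _) = kv.unwrap();
    // The first 8 bytes of each State key are the borsh-encoded ShardUId prefix.
    let shard_uid = ShardUId::try_from_slice(&key[0..8]).unwrap();
    shard_uid_prefixes.insert(shard_uid);
}
// Once GC has moved past the resharding epoch, the parent shard's prefix must
// be gone and only the child shards' prefixes may remain.
assert!(!shard_uid_prefixes.contains(&parent_shard_uid));
```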

/// Run one more epoch.
/// "Restart from the snapshot" is to ensure that we can continue producing blocks without relying on caches.
#[test]
fn test_long_chain_with_restart_from_snapshot() {
shreyan-gupta (Author):

I don't know why this test is failing, but it has been superseded by the testloop restart tests.

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 16 lines in your changes missing coverage. Please review.

Project coverage is 69.48%. Comparing base (6c7c2d4) to head (d6fc327).
Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
chain/chain/src/garbage_collection.rs 77.27% 0 Missing and 15 partials ⚠️
core/primitives/src/shard_layout.rs 88.88% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #13598   +/-   ##
=======================================
  Coverage   69.48%   69.48%           
=======================================
  Files         856      856           
  Lines      168182   168114   -68     
  Branches   168182   168114   -68     
=======================================
- Hits       116857   116816   -41     
+ Misses      46542    46520   -22     
+ Partials     4783     4778    -5     
Flag Coverage Δ
pytests 1.52% <0.00%> (+<0.01%) ⬆️
pytests-nightly 1.62% <0.00%> (+<0.01%) ⬆️
unittests 69.10% <80.00%> (-0.01%) ⬇️
unittests-nightly 69.01% <80.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


wacban (Contributor) left a comment:

LGTM

@@ -1134,133 +1128,104 @@ impl<'a> ChainStoreUpdate<'a> {
}
}

/// Returns shards that we tracked in an epoch, given a hash of the last block in the epoch.
/// The block has to be available, so this function has to be called before gc is run for the block.
/// Cleans up the state of the parent shard once we stop tracking the epoch where the resharding happened.
Contributor:

Can you clarify a bit the "tracking the epoch" part?

shreyan-gupta (Author):

Changed the language a bit, but at a high level what I'm trying to convey is that we need to clean up the parent if this was the last epoch in which the parent existed, i.e. the epoch in which it was resharded.

let mut trie_store_update = store.trie_store().store_update();
for parent_shard_uid in shard_layout.get_split_parent_shard_uids()? {
// Delete the state of the parent shard
tracing::info!(target: "garbage_collection", ?parent_shard_uid, "resharding state cleanup");
Contributor:

Does it deserve to be info?

shreyan-gupta (Author):

This ideally happens just once per parent_shard_uid and is quite instructive, so it might as well be info instead of debug. If you'd like me to change it to debug, I can do that.

shreyan-gupta (Author):

Changed to debug

for parent_shard_uid in shard_layout.get_split_parent_shard_uids()? {
// Delete the state of the parent shard
tracing::info!(target: "garbage_collection", ?parent_shard_uid, "resharding state cleanup");
trie_store_update.delete_shard_uid_prefixed_state(parent_shard_uid);
Contributor:

sanity check - this should not be called on the archival nodes

shreyan-gupta (Author):

Yes, from my understanding, GC code only runs on normal nodes.

Contributor:

Doesn't GC run differently on archival nodes?
It's like normal-node GC with the addition of a copy loop to the cold DB. Plus, GC stalls if the cold DB lag is high.

I might be confusing the high level vision with the impl details though..

shreyan-gupta (Author):

Hmm, I might lack an understanding of the cold storage loop. I think it's better to take this discussion offline.

{
tracked_shards.push(shard_uid);
}
assert_eq!(get_shard_uid_mapping(&store, shard_uid), shard_uid, "Incomplete Resharding");
Contributor:

maybe do that before actually deleting the data :)

Contributor:

It wouldn't be deleted yet, if I understand correctly? It seems it is just added to the batch so far.

shreyan-gupta (Author):

This is mostly just a sanity check that resharding should definitely have been completed by now. Any node that still has this mapping would likely fail.

/// where each `ShardUId` is potentially mapped to its ancestor to get the database key prefix.
/// We only remove a shard State if all its descendants are ready to be cleaned up,
/// in which case, we also remove the mapping from `StateShardUIdMapping`.
/// `block_hash` is the last block of the epoch we are cleaning up.
Contributor:

Technically it can be any block; it's just that this method will be a no-op for any block other than the last of the epoch.

Comment on lines 1189 to 1190
let latest_block_hash = chain_store_update.head()?.last_block_hash;
let mut block_info = epoch_manager.get_block_info(&latest_block_hash)?;
Contributor:

The naming is very confusing since the block_info is not the block of the block_hash. Can you rename those to something like target_block_x and current_block_x?

Comment on lines 1193 to 1195
let mut shards_to_cleanup = epoch_manager
.get_shard_layout(epoch_manager.get_block_info(block_hash)?.epoch_id())?
.shard_uids()
Contributor:

nit1: There is a lot crammed into a single line, can you split it up?
nit2: Can you use retain for consistency with the other logic below?

let target_block_info = ...
let target_epoch_id = ...
let target_shard_layout = ...
let mut shards_to_cleanup = shard_layout.shard_uids();
shards_to_cleanup.retain(...)
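
Filled in, the suggestion might look roughly like the following; variable names follow the reviewer's sketch, signatures are assumed from the calls visible in this diff, and the remaining arguments of `cares_about_shard_this_or_next_epoch` are elided because only part of that call appears in the snippet below.

```rust
// Sketch only, not the final code.
let target_block_info = epoch_manager.get_block_info(block_hash)?;
let target_epoch_id = target_block_info.epoch_id();
let target_shard_layout = epoch_manager.get_shard_layout(target_epoch_id)?;
// Start from every shard of the epoch being GC'ed ...
let mut shards_to_cleanup: Vec<_> = target_shard_layout.shard_uids().collect();
// ... and keep only the shards the node does not care about this or next epoch.
shards_to_cleanup.retain(|shard_uid| {
    // Same predicate as the filter below; shard_uid goes into the elided arguments.
    !shard_tracker.cares_about_shard_this_or_next_epoch(me, block_info.prev_hash(), /* ... */)
});
```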

.shard_uids()
.filter(|shard_uid| {
// Remove shards that we are tracking right now
!shard_tracker.cares_about_shard_this_or_next_epoch(
Contributor:

I'm guessing you're using different logic for the current epoch because it's not guaranteed that TrieChanges will already be present?

shreyan-gupta (Author):

Yes, precisely. This was leading to scenarios where we somehow continued to track the state of child shards, but I'm not sure how exactly that was happening.


// reverse iterate over the epochs
let store = chain_store_update.store();
while !shards_to_cleanup.is_empty() && block_info.hash() != block_hash {
Contributor:

I'd be tempted to remove the empty condition, it seems like a tiny and irrelevant optimization. Correct me if I'm wrong ;)

shreyan-gupta (Author):

Yeah, mostly as an optimization... Removed it

}

// Delete State of `shards_to_cleanup` and associated ShardUId mapping.
tracing::info!(target: "garbage_collection", ?shards_to_cleanup, ?shard_uid_mappings_to_remove, "state_cleanup");
tracing::info!(target: "garbage_collection", ?shards_to_cleanup, "state_cleanup");
Contributor:

again I'm not sure if this deserves info

shreyan-gupta (Author):

Changed to debug

return Ok(None);
) -> Result<(), Error> {
// If we are GC'ing the resharding block, i.e. the last block of the epoch, clear out state for the parent shard
if !epoch_manager.will_shard_layout_change(block_hash)? {
Contributor:

If I understand correctly, that returns true for each block in the epoch before resharding. Wasn't the intention to run gc_resharding for the last block only?

shreyan-gupta (Author):

Good catch, updated to include condition is_last_block_in_finished_epoch
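
A minimal sketch of the resulting gate, assuming `is_last_block_in_finished_epoch` is a boolean computed elsewhere (the name comes from this thread; the exact wiring is not shown in the diff):

```rust
// Run the resharding-specific cleanup only when GC'ing the last block of the
// finished epoch whose shard layout is about to change.
if is_last_block_in_finished_epoch && epoch_manager.will_shard_layout_change(block_hash)? {
    // ... delete the parent shard's State entries (resharding GC path) ...
}
```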

.get_shard_layout(epoch_manager.get_block_info(block_hash)?.epoch_id())?
.shard_uids()
.filter(|shard_uid| {
// Remove shards that we are tracking right now
Contributor:

"not tracking"

shreyan-gupta (Author):

Updated comment to say // Remove shards that we are tracking from shards_to_cleanup

// Remove shards that we are tracking right now
!shard_tracker.cares_about_shard_this_or_next_epoch(
me,
block_info.prev_hash(),
Contributor:

nit: maybe rename it to last_block_info?


// reverse iterate over the epochs
let store = chain_store_update.store();
while !shards_to_cleanup.is_empty() && block_info.hash() != block_hash {
Contributor:

What about renaming block_hash to last_block_hash_in_gced_epoch? block_info.hash() != block_hash might be confusing.

let mut block_info = epoch_manager.get_block_info(&latest_block_hash)?;

// Get all the shards that belong to the epoch we are cleaning up
let mut shards_to_cleanup = epoch_manager
Contributor:

nit: maybe rename it to potential_shards_to_cleanup, as it is a superset of the shards that will be cleaned up

shreyan-gupta (Author):

Umm.. I feel that makes the variable name a bit too long without adding much extra information. I did try it, but I would prefer to stick with shards_to_cleanup.

staffik's comment was marked as off-topic.

.build();
fn slow_test_resharding_v3_load_memtrie() {
let params =
TestReshardingParametersBuilder::default().load_memtries_for_tracked_shards(false).build();
darioush (Contributor) commented May 22, 2025:

It's kind of confusing that the test for _load_memtrie sets load_memtries_for_tracked_shards(false). Should we add a comment?

shreyan-gupta (Author):

Honestly, I don't understand this test well enough to add any sensible comment :(

Contributor:

The reason to set load_memtries_for_tracked_shards(false) is that...
🥁
... we want to check that memtries are loaded (a requirement for resharding) even when the node doesn't automatically load memtries for tracked shards.

Basically, the memtrie of the parent must be loaded regardless of config.

shreyan-gupta (Author):

🧠
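
Based on that explanation, the test could carry a comment along these lines; the comment text below is a suggestion and is not part of this diff.

```rust
fn slow_test_resharding_v3_load_memtrie() {
    // Disable automatic memtrie loading for tracked shards so the test checks
    // that resharding still loads the parent shard's memtrie on its own, which
    // it requires regardless of this config flag.
    let params =
        TestReshardingParametersBuilder::default().load_memtries_for_tracked_shards(false).build();
    // ... rest of the test unchanged ...
}
```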

shreyan-gupta added this pull request to the merge queue May 22, 2025
Merged via the queue into master with commit 89a06ac May 22, 2025
28 checks passed
shreyan-gupta deleted the shreyan/resharding/gc_new branch May 22, 2025 21:20