[resharding] Rewrite state column gc #13598
Conversation
1 => {
    let shards_split_map = vec![vec![ShardId::new(0), ShardId::new(1), ShardId::new(2)]];
    #[allow(deprecated)]
    ShardLayout::v1(boundary_accounts, Some(shards_split_map), 3)
ShardLayout::v1 is neither used nor supported. We can remove these tests.
    let (key, _) = kv.unwrap();
    let shard_uid = ShardUId::try_from_slice(&key[0..8]).unwrap();
    shard_uid_prefixes.insert(shard_uid);
}
// TODO(resharding): Handle GC after TrieStateResharder
With the new implementation of the refcount fix, we no longer have the shard_uid mapping. We can, however, re-enable the state column GC check that we had commented out earlier. This is the main test addition of this PR.
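A minimal sketch of what the re-enabled check could look like, assuming the Store iterator setup from the surrounding test; expected_shard_uids is a hypothetical stand-in for whatever set the test computes (imports of BorshDeserialize, DBCol and ShardUId omitted):

```rust
// Collect every ShardUId prefix still present in the State column and assert
// that no stale (e.g. parent) shard prefixes survived GC.
let mut shard_uid_prefixes = std::collections::HashSet::new();
for kv in store.iter(DBCol::State) {
    let (key, _) = kv.unwrap();
    // The first 8 bytes of a State key are the borsh-serialized ShardUId.
    let shard_uid = ShardUId::try_from_slice(&key[0..8]).unwrap();
    shard_uid_prefixes.insert(shard_uid);
}
assert_eq!(shard_uid_prefixes, expected_shard_uids, "unexpected shard prefixes left in State");
```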
/// Run one more epoch.
/// "Restart from the snapshot" is to ensure that we can continue producing blocks without relying on caches.
#[test]
fn test_long_chain_with_restart_from_snapshot() {
I don't know why this test is failing, but it has been superseded by the testloop restart tests.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #13598   +/- ##
=======================================
  Coverage   69.48%   69.48%
=======================================
  Files         856      856
  Lines      168182   168114    -68
  Branches   168182   168114    -68
=======================================
- Hits       116857   116816    -41
+ Misses      46542    46520    -22
+ Partials     4783     4778     -5
LGTM
@@ -1134,133 +1128,104 @@ impl<'a> ChainStoreUpdate<'a> {
    }
}

/// Returns shards that we tracked in an epoch, given a hash of the last block in the epoch.
/// The block has to be available, so this function has to be called before gc is run for the block.
/// Cleans up the state of the parent shard once we stop tracking the epoch where the resharding happened.
Can you clarify a bit the "tracking the epoch" part?
Changed the language a bit, but at a high level I'm trying to convey that we need to clean up the parent if this was the last epoch in which the parent existed, i.e. the epoch in which it was resharded.
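For reference, a minimal sketch of that intent, using the names visible in the diff below; the surrounding plumbing and the final commit of the update are omitted and assumed to be handled by the GC code:

```rust
// Once the epoch in which the resharding happened has been GC'ed, the parent
// shard can no longer be needed, so its State is dropped wholesale by deleting
// everything under the parent ShardUId prefix.
let mut trie_store_update = store.trie_store().store_update();
for parent_shard_uid in shard_layout.get_split_parent_shard_uids()? {
    tracing::debug!(target: "garbage_collection", ?parent_shard_uid, "resharding state cleanup");
    trie_store_update.delete_shard_uid_prefixed_state(parent_shard_uid);
}
// Committing the staged deletions is left to the surrounding GC code.
```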
let mut trie_store_update = store.trie_store().store_update();
for parent_shard_uid in shard_layout.get_split_parent_shard_uids()? {
    // Delete the state of the parent shard
    tracing::info!(target: "garbage_collection", ?parent_shard_uid, "resharding state cleanup");
Does it deserve to be info?
This ideally happens just once for the parent_shard_uid, and is quite instructive, so might as well be an info instead of debug. If you'd like me to change to debug, I can do that.
Changed to debug
for parent_shard_uid in shard_layout.get_split_parent_shard_uids()? {
    // Delete the state of the parent shard
    tracing::info!(target: "garbage_collection", ?parent_shard_uid, "resharding state cleanup");
    trie_store_update.delete_shard_uid_prefixed_state(parent_shard_uid);
sanity check - this should not be called on the archival nodes
Yes, from my understanding, GC code only runs on normal nodes.
Doesn't GC run differently on archival nodes?
It's like normal node GC with the addition that there's a copy loop to the cold DB. Plus, GC stalls if the cold DB lag is high.
I might be confusing the high-level vision with the impl details though.
Hmm, I might lack understanding of the cold storage loop. I think it's better to take this discussion offline.
{
    tracked_shards.push(shard_uid);
}
assert_eq!(get_shard_uid_mapping(&store, shard_uid), shard_uid, "Incomplete Resharding");
maybe do that before actually deleting the data :)
It wouldn't be deleted yet, if I understand correctly? It seems it is just added to the batch so far.
This is mostly just a sanity check that resharding should have definitely been completed by now. Any node that continues to have this mapping would likely fail.
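For what it's worth, a small sketch of why the ordering is harmless under the usual store-update pattern; the exact commit point is an assumption here:

```rust
// Deletions issued on a store update are only staged in a write batch; nothing
// is removed from the database until the update is committed.
let mut trie_store_update = store.trie_store().store_update();
trie_store_update.delete_shard_uid_prefixed_state(parent_shard_uid); // staged only
// Reading the mapping here still sees the pre-commit state, so the sanity
// check works regardless of where it sits relative to the staged delete.
assert_eq!(get_shard_uid_mapping(&store, shard_uid), shard_uid, "Incomplete Resharding");
// The surrounding GC code commits the batch afterwards.
```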
/// where each `ShardUId` is potentially mapped to its ancestor to get the database key prefix.
/// We only remove a shard State if all its descendants are ready to be cleaned up,
/// in which case, we also remove the mapping from `StateShardUIdMapping`.
/// block_hash is the last block of the epoch we are cleaning up.
Technically it can be any block; it's just that this method will be a no-op for any block other than the last block of the epoch.
let latest_block_hash = chain_store_update.head()?.last_block_hash;
let mut block_info = epoch_manager.get_block_info(&latest_block_hash)?;
The naming is very confusing since the block_info is not the block of the block_hash. Can you rename those to something like target_block_x and current_block_x?
let mut shards_to_cleanup = epoch_manager
    .get_shard_layout(epoch_manager.get_block_info(block_hash)?.epoch_id())?
    .shard_uids()
nit1: There is a lot crammed into a single line, can you split it up?
nit2: Can you use retain for consistency with the other logic below?
let target_block_info = epoch_manager.get_block_info(block_hash)?;
let target_epoch_id = target_block_info.epoch_id();
let target_shard_layout = epoch_manager.get_shard_layout(target_epoch_id)?;
let mut shards_to_cleanup: Vec<_> = target_shard_layout.shard_uids().collect();
shards_to_cleanup.retain(...)
.shard_uids()
.filter(|shard_uid| {
    // Remove shards that we are tracking right now
    !shard_tracker.cares_about_shard_this_or_next_epoch(
I'm guessing you're using different logic for the current epoch because it's not guaranteed that TrieChanges will already be present?
Yes, precisely. This was leading to scenarios where we somehow continued to track the state of child shards, but I'm not sure how exactly that was happening.
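A rough sketch of the resulting split, with tracked_this_or_next_epoch standing in as a hypothetical predicate for the shard_tracker call shown in the diff:

```rust
// Shards the node cares about in the current or next epoch are excluded from
// the cleanup candidates up front, instead of relying on TrieChanges that may
// not have been written yet for the current epoch. Older epochs are then
// handled by walking them in reverse, as in the loop below.
shards_to_cleanup.retain(|shard_uid| !tracked_this_or_next_epoch(*shard_uid));
```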
// reverse iterate over the epochs
let store = chain_store_update.store();
while !shards_to_cleanup.is_empty() && block_info.hash() != block_hash {
I'd be tempted to remove the empty condition; it seems like a tiny and irrelevant optimization. Correct me if I'm wrong ;)
Yeah, mostly as an optimization... Removed it
}

// Delete State of `shards_to_cleanup` and associated ShardUId mapping.
tracing::info!(target: "garbage_collection", ?shards_to_cleanup, ?shard_uid_mappings_to_remove, "state_cleanup");
tracing::info!(target: "garbage_collection", ?shards_to_cleanup, "state_cleanup");
again I'm not sure if this deserves info
Changed to debug
return Ok(None);
) -> Result<(), Error> {
    // If we are GC'ing the resharding block, i.e. the last block of the epoch, clear out state for the parent shard
    if !epoch_manager.will_shard_layout_change(block_hash)? {
If I understand correctly, that returns true for each block in the epoch before resharding. Wasn't the intention to run gc_resharding for the last block only?
Good catch, updated to also include the condition is_last_block_in_finished_epoch.
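A minimal sketch of the corrected guard; where exactly is_last_block_in_finished_epoch lives is not shown in the diff, so treat its receiver and signature as assumptions:

```rust
// will_shard_layout_change is true for every block of the epoch preceding the
// resharding, so additionally require that we are GC'ing the last block of a
// finished epoch before clearing out the parent shard's state.
if !epoch_manager.will_shard_layout_change(block_hash)?
    || !is_last_block_in_finished_epoch(block_hash)?
{
    return Ok(());
}
```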
.get_shard_layout(epoch_manager.get_block_info(block_hash)?.epoch_id())?
.shard_uids()
.filter(|shard_uid| {
    // Remove shards that we are tracking right now
"not tracking"
Updated comment to say // Remove shards that we are tracking from shards_to_cleanup
// Remove shards that we are tracking right now
!shard_tracker.cares_about_shard_this_or_next_epoch(
    me,
    block_info.prev_hash(),
nit: maybe rename it to last_block_info?
// reverse iterate over the epochs
let store = chain_store_update.store();
while !shards_to_cleanup.is_empty() && block_info.hash() != block_hash {
What about renaming block_hash to last_block_hash_in_gced_epoch? block_info.hash() != block_hash might be confusing.
let mut block_info = epoch_manager.get_block_info(&latest_block_hash)?;

// Get all the shards that belong to the epoch we are cleaning up
let mut shards_to_cleanup = epoch_manager
nit: maybe rename it to potential_shards_to_cleanup, as it is a superset of the shards that will actually be cleaned up.
Umm... I feel that makes the variable name a bit too long without adding much extra information. I did try it, but I would prefer to stick with shards_to_cleanup.
.build();
fn slow_test_resharding_v3_load_memtrie() {
    let params =
        TestReshardingParametersBuilder::default().load_memtries_for_tracked_shards(false).build();
Kind of confusing why the test for _load_memtrie would set load_memtries_for_tracked_shards(false). Should we add a comment?
Honestly, I don't understand this test well enough to add any sensible comment :(
The reason to set load_memtries_for_tracked_shards(false) is that...
🥁
... we want to check that memtries are loaded (a requirement for resharding) even when the node doesn't automatically load memtries for tracked shards.
Basically, the memtrie of the parent must be loaded regardless of config.
🧠
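Based on that explanation, a possible comment for the test; the wording is mine, not taken from the PR:

```rust
// Resharding requires the parent shard's memtrie to be loaded, so disable the
// automatic loading of memtries for tracked shards and verify that resharding
// still loads what it needs.
fn slow_test_resharding_v3_load_memtrie() {
    let params =
        TestReshardingParametersBuilder::default().load_memtries_for_tracked_shards(false).build();
    // ...rest of the test as in the diff above...
}
```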
Update the state garbage collection code to take into account the new solution for the refcount issue in resharding.
I don't completely understand what the old solution was doing, but the new solution divides GC into two parts: one specific to resharding, where we simply delete the parent shard_uid state, and a second that deletes untracked shard_uids.
The logic to find untracked shard_uids is updated and simplified.
Don't worry if the old code and new code don't make 100% sense. I took several days to figure out what exactly is going on :)
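To summarize the two parts in code form, here is an illustrative sketch. tracked_this_or_next_epoch and tracked_in_later_epochs are hypothetical predicates standing in for the shard_tracker and TrieChanges checks, while the prefix-deletion helpers are the ones visible in the diffs above:

```rust
// Part 1: resharding-specific cleanup. When GC reaches the last block of the
// epoch in which the resharding happened, delete the parent shard's State.
let mut trie_store_update = store.trie_store().store_update();
for parent_shard_uid in shard_layout.get_split_parent_shard_uids()? {
    trie_store_update.delete_shard_uid_prefixed_state(parent_shard_uid);
}

// Part 2: untracked-shard cleanup. Start from every shard of the epoch being
// GC'ed, drop shards still cared about this or next epoch, drop shards tracked
// in any later epoch (found by iterating the epochs in reverse), and delete the
// State and StateShardUIdMapping entries of whatever remains.
let mut shards_to_cleanup: Vec<ShardUId> = target_shard_layout.shard_uids().collect();
shards_to_cleanup.retain(|shard_uid| !tracked_this_or_next_epoch(*shard_uid));
shards_to_cleanup.retain(|shard_uid| !tracked_in_later_epochs(*shard_uid));
for shard_uid in shards_to_cleanup {
    trie_store_update.delete_shard_uid_prefixed_state(shard_uid);
}
```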