-
Notifications
You must be signed in to change notification settings - Fork 710
feat: garbage collect StateTransitionData for on block finalization #10599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #10599 +/- ##
==========================================
+ Coverage 72.15% 72.16% +0.01%
==========================================
Files 725 726 +1
Lines 147512 147695 +183
Branches 147512 147695 +183
==========================================
+ Hits 106436 106591 +155
- Misses 36295 36303 +8
- Partials 4781 4801 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
6e0a3e4
to
bbe3dfd
Compare
@shreyan-gupta @staffik @jancionear someone wants to review so I could just stamp approval on top of it? :) |
On it! Give me some time |
start_height | ||
} else { | ||
*computed_start_heights | ||
.get_or_try_init(|| compute_start_heights(chain_store))? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the purpose of this is basically for the first time initialization? i.e. when we don't have the start_heights stored in metadata? Could you confirm ideally this should be called only once to initialize STATE_TRANSITIONS_METADATA_KEY
?
Instead of calling compute_start_heights
here, how about we just take the tail height from chain here? Basically the height at which we are currently doing GC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we only need it for the first initialisation when start_heights
is not yet in the db. Now it is easier to see that with d0b14fc.
Using tail height from chain is not really feasible since it will result in 5 epoch-length worth of deletes which could slow down block processing, so I'd rather not risk it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, isn't that what we are currently doing with the first time initialization as well? compute_start_heights
iterates through the whole DBCol::StateTransitionData
and gets the min start_heights for each shard right? And we can potentially have entries from upto 5 epochs (or GC period)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like in either case we are taking this one time hit of iterating through a lot of entries in DBCol::StateTransitionData
for first time initialization of StateTransitionStartHeights
. Assuming this is to solve the issue we are seeing with shadow validation, is it fine to take this one time slowdown hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and now I'm a bit concerned about archival nodes that maybe don't clear the DBCol::StateTransitionData
column? I'm guessing there are no such nodes yet, but having this PR in place before DBCol::StateTransitionData grows for archival nodes should do the trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterates through the whole DBCol::StateTransitionData and gets the min start_heights for each shard right? And we can potentially have entries from upto 5 epochs (or GC period)?
This can only be the case for stateless net, which shouldn't be an issue. Also StateTransitionData
instances are fairly small there.
On mainnet it will be enabled for the first time along with stateless validation protocol upgrade, first time cleanup will only have to read a few entries. I even considered not storing start_heights
at all and always deriving that from the db, but the issues that even a few StateTransitionData
result in multiple megabytes of data read from the disk, so I'd rather not do that.
archival nodes
We we should never write StateTransitionData
on archival nodes, so no need to clean it up.
} | ||
|
||
impl Chain { | ||
pub(crate) fn garbage_collect_state_transition_data(&self, block: &Block) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to push this function to chain/chain/src/garbage_collection.rs
?
I understand this isn't really GC code so we can keep it here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially considered putting it there, but decided against it for the following reasons:
- as you've mentioned it is not a part of chain GC, so can be confusing
garbage_collection.rs
file is already pretty big- it is more ergonomic if everything related to state transition data is in one file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another quick optional suggestion, how about we put this in the chain/client/src/stateless_validation/....
directory? I'm fine either way but I guess StateTransitionData
is only used for stateless validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client is one level above chain, so it is not possible to have impl Chain
as part of chain/client
I will move it to chain/src/stateless_validation
instead. chunk_endorsements.rs
also belongs there, will move it in a separate PR.
0594ee5
to
03b2c83
Compare
…10612) A follow-up to [this comment](#10599 (comment)).
…r analysis (#11137) Save all of the observed instances of `ChunkStateWitness` to the database. They can later be fetched for debugging and analysis. Sometimes things go wrong with witness validation, and we're not sure why. In such cases it would be good to take a look at the witness that failed the validation, but currently there's no way to see the witness, it disappears after validation. Saving the witnesses in the database will allow us to inspect the witnesses after it failed validation and see what exactly is wrong, even after node crash. Adds a new database column `DBCol::LatestChunkStateWitnesses`, which keeps a set of latest observed witnesses. Size of this set is limited to `4GB` and `60*30 (30 min)` of instances. When the limit is hit, the witness with the lowest height is removed from the set. We can't really save all witnesses, at 4MB/s that would add up to 345GB/day, so we're forced to garbage collect the old witnesses. Having only the latest witnesses should be enough for debugging and analysis. Witness saving could potentially be an attack vector, as we save all witnesses, even the ones that failed validation. An attacker could spam the node with thousands of witnesses, which would all be saved to the database, which could cause denial of service. Because of that the feature is guarded by a new config option: `save_latest_witnesses`. By default it's false, so production nodes won't save anything, and won't be vulnerable to such attacks. It can be selectively enabled on test/canary nodes when needed. A new database command is added to access the saved witnesses: ```console Print observed ChunkStateWitnesses at the given block height (and shard id). Observed witnesses are only saved when `save_latest_witnesses` is set to true in config.json Usage: neard database show-latest-witnesses [OPTIONS] --height <HEIGHT> Options: --height <HEIGHT> Block height (required) --shard-id <SHARD_ID> Shard id (optional) --pretty Pretty-print using the "{:#?}" formatting --binary Print the raw &[u8], can be pasted into rust code -h, --help Print help ``` The tool allows to print observed witnesses with given height (and optionally shard id). Either as a debug print `{:?}/{:#?}`, or as a binary blob that can be pasted into rust code. I'm not sure if that's the best way to expose it for debugging, but it was the easiest one to implement. Maybe it should somehow be integrated with `debug-ui`? I'm not very familiar with it, idk if it'd make sense. Fixes: #11110 Similar to: #10599
Currently
StateTransitionData
is a part of regular chain GC process. Unfortunately it still reaches 300GB on mainnet, see #10578.This PR introduces more aggressive garbage collection for
StateTransitionData
. If a chunk is present in a final block then we can safely clean up entries for that shard which correspond to blocks with lower height.Note that we still keep
StateTransitionData
as part of epoch-based GC just in case.Testing:
near_rocksdb_live_sst_files_size
and making sure that we still have required entries vianear_shadow_chunk_validation_failed_total
.