feat: garbage collect StateTransitionData on block finalization by pugachAG · Pull Request #10599 · near/nearcore

Merged · 7 commits merged into master on Feb 13, 2024

Conversation

@pugachAG (Contributor) commented Feb 9, 2024:

Currently StateTransitionData is part of the regular chain GC process. Unfortunately it still reaches 300GB on mainnet, see #10578.

This PR introduces more aggressive garbage collection for StateTransitionData: if a chunk is present in a final block, then we can safely clean up entries for that shard that correspond to blocks with lower heights.

Note that we still keep StateTransitionData as part of epoch-based GC just in case.
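The cleanup rule can be sketched as follows. This is a minimal illustration only; `StateTransitionGc`, `on_block_final`, and `delete_state_transition_data` are hypothetical names, not the actual nearcore API:

```rust
use std::collections::HashMap;

type ShardId = u64;
type BlockHeight = u64;

/// Hypothetical sketch of the cleanup rule described above: once a
/// chunk for a shard appears in a final block, stored entries for
/// that shard at lower heights are no longer needed.
struct StateTransitionGc {
    /// Lowest height per shard that may still have stored entries.
    start_heights: HashMap<ShardId, BlockHeight>,
}

impl StateTransitionGc {
    /// Called on block finalization; `final_chunk_heights` maps each shard
    /// that has a chunk in the final block to that chunk's height.
    fn on_block_final(&mut self, final_chunk_heights: &HashMap<ShardId, BlockHeight>) {
        for (&shard_id, &chunk_height) in final_chunk_heights {
            let start = self.start_heights.entry(shard_id).or_insert(chunk_height);
            // Delete entries in [start, chunk_height) for this shard.
            for height in *start..chunk_height {
                delete_state_transition_data(shard_id, height);
            }
            *start = chunk_height;
        }
    }
}

fn delete_state_transition_data(shard_id: ShardId, height: BlockHeight) {
    // Placeholder for the actual DB delete of the StateTransitionData
    // entry keyed by (shard_id, height).
    println!("delete StateTransitionData: shard {shard_id}, height {height}");
}
```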

Testing:

  • unit testing
  • running a mainnet RPC node with shadow validation, monitoring on-disk data size via near_rocksdb_live_sst_files_size, and confirming that the required entries are still present via near_shadow_chunk_validation_failed_total

codecov bot commented Feb 9, 2024

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

Comparison is base (61cec49) 72.15% compared to head (ec51751) 72.16%.
Report is 1 commit behind head on master.

Files                                                  Patch %   Lines
.../src/stateless_validation/state_transition_data.rs  91.66%   3 missing and 12 partials ⚠️
core/primitives/src/block_header.rs                      0.00%   0 missing and 3 partials ⚠️
chain/chain/src/chain.rs                                66.66%   0 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10599      +/-   ##
==========================================
+ Coverage   72.15%   72.16%   +0.01%     
==========================================
  Files         725      726       +1     
  Lines      147512   147695     +183     
  Branches   147512   147695     +183     
==========================================
+ Hits       106436   106591     +155     
- Misses      36295    36303       +8     
- Partials     4781     4801      +20     
Flag Coverage Δ
backward-compatibility 0.08% <0.00%> (-0.01%) ⬇️
db-migration 0.08% <0.00%> (-0.01%) ⬇️
genesis-check 1.25% <0.00%> (-0.01%) ⬇️
integration-tests 36.99% <36.02%> (-0.07%) ⬇️
linux 71.19% <75.26%> (+<0.01%) ⬆️
linux-nightly 71.61% <89.78%> (+0.01%) ⬆️
macos 54.87% <75.26%> (-0.20%) ⬇️
pytests 1.46% <0.00%> (-0.01%) ⬇️
sanity-checks 1.26% <0.00%> (-0.01%) ⬇️
unittests 68.08% <88.70%> (+0.02%) ⬆️
upgradability 0.13% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@pugachAG pugachAG force-pushed the fast-state-transition-data-gc branch from 6e0a3e4 to bbe3dfd on February 9, 2024 22:16
@pugachAG pugachAG added the A-stateless-validation (Area: stateless validation) label on Feb 9, 2024
@pugachAG pugachAG linked an issue Feb 9, 2024 that may be closed by this pull request
@pugachAG pugachAG marked this pull request as ready for review February 12, 2024 14:22
@pugachAG pugachAG requested a review from a team as a code owner February 12, 2024 14:22
@Longarithm (Member) commented:
@shreyan-gupta @staffik @jancionear would someone like to review so I can just stamp approval on top of it? :)

@shreyan-gupta (Contributor) commented:

> would someone like to review so I can just stamp approval on top of it? :)

On it! Give me some time

Review thread on the start-height initialization code:

```rust
    start_height
} else {
    *computed_start_heights
        .get_or_try_init(|| compute_start_heights(chain_store))?
```
Contributor:
Seems like the purpose of this is basically first-time initialization, i.e. when we don't have the start_heights stored in metadata? Could you confirm that ideally this should be called only once, to initialize STATE_TRANSITIONS_METADATA_KEY?

Instead of calling compute_start_heights here, how about we just take the tail height from the chain, i.e. the height at which we are currently doing GC?

Contributor Author (pugachAG):

Yes, we only need it for the first initialisation, when start_heights is not yet in the db. Now it is easier to see that with d0b14fc.
Using the tail height from the chain is not really feasible, since it would result in 5 epochs' worth of deletes, which could slow down block processing, so I'd rather not risk it.
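For context, `get_or_try_init` in the snippet above matches the lazy-initialization pattern of the `once_cell` crate: the fallible closure runs only on the first access, so the expensive column scan happens at most once. A minimal standalone sketch, where the `compute_start_heights` stub is purely illustrative:

```rust
use once_cell::unsync::OnceCell;

#[derive(Debug)]
struct Error;

// Stand-in for the expensive fallback that scans the whole
// DBCol::StateTransitionData column; not nearcore's actual function.
fn compute_start_heights() -> Result<Vec<u64>, Error> {
    Ok(vec![0; 4])
}

fn main() -> Result<(), Error> {
    let cell: OnceCell<Vec<u64>> = OnceCell::new();
    // First access runs the closure and caches the result...
    let heights = cell.get_or_try_init(compute_start_heights)?;
    // ...subsequent accesses return the cached value without recomputing.
    assert_eq!(cell.get(), Some(heights));
    Ok(())
}
```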

Contributor:

Just for my understanding, isn't that what we are currently doing with the first-time initialization as well? compute_start_heights iterates through the whole DBCol::StateTransitionData and gets the min start_heights for each shard, right? And we can potentially have entries from up to 5 epochs (or the GC period)?

Contributor:
Sounds like in either case we are taking this one-time hit of iterating through a lot of entries in DBCol::StateTransitionData for the first-time initialization of StateTransitionStartHeights. Assuming this is to solve the issue we are seeing with shadow validation, is it fine to take this one-time slowdown hit?

Contributor:
Oh and now I'm a bit concerned about archival nodes that maybe don't clear the DBCol::StateTransitionData column? I'm guessing there are no such nodes yet, but having this PR in place before DBCol::StateTransitionData grows for archival nodes should do the trick?

Contributor Author (@pugachAG) commented Feb 13, 2024:

> iterates through the whole DBCol::StateTransitionData and gets the min start_heights for each shard, right? And we can potentially have entries from up to 5 epochs (or the GC period)?

This can only be the case for stateless net, which shouldn't be an issue; StateTransitionData instances are fairly small there.
On mainnet it will be enabled for the first time along with the stateless validation protocol upgrade, so the first cleanup will only have to read a few entries. I even considered not storing start_heights at all and always deriving it from the db, but the issue is that even a few StateTransitionData entries result in multiple megabytes of data read from disk, so I'd rather not do that (see the derivation sketch below).

> archival nodes

We should never write StateTransitionData on archival nodes, so there is no need to clean it up.
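For illustration, deriving per-shard start heights from the column could look roughly like the sketch below; decoding the keys into (shard_id, height) pairs is an assumption for this example, not the actual key layout:

```rust
use std::collections::HashMap;

/// Fold (shard_id, height) pairs, e.g. decoded from the keys of a
/// DBCol::StateTransitionData iterator, into the minimum stored height
/// per shard (the per-shard GC start height).
fn min_heights_per_shard(keys: impl Iterator<Item = (u64, u64)>) -> HashMap<u64, u64> {
    let mut mins = HashMap::new();
    for (shard_id, height) in keys {
        mins.entry(shard_id)
            .and_modify(|h: &mut u64| *h = (*h).min(height))
            .or_insert(height);
    }
    mins
}
```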

Review thread on the new garbage collection entry point:

```rust
}

impl Chain {
    pub(crate) fn garbage_collect_state_transition_data(&self, block: &Block) -> Result<(), Error> {
```
Contributor:

Do we want to push this function to chain/chain/src/garbage_collection.rs?

I understand this isn't really GC code so we can keep it here as well.

Contributor Author (pugachAG):

I initially considered putting it there, but decided against it for the following reasons:

  1. as you've mentioned, it is not part of chain GC, so that could be confusing
  2. the garbage_collection.rs file is already pretty big
  3. it is more ergonomic if everything related to state transition data is in one file

Contributor (@shreyan-gupta) commented Feb 13, 2024:

Another quick, optional suggestion: how about we put this in the chain/client/src/stateless_validation/.... directory? I'm fine either way, but I guess StateTransitionData is only used for stateless validation?

Contributor Author (pugachAG):

client is one level above chain, so it is not possible to have impl Chain as part of chain/client.
I will move it to chain/src/stateless_validation instead. chunk_endorsements.rs also belongs there; I will move it in a separate PR.

@pugachAG pugachAG force-pushed the fast-state-transition-data-gc branch from 0594ee5 to 03b2c83 on February 12, 2024 20:34
@pugachAG pugachAG added this pull request to the merge queue Feb 13, 2024
Merged via the queue into master with commit e00dcaa Feb 13, 2024
@pugachAG pugachAG deleted the fast-state-transition-data-gc branch February 13, 2024 17:26
github-merge-queue bot pushed a commit that referenced this pull request Feb 14, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 30, 2024
…r analysis (#11137)

Save all of the observed instances of `ChunkStateWitness` to the
database. They can later be fetched for debugging and analysis.

Sometimes things go wrong with witness validation, and we're not sure
why. In such cases it would be good to take a look at the witness that
failed the validation, but currently there's no way to see it: the
witness disappears after validation. Saving witnesses in the database
will allow us to inspect a witness after it fails validation and see
what exactly is wrong, even after a node crash.

Adds a new database column `DBCol::LatestChunkStateWitnesses`, which
keeps a set of the latest observed witnesses. The size of this set is
limited to `4GB` of data and `60*30` instances (about 30 minutes'
worth). When a limit is hit, the witness with the lowest height is
removed from the set.
We can't really save all witnesses: at 4MB/s that would add up to
345GB/day, so we're forced to garbage collect the old witnesses. Having
only the latest witnesses should be enough for debugging and analysis.
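A rough sketch of such a bounded store, assuming a map keyed by height; the names and the eviction loop are illustrative, not the actual implementation in #11137:

```rust
use std::collections::BTreeMap;

/// Bounded "latest witnesses" store: capped by total byte size and
/// entry count; when a limit is exceeded, the lowest-height entry
/// is evicted first.
struct LatestWitnesses {
    by_height: BTreeMap<u64, Vec<u8>>, // height -> serialized witness
    total_bytes: usize,
    max_bytes: usize,   // e.g. 4 GB per the description above
    max_entries: usize, // e.g. 60 * 30 per the description above
}

impl LatestWitnesses {
    fn insert(&mut self, height: u64, witness: Vec<u8>) {
        self.total_bytes += witness.len();
        if let Some(old) = self.by_height.insert(height, witness) {
            // Replacing an existing entry: discount the old bytes.
            self.total_bytes -= old.len();
        }
        // Evict lowest-height entries until both limits are satisfied.
        while self.total_bytes > self.max_bytes || self.by_height.len() > self.max_entries {
            match self.by_height.pop_first() {
                Some((_lowest_height, evicted)) => self.total_bytes -= evicted.len(),
                None => break,
            }
        }
    }
}
```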

Witness saving could potentially be an attack vector: since we save all
witnesses, even the ones that failed validation, an attacker could spam
the node with thousands of witnesses, which would all be saved to the
database and could cause denial of service.
Because of that, the feature is guarded by a new config option,
`save_latest_witnesses`. By default it's false, so production nodes
won't save anything and won't be vulnerable to such attacks. It can be
selectively enabled on test/canary nodes when needed.

A new database command is added to access the saved witnesses:
```console
Print observed ChunkStateWitnesses at the given block height (and shard id).
Observed witnesses are only saved when `save_latest_witnesses` is set to true in config.json

Usage: neard database show-latest-witnesses [OPTIONS] --height <HEIGHT>

Options:
      --height <HEIGHT>      Block height (required)
      --shard-id <SHARD_ID>  Shard id (optional)
      --pretty               Pretty-print using the "{:#?}" formatting
      --binary               Print the raw &[u8], can be pasted into rust code
  -h, --help                 Print help
```
The tool prints observed witnesses at the given height (and optionally
shard id), either as a debug print (`{:?}`/`{:#?}`) or as a binary blob
that can be pasted into Rust code.
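For example, an illustrative invocation (the height and shard id values here are made up):

```console
neard database show-latest-witnesses --height 114000000 --shard-id 0 --pretty
```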

I'm not sure if that's the best way to expose it for debugging, but it
was the easiest one to implement. Maybe it should somehow be integrated
with `debug-ui`? I'm not very familiar with it; idk if it'd make sense.


Fixes: #11110
Similar to: #10599
Labels: A-stateless-validation (Area: stateless validation)
Linked issue (may be closed by this PR): StateTransitionData disk usage on mainnet
4 participants