feat: garbage collect StateTransitionData on block finalization by pugachAG · Pull Request #10599 · near/nearcore

Merged · 7 commits merged into master on Feb 13, 2024

Conversation

@pugachAG (Contributor) commented Feb 9, 2024:

Currently StateTransitionData is part of the regular chain GC process. Unfortunately it still reaches 300GB on mainnet, see #10578.

This PR introduces more aggressive garbage collection for StateTransitionData: if a chunk is present in a final block, then we can safely clean up entries for that shard that correspond to blocks with lower heights.

Note that we still keep StateTransitionData as part of epoch-based GC just in case.
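The cleanup rule can be sketched as follows. This is a minimal illustration only; `StateTransitionGc`, `on_block_final`, and `delete_state_transition_data` are hypothetical names, not the actual nearcore API:

```rust
use std::collections::HashMap;

type ShardId = u64;
type BlockHeight = u64;

/// Hypothetical sketch of the cleanup rule described above: once a
/// chunk for a shard appears in a final block, stored entries for
/// that shard at lower heights are no longer needed.
struct StateTransitionGc {
    /// Lowest height per shard that may still have stored entries.
    start_heights: HashMap<ShardId, BlockHeight>,
}

impl StateTransitionGc {
    /// Called on block finalization; `final_chunk_heights` maps each shard
    /// that has a chunk in the final block to that chunk's height.
    fn on_block_final(&mut self, final_chunk_heights: &HashMap<ShardId, BlockHeight>) {
        for (&shard_id, &chunk_height) in final_chunk_heights {
            let start = self.start_heights.entry(shard_id).or_insert(chunk_height);
            // Delete entries in [start, chunk_height) for this shard.
            for height in *start..chunk_height {
                delete_state_transition_data(shard_id, height);
            }
            *start = chunk_height;
        }
    }
}

fn delete_state_transition_data(shard_id: ShardId, height: BlockHeight) {
    // Placeholder for the actual DB delete of the StateTransitionData
    // entry keyed by (shard_id, height).
    println!("delete StateTransitionData: shard {shard_id}, height {height}");
}
```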

Testing:

  • unit testing
  • running a mainnet RPC node with shadow validation, monitoring on-disk data size via near_rocksdb_live_sst_files_size, and confirming that the required entries are still present via near_shadow_chunk_validation_failed_total

codecov bot commented Feb 9, 2024

Codecov Report

Attention: 19 lines in your changes are missing coverage. Please review.

Comparison is base (61cec49) 72.15% compared to head (ec51751) 72.16%.
Report is 1 commit behind head on master.

Files                                                  Patch %   Lines
.../src/stateless_validation/state_transition_data.rs  91.66%   3 missing and 12 partials ⚠️
core/primitives/src/block_header.rs                      0.00%   0 missing and 3 partials ⚠️
chain/chain/src/chain.rs                                66.66%   0 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10599      +/-   ##
==========================================
+ Coverage   72.15%   72.16%   +0.01%     
==========================================
  Files         725      726       +1     
  Lines      147512   147695     +183     
  Branches   147512   147695     +183     
==========================================
+ Hits       106436   106591     +155     
- Misses      36295    36303       +8     
- Partials     4781     4801      +20     
Flag Coverage Δ
backward-compatibility 0.08% <0.00%> (-0.01%) ⬇️
db-migration 0.08% <0.00%> (-0.01%) ⬇️
genesis-check 1.25% <0.00%> (-0.01%) ⬇️
integration-tests 36.99% <36.02%> (-0.07%) ⬇️
linux 71.19% <75.26%> (+<0.01%) ⬆️
linux-nightly 71.61% <89.78%> (+0.01%) ⬆️
macos 54.87% <75.26%> (-0.20%) ⬇️
pytests 1.46% <0.00%> (-0.01%) ⬇️
sanity-checks 1.26% <0.00%> (-0.01%) ⬇️
unittests 68.08% <88.70%> (+0.02%) ⬆️
upgradability 0.13% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@pugachAG pugachAG force-pushed the fast-state-transition-data-gc branch from 6e0a3e4 to bbe3dfd on February 9, 2024 22:16
@pugachAG pugachAG added the A-stateless-validation (Area: stateless validation) label on Feb 9, 2024
@pugachAG pugachAG linked an issue Feb 9, 2024 that may be closed by this pull request
@pugachAG pugachAG marked this pull request as ready for review February 12, 2024 14:22
@pugachAG pugachAG requested a review from a team as a code owner February 12, 2024 14:22
@Longarithm (Member) commented:
@shreyan-gupta @staffik @jancionear would someone like to review so I can just stamp approval on top of it? :)

@shreyan-gupta (Contributor) commented:

> would someone like to review so I can just stamp approval on top of it? :)

On it! Give me some time

Review thread on the start-height initialization code:

```rust
    start_height
} else {
    *computed_start_heights
        .get_or_try_init(|| compute_start_heights(chain_store))?
```
Contributor:
Seems like the purpose of this is basically first-time initialization, i.e. when we don't have the start_heights stored in metadata? Could you confirm that ideally this should be called only once, to initialize STATE_TRANSITIONS_METADATA_KEY?

Instead of calling compute_start_heights here, how about we just take the tail height from the chain, i.e. the height at which we are currently doing GC?

Contributor Author (pugachAG):

Yes, we only need it for the first initialisation, when start_heights is not yet in the db. Now it is easier to see that with d0b14fc.
Using the tail height from the chain is not really feasible, since it would result in 5 epochs' worth of deletes, which could slow down block processing, so I'd rather not risk it.
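For context, `get_or_try_init` in the snippet above matches the lazy-initialization pattern of the `once_cell` crate: the fallible closure runs only on the first access, so the expensive column scan happens at most once. A minimal standalone sketch, where the `compute_start_heights` stub is purely illustrative:

```rust
use once_cell::unsync::OnceCell;

#[derive(Debug)]
struct Error;

// Stand-in for the expensive fallback that scans the whole
// DBCol::StateTransitionData column; not nearcore's actual function.
fn compute_start_heights() -> Result<Vec<u64>, Error> {
    Ok(vec![0; 4])
}

fn main() -> Result<(), Error> {
    let cell: OnceCell<Vec<u64>> = OnceCell::new();
    // First access runs the closure and caches the result...
    let heights = cell.get_or_try_init(compute_start_heights)?;
    // ...subsequent accesses return the cached value without recomputing.
    assert_eq!(cell.get(), Some(heights));
    Ok(())
}
```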

Contributor:

Just for my understanding, isn't that what we are currently doing with the first-time initialization as well? compute_start_heights iterates through the whole DBCol::StateTransitionData and gets the min start_heights for each shard, right? And we can potentially have entries from up to 5 epochs (or the GC period)?

Contributor:
Sounds like in either case we are taking this one-time hit of iterating through a lot of entries in DBCol::StateTransitionData for the first-time initialization of StateTransitionStartHeights. Assuming this is to solve the issue we are seeing with shadow validation, is it fine to take this one-time slowdown hit?

Contributor:
Oh and now I'm a bit concerned about archival nodes that maybe don't clear the DBCol::StateTransitionData column? I'm guessing there are no such nodes yet, but having this PR in place before DBCol::StateTransitionData grows for archival nodes should do the trick?

Contributor Author (@pugachAG) commented Feb 13, 2024:

> iterates through the whole DBCol::StateTransitionData and gets the min start_heights for each shard, right? And we can potentially have entries from up to 5 epochs (or the GC period)?

This can only be the case for stateless net, which shouldn't be an issue; StateTransitionData instances are fairly small there.
On mainnet it will be enabled for the first time along with the stateless validation protocol upgrade, so the first cleanup will only have to read a few entries. I even considered not storing start_heights at all and always deriving it from the db, but the issue is that even a few StateTransitionData entries result in multiple megabytes of data read from disk, so I'd rather not do that (see the derivation sketch below).

> archival nodes

We should never write StateTransitionData on archival nodes, so there is no need to clean it up.
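For illustration, deriving per-shard start heights from the column could look roughly like the sketch below; decoding the keys into (shard_id, height) pairs is an assumption for this example, not the actual key layout:

```rust
use std::collections::HashMap;

/// Fold (shard_id, height) pairs, e.g. decoded from the keys of a
/// DBCol::StateTransitionData iterator, into the minimum stored height
/// per shard (the per-shard GC start height).
fn min_heights_per_shard(keys: impl Iterator<Item = (u64, u64)>) -> HashMap<u64, u64> {
    let mut mins = HashMap::new();
    for (shard_id, height) in keys {
        mins.entry(shard_id)
            .and_modify(|h: &mut u64| *h = (*h).min(height))
            .or_insert(height);
    }
    mins
}
```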

Review thread on the new garbage collection entry point:

```rust
}

impl Chain {
    pub(crate) fn garbage_collect_state_transition_data(&self, block: &Block) -> Result<(), Error> {
```
Contributor:

Do we want to push this function to chain/chain/src/garbage_collection.rs?

I understand this isn't really GC code so we can keep it here as well.

Contributor Author (pugachAG):

I initially considered putting it there, but decided against it for the following reasons:

  1. as you've mentioned, it is not part of chain GC, so that could be confusing
  2. the garbage_collection.rs file is already pretty big
  3. it is more ergonomic if everything related to state transition data is in one file

Contributor (@shreyan-gupta) commented Feb 13, 2024:

Another quick, optional suggestion: how about we put this in the chain/client/src/stateless_validation/.... directory? I'm fine either way, but I guess StateTransitionData is only used for stateless validation?

Contributor Author (pugachAG):

client is one level above chain, so it is not possible to have impl Chain as part of chain/client.
I will move it to chain/src/stateless_validation instead. chunk_endorsements.rs also belongs there; I will move it in a separate PR.

@pugachAG pugachAG force-pushed the fast-state-transition-data-gc branch from 0594ee5 to 03b2c83 on February 12, 2024 20:34
@pugachAG pugachAG added this pull request to the merge queue Feb 13, 2024
Merged via the queue into master with commit e00dcaa Feb 13, 2024
@pugachAG pugachAG deleted the fast-state-transition-data-gc branch February 13, 2024 17:26
github-merge-queue bot pushed a commit that referenced this pull request Feb 14, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 30, 2024
…r analysis (#11137)

Save all of the observed instances of `ChunkStateWitness` to the
database. They can later be fetched for debugging and analysis.

Sometimes things go wrong with witness validation, and we're not sure
why. In such cases it would be good to take a look at the witness that
failed the validation, but currently there's no way to see it: the
witness disappears after validation. Saving witnesses in the database
will allow us to inspect a witness after it fails validation and see
what exactly is wrong, even after a node crash.

Adds a new database column `DBCol::LatestChunkStateWitnesses`, which
keeps a set of the latest observed witnesses. The size of this set is
limited to `4GB` of data and `60*30` instances (about 30 minutes'
worth). When a limit is hit, the witness with the lowest height is
removed from the set.
We can't really save all witnesses: at 4MB/s that would add up to
345GB/day, so we're forced to garbage collect the old witnesses. Having
only the latest witnesses should be enough for debugging and analysis.
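A rough sketch of such a bounded store, assuming a map keyed by height; the names and the eviction loop are illustrative, not the actual implementation in #11137:

```rust
use std::collections::BTreeMap;

/// Bounded "latest witnesses" store: capped by total byte size and
/// entry count; when a limit is exceeded, the lowest-height entry
/// is evicted first.
struct LatestWitnesses {
    by_height: BTreeMap<u64, Vec<u8>>, // height -> serialized witness
    total_bytes: usize,
    max_bytes: usize,   // e.g. 4 GB per the description above
    max_entries: usize, // e.g. 60 * 30 per the description above
}

impl LatestWitnesses {
    fn insert(&mut self, height: u64, witness: Vec<u8>) {
        self.total_bytes += witness.len();
        if let Some(old) = self.by_height.insert(height, witness) {
            // Replacing an existing entry: discount the old bytes.
            self.total_bytes -= old.len();
        }
        // Evict lowest-height entries until both limits are satisfied.
        while self.total_bytes > self.max_bytes || self.by_height.len() > self.max_entries {
            match self.by_height.pop_first() {
                Some((_lowest_height, evicted)) => self.total_bytes -= evicted.len(),
                None => break,
            }
        }
    }
}
```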

Witness saving could potentially be an attack vector: since we save all
witnesses, even the ones that failed validation, an attacker could spam
the node with thousands of witnesses, which would all be saved to the
database and could cause denial of service.
Because of that, the feature is guarded by a new config option,
`save_latest_witnesses`. By default it's false, so production nodes
won't save anything and won't be vulnerable to such attacks. It can be
selectively enabled on test/canary nodes when needed.

A new database command is added to access the saved witnesses:
```console
Print observed ChunkStateWitnesses at the given block height (and shard id).
Observed witnesses are only saved when `save_latest_witnesses` is set to true in config.json

Usage: neard database show-latest-witnesses [OPTIONS] --height <HEIGHT>

Options:
      --height <HEIGHT>      Block height (required)
      --shard-id <SHARD_ID>  Shard id (optional)
      --pretty               Pretty-print using the "{:#?}" formatting
      --binary               Print the raw &[u8], can be pasted into rust code
  -h, --help                 Print help
```
The tool prints observed witnesses at the given height (and optionally
shard id), either as a debug print (`{:?}`/`{:#?}`) or as a binary blob
that can be pasted into Rust code.
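For example, an illustrative invocation (the height and shard id values here are made up):

```console
neard database show-latest-witnesses --height 114000000 --shard-id 0 --pretty
```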

I'm not sure if that's the best way to expose it for debugging, but it
was the easiest one to implement. Maybe it should somehow be integrated
with `debug-ui`? I'm not very familiar with it; idk if it'd make sense.


Fixes: #11110
Similar to: #10599
Labels: A-stateless-validation (Area: stateless validation)
Linked issue (may be closed by this PR): StateTransitionData disk usage on mainnet
4 participants