Description
Was tendermint/tendermint#9743
Summary
Experiment with adding in-process compaction, so that nodes don't need to be stopped to perform compaction. This issue originally targeted LevelDB, but we added support for it to all CometBFT database backends that support this feature: RocksDB, PebbleDB and LevelDB.
Problem Definition
Background
One of the most common problems that operators report is that storage growth is unbounded and compaction doesn't work. Some operators stop their node, run experimental-compact-goleveldb (#8564), which deletes old data, and then restart their node.
Why do we need this feature?
The experimental-compact-goleveldb command has the disadvantage that the node is stopped while it runs and therefore misses blocks. Compacting a node on a production network typically takes on the order of tens of minutes, so the number of missed blocks can be significant.
Proposal
We'll go about this incrementally:
- Tendermint team does initial de-risking and sanity checks to see that in-process compaction can be implemented safely
- Add a new database type that does compaction
- We ask an operator to deploy an early experiment, replacing one of their full nodes with the patched Tendermint version that has in-process compaction
- The relayer team tests relaying against that node and monitors its general health
- We collect advanced metrics, on latency in particular, as well as on the evolution of storage growth
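
To make the proposal concrete, here is a minimal sketch in Go of what an in-process compaction trigger could look like. Everything in it is an assumption for illustration and not the actual implementation: the `Compactor` interface, the `MaybeCompact` helper, the block-height `interval` parameter, and the convention that `Compact(nil, nil)` covers the whole key range are all hypothetical. `fakeDB` stands in for a real backend such as LevelDB, PebbleDB or RocksDB. The helper also returns how long the compaction took, since latency is one of the metrics the experiment is meant to collect.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Compactor is a hypothetical interface for a database backend that can
// compact a key range while it stays open, i.e. without stopping the node.
type Compactor interface {
	Compact(start, end []byte) error
}

// fakeDB is a stand-in backend that only counts compaction calls.
type fakeDB struct {
	mu          sync.Mutex
	compactions int
}

// Compact records that a compaction was requested. A real backend would
// rewrite its on-disk files here to reclaim space.
func (f *fakeDB) Compact(start, end []byte) error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.compactions++
	return nil
}

// Count returns how many compactions have been requested so far.
func (f *fakeDB) Count() int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.compactions
}

// MaybeCompact compacts the database every `interval` blocks (hypothetical
// policy). It reports whether a compaction ran and how long the call took,
// so that the caller can export the duration as a latency metric.
func MaybeCompact(db Compactor, height, interval int64) (bool, time.Duration, error) {
	if interval <= 0 || height <= 0 || height%interval != 0 {
		return false, 0, nil
	}
	started := time.Now()
	// Assumption: nil bounds mean "compact the whole key range".
	if err := db.Compact(nil, nil); err != nil {
		return false, time.Since(started), fmt.Errorf("compacting at height %d: %w", height, err)
	}
	return true, time.Since(started), nil
}

func main() {
	db := &fakeDB{}
	// Simulate the node committing blocks 1..10 with compaction every 5 blocks.
	for height := int64(1); height <= 10; height++ {
		ran, took, err := MaybeCompact(db, height, 5)
		if err != nil {
			fmt.Println("compaction failed:", err)
			continue
		}
		if ran {
			fmt.Printf("height %d: compacted in %s\n", height, took)
		}
	}
	fmt.Println("total compactions:", db.Count())
}
```

Tying the trigger to block height keeps the sketch deterministic and easy to test. A real node would more likely run compaction in the background and coordinate it with pruning, which this sketch does not attempt.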