CometBFT adopts the crash-recovery failure model, meaning that nodes may crash and later recover, rejoining the distributed computation in a consistent state. For this to happen, nodes should persist relevant information and state changes during their regular operation, so that during recovery they are able to restore the state they had just before crashing.
Recovering the state of a node after a crash is a tricky operation. Several modules of CometBFT persist information that they are expected to recover after a crash. The consensus protocol keeps a Write-Ahead Log (WAL) to persist crucial information. The block store, the state store, the evidence reactor, the transaction indexer, and the address book persist data to their own DBs. The application itself should also adhere to the crash-recovery failure model by implementing its own persistence strategy.
Among these modules, the recovery procedure for ABCI applications is probably the best documented. The consensus WAL is covered only superficially, while the other DBs are essentially undocumented. In any case, the assumptions regarding the persisted state and its recovery are not documented.
It is worth noting that when state persistence is delegated to a database, the recovery procedure tends to be straightforward, as it is provided by the database implementation. As far as I know, consensus is the only module that adopts transactional semantics for persisted data, based on a WAL. The recovery of the consensus WAL is particularly tricky and undocumented.
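To make the write-ahead idea concrete, here is a minimal sketch in Python. It is not CometBFT's WAL implementation or file format: the `WriteAheadLog` class, the `replay` function, the one-line-per-record layout, and the `height`/`step` fields are all hypothetical and exist only for illustration. It shows the two halves of the technique: each record is checksummed and synced to disk before the node acts on it, and recovery replays records up to the first torn or corrupted entry.

```python
import json
import os
import tempfile
import zlib


class WriteAheadLog:
    """Toy append-only log (hypothetical format): one 'crc32 json' line per record."""

    def __init__(self, path):
        self.path = path
        self._file = open(path, "a", encoding="utf-8")

    def append(self, record):
        payload = json.dumps(record, sort_keys=True)
        checksum = zlib.crc32(payload.encode("utf-8"))
        self._file.write(f"{checksum:08x} {payload}\n")
        # Force the record to stable storage BEFORE the caller acts on it.
        self._file.flush()
        os.fsync(self._file.fileno())

    def close(self):
        self._file.close()


def replay(path):
    """Return the records logged before the first torn or corrupted entry."""
    records = []
    if not os.path.exists(path):
        return records
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # incomplete final write: the crash happened mid-record
            checksum_hex, _, payload = line.rstrip("\n").partition(" ")
            try:
                expected = int(checksum_hex, 16)
            except ValueError:
                break  # unreadable checksum field
            if zlib.crc32(payload.encode("utf-8")) != expected:
                break  # payload does not match its checksum
            records.append(json.loads(payload))
    return records


def demo():
    """Log two records, simulate a crash mid-write, then recover."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        wal = WriteAheadLog(path)
        wal.append({"height": 1, "step": "propose"})  # hypothetical fields
        wal.append({"height": 1, "step": "commit"})
        wal.close()
        # Simulate a crash in the middle of a write: partial record, no newline.
        with open(path, "a", encoding="utf-8") as f:
            f.write('0000beef {"height": 2, "st')
        return replay(path)


if __name__ == "__main__":
    print(demo())
```

In this sketch, recovery simply discards the torn tail and returns the two complete records. What a real node must do with the replayed records, and how that interacts with the application's own persisted state, is exactly the part that this issue asks to document.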
Definition of Done:
List all databases adopted by CometBFT modules, summarize the persistence assumptions, and document, where applicable, the relevant aspects of the recovery procedures.
Document the consensus Write-Ahead Log and the operation of the consensus protocol during recovery. This should include the interaction between consensus and the ABCI application, covered only on the application side in the existing documentation.