8000 spec:abci2.0 - clarify crash recovery mechanism by jmalicevic · Pull Request #469 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

spec:abci2.0 - clarify crash recovery mechanism #469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 17, 2023
108 changes: 51 additions & 57 deletions spec/abci/abci++_app_requirements.md
Original file line number Diff line number Diff line change
Expand Up @@ -804,71 +804,65 @@ The expectation is for there to be some number of high level paths
differentiating concerns, like `/p2p`, `/store`, and `/app`. Currently,
CometBFT only uses `/p2p`, for filtering peers. For more advanced use, see the
implementation of
[Query in the Cosmos-SDK](https://github.com/cosmos/cosmos-sdk/blob/v0.23.1/baseapp/baseapp.go#L333).
[Query in the Cosmos-SDK](https://github.com/cosmos/cosmos-sdk/blob/e2037f7696fed4fdd4bc076f9e7053fe8178a881/baseapp/abci.go#L557-L565).

### Crash Recovery

On startup, CometBFT calls the `Info` method on the Info Connection to get the latest
committed state of the app. The app MUST return information consistent with the
last block it successfully completed Commit for.

If the app successfully committed block H, then `last_block_height = H` and `last_block_app_hash = <hash returned by Commit for block H>`. If the app
failed during the Commit of block H, then `last_block_height = H-1` and
`last_block_app_hash = <hash returned by Commit for block H-1, which is the hash in the header of block H>`.
CometBFT and the application are expected to crash together and there should not
exist a scenario where the application has persisted state of a height greater than the
latest height persisted by CometBFT.

We now distinguish three heights, and describe how CometBFT syncs itself with
the app.

```md
storeBlockHeight = height of the last block CometBFT saw a commit for
stateBlockHeight = height of the last block for which CometBFT completed all
block processing and saved all ABCI results to disk
appBlockHeight = height of the last block for which ABCI app successfully
completed Commit
In practice, persisting the state of a height consists of three steps, the last of which
is the call to the application's `Commit` method, the only place where the application is expected to
persist/commit its state.
On startup (upon recovery), CometBFT calls the `Info` method on the Info Connection to get the latest
committed state of the app. The app MUST return information consistent with the
last block for which it successfully completed `Commit`.

The three steps performed before the state of a height is considered persisted are:
- The block is stored by CometBFT in the blockstore
- CometBFT has stored the state returned by the application through `FinalizeBlockResponse`
- The application has committed its state within `Commit`.

The following diagram depicts the order in which these events happen, and the corresponding
ABCI functions that are called and executed by CometBFT and the application:


```
APP: Execute block Persist application state
/ return ResultFinalizeBlock /
/ /
Event: ------------- block_stored ------------ / ------------ state_stored --------------- / ----- app_persisted_state
| / | / |
CometBFT: Decide --- Persist block -- Call FinalizeBlock - Persist results ---------- Call Commit --
on in the (txResults, validator
Block block store updates...)

```

Note we always have `storeBlockHeight >= stateBlockHeight` and `storeBlockHeight >= appBlockHeight`
Note also CometBFT never calls Commit on an ABCI app twice for the same height.

The procedure is as follows.

First, some simple start conditions:

If `appBlockHeight == 0`, then call InitChain.

If `storeBlockHeight == 0`, we're done.

Now, some sanity checks:

If `storeBlockHeight < appBlockHeight`, error
If `storeBlockHeight < stateBlockHeight`, panic
If `storeBlockHeight > stateBlockHeight+1`, panic

Now, the meat:

If `storeBlockHeight == stateBlockHeight && appBlockHeight < storeBlockHeight`,
replay all blocks in full from `appBlockHeight` to `storeBlockHeight`.
This happens if we completed processing the block, but the app forgot its height.

If `storeBlockHeight == stateBlockHeight && appBlockHeight == storeBlockHeight`, we're done.
This happens if we crashed at an opportune spot.

If `storeBlockHeight == stateBlockHeight+1`
This happens if we started processing the block but didn't finish.

If `appBlockHeight < stateBlockHeight`
replay all blocks in full from `appBlockHeight` to `storeBlockHeight-1`,
and replay the block at `storeBlockHeight` using the WAL.
This happens if the app forgot the last block it committed.

If `appBlockHeight == stateBlockHeight`,
replay the last block (storeBlockHeight) in full.
This happens if we crashed before the app finished Commit
As these three steps are not atomic, we observe different cases based on which steps have been executed
before the crash occurred
(we assume that at least `block_stored` has been executed, otherwise, there is no state persisted,
and the operations for this height are repeated entirely):

- `block_stored`: we replay `FinalizeBlock` and the steps afterwards.
- `block_stored` and `state_stored`: As the app did not persist its state within `Commit`, we need to re-execute
`FinalizeBlock` to retrieve the results and compare them to the state stored by CometBFT within `state_stored`.
The expected case is that the states will match, otherwise CometBFT panics.
- `block_stored`, `state_stored`, `app_persisted_state`: we move on to the next height.

Based on the sequence of these events, CometBFT will panic if any of the steps in the sequence happen out of order,
that is if:
- The application has persisted a block at a height higher than the blocked saved during `state_stored`.
- The `block_stored` step persisted a block at a height smaller than the `state_stored`
- And the difference between the heights of the blocks persisted by `state_stored` and `block_stored` is more
than 1 (this corresponds to a scenario where we stored two blocks in the block store but never persisted the state of the first
block, which should never happen).

A special case is when a crash happens before the first block is committed - that is, after calling
`InitChain`. In that case, the application's state should still be at height 0 and thus `InitChain`
will be called again.

If `appBlockHeight == storeBlockHeight`
update the state using the saved ABCI responses but don't run the block against the real app.
This happens if we crashed after the app finished Commit but before CometBFT saved the state.

### State Sync

Expand Down
0