-
Notifications
You must be signed in to change notification settings - Fork 636
spec:abci2.0 - clarify crash recovery mechanism #469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking.
spec/abci/abci++_app_requirements.md
Outdated
last block it successfully completed Commit for. | ||
last block it successfully completed `Commit` for. | ||
|
||
The application is expected to persist it's state during the `Commit` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The application is expected to persist it's state during the `Commit` method. | |
The application is expected to persist its state during the `Commit` method. |
spec/abci/abci++_app_requirements.md
Outdated
In other words, if `Commit` for a block H has not been executed and CometBFT crashed, | ||
the application | ||
should not have persisted and relied on any state between the `Commit` for H - 1 and | ||
H. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really part of this PR, but in line 824, below, it states that either Commit completed and last_block_height is now H or Commit failed and last_block_height is still H-1, but even if Commit crashed midway, it may have crashed after the advancing to H. So the restriction should be last_block_height and last_block_app_hash agree, not that they are agree on H-1.
Suggestion
If the app successfully committed block `H`, then `last_block_height = H` and `last_block_app_hash = <hash returned by Commit for block H>`.
If the app failed during the Commit of block `H`, then either
- `last_block_height = H-1` and `last_block_app_hash = <hash returned by Commit for block H-1, which is the hash in the header of block H>`; or,
- `last_block_height = H` and `last_block_app_hash = &l
8000
t;hash returned by Commit for block H>`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would, first, refer that last_block_height
and last_block_app_hash
are the fields returned by the Info
method (https://github.com/cometbft/cometbft/blob/7c7b48039453aa66fd6b128af6701651e1398319/spec/abci/abci%2B%2B_methods.md). This because it was not clear to me, and these variables are only used here on the text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second, is it not possible to have one of the fields updated (to H) but the other not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third, I agree with Lasaro here that the app could crash during the Commit
of height H and it will have the two fields updated to H state, although Comet will not be aware of this because the crash occurs before returning the control to Comet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the value assigned mid-commit before the crash persisted? As in, is the state not updated atomically (or at least saved)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope #493 can help us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the value assigned mid-commit before the crash persisted? As in, is the state not updated atomically (or at least saved)?
It must be the case that both values are updated during commit, very likely as the the last steps, and the problem is ensuring that both updates are part of one atomic operation, probably along with putting the final nail on the old's state coffin. Since we don't control that, as it happens on the app side, we must require the app to implement this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The application typically uses something like leveldb. Correct me if I am wrong, but this kind of databases do not support transactions. So, while both writes are probably atomic, if a crash occurs between them the state can become inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main invariant is that the App state is always committed after consensus persists the results (returned by the app) of executing the block.
So the app and consensus "persist" actions don't need to be atomic. Upon recovery, the logic described later on in this section is used to replay anything the app has "forgotten" (this is figured out with the handshake after recovery, using Info
method).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking++.
spec/abci/abci++_app_requirements.md
Outdated
In other words, if `Commit` for a block H has not been executed and CometBFT crashed, | ||
the application | ||
should not have persisted and relied on any state between the `Commit` for H - 1 and | ||
H. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would, first, refer that last_block_height
and last_block_app_hash
are the fields returned by the Info
method (https://github.com/cometbft/cometbft/blob/7c7b48039453aa66fd6b128af6701651e1398319/spec/abci/abci%2B%2B_methods.md). This because it was not clear to me, and these variables are only used here on the text.
spec/abci/abci++_app_requirements.md
Outdated
In other words, if `Commit` for a block H has not been executed and CometBFT crashed, | ||
the application | ||
should not have persisted and relied on any state between the `Commit` for H - 1 and | ||
H. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second, is it not possible to have one of the fields updated (to H) but the other not?
spec/abci/abci++_app_requirements.md
Outdated
In other words, if `Commit` for a block H has not been executed and CometBFT crashed, | ||
the application | ||
should not have persisted and relied on any state between the `Commit` for H - 1 and | ||
H. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third, I agree with Lasaro here that the app could crash during the Commit
of height H and it will have the two fields updated to H state, although Comet will not be aware of this because the crash occurs before returning the control to Comet.
spec/abci/abci++_app_requirements.md
Outdated
``` | ||
APP: Execute block Persist state | ||
/ return ResultFinalizeBlock / | ||
/ / | ||
Event: ------------- block_stored -------------/ -------------state_stored ----------------/---app_persisted_state | ||
| / | / | | ||
CometBFT: Decides - Persistes block ---Call `FinalizeBlock` Persist results------------Commit---- | ||
on in the (txResults, validator | ||
Block block store updated,..) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
``` | |
APP: Execute block Persist state | |
/ return ResultFinalizeBlock / | |
/ / | |
Event: ------------- block_stored -------------/ -------------state_stored ----------------/---app_persisted_state | |
| / | / | | |
CometBFT: Decides - Persistes block ---Call `FinalizeBlock` Persist results------------Commit---- | |
on in the (txResults, validator | |
Block block store updated,..) |
APP: Execute block Persist application
/ return ResultFinalizeBlock / state
/ / |
Event: -------------- block_stored -----------/-------------state_stored -------------/---app_persisted_state
| / | /
CometBFT: Decide ---- Persist ----------- Call ------------- Persist ------------ Call Commit----
on Block block in the FinalizeBlock results
blockstore (txResults, validator updated,..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rendered weirdly :) I got the suggestion I think, but please confirm :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome improvement of the crash-recovery section, finally! Thanks for taking care of this!
Co-authored-by: Sergio Mena <sergio@informal.systems>
Closes #203 .