8000 spec:abci2.0 - clarify crash recovery mechanism by jmalicevic · Pull Request #469 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

spec:abci2.0 - clarify crash recovery mechanism #469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 17, 2023

Conversation

jmalicevic
Copy link
Contributor

Closes #203 .

@jmalicevic jmalicevic added abci Application blockchain interface spec Specification-related labels Mar 6, 2023
@jmalicevic jmalicevic added this to the 2023-Q1 milestone Mar 6, 2023
@jmalicevic jmalicevic requested a review from a team as a code owner March 6, 2023 12:13
@jmalicevic jmalicevic self-assigned this Mar 6, 2023
@jmalicevic jmalicevic requested a review from a team as a code owner March 6, 2023 12:13
Copy link
Contributor
@lasarojc lasarojc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking.

last block it successfully completed Commit for.
last block it successfully completed `Commit` for.

The application is expected to persist it's state during the `Commit` method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The application is expected to persist it's state during the `Commit` method.
The application is expected to persist its state during the `Commit` method.

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really part of this PR, but in line 824, below, it states that either Commit completed and last_block_height is now H or Commit failed and last_block_height is still H-1, but even if Commit crashed midway, it may have crashed after the advancing to H. So the restriction should be last_block_height and last_block_app_hash agree, not that they are agree on H-1.

Suggestion

If the app successfully committed block `H`, then `last_block_height = H` and `last_block_app_hash = <hash returned by Commit for block H>`. 
If the app failed during the Commit of block `H`, then either 
- `last_block_height = H-1` and `last_block_app_hash = <hash returned by Commit for block H-1, which is the hash in the header of block H>`; or, 
- `last_block_height = H` and `last_block_app_hash = &l
8000
t;hash returned by Commit for block H>`. 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would, first, refer that last_block_height and last_block_app_hash are the fields returned by the Info method (https://github.com/cometbft/cometbft/blob/7c7b48039453aa66fd6b128af6701651e1398319/spec/abci/abci%2B%2B_methods.md). This because it was not clear to me, and these variables are only used here on the text.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second, is it not possible to have one of the fields updated (to H) but the other not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third, I agree with Lasaro here that the app could crash during the Commit of height H and it will have the two fields updated to H state, although Comet will not be aware of this because the crash occurs before returning the control to Comet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the value assigned mid-commit before the crash persisted? As in, is the state not updated atomically (or at least saved)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope #493 can help us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the value assigned mid-commit before the crash persisted? As in, is the state not updated atomically (or at least saved)?

It must be the case that both values are updated during commit, very likely as the the last steps, and the problem is ensuring that both updates are part of one atomic operation, probably along with putting the final nail on the old's state coffin. Since we don't control that, as it happens on the app side, we must require the app to implement this behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The application typically uses something like leveldb. Correct me if I am wrong, but this kind of databases do not support transactions. So, while both writes are probably atomic, if a crash occurs between them the state can become inconsistent.

Copy link
Contributor
@sergio-mena sergio-mena Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main invariant is that the App state is always committed after consensus persists the results (returned by the app) of executing the block.
So the app and consensus "persist" actions don't need to be atomic. Upon recovery, the logic described later on in this section is used to replay anything the app has "forgotten" (this is figured out with the handshake after recovery, using Info method).

Copy link
Contributor
@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking++.

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would, first, refer that last_block_height and last_block_app_hash are the fields returned by the Info method (https://github.com/cometbft/cometbft/blob/7c7b48039453aa66fd6b128af6701651e1398319/spec/abci/abci%2B%2B_methods.md). This because it was not clear to me, and these variables are only used here on the text.

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second, is it not possible to have one of the fields updated (to H) but the other not?

In other words, if `Commit` for a block H has not been executed and CometBFT crashed,
the application
should not have persisted and relied on any state between the `Commit` for H - 1 and
H.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third, I agree with Lasaro here that the app could crash during the Commit of height H and it will have the two fields updated to H state, although Comet will not be aware of this because the crash occurs before returning the control to Comet.

Comment on lines 831 to 839
```
APP: Execute block Persist state
/ return ResultFinalizeBlock /
/ /
Event: ------------- block_stored -------------/ -------------state_stored ----------------/---app_persisted_state
| / | / |
CometBFT: Decides - Persistes block ---Call `FinalizeBlock` Persist results------------Commit----
on in the (txResults, validator
Block block store updated,..)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
```
APP: Execute block Persist state
/ return ResultFinalizeBlock /
/ /
Event: ------------- block_stored -------------/ -------------state_stored ----------------/---app_persisted_state
| / | / |
CometBFT: Decides - Persistes block ---Call `FinalizeBlock` Persist results------------Commit----
on in the (txResults, validator
Block block store updated,..)

APP: Execute block Persist application
/ return ResultFinalizeBlock / state
/ / |
Event: -------------- block_stored -----------/-------------state_stored -------------/---app_persisted_state
| / | /
CometBFT: Decide ---- Persist ----------- Call ------------- Persist ------------ Call Commit----
on Block block in the FinalizeBlock results
blockstore (txResults, validator updated,..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is rendered weirdly :) I got the suggestion I think, but please confirm :)

Copy link
Contributor
@sergio-mena sergio-mena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome improvement of the crash-recovery section, finally! Thanks for taking care of this!

@jmalicevic jmalicevic merged commit 19700ac into feature/abci++vef Mar 17, 2023
@jmalicevic jmalicevic deleted the jasmina/spec-crash-recovery branch March 17, 2023 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
abci Application blockchain interface spec Specification-related
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants
0