8000 Change the hash on the doc, to a hash on the genesis file itself. · Issue #1287 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Change the hash on the doc, to a hash on the genesis file itself. #1287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Tracked by #1037
sergio-mena opened this issue Aug 29, 2023 · 1 comment · Fixed by #1293
Closed
Tracked by #1037

Change the hash on the doc, to a hash on the genesis file itself. #1287

sergio-mena opened this issue Aug 29, 2023 · 1 comment · Fixed by #1293
Assignees
Labels
P:storage-optimization Priority: Give operators greater control over storage and storage optimization
Milestone

Comments

@sergio-mena
Copy link
Contributor
sergio-mena commented Aug 29, 2023

This is a follow-up issue from the PR #1017 submitted and merged

During the PR review, there were a few points highlighted to address some concerns the CometBFT team had regarding this particular PR. Since this PR was merged from a fork, a few of the suggestions by the CometBFT team to address the concerns were not possible to be accepted directly on the PR. So we merged the PR #1017 and will address these concerns in this issue.

Concerns to be addressed:

  1. The logic in PR 1017 uses the marshalled JSON bytes as the input for the hashing function. As discussed in the PR, this approach might cause some non-deterministic behaviours or it might be very error prone. JSON is not a canonical data representation, so any semantically insignificant variation, in the genesis file might result in a hash mismatch.
  2. The AppState part of the genesis is application dependent and CometBFT has no "understanding" on its internal data structures. In the GenesisDoc type, the AppState property is just a json.RawMessage so any variations in its content might cause side effects that are beyond CometBFT's control such as hash mismatches.
  3. In the current logic on PR 1017, if the node crashes before saving the genesis hash into storage some undesired side effects can happen. As per comment, the logic on this line should be changed by applying the suggestion in the comment.

Context on genesis file chain initialization and upgrades:

  1. Historically the way chain operators (node operators and validators) validate the genesis file is to compare a checksum obtained from a reference repository or a group of operators during a coordinated upgrade or new chain initialization. For example, during a chain upgrade, validators might export the current state via the CLI (e.g. gaiad export) and then they can generate a checksum of the exported genesis (based on a agreed hashing function (e.g. `sha256sum) and then they coordinate the upgrade by validating their own checksum against the "agreed" checksum of the majority of operators to ensure they all match in order to start the chain and achieve consensus. Therefore, implementing a logic that leverages the same hashing functionality on the same input used by operators is critical to ensure a reliable and trusted solution.
  2. Currently, as far as we know, there's no central repository or on-chain stored information about these genesis checksum . Historically, most chains maintain a mainnet repository somewhere in Github where they list the checksum on a page for each genesis upgrade, or the checksum might be part of the release information for a chain mainnet or upgrade in Github.

Suggested changes to be made to improve the solution:

  1. The main concern to be addressed is around the hashing functionality as outlined in the sections #1 and #2 above. In our opinion, relying on a hash of the marshalled JSON input (bytes) is not the best solution (as mentioned in the sections above). We believe that a better approach is to rely on the checksum of the genesis.json file content. By applying this logic, we use the same approach chain operators use to manually validate a the genesis file (perform a checksum of the file contents, e.g. $ sha256sum genesis.json). We believe operators will have more confidence in the solution since they can also have an external way to validate the hash information.
  2. [OPTIONAL] Add a way for operators to specify the hash (checksum) value when starting (or initializing) a node. It could be a CLI command parameter or allow them to add the value in the configuration file (e.g. genesis_hash = "b2fca9af31aae6953124439df5bc300c1ef11fb8be938aeb570767c5b6bfdaaf" that can be used to validate the hash computed by CometBFT when reading the genesis file. If the hashes don't match (computed = specified) then the node fails to start, if they match then store the hash in the database. This will ensure that they are confident on the hash that will be stored in the database. As part of the future planning, we can also return the hash information via RPC and this will make it possible for operators and other users to validate the hash information against peers or public RPC nodes and that would address point #6 above
### Tasks
- [ ] Create genesis hash using `sha`. Make sure it is properly saved into the DB.
@sergio-mena sergio-mena changed the title Change the (internal, json-based) hash on the doc, to one (likely sha256 or similar) on the genesis file itself. Change the hash on the doc, to a hash on the genesis file itself. Aug 29, 2023
@andynog andynog added the P:storage-optimization Priority: Give operators greater control over storage and storage optimization label Aug 29, 2023
@andynog andynog added this to the 2023-Q3 milestone Aug 29, 2023
@mzabaluev
Copy link
Contributor
mzabaluev commented Aug 30, 2023

I sense a wider issue about the source of truth for the chain state on genesis and how to validate it by external means.
If there is no precise, canonical representation for that in chain storage, hashing the JSON file is probably the best we can get. But as pointed out in #1017, someone can invalidate the genesis JSON by adding a newline, even if this changes nothing in the on-chain data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P:storage-optimization Priority: Give operators greater control over storage and storage optimization
Projects
No open projects
Status: Done
3 participants
0