8000 [API] Fixed corrupt governance state after rewinding chain by hyunsooda · Pull Request #1966 · klaytn/klaytn · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
This repository was archived by the owner on Aug 19, 2024. It is now read-only.

[API] Fixed corrupt governance state after rewinding chain #1966

Merged
merged 20 commits into from
Oct 16, 2023

Conversation

hyunsooda
Copy link
Contributor
@hyunsooda hyunsooda commented Sep 12, 2023

Proposed changes

Added additional clear logics after debug.SetHead.

Problem Statement

The existing setHead implementation considered only for the block data such as receipts and transactions. This PR addresses governance (header and contract) and staking state.

What PR Does

---------------------
setHead (remove transaction, receipt, and block)
remove governance database
---------------------
Post-task
  1. Initiailize snapshot cache
  2. Initiailize governance cache and last state block number

Compared to two related PRs (#1186, #1756), this PR does not bridge additional implementation to fix this problem. I just defined the post-task and these utilize the existing implementation.

Cases

Case 1: header governance (+ cache)

  • Verification error fixed
  • Delete local database and memory cache

Case 2: contract governance

  • We don't need to put additional workaround because the contract-based governance state does not have a dependency on local MiscDB and in-memory cache.

Case 3: staking info DB (+ cache)

  • Delete local database and cache

Case 4: Other DBs (i.e., state DB, MiscDB, etc)

  • There seems not a strong motivation to remove existing state DB by the setHead call.

Bug fix

The original codebase contained a bug where a discrepancy arises between the blockchain state and the headerchain state when lost state data appears because blockchain goes further delete operation exceeding the origin target number which is the origin block number to be rewound. Fixed in this PR together.

Test

[Prepration]

epoch = 10
# of validators = 4

[Execution]

  1. Vote the following item at the beginning of the network. governance.vote("reward.mintingamount","123")
  2. klay.getHeader(5) contains non-empty vote data
  3. klay.getHeader(10) contains non-empty governance data
  4. Let's rewind 1,2,3,4,5,6,7,8,9,10
    ex) debug.setHead(7)
  5. Wait to be synced from connected peers
    • Before this PR: A bad block appears at the time of verifying block number 10, which contains non-empty governance data.
    • After this PR: Fixed
  6. Catch up the latest block

[Secanrio test]
(Preparation is the same with above)

  1. Vote at block 33
  2. Terminate a node at block 36
  3. Restart the node and wait to sync latest block number
  4. Failed to verify block 40 that contains non-empty governnace data
  5. call klay.setHead(30).
  6. Successfully catchup the latest block number
  7. (additional) call klay.setHead() with 10, 20, 30, 33, 35, 40, n works.

Types of changes

Please put an x in the boxes related to your change.

  • Bugfix
  • New feature or enhancement
  • Others

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING GUIDELINES doc
  • I have sign 8000 ed the CLA
  • Lint and unit tests pass locally with my changes ($ make test)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Related issues

  • Please leave the issue numbers or links related to this PR here.

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

Co-authored-by: Yunjong Jeong (ollie) <jeongyoonjong@gmail.com>
@hyunsooda hyunsooda self-assigned this Sep 18, 2023
@hyunsooda
Copy link
Contributor Author

As this change deletes memory cache and local database only, resync resolve it natrually. Not changed process and communication at consensus and governance layers, thus no hardfork seems required.

blukat29
blukat29 previously approved these changes Sep 19, 2023
@blukat29 blukat29 changed the title Fixed corrupt state after rewinding chain Fixed corrupt governance state after rewinding chain Sep 20, 2023
@hyunsooda
Copy link
Contributor Author

@blukat29, Updated the description of how to test. PTAL.

2dvorak
2dvorak previously approved these changes Sep 25, 2023
@blukat29 blukat29 marked this pull request as draft September 26, 2023 01:53
@hyunsooda hyunsooda dismissed stale reviews from 2dvorak and blukat29 via 48a5151 September 26, 2023 02:44
@hyunsooda hyunsooda marked this pull request as ready for review September 26, 2023 02:51
@hyunsooda
Copy link
Contributor Author

Given feedback, the updated version removes snapshot and staking info database and memory cache. Also, added an additional fix for the previously existing bug. The description updated. PTAL. Thank you very much.

@hyunsooda
Copy link
Contributor Author

@yoomee1313, I added the scenario test you provided in the description.

Also, during the scenario test, I found a bug, which is not related to the sethead implementation. Rather than fixing it together here, allow me to upload a dedicated one separately. Thanks.

@yoomee1313
Copy link
Contributor

Thanks @hyunsooda. I didn't know there was another issue..

Apart from that, I have tested next scenario and it works well with this PR.
Steps.

  1. Block 15: vote
  2. Block 30: governance block
  3. Block 34: call setHead(20)

Result.

  • dev
    bad block has been occured when processing governance block(30)
ERROR[10/06,14:14:25 +08] [30] Verification Error                        len(receivedChangeSet)=1 len(changeSet)=0
ERROR[10/06,14:14:25 +08] [30] Verification Error                        key=reward.ratio received=30/40/30 have=nil receivedType=string haveType=nil
WARN[10/06,14:14:25 +08] [48] Failed to load old bad blocks             error="data is not found with the given key"
ERROR[10/06,14:14:25 +08] [5] ########## BAD BLOCK ######### Chain config: {ChainID: 940625
  • PR
    bad block disappeared

@hyunsooda hyunsooda merged commit 9a68d71 into klaytn:dev Oct 16, 2023
@hyunsooda hyunsooda changed the title Fixed corrupt governance state after rewinding chain [API] Fixed corrupt governance state after rewinding chain Oct 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0