10000 Fix txpool loop reset logic to support setHead by hqjang-pepper · Pull Request #1316 · klaytn/klaytn · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
This repository was archived by the owner on Aug 19, 2024. It is now read-only.

Fix txpool loop reset logic to support setHead #1316

Merged
merged 6 commits into from
Jul 4, 2022

Conversation

hqjang-pepper
Copy link
Contributor

Proposed changes

When setHead function is executed, panic has occured intermittently when syncing chain segments.
This error occured when txpool tries to reinject discarded transactions of blocks, but can't get block because blockchain is rewinded.
This PR is containing this error handling.

I tested enough with private network (4cn) environment.

Types of changes

Please put an x in the boxes related to your change.

  • Bugfix
  • New feature or enhancement
  • Others

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING GUIDELINES doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes ($ make test)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Related issues

  • Please leave the issue numbers or links related to this PR here.

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

@ehnuje
Copy link
Contributor
ehnuje commented Apr 12, 2022

Have you tried implementing a unit test for this case?

Copy link
Member
@aidan-kwon aidan-kwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this suggestion is proper mitigation for your issue. Can you explain the detail of your issue or share the panic log?
It seems that all behaviors of txpool as well as txpool.reset should be stopped during the SetHead. In that case, we can consider locking txpool mutex or other way.

@henry-will henry-will added the do not merge don't merge this PR yet label Apr 14, 2022
@henry-will henry-will modified the milestones: v1.8.3, v1.9.0 Apr 14, 2022
@henry-will henry-will removed the do not merge don't merge this PR yet label Apr 18, 2022
@hqjang-pepper
Copy link
Contributor Author

Sorry for the late response.
First, I will explain in detail the situation when the error occurs after executed the setHead method.
When performing TxPool.reset(oldHead,newHead)
oldHead is the head of the chain before setHead, and newHead is the head which is during syncing after completing setHead (arbitrary).
This newHead may have a number larger or smaller than the oldHead, which depends on how the downloader receives the chunk.
It is not a problem when oldHead is smaller than newHead, but when it is large, the block corresponding to oldHead does not exist in pool.chain, due to the block is deleted by setHead function.
There was no logic to handle this in the existing txpool.go code. This is because the situation which the header exists but the block does not exist does not occur when the chain operates normally.

@ehnuje I tried to write test code, but it was difficult to reproduce the situation where only the block header exists and block doesn't.

@aidan-kwon What I modified is handling of a special error that occurs only when the block corresponding to oldHead is nil when an event occurs in chainHeadCh. The reason that event does not occur in chainHeadCh during setHead is the downloader function is turned off while SetHead.
I don't think there is any logic to stop txpool with this code change.


// If newNum >= oldNum, then it's not a case of setHead.
if newNum >= oldNum {
logger.Warn("Transaction pool reset with missing oldhead",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an unexpected situation happens and stops something to do, the log should be ERROR.

Copy link
Member
@aidan-kwon aidan-kwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this PR is not harmful in normal situations. However, I don't think it is the correct mitigation for the setHead panic issue. I think we are supposed to find a way that prevents calling reset with no existing old blockhead. This change could be a partial solution to the issue, but it could conceal the real problem our code might have.
Can you explain more detail about the situation the empty oldHead is used?

@aidan-kwon aidan-kwon added the need to merge Need to merge for the next time label Jul 1, 8000 2022
aidan-kwon
aidan-kwon previously approved these changes Jul 4, 2022
Copy link
Member
@aidan-kwon aidan-kwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hqjang-pepper Please, make an issue for the following task

@hqjang-pepper hqjang-pepper merged commit 4a3164b into klaytn:dev Jul 4, 2022
@blukat29 blukat29 removed the need to merge Need to merge for the next time label Feb 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0