8000 refactor(consensus): separate sending block parts from "pick part" logic by cason · Pull Request #2663 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

refactor(consensus): separate sending block parts from "pick part" logic #2663

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Apr 17, 2024

Conversation

cason
Copy link
Contributor
@cason cason commented Mar 20, 2024

This was done by @ebuchman as part of a refactoring spree :-)

The goal of this is to improve testability of gossipDataRoutine by separating out the logic for picking which block part to send. We introduced a new pure function, pickPartToSend, which returns a block part, and then call the new SendPartSetHasPart method to send it and update the peer state. This greatly simplifies the loop in gossipDataRoutine, and allows the pickPartToSend function to be independently tested.

Note for reviewers:


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments
  • Title follows the Conventional Commits spec

@cason cason changed the title WIP: Bucky/gossip data extract io refactor(consensus): separate sending block parts from "pick part" logic Mar 20, 2024
@cason cason marked this pull request as ready for review March 20, 2024 15:36
@cason cason requested review from a team as code owners March 20, 2024 15:36
@cason cason assigned cason and ebuchman and unassigned cason Mar 20, 2024
Co-authored-by: Daniel <daniel.cason@informal.systems>
Base automatically changed from sergio/gossip-votes-extract-io to main March 20, 2024 16:39
Copy link
Contributor
@melekes melekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this ❤️

@adizere adizere modified the milestone: 2024-Q1 Mar 25, 2024
// continue the loop since prs is a copy and not affected by this initialization
return nil, true // continue OUTER_LOOP
}
part := pickPartForCatchup(heightLogger, rs, prs, blockStore)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original code, if we don't have the part to send, we:

  1. sleep: time.Sleep(conR.conS.config.PeerGossipSleepDuration)
  2. continue OUTER_LOOP (we do that in any case, after calling gossipDataForCatchup)

If we have the part, we continue to OUTER_LOOP. If we don't, we don't continue.

Different behavior again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a different behavior. As commented for line 727, if sending the part to the peer fails, the main routine should sleep. This is achieved with continueLoop = false.

Copy link
Contributor Author
@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should double-check some possible changes in operation.

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Apr 14, 2024
@cason cason added wip Work in progress and removed stale For use by stalebot labels Apr 15, 2024
Copy link
Contributor Author
@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me and @sergio-mena have discussed the differences in behavior of the original and the refactored code.

The conclusions were added as a comments. A minor refactoring is required, these changes will come in a following commit.

}) {
ps.SetHasProposalBlockPart(prs.Height, prs.Round, index)
}
continue OUTER_LOOP
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observe that we continue even though the send operation fails (i.e., the BlockPart messages is not sent to the peer). Although this behavior can be discussed, this PR is not intended to change the existing behavior, but just to refactor the implementation.

} else {
logger.Debug("Sending block part for catchup failed")
// sleep to avoid retrying too fast
time.Sleep(conR.conS.config.PeerGossipSleepDuration)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to what reported in the previous comment (line 607), if the send operation does not succeed s (i.e., the BlockPart message is not sent to the peer), the routine sleeps for config.PeerGossipSleepDuration before this method returns. After the call of this method (gossipDataForCatchup), on line 629, the main routine is started again (continue OUTER_LOOP).

When mapped to the new, refactored behavior, this means that the routine does not continue, sleeping at the end of the for loop. If the send operation does succeed, the main routine does continue, i.e., the for loop is restarted without a previous sleep step.

// continue the loop since prs is a copy and not affected by this initialization
return nil, true // continue OUTER_LOOP
}
part := pickPartForCatchup(heightLogger, rs, prs, blockStore)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a different behavior. As commented for line 727, if sending the part to the peer fails, the main routine should sleep. This is achieved with continueLoop = false.

}
// continue the loop since prs is a copy and not effected by this initialization
if part, continueLoop := pickPartToSend(logger, conR.conS.blockStore, rs, ps, prs); part != nil {
if ps.SendPartSetHasPart(part, prs) {
continue OUTER_LOOP
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If sending the part fails, we have two different behaviors:

  1. In line 607, as detailed in the commented there anchored, we should continue (i.e., restart the loop without sleeping)
  2. In line 727, as detailed in the commented there anchored, we should not continue (i.e, continue the loop, thus sleeping at the end of the loop)

This implementation does not allow for this distinction. In fact, in the absence of an else clause, we only implement the behavior 2.

F438
@cason cason added this pull request to the merge queue Apr 17, 2024
Merged via the queue into main with commit 163f2d5 Apr 17, 2024
35 checks passed
@cason cason deleted the bucky/gossip-data-extract-io branch April 17, 2024 07:41
@melekes
Copy link
Contributor
melekes commented May 9, 2024

@mergify backport v1.x

Copy link
Contributor
mergify bot commented May 9, 2024

backport v1.x

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request May 9, 2024
…gic (#2663)

This was done by @ebuchman as part of a refactoring spree :-)

The goal of this is to improve testability of `gossipDataRoutine` by
separating out the logic for picking which block part to send. We
introduced a new pure function, `pickPartToSend`, which returns a block
part, and then call the new `SendPartSetHasPart` method to send it and
update the peer state. This greatly simplifies the loop in
`gossipDataRoutine`, and allows the `pickPartToSend` function to be
independently tested.

Note for reviewers:
* This PR is built on top of #2659
* This is pure refactoring, no logic change was intended.
* This PR can be reviewed commit by commit

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec

---------

Co-authored-by: Sergio Mena <sergio@informal.systems>
Co-authored-by: Ethan Buchman <ethan@coinculture.info>
(cherry picked from commit 163f2d5)

# Conflicts:
#	internal/consensus/reactor.go
@melekes
Copy link
Contributor
melekes commented May 9, 2024

@mergify backport v1.x

Copy link
Contributor
mergify bot commented May 9, 2024

backport v1.x

✅ Backports have been created

melekes added a commit that referenced this pull request May 9, 2024
…gic (backport #2663) (#3046)

This was done by @ebuchman as part of a refactoring spree :-)

The goal of this is to improve testability of `gossipDataRoutine` by
separating out the logic for picking which block part to send. We
introduced a new pure function, `pickPartToSend`, which returns a block
part, and then call the new `SendPartSetHasPart` method to send it and
update the peer state. This greatly simplifies the loop in
`gossipDataRoutine`, and allows the `pickPartToSend` function to be
independently tested.

Note for reviewers:
* This PR is built on top of #2659
* This is pure refactoring, no logic change was intended.
* This PR can be reviewed commit by commit

---

#### PR checklist

- [ ] Tests written/updated
- [ ] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] Updated relevant documentation (`docs/` or `spec/`) and code
comments
- [x] Title follows the [Conventional
Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec
<hr>This is an automatic backport of pull request #2663 done by
[Mergify](https://mergify.com).

---------

Co-authored-by: Daniel <daniel.cason@informal.systems>
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus wip Work in progress
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants
0