-
Notifications
You must be signed in to change notification settings - Fork 636
(*pex.addrBook).Wait()
never called, occasionally continues writing after (*node.Node).Wait()
returns
#646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Legit. For the solution I would propose something different. I think the change, which is needed, has to be made in the PEX reactor, which in fact controls the address book. It is up to the PEX reactor, to stop the address book in a proper way, waiting for the persistence routine to stop. More specifically and for context, the address book is created by the |
This bug is very likely to be observed on |
Closes #646, #652 <!-- Please add a reference to the issue that this PR addresses and indicate which files are most critical to review. If it fully addresses a particular issue, please include "Closes #XXX" (where "XXX" is the issue number). If this PR is non-trivial/large/complex, please ensure that you have either created an issue that the team's had a chance to respond to, or had some discussion with the team prior to submitting substantial pull requests. The team can be reached via GitHub Discussions or the Cosmos Network Discord server in the #cometbft channel. GitHub Discussions is preferred over Discord as it allows us to keep track of conversations topically. https://github.com/cometbft/cometbft/discussions If the work in this PR is not aligned with the team's current priorities, please be advised that it may take some time before it is merged - especially if it has not yet been discussed with the team. See the project board for the team's current priorities: https://github.com/orgs/cometbft/projects/1 --> --- #### PR checklist - [ ] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] ~~Updated relevant documentation (`docs/` or `spec/`) and code comments~~
Closes #646, #652 <!-- Please add a reference to the issue that this PR addresses and indicate which files are most critical to review. If it fully addresses a particular issue, please include "Closes #XXX" (where "XXX" is the issue number). If this PR is non-trivial/large/complex, please ensure that you have either created an issue that the team's had a chance to respond to, or had some discussion with the team prior to submitting substantial pull requests. The team can be reached via GitHub Discussions or the Cosmos Network Discord server in the #cometbft channel. GitHub Discussions is preferred over Discord as it allows us to keep track of conversations topically. https://github.com/cometbft/cometbft/discussions If the work in this PR is not aligned with the team's current priorities, please be advised that it may take some time before it is merged - especially if it has not yet been discussed with the team. See the project board for the team's current priorities: https://github.com/orgs/cometbft/projects/1 --> --- #### PR checklist - [ ] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] ~~Updated relevant documentation (`docs/` or `spec/`) and code comments~~ (cherry picked from commit 8a0a2f6) # Conflicts: # p2p/pex/pex_reactor.go
fix(p2p/pex): gracefully shutdown <
8000
a title="fix(p2p/pex): gracefully shutdown `Reactor` (#2010)
Closes https://github.com/cometbft/cometbft/issues/646,
https://github.com/cometbft/cometbft/issues/652
<!--
Please add a reference to the issue that this PR addresses and indicate
which
files are most critical to review. If it fully addresses a particular
issue,
please include "Closes #XXX" (where "XXX" is the issue number).
If this PR is non-trivial/large/complex, please ensure that you have
either
created an issue that the team's had a chance to respond to, or had some
discussion with the team prior to submitting substantial pull requests.
The team
can be reached via GitHub Discussions or the Cosmos Network Discord
server in
the #cometbft channel. GitHub Discussions is preferred over Discord as
it
allows us to keep track of conversations topically.
https://github.com/cometbft/cometbft/discussions
If the work in this PR is not aligned with the team's current
priorities, please
be advised that it may take some time before it is merged - especially
if it has
not yet been discussed with the team.
See the project board for the team's current priorities:
https://github.com/orgs/cometbft/projects/1
-->
---
#### PR checklist
- [ ] Tests written/updated
- [x] Changelog entry added in `.changelog` (we use
[unclog](https://github.com/informalsystems/unclog) to manage our
changelog)
- [ ] ~~Updated relevant documentation (`docs/` or `spec/`) and code
comments~~" data-pjax="true" class="Link--secondary markdown-title" href="/cometbft/cometbft/commit/8f5552b96bd047cec8aa8f9ea9805472c1433e82">Reactor
(#2010)
Closes #646, #652 <!-- Please add a reference to the issue that this PR addresses and indicate which files are most critical to review. If it fully addresses a particular issue, please include "Closes #XXX" (where "XXX" is the issue number). If this PR is non-trivial/large/complex, please ensure that you have either created an issue that the team's had a chance to respond to, or had some discussion with the team prior to submitting substantial pull requests. The team can be reached via GitHub Discussions or the Cosmos Network Discord server in the #cometbft channel. GitHub Discussions is preferred over Discord as it allows us to keep track of conversations topically. https://github.com/cometbft/cometbft/discussions If the work in this PR is not aligned with the team's current priorities, please be advised that it may take some time before it is merged - especially if it has not yet been discussed with the team. See the project board for the team's current priorities: https://github.com/orgs/cometbft/projects/1 --> --- #### PR checklist - [ ] Tests written/updated - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [ ] ~~Updated relevant documentation (`docs/` or `spec/`) and code comments~~
Bug Report
Setup
CometBFT version (use
cometbft version
orgit rev-parse --verify HEAD
if installed from source): v0.37.0Have you tried the latest version: no
ABCI app (name for built-in, URL for self-written if it's publicly available): simapp in Cosmos SDK
Environment:
go test
inside macOSnode command runtime flags:
Config
Default config.
What happened?
In the Cosmos SDK, I am working on a set of tests that run a real CometBFT
*node.Node
in-process along with a simapp instance.The tests defer
node.Stop(); node.Wait()
. Usually they pass.Occasionally, there are failures like:
testing.go:1206: TempDir RemoveAll cleanup: unlinkat /var/folders/.../TestCometStarter_PortContentionattempt_0278742771/001/config: directory not empty
. If I switch away from usingt.TempDir
so that I can inspect theconfig
directory which isn't empty, it always containsaddrbook.json
.If I print
debug.Stack() from inside
(*pex.addrBook).saveToFile, I see that the final call writing out
addrbook.jsonis the final
a.saveToFilecall in
(*pex.addrBook).saveRoutine`:cometbft/p2p/pex/addrbook.go
Lines 494 to 509 in 365b0a7
At the top of that function you will see a
defer a.wg.Done()
call. If we call(*pex.addrBook).Wait()
(which simply callsa.wg.Wait()
), then we would block untila.SaveRoutine
returned.However... nothing in the codebase calls
(*pex.addrBook).Wait()
. If you comment out the function,go test -short ./...
passes without any compilation errors.What did you expect to happen?
How to reproduce it
There are usually several successful runs before it fails.
Logs
n/a
dump_consensus_state
outputn/a
Anything else we need to know
The simplest solution looks like it would be:
pex.AddrBook
interface to include aWait()
method.(*pex.addrBook).Wait() to call
a.BaseService.Wait(); a.wg.Wait()`.Wait()
method to*node.Node
which callsn.BaseService.Wait(); n.addrBook.Wait()
Patching CometBFT v0.37.0 with those changes appears to fully prevent the non-empty temp directory test failure that I could previously reproduce.
The text was updated successfully, but these errors were encountered: