-
Notifications
You must be signed in to change notification settings - Fork 37.4k
wallet: fix deadlock in bdb read write operation #27556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look closer at the changes yet, but can confirm that running the newly introduced unit test via
$ ./src/test/test_bitcoin --log_level=all --run_test=walletdb_tests/walletdb_read_write_deadlock
succeeds on the PR branch head and hangs if the bugfix commit bda562d is reverted.
ACK d3419f4 |
Backport shouldn't be required as I don't think there's any user action that can trigger the deadlock. |
d3419f4
to
de202f8
Compare
Updated per feedba 8000 ck. Thanks achow101. Only diff is |
ACK de202f8 |
Concept ACK. |
Does this PR makes bitcoin/test/sanitizer_suppressions/tsan Line 28 in 137a98c
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK de202f8.
@hebasto hmm, interesting. It seems to be outdated in master. Just ran the test suite there, without the suppression, and it passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK de202f8, tested on Ubutnu 23.04.
To replicate it, need bdb support and drop the first commit. The test will run forever.
The test is deadlocked with the second commit dropped as well.
…yCursor If the batch ptr is not passed, the cursor will not use the db active txn context which could lead to a deadlock if the code tries to modify the db while it is traversing it. E.g. the 'EraseRecords()' function.
so we erase all the records atomically or abort the entire procedure. and, at the same time, we can share the same db txn context for the db cursor and the erase functionality. extra note from the Db.cursor doc: "If transaction protection is enabled, cursors must be opened and closed within the context of a transaction" thus why added a `CloseCursor` call before calling to `TxnAbort/TxnCommit`.
de202f8
to
69d4390
Compare
rebased, conflicts solved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 69d4390
ACK 69d4390 |
69d4390 test: add coverage for wallet read write db deadlock (furszy) 12daf6f walletdb: scope bdb::EraseRecords under a single db txn (furszy) 043fcb0 wallet: bugfix, GetNewCursor() misses to provide batch ptr to BerkeleyCursor (furszy) Pull request description: Decoupled from bitcoin#26644 so it can closed in favor of bitcoin#26715. Basically, with bdb 6D40 , we can't make a write operation while we are traversing the db with the same db handler. These two operations are performed in different txn contexts and cause a deadlock. Added coverage by using `EraseRecords()` which is the simplest function that executes this process. To replicate it, need bdb support and drop the first commit. The test will run forever. ACKs for top commit: achow101: ACK 69d4390 hebasto: re-ACK 69d4390 Tree-SHA512: b3773be78925f674e962f4a5c54b398a9d0cfe697148c01c3ec0d68281cc5c1444b38165960d219ef3cf1a57c8ce6427f44a876275958d49bbc0808486e19d7d
Basically, with bdb, we can't make a write operation while we are traversing the db with the same db handler. These two operations are performed in different txn contexts and cause a deadlock.
Added coverage by using
EraseRecords()
which is the simplest function that executes this process.To replicate it, need bdb support and drop the first commit. The test will run forever.