-
Notifications
You must be signed in to change notification settings - Fork 37.4k
blockstorage: Return on fatal flush errors #27866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about all the changes in this PR because it seems like some validation code is intentionally written to ignore flush errors, and now they are returning early and skipping other work. It also seems reasonable that some failed flushes like periodic flushes might not need to be fatal errors.
But adding return values and nodiscard attributes could at least make it clear if errors are being ignored intentionally or not.
360fe5a
to
1fe06a8
Compare
This is a good point, and I did not think of this before. A flush may fail once due to some sporadic error, but may succeed again at a later point in time. I'm still not sure though if it is really correct to ignore the flush error. If we fail to flush, but succeed in writing the block index, and then crash, we might have indexed a block, that has not been synced to disk. It also feels weird that we are currently okay with the block files failing to flush, while the call to |
At least, such a flush error can be logged. |
I'm not familiar enough with existing code to know why it's written the way it is. It could have a great rationale or just be weird like it appears to be. I'm less concerned about this PR than I am about #27861, which is treating flush errors as fatal errors when they are not inherently fatal. If code writing to disk fails it could either mean (1) state on disk is corrupt or invalid and can't be recovered from or (2) state on disk is valid but out of date and requires more work to catch up. These cases are inherently different and I think kernel API would benefit off having separate "flush error" and "fatal error" notifications to distinguish them, regardless of implementation details of current code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
The blockstorage changes seem pretty straightforward, the validation.cpp
one I'll need to dive deeper into since it's got quite a few callsites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed in IRC libbitcoinkernel updates:
<achow101> it looks like the current pr is #27866, although #27861 has also been getting review
IMO, this PR could be an improvement, but right now seems like a draft change. I think the main thing that's missing is a clear description of how it changes behavior, and why the behavior change is safe.
1fe06a8
to
d1c92b5
Compare
Rebased 1fe06a8 -> d1c92b5 (kernelReturnOnAbort_0 -> kernelReturnOnAbort_1, compare)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review d1c92b5. I think the extra error reporting here is great, and some of the changes in behavior seem good too. But I think there is too much happening in one commit. There are 3-4 separate changes in behavior that could be implemented in separate commits, and I think would be more clearly understood and evaluated that way.
src/node/blockstorage.cpp
Outdated
@@ -644,7 +650,9 @@ bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigne | |||
if (!fKnown) { | |||
LogPrint(BCLog::BLOCKSTORAGE, "Leaving block file %i: %s\n", m_last_blockfile, m_blockfile_info[m_last_blockfile].ToString()); | |||
} | |||
FlushBlockFile(!fKnown, finalize_undo); | |||
if (!FlushBlockFile(!fKnown, finalize_undo)) { | |||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "blockstorage: Return on fatal flush errors" (d1c92b5)
Let me know if I understand this correctly: The new return false
avoids writing block data to a new block file if the previous block file is full and cannot be synced and trimmed. This seems to be safe because this FindBlockPos
function only has one caller, SaveBlockToDisk
, which only has two callers, Chainstate::AcceptBlock
and Chainstate::LoadGenesisBlock
which both seem to handle this error by stopping what they are doing and writing "Failed to find position to write new block" log messages. I did not look up AcceptBlock
and LoadGenesisBlock
callers but assume they are handling this case reasonably.
I think existing log messages are vague / misleading, so would suggest adding a new log message here like:
LogPrintf("Failed to flush previous block file %05i (finalize=%i, finalize_undo=%i) before opening new block file %05i", m_last_blockfile, !fKnown, finalize_undo, nFile)`.
I'm not sure what the benefits are of returning false here. I think unlike the other changes in this commit, this change doesn't help avoid any "potential write from WriteBlockIndexDB that may refer to a block that is not fully flushed to disk yet", because this flush call is happening after previous block data has already been successfully written and the database has already been updated.
I think the clearest and most conservative thing to do would be to drop "return false" line in this commit, and only log an error message here. If you think returning false is a good idea, just do it in a followup commit decoupled from the other changes here with a commit message that explains why that is safe / desirable.
src/node/blockstorage.cpp
Outdated
} | ||
assert(static_cast<int>(m_blockfile_info.size()) > m_last_blockfile); | ||
|
||
FlatFilePos block_pos_old(m_last_blockfile, m_blockfile_info[m_last_blockfile].nSize); | ||
if (!BlockFileSeq().Flush(block_pos_old, fFinalize)) { | ||
m_opts.notifications.flushError("Flushing block file to disk failed. This is likely the result of an I/O error."); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "blockstorage: Return on fatal flush errors" (d1c92b5)
This skips trying to flush the undo file if flushing the block file fails. It think it would be safer and more conservative to keep previous behavior of flushing both files in sequence.
You can still return false if either flush call fails, but I don't see a benefit in changing behavior here and potentially making code less robust, when it's not necessary just to add a return code.
I think it would also make review simpler to get rid of this unnecessary change, since the other changes in this commit are already pretty complex.
src/node/blockstorage.cpp
Outdated
@@ -734,7 +742,9 @@ bool BlockManager::WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValid | |||
// with the block writes (usually when a synced up node is getting newly mined blocks) -- this case is caught in | |||
// the FindBlockPos function | |||
if (_pos.nFile < m_last_blockfile && static_cast<uint32_t>(block.nHeight) == m_blockfile_info[_pos.nFile].nHeightLast) { | |||
FlushUndoFile(_pos.nFile, true); | |||
if (!FlushUndoFile(_pos.nFile, true)) { | |||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "blockstorage: Return on fatal flush errors" (d1c92b5)
This change seems fragile to me, and it's unclear what the benefits are. I think it would be great to add a log print here, but if this this code is going to be changed to return false, I think that should happen in a separate followup commit, not combined with other changes here, and have a clear explanation and justification.
IIUC, this change introduces an inconsistency. For all block data, and for vast majority of undo data, the data is considered to be successfully written if the write call succeeeds regardless of whether syncing and trimming succeeds later. But now with this change, for some small fraction of undo block data, the data will be considered not written if the syncing or trimming fails. And the fraction of blocks this will be true for will be essentially random due to the undo flush heuristic (described in a new comment from #27746)
I understand that your goal here is to have libbitcoinkernel functions return more complete error information. I understand that goal is easier to achieve by changing behavior of validation and block storage code and returning early in certain cases. But fundamentally it is not necessary to change behavior of any existing code to have it return more error information. Without returning early, you can always accumulate errors and just return the error code at the end.
If returning early makes sense, I think it should be justified in its own terms and done carefully in smaller commits.
src/validation.cpp
Outdated
@@ -2505,7 +2505,9 @@ bool Chainstate::FlushStateToDisk( | |||
LOG_TIME_MILLIS_WITH_CATEGORY("write block and undo data to disk", BCLog::BENCH); | |||
|
|||
// First make sure all block and undo data is flushed to disk. | |||
m_blockman.FlushBlockFile(); | |||
if (!m_blockman.FlushBlockFile()) { | |||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "blockstorage: Return on fatal flush errors" (d1c92b5)
This line appears to implement the major change in behavior which is avoiding the "write from
WriteBlockIndexDB
that may refer to a block that is not fully flushed to disk yet."
I wonder what this failure looks like in practice, though. Could we write a test for it? I think it would at least make sense to have a log print that explains the problem that's happening.
There are also a lot of callers to FlushStateToDisk
and I haven't looked at all of them to determine if they are handling this correctly. Can you describe what they should be doing, and do you know if they actually are doing it? The commit message says "no extra logic for functions calling Chainstate::FlushStateToDisk
is required" mentioning an "abort error", but there is no abort error here, only a "return false" so I'm not sure how this is supposed to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what this failure looks like in practice, though. Could we write a test for it?
I could not come up with a way to test this with our existing code. If anybody has any pointers for how to achieve this, I'd happy to give it a shot at implementing the test.
Can you describe what they should be doing, and do you know if they actually are doing it?
Callers of FlushStateToDisk
already need to handle failures of the block index writing, so I don't think they require extra logic for handling block file flushing failures. We already have places in the code where we ignore flushing errors (including the block index writing), so I think in places where we are fine with that failure, we should also be fine with the block file flushing failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #27866 (comment)
I think this return false
should be removed and replaced by a log print and comment, at least in this PR. Maybe a followup PR could try to change behavior in a bigger way, but it's not clear here that the intended change of returning false is here actually good, and that there aren't other unintended effects.
If a FlatFileSeq::Flush
call fails, effectively meaning that an fflush
or ftruncate
call failed on block file (or undo file after the next commit), it's not clear to me that the new behavior of shutting down right away without trying to save state to disk is better than the old behavior of updating the block index, pruning old block files (which may free up disk space), and updating the coins database. This appears to be the intended change, and it does not seem to be a clear win. In general if there is some low level filesystem error which might be transient or spurious, it seems better to me for an application to handle it by at least trying to save its state than to just give up immediately and throw away data that's still in memory.
I'm also concerned about possible unintended effects of this change because FlushStateToDisk is called so many places, that it's not clear what other effects are outside of this function if it now returns false instead of true.
I like all the other changes in this PR of adding more return information and adding comments explaining error how errors are handled. And I wouldn't be even be surprised if it turns out the new behavior here is actually better than the old behavior in relevant cases. But I think if behavior is going to change, it should really be motivated by something specific like a bug report or test case, and ideally it should happen in a separate PR that doesn't mostly consist of code cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I think it is fine to change the behavior here but also don't mind if it's done in a follow-up. To me this just seems like a weird quirk given that we do return a fatal error for other flush operations.
I think, a failed flush opens the node up to be crashed in any case, e.g.:
bitcoin/src/net_processing.cpp
Lines 2217 to 2219 in 5aa67eb
if (!m_chainman.m_blockman.ReadBlockFromDisk(*pblockRead, *pindex)) { | |
assert(!"cannot load block from disk"); | |
} |
Cleanly shutting down seems preferable.
Concept ACK. |
Putting this back to draft until I've invested the time to properly address @ryanofsky's comments. |
d1c92b5
to
f7072be
Compare
Thank you for the excellent review @ryanofsky. Updated d1c92b5 -> 5720741 (kernelReturnOnAbort_1 -> kernelReturnOnAbort_2, compare)
|
f7072be
to
5720741
Compare
5720741
to
7721ab9
Compare
Rebased 5720741 -> 7721ab9 (kernelReturnOnAbort_2 -> kernelReturnOnAbort_3, compare)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 7721ab9
The changes in blockstorage.cpp
and blockstorage.h
are strict improvements, increasing visibility in flushing failures. I found the change in validation.cpp
much more difficult to review, given the wide variety of ways in which FlushStateToDisk()
is used. I've found that:
- it improves behaviour in some ways. For example, if
FlushBlockFile()
fails and theWriteBlockIndexDB()
call still succeeds, we can end up in a situation where we expect to have a block, but don't actually have the data on disk. I found this quite easy to replicate by modifyingFlushBlockFile()
to a no-op, pkill bitcoind during IBD, and then callgetblock
on the latest synced block. This PR fixes that. - I don't see how it can degrade behaviour:
- logically, it doesn't make sense to flush the block index and chainstate if the block data may be missing
- we already return early in
FlushStateToDisk()
for flushing failures, just not forFlushBlockFile()
- I don't think this can affect consensus, e.g. the
BlockValidationState &state
out-parameter is unaffected;AcceptBlock()
is unaffected - this code path is only reached when
fDoFullFlush || fPeriodicWrite
, which pretty much happens only every hour, when we shutdown, or when we exceed cache sizes.
(thanks @dergoegge for helping me understand the latter two points)
{ | ||
bool success = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can be declared closer to where it's used
src/node/blockstorage.cpp
Outdated
// a reindex. A flush error might also leave some of the data files | ||
// untrimmed. | ||
if (!FlushBlockFile(!fKnown, finalize_undo)) { | ||
E54B | LogPrintf("Failed to flush previous block file %05i (finalize=%i, finalize_undo=%i) before opening new block file %05i\n", m_last_blockfile, !fKnown, finalize_undo, nFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: long line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review 7721ab9. Thanks for splitting up the commits and adding helpful log prints and comments. I like everything in this PR except the new return false
after FlushBlockFile()
fails in Chainstate::FlushStateToDisk
. I think stickies-v makes the best possible case for it here: #27866 (review), but the test mentioned there of commenting out the block flush code entirely seems not very realistic, and none of the other cleanup in the PR requires this change in behavior, so I would really like to see it moved to another PR and justified on its own terms if it is a change worth making.
src/validation.cpp
Outdated
@@ -2505,7 +2505,9 @@ bool Chainstate::FlushStateToDisk( | |||
LOG_TIME_MILLIS_WITH_CATEGORY("write block and undo data to disk", BCLog::BENCH); | |||
|
|||
// First make sure all block and undo data is flushed to disk. | |||
m_blockman.FlushBlockFile(); | |||
if (!m_blockman.FlushBlockFile()) { | |||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #27866 (comment)
I think this return false
should be removed and replaced by a log print and comment, at least in this PR. Maybe a followup PR could try to change behavior in a bigger way, but it's not clear here that the intended change of returning false is here actually good, and that there aren't other unintended effects.
If a FlatFileSeq::Flush
call fails, effectively meaning that an fflush
or ftruncate
call failed on block file (or undo file after the next commit), it's not clear to me that the new behavior of shutting down right away without trying to save state to disk is better than the old behavior of updating the block index, pruning old block files (which may free up disk space), and updating the coins database. This appears to be the intended change, and it does not seem to be a clear win. In general if there is some low level filesystem error which might be transient or spurious, it seems better to me for an application to handle it by at least trying to save its state than to just give up immediately and throw away data that's still in memory.
I'm also concerned about possible unintended effects of this change because FlushStateToDisk is called so many places, that it's not clear what other effects are outside of this function if it now returns false instead of true.
I like all the other changes in this PR of adding more return information and adding comments explaining error how errors are handled. And I wouldn't be even be surprised if it turns out the new behavior here is actually better than the old behavior in relevant cases. But I think if behavior is going to change, it should really be motivated by something specific like a bug report or test case, and ideally it should happen in a separate PR that doesn't mostly consist of code cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 7721ab9
src/validation.cpp
Outdated
@@ -2505,7 +2505,9 @@ bool Chainstate::FlushStateToDisk( | |||
LOG_TIME_MILLIS_WITH_CATEGORY("write block and undo data to disk", BCLog::BENCH); | |||
|
|||
// First make sure all block and undo data is flushed to disk. | |||
m_blockman.FlushBlockFile(); | |||
if (!m_blockman.FlushBlockFile()) { | |||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw I think it is fine to change the behavior here but also don't mind if it's done in a follow-up. To me this just seems like a weird quirk given that we do return a fatal error for other flush operations.
I think, a failed flush opens the node up to be crashed in any case, e.g.:
bitcoin/src/net_processing.cpp
Lines 2217 to 2219 in 5aa67eb
if (!m_chainman.m_blockman.ReadBlockFromDisk(*pblockRead, *pindex)) { | |
assert(!"cannot load block from disk"); | |
} |
Cleanly shutting down seems preferable.
Putting this back in draft while I address the change requests here. |
A false return value indicates a fatal error (disk space being too low), so make sure we always consume this error code. This commit is part of an ongoing process for making the handling of fatal errors more transparent and easier to understand.
By returning an error code if `FlushBlockFile` fails, the caller now has to explicitly handle block file flushing errors. Before this change such errors were non-explicitly ignored without a clear rationale. Prior to this patch `FlushBlockFile` may have failed silently in `Chainstate::FlushStateToDisk`. Improve this with a log line. Also add a TODO comment to flesh out whether returning early in the case of an error is appropriate or not. Returning early might be appropriate to prohibit `WriteBlockIndexDB` from writing a block index entry that does not refer to a fully flushed block. Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`. Don't change the abort behavior there, since we don't want to fail the function if the flushing of already written blocks fails. Instead, just document it.
By returning an error code if either `FlushUndoFile` or `FlushBlockFile` fail, the caller now has to explicitly handle block undo file flushing errors. Before this change such errors were non-explicitly ignored without a clear rationale. Besides the call to `FlushUndoFile` in `FlushBlockFile`, ignore its return code at its call site in `WriteUndoDataForBlock`. There, a failed flush of the undo data should not be indicative of a failed write. Add [[nodiscard]] annotations to `FlushUndoFile` such that its return value is not just ignored in the future.
7721ab9
to
d8041d4
Compare
Updated 7721ab9 -> d8041d4 (kernelReturnOnAbort_3 -> kernelReturnOnAbort_4, compare)
|
re-ACK d8041d4 I was okay with the validation logic change that we had before, but as I don't have a lot of familiarity with the complexities of |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK d8041d4. Thanks for making suggested changes. This is now an easier PR to review since it is only adding new return codes, not changing existing ones, and no longer changing the way flush errors are handled other than by adding log prints.
If I could make one more suggestion, it would be to switch the order of the second and third commits so the FlushBlockFile
return value doesn't get added with some initial behavior in the second commit (f0207e0) which is later modified in the third commit (d8041d4). If the commit order were switched, the FlushBlockFile
return value could just be determined one way with no code churn.
It may be a good idea to follow up this PR with more changes that change error handling beyond adding log prints and return codes, but since FlushStateToDisk is called so many places, I think changes like these are harder to review, so it would be better for them to happen in a dedicated PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK d8041d4
There should be no change in behavior other than logging here. Just to remind my self why, the reason is that it mostly adds new return values and doesn't change existing ones. The only case where it does change a return value is in FlushBlockFile
in the third commit. But there is only one call to FlushBlockFile
, and if that fails it just triggers a log print.
d8041d4 blockstorage: Return on fatal undo file flush error (TheCharlatan) f0207e0 blockstorage: Return on fatal block file flush error (TheCharlatan) 5671c15 blockstorage: Mark FindBlockPos as nodiscard (TheCharlatan) Pull request description: The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site. Prior to this patch `FlushBlockFile` may have failed without returning in `Chainstate::FlushStateToDisk`, leading to a potential write from `WriteBlockIndexDB` that may refer to a block that is not fully flushed to disk yet. By returning if either `FlushUndoFile` or `FlushBlockFile` fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add `[[nodiscard]]` annotations to them such that they are not ignored in future. Functions that call either `FlushUndoFile` or `FlushBlockFile`, need to handle these extra abort cases properly. Since `Chainstate::FlushStateToDisk` already produces an abort error in case of `WriteBlockIndexDB` failing, no extra logic for functions calling `Chainstate::FlushStateToDisk` is required. Besides `Chainstate::FlushStateToDisk`, `FlushBlockFile` is also called by `FindBlockPos`, while `FlushUndoFile` is only called by `FlushBlockFile` and `WriteUndoDataForBlock`. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases. --- This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request [comment](bitcoin#27711 (comment)). For ease of review of these critical changes, a first step would be checking that `AbortNode` leads to early and error-conveying returns at its call site. Further work for enforcing returns when `AbortNode` is called is done in bitcoin#27862. ACKs for top commit: stickies-v: re-ACK d8041d4 ryanofsky: Code review ACK d8041d4 Tree-SHA512: 47ade9b873b15e567c8f60ca538d5a0daf32163e1031be3212a3a45eb492b866664b225f2787c9e40f3e0c089140157d8fd1039abc00c7bdfeec1b52ecd7e219
The goal of this PR is to establish that fatal blockstorage flush errors should be treated as errors at their call site.
Prior to this patch
FlushBlockFile
may have failed without returning inChainstate::FlushStateToDisk
, leading to a potential write fromWriteBlockIndexDB
that may refer to a block that is not fully flushed to disk yet. By returning if eitherFlushUndoFile
orFlushBlockFile
fail, we ensure that no further write operations take place that may lead to an inconsistent database when crashing. Add[[nodiscard]]
annotations to them such that they are not ignored in future.Functions that call either
FlushUndoFile
orFlushBlockFile
, need to handle these extra abort cases properly. SinceChainstate::FlushStateToDisk
already produces an abort error in case ofWriteBlockIndexDB
failing, no extra logic for functions callingChainstate::FlushStateToDisk
is required.Besides
Chainstate::FlushStateToDisk
,FlushBlockFile
is also called byFindBlockPos
, whileFlushUndoFile
is only called byFlushBlockFile
andWriteUndoDataForBlock
. For both these cases, the flush error is not further bubbled up. Instead, the error is logged and a comment is provided why bubbling up an error would be less desirable in these cases.This pull request is part of a larger effort towards improving the shutdown / abort / fatal error handling in validation code. It is a first step towards implementing proper fatal error return type enforcement similar as proposed by theuni in this pull request comment. For ease of review of these critical changes, a first step would be checking that
AbortNode
leads to early and error-conveying returns at its call site. Further work for enforcing returns whenAbortNode
is called is done in #27862.