-
Notifications
You must be signed in to change notification settings - Fork 37.4k
Return EXIT_FAILURE on post-init fatal errors #27708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was wondering if this was attempted before and skimmed through the pull request history, but I did not find prior attempts.
2247ba1
to
d44dd73
Compare
Feedback tackled. Thanks TheCharlatan!. Changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Regarding the conclusion in the first commit ("'StartShutdown' should only be used for user requested shutdowns. Internal errors that cause a shutdown should use 'AbortNode'."), the following is probably another candidate where AbortNode
is preferred over StartShutdown
?
Lines 33 to 41 in 8cc65f0
template <typename... Args> | |
static void FatalError(const char* fmt, const Args&... args) | |
{ | |
std::string strMessage = tfm::format(fmt, args...); | |
SetMiscWarning(Untranslated(strMessage)); | |
LogPrintf("*** %s\n", strMessage); | |
InitError(_("A fatal internal error occurred, see debug.log for details")); | |
StartShutdown(); | |
} |
(not saying it necessarily has to happen in this PR though, if it makes sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the conclusion in the first commit ("'StartShutdown' should only be used for user requested shutdowns. Internal errors that cause a shutdown should use 'AbortNode'."), the following is probably another candidate where
AbortNode
is preferred overStartShutdown
?
Yeah absolutely. Good eye. If you compare it in-detail, that function is a plain copy of AbortNode
: it (1) calls SetMiscWarning
, (2) logs the error, (3) notifies the UI by calling InitError
(noui listeners as well) and (4) calls to the shutdown function..
So, we can just get rid of it.
Oh you're right, didn't look at |
d44dd73
to
292a610
Compare
done, cherry-picked 🚜. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK d44dd73. But it would be nice to extend this change to work with bitcoin-qt as well. I also think it would be good to apply the following changes, which replace the bool shutdown_due_error
global with an int exit_status{EXIT_SUCCESS}
NodeContext
member.
diff
diff --git a/src/bitcoind.cpp b/src/bitcoind.cpp
index c305d5e6cdc1..0a2a6ed4ebc2 100644
--- a/src/bitcoind.cpp
+++ b/src/bitcoind.cpp
@@ -169,6 +169,9 @@ static bool AppInit(NodeContext& node, int argc, char* argv[])
// Set this early so that parameter interactions go to console
InitLogging(args);
InitParameterInteraction(args);
+ if (!InitShutdownState(node.exit_status)) {
+ return InitError(Untranslated("Initializing wait-for-shutdown state failed."));
+ }
if (!AppInitBasicSetup(args)) {
// InitError will have been called with detailed error, which ends up on console
return false;
@@ -236,11 +239,6 @@ static bool AppInit(NodeContext& node, int argc, char* argv[])
}
#endif
SetSyscallSandboxPolicy(SyscallSandboxPolicy::SHUTOFF);
- if (fRet) {
- fRet = WaitForShutdown();
- }
- Interrupt(node);
- Shutdown(node);
return fRet;
}
@@ -264,5 +262,12 @@ MAIN_FUNCTION
// Connect bitcoind signal handlers
noui_connect();
- return (AppInit(node, argc, argv) ? EXIT_SUCCESS : EXIT_FAILURE);
+ if (AppInit(node, argc, argv)) {
+ WaitForShutdown();
+ } else {
+ node.exit_status = EXIT_FAILURE;
+ }
+ Interrupt(node);
+ Shutdown(node);
+ return node.exit_status;
}
diff --git a/src/init.cpp b/src/init.cpp
index 1ae21ec8d2dd..c4b09eb2373c 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -812,9 +812,6 @@ bool AppInitBasicSetup(const ArgsManager& args)
// Enable heap terminate-on-corruption
HeapSetInformation(nullptr, HeapEnableTerminationOnCorruption, nullptr, 0);
#endif
- if (!InitShutdownState()) {
- return InitError(Untranslated("Initializing wait-for-shutdown state failed."));
- }
if (!SetupNetworking()) {
return InitError(Untranslated("Initializing networking failed."));
diff --git a/src/node/context.h b/src/node/context.h
index 84f4053c8407..b24f778f4f64 100644
--- a/src/node/context.h
+++ b/src/node/context.h
@@ -7,7 +7,9 @@
#include <kernel/context.h>
+#include <atomic>
#include <cassert>
+#include <cstdlib>
#include <functional>
#include <memory>
#include <vector>
@@ -62,6 +64,7 @@ struct NodeContext {
interfaces::WalletLoader* wallet_loader{nullptr};
std::unique_ptr<CScheduler> scheduler;
std::function<void()> rpc_interruption_point = [] {};
+ std::atomic<int> exit_status{EXIT_SUCCESS};
//! Declare default constructor and destructor that are not inline, so code
//! instantiating the NodeContext struct doesn't need to #include class
diff --git a/src/shutdown.cpp b/src/shutdown.cpp
index 72f23be21ca0..7641ee3b066a 100644
--- a/src/shutdown.cpp
+++ b/src/shutdown.cpp
@@ -11,6 +11,7 @@
#include <logging.h>
#include <node/interface_ui.h>
+#include <util/check.h>
#include <util/tokenpipe.h>
#include <warnings.h>
@@ -21,8 +22,7 @@
#endif
static std::atomic<bool> fRequestShutdown(false);
-// Whether shutdown was requested by the user or because of an internal error
-static std::atomic<bool> shutdown_due_error{false};
+static std::atomic<int>* g_exit_status{nullptr};
#ifdef WIN32
/** On windows it is possible to simply use a condition variable. */
@@ -35,8 +35,9 @@ static TokenPipeEnd g_shutdown_r;
static TokenPipeEnd g_shutdown_w;
#endif
-bool InitShutdownState()
+bool InitShutdownState(std::atomic<int>& exit_status)
{
+ g_exit_status = &exit_status;
#ifndef WIN32
std::optional<TokenPipe> pipe = TokenPipe::Make();
if (!pipe) return false;
@@ -46,9 +47,8 @@ bool InitShutdownState()
return true;
}
-static void StartShutdown(bool error)
+void StartShutdown()
{
- if (error) shutdown_due_error.exchange(error);
#ifdef WIN32
std::unique_lock<std::mutex> lk(g_shutdown_mutex);
fRequestShutdown = true;
@@ -68,8 +68,6 @@ static void StartShutdown(bool error)
#endif
}
-void StartShutdown() { return StartShutdown(/*error=*/false); }
-
bool AbortNode(const std::string& strMessage, bilingual_str user_message)
{
SetMiscWarning(Untranslated(strMessage));
@@ -78,7 +76,8 @@ bool AbortNode(const std::string& strMessage, bilingual_str user_message)
user_message = _("A fatal internal error occurred, see debug.log for details");
}
InitError(user_message);
- StartShutdown(/*error=*/true);
+ Assert(g_exit_status)->store(EXIT_FAILURE);
+ StartShutdown();
return false;
}
@@ -97,7 +96,7 @@ bool ShutdownRequested()
return fRequestShutdown;
}
-bool WaitForShutdown()
+void WaitForShutdown()
{
#ifdef WIN32
std::unique_lock<std::mutex> lk(g_shutdown_mutex);
@@ -109,5 +108,4 @@ bool WaitForShutdown()
assert(0);
}
#endif
- return !shutdown_due_error;
}
diff --git a/src/shutdown.h b/src/shutdown.h
index 34d5d30bc074..2b8fdb779ac6 100644
--- a/src/shutdown.h
+++ b/src/shutdown.h
@@ -8,10 +8,12 @@
#include <util/translation.h> // For bilingual_str
+#include <atomic>
+
/** Initialize shutdown state. This must be called before using either StartShutdown(),
* AbortShutdown() or WaitForShutdown(). Calling ShutdownRequested() is always safe.
*/
-bool InitShutdownState();
+bool InitShutdownState(std::atomic<int>& exit_status);
/** Request shutdown of the application. */
void StartShutdown();
@@ -29,9 +31,7 @@ bool ShutdownRequested();
/** Wait for StartShutdown to be called in any thread. This can only be used
* from a single thread.
- * Returns true if shutdown was requested by the user. Otherwise, it's an
- * internally requested shutdown which only happens when an error arises.
*/
-bool WaitForShutdown();
+void WaitForShutdown();
#endif // BITCOIN_SHUTDOWN_H
I think this approach has a few advantages:
- Avoids adding a new shutdown global right before #27711 removes all the shutdown globals, so there is less cleanup #27711 needs to do later.
- I think semantics of an
exit_status
field are more obvious than ashutdown_due_error
field. Easier to think concretely about what the exit code will be than abstractly about "whether shutdown was requested by the user". - Having a
NodeContext::exit_status
field should make it easier to port this change to the GUI, since the GUI doesn't use theWaitForShutdown
function.
(EDIT: Updated code suggestion above to fix early return bug in initial post)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ryanofsky! I see most of the changes great, except one.
If we change AppInit
return value to be an int, we need to change all the
return lines to return an int, not only the ones at the end of the function.
Otherwise, the return false
(or return error(msg)
) statements will be treated as
return 0
which means EXIT_SUCCESS
, which is the opposite that we want.
Other than that nuance, pulling them :).
Oh, you are right. I thought I searched for early returns, but somehow I missed seeing them. I updated the suggestion above to fix this. I think AppInit should keep returning bool not int to keep the changes minimal, and because returning bool is better for following existing coding conventions. |
292a610
to
ac5d488
Compare
'StartShutdown' should only be used for user requested shutdowns. Internal errors that cause a shutdown should use 'AbortNode'.
Deduplicates code in the `FatalError` template function by using `AbortNode` which does the exact same thing if called without any user message (i.e. without second parameter specified). The template is still kept for ease-of-use w.r.t. not having to call `tfm::format(...)` at the call-side each time, and also to keep the diff minimal.
ac5d488
to
b0267f2
Compare
Thanks ryanofsky, updated per feedback with a tiny diff: Didn't move
Funny enough, Another topic would be whether to drop 99d3212 or not. It is not needed anymore. |
b0267f2
to
500d3e9
Compare
Reworked the diff a bit more check. It's because Probably we should refactor |
61fe1af
to
da60d86
Compare
Thanks @ryanofsky! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK da60d86
Thanks for following up on earlier suggestions. Left some new comments below, but this also looks good in it's current form.
src/qt/bitcoin.cpp
Outdated
@@ -441,6 +440,7 @@ void BitcoinApplication::initializeResult(bool success, interfaces::BlockAndHead | |||
#endif | |||
pollShutdownTimer->start(SHUTDOWN_POLLING_DELAY); | |||
} else { | |||
m_node->context()->exit_status.store(EXIT_FAILURE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "gui: return EXIT_FAILURE on post-init fatal errors" (624340f)
Not necessary to address here, but with #10102 when the node and GUI are running in separate processes, m_node->context()
will be null, so this will crash.
You could fix and future-proof this by adding a interfaces::Node::setExitStatus()
function, or just update the existing appInitMain()
function to set the error code:
bool appInitMain(interfaces::BlockAndHeaderTipInfo* tip_info) override
{
if (!AppInitMain(*m_context, tip_info)) {
m_context->exit_status.store(EXIT_FAILURE);
return false;
}
return true;
}
If you let appInitMain
set the error code, you could also simplify GUI code even more, skipping Q_EMIT initializeResult(rv, tip_info)
when appInitMain
fails and getting rid of the rv
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could fix and future-proof this by adding a interfaces::Node::setExitStatus() function, or just update the existing appInitMain() function to set the error code.
Great, added it. Thanks!
If you let appInitMain set the error code, you could also simplify GUI code even more, skipping Q_EMIT initializeResult(rv, tip_info) when appInitMain fails and getting rid of the rv parameter.
Hmm, that would be nice but, in a quick glance, I don't think that we can remove the rv
parameter without introducing another signal to the InitExecutor
class (which wouldn't be bad). Because we still need to call to the shutdown function at the GUI level: the BitcoinApplication::requestShutdown
that is called inside BitcoinApplication::initializeResult
when the rv parameter is false.
Still, even when I would love to keep cleaning stuff, maybe better to leave it for a follow-up so the gui related changes are minimal here.
Cleaned up the init flow to make it more obvious when the 'exit_status' value will and won't be returned. This is because it was confusing that `AppInit` was returning true under two different circumstances: 1) When bitcoind was launched only to retrieve the "-help" or "-version" information. In this case, the app was not initialized. 2) When the user triggers a shutdown. In this case, the app was fully initialized.
da60d86
to
61c569a
Compare
concept ACK I was looking for this writing #27850 was surprised that a fatal internal error didn't satisfy the python test for assert init error on startup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 61c569a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 61c569a
This should be a really nice change because it actually makes bitcoind
return an error code when there is an error. But it also clears out StartShutdown
clutter and makes the main()
function comprehensible, so is a nice code improvment.
I think this would be ready for merge if another reviewer added an ACK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 61c569a
A few non-breaking questions below. Reviewed each commit and ran all tests locally. Confirmed that #2039 behavior now exits with code 1
instead of 0
like it did on master. Also confirmed that remaining StartShutdown()
calls are all user-triggered.
Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
ACK 61c569ab6069d04079a0831468eb713983919636
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmSHODIACgkQ5+KYS2KJ
yTpd4g/+I56MeCxiLqHXVtFe2yUw9JY1wYNeo6kuLU2A5Yx+kWeTBKZl9dA/lTIu
5/TYbFaTai5px1HZzynnfD+XrFLtjaxynGqHhbfRmX8cGUyo5WkRRY6IcYUuqiXs
FBUsTqlXrEq00j012RL63xpiZhswUZTEpw3BzHD5c1IlK7FkdCWCVT0xl3o68ivs
W0pfXRXnAcAtseWgMlrZtPHGaEsQS1CMHAaz2KXvRPbOCu0p0vjAuGkzqpBZuu3x
2kzVcPnuUunb6F+t62Q2Mrk4bCuKYlT9I5CozGVV8W9rWJODvzKyTCmTLCXbIxC0
MrGN6DHn0AzBbCBm6E65o5B80czNzM0Wci/7ZElmjBkDJxWP0KPDtaNp9QsazYbO
AVWhKfMMBDEqmwRyXvMK1iv/zJrZ2Y2QQf/LPt1vbLjVwHT2wF7HyEOUJHKKmrI2
fLWf62JKvBvrrNdeb12s5ZaMwMgcM5UA/8H2YuDw/zOvgJYZZiXAiRI3ETDvsm5B
dFIgn0M5njMq3UFgRl2hHW/qBwKiufs8QvrKw8zxuXxF5H+4sQcwCVtiwe//dsDY
179WwJQcQ3t7iwg1TNgscv9nVY5nRJvPzHs4iopLbPhks60akDic1g6ruMCdbAEj
/Md+oVk7d59HJLvhDlR6xpS8iKycyadFcsiMlOuK/Xn56ZPLKt0=
=N/HN
-----END PGP SIGNATURE-----
pinheadmz's public key is on keybase
@@ -28,6 +31,7 @@ bool AbortNode(const std::string& strMessage, bilingual_str user_message) | |||
user_message = _("A fatal internal error occurred, see debug.log for details"); | |||
} | |||
InitError(user_message); | |||
Assert(g_exit_status)->store(EXIT_FAILURE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make InitShutdownState()
required before calling AbortNode()
?
I suppose that could be added to the comment here. I'm just making sure I understand, I don't think you need to retouch:
Lines 14 to 17 in 6f5f37e
/** Initialize shutdown state. This must be called before using either StartShutdown(), | |
* AbortShutdown() or WaitForShutdown(). Calling ShutdownRequested() is always safe. | |
*/ | |
bool InitShutdownState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make
InitShutdownState()
required before callingAbortNode()
? I suppose that could be added to the comment here.
It could be mentioned but AbortNode
is about to be removed in #27861, so it isn't too important
@@ -20,6 +21,8 @@ | |||
#include <condition_variable> | |||
#endif | |||
|
|||
static std::atomic<int>* g_exit_status{nullptr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am guessing we need a new global here because we can't otherwise access NodeContext::exit_status
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this could be de-globalized again by the interrupt class introduced in #27861.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this could be de-globalized again by the interrupt class introduced in #27861.
Yes #27861 replaces the AbortNode
function with a KernelNotifications::fatalError
method that can access whatever state it needs through the KernelNotifications
class without needing any globals. AbortNode
is the only function using g_exit_status
so it should just disappear then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-review ACK 61c569a
# unless 'expected_ret_code' is provided. | ||
if expected_ret_code is not None: | ||
assert return_code == expected_ret_code, self._node_msg( | ||
"Node returned unexpected exit code (%d) vs (%d) when stopping" % (return_code, expected_ret_code)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, could take the opportunity to change to f-strings here (and below), as suggested in the functional test style guidelines:
"Node returned unexpected exit code (%d) vs (%d) when stopping" % (return_code, expected_ret_code)) | |
f"Node returned unexpected exit code ({return_code}) vs ({expected_ret_code}) when stopping") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a question / test nit for follow-up
if expected_ret_code is not None: | ||
assert return_code == expected_ret_code, self._node_msg( | ||
"Node returned unexpected exit code (%d) vs (%d) when stopping" % (return_code, expected_ret_code)) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 3b2c61e:
Any reason to make expected_ret_code
optional? Seems easier to just make the default value 0
and drop this whole else
block.
'StartShutdown' should only be used for user requested shutdowns. Internal errors that cause a shutdown should use 'AbortNode'. Github-Pull: bitcoin#27708 Rebased-From: 9ddf7e0
Deduplicates code in the `FatalError` template function by using `AbortNode` which does the exact same thing if called without any user message (i.e. without second parameter specified). The template is still kept for ease-of-use w.r.t. not having to call `tfm::format(...)` at the call-side each time, and also to keep the diff minimal. Github-Pull: bitcoin#27708 Rebased-From: 3c06926
It seems odd to return `EXIT_SUCCESS` when the node aborted execution due a fatal internal error or any post-init problem that triggers an unrequested shutdown. e.g. blocks or coins db I/O errors, disconnect block failure, failure during thread import (external blocks loading process error), among others. Co-authored-by: Ryan Ofsky <ryan@ofsky.org> Github-Pull: bitcoin#27708 Rebased-From: 3b2c61e
Github-Pull: bitcoin#27708 Rebased-From: 4927167
It seems odd to return
EXIT_SUCCESS
when the node aborted execution due a fatal internal erroror any post-init problem that triggers an unrequested shutdown.
e.g. blocks or coins db I/O errors, disconnect block failure, failure during thread import (external
blocks loading process error), among others.