-
Notifications
You must be signed in to change notification settings - Fork 37.4k
index: make startup more efficient #27607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index: make startup more efficient #27607
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 86752e0 and it mostly looks good, but does have a null pointer deference currently (see comments and suggested fix). I like this PR because it will simplify #24230 and could potentially improve performance. I'm not actually how sure how much it actually would improve performance in practice though, so I'm curious if that's was the original motivation here or if this is related to one of your other changes. I'm also curious about the todo:
Pending todo: Fix remaining test by using the block index map instead of the active chain. The active chain is not available at the point where the indexers are started.
Unsure if this is just referring to the null pointer problem, or to something else
Thanks for the deep review @ryanofsky!
Yeah, the motivation came from a mix of thoughts actually. I was reviewing #24230 and wasn't completely sold by the We could also move the entire Plus, It plays well with #25193. And.. I'm also thinking that after #24230 and the parallel sync work (#26966), we could have one initial sync thread with a workers pool for all the indexers instead of the current per indexer sync thread.
Yeah ok. That comment is inaccurate and I forgot to update it. I initially thought that the issue was due an initialization ordering, that we weren’t having the active chain activated at that point (thought that was done only post load-blk thread completion). The issue is that empty indexers don’t set the best block to the genesis So, the fix is easy. But.. I haven’t done it because have found another possible index sync "fatal error" in master and wanted to fix it prior continue moving forward here. Essentially, we are not checking whether the node has the post fork point blocks in disk prior starting the index: The pruning violation checks whether we have blocks from the fork point up to the active chain tip. But it does not check if the node has blocks from the fork point up to the fork tip.. which are required by coinstatsindex to rever 8000 se its state during the reorg.. E.g. Active chain Index chain The “failure error” will happen when G or F are not in disk or were pruned. So, if this happens, it causes a “fatal error“ during the coin stats index reorg process due the impossibility to read the block from disk. |
re: #27607 (comment)
Yes I don't think it makes sense to expose that method as part of the
Yes letting indexes just receive notifications to get in sync and not have to implement sync logic is the goal of #24230. And the threading issue should be orthogonal after that PR. Indexes (and wallets) could read and process blocks in single thread, or read blocks in a single thread and process them in parallel, or read and process blocks in parallel like happens currently, or use some other form of scheduling. But regardless of what ordering is used, the interface an individual index uses to get in sync should not have to change, and indexes shouldn't have to create sync threads or deal with race conditions between the notification threads and sync threads.
Oh, I see. That's also what I concluded from seeing the failure, but I guess it is not the complete picture.
I didn't know about this. I guess the fix would be to treat the genesis as the fork point in this case. And to consider the index already synced if there is nochainstate and the genesis pointer is null.
It seems like it would be a good thing to fix this. But this sounds like something that was already broken, so I'm not sure if the fix has to be bundled here, necessarily. I think the "has data from tip down" check is a useful check that can run early and warn if any block data is missing. But if the check isn't perfect and doesn't catch missing block, it shouldn't be the worst thing because the missing blocks will just be reported later rather than earlier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see how the current approach could work, even if the problems discussed above were solved:
With this PR, the newly introduced function update_indexes_start_block
is executed right after creating the index, but before executing Start()
. But the index is basically just an empty object at this point, because its constructor doesn't interact with the database stored on disk at all. So calling GetSummary()
can't possibly give any meaningful data about the best block at this stage. The load from disk is being done in Init()
/CustomInit()
, but that happens only later within Start()
.
I think the necessary order would be to
- create all indexes and read their best block / other data from disk
- determine the oldest block for all indexes
- Do the pruning check once
86752e0
to
3af2301
Compare
Good catch noticing Init() was not called! It doesn't seem like it should be that hard to fix, though. The PR was already moving most of the code out of Init(), anyway, so now a little more code needs to move. I didn't look very deeply but I would probably make Init() a public method and call it after constructing the index. Also stop calling Init() from Start() and move the RegisterValidationInterface() from Start() to Init(). |
Thanks for the review @mzumsande. Funny that I pushed a small update at the same time that you were commenting. Have few more changes on the pipeline that will be pushing soon. e.g. the
Yeah.. I was thinking on the other issue and forgot that |
6dfec1b
to
2a7c2f1
Compare
Looking at current version of the PR 6dfec1b it seems to have changed a lot, and the other thing it is doing now is delaying startup of indexes until blocks are loaded, so there is no longer a conflict with
On this PR, I like the approach and thinks the code looks pretty good overall. The only thing I don't like is all the complexity added to AppInitMain(). I think that complexity could go away if you got rid of the |
Was about to send the updates comment hehe. I was too tired last night to write it. Thanks for the review ryanofsky! Updates list:
Feedback:
Yeah, I updated the PR description last night stating that this is now built on top of #25193 (but my slightly modified version of it #25193 (review)). @mzumsande said that he was going to give them a look and probably take them. So, all good if them get squashed there or here. Either way is fine for me. Happy to re-review #25193 whenever is ready to go again.
Sounds great. Was also thinking about moving the lambda to a standalone function but wasn't finding the right place and didn't want to create a new file only for this. |
d03d6dd
to
e38d09a
Compare
e38d09a
to
e5b418f
Compare
Note: the branch tests are all passing locally. What is failing is the CI compiling it on top of master. Fixing it.. |
The mempool load can take a while, and it is not needed for the indexes' synchronization. Also, having the mempool load function call inside 'blockstorage.cpp' wasn't structurally correct.
The thread does not only load blocks, it loads the mempool and, in a future commit, will start the indexes as well. Also, renamed the 'ThreadImport' function to 'ImportBlocks' And the 'm_load_block' class member to 'm_thread_load'. -BEGIN VERIFY SCRIPT- sed -i "s/ThreadImport/ImportBlocks/g" $(git grep -l ThreadImport -- ':!/doc/') sed -i "s/loadblk/initload/g" $(git grep -l loadblk -- ':!/doc/release-notes/') sed -i "s/m_load_block/m_thread_load/g" $(git grep -l m_load_block) -END VERIFY SCRIPT-
No behavior change. The goal here is to group indexes, so we can perform the same initialization and verification process equally for all of them. The checks performed inside `StartIndexes` will be expanded in the subsequent commits.
988f3f6
to
4223d80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 988f3f6, just suggested StartIndexes simplification and comment changes since last review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased on master due a silent conflict. Only had to adapt an AbortNode()
line (which now is under the fatalError()
alias).
Should be easy to re-check with a git range-diff 988f3f6...4223d80
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, ACK 4223d80
Nit can be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the tests fail
So indexes can be initialized without spawning the sync thread. This makes asynchronous indexes startup possible in the following commits.
And transfer the responsibility of verifying whether 'start_block' has data or not to the caller. This is because the 'GetFirstStoredBlock' function responsibility is to return the first block containing data. And the current implementation can return 'start_block' when it has no data!. Which is misleading at least. Edge case behavior change: Previously, if the block tip lacked data but all preceding blocks contained data, there was no prune violation. And now, such scenario will result in a prune violation.
By generalizing 'GetFirstStoredBlock' and implementing 'CheckBlockDataAvailability' we can dedup code and avoid repeating work when multiple indexes are enabled. E.g. get the oldest block across all indexes and perform the pruning violation check from that point up to the tip only once (this feature is being introduced in a follow-up commit). This commit shouldn't change behavior in any way. Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
By moving the 'StartIndexes()' call into the 'initload' thread, we can remove the threads active wait. Optimizing the available resources. The only difference with the current state is that now the indexes threads will only be started when they can process work and not before it.
At present, during init, we traverse the chain (once per index) to confirm that all necessary blocks to sync each index up to the current tip are present. To make the process more efficient, we can fetch the oldest block from the indexers and perform the chain data existence check from that point only once. This also moves the pruning violation check to the end of the 'loadinit' thread, which is where the reindex, block loading and chain activation processes happen. Making the node's startup process faster, allowing us to remove the global g_indexes_ready_to_sync flag, and enabling the execution of the pruning violation verification even when the reindex or reindex-chainstate flags are enabled (which has being skipped so far).
4223d80
to
ca91c24
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK ca91c24. Just rebase and suggested changes since last review (Start return check, and code simplification)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK ca91c24
…storage 462390c refactor: Move stopafterblockimport handling out of blockstorage (TheCharlatan) Pull request description: This has the benefit of moving this StartShutdown call out of the blockstorage file and thus out of the kernel's responsibility. The user can now decide if he wants to start shutdown / interrupt after a block import or not. This also simplifies #28048, making it one fewer shutdown call to handle. ACKs for top commit: MarcoFalke: lgtm ACK 462390c 🗝 ryanofsky: Code review ACK 462390c. Just has been rebased and is a simpler change after #27607 Tree-SHA512: 84e58256b1c61f10e7ec5ecf32916f40a2ab1ea7cce703de0fa1c61ee2be94bd45ed32718bc99903b6eff3e6d3d5b506470bf567ddbb444a58232913918e8ab8
Simplifies index startup code, eliminating the
g_indexes_ready_to_sync
variable,deduplicating code and moving the prune violation check out of the
BaseIndex
class.Also makes startup more efficient by running the prune violation check once for all indexes
instead of once for each index, and by delaying the prune violation check and moving it off
of the main thread so the node can start up faster and perform the block data availability
verification even when the '-reindex" or the "-reindex-chainstate" flags are enabled (which
hasn't being possible so far).