index: make startup more efficient by furszy · Pull Request #27607 · bitcoin/bitcoin · GitHub

index: make startup more efficient #27607


Merged
merged 9 commits into bitcoin:master from furszy:2023_index_decouple_has_data_checks on Jul 10, 2023

Conversation

furszy
Member
@furszy furszy commented May 9, 2023

Simplifies index startup code, eliminating the g_indexes_ready_to_sync variable,
deduplicating code and moving the prune violation check out of the BaseIndex class.

Also makes startup more efficient by running the prune violation check once for all indexes
instead of once per index, and by delaying the prune violation check and moving it off
the main thread, so the node can start up faster and perform the block data availability
verification even when the "-reindex" or "-reindex-chainstate" flags are enabled (which
hasn't been possible so far).
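The saving from the "once for all indexes instead of once per index" change can be sketched in miniature (function names here are illustrative, not Bitcoin Core's actual API): instead of walking the chain once per index, take the lowest best-block height across all indexes and walk the chain a single time from there. Both toy functions return the number of blocks visited, to show the duplicated work that disappears.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Old scheme: one chain walk per index, each from that index's best block.
int BlocksVisitedPerIndex(int tip_height, const std::vector<int>& best_heights)
{
    int visited = 0;
    for (int h : best_heights) {
        for (int i = h; i <= tip_height; ++i) ++visited; // one walk per index
    }
    return visited;
}

// New scheme: a single walk starting from the oldest index best block.
int BlocksVisitedOnce(int tip_height, const std::vector<int>& best_heights)
{
    int start = tip_height;
    for (int h : best_heights) start = std::min(start, h); // oldest best block
    int visited = 0;
    for (int i = start; i <= tip_height; ++i) ++visited;   // single shared walk
    return visited;
}
```

With three unsynced indexes on a 100-block chain, the old scheme visits roughly three times as many blocks as the new one.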

@DrahtBot
Contributor
DrahtBot commented May 9, 2023
edited

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky, TheCharlatan
Stale ACK mzumsande

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28053 (refactor: Move stopafterblockimport option out of blockstorage by TheCharlatan)
  • #28051 (Get rid of shutdown.cpp/shutdown.h, use SignalInterrupt directly by ryanofsky)
  • #27850 (test: Add unit & functional test coverage for blockstore by pinheadmz)
  • #26762 (bugfix: Make CCheckQueue RAII-styled by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Contributor
@ryanofsky ryanofsky left a comment

Reviewed 86752e0 and it mostly looks good, but it does have a null pointer dereference currently (see comments and suggested fix). I like this PR because it will simplify #24230 and could potentially improve performance. I'm not actually sure how much it would improve performance in practice though, so I'm curious whether that was the original motivation here or whether this is related to one of your other changes. I'm also curious about the todo:

Pending todo: Fix remaining test by using the block index map instead of the active chain. The active chain is not available at the point where the indexers are started.

Unsure if this is just referring to the null pointer problem, or to something else

@furszy
Member Author
furszy commented May 11, 2023

Thanks for the deep review @ryanofsky!

Reviewed 86752e0 and it mostly looks good, but it does have a null pointer dereference currently (see comments and suggested fix). I like this PR because it will simplify #24230 and could potentially improve performance. I'm not actually sure how much it would improve performance in practice though, so I'm curious whether that was the original motivation here or whether this is related to one of your other changes.

Yeah, the motivation came from a mix of thoughts actually.

I was reviewing #24230 and wasn't completely sold on the hasDataFromTipDown() call inside the attachChain function. It seems odd to call a chain data verification function from an event registering method.
Then I went deeper into the function, noticed the work duplication, and took that as an argument to simplify #24230 a bit.

We could also move the entire hasDataFromTipDown from the interface to somewhere else this way. So there is no CBlockIndex dependency in the interface, neither here nor in the #24230 intermediate commits.

Plus, it plays well with #25193.

And.. I'm also thinking that after #24230 and the parallel sync work (#26966), we could have one initial sync thread with a worker pool for all the indexers instead of the current per-indexer sync thread.
That should speed up the sync process quite a lot. We are currently reading the entire chain, block by block, from disk on every index thread. Instead, we could read blocks just once, then dispatch events to the indexers, making indexers purely listeners with no associated thread.
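The "read blocks once, dispatch to every index" idea can be sketched roughly as follows (all names are hypothetical, not Bitcoin Core's API): a single sync driver reads each block from disk one time and fans it out to every registered listener, instead of each per-index thread re-reading the entire chain on its own.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Block { int height{0}; }; // stand-in for a block read from disk

using BlockListener = std::function<void(const Block&)>;

// One driver walks the chain a single time and notifies every index,
// so indexers become pure listeners with no sync thread of their own.
void SyncAllIndexes(const std::vector<Block>& chain,
                    const std::vector<BlockListener>& listeners)
{
    for (const Block& block : chain) {                       // one disk read...
        for (const auto& notify : listeners) notify(block);  // ...many listeners
    }
}
```

In this toy model, adding a second index costs only an extra callback per block rather than an extra full chain read.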

I'm also curious about the todo:

Pending todo: Fix remaining test by using the block index map instead of the active chain. The active chain is not available at the point where the indexers are started.

Unsure if this is just referring to the null pointer problem, or to something else

Yeah ok. That comment is inaccurate and I forgot to update it.

I initially thought that the issue was due to an initialization ordering problem, that the active chain wasn't activated at that point (I thought that was done only after the load-blk thread completes).
So I implemented 4738a1d and 837acfd instead of the current version. That is better than what we have here currently, but it's not the solution for the null pointer dereference.

The issue is that empty indexers don't set the best block to the genesis CBlockIndex; they just leave it null. So the index summary returns a valid height=0 with an empty block hash, which crashes on the block hash assertion.

So, the fix is easy. But I haven't done it because I found another possible index sync "fatal error" in master and wanted to fix it before continuing to move forward here.

Essentially, we are not checking whether the node has the post-fork-point blocks on disk prior to starting the index:

The pruning violation check verifies that we have blocks from the fork point up to the active chain tip. But it does not check whether the node has blocks from the fork point up to the forked chain's tip, which are required by coinstatsindex to reverse its state during the reorg.

E.g.

Active chain
A -> B -> C -> D -> E

Index chain
A -> B -> C -> G -> F -> H

The failure will happen when G or F are not on disk or were pruned.

So, if this happens, it causes a "fatal error" during the coinstatsindex reorg process due to the impossibility of reading the block from disk.
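The gap can be shown with a toy model (block "hashes" are single chars here, and all names are illustrative): the existing pruning check walks the active chain from the fork point to the tip, but a reorging index also needs the blocks on its own stale branch to undo its state.

```cpp
#include <cassert>
#include <set>
#include <vector>

// True only if every listed block is still available on disk.
bool HaveAllBlocks(const std::vector<char>& blocks, const std::set<char>& on_disk)
{
    for (char b : blocks) {
        if (on_disk.count(b) == 0) return false; // block missing or pruned
    }
    return true;
}
```

With the chains from the example above and G, F pruned, the active-chain check from the fork point passes while the index's own branch fails, which is exactly the unreported case.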

@ryanofsky
Contributor

re: #27607 (comment)

We could also move the entire hasDataFromTipDown from the interface to somewhere else in this way. So there is no CBlockIndex dependency in the interface neither here nor in the #24230 intermediate commits.

Yes, I don't think it makes sense to expose that method as part of the Chain interface, and even if it did make sense, it wouldn't make sense to put the implementation in the ChainImpl class, because that class is mostly meant to hold glue code, not complicated functions. That's the reason for the suggestion to make it a standalone function bool ChainDataFromTipDown(ChainstateManager& chainman, const CBlockIndex& start_block) in #27607 (comment)

And.. I'm also thinking that after #24230 and the parallel sync work (#26966), we could have one initial sync thread with a workers pool for all the indexers instead of the current per indexer sync thread. Which should speedup the sync process quite a lot. We are currently reading the entire chain block by block from disk on every index thread. So instead, we could read blocks just once, then dispatch events to the indexers. Making indexers purely listeners with no associated thread.

Yes, letting indexes just receive notifications to get in sync, and not have to implement sync logic, is the goal of #24230. And the threading issue should be orthogonal after that PR. Indexes (and wallets) could read and process blocks in a single thread, or read blocks in a single thread and process them in parallel, or read and process blocks in parallel as happens currently, or use some other form of scheduling. But regardless of what ordering is used, the interface an individual index uses to get in sync should not have to change, and indexes shouldn't have to create sync threads or deal with race conditions between the notification threads and sync threads.

I initially thought that the issue was due an initialization ordering, that we weren’t having the active chain activated at that point (thought that was done only post load-blk thread completion).

Oh, I see. That's also what I concluded from seeing the failure, but I guess it is not the complete picture.

The issue is that empty indexers don’t set the best block to the genesis CBlockIndex, they just leave it null. So, the index summary returns a valid height=0 with an empty block hash.. which crashes on the block hash assertion.

So, the fix is easy.

I didn't know about this. I guess the fix would be to treat the genesis as the fork point in this case, and to consider the index already synced if there is no chainstate and the genesis pointer is null.

But.. I haven’t done it because have found another possible index sync "fatal error" in master and wanted to fix it prior continue moving forward here.

It seems like it would be a good thing to fix this. But this sounds like something that was already broken, so I'm not sure the fix has to be bundled here, necessarily. I think the "has data from tip down" check is a useful check that can run early and warn if any block data is missing. But if the check isn't perfect and doesn't catch a missing block, it shouldn't be the worst thing, because the missing blocks will just be reported later rather than earlier.

Contributor
@mzumsande mzumsande left a comment

I can't see how the current approach could work, even if the problems discussed above were solved:

With this PR, the newly introduced function update_indexes_start_block is executed right after creating the index, but before executing Start(). But the index is basically just an empty object at this point, because its constructor doesn't interact with the database stored on disk at all. So calling GetSummary() can't possibly give any meaningful data about the best block at this stage. The load from disk is being done in Init()/CustomInit(), but that happens only later within Start().

I think the necessary order would be to

  1. create all indexes and read their best block / other data from disk
  2. determine the oldest block for all indexes
  3. Do the pruning check once

@furszy furszy force-pushed the 2023_index_decouple_has_data_checks branch from 86752e0 to 3af2301 Compare May 16, 2023 21:40
@ryanofsky
Contributor

I can't see how the current approach could work, even if the problems discussed above were solved:

Good catch noticing Init() was not called! It doesn't seem like it should be that hard to fix, though. The PR was already moving most of the code out of Init() anyway, so now a little more code needs to move. I didn't look very deeply, but I would probably make Init() a public method and call it after constructing the index. Also, stop calling Init() from Start() and move the RegisterValidationInterface() call from Start() to Init().
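The suggested restructuring has roughly this shape (heavily simplified, names illustrative; the real BaseIndex carries far more state): Init() becomes a public method called right after construction, so the best block is loaded and the summary is meaningful before Start(), which now only starts syncing.

```cpp
#include <cassert>

class ToyIndex {
    bool m_init{false};
    int m_best_height{-1}; // placeholder until Init() loads it "from disk"
public:
    void Init()
    {
        m_best_height = 0; // real code would read the index DB here,
        m_init = true;     // then register the validation interface
    }
    bool Start()
    {
        if (!m_init) return false; // Start() no longer calls Init() itself
        // real code would spawn the background sync thread here
        return true;
    }
    int SummaryHeight() const { return m_best_height; }
};
```

This ordering is what lets the single pruning check read every index's best block before any sync thread exists.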

@furszy
Member Author
furszy commented May 16, 2023

Thanks for the review @mzumsande.

Funny that I pushed a small update at the same time that you were commenting.

I have a few more changes in the pipeline that I will be pushing soon. E.g. the entire hasDataFromTipDown function can be written in two lines; I just need to re-purpose the GetFirstStoredBlock function a bit :).

I think the necessary order would be to

create all indexes and read their best block / other data from disk
determine the oldest block for all indexes
Do the pruning check once

Yeah.. I was thinking about the other issue and forgot that Init() isn't being called in the constructor.. :face_palm:. Will re-order it, thanks!

@ryanofsky
Contributor
ryanofsky commented May 17, 2023

Looking at the current version of the PR, 6dfec1b, it seems to have changed a lot, and the other thing it is doing now is delaying startup of indexes until blocks are loaded, so there is no longer a conflict with reindex-chainstate. So it is basically reimplementing #25193 in a different way that doesn't require a sleep_for(std::chrono::milliseconds(500)) wait loop. I think this approach is cleaner, but also that #25193 is a smaller, more targeted change with a test, so I would probably prefer to see #25193 merged first, with this cleanup and rewrite merged later.

#25193 also has two ACKs, so I can re-ack and merge it soon. EDIT: Never mind, just noticed it needs rebase currently

On this PR, I like the approach and think the code looks pretty good overall. The only thing I don't like is all the complexity added to AppInitMain(). I think that complexity could go away if you got rid of the std::set<BaseIndex*> indexers local variable there and added a std::vector<BaseIndex*> indexes member to NodeContext instead. Then there would be no need for the func_start_indexes lambda; it could be replaced with a regular StartIndexes function.

@furszy
Member Author
furszy commented May 17, 2023

Was about to send the updates comment hehe. I was too tired last night to write it. Thanks for the review ryanofsky!

Updates list:

  • Decoupled index Init() from Start(), so indexers can be initialized without spawning the sync thread.

  • Simplified the pruning violation code by re-purposing the GetFirstStoredBlock function. Now called IsBlockDataAvailable.

  • Fixed a small race where we set the index m_synced flag (which enables BlockConnected events) before calling the child class init function. So, for example, the block filter index could theoretically process a block before initializing the next filter position field and end up overwriting the first stored filter.
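The race in the last bullet can be sketched in a toy model (heavily simplified, names illustrative): if m_synced is flipped before the child class finishes its init, a BlockConnected notification can run against uninitialized state, e.g. the block filter index's next-filter-position field.

```cpp
#include <cassert>

class ToyFilterIndex {
    bool m_custom_init_done{false};
public:
    bool m_synced{false};                            // enables BlockConnected processing
    void CustomInit() { m_custom_init_done = true; } // sets the next filter position
    // Returns true only when the event is processed against valid state.
    bool BlockConnected() const
    {
        if (!m_synced) return false; // event ignored while still syncing
        return m_custom_init_done;   // false here models the overwrite bug
    }
};
```

The fix is simply ordering: run CustomInit() before setting m_synced, so no event can land in between.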


Feedback:

Looking at current version of the PR 6dfec1b it seems to have changed a lot, and the other thing it is doing now is delaying startup of indexes until blocks are loaded, so there is no longer a conflict with reindex-chainstate. So it is basically reimplementing #25193 in a different way that doesn't require a sleep_for(std::chrono::milliseconds(500)) waitloop. I think this approach is cleaner, but also that #25193 is a smaller more targeted change with a test, so I would probably prefer to see #25193 merged first and with this cleanup and rewrite merged later.

Yeah, I updated the PR description last night stating that this is now built on top of #25193 (but my slightly modified version of it #25193 (review)).

@mzumsande said that he was going to give them a look and probably take them. So, all good if they get squashed there or here; either way is fine for me. Happy to re-review #25193 whenever it is ready to go again.

On this PR, I like the approach and think the code looks pretty good overall. The only thing I don't like is all the complexity added to AppInitMain(). I think that complexity could go away if you got rid of the std::set<BaseIndex*> indexers local variable there and added a std::vector<BaseIndex*> indexes member to NodeContext instead. Then there would be no need for the func_start_indexes lambda; it could be replaced with a regular StartIndexes function.

Sounds great. I was also thinking about moving the lambda to a standalone function but couldn't find the right place and didn't want to create a new file only for this.

@furszy furszy force-pushed the 2023_index_decouple_has_data_checks branch 2 times, most recently from d03d6dd to e38d09a Compare May 17, 2023 15:49
@furszy furszy force-pushed the 2023_index_decouple_has_data_checks branch from e38d09a to e5b418f Compare May 17, 2023 19:16
@furszy
Member Author
furszy commented May 17, 2023

Rebased post #25193 merge. Conflicts solved.

The only change from the last push is in the first commit, ca30419, where the index threads' active wait and the global flag are replaced by a postponed indexes start call.

@furszy furszy changed the title init: verify blocks data existence only once for all the indexers index: improve initialization and pruning violation checks May 17, 2023
@furszy furszy changed the title index: improve initialization and pruning violation checks index: improve initialization and pruning violation check May 17, 2023
@furszy
Member Author
furszy commented Jul 7, 2023

Note: the branch tests are all passing locally. What is failing is the CI compiling it on top of master. Fixing it..

furszy and others added 4 commits July 7, 2023 19:31
The mempool load can take a while, and it is not
needed for the indexes' synchronization.

Also, having the mempool load function call
inside 'blockstorage.cpp' wasn't structurally
correct.
The thread does not only load blocks; it also loads the mempool and,
in a future commit, will start the indexes as well.

Also renamed the 'ThreadImport' function to 'ImportBlocks'
and the 'm_load_block' class member to 'm_thread_load'.

-BEGIN VERIFY SCRIPT-

sed -i "s/ThreadImport/ImportBlocks/g" $(git grep -l ThreadImport -- ':!/doc/')
sed -i "s/loadblk/initload/g" $(git grep -l loadblk -- ':!/doc/release-notes/')
sed -i "s/m_load_block/m_thread_load/g" $(git grep -l m_load_block)

-END VERIFY SCRIPT-
No behavior change.

The goal here is to group indexes, so we can perform the same
initialization and verification process equally for all of them.

The checks performed inside `StartIndexes` will be expanded
in the subsequent commits.
@furszy furszy force-pushed the 2023_index_decouple_has_data_checks branch from 988f3f6 to 4223d80 Compare July 7, 2023 22:37
Contributor
@ryanofsky ryanofsky left a comment

Code review ACK 988f3f6, just suggested StartIndexes simplification and comment changes since last review

Member Author
@furszy furszy left a comment

Rebased on master due to a silent conflict. Only had to adapt an AbortNode() line (which is now under the fatalError() alias).

Should be easy to re-check with a git range-diff 988f3f6...4223d80.

Contributor
@TheCharlatan TheCharlatan left a comment

Nice, ACK 4223d80

Nit can be ignored.

@DrahtBot DrahtBot requested a review from ryanofsky July 9, 2023 11:06
Member
@maflcko maflcko left a comment

Looks like the tests fail

furszy and others added 5 commits July 10, 2023 10:47
So indexes can be initialized without spawning
the sync thread.

This makes asynchronous index startup
possible in the following commits.
And transfer the responsibility of verifying whether 'start_block'
has data or not to the caller.

This is because the 'GetFirstStoredBlock' function's responsibility
is to return the first block containing data, and the current
implementation can return 'start_block' when it has no data, which
is misleading at the least.

Edge case behavior change:
Previously, if the block tip lacked data but all preceding blocks
contained data, there was no prune violation. And now, such
scenario will result in a prune violation.
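The edge case above can be captured in a toy version of the check (blocks modeled as bools, true = data on disk; only the function name comes from the commit, the rest is a sketch): the caller now verifies every block from 'start' to 'tip' inclusive, so a data-less tip is reported as a prune violation.

```cpp
#include <cassert>
#include <vector>

// Every block from 'start' to 'tip' (inclusive) must have data on disk.
bool CheckBlockDataAvailability(const std::vector<bool>& has_data, int start, int tip)
{
    for (int i = start; i <= tip; ++i) {
        if (!has_data[i]) return false; // missing data: prune violation
    }
    return true;
}
```

Under the previous behavior described above, a chain whose tip lacked data but whose preceding blocks all had data would not have been flagged.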
By generalizing 'GetFirstStoredBlock' and implementing
'CheckBlockDataAvailability' we can dedup code and
avoid repeating work when multiple indexes are enabled.
E.g. get the oldest block across all indexes and
perform the pruning violation check from that point
up to the tip only once (this feature is being introduced
in a follow-up commit).

This commit shouldn't change behavior in any way.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
By moving the 'StartIndexes()' call into the 'initload'
thread, we can remove the threads' active wait, making
better use of the available resources.

The only difference from the current state is that the
index threads will now be started only when they can
process work, and not before.
At present, during init, we traverse the chain (once per index)
to confirm that all necessary blocks to sync each index up to
the current tip are present.

To make the process more efficient, we can fetch the oldest block
from the indexers and perform the chain data existence check from
that point only once.

This also moves the pruning violation check to the end of the
'initload' thread, which is where the reindex, block loading and
chain activation processes happen.

This makes the node's startup process faster, allows us to remove
the global g_indexes_ready_to_sync flag, and enables the
execution of the pruning violation verification even when the
reindex or reindex-chainstate flags are enabled (which has been
skipped so far).
@furszy furszy force-pushed the 2023_index_decouple_has_data_checks branch from 4223d80 to ca91c24 Compare July 10, 2023 13:52
Member Author
@furszy furszy left a comment

Updated per feedback, thanks Marco and TheCharlatan.

Changes:

  • Fixed bug in one of the intermediate commits (per comment).
  • Inlined an if/else in one of the intermediate commits (per comment).

Contributor
@ryanofsky ryanofsky left a comment

Code review ACK ca91c24. Just rebase and suggested changes since last review (Start return check, and code simplification)

@DrahtBot DrahtBot requested a review from TheCharlatan July 10, 2023 14:28
Contributor
@TheCharlatan TheCharlatan left a comment

re-ACK ca91c24

@ryanofsky ryanofsky merged commit ef29d5d into bitcoin:master Jul 10, 2023
ryanofsky added a commit that referenced this pull request Jul 11, 2023
…storage

462390c refactor: Move stopafterblockimport handling out of blockstorage (TheCharlatan)

Pull request description:

  This has the benefit of moving this StartShutdown call out of the blockstorage file and thus out of the kernel's responsibility. The user can now decide if he wants to start shutdown / interrupt after a block import or not.

  This also simplifies #28048, making it one fewer shutdown call to handle.

ACKs for top commit:
  MarcoFalke:
    lgtm ACK 462390c  🗝
  ryanofsky:
    Code review ACK 462390c. Just has been rebased and is a simpler change after #27607

Tree-SHA512: 84e58256b1c61f10e7ec5ecf32916f40a2ab1ea7cce703de0fa1c61ee2be94bd45ed32718bc99903b6eff3e6d3d5b506470bf567ddbb444a58232913918e8ab8
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 12, 2023
@furszy furszy deleted the 2023_index_decouple_has_data_checks branch July 20, 2023 23:00
@furszy furszy mentioned this pull request Jan 4, 2024
@bitcoin bitcoin locked and limited conversation to collaborators Jul 19, 2024