index: initial sync speedup, parallelize process by furszy · Pull Request #26966 · bitcoin/bitcoin · GitHub

index: initial sync speedup, parallelize process #26966


Draft
furszy wants to merge 5 commits into master from 2022_parallelize_blockfilter_index_2

Conversation

@furszy (Member) commented Jan 25, 2023

The current procedure for building the block filter index processes filters one at a time,
reading blocks, undo data, and previous headers from disk sequentially.

This PR introduces a new mechanism to perform the work concurrently: the filter generation
workload is divided among a user-configurable pool of workers, which significantly speeds up
the index construction process.

The same concurrent processing model has been applied to the transactions index as well.

The newly introduced init flag -indexworkers=<n> enables the concurrent sync behavior,
where "n" is the number of worker threads spawned at startup to build ranges of block filters
during the initial sync. The worker pool is destroyed once the initial sync completes.
Note: the parallelized sync process is disabled by default.
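
For illustration only, the following is a minimal C++ sketch of the general idea: split the initial sync into block-height ranges and hand them to a fixed number of workers. All names below are placeholders (the actual PR reads blocks and undo data from disk and builds real filters rather than stubbing the work):

// Hypothetical sketch: split a block range into chunks, process the chunks
// concurrently, and collect the per-chunk results in height order.
#include <algorithm>
#include <future>
#include <vector>

struct FilterResult { int height; /* filter bytes would live here */ };

// Stub standing in for per-block filter computation.
static FilterResult ComputeFilterForBlock(int height) { return {height}; }

std::vector<FilterResult> SyncRange(int start_height, int tip_height, int num_workers)
{
    const int total = tip_height - start_height + 1;
    const int chunk = std::max(1, total / num_workers);

    std::vector<std::future<std::vector<FilterResult>>> futures;
    for (int begin = start_height; begin <= tip_height; begin += chunk) {
        const int end = std::min(begin + chunk - 1, tip_height);
        futures.push_back(std::async(std::launch::async, [begin, end] {
            std::vector<FilterResult> out;
            for (int h = begin; h <= end; ++h) out.push_back(ComputeFilterForBlock(h));
            return out;
        }));
    }

    // Chunks are drained in dispatch order, so downstream writes stay sequential.
    std::vector<FilterResult> results;
    for (auto& f : futures) {
        auto part = f.get();
        results.insert(results.end(), part.begin(), part.end());
    }
    return results;
}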

Now the juicy part:
On my computer, with the node in debug mode and in IBD, with -indexworkers=4 the
block filter index generation took less than an hour, while on master the sync took more than 7 hours.

Important Note:
As the access to the block data on disk is protected by cs_main, this new feature runs substantially
faster when the node is not in IBD.

@DrahtBot (Contributor) commented Jan 25, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/26966.

Reviews

See the guideline for information on the review process.

Concept ACK: w0xlt, TheCharlatan
Approach ACK: ryanofsky

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32699 (docs: adds correct updated documentation links by Zeegaths)
  • #32541 (index: store per-block transaction locations for efficient lookups by romanz)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • writings -> writing [incorrect verb form in "Error writings filters"]


@Sjors (Member) commented Jan 26, 2023

Cool, will take it for a spin...

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from d8b0f5e to 65dc850 on January 28, 2023 15:04
@furszy (Member, Author) commented Jan 28, 2023

Cool @Sjors, just pushed a small update. Found a little bug.

Going to add an important note to the PR description (because otherwise testing results will vary a lot):

As the access to the block data on disk is protected by cs_main, this new feature runs substantially
faster when the node is not in IBD.
(where "substantially" here means a full index sync, with 5 workers, in less than 20 minutes on my computer).

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 65dc850 to 09cc56a on January 28, 2023 15:40
@mzumsande (Contributor) left a comment

It's probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well? At first glance, txindex and the indices suggested in PRs (#24539, #26951) seem to be parallelizable as well (not coinstatsindex though).

@furszy (Member, Author) commented Jan 30, 2023

It's probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well?

Yeah, that is part of the plan. I started with the block filter index because it requires a special treatment that txindex parallelization does not: block/undo data reading and block filter creation can be parallelized, but writing must be done sequentially due to the need to link filter headers to their predecessors to build the filter-header chain on disk.

My idea was to start reviewing this one, so the process gets as clean as possible, and then move forward with the generalization step. It's usually more natural to abstract processes when the specific cases are well-defined.
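
A minimal sketch of the write-ordering constraint described above, using hypothetical types and helpers (not the PR's code): filters for different blocks can be computed concurrently, while the filter headers must be chained to their predecessors and written strictly in block order.

// Hypothetical sketch: parallel filter computation, sequential header chaining.
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

struct Filter { uint64_t digest; };

// Stand-in for hashing a filter together with its predecessor's header.
static uint64_t ChainHeader(const Filter& f, uint64_t prev_header)
{
    return std::hash<uint64_t>{}(f.digest ^ prev_header);
}

void BuildFilterChain(const std::vector<uint64_t>& block_digests)
{
    // Phase 1: compute filters concurrently (completion order doesn't matter).
    std::vector<std::future<Filter>> jobs;
    for (uint64_t d : block_digests) {
        jobs.push_back(std::async(std::launch::async, [d] { return Filter{d}; }));
    }

    // Phase 2: chain headers and "write" strictly in block order.
    uint64_t prev_header = 0;
    for (auto& job : jobs) {
        const Filter f = job.get();
        prev_header = ChainHeader(f, prev_header);
        // WriteToDisk(f, prev_header);  // sequential by construction
    }
}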

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 09cc56a to 43e6237 on January 30, 2023 16:15
@w0xlt (Contributor) left a comment

Concept ACK

Perhaps it could keep the parallelization and index logic separate, so it could be reused in other indexes.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch 2 times, most recently from b2b62f5 to 4e1fba1 on January 30, 2023 20:55
@Sjors (Member) commented Feb 7, 2023

Building the index (AMD Ryzen 7950X, blocks stored on SSD):

  • master @ fe86616: 35'15" (mostly 1 alternating CPU thread)
  • this PR (rebased)
    • n=8: 5'20" (uses about 8 of 32 CPU threads as expected)
    • n=32: 4'26" (pleasantly close to 100% CPU usage with a dip every 10 seconds, but it drops to only 1 CPU in the last minute or two)

I made sure to not load any wallets and disabled other indexes.

I didn't test if the index was correct.

I wonder if, for users without this index, it would be faster to generate the index, rescan the wallet and then delete it again. Combined with #26951 you would only have to generate filters up to the age of the wallet (IIUC, cc @pstratem).

Note to self for future benchmarks: -txindex takes 1:07'14", -coinstatsindex takes 3.5 hours.

@furszy (Member, Author) commented Feb 7, 2023

Great results @Sjors!

Could also give it a run rebased on top of #27006.
On master, the index initial sync is slower when the node is in IBD because the index thread has to compete for access to block data on disk through cs_main acquisition.

I didn't test if the index was correct.

The PR contains a test verifying it.


Side note:
I'm working on generalizing the parallelization flow so other indexes, like txindex and #26951, can make use of it too.

@ryanofsky (Contributor) left a comment
Approach ACK 349b099 and I reviewed most of the code. Seems like a nice design and good approach. Plan to finish reviewing later.

__func__, pindex->GetBlockHash().ToString());
return;
}
if (!ProcessBlock(pindex)) break; // error logged internally
Contributor:

In commit "index: remove CBlockIndex access from node internals" (c4723fb)

It seems like this should be returning, not breaking. Would be good to change or clarify with a comment

@furszy (Member, Author) replied Jun 5, 2025:

Yeah, good catch!
I think in the end it doesn't matter much, because ProcessBlock aborts the node on failure, but still, this would have logged an extra "index enabled at height" line during shutdown.

interfaces::BlockInfo block_info = kernel::MakeBlockInfo(pindex, block.get());
if (CustomAppend(block_info)) {

if (ProcessBlock(pindex)) {
Contributor:

In commit "index: remove CBlockIndex access from node internals" (c4723fb)

Could add another // error logged internally comment here since it now looks like this failure is being ignored.

@furszy (Member, Author) replied Jun 5, 2025:

Done as suggested.

}
}

const CBlockUndo& block_undo = block.height > 0 ? *Assert(block.undo_data) : CBlockUndo();
Contributor:

In commit "index: remove CBlockIndex access from node internals" (c4723fb)

I'm not sure this is safe. It looks like if the block height is 0 this is taking a reference to a temporary object that will go out of scope.

In any case I think this could be simplified to const CBlockUndo& block_undo{*Assert(block.undo_data)} because it looks like the pointer will be set above even if the height is 0.

@furszy (Member, Author) replied Jun 5, 2025:

yeah, great. Done as suggested.

Mutex cs_work_queue;
std::queue<std::function<void()>> m_work_queue GUARDED_BY(cs_work_queue);
std::condition_variable m_condition;
CThreadInterrupt m_interrupt;
Contributor:

In commit "util: introduce general purpose thread pool" (a382501)

I don't think it makes sense for m_interrupt to use the CThreadInterrupt class, because the ThreadPool class already has its own mutex and condition variable and it would be wasteful to introduce more. I think you could just replace CThreadInterrupt with bool here, replace m_interrupt(); with m_interrupt = true, and replace m_interrupt.reset() with m_interrupt = false while holding the mutex.

@furszy (Member, Author) replied Jun 5, 2025:

sure, done as suggested. Thanks!
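
For reference, a standalone sketch of the kind of worker loop being discussed, using a plain bool interrupt flag shared with the work queue under the pool's single mutex and condition variable. Class and member names are illustrative, not the PR's actual ThreadPool:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SimplePool
{
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::function<void()>> m_queue;
    bool m_interrupt{false};
    std::vector<std::thread> m_threads;

    void WorkerLoop()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock lock(m_mutex);
                // Wake up when there is work or the pool is being torn down.
                m_cv.wait(lock, [&] { return m_interrupt || !m_queue.empty(); });
                if (m_interrupt && m_queue.empty()) return;
                task = std::move(m_queue.front());
                m_queue.pop();
            }
            task();
        }
    }

public:
    explicit SimplePool(int n)
    {
        for (int i = 0; i < n; ++i) m_threads.emplace_back(&SimplePool::WorkerLoop, this);
    }

    void Submit(std::function<void()> task)
    {
        { std::lock_guard lock(m_mutex); m_queue.push(std::move(task)); }
        m_cv.notify_one();
    }

    ~SimplePool()
    {
        { std::lock_guard lock(m_mutex); m_interrupt = true; }  // set the flag under the mutex
        m_cv.notify_all();
        for (auto& t : m_threads) t.join();
    }
};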

src/sync.h Outdated
@@ -299,6 +299,7 @@ inline MutexType* MaybeCheckNotHeld(MutexType* m) LOCKS_EXCLUDED(m) LOCK_RETURNE
//! The above is detectable at compile-time with the -Wreturn-local-addr flag in
//! gcc and the -Wreturn-stack-address flag in clang, both enabled by default.
#define WITH_LOCK(cs, code) (MaybeCheckNotHeld(cs), [&]() -> decltype(auto) { LOCK(cs); code; }())
#define WITH_REVERSE_LOCK(cs, code) ([&]() -> decltype(auto) { REVERSE_LOCK(cs); code; }())
Contributor:

In commit "util: introduce general purpose thread pool" (a382501)

This seems ok, but it's only used in one place, where WITH_REVERSE_LOCK is not much simpler than plain REVERSE_LOCK. Could consider dropping it.

@furszy (Member, Author) replied Jun 5, 2025:

Sure. I think I did it this way to be very explicit about the code that will be executed without the lock, but yeah, we could achieve the same outcome with another set of brackets too.

class ThreadPool {

private:
Mutex cs_work_queue;
Contributor:

In commit "util: introduce general purpose thread pool" (a382501)

Not important, but would suggest simplifying naming and just calling these members:

Mutex m_mutex;
std::condition_variable m_cv;

The cs_ prefix is an older convention that comes from windows code, and as long as this class is going to have one mutex there isn't really a reason to use a more complicated name.

@furszy (Member, Author) replied Jun 5, 2025:

k sure. Done as suggested.

src/init.cpp Outdated
@@ -2104,7 +2108,20 @@ bool StartIndexBackgroundSync(NodeContext& node)
}
}

std::shared_ptr<ThreadPool> thread_pool;
Contributor:

In commit "index: implement index parallel sync" (bc0e521)

This doesn't seem like a great use of shared_ptr, because it makes the shutdown sequence more complicated than it needs to be. I think it would be clearer if, instead of taking std::shared_ptr<ThreadPool> references, they just used ThreadPool& and a new std::unique_ptr<ThreadPool> m_index_threads member was added to NodeContext. This way the threads could be stopped with an explicit reset() call instead of shutting down more unpredictably when the last index is destroyed.

@furszy (Member, Author) replied Jun 5, 2025:

Sounds good. Done as suggested. Thanks!
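
A small sketch of the suggested ownership model with stand-in types (not the actual NodeContext or ThreadPool): the context owns the pool through a unique_ptr, so the workers are stopped by an explicit reset() during shutdown rather than whenever the last shared_ptr holder happens to be destroyed.

#include <memory>

struct ThreadPoolStub { /* joins its worker threads in the destructor */ };

struct NodeContextStub {
    std::unique_ptr<ThreadPoolStub> index_threads;
};

void ShutdownIndexes(NodeContextStub& node)
{
    // Deterministic teardown point for the index worker threads.
    node.index_threads.reset();
}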

const int max_blocks_to_sync = m_tasks_per_worker * m_thread_pool->WorkersCount() + m_tasks_per_worker; // extra 'm_tasks_per_worker' due the active-wait.
const int tip_height = WITH_LOCK(cs_main, return m_chainstate->m_chain.Height());
const int remaining_blocks = tip_height - pindex_next->nHeight;
work_chunk = remaining_blocks > max_blocks_to_sync ? m_tasks_per_worker : remaining_blocks / (m_thread_pool->WorkersCount() + 1);
Contributor:

In commit "index: implement index parallel sync" (bc0e521)

I found this hard to follow:

const int max_blocks_to_sync = m_tasks_per_worker * m_thread_pool->WorkersCount() + m_tasks_per_worker; // extra 'm_tasks_per_worker' due the active-wait.
work_chunk = remaining_blocks > max_blocks_to_sync ? m_tasks_per_worker : remaining_blocks / (m_thread_pool->WorkersCount() + 1);
workers_count = m_thread_pool->WorkersCount();

would suggest dropping the max_blocks_to_sync variable and simplifying to:

workers_count = m_thread_pool->WorkersCount();
work_chunk = std::min(m_tasks_per_worker, remaining_blocks / (workers_count + 1));

@furszy (Member, Author) replied Jun 5, 2025:

Good idea. Done as suggested.
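
As a hypothetical worked example of the suggested formula: with m_tasks_per_worker = 1000 (an assumed value), 4 workers, and 600 blocks remaining, work_chunk = std::min(1000, 600 / (4 + 1)) = 120, so the tail of the chain is split evenly across the five participating threads instead of handing a full-sized chunk to only some of them.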

src/index/base.h Outdated
@@ -88,6 +91,7 @@ class BaseIndex : public CValidationInterface
CThreadInterrupt m_interrupt;

std::shared_ptr<ThreadPool> m_thread_pool;
uint16_t m_tasks_per_worker{INDEX_WORK_PER_CHUNK};
Contributor:

In commit "index: implement index parallel sync" (bc0e521)

IMO, this would be clearer if it were called blocks_per_worker or blocks_per_chunk instead of tasks_per_worker. Not knowing that a task is a block made the code harder to understand when initially reading it.

@furszy (Member, Author) replied Jun 5, 2025:

sure. Done as suggested.

std::vector<std::future<std::vector<std::any>>> futures;
const CBlockIndex* it_start = pindex_next;

if (parallel_sync_enabled) {
Contributor:

In commit "index: implement index parallel sync" (bc0e521)

Would be really helpful to have a comment saying how this works at a high level. Would suggest something like "If parallel sync is enabled, use WorkersCount()+1 threads (including the current thread) to each process block ranges of up to m_tasks_per_worker blocks. The blocks in each range are processed in sequence by calling the index's CustomProcessBlock method which returns std::any values that are collected into vectors. As the threads finish their work, the std::any values are processed in order by calling the index's CustomPostProcessBlocks method, and the process repeats until no blocks are remaining to be processed and post-processed."

I guess at a high level this seems reasonable, but perhaps too rigid. For example, if there are 3 threads processing block ranges 1-10, 11-20, 21-30, and the first 2 threads finish while the third thread is slow, why should the loop need to wait for the third thread before beginning to process blocks 31-50 while there are two idle threads doing nothing?

It seems like this idleness could be avoided by moving the CustomPostProcessBlocks calls into the worker threads. So that whenever each worker thread finishes processing blocks, it then opportunistically calls CustomPostProcessBlocks to post-process any blocks that are available (given the ordering constraint for post-processing). This way all the worker threads would continuously have work to do, and I suspect the resulting code might be simpler too since there would just be a single phase of work, not alternating Processing/PostProcessing phases.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 349b099 to 55f23cd on June 5, 2025 20:00
@furszy (Member, Author) left a comment

Thanks for the review, andrewtoth and ryanofsky!
Addressed most of the suggestions, but not all yet. Will finish the rest soon.

@DrahtBot (Contributor) commented Jun 5, 2025

🚧 At least one of the CI tasks failed.
Task TSan, depends, gui: https://github.com/bitcoin/bitcoin/runs/43571812170
LLM reason (✨ experimental): The CI failure is caused by a segmentation fault occurring in CBlockIndex::GetBlockPos().

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@furszy (Member, Author) commented Jun 6, 2025

Decoupled part of this work into #32694, combining it with part of #24230.

achow101 added a commit that referenced this pull request Jun 12, 2025
029ba1a index: remove CBlockIndex access from CustomAppend() (furszy)
91b7ab6 refactor: index, simplify CopyHeightIndexToHashIndex to process single block (furszy)
6f1392c indexes, refactor: Remove remaining CBlockIndex* uses in index Rewind methods (Ryan Ofsky)
0a24870 indexes, refactor: Stop requiring CBlockIndex type to call IsBIP30Unspendable (Ryan Ofsky)
331a25c test: indexes, avoid creating threads when sync runs synchronously (furszy)

Pull request description:

  Combining common refactors from #24230 and #26966, aiming to move both efforts forward while reducing their size and review burden.

  Broadly, #24230 focuses on enabling indexes to run in a separate process, and #26966 aims to parallelize the indexes initial synchronization process. A shared prerequisite for both is ensuring that only the base index class interacts with the node’s chain internals - child index classes should instead operate solely through chain events.

  This PR moves disk read lookups from child index classes to the base index class. It also includes a few documentation improvements and a test-only code cleanup.

ACKs for top commit:
  maflcko:
    review ACK 029ba1a 👡
  achow101:
    ACK 029ba1a
  TheCharlatan:
    Re-ACK 029ba1a
  davidgumberg:
    ACK 029ba1a
  mzumsande:
    Code Review ACK 029ba1a

Tree-SHA512: f073af407fc86f228cb47a32c7bcf2241551cc89ff32059317eb81d5b86fd5fda35f228d2567e0aedbc9fd6826291f5fee05619db35ba44108421ae04d11e6fb
furszy added 5 commits June 12, 2025 20:14
And add option to customize thread pool workers count
It also adds coverage for initial sync from a particular block.
Mimicking a node restart.
@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from f3f5e4c to 53bc8b9 on June 13, 2025 00:20
const auto& [filter, height] = std::any_cast<std::pair<BlockFilter, int>>(obj);
const uint256& header = filter.ComputeHeader(m_last_header);
if (!Write(filter, height, header)) {
LogError("Error writings filters, shutting down block filters index\n");
Contributor:

writings -> writing [incorrect verb form in "Error writings filters"]
