8000 validation: improve performance of CheckBlockIndex by mzumsande · Pull Request #28339 · bitcoin/bitcoin · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

validation: improve performance of CheckBlockIndex #28339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 11, 2024

Conversation

mzumsande
Copy link
Contributor
@mzumsande mzumsande commented Aug 24, 2023

CheckBlockIndex() are consistency checks that are currently enabled by default on regtest.

The function is rather slow, which is annoying if you

  • attempt to run it on other networks, especially if not fully synced
  • want to generate a long chain on regtest and see block generation slow down because you forgot to disable -checkblockindex or don't know it existed.

One reason why it's slow is that in order to be able to traverse the block tree depth-first from genesis, it inserts pointers to all block indices into a std::multimap - for which inserts and lookups become slow once there are hundred thousands of entries.
However, typically the block index is mostly chain-like with just a few forks so a multimap isn't really needed for the most part. This PR suggests to store the block indices of the chain ending in the best header in a vector instead, and store only the rest of the indices in a multimap. This does not change the actual consistency checks that are being performed for each index, just the way the block index tree is stored and traversed.

This adds a bit of complication to make sure each block is visited (note that there are asserts that check it), making sure that the two containers are traversed correctly, but it speeds up the function considerably:

On master, a single invocation of CheckBlockIndex takes ~1.4s on mainnet for me (4.9s on testnet which has >2.4 million blocks).
With this branch, the runtime goes down to ~0.27s (0.85s on testnet).This is a speedup by a factor ~5.

@DrahtBot
8000 Copy link
Contributor
DrahtBot commented Aug 24, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK furszy, ryanofsky, achow101
Concept ACK jonatack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@mzumsande mzumsande marked this pull request as draft August 24, 2023 18:18
@mzumsande mzumsande marked this pull request as ready for review August 25, 2023 00:42
@fanquake
Copy link
Member

cc @martinus

@martinus
Copy link
Contributor
martinus commented Aug 25, 2023

Hi, I wonder how you measured the runtime, is there a benchmark? Another optimization that might be worth it, it is to replace the std::multimap<CBlockIndex*,CBlockIndex*> forward with an std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>> forward.

@mzumsande
Copy link
Contributor Author

Hi, I wonder how you measured the runtime, is there a benchmark?

So far only manually, by adding log statements to the beginning and end of CheckBlockIndex and running with -logtimemicros. I will add a benchmark!

Another optimization that might be worth it, it is to replace the std::multimap<CBlockIndex*,CBlockIndex*> forward with an std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>> forward.

Thanks, I'll look into that!

@mzumsande mzumsande marked this pull request as draft September 11, 2023 21:37
@Sjors
Copy link
Member
Sjors commented Sep 14, 2023

While you're at it, I noticed that the help text for -checkblockindex says "occasionally" (added in #7290), but afaik it happens every block (and every header during the initial header sync). We should probably drop that word, or alternatively provide an interval argument.

@mzumsande
Copy link
Contributor Author

While you're at it, I noticed that the help text for -checkblockindex says "occasionally" (added in #7290), but afaik it happens every block (and every header during the initial header sync). We should probably drop that word, or alternatively provide an interval argument.

Good point, I think that an interval argument similar to addrman or mempool consistency checks makes sense. Will try that in addition to the other points above (it might take a few weeks though, therefore put the PR into draft).

@mzumsande mzumsande force-pushed the 202308_speedup_checkblockindex branch from 3e0f001 to 87d6155 Compare December 22, 2023 16:12
@mzumsande
Copy link
Contributor Author
mzumsande commented Dec 22, 2023

I added a commit with a benchmark, and another one that allow to specify the frequency for -checkblockindex (analogous to the way -checkaddrman and -checkmempool work).

Another optimization that might be worth it, it is to replace the std::multimap<CBlockIndex*,CBlockIndex*> forward with an std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>> forward.

I tried that in mzumsande@51aae54 (directly on master without the changes of this PR), however it doesn't seem to improve performance - I'll post benchmark results below.

@mzumsande
Copy link
Contributor Author
mzumsande commented Dec 22, 2023

Benchmark results:

master
ns/op op/s err% total benchmark
331,165.67 3,019.64 0.3% 0.01 CheckBlockIndex
320,603.00 3,119.12 1.7% 0.01 CheckBlockIndex
315,683.67 3,167.73 0.4% 0.01 CheckBlockIndex
329,077.33 3,038.80 0.3% 0.01 CheckBlockIndex
315,345.33 3,171.13 0.4% 0.01 CheckBlockIndex
PR
ns/op op/s err% total benchmark
127,273.00 7,857.13 1.8% 0.01 CheckBlockIndex
123,687.22 8,084.91 0.1% 0.01 CheckBlockIndex
136,637.17 7,318.65 0.1% 0.01 CheckBlockIndex
123,805.67 8,077.17 0.3% 0.01 CheckBlockIndex
124,162.62 8,053.95 0.2% 0.01 CheckBlockIndex
martinus' idea

Uses std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>>, see mzumsande@51aae54 for my implementation

ns/op op/s err% total benchmark
384,395.00 2,601.49 0.5% 0.01 CheckBlockIndex
380,674.67 2,626.92 0.3% 0.01 CheckBlockIndex
384,337.33 2,601.88 1.1% 0.01 CheckBlockIndex
410,836.50 2,434.06 3.4% 0.01 CheckBlockIndex
384,461.33 2,601.04 2.0% 0.01 CheckBlockIndex

The benchmark currently uses a linear chain of 1100 headers. In general, the efficiency gain becomes larger with a longer chain, for example with just 100 headers the speedup is less.

So there is a speedup of a facto >2.5 with the benchmark (that becomes larger on larger chains, see results for mainnet / testnet in OP).
The suggestion by @martinus to use std::unordered_map<CBlockIndex*, std::vector<CBlockIndex*>> doesn't seem to improve performance (though it's also very possible that I might have not implemented it in a great away).

@mzumsande mzumsande marked this pull request as ready for review December 22, 2023 16:45
@jonatack
Copy link
Member
jonatack commented Jan 6, 2024

Concept ACK

@mzumsande
Copy link
Contributor Author

87d6155 to efdc8e3: rebased

bool ChainstateManager::ShouldCheckBlockIndex() const
{
if (!*Assert(m_options.check_block_index)) return false;
if (GetRand(*m_options.check_block_index) >= 1) return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be better to do this deterministically to avoid non-determinism in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have consistency among -checkaddrman, -checkmempool and -checkblockindex. Since non-determinism would only occur for values other than 0 and 1, maybe it'd be best not to use other values than that in tests? Or change all three do be deterministic?

@DrahtBot DrahtBot requested a review from furszy March 28, 2024 14:09
@mzumsande mzumsande force-pushed the 202308_speedup_checkblockindex branch 2 times, most recently from 7015231 to e880ee0 Compare April 1, 2024 14:53
@mzumsande
Copy link
Contributor Author

b51eb42 to e880ee0
Addressed feedback by @ryanofsky - now the diff to master should be a bit smaller than before.

Still haven't had the time to write more unit tests for CheckBlockIndex().

Copy link
Contributor
8000
@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK e880ee0. Just rebase and suggested simplifications since the last review

@DrahtBot DrahtBot requested a review from furszy April 18, 2024 13:17
@furszy
Copy link
Member
furszy commented Apr 18, 2024

@DrahtBot, I just reviewed it..

@mzumsande
Copy link
Contributor Author

e880ee0 to 3c3895c:
Addressed review feedback.

Copy link
Contributor
@ryanofsky ryanofsky 103CE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 3c3895c

Left another suggestion, but this looks good in its current form.

mzumsande and others added 2 commits April 26, 2024 13:31
by not saving all indexes in a std::multimap, but only
those that are not part of the best header chain.
The indexes of the best header chain are stored in a vector,
which, in the typical case of a mostly linear chain with
a few forks, results in a much smaller multimap, and increases
performance noticeably for long chains.

This does not change the actual consistency checks that are being
performed for each index, just the way the block index tree is
stored and traversed.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
This makes it similar to -checkaddrman and -checkmempool, which
also allow to run the check occasionally instead of always / never.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
@mzumsande mzumsande force-pushed the 202308_speedup_checkblockindex branch from 3c3895c to 5bc2077 Compare April 26, 2024 17:35
@mzumsande
Copy link
Contributor Author

3c3895c to 5bc2077:
Addressed review feedback

Copy link
Member
@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 5bc2077

@DrahtBot DrahtBot requested a review from ryanofsky May 20, 2024 13:15
Copy link
Contributor
@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 5bc2077. Just added suggested assert and simplification since last review

@achow101
Copy link
Member

ACK 5bc2077

@achow101 achow101 merged commit 891e4bf into bitcoin:master Jun 11, 2024
@mzumsande mzumsande deleted the 202308_speedup_checkblockindex branch June 11, 2024 21:24
@hebasto
Copy link
Member
hebasto commented Jul 31, 2024

Ported to the CMake-based build system in hebasto#290.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0