-
Notifications
You must be signed in to change notification settings - Fork 37.4k
bench: BlockAssembler on a mempool with packages #26695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK
Change and idea are good, small issue with the first commit.
e2ef70c
to
4c20bee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 4c20bee (jamesob/ackr/26695.2.glozow.bench_blockassembler_on
)
Nice! Good thing to have.
Could possibly later populate the mempool with a more stressful load, something
like 10,000 transactions?
Show signature data
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
ACK 4c20bee65009bd2c95c2a3426030b042f55007ad ([`jamesob/ackr/26695.2.glozow.bench_blockassembler_on`](https://github.com/jamesob/bitcoin/tree/ackr/26695.2.glozow.bench_blockassembler_on))
Nice! Good thing to have.
Could possibly later populate the mempool with a more stressful load, something
like 10,000 transactions?
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmOZx4sACgkQepNdrbLE
TwVdwA//bHGKG97lUP5JPiknjulnwVzPvfPyXJi8azZQ2XoK3T1HEA/kJo31wHrY
2mdRgis+kst/emY7wfu/LAAlc06M8cKt5DuPfEgJIPzkidV8JAePMvw8QpDgbd/K
mTu+6mdeGq5Mspy0eS2ItsQEeC4NvZeQSIIf9yCSIhlnCBmeshqBEydN3tduW6Ot
nr6RKGizFO7yJ1ecWX+UA9eYNUYUz4isLehBgoyTh5hz0IJobWIFRmzVCXUQci0y
S3WxFf4RlGnJpWMOOwd/6WZHAdsFOtZ6T4Nux9HEC7nboxb+pfib47CvqCWc0kG3
4mAsJtxasIna/0HTMFD9gApOteG3NByNDKfOkhG1gZYu8tnRA6wm+61WKX6BgJT9
MHHd4WMlov9ye4YaQQazpV8ApXqbjMbFpOLxSeCn3vad/bXo4IWq2x+rpQZTpLJs
+pPNTvqahtk96njeqZcukbnu+JYuEbst4sSlOGMd29fV52M2lyOICK5Jt0PgjfBR
f/lZXhNdd7+1ydj5XOkKE4RAqaQhEf1FYhvR0u+HmFcIVUsEfIYQkvB4z89j28KI
hFGQD3eFPxFPdxBV32GMWHKQ52KBgvzL0Hbbe4a7D/FSbE8YAnRhFXx3jd4AhlAV
8xCDcN/eAthz45ZsTvvoF6NagwRfoXQoP4jzjzFIsNDGD0CKri0=
=cB3y
-----END PGP SIGNATURE-----
Show platform data
Tested on Linux-6.0.11-arch1-1-x86_64-with-glibc2.36
Configured with ./configure --without-bdb 'CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare ' --disable-gui-tests --disable-fuzz-binary --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --no-create --no-recursion
Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=. -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha i
Compiler version: clang version 14.0.6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Left some nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK - nice to have this kind of benchmark, will be helpful in e.g. #26289
src/bench/block_assemble.cpp
Outdated
}); | ||
} | ||
|
||
BENCHMARK(AssembleBlock, benchmark::PriorityLevel::HIGH); | ||
BENCHMARK(BlockAssemblerAddPackageTxns, benchmark::PriorityLevel::HIGH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is among the slowest benchmarks in the suite atm. On my machine, --sanity-check
takes ~1.2s. I think that's an acceptable additional slowdown for make check
, but perhaps we should consider downgrading this to PriorityLevel::MEDIUM
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I get when trying to run the BlockAssemblerAddPackageTxns
benchmark
Specs:
mac book pro 2019
2.3 GHz 8-Core Intel Core i9
16 GB 2667 MHz DDR4
AMD Radeon Pro 5500M 4 GB
Results:
./src/bench/bench_bitcoin -filter=BlockAssemblerAddPackageTxns --sanity-check
Running with --sanity-check option, benchmark results will be useless.
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 813,714,108.00 | 1.23 | 0.0% | 0.81 | `BlockAssemblerAddPackageTxns`
Without --sanity-check
:
./src/bench/bench_bitcoin -filter=BlockAssemblerAddPackageTxns
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 830,079,779.00 | 1.20 | 1.4% | 9.16 | `BlockAssemblerAddPackageTxns`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When compiled with --enable-debug
, this slows down make check
by ~30s for me. When run without --sanity-check
, it takes ~280s. I think this needs to be a low priority benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to low, I think we don't have medium? Is priority based on time only, and if so at what time threshold would we want to knock the priority down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry, we had medium in the initial PR but it was removed later on, so agreed with changing it to low. We haven't defined any guidelines on when to downgrade to low afaik, but since devs run make check
often, I think adding benches with meaningful (e.g. >1s?) --sanity-check
overhead should run in low priority - unless their complexity/importance warrants otherwise.
628ce92
to
b053472
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 628ce92 (jamesob/ackr/26695.3.glozow.bench_blockassembler_on
)
re-ACK
Show signature data
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
ACK 628ce92c494c6d3b6181d2ee6f64494de38a67e3 ([`jamesob/ackr/26695.3.glozow.bench_blockassembler_on`](https://github.com/jamesob/bitcoin/tree/ackr/26695.3.glozow.bench_blockassembler_on))
re-ACK
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEGNRVI1NPYuZCSIrGepNdrbLETwUFAmOcpWIACgkQepNdrbLE
TwW/5w/+MpLKrQ5eLWc7uXQ2P1ML82Q1EKDXzGC4bmBFZZ2t7Ynj78enPUf1ukD2
o8hT28touxw5XSWF2Zcwjhh+DRWU6sqYJInpOXhW1366iqS2DcvDQZOctwEEUasO
RM6lcRfNNvz4FyuiV4Cfn/liVHK6ipCQoPY88mjyy/I32XzwB6wXKNopRCV8GfFV
pIU10BM9tVobwHJhFRvTQve3nhwXjIbwJ0Vn9peUqKRSmuxDkFf2R6l/3YUwEFjk
iPJQ+SW4fo0h6bDThZ8GOfT8F4u52cnsx97zOck9f6wmehQoKUDIj6s2Epa/n/0q
WZXx6+xI2RS3GuS7fM9et1+0H8TmyHhkvSpzkSiOg+npWBqPl+8tCgqQftUS4s+D
HY/BoP1arD2nHsQ/yBksymASmnFKMyR0RWz8r8JheUY3gGXRDCQZkUCkVMMjpV3O
tOIx6k1h8dfyaJqxW2e6o3r2dNho79Ap2nQS7UnRfFu80KebgyV0SZYusK3+OT4a
Wkm4VBO3l36BQJwOBmu6nQ3mpbgF4TvQOhj9F65GUaLrsAKrNn/YrcnV2U+g/RmS
JaXsSUio7DukkYccLLLOh9WvYOXkxohIXNA7DBpk+VES8PBDN5Q5ZBuCmJSRt0HC
VwSxSbNA4Pedk6pmaebOAgTUcQxY9yJOaBQLEqyhR3SRE4YgrVQ=
=rMYG
-----END PGP SIGNATURE-----
Show platform data
Tested on Linux-6.0.11-arch1-1-x86_64-with-glibc2.36
Configured with ./configure --without-bdb 'CXXFLAGS=-fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare ' --disable-gui-tests --disable-fuzz-binary --enable-wallet --enable-debug --with-daemon --enable-natpmp-default CXX=/usr/bin/clang++ CC=/usr/bin/clang PYTHONPATH= --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --disable-shared --with-pic --enable-benchmark=no --enable-module-recovery --enable-module-schnorrsig --no-create --no-recursion
Compiled with /usr/bin/ccache /usr/bin/clang++ -std=c++17 -mavx -mavx2 -mpclmul -fPIE -pipe -O2 -g -Wthread-safety-analysis -Wall -Werror=sign-compare -O0 -g3 -ftrapv -fdebug-prefix-map=$(abs_top_srcdir)=. -Wstack-protector -fstack-protector-all -fcf-protection=full -fstack-clash-protection -msse4.1 -msse4.2 -msse4 -msha i
Compiler version: clang version 14.0.6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK b053472 (jamesob/ackr/26695.4.glozow.bench_blockassembler_on
)
Oops, previous ACK already stale. New ACK incorporates
- m_node.mempool->addUnchecked(CTxMemPoolEntry(ptx, 1000, 0, 1, false, 4, lp));
+ m_node.mempool->addUnchecked(CTxMemPoolEntry(ptx, /*fee=*/(total_in - num_outputs * amount_per_output),
+ /*time=*/0, /*entry_height=*/1,
+ /*spends_coinbase=*/false, /*sigops_cost=*/4, lp));
}
--num_transactions;
}
Yes apologies @jamesob, I noticed I was making the tx fee random but still setting the mempool entry fee to 1000. Thanks for such a quick review! |
auto testing_setup{MakeNoLogFileContext<TestChain100Setup>()}; | ||
testing_setup->PopulateMempool(det_rand, /*num_transactions=*/1000, /*submit=*/true); | ||
node::BlockAssembler::Options assembler_options; | ||
assembler_options.test_block_validity = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any value in disabling it? Sure, the benchmark will be less "micro", but at the same time it would be missing if a "package" block is slowing down CNB via TBV. Or is taking TBV a lot longer than the remainder of CNB, which would make this benchmark ineffective?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason is TBV taking time, another is TBV would fail because PopulateMempool()
doesn't sign the txns and spends immature coinbases. I can change it to generate keys and mine 100 blocks in between creating and submitting the transactions if people want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer keeping TBV disabled here. Block validation is important to benchmark too, but since this scales linearly with the amount of transactions selected - I think the current AssembleBlock
benchmark suffices there? Narrower benchmarks help more easily pinpoint where performance issues are.
I saw in 5021832 you change |
It's an inconsistent order of acquiring mutexes, which along with other things can cause deadlock. I missed it when first implementing |
Ok I'll try compiling with |
Tested ACK b053472 this is what I get when running the Specs:
Without --sanity-check:
Specs:
Without --sanity-check:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK b053472 . Tested / bench'ed correctly
Tested ACK b053472 left some test results here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK.
I've executed the bench successfully; compiled with --enable-debug
.
This allows us to both manually manipulate options and grab values from ArgsManager (i.e. -blockmaxweight and -blockmintxfee config options) when constructing BlockAssembler::Options. Prior to this change, the only way to apply the config options is by ctoring BlockAssembler with no options, which calls DefaultOptions().
Allows us to test BlockAssembler on transactions without signatures or mature coinbases (which is what PopulateMempool creates). Also means that `TestBlockValidity()` is not included in the bench timing.
This makes the contents of the mempool more realistic and iterating by ancestor feerate order more meaningful. If transactions have varying feerates, it's also more likely that packages will need to be updated during block template assembly.
The current BlockAssembler bench only tests on a mempool where all transactions have 0 ancestors or descendants, which does not exercise any of the package-handling logic in BlockAssembler
b053472
to
0452805
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 0452805
I think this comment can be resolved in a follow-up if preferable but I'd prefer the gArgs
parameter be renamed in this PR already since it's quite confusing.
@@ -72,18 +74,22 @@ BlockAssembler::BlockAssembler(Chainstate& chainstate, const CTxMemPool* mempool | |||
nBlockMaxWeight = std::max<size_t>(4000, std::min<size_t>(MAX_BLOCK_WEIGHT - 4000, options.nBlockMaxWeight)); | |||
} | |||
|
|||
static BlockAssembler::Options DefaultOptions() | |||
void ApplyArgsManOptions(const ArgsManager& gArgs, BlockAssembler::Options& options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void ApplyArgsManOptions(const ArgsManager& gArgs, BlockAssembler::Options& options) | |
void ApplyArgsManOptions(const ArgsManager& args, BlockAssembler::Options& options) |
auto testing_setup{MakeNoLogFileContext<TestChain100Setup>()}; | ||
testing_setup->PopulateMempool(det_rand, /*num_transactions=*/1000, /*submit=*/true); | ||
node::BlockAssembler::Options assembler_options; | ||
assembler_options.test_block_validity = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer keeping TBV disabled here. Block validation is important to benchmark too, but since this scales linearly with the amount of transactions selected - I think the current AssembleBlock
benchmark suffices there? Narrower benchmarks help more easily pinpoint where performance issues are.
src/bench/block_assemble.cpp
Outdated
}); | ||
} | ||
|
||
BENCHMARK(AssembleBlock, benchmark::PriorityLevel::HIGH); | ||
BENCHMARK(BlockAssemblerAddPackageTxns, benchmark::PriorityLevel::HIGH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry, we had medium in the initial PR but it was removed later on, so agreed with changing it to low. We haven't defined any guidelines on when to downgrade to low afaik, but since devs run make check
often, I think adding benches with meaningful (e.g. >1s?) --sanity-check
overhead should run in low priority - unless their complexity/importance warrants otherwise.
Tested ACK 0452805 Specs: mac book pro 2019
|
ACK 0452805 |
Follow-ups in #26883 |
6a5e88e miner: don't re-apply default Options value if argument is unset (stickies-v) ea72c3d refactor: avoid duplicating BlockAssembler::Options members (stickies-v) cba749a refactor: rename local gArgs to args (stickies-v) Pull request description: Two follow-ups for #26695, both refactoring and no observed (*) behaviour change: - Rename `gArgs` to `args` because it's not actually a global - Add `BlockAssembler::Options` as a (private) member to `BlockAssembler` to avoid having to assign all the options individually, essentially duplicating them Reduces LoC and makes the code more readable, in my opinion. --- (*) as [pointed out by ajtowns](#26883 (comment)), this PR changes the interface of `ApplyArgsManOptions()`, making this not a pure refactoring PR. In practice, `ApplyArgsManOptions()` is never called in such a way that this leads to observed behaviour change. Regardless, I've carved out the potential behaviour change into a separate commit and would be okay with dropping it, should it turn out to be controversial. ACKs for top commit: glozow: ACK 6a5e88e TheCharlatan: Light code review ACK 6a5e88e Tree-SHA512: 15c30442ff0e070b1a58dc4c9615550d619ce35b4a2596b2c0a9d790259bbf987cab708f7cbb1057a8cf8b4c3226f3ad981282d3499ac442094806492a5f68ce
Performance of block template building matters as miners likely want to be able to start mining on a block with transactions asap after a block is found. We would want to know if a mempool PR accidentally caused, for example, a 100x slowdown. An
AssembleBlock()
bench exists, but it operates on a mempool with 101 transactions, each with 0 ancestors or descendants and with the same fee. Adding a bench with a more complex mempool is useful because (1) it's more realistic (2) updating packages can potentially cause the algorithm to take a long time.