Make reactor.onReceive not block unmarshalling more packets from peer #3199
Comments
I propose a two-PR short-term fix, and then decide on a longer-term, API-breaking fix later.

PR #1: First, notice that the key property is that we only need in-order delivery per peer<>channel pair; we have no in-order delivery guarantees across channels. So we can achieve this by making every

PR #2: Make each of the main reactors have an incoming buffer, and a separate goroutine that processes the inbound messages (see the sketch below). This would eventually be an API break on the reactor, but short term it should drop these incoming costs notably. @evan-forbes has a change for this that he's already been testing.

(Optional) PR #3: Change the
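A minimal sketch of what the PR #2 shape could look like, assuming a reactor that is handed a raw channel ID plus bytes; all names here (bufferedReactor, envelope, OnReceive, processLoop) are hypothetical and not CometBFT identifiers:

```go
// Hypothetical sketch of PR #2: an incoming buffer per reactor plus a worker
// goroutine, so the p2p recvRoutine only has to enqueue and can immediately
// read the next packet. None of these names are real CometBFT APIs.
package p2psketch

import "fmt"

type envelope struct {
	channelID byte
	payload   []byte
}

type bufferedReactor struct {
	incoming chan envelope // bounded buffer of not-yet-processed messages
}

func newBufferedReactor(bufSize int) *bufferedReactor {
	r := &bufferedReactor{incoming: make(chan envelope, bufSize)}
	go r.processLoop() // the separate goroutine that drains the buffer
	return r
}

// OnReceive is what the connection's recv path would call. It only enqueues,
// so slow handling no longer stalls reads for every channel on the peer.
func (r *bufferedReactor) OnReceive(chID byte, msgBytes []byte) {
	r.incoming <- envelope{channelID: chID, payload: msgBytes}
}

// processLoop does the potentially slow unmarshalling/handling off the
// critical path, while still draining messages in arrival order.
func (r *bufferedReactor) processLoop() {
	for e := range r.incoming {
		fmt.Printf("channel %#x: handling %d bytes\n", e.channelID, len(e.payload))
	}
}
```

Because the buffer is bounded, a full buffer makes OnReceive block, which degrades to today's behavior instead of growing memory without bound.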
For context, before tendermint/tendermint#9622, reactors received raw messages (bytes), not protobuf messages decoded from those bytes. The change allows producing per-message-type p2p metrics and saves some code in the reactor implementations, but it of course introduced unmarshalling into the critical path of the p2p layer. And this is the main problem here.
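Roughly, the shape of that change (abridged from memory; see tendermint/tendermint#9622 for the real Envelope and Reactor definitions, which use gogoproto and the p2p.Peer interface):

```go
// Approximate before/after of the Reactor receive path discussed above.
// Types are abridged placeholders; the real definitions live in the p2p package.
package p2psketch

import "google.golang.org/protobuf/proto"

// Peer stands in for the real p2p.Peer interface.
type Peer interface{}

// Before #9622: reactors got the raw bytes and were free to decide where
// (and on which goroutine) to unmarshal them.
type legacyReactor interface {
	Receive(chID byte, src Peer, msgBytes []byte)
}

// After #9622: the p2p layer unmarshals into a proto.Message before
// dispatching. This enables per-message-type metrics, but the decoding now
// happens on the connection's recv path.
type Envelope struct {
	Src       Peer
	Message   proto.Message
	ChannelID byte
}

type envelopeReactor interface {
	Receive(e Envelope)
}
```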
This probably means one goroutine per channel per peer. That is a lot to handle; I am afraid the overhead here might not pay for the possible benefits.
Feature Request
Summary
We want to be able to ingest packets from peers faster. There is some evidence (cc @evan-forbes) that we have blocking behavior in our ability to ingest data from peers, and as the explanation below proceeds, it should become intuitively clear why this is a problem.
Here is a profile from osmosis mainnet taken last night over one hour, from recvRoutine:
[profile screenshot]
p2p.createMConnection.func1 is reactor.onReceive. The flow we currently have is: the connection's recvRoutine reads a packet from the peer, proto-decodes it, and synchronously calls reactor.OnReceive for that channel before reading the next packet.
The issue is that reactor.OnReceive blocks reading and proto-decoding more data for all channels to that peer. Some messages, e.g. IBC txs, take over 5ms to handle. This means that if a peer gossips you an IBC tx, it will take at least 5ms before you even attempt to decode the subsequent packets they send you (leading to IBC-enabled DoS vectors, amongst many other problems).
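A simplified illustration of that blocking (this is not the actual connection.go code, just the flow described above with hypothetical names):

```go
// Why one slow message stalls every channel on the connection: the read loop
// cannot look at packet N+1 until onReceive for packet N has returned.
package p2psketch

import "time"

type packet struct {
	channelID byte
	bytes     []byte
}

func recvLoop(packets <-chan packet, onReceive func(chID byte, msgBytes []byte)) {
	for p := range packets {
		// Unmarshalling and reactor handling happen inline here. If handling
		// one IBC tx takes 5ms, the next packet -- even on a different
		// channel -- waits at least 5ms before it is even decoded.
		onReceive(p.channelID, p.bytes)
	}
}

// slowHandler simulates a handler that needs ~5ms per message.
func slowHandler(chID byte, msgBytes []byte) {
	time.Sleep(5 * time.Millisecond)
}
```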
We see that, under low load, CheckTx is already the dominant item here.
Another problem is that some consensus packets block on the cs mutex, which will be locked during block execution and vote processing.
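For illustration, this is the kind of contention meant here, with hypothetical names rather than the real consensus state fields:

```go
// If a handler invoked from the recv path needs the consensus mutex, the
// recv path blocks for however long block execution or vote processing
// holds that lock. Names are illustrative only.
package p2psketch

import "sync"

type consensusState struct {
	mtx sync.Mutex
}

// handleMessage is called (directly or indirectly) from the p2p recv path
// for some consensus messages.
func (cs *consensusState) handleMessage(msgBytes []byte) {
	cs.mtx.Lock() // waits while applyBlock or vote processing holds mtx
	defer cs.mtx.Unlock()
	// ... update consensus state from msgBytes ...
}

// applyBlock can hold the mutex for the full block execution time.
func (cs *consensusState) applyBlock() {
	cs.mtx.Lock()
	defer cs.mtx.Unlock()
	// ... execute the block ...
}
```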
It's clear we need to split these into different processes.
Proposal
If we need to guarantee in-order delivery for each channel, then I think the answers are:
If we don't need to guarantee in-order delivery, and just achieve it in almost every case, then we can just call this function in a goroutine: https://github.com/cometbft/cometbft/blob/main/p2p/conn/connection.go#L671
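A sketch of that option, reusing the packet type and recv-loop shape from the earlier sketch (illustrative only, not a patch against connection.go):

```go
// Dispatch each onReceive call in its own goroutine so the read loop keeps
// reading immediately. Per-channel in-order delivery is no longer guaranteed,
// although in practice most messages would still be handled in order.
package p2psketch

func recvLoopAsync(packets <-chan packet, onReceive func(chID byte, msgBytes []byte)) {
	for p := range packets {
		// Arguments are evaluated now; only the call itself runs concurrently.
		go onReceive(p.channelID, p.bytes)
	}
}
```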