Make sure that reactors' implementations of the `Receive()` method are non-blocking #2685

cason · 2024-03-28T07:35:10Z

Motivated by #2533.

The Reactor interface defines the methods that a component must implement in order to use the communication services provided by the p2p layer. A core method that a reactor has to implement is Receive(), which the p2p layer invokes to deliver a message to the reactor. The implementation of this method should be non-blocking (see details here).

If the implementation of Receive() in some reactor blocks when processing a message received from a given peer, the receive routine (recvRoutine()) of the p2p connection with that peer also blocks until the Receive() call returns:

From `cometbft/p2p/conn/connection.go`, lines 654 to 655 at commit 0147e63:

```go
// NOTE: This means the reactor.Receive runs in the same thread as the p2p recv routine
c.onReceive(channelID, msgBytes)
```

This scenario has some consequences:

- Other messages received from the same peer are not processed by the p2p connection while the call blocks. This affects other reactors too, since they will not receive new messages from that peer until the blocking Receive() call returns.
- As described in feat: pause peer timeouts during longer block execution #2533, ping/pong messages, which act as protocol-level keepalives, are not handled either. As a result, the p2p connection's pong timeout expires, the connection produces an error and quits, and the switch disconnects from that peer.

Possible solutions

This situation could be avoided if the Receive() call were not executed by the receive routine of the p2p connection. This change, however, is far from trivial, given the design of the p2p multiplexed connections. Moreover, even if the Receive() call were executed in a different routine, that routine would eventually block if a reactor's Receive() method blocked for a long period of time.
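A minimal sketch of moving message handling off the receive routine, assuming a hypothetical `Envelope` type and one bounded queue per reactor: `Receive` only enqueues, and a dedicated goroutine does the work. It also illustrates the caveat above, since once the queue is full the send in `Receive` blocks the caller again.

```go
package main

import "fmt"

// Envelope is a hypothetical stand-in for the message passed to Receive().
type Envelope struct {
	ChannelID byte
	Msg       []byte
}

// asyncReceiver decouples Receive() from message processing with a bounded
// queue drained by one dedicated goroutine.
type asyncReceiver struct {
	queue chan Envelope
	done  chan struct{}
}

func newAsyncReceiver(capacity int, handle func(Envelope)) *asyncReceiver {
	r := &asyncReceiver{
		queue: make(chan Envelope, capacity),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(r.done)
		for e := range r.queue {
			handle(e) // slow processing happens here, not in the recv routine
		}
	}()
	return r
}

// Receive returns as soon as the envelope is queued. If handle() is slower
// than the arrival rate, the queue fills up and this send blocks the caller.
func (r *asyncReceiver) Receive(e Envelope) {
	r.queue <- e
}

// Close stops accepting envelopes and waits until queued ones are handled.
func (r *asyncReceiver) Close() {
	close(r.queue)
	<-r.done
}

func main() {
	var handled []byte
	r := newAsyncReceiver(16, func(e Envelope) { handled = append(handled, e.ChannelID) })
	for ch := byte(0); ch < 5; ch++ {
		r.Receive(Envelope{ChannelID: ch})
	}
	r.Close()
	fmt.Println(handled) // [0 1 2 3 4]
}
```

A channel send is safe when `Receive` is called concurrently from several peers' receive routines; the capacity only postpones blocking, it does not remove it.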

The other approach, suggested in this issue, is to review the implementation of each reactor to make sure that its Receive() method does not block. This approach is not trivial either: ultimately, the way to prevent blocking is to buffer received messages that cannot be processed immediately, and since buffers are always finite in size, this would eventually lead to dropping messages.
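The buffering trade-off can be sketched as a non-blocking enqueue: when a hypothetical per-reactor queue is full, the message is dropped and counted instead of stalling the receive routine. The names are illustrative; whether a given reactor can tolerate dropped messages is a separate question.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Envelope is a hypothetical stand-in for the message passed to Receive().
type Envelope struct {
	ChannelID byte
	Msg       []byte
}

// droppingQueue never blocks the caller: if the buffer is full, the
// envelope is discarded and the drop is counted.
type droppingQueue struct {
	ch      chan Envelope
	dropped atomic.Uint64 // Receive may run concurrently for different peers
}

func newDroppingQueue(capacity int) *droppingQueue {
	return &droppingQueue{ch: make(chan Envelope, capacity)}
}

// TryEnqueue reports whether e was queued.
func (q *droppingQueue) TryEnqueue(e Envelope) bool {
	select {
	case q.ch <- e:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

func main() {
	q := newDroppingQueue(3) // nothing drains q.ch in this example
	for i := 0; i < 5; i++ {
		q.TryEnqueue(Envelope{ChannelID: byte(i)})
	}
	fmt.Println(len(q.ch), q.dropped.Load()) // 3 2
}
```

Exposing the drop counter makes the cost of a finite buffer visible rather than silent.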


evan-forbes · 2024-06-03T07:50:52Z

One simple solution that we played around with was:

https://github.com/celestiaorg/celestia-core/blob/146746b114624029432720cfa74e0a4a55a87b9b/p2p/base_reactor.go#L98-L105

The envelope buffers have to be quite large for the consensus reactor not to block, at least when blocks are large. As one might expect, most of the messages that end up blocking are small messages, not larger messages such as block parts. In some recent experiments, the channels that blocked most often were consensus state and votes, followed by block parts.

We'll be collecting TCP packet traces soon, but tracing on a local machine already shows that we can't use all of the bandwidth simply because we are unable to empty the TCP buffers fast enough. This also explains why other optimizations (like not using a mempool) or changes to prioritization don't meaningfully change throughput, at least for large blocks.

One reason, but not the only one, that we block is that all incoming messages from all peers are effectively processed synchronously in the consensus reactor. This also explains the age-old issue where connecting more peers reduces throughput. While infrequent, this can make processing a vote, block part, or state message take up to 700 ms (!). In a particularly egregious example we saw maximum waits of 2.5 seconds; the corresponding graph plots, per channel (x axis), the time in ms taken to process a message after receiving it (y axis) as the max and the average with stdev bars, followed by the number of messages on that channel that took over 100 ms.

Another reason is that we are not buffering the TCP connection properly. For example, when we increase this constant, we see a meaningful but modest increase in throughput. I'm still working through the roughly five buffered and unbuffered io.Reader/io.Writer wrappers around the TCP connections; with so many of them, it's difficult to tell which need buffers and which actually degrade performance when their buffers are increased.
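As a rough illustration of why buffering the read side can matter, this sketch counts how many `Read` calls reach the underlying reader when a consumer reads one byte at a time, with and without a `bufio.Reader`. The `countingReader` type stands in for a TCP connection, and the 4096-byte size is an arbitrary choice; the sketch counts calls only and does not measure throughput.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// countingReader counts the Read calls that reach the wrapped reader,
// standing in for read syscalls on a TCP connection.
type countingReader struct {
	r     io.Reader
	calls int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	return c.r.Read(p)
}

// unbuffered adapts an io.Reader to io.ByteReader with one Read per byte.
type unbuffered struct{ r io.Reader }

func (u unbuffered) ReadByte() (byte, error) {
	var b [1]byte
	if _, err := io.ReadFull(u.r, b[:]); err != nil {
		return 0, err
	}
	return b[0], nil
}

// underlyingReads consumes n bytes one at a time, optionally through a
// bufio.Reader of size bufSize, and reports how many Reads hit the source.
func underlyingReads(n, bufSize int) int {
	src := &countingReader{r: bytes.NewReader(make([]byte, n))}
	var br io.ByteReader = unbuffered{src}
	if bufSize > 0 {
		br = bufio.NewReaderSize(src, bufSize)
	}
	for i := 0; i < n; i++ {
		if _, err := br.ReadByte(); err != nil {
			panic(err)
		}
	}
	return src.calls
}

func main() {
	fmt.Println(underlyingReads(1<<16, 0), underlyingReads(1<<16, 4096)) // 65536 16
}
```

Fewer calls into the kernel is one plausible reason a larger buffer helps, but with several wrapper layers, a buffer only pays off at the layer where small reads actually occur.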

cason · 2024-12-31T12:52:58Z

The evidence reactor is particularly impacted by this issue.

Processing evidence of misbehavior is very costly, and most of that work is done inside the (blocking) Receive() method. This is probably the main reason for the nightly failures currently observed on the main branch, such as https://github.com/cometbft/cometbft/actions/runs/12522933974/job/35000943900.

For context, from the log produced by this experiment:

```console
$ du -s 6-full.txt 6-evidence.txt 6-no-evidence.txt
5343232	6-full.txt
5290312	6-evidence.txt
53376	6-no-evidence.txt
$ grep 'INF Evidence already pending, ignoring this one' 6-evidence.txt | wc -l
 1075393
$ wc -l 6-evidence.txt
 2166384
```

That is, most of the 2.5 GB of logs were produced by the evidence module (filtered with grep 'module=evidence'), and about half of those entries (1,075,393 of 2,166,384 lines) report evidence that was already pending, i.e. completely useless work.
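Since about half of those entries concern evidence that is already pending, one mitigation to consider is a cheap duplicate check before any expensive verification, with verification itself moved off the receive routine. A minimal sketch under those assumptions; `evidenceFilter`, the `toVerify` queue, and hashing raw bytes with SHA-256 are hypothetical and not the evidence reactor's actual API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// evidenceFilter remembers hashes of evidence already seen so that
// duplicates are discarded before the expensive verification step.
type evidenceFilter struct {
	mu   sync.Mutex
	seen map[[sha256.Size]byte]struct{}
}

func newEvidenceFilter() *evidenceFilter {
	return &evidenceFilter{seen: make(map[[sha256.Size]byte]struct{})}
}

// firstSeen reports whether raw has not been seen before, recording it if so.
// It is cheap enough to call directly from Receive().
func (f *evidenceFilter) firstSeen(raw []byte) bool {
	h := sha256.Sum256(raw)
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, ok := f.seen[h]; ok {
		return false
	}
	f.seen[h] = struct{}{}
	return true
}

func main() {
	f := newEvidenceFilter()
	toVerify := make(chan []byte, 8) // drained by a verification goroutine (not shown)
	for _, ev := range [][]byte{[]byte("ev-A"), []byte("ev-B"), []byte("ev-A")} {
		if f.firstSeen(ev) {
			toVerify <- ev
		}
	}
	fmt.Println(len(toVerify)) // 2
}
```

As written, the set grows without bound; a real filter would need some eviction policy (for example, by height or age).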

Previous analysis and the comprehensive work done on the mempool reactor appear to show that its Receive() method does not really block. The mempool reactor has other problems under high load, but blocking in this method is not one of them.

In the evidence reactor the problem appears to be more pronounced, but, unlike for the mempool reactor, we don't have metrics or detailed information on how long evidence handling blocks the p2p channels.

- Mar 28, 2024: cason added the bug, enhancement, and p2p labels and added this issue to the CometBFT project (moved to Todo by github-project-automation).
- Mar 28, 2024: cason mentioned this issue in feat: pause peer timeouts during longer block execution #2533 (Closed).
- May 10, 2024: cason mentioned this issue in [Tracking issue] p2p connection optimizations #3053 (Open).
- Jun 10, 2024: evan-forbes mentioned this issue in feat: don't block on receives #3230 (Closed).
- Jun 11, 2024: cason self-assigned this issue and removed the bug label.
- Jun 12, 2024: cason mentioned this issue in p2p: connection multiplexing does not work as expected on the receiving side #3250 (Open).
- Jun 27, 2024: referenced in Mempool Rechecking all txs blocks consensus #2925 (Open) and Node starting from BlockSync may never catch up to latest height #3398 (Open).
- Dec 18, 2024: referenced in Mitigate blocking cometbft reactors impact on p2p connectivity Agoric/agoric-sdk#10742 (Closed) and feat(p2p): avoid blocking ping/pong; feed msgs through a limited buffer to a separate goroutine agoric-labs/cometbft#13 (Merged).
