Description
A multiplex connection instance, used to connect peers at the p2p level, is equipped with two rate limiters (`flow.Monitor`), intended to enforce maximum send and receive rates between two nodes. The maximum send and receive rates are configured by the `p2p.send_rate` and `p2p.recv_rate` parameters, both with the same default value of `5120000`, i.e., 5000 KB/s.
During regular operation and under ordinary load, the rate limiters shouldn't affect the operation of the connections: they compute the instantaneous send and receive rates, check that these are below the configured limits, and return immediately. But once the load applied on the sender side, namely the number and size of messages enqueued to be sent to the peer, increases, and provided that the underlying connection has available bandwidth to support that load, the rate limiters start to impact operation. They do so essentially by sleeping for some amount of time, so that, by not transferring any bytes during that period, the instantaneous rates drop to levels below the configured maximum rates. Whenever one of these rate limiters sleeps, it blocks the sending or receiving routine, meaning that no data is exchanged by the peers. The undesirable result is that all messages being sent are artificially delayed by the amount of time the rate limiter sleeps.
The issue that @sergio-mena and I have identified is that there is some asymmetry in the use of the send and receive rate limiters. This means that, given the same (or similar) observed instantaneous load, they are likely to allow different amounts of traffic (bytes) to be written to the underlying channel, in the case of the send limiter, or read from the underlying channel, in the case of the receive limiter. In short (this still needs to be measured in more detail), the sending side can send a batch of messages once allowed by the rate limiter, while the receiving side is stricter, receiving a single packet per call to the rate limiter. Being a little more precise, the send routine asks the rate limiter for permission to send `X` bytes and may end up writing `Y > X` bytes to the channel (potentially, in the worst case, `Y = 10X`), while the receive routine asks the rate limiter for permission to receive `X` bytes and reads at most `X` bytes from the underlying channel. Notice that in both cases the exact number of bytes read from or written to the channel is provided a posteriori to the rate limiters, so that they can update their measured instantaneous rates.
This asymmetry in the use of the two rate limiters was not recently introduced: this code and the `flow.Monitor` code are more than 5 years old and haven't really changed since then. This means that this issue is potentially present in all versions of CometBFT that we currently support. It also means that the effects of this asymmetry are not evident under ordinary load, but they can be one explanation for the instability observed in CometBFT when subjected to high loads (in particular, in the mempool and block-parts traffic). The issue does not have a simple fix, and fully understanding what is really happening is a pending task, one that will potentially require creating tools to investigate p2p channels in isolation (i.e., using only experimental reactors).
But we can attest that this asymmetry is indeed a source of performance instability and issues, because there is a workaround that does not fix the issue but at least reduces the observed instability. In fact, @sergio-mena has run some experiments where the configured maximum receive rate (`p2p.recv_rate`) is much higher than the configured maximum send rate (`p2p.send_rate`), and we have observed much less unstable behavior in our testing setup. Thus, until we fully understand and fix this issue, the recommendation is to set `p2p.recv_rate` to a larger value than `p2p.send_rate`. In our testbed, we used a 10x larger value.
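As an illustration, keeping the default send rate, the workaround in the node's `config.toml` would look like this (the 10x factor is what we used in our testbed, not a tuned recommendation):

```toml
[p2p]
# Default maximum send rate: 5120000 B/s (5000 KB/s)
send_rate = 5120000
# Workaround: set the maximum receive rate 10x higher than the send rate
recv_rate = 51200000
```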
Associated issues
The following issues are associated with the investigation and possible solution for this problem: