p2p: forever connected to lazy peer with no progress

While implementing the same-geo peering feature, I started by peering with a single outbound peer to test that the feature worked as intended. With same-geo peering, the pool of peers you fetch is much smaller. There was one peer that, every time I connected to it, left me stuck in the ensurePeers 30s check indefinitely. It seems there is no check to determine whether a peer is "lazy"; ensurePeers only checks how many peers we are connected to. As a proof of concept, I added the following logic branch to ensurePeers:

https://github.com/osmosis-labs/cometbft/blob/2c8bf088f89f6374a8eae3124060eabb4536ba60/p2p/pex/pex_reactor.go#L464-L469

With this change, when I connected to the lazy peer, we eventually disconnected from it, connected to a new peer, and began syncing. Obviously, part of the solution here is "don't connect to only one peer", but we shouldn't be wasting peer slots on peers that do absolutely nothing in the first place.

I would like to brainstorm on what the proper solution is. In my mind, what I did is close to what we want, but I think this Idle check should only be done during block sync, right? During block sync, we expect every peer we are connected to to be sending us data. However (and this could be a wrong assumption), once we are caught up to the head, it would be acceptable for a peer to send us nothing for 30s, and we would still want to stay connected to it.
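To make the proposal concrete, here is a minimal, standalone sketch of an idle check that only applies during block sync. It is not the code behind the link above and it does not use any CometBFT types: `peerActivity`, `idlePeers`, `exampleScenario`, and the `blockSyncing` flag are hypothetical names chosen for illustration, and the 30s threshold is simply assumed to match the ensurePeers interval.

```go
package main

import (
	"fmt"
	"time"
)

// peerActivity is a hypothetical record of when we last received any
// message from a peer. It is not a CometBFT type.
type peerActivity struct {
	ID          string
	LastReceive time.Time
}

// idlePeers returns the IDs of peers that have sent nothing for longer
// than maxIdle. The check only applies while block syncing; once the node
// is caught up, a quiet peer is tolerated and nothing is returned.
func idlePeers(peers []peerActivity, blockSyncing bool, now time.Time, maxIdle time.Duration) []string {
	if !blockSyncing {
		return nil
	}
	var idle []string
	for _, p := range peers {
		if now.Sub(p.LastReceive) > maxIdle {
			idle = append(idle, p.ID)
		}
	}
	return idle
}

// exampleScenario evaluates the same two peers twice: once while block
// syncing and once after catching up. Timestamps are fixed so the result
// does not depend on the wall clock.
func exampleScenario() (syncing, caughtUp []string) {
	now := time.Unix(1700000000, 0)
	peers := []peerActivity{
		{ID: "active", LastReceive: now.Add(-2 * time.Second)},
		{ID: "lazy", LastReceive: now.Add(-45 * time.Second)},
	}
	const maxIdle = 30 * time.Second // assumed threshold, mirrors the 30s check
	return idlePeers(peers, true, now, maxIdle), idlePeers(peers, false, now, maxIdle)
}

func main() {
	syncing, caughtUp := exampleScenario()
	fmt.Println("drop while block syncing:", syncing)
	fmt.Println("drop when caught up:", caughtUp)
}
```

In a real reactor, the returned IDs would presumably feed whatever disconnect path ensurePeers already has, and the "last receive" timestamp would need to come from the connection layer. Whether block sync is the right gate is exactly the open question here.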

Please let me know your thoughts and I will be happy to upstream the desired solution.

As a side note, this also happens on the non-geo-peering branch; it's just less likely to connect to a dead peer there, given that the pool spans every country rather than just the country the node is in.
