Peers erroneously removed during playback · Issue #5135 · cometbft/cometbft
Peers erroneously removed during playback #5135
Open
@SpicyLemon

Description

Bug Report

Setup

CometBFT version: v0.38.17

Have you tried the latest version: no (but the code is the same as main at 8583452)

ABCI app: https://github.com/provenance-io/provenance

Environment:

  • OS: Mac and Ubuntu
  • Install tools: ??
  • Others: ??

node command runtime flags: ??

Config

N/A

What happened?

Background: To start up a new node, our common practice is to copy the data directory of another node, then start the new node with that and let it catch back up. That data directory might be up to a day old, though, which means it has up to 20,000 blocks to replay before it's back in the game.

The problem is that our testnet (and sometimes mainnet too) has long runs of consecutive blocks without any Txs. Those blocks contain very little data, but there's a decent amount of overhead to process each empty block, which results in a low "data rate". That rate regularly drops low enough that the peer gets stopped because "peer is not sending us data fast enough". The node then just sits there doing nothing for a minute or two before it decides to try again, quickly runs through another chunk of blocks, and triggers this bug again.

It's so bad that it's sometimes impossible for a node to ever catch up; new blocks are being created faster than the node is getting the old ones (because it spends a lot of time doing nothing).

Here's the code chunk where the error is produced:

    if curRate != 0 && curRate < minRecvRate {
        err := errors.New("peer is not sending us data fast enough")
        pool.sendError(err, peer.id)
        pool.Logger.Error("SendTimeout", "peer", peer.id,
            "reason", err,
            "curRate", fmt.Sprintf("%d KB/s", curRate/1024),
            "minRate", fmt.Sprintf("%d KB/s", minRecvRate/1024))
        peer.didTimeout = true
    }
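
To make the failure mode concrete, here is a minimal, self-contained sketch (not the CometBFT implementation) of why a fixed byte-rate floor trips on long runs of near-empty blocks. The 128 KB/s floor and the roughly 5 blocks/s replay speed are taken from the logs below; the block sizes are made-up values for illustration only. Whenever the average block is smaller than about 26 KB (128 KB/s ÷ 5 blocks/s), the received byte rate falls under the floor even though the node is pulling blocks as fast as it can process them:

    package main

    import "fmt"

    func main() {
        const (
            minRecvRate  = 128 * 1024 // bytes/s floor implied by "minRate":"128 KB/s" in the log
            blocksPerSec = 5          // approximate replay speed observed below
        )

        // Hypothetical average block sizes: a nearly empty block, a small block,
        // and a block carrying a moderate amount of tx data.
        for _, blockSize := range []int{10 * 1024, 25 * 1024, 50 * 1024} {
            curRate := blocksPerSec * blockSize                // bytes/s actually received
            tooSlow := curRate != 0 && curRate < minRecvRate   // same shape as the check above
            fmt.Printf("avg block %2d KB -> curRate %3d KB/s, peer dropped: %v\n",
                blockSize/1024, curRate/1024, tooSlow)
        }
    }

With those assumed sizes, 10 KB and 25 KB blocks yield 50 KB/s and 125 KB/s and get the peer dropped, while 50 KB blocks clear the floor comfortably.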

What did you expect to happen?

The node continues to receive and process blocks from peers until it's caught up again.

How to reproduce it

Unfortunately, I don't have an easy way to reproduce this other than to grab an older quicksync (zipped data directory) and fire up a node.

Logs

Empty lines added for emphasis:

{"level":"info","module":"state","height":23960149,"num_txs_res":3,"num_val_updates":0,"block_app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:16-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960149,"app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:16-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960149,"block_app_hash":"695B48042721CC0CDB99085BAB1449A91276DADDE22EC12EB657EB7E0263EA75","time":"2025-05-21T12:23:16-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960150,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960150,"app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960150,"block_app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960151,"num_txs_res":3,"num_val_updates":0,"block_app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960151,"app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960151,"block_app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960152,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960152,"app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960152,"block_app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960153,"num_txs_res":0,"num_val_updates":0,"block_app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960153,"app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960153,"block_app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960154,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"1DD34DC156CDB5BFF477268599EBB1E057756023EBCABE7C794CA8FB1FD32DD8","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960154,"app_hash":"1DD34DC156CDB5BFF477268599EBB1E057756023EBCABE7C794CA8FB1FD32DD8","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960154,"block_app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"committed state"}

{"level":"error","module":"blocksync","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4","reason":"peer is not sending us data fast enough","curRate":"127 KB/s","minRate":"128 KB/s","time":"2025-05-21T12:23:17-06:00","message":"SendTimeout"}

{"level":"error","module":"p2p","peer":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","err":"error with peer fb4633e337486f9931b152f0ff254ccee6c0fce4: peer is not sending us data fast enough","time":"2025-05-21T12:23:17-06:00","message":"Stopping peer for error"}
{"level":"info","module":"p2p","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4@[2a02:c206:2076:7284::2]:26656","msg":"Stopping MConnection service","impl":"MConn{[2a02:c206:2076:7284::2]:26656}","time":"2025-05-21T12:23:17-06:00","message":"service stop"}
{"level":"error","module":"p2p","peer":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","err":"read tcp [2600:6c67:617f:4226:b44e:7ae4:e7bc:97f5]:60511->[2a02:c206:2076:7284::2]:26656: use of closed network connection","time":"2025-05-21T12:23:17-06:00","message":"Stopping peer for error"}
{"level":"info","module":"p2p","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4@[2a02:c206:2076:7284::2]:26656","msg":"Stopping Peer service","impl":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","time":"2025-05-21T12:23:17-06:00","message":"service stop"}
{"level":"error","module":"p2p","error":"already stopped","time":"2025-05-21T12:23:17-06:00","message":"error while stopping peer"}

It was receiving blocks at about 5 per second.
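
Working backwards from those numbers (an estimate from the logged values, not something read out of the code):

    127 KB/s ÷ 5 blocks/s ≈ 25 KB per block   (observed average)
    128 KB/s ÷ 5 blocks/s ≈ 25.6 KB per block (break-even average block size)

So any stretch where blocks average under roughly 25 KB will trip the check, regardless of how quickly the peer is actually responding.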

dump_consensus_state output

N/A

Anything else we need to know

I've had others complain about this recently too, which is why I'm finally getting around to opening this issue. I've also seen folks running other chains complain about the same problem.

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working), needs-triage (This issue/PR has not yet been triaged by the team)
