Description
Bug Report
Setup
CometBFT version: v0.38.17
Have you tried the latest version: no (but the code is the same as main at 8583452)
ABCI app: https://github.com/provenance-io/provenance
Environment:
- OS: Mac and Ubuntu
- Install tools: ??
- Others: ??
node command runtime flags: ??
Config
N/A
What happened?
Background: To start up a new node, our common practice is to copy the data directory from another node, then start the new node with it and let it catch back up. That data directory might be up to a day old, though, which means it has up to 20,000 blocks to replay before it's back in the game.
The problem we have is that our testnet (and sometimes mainnet too) has long runs of consecutive blocks without any txs. Those blocks contain very little data, but there's a decent amount of per-block processing overhead, which results in a low "data rate". It regularly gets "slow" enough that the peer is stopped with "peer is not sending us data fast enough". The node then just sits there doing nothing for a minute or two before it decides to try again, quickly runs through a chunk of blocks, and then this bug triggers again.
It's so bad that it's sometimes impossible for a node to ever catch up: new blocks are created faster than the node fetches the old ones (because it spends so much time doing nothing).
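To put rough numbers on it (illustrative arithmetic only, not CometBFT code; the 128 KB/s minRate and ~5 blocks/s figures come from the logs later in this report, and the ~4 KB empty-block size is an assumption):

```go
package main

import "fmt"

func main() {
	const minRate = 128 * 1024.0 // bytes/s, the minRate reported in the logs below
	const blocksPerSec = 5.0     // observed block-processing rate

	// Minimum average block size needed to stay above the rate check.
	minAvgBlock := minRate / blocksPerSec
	fmt.Printf("blocks must average %.1f KB to satisfy the check\n", minAvgBlock/1024)

	// A mostly-empty block (header + commit, no txs) is only a few KB,
	// so a run of them drags the measured byte rate far below the threshold
	// even though the node is processing blocks as fast as it can.
	const emptyBlock = 4 * 1024.0 // assumed size of a near-empty block, ~4 KB
	fmt.Printf("effective rate on empty blocks: %.0f KB/s\n", blocksPerSec*emptyBlock/1024)
}
```

So at 5 blocks/s the peer would need to ship ~25.6 KB per block on average to survive the check, and near-empty blocks fall well short of that.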
Here's the code chunk where the error is produced:
cometbft/internal/blocksync/pool.go, lines 161 to 169 at 8583452
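For readers without the repo handy, the check behaves roughly like this (a simplified paraphrase, not the actual pool.go source; the function and variable names here are illustrative): each peer with pending requests has its measured receive rate compared against a hard minimum, and a single dip below it gets the peer stopped.

```go
package main

import (
	"errors"
	"fmt"
)

// peerTimedOut mirrors the shape of the blocksync pool check: a peer is
// considered too slow when its measured rate is nonzero (i.e. the rate
// monitor has warmed up) but below the minimum. Names are illustrative.
func peerTimedOut(curRate, minRate int64) bool {
	return curRate != 0 && curRate < minRate
}

func main() {
	const minRate = 128 * 1024 // 128 KB/s, as reported in the logs
	curRate := int64(127 * 1024)

	if peerTimedOut(curRate, minRate) {
		err := errors.New("peer is not sending us data fast enough")
		// In CometBFT this leads to "Stopping peer for error", after
		// which the node idles for a while before retrying.
		fmt.Println("SendTimeout:", err)
	}
}
```

Note that with the values from the logs (127 KB/s vs 128 KB/s) the peer misses the cutoff by a single KB/s, which is why a brief run of empty blocks is enough to trip it.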
What did you expect to happen?
The node continues to receive and process blocks from peers until it's caught up again.
How to reproduce it
Unfortunately, I don't have an easy way to reproduce this other than to grab an older quicksync (zipped data directory) and fire up a node.
Logs
Empty lines added for emphasis:
{"level":"info","module":"state","height":23960149,"num_txs_res":3,"num_val_updates":0,"block_app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:16-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960149,"app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:16-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960149,"block_app_hash":"695B48042721CC0CDB99085BAB1449A91276DADDE22EC12EB657EB7E0263EA75","time":"2025-05-21T12:23:16-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960150,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960150,"app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960150,"block_app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960151,"num_txs_res":3,"num_val_updates":0,"block_app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960151,"app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960151,"block_app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960152,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960152,"app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960152,"block_app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960153,"num_txs_res":0,"num_val_updates":0,"block_app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960153,"app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960153,"block_app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960154,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"1DD34DC156CDB5BFF477268599EBB1E057756023EBCABE7C794CA8FB1FD32DD8","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960154,"app_hash":"1DD34DC156CDB5BFF477268599EBB1E057756023EBCABE7C794CA8FB1FD32DD8","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960154,"block_app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"committed state"}

{"level":"error","module":"blocksync","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4","reason":"peer is not sending us data fast enough","curRate":"127 KB/s","minRate":"128 KB/s","time":"2025-05-21T12:23:17-06:00","message":"SendTimeout"}
{"level":"error","module":"p2p","peer":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","err":"error with peer fb4633e337486f9931b152f0ff254ccee6c0fce4: peer is not sending us data fast enough","time":"2025-05-21T12:23:17-06:00","message":"Stopping peer for error"}
{"level":"info","module":"p2p","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4@[2a02:c206:2076:7284::2]:26656","msg":"Stopping MConnection service","impl":"MConn{[2a02:c206:2076:7284::2]:26656}","time":"2025-05-21T12:23:17-06:00","message":"service stop"}
{"level":"error","module":"p2p","peer":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","err":"read tcp [2600:6c67:617f:4226:b44e:7ae4:e7bc:97f5]:60511->[2a02:c206:2076:7284::2]:26656: use of closed network connection","time":"2025-05-21T12:23:17-06:00","message":"Stopping peer for error"}
{"level":"info","module":"p2p","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4@[2a02:c206:2076:7284::2]:26656","msg":"Stopping Peer service","impl":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","time":"2025-05-21T12:23:17-06:00","message":"service stop"}
{"level":"error","module":"p2p","error":"already stopped","time":"2025-05-21T12:23:17-06:00","message":"error while stopping peer"}
It was receiving blocks at about 5 per second.
dump_consensus_state output: N/A
Anything else we need to know
Others have complained about this recently too, which is why I'm now getting around to opening this issue. I've also seen folks running other chains complain about it.