Bug Report

Setup

CometBFT version: v0.38.17
Have you tried the latest version: no (but the code is the same as main at 8583452)
ABCI app: https://github.com/provenance-io/provenance
Environment:
node command runtime flags: ??

Config

N/A

What happened?
Background: To start up a new node, our common practice is to copy the data directory from another node, then start the new node with that and let it catch back up. That data directory might be up to a day old, though, which means it has up to 20,000 blocks to replay before being back in the game.
The problem we have is that our testnet (and sometimes mainnet too) has a lot of consecutive blocks without any Txs. There isn't very much data in those blocks, but there's a decent amount of overhead to process each empty block, which results in a low "data rate". It regularly gets "slow" enough that the peer gets stopped because "peer is not sending us data fast enough". The node then just sits there doing nothing for a minute or two before it decides to try again, then quickly runs through a chunk of blocks until this bug gets triggered again.
It's so bad that it's sometimes impossible for a node to ever catch up; new blocks are being created faster than the node is getting the old ones (because it spends a lot of time doing nothing).
Here's the code chunk where the error is produced: cometbft/internal/blocksync/pool.go, lines 161 to 169 (at 8583452).
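The code embed didn't come through in this copy, so here is a small, self-contained Go sketch of the kind of check those lines perform, reconstructed from the fields in the SendTimeout log line below. The types and field names are hypothetical stand-ins, not the actual pool.go code, and the 128 KB/s threshold is taken from the "minRate" field in the log; the point is that the pool judges a peer by its byte rate against a hard-coded minimum, regardless of how many small, empty blocks the peer is actually delivering.

```go
// Illustrative sketch only: hypothetical types modeling the rate check,
// not the actual cometbft/internal/blocksync/pool.go code.
package main

import (
	"errors"
	"fmt"
	"time"
)

// peerStats is a stand-in for the pool's per-peer bookkeeping.
type peerStats struct {
	id            string
	bytesReceived int64         // bytes received from this peer in the window
	window        time.Duration // measurement window for the rate
	numPending    int           // outstanding block requests to this peer
}

// currentRate returns the peer's receive rate in bytes per second.
func (p *peerStats) currentRate() int64 {
	secs := int64(p.window / time.Second)
	if secs <= 0 {
		return 0
	}
	return p.bytesReceived / secs
}

func main() {
	// Threshold taken from the "minRate":"128 KB/s" field in the log below.
	const minRecvRate int64 = 128 * 1024 // bytes per second

	// A peer delivering a handful of small blocks per second: it is keeping
	// up with the chain, but the byte rate sits just under the minimum.
	peer := &peerStats{
		id:            "fb4633e337486f9931b152f0ff254ccee6c0fce4",
		bytesReceived: 127 * 1024,
		window:        time.Second,
		numPending:    10,
	}

	curRate := peer.currentRate()
	if peer.numPending > 0 && curRate != 0 && curRate < minRecvRate {
		err := errors.New("peer is not sending us data fast enough")
		fmt.Printf("SendTimeout peer=%s reason=%q curRate=%d KB/s minRate=%d KB/s\n",
			peer.id, err, curRate/1024, minRecvRate/1024)
		// This is the point where the real pool stops the peer; the node then
		// sits idle until it picks another peer and starts requesting again.
	}
}
```

Running the sketch prints a line analogous to the SendTimeout error shown in the logs below; the real check lives at the referenced lines in pool.go.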
What did you expect to happen?

The node continues to receive and process blocks from peers until it's caught up again.
How to reproduce it
Unfortunately, I don't have an easy way to reproduce this other than to grab an older quicksync (zipped data directory) and fire up a node.
Logs
Empty lines added for emphasis:
{"level":"info","module":"state","height":23960149,"num_txs_res":3,"num_val_updates":0,"block_app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:16-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960149,"app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:16-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960149,"block_app_hash":"695B48042721CC0CDB99085BAB1449A91276DADDE22EC12EB657EB7E0263EA75","time":"2025-05-21T12:23:16-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960150,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level"
8000
span>:"info","module":"state","height":23960150,"app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960150,"block_app_hash":"043316A328C77FBD6B88FBE41E298EC1CA39A45FE18E8D3743B23E3E6A8B55A8","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960151,"num_txs_res":3,"num_val_updates":0,"block_app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960151,"app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960151,"block_app_hash":"D09257B0133D105B156F1141E5FE6BCDBD9D39167568BA142D39CA1DB8E65501","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960152,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960152,"app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960152,"block_app_hash":"B210D19A93E5E9185D70B1C00C80681DAAAA91088B9C2BAF35624BA400599D4A","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960153,"num_txs_res":0,"num_val_updates":0,"block_app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960153,"app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960153,"block_app_hash":"C8CA497558F584758B6DB7FC220248FD5D574D5CFEFBBA0D78A2C17ED6F1785C","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"info","module":"state","height":23960154,"num_txs_res":1,"num_val_updates":0,"block_app_hash":"1DD34DC156CDB5BFF477268599EBB1E057756023EBCABE7C794CA8FB1FD32DD8","time":"2025-05-21T12:23:17-06:00","message":"finalized block"}
{"level":"info","module":"state","height":23960154,"app_hash":"1DD34DC156CDB5BFF477268599EBB1E057756023EBCABE7C794CA8FB1FD32DD8","time":"2025-05-21T12:23:17-06:00","message":"executed block"}
{"level":"info","module":"state","height":23960154,"block_app_hash":"F1C536CFAE7F993A8B7257AEA97026C3C5816650186D0E33DB432E185F155F47","time":"2025-05-21T12:23:17-06:00","message":"committed state"}
{"level":"error","module":"blocksync","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4","reason":"peer is not sending us data fast enough","curRate":"127 KB/s","minRate":"128 KB/s","time":"2025-05-21T12:23:17-06:00","message":"SendTimeout"}
{"level":"error","module":"p2p","peer":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","err":"error with peer fb4633e337486f9931b152f0ff254ccee6c0fce4: peer is not sending us data fast enough","time":"2025-05-21T12:23:17-06:00","message":"Stopping peer for error"}
{"level":"info","module":"p2p","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4@[2a02:c206:2076:7284::2]:26656","msg":"Stopping MConnection service","impl":"MConn{[2a02:c206:2076:7284::2]:26656}","time":"2025-05-21T12:23:17-06:00","message":"service stop"}
{"level":"error","module":"p2p","peer":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","err":"read tcp [2600:6c67:617f:4226:b44e:7ae4:e7bc:97f5]:60511->[2a02:c206:2076:7284::2]:26656: use of closed network connection","time":"2025-05-21T12:23:17-06:00","message":"Stopping peer for error"}
{"level":"info","module":"p2p","peer":"fb4633e337486f9931b152f0ff254ccee6c0fce4@[2a02:c206:2076:7284::2]:26656","msg":"Stopping Peer service","impl":"Peer{MConn{[2a02:c206:2076:7284::2]:26656} fb4633e337486f9931b152f0ff254ccee6c0fce4 out}","time":"2025-05-21T12:23:17-06:00","message":"service stop"}
{"level":"error","module":"p2p","error":"already stopped","time":"2025-05-21T12:23:17-06:00","message":"error while stopping peer"}

It was receiving blocks at about 5 per second.
dump_consensus_state output
N/A
Anything else we need to know
I've had others complain about this recently too, which is why I'm finally getting around to opening this issue. I've also seen folks running other chains complain about it.