Node starting from BlockSync may never catch up to latest height · Issue #3398 · cometbft/cometbft · GitHub

Node starting from BlockSync may never catch up to latest height #3398


Open
hvanz opened this issue Jul 2, 2024 · 4 comments

Comments

@hvanz (Member) commented Jul 2, 2024

Summary

A node starting from BlockSync may never reach the latest height. We have observed in e2e testnets that, at the end of BlockSync, when the node switches to consensus, it is still lagging by 2 or 3 blocks. At the same time, the mempool is enabled and starts receiving a flood of transactions from its peers, while consensus is still trying to catch up.

What happens?

At the end of BlockSync we have the following scenario:

  • The node is still not at the latest height: other nodes kept adding new blocks while this node was running BlockSync.
  • The consensus and mempool reactors are enabled simultaneously: the mempool does EnableInOutTxs, and consensus does SwitchToConsensus (see the sketch after this list).
  • The mempool reactor immediately starts receiving and disseminating transactions. Its peers flood the node with all the transactions they have in their mempools. This happens even though nodes are not supposed to send transactions to lagging peers (see below).
  • Our node won't be able to use these transactions to propose blocks until it reaches the latest height. In the meantime, all it can do is validate them (at its current height) and disseminate them, while also rechecking them.
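
A minimal sketch of this hand-off, with toy reactor types (only the method names EnableInOutTxs and SwitchToConsensus come from the issue; everything else is illustrative): both reactors are enabled back to back, so transaction gossip starts while consensus may still be a few heights behind.

package main

import "fmt"

// Toy stand-ins for the mempool and consensus reactors.
type mempool struct{ enabled bool }

// EnableInOutTxs lets the mempool start receiving and gossiping transactions.
func (m *mempool) EnableInOutTxs() { m.enabled = true }

type consensus struct{ height int64 }

// SwitchToConsensus starts consensus from the height reached by BlockSync.
func (c *consensus) SwitchToConsensus(blockSyncHeight int64) { c.height = blockSyncHeight }

func main() {
	const latestNetworkHeight = 32 // other nodes kept progressing meanwhile
	const blockSyncStoppedAt = 30  // height at which BlockSync handed over

	m, c := &mempool{}, &consensus{}
	c.SwitchToConsensus(blockSyncStoppedAt) // consensus starts, still behind
	m.EnableInOutTxs()                      // tx gossip starts at the same time

	fmt.Printf("mempool enabled=%v, consensus at height %d, network at height %d\n",
		m.enabled, c.height, latestNetworkHeight)
}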

Example

These pictures show an e2e testnet with node validator05 starting at height 10, and node full01 starting at height 30 with StateSync enabled. They are able to catch up only after the tx load finishes and their mempools are empty. Note that in this testnet we inject a constant load of 2 tx/s, each tx being 1 kB. In real-world scenarios the load is not constant, which can give the node time to catch up faster.

[Screenshot 2024-06-28 at 17 49 56]

This is the manifest file of the testnet. In particular, check_tx_delay is set to a high value (150ms) to be able to reproduce the failing scenario consistently.

# In this testnet, the nodes validator05 and full01 start late, perform BlockSync, and then never
# catch up to the latest height. The `check_tx_delay` setting is set to a high value to be able to
# reproduce the scenario consistently.

abci_protocol = "tcp"
prepare_proposal_delay = "200ms"
process_proposal_delay = "200ms"
check_tx_delay = "150ms"
vote_extension_delay = "100ms"
finalize_block_delay = "500ms"
prometheus = true
vote_extension_size = 2048
vote_extensions_enable_height = 11
pbts_enable_height = 11

[validators]
  validator01 = 83
  validator02 = 46
  validator03 = 50
  validator04 = 36
  validator05 = 70

[node]
  [node.full01]
    mode = "full"
    persistent_peers = ["validator04", "validator01"]
    privval_protocol = "tcp"
    start_at = 30
    state_sync = true
    snapshot_interval = 3
  [node.validator01]
    mode = "validator"
    privval_protocol = "tcp"
    persist_interval = 1
    snapshot_interval = 3
  [node.validator02]
    mode = "validator"
    persistent_peers = ["validator01"]
    privval_protocol = "tcp"
    persist_interval = 5
    snapshot_interval = 3
  [node.validator03]
    mode = "validator"
    persistent_peers = ["validator01"]
    privval_protocol = "tcp"
  [node.validator04]
    mode = "validator"
    persistent_peers = ["validator01", "validator02"]
    privval_protocol = "tcp"
    snapshot_interval = 3
  [node.validator05]
    mode = "validator"
    persistent_peers = ["validator04"]
    privval_protocol = "tcp"
    start_at = 10
    persist_interval = 5
    snapshot_interval = 3
    retain_blocks = 56

Do not send txs to lagging nodes

Nodes are not supposed to send transactions to lagging peers because of this condition, checked before sending each transaction:

if peerState.GetHeight() < memTx.Height()-1 {
    time.Sleep(PeerCatchupSleepIntervalMS * time.Millisecond)
    continue
}

Here, peerState.GetHeight() is the last height that the peer knows for our node, and memTx.Height() is the height at which the transaction was added to the peer's mempool (not necessarily the peer's current height).
For example, this is a scenario observed in the testnet:

  • The peer is at height 9 and adds a transaction tx to its mempool.
  • Our node is at height 0 while doing BlockSync, so the peer doesn't send any transaction.
  • Our node finishes BlockSync and it's at height 10, while the peer is now at height 12.
  • On the peer, the condition peerState.GetHeight() < memTx.Height()-1, which here is 10 < 9-1 = 8, becomes false, so it sends the transaction even though our node is still catching up (the snippet below plugs in these numbers).
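
To make the numbers concrete, here is a toy Go snippet (a hypothetical helper, not the reactor code) that plugs the heights from the scenario above into the same condition:

package main

import "fmt"

// shouldSkipPeer mirrors the reactor's condition: skip (and sleep) when the
// peer lags more than one height behind the height at which the tx was added.
func shouldSkipPeer(peerHeight, txHeight int64) bool {
	return peerHeight < txHeight-1
}

func main() {
	// While our node is block-syncing at height 0, the peer holds the tx back.
	fmt.Println(shouldSkipPeer(0, 9)) // true: 0 < 8, so the tx is not sent

	// After BlockSync our node is at height 10, but the network is already at 12.
	fmt.Println(shouldSkipPeer(10, 9)) // false: 10 < 8 fails, so the tx is sent anyway
}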

Possible solutions

I found that a simple solution is just to enable the mempool reactor a bit later than consensus. These metrics show that, with a 5-second delay before starting the mempool, the nodes catch up pretty fast (a sketch of this workaround follows the screenshot).

[Screenshot 2024-07-02 at 09 01 06]
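
A minimal sketch of this workaround, assuming a hypothetical start-up sequence (the function names below are illustrative): consensus is switched on immediately, while tx gossip is enabled only after a fixed delay.

package main

import (
	"fmt"
	"time"
)

const mempoolStartDelay = 5 * time.Second

func switchToConsensus()     { fmt.Println("consensus: catching up to the latest height") }
func enableMempoolInOutTxs() { fmt.Println("mempool: now receiving and gossiping txs") }

func main() {
	switchToConsensus() // start consensus right away, as today

	// Give consensus a head start before opening the floodgates of tx gossip.
	time.AfterFunc(mempoolStartDelay, enableMempoolInOutTxs)

	time.Sleep(mempoolStartDelay + time.Second) // keep the demo process alive
}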

The ideal solution would be to start the mempool only when the node is at the latest height (minus one), and to go back to BlockSync when the node is lagging (see #3372). The latest height could be defined as a function of the node's state, for instance as a height reached by 1/3+ of the stake.
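
One possible definition, sketched here under the assumption that the node has a view of its peers' reported heights and voting powers (the stakes reuse the values from the manifest above, the heights are made up): take the highest height h such that peers holding more than 1/3 of the total stake report a height of at least h.

package main

import (
	"fmt"
	"sort"
)

type peerInfo struct {
	height int64 // latest height reported by the peer
	power  int64 // the peer's voting power (stake)
}

// latestHeight returns the highest height h such that peers holding more than
// 1/3 of the total stake report a height of at least h.
func latestHeight(peers []peerInfo) int64 {
	var total int64
	for _, p := range peers {
		total += p.power
	}
	// Walk heights from highest to lowest, accumulating stake.
	sort.Slice(peers, func(i, j int) bool { return peers[i].height > peers[j].height })
	var acc int64
	for _, p := range peers {
		acc += p.power
		if acc*3 > total {
			return p.height
		}
	}
	return 0
}

func main() {
	peers := []peerInfo{
		{height: 32, power: 83}, {height: 32, power: 46}, {height: 31, power: 50},
		{height: 30, power: 36}, {height: 28, power: 70},
	}
	fmt.Println(latestHeight(peers)) // 32: the two peers at height 32 hold 129/285 > 1/3 of the stake
}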

@hvanz hvanz added this to CometBFT Jul 2, 2024
@github-project-automation github-project-automation bot moved this to Todo in CometBFT Jul 2, 2024
@hvanz hvanz changed the title Node starting from BlockSync may never caught up to latest height Node starting from BlockSync may never catch up to latest height Jul 2, 2024
@cason (Contributor) commented Jul 2, 2024

> Do not send txs to lagging nodes

Another solution is to fix this logic. I thought we compared the current height of the node with the peer's last known height.

We should have the node's latest height from the Update() call to the mempool.

With this solution, a peer only starts receiving transactions once it has caught up; the cost is that some slow nodes may never receive transactions...
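
A hedged sketch of this alternative, with illustrative names: the mempool records the node's own latest height on every Update() call, and transactions are only sent to peers that are at most one height behind it.

package main

import "fmt"

type mempool struct {
	latestHeight int64 // refreshed on every Update() after a block is committed
}

// Update records the height of the latest committed block.
func (m *mempool) Update(height int64) { m.latestHeight = height }

// peerIsCaughtUp reports whether the peer may receive transactions: it must be
// at most one height behind the node's own latest height.
func (m *mempool) peerIsCaughtUp(peerHeight int64) bool {
	return peerHeight >= m.latestHeight-1
}

func main() {
	m := &mempool{}
	m.Update(12) // the sender's latest committed height

	fmt.Println(m.peerIsCaughtUp(10)) // false: the peer is still catching up, hold its txs
	fmt.Println(m.peerIsCaughtUp(11)) // true: the peer is close enough, start sending txs
}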

@cason (Contributor) commented Jul 2, 2024

Another solution is to prevent the flood of mempool messages from blocking consensus and everything else. This has been extensively documented, e.g. #3250, #2685, etc.

@cason (Contributor) commented Jul 2, 2024

In other words, instead of delaying the start of the mempool reactor for an arbitrary amount of time, make it reject transactions when it is not able to keep up with the incoming tx load...
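
A rough sketch of that idea, purely illustrative and not CometBFT code: incoming transactions are queued for processing, and whenever the queue is full the mempool rejects new ones instead of letting them pile up.

package main

import (
	"errors"
	"fmt"
)

var errTooBusy = errors.New("mempool cannot keep up, rejecting tx")

type mempool struct {
	pending chan []byte // transactions waiting to be checked
}

func newMempool(capacity int) *mempool {
	return &mempool{pending: make(chan []byte, capacity)}
}

// ReceiveTx enqueues a transaction from a peer, or rejects it when the
// pending queue is full, i.e. when the node cannot keep up with the load.
func (m *mempool) ReceiveTx(tx []byte) error {
	select {
	case m.pending <- tx:
		return nil
	default:
		return errTooBusy
	}
}

func main() {
	m := newMempool(2)
	for i := 0; i < 3; i++ {
		if err := m.ReceiveTx([]byte{byte(i)}); err != nil {
			fmt.Printf("tx %d rejected: %v\n", i, err)
		}
	}
}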

@cason (Contributor) commented Jul 2, 2024

See also #2925 (comment)
