mempool: Add push-pull gossip protocol (CAT) by hvanz · Pull Request #1472 · cometbft/cometbft · GitHub

mempool: Add push-pull gossip protocol (CAT) #1472

Closed
hvanz wants to merge 31 commits

hvanz
Member
@hvanz hvanz commented Oct 11, 2023

Closes: #2027.

Relates to #1058.

The CAT (for Content-Addressable Transaction) pool is a gossip protocol for the mempool originally implemented by Celestia. CAT is a push-pull protocol in contrast to CometBFT's default push protocol.

The code in this PR was ported from Celestia's feature/cat branch. The original protocol is built on top of the priority mempool implementation (aka v1), which existed in CometBFT until v0.37. The current code was ported on top of CometBFT's default mempool implementation (CListMempool), so we had to make some changes to adapt it to the different underlying implementation.


PR checklist

  • Tests written/updated
  • Changelog entry added in .changelog (we use unclog to manage our changelog)
  • Updated relevant documentation (docs/ or spec/) and code comments

@hvanz hvanz added the mempool label Oct 11, 2023
@hvanz hvanz added this to the 2023-Q4 milestone Oct 11, 2023
@hvanz hvanz self-assigned this Oct 13, 2023
@hvanz
Member Author
hvanz commented Oct 16, 2023

These are the results of some preliminary experiments run on a laptop using the e2e framework. For more thorough results we would still need to run these experiments with around 200 nodes in the cloud, as we do for the QA tests, probably also with a different network topology.

Here, each experiment instance has:

  • 8 nodes in a complete graph topology, that is, each node is connected to all the others. Note that this is the topology where a CAT mempool should perform best and the default mempool worst.
  • The first node receives all the transaction load.
  • Each run lasts 4 minutes.

Instances are defined by all combinations of transaction load rate (r) in tx/s and transaction size (s) in bytes, with the following values:

  • r = [100, 200, 400, 800]
  • s = [256, 512, 1024, 2048]

For each (r,s) value, we run two consecutive instances: one with the CAT mempool and then one with the default mempool, which we call Flood, as a baseline for comparison.
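
As a rough illustration of the setup above, the following Go sketch enumerates the runs: every (r, s) combination, executed once with CAT and then once with Flood. The `Instance` type, the `Instances` function, and the loop order are assumptions made for this example; they are not part of the e2e framework.

```go
package main

import "fmt"

// Instance is one experiment run. Type and field names are illustrative.
type Instance struct {
	Rate    int    // transaction load in tx/s
	Size    int    // transaction size in bytes
	Mempool string // "cat" or "flood"
}

// Instances returns every (rate, size) combination, each one first
// with the CAT mempool and then with the Flood baseline.
func Instances(rates, sizes []int) []Instance {
	var out []Instance
	for _, r := range rates {
		for _, s := range sizes {
			for _, m := range []string{"cat", "flood"} {
				out = append(out, Instance{Rate: r, Size: s, Mempool: m})
			}
		}
	}
	return out
}

func main() {
	rates := []int{100, 200, 400, 800}
	sizes := []int{256, 512, 1024, 2048}
	runs := Instances(rates, sizes)
	// Each run lasts 4 minutes, not counting setup and teardown.
	fmt.Printf("%d runs, %d minutes of load in total\n", len(runs), 4*len(runs))
	for _, in := range runs {
		// Offered load on the first node, in bytes per second.
		fmt.Printf("r=%d s=%d %s: %d B/s\n", in.Rate, in.Size, in.Mempool, in.Rate*in.Size)
	}
}
```

With the values listed above this gives 16 (r, s) pairs and 32 runs, with an offered load ranging from 25,600 B/s (r=100, s=256) to 1,638,400 B/s (r=800, s=2048).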

[Two screenshots of the experiment graphs: 2023-10-13 at 23:31:07 and 2023-10-17 at 11:17:11]

We can see in the above graphs that:

  • Bandwidth consumption (bytes sent and received) of CAT is always lower than that of Flood, with Flood using 3 to 6 times more bandwidth.
  • The chain height, chain size, and block size are almost the same for CAT and Flood in each instance. This means that CAT does not miss transactions compared to Flood, that is, all transactions sent to the nodes are eventually included in the chain. For big transactions, the chain size of Flood is lower than that of CAT. This is expected, as CAT is supposed to work better with large transactions.
  • Remember that the saturation point defined in the last QA experiments is r=400, s=1024, meaning that node performance degrades at that load or higher. This can be seen in the mempool size metrics, where the nodes become unstable for higher values.
  • The metric 'Already received txs' counts the number of times a received transaction is already in the mempool cache. In all CAT instances, this metric is zero on the node that receives transactions from the client: it pushes them to the other nodes and does not receive them back. The other nodes also receive fewer duplicate transactions than with Flood.
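
The way a metric like 'Already received txs' can be counted is sketched below in Go. This is an assumption-laden illustration, not the mempool's actual cache: `SeenCache` and its fields are hypothetical names, the cache here is unbounded, and transactions are keyed by their SHA-256 hash.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// SeenCache is a hypothetical stand-in for the mempool cache. It
// remembers which transactions were received and counts repeats.
type SeenCache struct {
	seen            map[[sha256.Size]byte]struct{}
	AlreadyReceived int // number of receptions of an already cached tx
}

func NewSeenCache() *SeenCache {
	return &SeenCache{seen: make(map[[sha256.Size]byte]struct{})}
}

// Receive records tx and reports whether it was new. A repeated
// transaction only increments the duplicate counter.
func (c *SeenCache) Receive(tx []byte) bool {
	key := sha256.Sum256(tx)
	if _, ok := c.seen[key]; ok {
		c.AlreadyReceived++
		return false
	}
	c.seen[key] = struct{}{}
	return true
}

func main() {
	c := NewSeenCache()
	for _, tx := range []string{"tx-a", "tx-b", "tx-a", "tx-a"} {
		c.Receive([]byte(tx))
	}
	fmt.Println("already received:", c.AlreadyReceived)
}
```

Every increment of this counter corresponds to a full transaction that crossed the network without being needed, which is why a value of zero is the ideal outcome for a gossip protocol.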

@hvanz
Member Author
hvanz commented Oct 16, 2023

These are the same experiments as above but with the following topology, which is more realistic, where node 1 receives all the load:
[Screenshot of the network topology: 2023-10-16 at 17:28:29]

[Two screenshots of the experiment graphs: 2023-10-16 at 17:39:39 and 2023-10-17 at 11:04:32]

These graphs look similar to those of the experiment above. The main difference is that the bandwidth of Flood is now about 2 times that of CAT, instead of 3 to 6 times. This was expected, as the push-pull protocol now needs to relay each transaction over several hops from node 1 to node 8.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 28, 2023
@hvanz hvanz added the wip label and removed the stale label Oct 31, 2023
@faddat
Contributor
faddat commented Jan 5, 2024

but I must kindly ask that we backport this.

It is badly needed.

@adizere
Member
adizere commented Jan 5, 2024

> but I must kindly ask that we backport this.
>
> It is badly needed.

Agree. Thanks Jacob!

Is there a specific app-chain team that is waiting on it? If so, my bad, I was not aware!

Even more importantly, are Celestia mainnet nodes employing the CAT mempool? In our last chat with their team (Oct/Nov), we agreed we'd push CAT over the finish line if they gave us the green light that they would use it on their mainnet, with the Comet and Celestia teams doing complementary testing of CAT. I have not re-checked since then, so my info is very stale. We should get an update. cc @cmwaters

@adizere
Member
adizere commented Jan 8, 2024

@hvanz should we deprecate this PR in favor of #1971 ?

@adizere adizere modified the milestones: 2023-Q4, 2024-Q2 Jan 8, 2024
@hvanz
Member Author
hvanz commented Jan 9, 2024

> @hvanz should we deprecate this PR in favor of #1971 ?

I think there's no need to. This PR is now up-to-date with main (and it contains all experiment results).

@adizere
Member
adizere commented Jan 9, 2024

Then deprecate 1971 in favor of present PR? Not sure 1971 still has anything worth keeping, let me know if so. cc @faddat

BTW thanks for bringing the present PR up-to-date with main Hernán !

@faddat
Contributor
faddat commented Jan 9, 2024

Big thanks from me too.

😍

I'll gladly close the other.

@adizere adizere linked an issue Jan 12, 2024 that may be closed by this pull request
@adizere adizere removed this from the 2024-Q2 milestone Jan 12, 2024
@adizere
Member
adizere commented Jun 19, 2024

For the moment deprioritized in favor of #3297

@adizere adizere closed this Jun 19, 2024
@zrbecker zrbecker deleted the experimental/cat branch February 7, 2025 17:04
Labels
mempool wip Work in progress
Projects
No open projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Support for the CAT (push-pull gossip) mempool
4 participants