8000 QA report for 0.37.x alpha3 by lasarojc · Pull Request #376 · cometbft/cometbft · GitHub

Merged: 8 commits, Feb 25, 2023
5 changes: 3 additions & 2 deletions docs/qa/README.md
@@ -20,5 +20,6 @@ The results obtained in each release are stored in their own directory.
The following releases have undergone the Quality Assurance process, and the corresponding reports include detailed information on tests and comparison with the baseline.

* [TM v0.34.x](./v034/TMCore.md) - Tested prior to releasing Tendermint Core v0.34.22.
* [v0.34.x](./v034/README.md) - Tested prior to releasing v0.34.27, using TM v0.34.x results as baseline.
* [v0.37.x](./v037/) - with TM v.34.x acting as a baseline
* [v0.34.x](./v034/CometBFT.md) - Tested prior to releasing v0.34.27, using TM v0.34.x results as baseline.
* [TM v0.37.x](./v037/TMCore.md) - Tested prior to releasing TM v0.37.x, using TM v0.34.x results as baseline.
* [v0.37.x](./v037/CometBFT.md) - Tested on CometBFT v0.37.0-alpha3, using TM v0.37.x results as baseline.
163 changes: 163 additions & 0 deletions docs/qa/v037/CometBFT.md
@@ -0,0 +1,163 @@
---
order: 1
parent:
title: CometBFT Quality Assurance Results for v0.37.x
description: This is a report on the results obtained when running CometBFT v0.37.x on testnets
order: 2
---

# v0.37.x

This iteration of the QA was run on CometBFT `v0.37.0-alpha3`, the first `v0.37.x` version from the CometBFT repository.

The changes with respect to the baseline, `TM v0.37.x` as of Oct 12, 2022 (Commit: 1cf9d8e276afe8595cba960b51cd056514965fd1), include the rebranding of our fork of Tendermint Core to CometBFT and several improvements, described in the CometBFT [CHANGELOG](https://github.com/cometbft/cometbft/blob/v0.37.0-alpha.3/CHANGELOG.md).

## Testbed

As in other iterations of our QA process, we have used a 200-node network as a testbed, plus nodes to introduce load and collect metrics.

### Saturation point

As in previous iterations, in our QA experiments the system is subjected to a load slightly under its saturation point.
The method to identify the saturation point is explained [here](../v034/README.md#finding-the-saturation-point) and its application to the baseline is described [here](./TMCore.md#finding-the-saturation-point).
We use the same saturation point: `c`, the number of connections created by the load runner process to the target node, is 2, and `r`, the rate of transactions issued per second, is 200.
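As a rough illustration of the criterion behind this choice, the sketch below is hypothetical (the function name, the 90-second load duration, and the threshold logic are assumptions, not the actual QA tooling): an experiment is past the saturation point when the chain processes fewer transactions than the load runner submitted.

```python
# Hypothetical sketch: an experiment is past the saturation point when the
# chain processes fewer transactions than the load runner submitted.
# The 90-second duration matches the load windows discussed in this report.
def is_saturated(processed: int, c: int, r: int, duration: int = 90) -> bool:
    submitted = c * r * duration  # c connections, each issuing r tx/s
    return processed < submitted

# With c=2 and r=200 over 90 s, 36000 transactions are submitted.
print(is_saturated(36000, c=2, r=200))  # False: all load was processed
print(is_saturated(30000, c=2, r=200))  # True: the chain fell behind
```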

## Examining latencies

The following figure plots six experiments carried out with the network.
The unique identifier (UUID) of each execution is shown on top of its graph.

![latencies](./img/200nodes_cmt037/all_experiments.png)

We can see that the latencies follow comparable patterns across all experiments.
Therefore, in the following sections we only present the results for one representative run, chosen randomly, whose UUID starts with `75cb89a8`.

![latencies](./img/200nodes_cmt037/e_75cb89a8-f876-4698-82f3-8aaab0b361af.png)

For reference, the following figure shows the latencies of different configurations of the baseline.
`c=02 r=200` corresponds to the same configuration as in this experiment.

![all-latencies](./img/200nodes_tm037/v037_200node_latencies.png)

As can be seen, latencies are similar.

## Prometheus Metrics on the Chosen Experiment

This section examines key metrics for the chosen experiment, extracted from Prometheus data.

### Mempool Size

The mempool size, a count of the number of transactions in the mempool, was shown to be stable and homogeneous at all full nodes.
It did not exhibit any unconstrained growth.
The plot below shows the evolution over time of the cumulative number of transactions inside all full nodes' mempools at a given time.

![mempool-cumulative](./img/200nodes_cmt037/mempool_size.png)

The following picture shows the evolution of the average mempool size over all full nodes, which mostly oscillates between 1500 and 2000 outstanding transactions.

![mempool-avg](./img/200nodes_cmt037/avg_mempool_size.png)

The peaks observed coincide with the moments when some nodes reached round 1 of consensus (see below).
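The cumulative and average series in these plots are simple aggregations of the per-node samples; a minimal sketch follows, with made-up node names and values (not data from the experiment):

```python
# Per-node mempool size samples at each scrape instant (invented values).
samples = {
    "node01": [1500, 1800, 2100, 1600],
    "node02": [1600, 1700, 2000, 1800],
}

def cumulative(series):
    # Sum over all nodes at each sampling instant.
    return [sum(vals) for vals in zip(*series.values())]

def average(series):
    n = len(series)
    return [s / n for s in cumulative(series)]

print(cumulative(samples))  # [3100, 3500, 4100, 3400]
print(average(samples))     # [1550.0, 1750.0, 2050.0, 1700.0]
```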


The behavior is similar to that observed in the baseline, presented next.

![mempool-cumulative-baseline](./img/200nodes_tm037/v037_r200c2_mempool_size.png)

![mempool-avg-baseline](./img/200nodes_tm037/v037_r200c2_mempool_size_avg.png)


### Peers

The number of peers was stable at all nodes.
It was higher for the seed nodes (around 140) than for the rest (between 16 and 78).
The red dashed line denotes the average value.

![peers](./img/200nodes_cmt037/peers.png)

Just as in the baseline, shown next, the fact that non-seed nodes reach more than 50 peers is due to [\#9548].

![peers](./img/200nodes_tm037/v037_r200c2_peers.png)


### Consensus Rounds per Height

Most heights took just one round, that is, round 0, but some nodes needed to advance to round 1 and eventually round 2.

![rounds](./img/200nodes_cmt037/rounds.png)

The following specific run of the baseline presented better results, only requiring up to round 1, but reaching higher rounds is not uncommon in the corresponding software version.

![rounds](./img/200nodes_tm037/v037_r200c2_rounds.png)

### Blocks Produced per Minute, Transactions Processed per Minute

The following plot shows the rate at which blocks were created, from the point of view of each node.
That is, it shows when each node learned that a new block had been agreed upon.

![heights](./img/200nodes_cmt037/block_rate.png)

For most of the time when load was being applied to the system, most of the nodes stayed around 20 to 25 blocks/minute.

The spike to more than 175 blocks/minute is due to a slow node catching up.

The collective spike on the right of the graph marks the end of the load injection, when blocks become smaller (empty) and impose less load on the network.
This behavior is reflected in the following graph, which shows the number of transactions processed per minute.

![total-txs](./img/200nodes_cmt037/total_txs_rate.png)

The baseline exhibited similar behavior, shown in the following graphs, where the gradient of the curves shows the rates of blocks and transactions per minute.

Over a period of 2 minutes, the height goes from 477 to 524.
This results in an average of 23.5 blocks produced per minute, a rate similar to this experiment.

![heights-baseline](./img/200nodes_tm037/v037_r200c2_heights.png)

Over a period of 2 minutes, the total goes from 64525 to 100125 transactions,
resulting in 17800 transactions per minute. However, we can see in the plot that
all transactions in the load are processed long before the two minutes elapse.
If we adjust the time window to when transactions are actually processed (approx. 90 seconds),
we obtain 23733 transactions per minute, again similar to this experiment.

![total-txs-baseline](./img/200nodes_tm037/v037_r200c2_total-txs.png)
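The rates quoted above follow directly from the plotted heights and transaction totals; as a quick arithmetic check:

```python
# Baseline block rate: height 477 -> 524 over 2 minutes.
blocks_per_min = (524 - 477) / 2
print(blocks_per_min)  # 23.5

# Baseline transaction rate: 64525 -> 100125 transactions.
txs = 100125 - 64525
naive_rate = txs / 2             # over the full 2-minute window
adjusted_rate = txs / (90 / 60)  # over the ~90 s of actual load
print(naive_rate)                # 17800.0
print(round(adjusted_rate))      # 23733
```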

### Memory Resident Set Size

The Resident Set Size (RSS) of all monitored processes is plotted below, with a maximum memory usage of 2GB.

![rss](./img/200nodes_cmt037/memory.png)

A similar behavior was shown in the baseline, presented next.

![rss](./img/200nodes_tm037/v037_r200c2_rss.png)

The memory usage of all processes went down as the load was removed, showing no signs of unconstrained growth.


#### CPU utilization

The best metric from Prometheus to gauge CPU utilization in a Unix machine is `load1`,
as it usually appears in the
[output of `top`](https://www.digitalocean.com/community/tutorials/load-average-in-linux).

It is contained below 5 on most nodes, as seen in the following graph.

![load1](./img/200nodes_cmt037/cpu.png)
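For reference, `load1` can be pulled from Prometheus with an instant query; the sketch below parses a response of the standard vector shape. The instance names and values are invented, and `node_load1` (the node-exporter metric name) is an assumption about this setup.

```python
import json

# Hypothetical Prometheus instant-query response for `node_load1`;
# instance names and values are invented for illustration.
response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"instance": "node01"}, "value": [1676376000, "3.1"]},
      {"metric": {"instance": "node02"}, "value": [1676376000, "4.2"]}
    ]
  }
}
""")

loads = {r["metric"]["instance"]: float(r["value"][1])
         for r in response["data"]["result"]}
print(max(loads.values()) < 5)  # True: matches the "contained below 5" observation
```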

A similar behavior was seen in the baseline.

![load1-baseline](./img/200nodes_tm037/v037_r200c2_load1.png)


## Test Results

The comparison against the baseline results shows that both scenarios had similar numbers and are therefore equivalent.

A conclusion of these tests is shown in the following table, along with the commit versions used in the experiments.

| Scenario | Date | Version | Result |
|--|--|--|--|
| CometBFT | 2023-02-14 | v0.37.0-alpha3 (bef9a830e7ea7da30fa48f2cc236b1f465cc5833) | Pass |


[\#9548]: https://github.com/tendermint/tendermint/issues/9548
40 changes: 20 additions & 20 deletions docs/qa/v037/README.md → docs/qa/v037/TMCore.md
@@ -2,7 +2,7 @@
order: 1
parent:
title: CometBFT Quality Assurance Results for v0.37.x
description: This is a report on the results obtained when running v0.37.x on testnets
description: This is a report on the results obtained when running TM v0.37.x on testnets
order: 2
---

@@ -36,7 +36,7 @@ For further details, see [this paragraph](../v034/README.md#finding-the-saturati
in the baseline version.

The following table summarizes the results for v0.37.x, for the different experiments
(extracted from file [`v037_report_tabbed.txt`](./img/v037_report_tabbed.txt)).
(extracted from file [`v037_report_tabbed.txt`](./img/200nodes_tm037/v037_report_tabbed.txt)).

The X axis of this table is `c`, the number of connections created by the load runner process to the target node.
The Y axis of this table is `r`, the rate or number of transactions issued per second.
@@ -75,7 +75,7 @@ The load runner's CPU load was negligible (near 0) when running `r=200,c=2`.
The method described [here](../method.md) allows us to plot the latencies of transactions
for all experiments.

![all-latencies](./img/v037_200node_latencies.png)
![all-latencies](./img/200nodes_tm037/v037_200node_latencies.png)

The data seen in the plot is similar to that of the baseline.

@@ -88,7 +88,7 @@ The following plot summarizes average latencies versus overall throughputs
across different numbers of WebSocket connections to the node into which
transactions are being loaded.

![latency-vs-throughput](./img/v037_latency_throughput.png)
![latency-vs-throughput](./img/200nodes_tm037/v037_latency_throughput.png)

This is similar to that of the baseline plot:

@@ -106,11 +106,11 @@ at all full nodes. It did not exhibit any unconstrained growth.
The plot below shows the evolution over time of the cumulative number of transactions inside all full nodes' mempools
at a given time.

![mempool-cumulative](./img/v037_r200c2_mempool_size.png)
![mempool-cumulative](./img/200nodes_tm037/v037_r200c2_mempool_size.png)

The plot below shows evolution of the average over all full nodes, which oscillate between 1500 and 2000 outstanding transactions.

![mempool-avg](./img/v037_r200c2_mempool_size_avg.png)
![mempool-avg](./img/200nodes_tm037/v037_r200c2_mempool_size_avg.png)

The peaks observed coincide with the moments when some nodes reached round 1 of consensus (see below).

@@ -125,7 +125,7 @@ The peaks observed coincide with the moments when some nodes reached round 1 of
The number of peers was stable at all nodes.
It was higher for the seed nodes (around 140) than for the rest (between 16 and 78).

![peers](./img/v037_r200c2_peers.png)
![peers](./img/200nodes_tm037/v037_r200c2_peers.png)

Just as in the baseline, the fact that non-seed nodes reach more than 50 peers is due to #9548.

@@ -137,7 +137,7 @@ Just as in the baseline, the fact that non-seed nodes reach more than 50 peers i

Most heights took just one round, but some nodes needed to advance to round 1 at some point.

![rounds](./img/v037_r200c2_rounds.png)
![rounds](./img/200nodes_tm037/v037_r200c2_rounds.png)

**This plot yields slightly better results than the baseline**:

@@ -147,14 +147,14 @@ Most heights took just one round, but some nodes needed to advance to round 1 at

The blocks produced per minute are the gradient of this plot.

![heights](./img/v037_r200c2_heights.png)
![heights](./img/200nodes_tm037/v037_r200c2_heights.png)

Over a period of 2 minutes, the height goes from 477 to 524.
This results in an average of 23.5 blocks produced per minute.

The transactions processed per minute are the gradient of this plot.

![total-txs](./img/v037_r200c2_total-txs.png)
![total-txs](./img/200nodes_tm037/v037_r200c2_total-txs.png)

Over a period of 2 minutes, the total goes from 64525 to 100125 transactions,
resulting in 17800 transactions per minute. However, we can see in the plot that
@@ -172,11 +172,11 @@ we obtain 23733 transactions per minute.

Resident Set Size of all monitored processes is plotted below.

![rss](./img/v037_r200c2_rss.png)
![rss](./img/200nodes_tm037/v037_r200c2_rss.png)

The average over all processes oscillates around 380 MiB and does not demonstrate unconstrained growth.

![rss-avg](./img/v037_r200c2_rss_avg.png)
![rss-avg](./img/200nodes_tm037/v037_r200c2_rss_avg.png)

**These plots yield similar results to the baseline**:

@@ -190,7 +190,7 @@ The best metric from Prometheus to gauge CPU utilization in a Unix machine is `l
as it usually appears in the
[output of `top`](https://www.digitalocean.com/community/tutorials/load-average-in-linux).

![load1](./img/v037_r200c2_load1.png)
![load1](./img/200nodes_tm037/v037_r200c2_load1.png)

It is contained below 5 on most nodes.

@@ -218,7 +218,7 @@ Finally, note that this setup allows for a fairer comparison between this versio

The plot of all latencies can be seen here.

![rotating-all-latencies](./img/v037_rotating_latencies.png)
![rotating-all-latencies](./img/200nodes_tm037/v037_rotating_latencies.png)

Which is similar to the baseline.

@@ -238,7 +238,7 @@ We also show the baseline results for comparison.

The blocks produced per minute are the gradient of this plot.

![rotating-heights](./img/v037_rotating_heights.png)
![rotating-heights](./img/200nodes_tm037/v037_rotating_heights.png)

Over a period of 4446 seconds, the height goes from 5 to 3323.
This results in an average of 45 blocks produced per minute,
@@ -249,7 +249,7 @@ which is similar to the baseline, shown below.
The following two plots show only the heights reported by ephemeral nodes.
The second plot is the baseline plot for comparison.

![rotating-heights-ephe](./img/v037_rotating_heights_ephe.png)
![rotating-heights-ephe](./img/200nodes_tm037/v037_rotating_heights_ephe.png)

![rotating-heights-ephe-bl](../v034/img/v034_rotating_heights_ephe.png)

@@ -258,7 +258,7 @@ catch up slightly faster.

The transactions processed per minute are the gradient of this plot.

![rotating-total-txs](./img/v037_rotating_total-txs.png)
![rotating-total-txs](./img/200nodes_tm037/v037_rotating_total-txs.png)

Over a period of 3852 seconds, the total goes from 597 to 267298 transactions in one of the validators,
resulting in 4154 transactions per minute, which is slightly lower than the baseline,
@@ -272,7 +272,7 @@ For comparison, this is the baseline plot.

The plot below shows the evolution of the number of peers throughout the experiment.

![rotating-peers](./img/v037_rotating_peers.png)
![rotating-peers](./img/200nodes_tm037/v037_rotating_peers.png)

This is the baseline plot, for comparison.

@@ -287,7 +287,7 @@ For further details on these plots, see the baseline report.
The average Resident Set Size (RSS) over all processes looks slightly more stable
on `v0.37` (first plot) than on the baseline (second plot).

![rotating-rss-avg](./img/v037_rotating_rss_avg.png)
![rotating-rss-avg](./img/200nodes_tm037/v037_rotating_rss_avg.png)

![rotating-rss-avg-bl](../v034/img/v034_rotating_rss_avg.png)

@@ -298,7 +298,7 @@ just as observed in the baseline.

The plot shows metric `load1` for all nodes.

![rotating-load1](./img/v037_rotating_load1.png)
![rotating-load1](./img/200nodes_tm037/v037_rotating_load1.png)

This is the baseline plot.

Binary file added docs/qa/v037/img/200nodes_cmt037/cpu.png
Binary file added docs/qa/v037/img/200nodes_cmt037/memory.png
Binary file added docs/qa/v037/img/200nodes_cmt037/peers.png
Binary file added docs/qa/v037/img/200nodes_cmt037/rounds.png