fix(sequencer): allow benchmarks to run on macOS and fix issues #1842

Fraser999 · 2024-12-02T20:28:37Z

Summary

This updates the sequencer benchmarks to actually run on macOS, and fixes a couple of issues around the mempool benchmarks.

Background

The benchmarks could previously only be run on Linux. Since they're also not being run regularly, they got broken at some point. They were further disabled during the large PR #2142.

Since I was working on fixing them anyway, I took the opportunity to improve the initialization of the test data. This has resulted in the total time to execute the full mempool suite dropping from ~6.5 mins to ~1.5 mins and means all tests are now able to complete 100 iterations (several could only manage one iteration previously).

Changes

Used an item from the astria-sequencer crate in the benchmark's main function to resolve the issue of the tests not running on macOS.
Fixed an issue where the tests were reusing identical txs, causing initialization to fail with an InsertionError::AlreadyPresent error.
Memoized test data to reduce test initialization times.
Refactored PendingTransactionsForAccount::find_demotables to use an existing method, deduplicating code.
Refactored tests to use updated test utilities.

Testing

Confirmed the benchmarks ran on macOS locally. The app ones take a very long time to run and this should be investigated and fixed. To run only the mempool ones:

cargo bench --features=benchmark -qp astria-sequencer mempool

Changelogs

No updates required.

Related Issues

Closes #1355.

SuperFluffy · 2024-12-03T05:59:49Z

Can we rearchitect our benchmarks to move away from Divan toward Criterion instead? I am thinking of breaking out code parts that are benched into free standing crates for example.

I understand one advantage is that Divan makes it relatively easy to bench code that's not in the public API. On the other hand we seem to be needing lots of workarounds for it.

Fraser999 · 2024-12-03T10:38:58Z

@SuperFluffy

Can we rearchitect our benchmarks to move away from Divan toward Criterion instead? I am thinking of breaking out code parts that are benched into free standing crates for example.

Yes, but I'm not sure if we should in the short-term.

The workarounds for Divan are constrained to just the benchmarking code, except for possibly one which would be required for Criterion anyway, so I'm not too worried about them. Most of the workarounds will hopefully no longer be required in the future according to Divan's roadmap, but whether progress is made on that or not is debatable. It does not seem to be under particularly active development.

I'm not convinced that breaking out benchmarked sections of the code into standalone crates is worthwhile though tbh. I don't see a compelling argument for breaking out App or Mempool right now since they're both (rightly) very specific to our use-case in Sequencer. On the plus side, breaking them out would at least (hopefully!) force us away from using eyre in these places, but that just makes the effort to break them out all the greater. Splitting one or both out later could be useful if we decide they could be reused by new apps, but I would avoid it if we don't have that need.

We could instead look to feature gate public access to the required modules behind the existing benchmark feature. It would be a little ugly, but maybe worthwhile given the benefits of Criterion.

Lilyjjo · 2024-12-03T17:41:51Z

crates/astria-sequencer/src/mempool/transactions_container.rs

            // ensure we have enough balance to cover inclusion
-            for (denom, cost) in tx.cost() {


Nice refactor 👍

Lilyjjo · 2024-12-03T17:59:56Z

This lgtm, great that it runs on macs now :)

joroshiba · 2025-01-18T00:24:44Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or this PR will be
closed in 7 days.

joroshiba · 2025-01-25T00:25:08Z

This PR was closed because it has been stale.

Fraser999 · 2025-01-30T17:20:33Z

Re-opening as we'd like to actually get this merged eventually. Not a high priority though.

joroshiba · 2025-03-17T00:29:23Z

This PR is stale because it has been open 45 days with no activity. Remove stale label or this PR will be
closed in 7 days.

## Summary This introduces `CheckedAction`, `CheckedTransaction` and `Checked...` wrappers for each action type. ## Background We want to have stricter checks performed when `CheckTx` is called on a new tx in order to avoid accepting txs which subsequently fail execution. Checks on txs and their included actions are now split into two categories: mutable and immutable. On construction of a checked object, both types of checks are run. Subsequently, only the mutable checks are re-run, and this is currently only done at the start of executing the action/transaction. Nonce and balance checks are excluded from these checks as the Mempool only puts forward for execution txs which should have valid nonces and sufficient account balances. In order to ensure such checks have been performed, the sequencer now deals almost exclusively in checked flavours of actions and txs. As a result of this, many tests which previously used txs had to be refactored to ensure that constructing the equivalent checked txs would succeed. ## Changes - Introduced new `Checked...` types for actions and `Transaction`. Replaced usage throughout sequencer codebase of the un 8000 checked flavours of these. - Changed `ExpandedBlockData::user_submitted_transactions` to UN-parsed transactions so that all checking can be done when constructing a `CheckedTransaction`. - Unified several test utilities into a single, top-level `test_utils` module. As well as pre-existing helper functions, there is now a `Fixture` for intializing a new chain along with several builders/initializers to aid in constructing objects to be used in tests. These are provided with doc comments which should explain how to use them. - Disabled benchmarks for now. These are already not functioning on MacOS (albeit there is an open PR #1842 to address this). I would prefer to handle updating the benchmarks as part of that PR rather than adding to this one. ## Testing - Existing tests were generally refactored to retain their original intent. - Individual checked actions were provided with extensive unit test coverage. Each one generally tests the construction AND execution of the specific action. To generate failures during execution, the unit tests construct the action to be tested while the global state makes construction possible (e.g signer is sudo), and then modify the global state to make execution fail (e.g. change the sudo address). All such state changes are done by executing other checked actions rather than directly manipulating global state, to ensure that such state changes remain feasible in the future. - These new action unit tests made the tests in `tests_execute_transaction.rs` largely redundant, hence that file has been deleted. Coverage was also provided for `IbcRelay` where none existed before, but the setup difficulty for testing `Ics20Withdrawal` is such that only a subset of the code paths are tested (as is the case currently). - `CheckedTransaction` also has extensive unit test coverage for construction and execution, and covers any deleted tests from `tests_execute_transaction.rs` not already catered for by the action unit tests. ## Changelogs Changelogs updated. ## Metrics - Added metrics: - `ASTRIA_SEQUENCER_CHECK_TX_FAILED_ACTION_CHECKS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_ACTIONS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_RECHECK` - Renamed metric `ASTRIA_SEQUENCER_CHECK_TX_REMOVED_TOO_LARGE` to `ASTRIA_SEQUENCER_CHECK_TX_FAILED_TX_TOO_LARGE` - Deleted metrics: - `ASTRIA_SEQUENCER_CHECK_TX_REMOVED_FAILED_STATELESS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_PARSE_TX` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_STATELESS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_TRACKED` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_CHAIN_ID` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_REMOVED` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CONVERT_ADDRESS` ## Breaking Changelist - There should be no consensus, API or state-breaking changes other than the metrics listed above. All checks being done on actions and transactions remain the same, they are simply performed at a different stage of the flow now. ## Related Issues Closes #1958. Closes #1620.

## Summary This updates the sequencer benchmarks to actually run on macOS, and fixes a couple of issues around the mempool benchmarks. ## Background The benchmarks could previously only be run on Linux. Since they're also not being run regularly, they got broken at some point. They were further disabled during the large PR #2142. Since I was working on fixing them anyway, I took the opportunity to improve the initialization of the test data. This has resulted in the total time to execute the full mempool suite dropping from ~6.5 mins to ~1.5 mins and means all tests are now able to co 86BC mplete 100 iterations (several could only manage one iteration previously). ## Changes - Used an item from the `astria-sequencer` crate in the benchmark's `main` function to resolve the issue of the tests not running on macOS. - Fixed an issue where the tests were reusing identical txs, causing initialization to fail with an `InsertionError::AlreadyPresent` error. - Memoized test data to reduce test initialization times. - Refactored `PendingTransactionsForAccount::find_demotables` to use an existing method, deduplicating code. - Refactored tests to use updated test utilities. ## Testing Confirmed the benchmarks ran on macOS locally. The `app` ones take a very long time to run and this should be investigated and fixed. To run only the mempool ones: ``` cargo bench --features=benchmark -qp astria-sequencer mempool ``` ## Changelogs No updates required. ## Related Issues Closes #1355.

Fraser999 added 3 commits Decembe 8000 r 2, 2024 19:49

fix to allow benchmarks to run on macOS

e274217

improve mempool benchmark initialization

a865a6d

fix mempool benchmarks

755f160

Fraser999 requested a review from a team as a code owner December 2, 2024 20:28

Fraser999 requested review from noot and Lilyjjo December 2, 2024 20:28

github-actions bot added the sequencer pertaining to the astria-sequencer crate label Dec 2, 2024

Lilyjjo reviewed Dec 3, 2024

View reviewed changes

joroshiba added the stale label Jan 18, 2025

joroshiba added the closed-stale label Jan 25, 2025

joroshiba closed this Jan 25, 2025

Fraser999 requested review from SuperFluffy and removed request for noot January 30, 2025 17:19

Fraser999 reopened this Jan 30, 2025

Fraser999 requested a review from noot as a code owner January 30, 2025 17:20

joroshiba removed closed-stale stale labels Jan 31, 2025

joroshiba added the stale label Mar 17, 2025

Fraser999 added the ignore-stale Override for issues or PRs which should not be removed if stale. label Mar 17, 2025

Fraser999 mentioned this pull request Apr 28, 2025

feat(core, sequencer): provide checked actions and transaction #2142

Merged

Fraser999 added 2 commits May 15, 2025 22:37

Merge remote-tracking branch 'upstream/main' into ENG-700/fix-benchmarks

230121d

re-enable benchmarks

82d3489

noot approved these changes May 16, 2025

View reviewed changes

Fraser999 added this pull request to the merge queue May 16, 2025

Merged via the queue into main with commit 538c816 May 16, 2025
54 checks passed

Fraser999 deleted the ENG-700/fix-benchmarks branch May 16, 2025 14:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(sequencer): allow benchmarks to run on macOS and fix issues #1842

fix(sequencer): allow benchmarks to run on macOS and fix issues #1842

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

		// ensure we have enough balance to cover inclusion
		for (denom, cost) in tx.cost() {

fix(sequencer): allow benchmarks to run on macOS and fix issues #1842

fix(sequencer): allow benchmarks to run on macOS and fix issues #1842

Uh oh!

Conversation

Uh oh!

Summary

Background

Changes

Testing

Changelogs

Related Issues

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!