8000 fix(sequencer): allow benchmarks to run on macOS and fix issues by Fraser999 · Pull Request #1842 · astriaorg/astria · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix(sequencer): allow benchmarks to run on macOS and fix issues #1842

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 16, 2025

Conversation

Fraser999
Copy link
Contributor
@Fraser999 Fraser999 commented Dec 2, 2024

Summary

This updates the sequencer benchmarks to actually run on macOS, and fixes a couple of issues around the mempool benchmarks.

Background

The benchmarks could previously only be run on Linux. Since they're also not being run regularly, they got broken at some point. They were further disabled during the large PR #2142.

Since I was working on fixing them anyway, I took the opportunity to improve the initialization of the test data. This has resulted in the total time to execute the full mempool suite dropping from ~6.5 mins to ~1.5 mins and means all tests are now able to complete 100 iterations (several could only manage one iteration previously).

Changes

  • Used an item from the astria-sequencer crate in the benchmark's main function to resolve the issue of the tests not running on macOS.
  • Fixed an issue where the tests were reusing identical txs, causing initialization to fail with an InsertionError::AlreadyPresent error.
  • Memoized test data to reduce test initialization times.
  • Refactored PendingTransactionsForAccount::find_demotables to use an existing method, deduplicating code.
  • Refactored tests to use updated test utilities.

Testing

Confirmed the benchmarks ran on macOS locally. The app ones take a very long time to run and this should be investigated and fixed. To run only the mempool ones:

cargo bench --features=benchmark -qp astria-sequencer mempool

Changelogs

No updates required.

Related Issues

Closes #1355.

@Fraser999 Fraser999 requested a review from a team as a code owner December 2, 2024 20:28
@Fraser999 Fraser999 requested review from noot and Lilyjjo December 2, 2024 20:28
@github-actions github-actions bot added the sequencer pertaining to the astria-sequencer crate label Dec 2, 2024
@SuperFluffy
Copy link
Member

Can we rearchitect our benchmarks to move away from Divan toward Criterion instead? I am thinking of breaking out code parts that are benched into free standing crates for example.

I understand one advantage is that Divan makes it relatively easy to bench code that's not in the public API. On the other hand we seem to be needing lots of workarounds for it.

@Fraser999
Copy link
Contributor Author

@SuperFluffy

Can we rearchitect our benchmarks to move away from Divan toward Criterion instead? I am thinking of breaking out code parts that are benched into free standing crates for example.

Yes, but I'm not sure if we should in the short-term.

The workarounds for Divan are constrained to just the benchmarking code, except for possibly one which would be required for Criterion anyway, so I'm not too worried about them. Most of the workarounds will hopefully no longer be required in the future according to Divan's roadmap, but whether progress is made on that or not is debatable. It does not seem to be under particularly active development.

I'm not convinced that breaking out benchmarked sections of the code into standalone crates is worthwhile though tbh. I don't see a compelling argument for breaking out App or Mempool right now since they're both (rightly) very specific to our use-case in Sequencer. On the plus side, breaking them out would at least (hopefully!) force us away from using eyre in these places, but that just makes the effort to break them out all the greater. Splitting one or both out later could be useful if we decide they could be reused by new apps, but I would avoid it if we don't have that need.

We could instead look to feature gate public access to the required modules behind the existing benchmark feature. It would be a little ugly, but maybe worthwhile given the benefits of Criterion.

// ensure we have enough balance to cover inclusion
for (denom, cost) in tx.cost() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor 👍

@Lilyjjo
Copy link
Contributor
Lilyjjo commented Dec 3, 2024

This lgtm, great that it runs on macs now :)

@joroshiba
Copy link
Member

This PR is stale because it has been open 45 days with no activity. Remove stale label or this PR will be
closed in 7 days.

@joroshiba
Copy link
Member

This PR was closed because it has been stale.

@joroshiba joroshiba closed this Jan 25, 2025
@Fraser999 Fraser999 requested review from SuperFluffy and removed request for noot January 30, 2025 17:19
@Fraser999
Copy link
Contributor Author

Re-opening as we'd like to actually get this merged eventually. Not a high priority though.

@joroshiba
Copy link
Member

This PR is stale because it has been open 45 days with no activity. Remove stale label or this PR will be
closed in 7 days.

@Fraser999 Fraser999 added the ignore-stale Override for issues or PRs which should not be removed if stale. label Mar 17, 2025
github-merge-queue bot pushed a commit that referenced this pull request May 13, 2025
## Summary
This introduces `CheckedAction`, `CheckedTransaction` and `Checked...`
wrappers for each action type.

## Background
We want to have stricter checks performed when `CheckTx` is called on a
new tx in order to avoid accepting txs which subsequently fail
execution.

Checks on txs and their included actions are now split into two
categories: mutable and immutable. On construction of a checked object,
both types of checks are run. Subsequently, only the mutable checks are
re-run, and this is currently only done at the start of executing the
action/transaction.

Nonce and balance checks are excluded from these checks as the Mempool
only puts forward for execution txs which should have valid nonces and
sufficient account balances.

In order to ensure such checks have been performed, the sequencer now
deals almost exclusively in checked flavours of actions and txs.

As a result of this, many tests which previously used txs had to be
refactored to ensure that constructing the equivalent checked txs would
succeed.

## Changes
- Introduced new `Checked...` types for actions and `Transaction`.
Replaced usage throughout sequencer codebase of the unchecked flavours
of these.
- Changed `ExpandedBlockData::user_submitted_transactions` to UN-parsed
transactions so that all checking can be done when constructing a
`CheckedTransaction`.
- Unified several test utilities into a single, top-level `t
8000
est_utils`
module. As well as pre-existing helper functions, there is now a
`Fixture` for intializing a new chain along with several
builders/initializers to aid in constructing objects to be used in
tests. These are provided with doc comments which should explain how to
use them.
- Disabled benchmarks for now. These are already not functioning on
MacOS (albeit there is an open PR #1842 to address this). I would prefer
to handle updating the benchmarks as part of that PR rather than adding
to this one.

## Testing
- Existing tests were generally refactored to retain their original
intent.
- Individual checked actions were provided with extensive unit test
coverage. Each one generally tests the construction AND execution of the
specific action. To generate failures during execution, the unit tests
construct the action to be tested while the global state makes
construction possible (e.g signer is sudo), and then modify the global
state to make execution fail (e.g. change the sudo address). All such
state changes are done by executing other checked actions rather than
directly manipulating global state, to ensure that such state changes
remain feasible in the future.
- These new action unit tests made the tests in
`tests_execute_transaction.rs` largely redundant, hence that file has
been deleted. Coverage was also provided for `IbcRelay` where none
existed before, but the setup difficulty for testing `Ics20Withdrawal`
is such that only a subset of the code paths are tested (as is the case
currently).
- `CheckedTransaction` also has extensive unit test coverage for
construction and execution, and covers any deleted tests from
`tests_execute_transaction.rs` not already catered for by the action
unit tests.

## Changelogs
Changelogs updated.

## Metrics
- Added metrics:
  - `ASTRIA_SEQUENCER_CHECK_TX_FAILED_ACTION_CHECKS`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_ACTIONS`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_RECHECK`
- Renamed metric `ASTRIA_SEQUENCER_CHECK_TX_REMOVED_TOO_LARGE` to
  `ASTRIA_SEQUENCER_CHECK_TX_FAILED_TX_TOO_LARGE`
- Deleted metrics:
  - `ASTRIA_SEQUENCER_CHECK_TX_REMOVED_FAILED_STATELESS`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_PARSE_TX`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_STATELESS`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_TRACKED`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_CHAIN_ID`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_REMOVED`
  - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CONVERT_ADDRESS`

## Breaking Changelist
- There should be no consensus, API or state-breaking changes other than
the metrics listed above. All checks being done on actions and
transactions remain the same, they are simply performed at a different
stage of the flow now.

## Related Issues
Closes #1958.
Closes #1620.
@Fraser999 Fraser999 added this pull request to the merge queue May 16, 2025
Merged via the queue into main with commit 538c816 May 16, 2025
54 checks passed
@Fraser999 Fraser999 deleted the ENG-700/fix-benchmarks branch May 16, 2025 14:44
ethanoroshiba pushed a commit that referenced this pull request May 20, 2025
## Summary
This updates the sequencer benchmarks to actually run on macOS, and
fixes a couple of issues around the mempool benchmarks.

## Background
The benchmarks could previously only be run on Linux. Since they're also
not being run regularly, they got broken at some point. They were
further disabled during the large PR #2142.

Since I was working on fixing them anyway, I took the opportunity to
improve the initialization of the test data. This has resulted in the
total time to execute the full mempool suite dropping from ~6.5 mins to
~1.5 mins and means all tests are now able to complete 100 iterations
(several could only manage one iteration previously).

## Changes
- Used an item from the `astria-sequencer` crate in the benchmark's
`main` function to resolve the issue of the tests not running on macOS.
- Fixed an issue where
86DB
 the tests were reusing identical txs, causing
initialization to fail with an `InsertionError::AlreadyPresent` error.
- Memoized test data to reduce test initialization times.
- Refactored `PendingTransactionsForAccount::find_demotables` to use an
existing method, deduplicating code.
- Refactored tests to use updated test utilities.

## Testing
Confirmed the benchmarks ran on macOS locally. The `app` ones take a
very long time to run and this should be investigated and fixed. To run
only the mempool ones:
```
cargo bench --features=benchmark -qp astria-sequencer mempool
```

## Changelogs
No updates required.

## Related Issues
Closes #1355.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ignore-stale Override for issues or PRs which should not be removed if stale. sequencer pertaining to the astria-sequencer crate stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate why benchmarks fail to run on MacOS
5 participants
0