-
Notifications
You must be signed in to change notification settings - Fork 93
fix(sequencer): allow benchmarks to run on macOS and fix issues #1842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can we rearchitect our benchmarks to move away from Divan toward Criterion instead? I am thinking of breaking out code parts that are benched into free standing crates for example. I understand one advantage is that Divan makes it relatively easy to bench code that's not in the public API. On the other hand we seem to be needing lots of workarounds for it. |
Yes, but I'm not sure if we should in the short-term. The workarounds for Divan are constrained to just the benchmarking code, except for possibly one which would be required for Criterion anyway, so I'm not too worried about them. Most of the workarounds will hopefully no longer be required in the future according to Divan's roadmap, but whether progress is made on that or not is debatable. It does not seem to be under particularly active development. I'm not convinced that breaking out benchmarked sections of the code into standalone crates is worthwhile though tbh. I don't see a compelling argument for breaking out We could instead look to feature gate public access to the required modules behind the existing |
All reactions
// ensure we have enough balance to cover inclusion | ||
for (denom, cost) in tx.cost() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor 👍
This lgtm, great that it runs on macs now :) |
This PR is stale because it has been open 45 days with no activity. Remove stale label or this PR will be |
This PR was closed because it has been stale. |
Re-opening as we'd like to actually get this merged eventually. Not a high priority though. |
This PR is stale because it has been open 45 days with no activity. Remove stale label or this PR will be |
## Summary This introduces `CheckedAction`, `CheckedTransaction` and `Checked...` wrappers for each action type. ## Background We want to have stricter checks performed when `CheckTx` is called on a new tx in order to avoid accepting txs which subsequently fail execution. Checks on txs and their included actions are now split into two categories: mutable and immutable. On construction of a checked object, both types of checks are run. Subsequently, only the mutable checks are re-run, and this is currently only done at the start of executing the action/transaction. Nonce and balance checks are excluded from these checks as the Mempool only puts forward for execution txs which should have valid nonces and sufficient account balances. In order to ensure such checks have been performed, the sequencer now deals almost exclusively in checked flavours of actions and txs. As a result of this, many tests which previously used txs had to be refactored to ensure that constructing the equivalent checked txs would succeed. ## Changes - Introduced new `Checked...` types for actions and `Transaction`. Replaced usage throughout sequencer codebase of the unchecked flavours of these. - Changed `ExpandedBlockData::user_submitted_transactions` to UN-parsed transactions so that all checking can be done when constructing a `CheckedTransaction`. - Unified several test utilities into a single, top-level `t 8000 est_utils` module. As well as pre-existing helper functions, there is now a `Fixture` for intializing a new chain along with several builders/initializers to aid in constructing objects to be used in tests. These are provided with doc comments which should explain how to use them. - Disabled benchmarks for now. These are already not functioning on MacOS (albeit there is an open PR #1842 to address this). I would prefer to handle updating the benchmarks as part of that PR rather than adding to this one. ## Testing - Existing tests were generally refactored to retain their original intent. - Individual checked actions were provided with extensive unit test coverage. Each one generally tests the construction AND execution of the specific action. To generate failures during execution, the unit tests construct the action to be tested while the global state makes construction possible (e.g signer is sudo), and then modify the global state to make execution fail (e.g. change the sudo address). All such state changes are done by executing other checked actions rather than directly manipulating global state, to ensure that such state changes remain feasible in the future. - These new action unit tests made the tests in `tests_execute_transaction.rs` largely redundant, hence that file has been deleted. Coverage was also provided for `IbcRelay` where none existed before, but the setup difficulty for testing `Ics20Withdrawal` is such that only a subset of the code paths are tested (as is the case currently). - `CheckedTransaction` also has extensive unit test coverage for construction and execution, and covers any deleted tests from `tests_execute_transaction.rs` not already catered for by the action unit tests. ## Changelogs Changelogs updated. ## Metrics - Added metrics: - `ASTRIA_SEQUENCER_CHECK_TX_FAILED_ACTION_CHECKS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_ACTIONS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_RECHECK` - Renamed metric `ASTRIA_SEQUENCER_CHECK_TX_REMOVED_TOO_LARGE` to `ASTRIA_SEQUENCER_CHECK_TX_FAILED_TX_TOO_LARGE` - Deleted metrics: - `ASTRIA_SEQUENCER_CHECK_TX_REMOVED_FAILED_STATELESS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_PARSE_TX` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_STATELESS` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_TRACKED` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_CHAIN_ID` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CHECK_REMOVED` - `ASTRIA_SEQUENCER_CHECK_TX_DURATION_SECONDS_CONVERT_ADDRESS` ## Breaking Changelist - There should be no consensus, API or state-breaking changes other than the metrics listed above. All checks being done on actions and transactions remain the same, they are simply performed at a different stage of the flow now. ## Related Issues Closes #1958. Closes #1620.
## Summary This updates the sequencer benchmarks to actually run on macOS, and fixes a couple of issues around the mempool benchmarks. ## Background The benchmarks could previously only be run on Linux. Since they're also not being run regularly, they got broken at some point. They were further disabled during the large PR #2142. Since I was working on fixing them anyway, I took the opportunity to improve the initialization of the test data. This has resulted in the total time to execute the full mempool suite dropping from ~6.5 mins to ~1.5 mins and means all tests are now able to complete 100 iterations (several could only manage one iteration previously). ## Changes - Used an item from the `astria-sequencer` crate in the benchmark's `main` function to resolve the issue of the tests not running on macOS. - Fixed an issue where 86DB the tests were reusing identical txs, causing initialization to fail with an `InsertionError::AlreadyPresent` error. - Memoized test data to reduce test initialization times. - Refactored `PendingTransactionsForAccount::find_demotables` to use an existing method, deduplicating code. - Refactored tests to use updated test utilities. ## Testing Confirmed the benchmarks ran on macOS locally. The `app` ones take a very long time to run and this should be investigated and fixed. To run only the mempool ones: ``` cargo bench --features=benchmark -qp astria-sequencer mempool ``` ## Changelogs No updates required. ## Related Issues Closes #1355.
Summary
This updates the sequencer benchmarks to actually run on macOS, and fixes a couple of issues around the mempool benchmarks.
Background
The benchmarks could previously only be run on Linux. Since they're also not being run regularly, they got broken at some point. They were further disabled during the large PR #2142.
Since I was working on fixing them anyway, I took the opportunity to improve the initialization of the test data. This has resulted in the total time to execute the full mempool suite dropping from ~6.5 mins to ~1.5 mins and means all tests are now able to complete 100 iterations (several could only manage one iteration previously).
Changes
astria-sequencer
crate in the benchmark'smain
function to resolve the issue of the tests not running on macOS.InsertionError::AlreadyPresent
error.PendingTransactionsForAccount::find_demotables
to use an existing method, deduplicating code.Testing
Confirmed the benchmarks ran on macOS locally. The
app
ones take a very long time to run and this should be investigated and fixed. To run only the mempool ones:Changelogs
No updates required.
Related Issues
Closes #1355.