rework secondary index generation by jeffwashington · Pull Request #959 · anza-xyz/agave · GitHub

rework secondary index generation #959


Merged
merged 1 commit into anza-xyz:master on Apr 22, 2024

Conversation

jeffwashington

Problem

We are going to stop mmapping storage files. Once we do, we can no longer iterate a storage and return StoredAccountMeta for many accounts at a time. The current loop for secondary index generation doesn't easily support this rework.

Summary of Changes

Always generate the primary index the same way, using scan_index. Then, if we're building secondary indexes, update the secondary index using scan_accounts. By far the largest cost of secondary indexes is the generation step; we were already calling this update once per account. We do scan the entire storage a second time, but performance is dominated by secondary index generation, so the extra scan should be negligible.

Fixes #

@jeffwashington jeffwashington marked this pull request as ready for review April 22, 2024 14:54
@brooksprumo brooksprumo left a comment


The code looks good, and the intuition about perf impacts makes sense too. If this were not RPC-specific, we'd want before and after timing numbers to compare. It would be nice to have that comparison here too.

I also understand that the secondary index generation must change to accommodate no-mmaps. Does the new scan_accounts() not give enough info to create the items that are needed by insert_new_if_missing_into_primary_index()?

@jeffwashington
Author

Does the new scan_accounts() not give enough info to create the items that are needed by insert_new_if_missing_into_primary_index()?

insert_new_if_missing_into_primary_index takes an iterator of items: impl Iterator<Item = (Pubkey, T)>.
We could scan using scan_accounts once and get the index info too, but that would reintroduce the code duplication this PR gets rid of. In the new hot storage format, scanning for the index requires less I/O than scanning full accounts.
The choices are basically duplicate code paths for insertion into the primary index, or this solution.
What do you think?
I think the perf number changes will be irrelevant.

@brooksprumo brooksprumo left a comment


The choices are basically duplicate code paths for insertion into primary index or this solution. What do you think? I think the perf number changes will be irrelevant.

I definitely prefer this PR since it removes the code duplication. I'd prefer to have perf numbers to feel more comfortable. I do trust your judgements here too, so :shipit:

@jeffwashington jeffwashington merged commit 7389371 into anza-xyz:master Apr 22, 2024
@jeffwashington
Author

@brooksprumo I ran ledger tool on a mnb validator. I ran 3 things:

  1. master
  2. an additional scan_accounts call per storage, just like this PR would do
  3. replacing the scan_index call with a scan_accounts call

#1 is fastest but varies greatly. #2 and #3 take about the same time and are generally slower than even the worst #1 run. At worst, #2 and #3 are 12% slower than the fastest #1 and 6% slower than the average #1 (over 15 tries). This indicates that scan_accounts is slower than scan_index, which we'd expect, and it will be worse with hot storage. Note this is with no secondary indexing occurring; that is the very slow part.

Conclusion: this second iteration of accounts is insignificant.
