
perf: use pre-allocated arrays for known result sizes #5703


Merged
merged 13 commits into rollup:master from GalacticHypernova:patch-1 on Oct 29, 2024

Conversation

GalacticHypernova
Contributor

This PR contains:

  • bugfix
  • feature
  • refactor
  • documentation
  • other

Are tests included?

  • yes (bugfixes and features will not be merged without tests)
  • no

Breaking Changes?

  • yes (breaking changes will not be merged unless absolutely necessary)
  • no

List any relevant issue numbers:

Description

When the final size of an array is known, it is more performant to use a pre-allocated array. This prevents the internal resizing that happens when Array.prototype.push exceeds the array's current capacity: the JS engine has to allocate a larger backing store, copy the elements over, and leave the old store behind, putting unnecessary pressure on the GC to clean up the "dead" memory.
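
As a minimal sketch of the pattern (the function and names below are illustrative, not actual rollup code): when the result length is known up front, allocate once and assign by index instead of growing via push.

```ts
// Illustrative example: mapping inputs to outputs of a known, equal length.
function mapToLengths(items: readonly string[]): number[] {
  // Pre-allocate: the backing store is sized once, up front.
  const lengths = new Array<number>(items.length);
  for (let index = 0; index < items.length; index++) {
    lengths[index] = items[index].length;
  }
  return lengths;
}

// Push-based equivalent that may trigger repeated internal re-allocations:
// const lengths: number[] = [];
// for (const item of items) lengths.push(item.length);
```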

Note: This PR is a draft because I will add all relevant files in this single PR, which might take some time.

vercel bot commented Oct 25, 2024

The latest updates on your projects:

Name: rollup · Status: ✅ Ready · Updated (UTC): Oct 29, 2024 8:09am

codecov bot commented Oct 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.01%. Comparing base (9048a4f) to head (794d81a).
Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5703      +/-   ##
==========================================
- Coverage   99.03%   99.01%   -0.02%     
==========================================
  Files         257      257              
  Lines        8044     8055      +11     
  Branches     1357     1358       +1     
==========================================
+ Hits         7966     7976      +10     
  Misses         52       52              
- Partials       26       27       +1     


@GalacticHypernova
Contributor Author

I haven't yet added all of the files I wanted, but the failing check appears to prefer the push method, which is essentially the direct opposite of what this PR aims to do.

@lukastaegert
Member
lukastaegert commented Oct 27, 2024

Tip: If you want to change anything in bufferParsers or bufferToAst, heed the note at the top of those files: they are entirely generated code. They will be overwritten by a full build and are mostly checked in so that you can compare and inspect the code, and so that TypeScript does not complain.
Perform the change in generate-buffer-parsers and generate-buffer-to-ast first, then run npm run build:ast-converters.

Otherwise, I am definitely curious to see if this provides any tangible performance benefits, especially in the generated code it would be a quick win.

@GalacticHypernova
Contributor Author
GalacticHypernova commented Oct 27, 2024

Ah, I didn't know they were generated. I was so used to the scripts directory being for miscellaneous actions like bumping releases that I didn't even think to take a look there; that was my bad. I'll update them soon.

As for the performance, constant time always wins against amortized constant time (constant being the pre-allocated array, amortized being push). That's because every once in a while a push will not be O(1) but rather O(n): the engine needs to copy the contents into a new, bigger backing store to accommodate the growth. With a pre-allocated array, the engine can allocate the necessary space immediately, which prevents dynamic resizing.
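
For intuition, a naive micro-benchmark sketch of the two approaches might look like the following (illustrative only: results vary by engine and workload, and engines may treat `new Array(n)` as a "holey" array until it is filled):

```ts
const N = 1_000_000;

function viaPush(): number[] {
  const result: number[] = [];
  for (let index = 0; index < N; index++) result.push(index);
  return result;
}

function viaPreallocation(): number[] {
  const result = new Array<number>(N); // capacity reserved once
  for (let index = 0; index < N; index++) result[index] = index;
  return result;
}

for (const [label, build] of [
  ["push", viaPush],
  ["pre-allocated", viaPreallocation]
] as const) {
  const start = performance.now();
  build();
  console.log(`${label}: ${(performance.now() - start).toFixed(2)}ms`);
}
```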

In terms of memory, the difference is even more significant than the speed difference: eliminating the dynamic resizing makes the whole operation much more memory efficient and removes the GC pressure that exists when dynamically pushing toward an unknown final size.

We can make a quick calculation to see how much more efficient it is.
From the scarce resources available on the internal array operations of JS engines like V8 and JSC (which Bun uses), an initial push allocates space for roughly 8-16 elements. The capacity then doubles on every push that exceeds the array's allocated memory, and the engine copies the contents into the newly allocated array, discarding the old one.

If we take a look at a 100-element array, it goes as follows.
Let's take the initial allocation as 12 elements, the middle of that range, and say we're only storing values that take one byte each. With the push method, we would allocate 12 bytes, then 24 bytes, then 48 bytes, then 96 bytes, then 192 bytes (even though not all of it is used, it is all allocated and reserved for the array in case of further pushes).

The total memory allocated over the array's lifetime would be 12 + 24 + 48 + 96 + 192 = 372 bytes.

With the pre-allocation method, we directly reserve 100 bytes up front. That means the push method allocates almost 4x more memory for just 100 elements.
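
To make the arithmetic concrete, here is a small sketch reproducing the calculation under the stated assumptions (a hypothetical 12-element initial capacity with capacity doubling on overflow; real engines vary):

```ts
// Sum all backing-store allocations under the doubling-growth model above.
function totalAllocatedBytes(elementCount: number, initialCapacity = 12): number {
  let capacity = initialCapacity;
  let total = capacity;   // first backing store
  while (capacity < elementCount) {
    capacity *= 2;        // engine allocates a larger store...
    total += capacity;    // ...and the previous one becomes garbage
  }
  return total;
}

console.log(totalAllocatedBytes(100)); // 372 = 12 + 24 + 48 + 96 + 192
// Pre-allocation reserves exactly 100 bytes in a single allocation.
```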

@lukastaegert
Member

I understand the theory. However, if the impact turns out to be close to not measurable, I would suggest limiting these kinds of changes to hot code paths, as of course they are not entirely free: they make the code slightly less straightforward to read and maintain.
That being said, we have a performance report in CI to answer those questions.

@GalacticHypernova
Contributor Author
GalacticHypernova commented Oct 27, 2024

However, if the impact turns out to be close to not-measurable,

I can't help but think about resource-constrained environments in this case. I agree that on machines with plentiful resources this may not be of any significance, but on computers with limited resources this could be a significant change, especially considering that memory is more often the bottleneck than raw speed in such cases.

I also believe that, objectively speaking, even if readability is impacted, the impact is very minimal. Most of the syntax is the same, and I believe people are familiar with array index access even if they are unaware of low-level optimizations like pre-allocated arrays. But of course, the final decision is up to you.

It goes without saying that the difference becomes more noticeable the more elements are in the array. So if some arrays in this PR are not influenced by user code and are purely internally managed, we could revert those if you'd like. But for anything that can be directly affected by the code being bundled, it would likely be better for performance to keep the pre-allocation approach, if only for the slight chance that the input is exceptionally big.

github-actions bot commented Oct 27, 2024

Thank you for your contribution! ❤️

You can try out this pull request locally by installing Rollup via

npm install GalacticHypernova/rollup#patch-1

Notice: Ensure you have installed the latest stable Rust toolchain. If you haven't installed it yet, please see https://www.rust-lang.org/tools/install to learn how to download Rustup and install Rust.

or load it into the REPL:
https://rollup-ame4tgk35-rollup-js.vercel.app/repl/?pr=5703

github-actions bot commented Oct 27, 2024

Performance report

  • BUILD: 8729ms, 736 MB
    • initialize: 0ms, 26.2 MB
    • generate module graph: 3322ms, 554 MB (-4%)
      • generate ast: 1558ms, 547 MB (-4%)
    • sort and bind modules: 462ms, 594 MB (-4%)
    • mark included statements: 4924ms, 736 MB
      • treeshaking pass 1: 1688ms, 698 MB (-3%)
      • treeshaking pass 2: 817ms, 717 MB (-3%)
      • treeshaking pass 3: 317ms, 722 MB (-3%)
      • treeshaking pass 4: 290ms, 726 MB (-2%)
      • treeshaking pass 5: 339ms, 729 MB (-3%)
      • treeshaking pass 6: 276ms, 728 MB (-4%)
      • treeshaking pass 7: 256ms, 732 MB (-3%)
      • treeshaking pass 8: 247ms, 730 MB (-3%)
      • treeshaking pass 9: 224ms, 729 MB (-4%)
      • treeshaking pass 10: 226ms, 732 MB (-3%)
      • treeshaking pass 11: 226ms, 736 MB
  • GENERATE: 952ms, 980 MB (-4%)
    • initialize render: 0ms, 871 MB (-5%)
    • generate chunks: 103ms, 883 MB (-4%)
      • optimize chunks: 0ms, 877 MB (-4%)
    • render chunks: 830ms, 956 MB (-5%)
    • transform chunks: 19ms, 980 MB (-4%)
    • generate bundle: 0ms, 980 MB (-4%)

@lukastaegert
Member

I am just saying that I wanted to wait for the measurements. And the measurements do support your approach: while the performance changes were apparently negligible, the changes seem to save 3-5% of memory in the test, which is good enough for me to keep them.

In order to merge this, could you please run npm run lint locally and commit the changes? See the test results.

@GalacticHypernova
Contributor Author
GalacticHypernova commented Oct 28, 2024

Sure thing, thanks!

I apologize if I seemed overly pushy about this. I'm just trying to contribute as much as I can, and I know memory efficiency is a hugely overlooked, yet equally important, aspect of performance: most people pay attention strictly to runtime performance (speed of execution) without worrying about memory, since the GC takes care of it.

I also didn't know that rollup has its own memory benchmark, so I wanted to explain essentially what the benchmark showed, but in my own words.

Edit: I might need a little more to fix the AST generators

@lukastaegert
Member
lukastaegert commented Oct 28, 2024

And this is really appreciated; I hope I did not sound too negative. Honestly, I never thought too much about this aspect before and am positively surprised that it really does make a measurable impact in our benchmark.
Let me know if I can help from my side. npm run lint should also fix things automatically.

@GalacticHypernova
Contributor Author
GalacticHypernova commented Oct 29, 2024

npm run lint should also fix things automatically.

Yeah, the issue isn't the lint. For some reason it says the husky pre-commit script failed, so I have to manually update the files from GitHub itself, one by one.

I believe I fixed it all, but I could be wrong. I would appreciate it if you could review to see whether I missed anything related to the PR.

@lukastaegert
Member

Looks good. Not sure what is up with husky, but in the worst case you can add --no-verify to git commit to skip the hook.

@lukastaegert lukastaegert added this pull request to the merge queue Oct 29, 2024
Merged via the queue into rollup:master with commit 4c5cc76 Oct 29, 2024
39 of 40 checks passed

This PR has been released as part of rollup@4.24.3. You can test it via npm install rollup.
