8000 feat: enable the new compiler by zwarich · Pull Request #8577 · leanprover/lean4 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

feat: enable the new compiler #8577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 20, 2025
Merged

Conversation

zwarich
Copy link
Contributor
@zwarich zwarich commented Jun 1, 2025

This PR enables the new compiler by default, adjusts test results, and updates stage0.

@zwarich zwarich requested a review from leodemoura as a code owner June 1, 2025 19:50
@zwarich zwarich added the changelog-compiler Compiler, runtime, and FFI label Jun 1, 2025
@zwarich zwarich marked this pull request as draft June 1, 2025 19:51
@zwarich
Copy link
Contributor Author
zwarich commented Jun 1, 2025

!bench

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Jun 1, 2025
@leanprover-community-bot
Copy link
Collaborator
leanprover-community-bot commented Jun 1, 2025

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 193f59aefeb1fd233656c811444bb014f94e8129 --onto 72141b05fd9a3328c1e5d99211dc4da4495cbd42. You can force Mathlib CI using the force-mathlib-ci label. (2025-06-01 20:16:00)
  • ❗ Mathlib CI can not be attempted yet, as the nightly-testing-2025-06-14 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-mathlib, Mathlib CI should run now. You can force Mathlib CI using the force-mathlib-ci label. (2025-06-14 15:33:52)
  • ❗ Mathlib CI can not be attempted yet, as the nightly-testing-2025-06-15 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-mathlib, Mathlib CI should run now. You can force Mathlib CI using the force-mathlib-ci label. (2025-06-15 22:03:57)
  • 💥 Mathlib branch lean-pr-testing-8577 build failed against this PR. (2025-06-17 21:31:20) View Log
  • 💥 Mathlib branch lean-pr-testing-8577 build failed against this PR. (2025-06-19 22:43:11) View Log
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 50cfe354bee117c3471abe1bbf54780696674566 --onto db499e96aac8ad654c8ed5ab40c4e6885d38c9a1. You can force Mathlib CI using the force-mathlib-ci label. (2025-06-20 13:44:59)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase a8d5982fce107e702ba92042ff51a7545c7f3a0d --onto db499e96aac8ad654c8ed5ab40c4e6885d38c9a1. You can force Mathlib CI using the force-mathlib-ci label. (2025-06-20 15:34:26)
  • ❌ Mathlib branch lean-pr-testing-8577 built against this PR, but testing failed. (2025-06-22 04:34:50) View Log

@zwarich zwarich force-pushed the new_codegen_mathlib branch 2 times, most recently from 70cf3e5 to 52cfb7c Compare June 1, 2025 20:51
@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 52cfb7c.
There were significant changes against commit 193f59a:

  Benchmark                              Metric                   Change
  ===================================================================================
- Init size                              bytes .olean               7.1%
- Init.Data.BitVec.Lemmas                branch-misses              9.6%     (21.2 σ)
- Init.Data.BitVec.Lemmas                branches                   8.3%    (435.0 σ)
- Init.Data.BitVec.Lemmas                instructions               7.7%    (379.0 σ)
- Init.Data.BitVec.Lemmas                wall-clock                 9.3%     (42.3 σ)
- Init.Data.BitVec.Lemmas re-elab        branch-misses              9.3%     (66.0 σ)
- Init.Data.BitVec.Lemmas re-elab        branches                   8.0%    (281.3 σ)
- Init.Data.BitVec.Lemmas re-elab        instructions               7.4%    (247.0 σ)
- Init.Data.List.Sublist async           branches                   8.1%    (342.2 σ)
- Init.Data.List.Sublist async           instructions               8.0%    (290.9 σ)
- Init.Data.List.Sublist re-elab -j4     branches                   7.9%     (53.6 σ)
- Init.Data.List.Sublist re-elab -j4     instructions               7.7%     (53.0 σ)
- Init.Data.List.Sublist re-elab -j4     task-clock                 5.4%     (26.2 σ)
- Init.Data.List.Sublist re-elab -j4     wall-clock                 5.8%     (29.0 σ)
- Init.Prelude async                     branches                  20.9%   (1000.4 σ)
- Init.Prelude async                     instructions              18.7%    (773.0 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branch-misses             11.9%     (39.7 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branches                   8.1%   (1706.1 σ)
- Std.Data.DHashMap.Internal.RawLemmas   instructions               8.0%   (1408.9 σ)
- Std.Data.Internal.List.Associative     branch-misses             10.6%     (25.5 σ)
- Std.Data.Internal.List.Associative     branches                   7.5%    (668.6 σ)
- Std.Data.Internal.List.Associative     instructions               7.6%    (605.3 σ)
- big_do                                 branches                9408.3%     (43.8 σ)
- big_do                                 instructions            8224.9%     (35.5 σ)
- big_do                                 maxrss                   898.1%     (78.7 σ)
- big_omega.lean                         branches                   5.4%    (353.3 σ)
- big_omega.lean                         instructions               3.8%    (244.9 σ)
- big_omega.lean MT                      branches                   5.4%    (359.5 σ)
- big_omega.lean MT                      instructions               3.9%    (343.6 σ)
- bv_decide_inequality.lean              branches                   6.6%    (120.6 σ)
- bv_decide_inequality.lean              instructions               5.2%     (97.0 σ)
+ bv_decide_large_aig.lean               branches                  -5.5%   (-345.4 σ)
+ bv_decide_large_aig.lean               instructions              -4.7%   (-361.2 σ)
- bv_decide_mod                          branches                   6.0%     (85.1 σ)
- bv_decide_mod                          instructions               4.9%     (72.9 σ)
- bv_decide_mul                          branches                   8.8%    (440.0 σ)
- bv_decide_mul                          instructions               7.7%    (397.3 σ)
- bv_decide_mul                          wall-clock                 3.9%     (20.6 σ)
- bv_decide_realworld                    branches                  10.0%     (84.9 σ)
- bv_decide_realworld                    instructions               8.2%     (68.1 σ)
- bv_decide_realworld                    wall-clock                 8.6%     (34.2 σ)
- deriv                                  instructions               5.8%   (2287.9 σ)
- identifier auto-completion             branches                   3.4%     (43.9 σ)
- identifier auto-completion             instructions               3.0%     (32.4 σ)
+ identifier auto-completion             maxrss                    -3.7%    (-20.4 σ)
- ilean roundtrip                        branches                   5.7%    (436.8 σ)
- ilean roundtrip                        instructions               5.3%    (444.8 σ)
+ import Lean                            branches                 -15.0%  (-1127.8 σ)
+ import Lean                            instructions             -15.5%   (-914.4 σ)
+ import Lean                            maxrss                   -14.9%  (-1723.8 σ)
+ lake build clean                       instructions              -2.0%    (-34.6 σ)
+ lake build clean                       maxrss                    -8.5%    (-42.4 σ)
+ lake build clean                       task-clock                -6.9%    (-23.4 σ)
- lake build no-op                       instructions               2.3%     (20.1 σ)
- lake config elab                       instructions               9.6%    (567.0 σ)
+ lake config elab                       maxrss                   -12.0%   (-309.3 σ)
+ lake config import                     instructions             -13.7%   (-289.4 σ)
+ lake config import                     maxrss                   -13.2%   (-266.8 σ)
+ lake config tree                       instructions             -15.3%   (-761.9 σ)
+ lake config tree                       maxrss                   -13.1%   (-231.6 σ)
+ lake config tree                       task-clock               -22.5%    (-21.4 σ)
+ lake config tree                       wall-clock               -22.3%    (-21.3 σ)
+ lake env                               instructions             -13.3%   (-670.1 σ)
+ lake env                               maxrss                   -13.1%   (-313.7 σ)
- lake startup                           instructions               1.7%     (55.0 σ)
+ language server startup                instructions              -1.1%    (-20.7 σ)
- liasolver                              instructions               1.0%    (224.2 σ)
+ libleanshared.so                       binary size              -12.1%
- omega_stress.lean async                branch-misses              9.6%     (25.5 σ)
- omega_stress.lean async                branches                   6.2%    (322.2 σ)
- omega_stress.lean async                instructions               5.5%    (246.7 σ)
+ parser                                 instructions              -2.0%   (-460.8 σ)
- rbmap                                  instructions             210.9%  (28329.7 σ)
+ rbmap_1                                instructions             -13.8% (-11726.1 σ)
- rbmap_10                               instructions             144.7%   (9203.8 σ)
- rbmap_library                          instructions             199.4%  (25514.2 σ)
+ reduceMatch                            wall-clock                -4.8%    (-24.8 σ)
- riscv-ast.lean                         branches                   3.2%     (55.8 σ)
- riscv-ast.lean                         instructions               2.6%     (44.2 σ)
- simp_arith1                            branches                   3.3%     (67.2 σ)
+ simp_arith1                            maxrss                   -13.3%   (-115.1 σ)
- stdlib                                 blocked                    8.9%     (64.4 σ)
- stdlib                                 blocked (unaccounted)     25.3%     (68.3 σ)
- stdlib                                 instructions              23.4%  (11237.8 σ)
+ stdlib                                 maxrss                   -14.8%  (-2287.5 σ)
- stdlib                                 tactic execution           9.4%     (33.2 σ)
- stdlib                                 task-clock                12.6%    (132.6 σ)
- stdlib                                 wall-clock                34.8%     (36.9 σ)
- stdlib size                            bytes .olean               6.7%
+ stdlib size                            lines C                  -18.3%
+ stdlib size                            max dynamic symbols      -12.0%
- tests/bench/ interpreted               instructions               6.2%    (534.1 σ)
+ tests/bench/ interpreted               maxrss                   -12.9%   (-276.3 σ)
+ tests/compiler                         sum binary sizes          -8.9%
+ unionfind                              instructions              -1.7%   (-502.3 σ)
+ workspaceSymbols                       instructions             -80.2% (-63155.5 σ)
+ workspaceSymbols                       maxrss                   -15.2%  (-1735.1 σ)
+ workspaceSymbols                       task-clock               -77.1%    (-82.7 σ)
+ workspaceSymbols                       wall-clock               -77.2%    (-82.0 σ)

@zwarich
Copy link
Contributor Author
zwarich commented Jun 1, 2025

The big_do slowdown is due to bad asymptotic behavior in LCNF simp, not the compiled code. The rbmap slowdowns appear to be due to the new compiler being better about using join points, but this appears to cause the reference counting optimizations to make suboptimal decisions (optimizing the first join point it sees, and then doing nothing for the other code paths).

@zwarich
Copy link
Contributor Author
zwarich commented Jun 14, 2025

!bench

@zwarich
Copy link
Contributor Author
zwarich commented Jun 14, 2025

Sadly this won't build Mathlib, even if I push the required changes, because lean4 nightly and mathlib4 nightly-testing are already broken with each other.

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 5ec19d8.
There were significant changes against commit 4445958:

  Benchmark                              Metric                        Change
  =========================================================================================
- Init size                              bytes .olean                    6.7%
- Init.Data.BitVec.Lemmas                branches                        4.9%     (497.0 σ)
- Init.Data.BitVec.Lemmas                instructions                    4.5%     (412.9 σ)
- Init.Data.BitVec.Lemmas re-elab        branches                        4.7%      (43.4 σ)
- Init.Data.BitVec.Lemmas re-elab        instructions                    4.3%      (42.3 σ)
- Init.Data.List.Sublist async           branches                        4.8%     (354.6 σ)
- Init.Data.List.Sublist async           instructions                    4.7%     (362.4 σ)
- Init.Data.List.Sublist re-elab -j4     branches                        4.6%     (101.5 σ)
- Init.Data.List.Sublist re-elab -j4     instructions                    4.5%     (106.3 σ)
- Init.Prelude async                     branches                        8.3%     (261.9 σ)
- Init.Prelude async                     instructions                    6.4%     (176.9 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branches                        4.9%     (414.7 σ)
- Std.Data.DHashMap.Internal.RawLemmas   instructions                    4.7%     (344.9 σ)
- Std.Data.Internal.List.Associative     branches                        5.2%     (584.2 σ)
- Std.Data.Internal.List.Associative     instructions                    5.1%     (498.9 σ)
- big_do                                 branches                      137.9%    (3496.5 σ)
- big_do                                 instructions                  127.6%    (2744.3 σ)
- big_do                                 task-clock                     90.3%     (134.0 σ)
- big_do                                 wall-clock                     92.8%     (134.6 σ)
- big_omega.lean                         branches                        2.6%     (223.4 σ)
- big_omega.lean                         instructions                    1.5%     (173.0 σ)
- big_omega.lean MT                      branches                        2.7%     (597.7 σ)
- big_omega.lean MT                      instructions                    1.6%     (396.4 σ)
- bv_decide_inequality.lean              branches                      117.1%   (11111.5 σ)
- bv_decide_inequality.lean              instructions                   76.8%    (7219.5 σ)
+ bv_decide_inequality.lean              maxrss                         -4.6%    (-608.6 σ)
+ bv_decide_large_aig.lean               branches                       -5.5%    (-220.7 σ)
+ bv_decide_large_aig.lean               instructions                   -4.9%    (-199.9 σ)
- bv_decide_mod                          branches                     1117.6%   (51305.2 σ)
- bv_decide_mod                          instructions                  749.6%   (38282.1 σ)
- bv_decide_mul                          branches                     1554.7%   (29797.6 σ)
- bv_decide_mul                          instructions                 1112.9%   (20634.8 σ)
- bv_decide_mul                          task-clock                    414.9%     (406.2 σ)
- bv_decide_mul                          wall-clock                    412.6%     (380.7 σ)
- bv_decide_realworld                    branches                      200.6%    (1776.1 σ)
- bv_decide_realworld                    instructions                  144.6%    (1183.8 σ)
- bv_decide_realworld                    task-clock                     61.8%      (74.0 σ)
- bv_decide_realworld                    wall-clock                     34.5%      (30.7 σ)
+ channel.lean                           boundedn_seq                  -10.4%
- deriv                                  instructions                    5.8%    (2555.3 σ)
- identifier auto-completion             branches                        2.5%      (49.7 σ)
- identifier auto-completion             instructions                    2.2%      (37.3 σ)
+ identifier auto-completion             maxrss                         -3.9%     (-29.2 σ)
- ilean roundtrip                        branches                        5.6%    (3182.3 σ)
- ilean roundtrip                        instructions                    5.3%     (703.4 σ)
+ import Lean                            branches                      -17.7%   (-1489.9 σ)
+ import Lean                            instructions                  -18.0%   (-1383.6 σ)
+ import Lean                            maxrss                        -15.3%    (-150.8 σ)
+ lake build clean                       instructions                   -5.9%    (-251.5 σ)
+ lake build clean                       maxrss                         -9.0%     (-86.3 σ)
+ lake build clean                       wall-clock                     -6.6%     (-31.4 σ)
+ lake config elab                       instructions                   -3.7%    (-962.3 σ)
+ lake config elab                       maxrss                        -12.7%    (-247.5 σ)
+ lake config import                     instructions                  -15.3%    (-985.5 σ)
+ lake config import                     maxrss                        -13.4%    (-169.0 σ)
+ lake config tree                       instructions                  -16.9%   (-1116.9 σ)
+ lake config tree                       maxrss                        -13.4%    (-536.2 σ)
+ lake env                               instructions                  -15.1%    (-775.1 σ)
+ lake env                               maxrss                        -13.4%    (-178.9 σ)
- lake startup                           instructions                    1.4%      (27.5 σ)
+ language server startup                branches                       -7.3%   (-1030.1 σ)
+ language server startup                instructions                   -8.7%   (-1509.4 σ)
+ language server startup                maxrss                         -6.3%     (-24.2 σ)
+ liasolver                              instructions                   -2.8%    (-393.6 σ)
+ libleanshared.so                       binary size                   -14.2%
- omega_stress.lean async                branches                        1.9%      (99.6 σ)
- omega_stress.lean async                instructions                    1.6%      (73.1 σ)
- parser                                 instructions                    1.8%    (1858.9 σ)
+ qsort                                  task-clock                     -8.8%     (-24.2 σ)
+ qsort                                  wall-clock                     -8.8%     (-24.2 σ)
- rbmap                                  instructions                  210.9%   (26898.4 σ)
+ rbmap_1                                instructions                  -13.8%  (-10142.3 σ)
- rbmap_10                               instructions                  144.7%   (51644.3 σ)
- rbmap_10                               task-clock                     94.3%      (70.6 σ)
- rbmap_10                               wall-clock                     94.2%      (70.0 σ)
- rbmap_library                          instructions                  199.1%   (25320.7 σ)
- rbmap_library                          task-clock                    169.6%      (81.2 σ)
- rbmap_library                          wall-clock                    169.3%      (81.4 σ)
+ reduceMatch                            instructions                   -3.1%    (-182.4 σ)
+ simp_arith1                            branches                       -5.8%    (-106.5 σ)
+ simp_arith1                            instructions                   -7.0%    (-107.2 σ)
+ simp_arith1                            maxrss                        -13.8%    (-134.2 σ)
+ stdlib                                 blocked (unaccounted)         -18.6%    (-663.3 σ)
+ stdlib                                 grind                         -13.9%     (-69.4 σ)
+ stdlib                                 instructions                   -5.4%   (-3489.6 σ)
+ stdlib                                 maxrss                        -14.9%    (-229.7 σ)
- stdlib                                 number of imported bytes        2.1%
+ stdlib                                 number of imported consts     -54.9%
- stdlib                                 number of imported entries     21.6%
+ stdlib                                 task-clock                     -5.9%     (-53.5 σ)
+ stdlib                                 wall-clock                     -5.6%     (-57.0 σ)
- stdlib size                            bytes .olean                    4.9%
+ stdlib size                            lines C                       -20.3%
+ stdlib size                            max dynamic symbols           -13.2%
- tests/bench/ interpreted               instructions                    2.2%     (432.3 σ)
+ tests/bench/ interpreted               maxrss                        -13.3%   (-3577.1 σ)
+ tests/compiler                         sum binary sizes              -10.4%
+ unionfind                              instructions                   -1.7%   (-3482.5 σ)
+ unionfind                              task-clock                    -13.7%     (-33.4 σ)
+ unionfind                              wall-clock                    -13.7%     (-33.7 σ)
+ workspaceSymbols                       instructions                  -81.2% (-446754.1 σ)
+ workspaceSymbols                       maxrss                        -15.9%   (-1270.3 σ)
+ workspaceSymbols                       task-clock                    -79.0%   (-3585.1 σ)
+ workspaceSymbols                       wall-clock                    -79.0%   (-3654.8 σ)

@zwarich
Copy link
Contributor Author
zwarich commented Jun 14, 2025

The benchmark numbers generally look improved.

Elaboration times are actually improved when testing files other than Prelude, for reasons I have not yet discovered.

The remaining big_do regression is mostly due to the old compiler inlining all local fun decls, and forcing the new compiler to do the same makes this way closer. There are a few different things we could do to improve this, but I don’t really feel like tweaking inlining heuristics (especially for an individual test) until we add recursive join points.

The rbmap regression is due to reuse optimizations being pessimistic in the face of increased usage of join points. It is possible to fix this (and in fact is probably required for correctness with recursion JPs), but it my attempt fails for other reasons (e.g. similar bugs in later passes) I will defer it until after we enable the new compiler.

The only new thing here is a large number of bv_decide regressions, which seem so bad that they must be due to a shared linearity regression. Hopefully it won’t be too difficult to debug the underlying cause.

@zwarich zwarich force-pushed the new_codegen_mathlib branch 2 times, most recently from 30bf42a to 820593b Compare June 15, 2025 21:31
@zwarich
Copy link
Contributor Author
zwarich commented Jun 15, 2025

!bench

@zwarich
Copy link
Contributor Author
zwarich commented Jun 15, 2025

(this is testing a fix for the bv_decide linearity issue)

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 820593b.
There were significant changes against commit 957b904:

  Benchmark                              Metric                       Change
  =========================================================================================
- Init size                              bytes .olean                   6.8%
- Init.Data.BitVec.Lemmas                branches                       4.6%      (259.8 σ)
- Init.Data.BitVec.Lemmas                instructions                   4.3%      (220.6 σ)
- Init.Data.BitVec.Lemmas re-elab        branches                       4.6%       (52.3 σ)
- Init.Data.BitVec.Lemmas re-elab        instructions                   4.3%       (51.4 σ)
- Init.Data.List.Sublist async           branches                       4.7%       (79.3 σ)
- Init.Data.List.Sublist async           instructions                   4.7%       (70.0 σ)
- Init.Data.List.Sublist re-elab -j4     branches                       4.6%      (421.6 σ)
- Init.Data.List.Sublist re-elab -j4     instructions                   4.6%      (337.1 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss                        -8.8%      (-22.7 σ)
- Init.Prelude async                     branches                       8.3%      (357.1 σ)
- Init.Prelude async                     instructions                   6.5%      (266.3 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branches                       4.9%     (1611.9 σ)
- Std.Data.DHashMap.Internal.RawLemmas   instructions                   4.7%     (1296.9 σ)
- Std.Data.DHashMap.Internal.RawLemmas   task-clock                     5.0%       (34.1 σ)
- Std.Data.Internal.List.Associative     branches                       5.1%      (598.3 σ)
- Std.Data.Internal.List.Associative     instructions                   5.1%      (503.7 σ)
- big_do                                 branches                     175.8%    (10083.7 σ)
- big_do                                 instructions                 160.5%     (8193.4 σ)
- big_omega.lean                         branches                       2.6%     (1398.5 σ)
- big_omega.lean                         instructions                   1.6%      (574.0 σ)
- big_omega.lean MT                      branches                       2.6%      (240.1 σ)
- big_omega.lean MT                      instructions                   1.7%      (216.4 σ)
- bv_decide_inequality.lean              branches                       5.3%      (421.0 σ)
- bv_decide_inequality.lean              instructions                   4.0%      (284.5 σ)
+ bv_decide_large_aig.lean               branches                      -5.4%     (-102.7 σ)
+ bv_decide_large_aig.lean               instructions                  -4.8%      (-86.6 σ)
+ bv_decide_mod                          maxrss                        -1.1%      (-21.0 σ)
- bv_decide_mul                          branches                       8.1%      (613.9 σ)
- bv_decide_mul                          instructions                   6.8%      (780.7 σ)
+ bv_decide_mul                          maxrss                        -2.5%      (-23.7 σ)
- bv_decide_realworld                    branches                       6.2%       (86.1 σ)
- bv_decide_realworld                    instructions                   4.9%       (67.0 σ)
- deriv                                  instructions                   5.8%     (4666.5 σ)
- identifier auto-completion             branches                       2.6%       (84.8 σ)
- identifier auto-completion             instructions                   2.3%       (62.9 σ)
+ identifier auto-completion             maxrss                        -3.9%      (-22.7 σ)
- ilean roundtrip                        branches                       5.7%      (438.9 σ)
- ilean roundtrip                        instructions                   5.3%      (497.5 σ)
+ import Lean                            branches                     -17.9%    (-1724.8 σ)
+ import Lean                            instructions                 -18.1%    (-1586.6 σ)
+ import Lean                            maxrss                       -15.4%     (-561.0 σ)
+ import Lean                            task-clock                   -19.1%      (-91.7 σ)
+ import Lean                            wall-clock                   -19.0%      (-84.5 σ)
+ lake build clean                       instructions                  -5.9%      (-96.4 σ)
+ lake build clean                       task-clock                    -8.1%     (-150.9 σ)
+ lake build no-op                       wall-clock                    -6.6%      (-22.5 σ)
+ lake config elab                       instructions                  -3.4%     (-205.4 σ)
+ lake config elab                       maxrss                       -12.3%     (-260.0 σ)
+ lake config import                     instructions                 -15.5%    (-1117.6 σ)
+ lake config import                     maxrss                       -13.5%     (-387.4 σ)
+ lake config tree                       instructions                 -17.1%    (-1734.3 σ)
+ lake config tree                       maxrss                       -13.5%     (-365.0 σ)
+ lake env                               instructions                 -15.2%     (-516.2 σ)
+ lake env                               maxrss                       -13.5%     (-383.3 σ)
- lake startup                           instructions                   1.2%       (21.9 σ)
+ language server startup                branches                      -7.4%     (-177.8 σ)
+ language server startup                instructions                  -8.8%     (-179.8 σ)
+ language server startup                maxrss                        -6.3%     (-214.1 σ)
+ liasolver                              instructions                  -2.9%    (-2038.4 σ)
+ libleanshared.so                       binary size                  -14.1%
- omega_stress.lean async                branches                       1.8%      (134.7 σ)
- omega_stress.lean async                instructions                   1.5%       (94.6 σ)
- parser                                 instructions                   3.8%      (950.5 σ)
- rbmap                                  instructions                 210.9%    (23125.6 σ)
+ rbmap_1                                instructions                 -13.8%   (-14868.5 σ)
- rbmap_10                               instructions                 144.7%    (29030.0 σ)
- rbmap_library                          instructions                 199.2%    (27553.8 σ)
- rbmap_library                          task-clock                   202.8%       (21.0 σ)
- rbmap_library                          wall-clock                   202.2%       (21.0 σ)
+ reduceMatch                            instructions                  -3.1%      (-39.2 σ)
+ simp_arith1                            branches                      -5.7%     (-169.0 σ)
+ simp_arith1                            instructions                  -6.9%     (-181.5 σ)
+ simp_arith1                            maxrss                       -13.4%     (-101.8 σ)
+ stdlib                                 blocked (unaccounted)        -12.2%     (-112.8 σ)
+ stdlib                                 grind dsimp                  -28.3%      (-47.7 σ)
+ stdlib                                 instructions                  -3.6%    (-1404.6 σ)
+ stdlib                                 maxrss                       -15.0%     (-260.4 σ)
- stdlib                                 number of imported bytes       2.1%
+ stdlib                                 number of imported consts    -54.9%
- stdlib                                 number of imported entries    21.6%
- stdlib size                            bytes .olean                   4.9%
+ stdlib size                            lines C                      -20.1%
+ stdlib size                            max dynamic symbols          -13.1%
- tests/bench/ interpreted               instructions                   2.2%     (1017.7 σ)
+ tests/bench/ interpreted               maxrss                       -13.4%     (-115.3 σ)
+ tests/compiler                         sum binary sizes             -10.3%
+ unionfind                              instructions                  -1.7%     (-364.5 σ)
+ workspaceSymbols                       instructions                 -81.2% (-4395347.8 σ)
+ workspaceSymbols                       maxrss                       -15.7%      (-97.8 σ)
+ workspaceSymbols                       task-clock                   -78.8%    (-2071.8 σ)
+ workspaceSymbols                       wall-clock                   -78.8%    (-1879.7 σ)

@tobiasgrosser
Copy link
Contributor

It seems the last PR rebase has broken the release CI for this PR. Not sure what the issue is, but merging master has fixed it for me. Not sure this warrants a deep investigation, but I would appreciate a rebase to see if this unbreaks our experimental uses of the new codegen.

@zwarich
Copy link
Contributor Author
zwarich commented Jun 17, 2025

It seems the last PR rebase has broken the release CI for this PR. Not sure what the issue is, but merging master has fixed it for me. Not sure this warrants a deep investigation, but I would appreciate a rebase to see if this unbreaks our experimental uses of the new codegen.

I was going to rebase again to see what effect some perf fixes I made have on the bench results, so I'll do that now. There's one more blocking issue for Mathlib (introduced by a new use of Batteries' open private in Mathlib), which I'll hopefully fix soon. After that, there are no true blockers, but I am going to see whether I can easily fix the pessimism in the RC reuse optimizations caused by the new compiler's increased use of join points.

@zwarich zwarich force-pushed the new_codegen_mathlib branch from 820593b to 656e8e1 Compare June 17, 2025 20:58
@zwarich
Copy link
Contributor Author
zwarich commented Jun 17, 2025

!bench

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Jun 17, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Jun 17, 2025
@leanprover-community-bot leanprover-community-bot added the breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan label Jun 17, 2025
@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 656e8e1.
There were significant changes against commit 3b2990b:

  Benchmark                              Metric                       Change
  =======================================================================================
- Init size                              bytes .olean                   6.6%
- Init.Data.BitVec.Lemmas                branches                       4.7%    (540.7 σ)
- Init.Data.BitVec.Lemmas                instructions                   4.3%    (456.7 σ)
- Init.Data.BitVec.Lemmas re-elab        branch-misses                  1.6%     (23.9 σ)
- Init.Data.BitVec.Lemmas re-elab        branches                       4.4%     (65.5 σ)
- Init.Data.BitVec.Lemmas re-elab        instructions                   4.2%     (68.4 σ)
- Init.Data.List.Sublist async           branches                       4.8%    (307.3 σ)
- Init.Data.List.Sublist async           instructions                   4.8%    (272.9 σ)
- Init.Data.List.Sublist re-elab -j4     branches                       4.6%    (203.7 σ)
- Init.Data.List.Sublist re-elab -j4     instructions                   4.6%    (240.5 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss                        -9.2%   (-110.1 σ)
- Init.Prelude async                     branches                       8.1%    (514.6 σ)
- Init.Prelude async                     instructions                   6.3%    (381.7 σ)
- Init.Prelude async                     maxrss                         1.9%     (21.2 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branches                       4.8%    (557.9 σ)
- Std.Data.DHashMap.Internal.RawLemmas   instructions                   4.7%    (456.7 σ)
- Std.Data.Internal.List.Associative     branches                       5.2%    (441.3 σ)
- Std.Data.Internal.List.Associative     instructions                   5.1%    (393.9 σ)
- big_do                                 branches                     175.8%   (3775.6 σ)
- big_do                                 instructions                 160.4%   (2889.5 σ)
- big_omega.lean                         branches                       2.6%    (236.3 σ)
- big_omega.lean                         instructions                   1.5%    (285.2 σ)
- big_omega.lean MT                      branches                       2.6%    (563.9 σ)
- big_omega.lean MT                      instructions                   1.7%    (450.3 σ)
- bv_decide_inequality.lean              branches                       5.4%     (35.6 σ)
- bv_decide_inequality.lean              instructions                   4.5%     (29.6 σ)
+ bv_decide_inequality.lean              maxrss                        -1.2%    (-67.0 σ)
+ bv_decide_large_aig.lean               branches                      -5.3%    (-78.4 σ)
+ bv_decide_large_aig.lean               instructions                  -4.6%    (-70.2 σ)
- bv_decide_mod                          branches                       5.5%    (484.1 σ)
- bv_decide_mod                          instructions                   4.8%    (487.0 σ)
- bv_decide_mul                          branches                       8.0%    (204.1 σ)
- bv_decide_mul                          instructions                   7.3%    (173.6 σ)
+ bv_decide_mul                          maxrss                        -2.5%    (-30.3 σ)
- bv_decide_realworld                    branches                       6.1%    (116.4 σ)
- bv_decide_realworld                    instructions                   5.1%     (95.2 σ)
+ channel.lean                           boundedn_seq                 -18.9%
- deriv                                  instructions                   5.8%   (3976.8 σ)
- deriv                                  task-clock                     6.7%     (30.4 σ)
- deriv                                  wall-clock                     6.7%     (29.9 σ)
- identifier auto-completion             branches                       1.7%     (62.5 σ)
- identifier auto-completion             instructions                   1.9%     (60.9 σ)
+ identifier auto-completion             maxrss                        -3.7%    (-23.6 σ)
- ilean roundtrip                        branches                       4.2%    (500.6 σ)
- ilean roundtrip                        instructions                   4.8%   (1564.5 σ)
+ import Lean                            branches                     -17.5%  (-5182.0 σ)
+ import Lean                            instructions                 -17.6%  (-4834.3 σ)
+ import Lean                            maxrss                       -15.3%   (-532.7 σ)
+ lake build clean                       instructions                  -5.2%   (-341.3 σ)
+ lake build clean                       maxrss                        -9.5%    (-93.2 σ)
+ lake build clean                       task-clock                    -6.8%    (-27.6 σ)
+ lake build clean                       wall-clock                    -6.1%    (-53.4 σ)
+ lake config elab                       instructions                  -3.5%   (-396.5 σ)
+ lake config elab                       maxrss                       -12.2%   (-117.2 σ)
+ lake config import                     instructions                 -15.5%   (-677.4 σ)
+ lake config import                     maxrss                       -13.3%   (-291.6 σ)
+ lake config tree                       instructions                 -17.0%  (-1444.0 σ)
+ lake config tree                       maxrss                       -13.3%   (-326.6 σ)
+ lake env                               instructions                 -15.2%  (-1339.6 σ)
+ lake env                               maxrss                       -13.3%   (-365.3 σ)
- lake startup                           instructions                   1.3%     (34.7 σ)
+ language server startup                branches                      -7.2%   (-189.9 σ)
+ language server startup                instructions                  -8.5%   (-175.5 σ)
+ language server startup                maxrss                        -6.6%    (-43.1 σ)
+ liasolver                              instructions                  -2.8%   (-557.7 σ)
+ liasolver                              wall-clock                    -3.3%    (-23.8 σ)
+ libleanshared.so                       binary size                  -14.3%
- omega_stress.lean async                branches                       1.8%    (483.9 σ)
- omega_stress.lean async                instructions                   1.6%    (346.5 σ)
- parser                                 instructions                   3.8%    (802.6 σ)
- rbmap                                  instructions                 210.9%  (37713.4 σ)
+ rbmap_1                                instructions                 -13.8% (-10366.5 σ)
+ rbmap_1                                task-clock                    -4.3%    (-46.4 σ)
+ rbmap_1                                wall-clock                    -4.3%    (-46.3 σ)
- rbmap_10                               instructions                 144.7%  (23617.1 σ)
- rbmap_library                          instructions                 199.1%  (24925.3 σ)
- rbmap_library                          task-clock                   204.2%     (20.3 σ)
- rbmap_library                          wall-clock                   203.7%     (20.3 σ)
+ reduceMatch                            instructions                  -3.0%    (-66.1 σ)
- riscv-ast.lean                         branch-misses                  6.8%     (49.7 σ)
+ simp_arith1                            branches                      -5.6%   (-192.0 σ)
+ simp_arith1                            instructions                  -6.7%   (-200.0 σ)
+ simp_arith1                            maxrss                       -13.8%   (-847.1 σ)
+ stdlib                                 blocked (unaccounted)         -9.4%    (-21.1 σ)
+ stdlib                                 instructions                  -3.6%  (-5166.1 σ)
+ stdlib                                 maxrss                       -15.0%   (-332.4 σ)
- stdlib                                 number of imported bytes       2.0%
+ stdlib                                 number of imported consts    -55.3%
- stdlib                                 number of imported entries    21.4%
+ stdlib                                 type checking                 -1.5%    (-58.8 σ)
- stdlib size                            bytes .olean                   4.9%
+ stdlib size                            lines C                      -20.3%
+ stdlib size                            max dynamic symbols          -13.1%
- tests/bench/ interpreted               instructions                   2.2%    (332.2 σ)
+ tests/bench/ interpreted               maxrss                       -13.5%   (-305.1 σ)
+ tests/compiler                         sum binary sizes             -10.5%
+ unionfind                              instructions                  -1.7%   (-670.6 σ)
+ workspaceSymbols                       instructions                 -81.3% (-33935.2 σ)
+ workspaceSymbols                       maxrss                       -15.9%   (-262.2 σ)
+ workspaceSymbols                       task-clock                   -79.9%   (-123.0 σ)
+ workspaceSymbols                       wall-clock                   -79.9%   (-122.4 σ)

@zwarich zwarich force-pushed the new_codegen_mathlib branch from 656e8e1 to d733277 Compare June 17, 2025 22:04
@zwarich
Copy link
Contributor Author
zwarich commented Jun 17, 2025

!bench

@zwarich zwarich closed this Jun 17, 2025
@zwarich zwarich reopened this Jun 17, 2025
@zwarich
Copy link
Contributor Author
zwarich commented Jun 17, 2025

(testing some minor inlining changes)

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit d733277.
There were significant changes against commit 3b2990b:

  Benchmark                              Metric                       Change
  ========================================================================================
- Init size                              bytes .olean                   6.6%
- Init.Data.BitVec.Lemmas                branches                       4.6%     (455.1 σ)
- Init.Data.BitVec.Lemmas                instructions                   4.2%     (362.5 σ)
- Init.Data.BitVec.Lemmas re-elab        branch-misses                  1.3%      (41.6 σ)
- Init.Data.BitVec.Lemmas re-elab        branches                       4.3%      (63.5 σ)
- Init.Data.BitVec.Lemmas re-elab        instructions                   4.1%      (60.9 σ)
- Init.Data.List.Sublist async           branches                       4.7%     (127.0 σ)
- Init.Data.List.Sublist async           instructions                   4.7%     (118.9 σ)
- Init.Data.List.Sublist re-elab -j4     branches                       4.5%      (65.9 σ)
- Init.Data.List.Sublist re-elab -j4     instructions                   4.5%      (69.0 σ)
- Init.Data.List.Sublist re-elab -j4     task-clock                     1.8%      (34.7 σ)
- Init.Prelude async                     branches                       7.9%     (398.6 σ)
- Init.Prelude async                     instructions                   6.0%     (315.6 σ)
- Std.Data.DHashMap.Internal.RawLemmas   branches                       4.8%     (552.3 σ)
- Std.Data.DHashMap.Internal.RawLemmas   instructions                   4.6%     (470.6 σ)
- Std.Data.Internal.List.Associative     branches                       5.0%     (894.9 σ)
- Std.Data.Internal.List.Associative     instructions                   5.0%     (770.5 σ)
- big_do                                 branches                     175.5%    (6456.2 σ)
- big_do                                 instructions                 160.1%    (5143.9 σ)
- big_omega.lean                         branches                       2.6%     (166.6 σ)
- big_omega.lean                         instructions                   1.6%     (181.9 σ)
- big_omega.lean MT                      branches                       2.6%     (224.0 σ)
- big_omega.lean MT                      instructions                   1.7%     (200.3 σ)
- bv_decide_inequality.lean              branches                       5.2%      (51.9 σ)
- bv_decide_inequality.lean              instructions                   4.3%      (51.0 σ)
+ bv_decide_large_aig.lean               branches                      -5.4%    (-187.1 σ)
+ bv_decide_large_aig.lean               instructions                  -4.6%    (-165.7 σ)
- bv_decide_mod                          branches                       5.4%     (336.0 σ)
- bv_decide_mod                          instructions                   4.7%     (236.6 σ)
- bv_decide_mul                          branches                       7.9%     (580.5 σ)
- bv_decide_mul                          instructions                   7.2%     (749.6 σ)
- bv_decide_realworld                    branches                       5.8%     (238.1 σ)
- bv_decide_realworld                    instructions                   4.9%     (220.4 σ)
- deriv                                  instructions                   5.9%    (9233.2 σ)
- identifier auto-completion             branches                       1.8%      (67.3 σ)
- identifier auto-completion             instructions                   2.0%      (59.3 σ)
+ identifier auto-completion             maxrss                        -3.7%     (-33.4 σ)
- ilean roundtrip                        branches                       4.1%     (261.4 σ)
- ilean roundtrip                        instructions                   4.8%     (348.3 σ)
+ import Lean                            branches                     -17.4%    (-883.4 σ)
+ import Lean                            instructions                 -17.4%    (-761.6 σ)
+ import Lean                            maxrss                       -15.4%   (-1218.9 σ)
+ lake build clean                       instructions                  -5.1%    (-133.4 σ)
+ lake build clean                       maxrss                       -10.1%     (-32.8 σ)
+ lake build clean                       task-clock                    -7.3%     (-95.4 σ)
+ lake build clean                       wall-clock                    -6.3%     (-99.0 σ)
+ lake config elab                       instructions                  -3.5%    (-351.8 σ)
+ lake config elab                       maxrss                       -12.2%    (-295.3 σ)
+ lake config import                     instructions                 -15.3%    (-625.4 σ)
+ lake config import                     maxrss                       -13.3%    (-206.0 σ)
+ lake config tree                       instructions                 -16.9%    (-614.7 σ)
+ lake config tree                       maxrss                       -13.3%    (-266.0 σ)
+ lake env                               instructions                 -15.1%    (-923.1 σ)
+ lake env                               maxrss                       -13.3%    (-295.3 σ)
- lake startup                           instructions                   1.5%      (85.4 σ)
+ language server startup                branches                      -7.2%    (-248.6 σ)
+ language server startup                instructions                  -8.4%    (-256.8 σ)
+ language server startup                maxrss                        -6.3%     (-22.0 σ)
+ liasolver                              instructions                  -3.0%    (-419.0 σ)
+ libleanshared.so                       binary size                  -13.3%
- omega_stress.lean async                branches                       1.8%      (46.9 σ)
- omega_stress.lean async                instructions                   1.6%      (40.2 σ)
- parser                                 instructions                   3.3%    (1539.5 σ)
+ qsort                                  instructions                  -1.0%    (-331.4 σ)
- rbmap                                  instructions                 210.9%   (29016.0 σ)
+ rbmap_1                                instructions                 -13.8%    (-171.3 σ)
- rbmap_10                               instructions                 144.7%   (76220.4 σ)
- rbmap_library                          instructions                 199.6%   (33843.1 σ)
+ reduceMatch                            instructions                  -3.0%     (-94.3 σ)
+ simp_arith1                            branches                      -5.6%     (-55.9 σ)
+ simp_arith1                            instructions                  -6.8%     (-55.5 σ)
+ simp_arith1                            maxrss                       -13.7%     (-63.3 σ)
+ stdlib                                 grind canon                  -11.1%     (-91.2 σ)
- stdlib                                 grind simp                     5.4%      (57.4 σ)
+ stdlib                                 instructions                  -3.7%   (-5923.2 σ)
+ stdlib                                 maxrss                       -11.1%     (-33.1 σ)
- stdlib                                 number of imported bytes       2.2%
+ stdlib                                 number of imported consts    -55.3%
- stdlib                                 number of imported entries    21.5%
- stdlib                                 tactic execution               3.5%      (75.9 σ)
+ stdlib                                 task-clock                    -3.6%    (-129.8 σ)
+ stdlib                                 type checking                 -1.6%     (-20.3 σ)
- stdlib size                            bytes .olean                   5.5%
+ stdlib size                            lines C                      -18.9%
+ stdlib size                            max dynamic symbols          -12.8%
- tests/bench/ interpreted               instructions                   1.8%      (99.6 σ)
+ tests/bench/ interpreted               maxrss                       -13.4%    (-924.5 σ)
- tests/bench/ interpreted               task-clock                     7.0%     (121.0 σ)
+ tests/compiler                         sum binary sizes              -9.8%
+ unionfind                              instructions                  -2.6%    (-785.5 σ)
+ workspaceSymbols                       instructions                 -81.4% (-181653.9 σ)
+ workspaceSymbols                       maxrss                       -16.0%   (-1438.0 σ)
+ workspaceSymbols                       task-clock                   -80.1%    (-173.7 σ)
+ workspaceSymbols                       wall-clock                   -80.1%    (-172.6 σ)

@zwarich zwarich force-pushed the new_codegen_mathlib branch 2 times, most recently from 656e8e1 to abfcb61 Compare June 19, 2025 22:11
leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Jun 19, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Jun 19, 2025
@zwarich zwarich marked this pull request as ready for review June 19, 2025 23:20
@zwarich
Copy link
Contributor Author
zwarich commented Jun 19, 2025

I was unable to push to the mathlib4 branch for this PR, so I made my own copy with the required changes: https://github.com/zwarich/mathlib4/tree/lean-pr-testing-8577

@zwarich zwarich added release-ci Enable all CI checks for a PR, like is done for releases merge-ci Enable merge queue CI checks for PR. In particular, produce artifacts for all major platforms. and removed release-ci Enable all CI checks for a PR, like is done for releases labels Jun 19, 2025
@zwarich zwarich force-pushed the new_codegen_mathlib branch from abfcb61 to f888317 Compare June 20, 2025 13:18 10000
@zwarich zwarich changed the title feat: enable the new compiler by default feat: enable the new compiler Jun 20, 2025
@zwarich zwarich force-pushed the new_codegen_mathlib branch from f888317 to 1a80f32 Compare June 20, 2025 13:50
@zwarich zwarich added release-ci Enable all CI checks for a PR, like is done for releases and removed merge-ci Enable merge queue CI checks for PR. In particular, produce artifacts for all major platforms. labels Jun 20, 2025
@nomeata nomeata merged commit 26b7e49 into leanprover:master Jun 20, 2025
24 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan changelog-compiler Compiler, runtime, and FFI release-ci Enable all CI checks for a PR, like is done for releases toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0