8000 chore: update to mimalloc 3.0.3 beta by Kha · Pull Request #7786 · leanprover/lean4 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

chore: update to mimalloc 3.0.3 beta #7786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Kha
Copy link
Member
@Kha Kha commented Apr 2, 2025

For testing, not intended to be merged before a stable release of mimalloc 3

@Kha Kha changed the title try mimalloc 3.0.3 chore: update to mimalloc 3.0.3 beta Apr 2, 2025
@Kha Kha marked this pull request as draft April 2, 2025 11:12
@Kha
Copy link
Member Author
Kha commented Apr 2, 2025

!bench

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Apr 2, 2025
@leanprover-community-bot
Copy link
Collaborator
leanprover-community-bot commented Apr 2, 2025

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase bd24ca3093b314b348c63826f7a868e6433b6acc --onto 911ea07a73733650eefa5240652404856bdcceb7. You can force Mathlib CI using the force-mathlib-ci label. (2025-04-02 11:40:16)
  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase dedfbaf521084ae78d0a826de8b3bab8fd5a3ebd --onto 911ea07a73733650eefa5240652404856bdcceb7. You can force Mathlib CI using the force-mathlib-ci label. (2025-04-02 13:52:57)
  • ✅ Mathlib branch lean-pr-testing-7786 has successfully built against this PR. (2025-05-14 12:33:33) View Log
  • ✅ Mathlib branch lean-pr-testing-7786 has successfully built against this PR. (2025-06-20 12:17:00) View Log

@leanprover leanprover deleted a comment from leanprover-bot Apr 2, 2025
@Kha Kha force-pushed the push-qnqpwnuqmnnu branch from eb7a7be to 5e45ad6 Compare April 2, 2025 13:16
@Kha
Copy link
Member Author
Kha commented Apr 2, 2025

!bench

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 5e45ad6.
There were significant changes against commit dedfbaf:

  Benchmark                              Metric          Change
  ==========================================================================
+ Init.Data.BitVec.Lemmas                branches         -1.2%    (-96.3 σ)
+ Init.Data.BitVec.Lemmas                maxrss           -7.2%    (-11.9 σ)
+ Init.Data.BitVec.Lemmas re-elab        branches         -1.1%    (-53.5 σ)
+ Init.Data.BitVec.Lemmas re-elab        maxrss           -7.7%   (-685.7 σ)
+ Init.Data.List.Sublist async           branches         -1.3%   (-150.0 σ)
+ Init.Data.List.Sublist async           maxrss          -12.1%    (-29.5 σ)
+ Init.Data.List.Sublist re-elab -j4     branches         -1.8%    (-15.9 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss          -35.3%    (-64.7 σ)
+ Init.Prelude async                     maxrss           -8.7%    (-22.5 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   branches         -1.8%   (-559.2 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   maxrss           -4.5%    (-11.1 σ)
+ Std.Data.Internal.List.Associative     branches         -1.8%    (-91.1 σ)
- big_do                                 instructions      1.5%     (84.0 σ)
+ big_do                                 maxrss           -6.3%    (-74.1 σ)
+ big_omega.lean                         maxrss           -1.8%   (-100.3 σ)
- big_omega.lean                         task-clock        5.3%     (10.1 σ)
- big_omega.lean                         wall-clock        5.4%     (13.0 σ)
+ big_omega.lean MT                      maxrss           -1.6%   (-363.5 σ)
- big_omega.lean MT                      task-clock        5.1%     (12.6 σ)
- big_omega.lean MT                      wall-clock        5.1%     (10.8 σ)
+ binarytrees                            instructions     -2.3%   (-870.4 σ)
+ binarytrees                            maxrss          -38.9%   (-348.6 σ)
+ binarytrees.st                         instructions     -1.5%   (-492.2 σ)
+ binarytrees.st                         maxrss           -1.0%
- bv_decide_inequality.lean              instructions      1.3%     (14.7 σ)
- bv_decide_inequality.lean              maxrss            5.3%     (38.4 σ)
- bv_decide_large_aig.lean               branch-misses    44.1%     (41.1 σ)
- bv_decide_large_aig.lean               branches          1.7%     (16.9 σ)
- bv_decide_large_aig.lean               instructions      2.5%     (24.8 σ)
+ bv_decide_large_aig.lean               maxrss          -10.4%    (-37.4 σ)
- bv_decide_mul                          maxrss            1.5%     (33.2 σ)
+ bv_decide_realworld                    maxrss          -12.1%    (-24.2 σ)
- bv_decide_realworld                    task-clock        4.8%     (11.3 σ)
- bv_decide_realworld                    wall-clock        4.5%     (14.6 σ)
- const_fold                             instructions     23.9%    (175.2 σ)
- const_fold                             task-clock       26.5%     (24.0 σ)
- const_fold                             wall-clock       26.5%     (23.9 σ)
- deriv                                  instructions     36.0%    (488.4 σ)
- deriv                                  task-clock       52.7%     (22.0 σ)
- deriv                                  wall-clock       52.6%     (22.0 σ)
- identifier auto-completion             bran
8000
ches          4.1%     (78.2 σ)
- identifier auto-completion             instructions      5.2%     (90.7 σ)
+ identifier auto-completion             maxrss           -4.2%    (-55.6 σ)
- identifier auto-completion             task-clock       10.7%     (16.8 σ)
- ilean roundtrip                        branches          1.1%     (30.2 σ)
- ilean roundtrip                        compress         11.2%     (80.1 σ)
- ilean roundtrip                        instructions      2.6%     (79.0 σ)
- ilean roundtrip                        task-clock       14.5%     (14.1 σ)
- ilean roundtrip                        wall-clock       14.5%     (14.2 σ)
- import Lean                            branches          5.1%    (136.6 σ)
- import Lean                            instructions      6.3%    (190.8 σ)
+ import Lean                            maxrss           -1.1%    (-14.4 σ)
- lake build clean                       instructions      4.5%    (118.4 σ)
+ lake build clean                       maxrss          -13.5%   (-139.3 σ)
- lake build no-op                       instructions      2.5%     (35.2 σ)
+ lake build no-op                       maxrss           -7.5%    (-95.1 σ)
- lake config elab                       instructions      2.5%    (494.6 σ)
+ lake config elab                       maxrss           -2.1%    (-60.1 σ)
- lake config import                     instructions      6.4%    (260.0 σ)
- lake config tree                       instructions      6.2%    (167.7 σ)
- lake env                               instructions      6.6%    (312.1 σ)
- lake env                               task-clock       20.8%     (20.9 σ)
- lake env                               wall-clock       20.8%     (21.0 σ)
- lake startup                           instructions      5.6%    (114.5 σ)
- language server startup                branches          3.8%     (45.5 σ)
- language server startup                instructions      4.9%     (51.1 σ)
+ language server startup                maxrss           -4.1%   (-121.9 σ)
+ nat_repr                               branches         -2.4%   (-450.5 σ)
+ nat_repr                               instructions     -1.3%   (-212.4 σ)
- omega_stress.lean async                instructions      1.2%     (45.7 σ)
+ omega_stress.lean async                maxrss           -6.1%    (-32.2 σ)
- rbmap                                  instructions      2.6%    (384.3 σ)
+ rbmap                                  maxrss           -1.6%
- rbmap_1                                instructions     57.8%    (632.1 σ)
- rbmap_1                                task-clock       50.2%     (21.4 σ)
- rbmap_1                                wall-clock       50.1%     (21.5 σ)
- rbmap_10                               instructions     18.6%    (585.6 σ)
- rbmap_10                               task-clock       28.5%     (11.3 σ)
- rbmap_10                               wall-clock       28.5%     (11.2 σ)
- rbmap_fbip                             instructions      1.7%    (149.9 σ)
+ rbmap_fbip                             maxrss           -1.4%    (-10.1 σ)
- rbmap_library                          instructions      3.0%    (397.3 σ)
- reduceMatch                            instructions      4.1%     (67.0 σ)
+ reduceMatch                            maxrss           -9.7%   (-139.8 σ)
- simp_arith1                            branches          5.6%    (459.8 σ)
- simp_arith1                            instructions      6.3%    (578.6 σ)
+ simp_arith1                            maxrss           -1.7%    (-79.7 σ)
+ stdlib                                 maxrss           -3.5%    (-59.1 σ)
+ tests/bench/ interpreted               maxrss           -2.0%   (-107.5 σ)
+ unionfind                              instructions     -1.7% (-16575.7 σ)
+ unionfind                              maxrss          -14.3%   (-143.0 σ)
- workspaceSymbols                       instructions      2.1%    (277.8 σ)
+ workspaceSymbols                       maxrss           -2.8%    (-86.5 σ)

@Kha
Copy link
Member Author
Kha commented Apr 2, 2025

@daanx I thought you might be interested in this data! The "Init.Data.List.Sublist re-elab -j4 maxrss -35.3%" is extremely encouraging for longer-running processes of our language server, though it looks like benchmarks that effectively spend most of their time in the allocator do see some significant regressions.

@daanx
Copy link
daanx commented Apr 2, 2025

Hi Sebastian! Hope you are doing well (and we should catch up some time! come to PLDI this year :-)).
It is great to see the maxrss reduction -- the v3 version is developed specifically to reduce the commit of some (very) large long running services (having 600GiB+ working sets, 600+ threads etc.) so it is nice to see the new design is effective on Linux and smaller programs as well.

On my benchmark suite v3 is close to v2 in absolute performance as well -- but when I see the numbers above there is some serious regression on rbmap! Yikes. I wonder though if it is a matter of a bit more tuning -- like turning off purging or reducing page search etc. Is there a somewhat easy way for me to run (some of) these benchmarks and play with different settings for mimalloc? It would be great to figure out what is the cause of this.

@daanx
Copy link
daanx commented Apr 2, 2025

Just to add: what does "instructions" mean? Did mimalloc execute 57.8% more instructions for rbmap_1 ?!I

@Kha
Copy link
Member Author
Kha commented Apr 3, 2025

come to PLDI this year :-)

I might just do that :) . Looking forward to seeing you again!

Just to add: what does "instructions" mean? Did mimalloc execute 57.8% more instructions for rbmap_1 ?!I

This is CPU instructions as per perf stat, yes. I can reproduce the instructions increase locally but I only get 6% wall-clock overhead, which makes it harder to profile. I am seeing ~7% spent in mi_arenas_page_alloc_fresh while allocations were basically free before, so that could explain it. https://share.firefox.dev/3YhNJuH

Is there a somewhat easy way for me to run (some of) these benchmarks and play with different settings for mimalloc? It would be great to figure out what is the cause of this.

You can run the benchmark this way without building Lean from source, via elan:

lean +leanprover/lean4-pr-releases:pr-release-7786 rbmap_checkpoint.lean -c rbmap_checkpoint.lean.c && leanc +leanprover/lean4-pr-releases:pr-release-7786 -o rbmap_checkpoint.lean.out rbmap_checkpoint.lean.c -O3 -DNDEBUG && perf stat ./rbmap_checkpoint.lean.out 2000000 1

That doesn't give you any control over mimalloc compile-time options of course. The code embedding it is here:

if (USE_MIMALLOC)
list(APPEND RUNTIME_OBJS ${LEAN_BINARY_DIR}/../mimalloc/src/mimalloc/src/static.c)
# Lean code includes it as `lean/mimalloc.h` but for compiling `static.c` itself, add original dir
include_directories(${LEAN_BINARY_DIR}/../mimalloc/src/mimalloc/include)
set_source_files_properties(${LEAN_BINARY_DIR}/../mimalloc/src/mimalloc/src/static.c PROPERTIES
# (C flags are incomplete, compile as C++ instead like everything else)
LANGUAGE CXX
# make all symbols visible, always build with optimizations as otherwise Lean becomes too slow
COMPILE_FLAGS "-DMI_SHARED_LIB -DMI_SHARED_LIB_EXPORT -O3 -DNDEBUG -DMI_WIN_NOREDIRECT")
endif()
. I found it easiest to include static.c directly. I hope I'm not missing some important dev3 flags this way?

@daanx
Copy link
daanx commented Apr 4, 2025

come to PLDI this year :-)

I might just do that :) . Looking forward to seeing you again!

Great -- see you then!

Just to add: what does "instructions" mean? Did mimalloc execute 57.8% more instructions for rbmap_1 ?!I

This is CPU instructions as per perf stat, yes. I can reproduce the instructions increase locally but I only get 6% wall-clock overhead, which makes it harder to profile. I am seeing ~7% spent in mi_arenas_page_alloc_fresh while allocations were basically free before, so that could explain it. https://share.firefox.dev/3YhNJuH

Ah, that is interesting; that routine is the main new part that does more fine-grained memory sharing. I'll look into it more. (it might just be that I should optimize the initial allocation in pristine reserved memory)

Weird to see the mi_free_try_collect_mt in there as well -- I guess rbmap_1 first allocates a lot and only frees everything at the end? (this is the one where you insert N items but also keep every intermediate tree right?).

. I found it easiest to include static.c directly. I hope I'm not missing some important dev3 flags this way?

Yes, that is a great way to build it as part of anothor project/runtime. Koka does this too. I don't think you would miss any flags (but ensure you defined -DNDEBUG in release mode ?)

daanx added a commit to microsoft/mimalloc that referenced this pull request May 13, 2025
…ially address perf regresion in mimalloc v3 with respect to v2 -- see also leanprover/lean4#7786
@daanx
Copy link
daanx commented May 14, 2025

@Kha : I may have fixed some of the perf regression you saw in these tests (in particular rb_map1) -- if you find time it would be nice to see if the latest dev3 improves things.

@Kha Kha force-pushed the push-qnqpwnuqmnnu branch from 5e45ad6 to 484e3bc Compare May 14, 2025 09:55
@Kha
Copy link
Member Author
Kha commented May 14, 2025

!bench

@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 484e3bc.
The entire run failed.
Found no significant differences.

@Kha
Copy link
Member Author
Kha commented May 14, 2025

!bench

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request May 14, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request May 14, 2025
@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 8f92420.
There were significant changes against commit 29cc755:

  Benchmark                              Metric          Change
  ========================================================================
+ Init.Data.BitVec.Lemmas                maxrss          -17.2% (-235.1 σ)
+ Init.Data.BitVec.Lemmas re-elab        maxrss           -7.0%  (-44.0 σ)
+ Init.Data.List.Sublist async           maxrss          -23.5%  (-37.2 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss          -28.3%  (-25.8 σ)
+ Init.Prelude async                     maxrss          -18.0%  (-95.6 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   maxrss           -8.6%  (-34.9 σ)
+ Std.Data.Internal.List.Associative     maxrss           -3.5%  (-38.1 σ)
- big_do                                 branches          1.1%  (104.5 σ)
- big_do                                 instructions      1.4%  (197.4 σ)
+ big_do                                 maxrss           -3.4%  (-97.8 σ)
+ big_omega.lean                         maxrss           -2.0%  (-27.9 σ)
- big_omega.lean                         task-clock        7.3%  (217.7 σ)
- big_omega.lean                         wall-clock        7.5%   (40.9 σ)
+ big_omega.lean MT                      maxrss           -2.3%  (-53.1 σ)
+ binarytrees                            instructions     -1.3%  (-29.7 σ)
- bv_decide_inequality.lean              maxrss            2.0%   (23.5 σ)
- bv_decide_large_aig.lean               branch-misses    62.6%  (385.6 σ)
- bv_decide_large_aig.lean               branches          3.4%   (87.3 σ)
- bv_decide_large_aig.lean               instructions      3.7%  (142.2 σ)
+ bv_decide_large_aig.lean               maxrss           -8.3% (-424.1 σ)
- bv_decide_mul                          maxrss            1.0%   (20.4 σ)
+ channel.lean                           maxrss          -24.3%  (-24.9 σ)
- const_fold                             instructions      8.4%   (55.4 σ)
- deriv                                  instructions     10.3%   (88.4 σ)
- identifier auto-completion             branches          3.0%   (25.5 σ)
- identifier auto-completion             instructions      3.6%   (27.1 σ)
+ identifier auto-completion             maxrss           -4.2%  (-30.9 σ)
- identifier auto-completion             task-clock        9.5%   (50.1 σ)
- ilean roundtrip                        instructions      3.0%   (24.2 σ)
- import Lean                            branches          5.9%  (114.1 σ)
- import Lean                            instructions      6.8%  (132.6 σ)
+ import Lean                            maxrss           -1.4%  (-53.5 σ)
- lake build clean                       instructions      5.5%  (347.0 σ)
+ lake build clean                       maxrss          -16.6% (-262.8 σ)
+ lake build no-op                       maxrss          -14.5%  (-34.1 σ)
- lake config elab                       instructions      4.1%  (137.6 σ)
+ lake config elab                       maxrss           -2.7%  (-33.6 σ)
- lake config elab                       task-clock        8.7%   (63.0 σ)
- lake config elab                       wall-clock        8.6%   (55.6 σ)
- lake config import                     instructions      7.3%   (90.9 σ)
+ lake config import                     maxrss           -1.1%  (-26.5 σ)
- lake config import                     task-clock       17.4%   (60.5 σ)
- lake config import                     wall-clock       17.3%   (60.0 σ)
- lake config tree                       instructions      6.9%  (160.5 σ)
+ lake config tree                       maxrss           -1.2%  (-21.1 σ)
- lake config tree                       task-clock       13.9%  (220.2 σ)
- lake config tree                       wall-clock       13.8%  (191.6 σ)
- lake env                               instructions      7.4%  (135.4 σ)
- lake env                               task-clock       13.4%   (46.2 σ)
- lake env                               wall-clock       13.2%   (50.9 σ)
- lake startup                           instructions      5.9%  (247.5 σ)
- language server startup                branches          4.7%  (213.6 σ)
- language server startup                instructions      5.4%  (223.7 σ)
+ language server startup                maxrss           -3.8% (-110.6 σ)
+ omega_stress.lean async                maxrss           -9.5%  (-61.8 σ)
- rbmap                                  instructions      2.7%  (269.5 σ)
- rbmap_1                                instructions     20.4%  (227.2 σ)
- rbmap_1                                task-clock       42.3%   (75.9 σ)
- rbmap_1                                wall-clock       42.3%   (76.3 σ)
- rbmap_10                               instructions      7.7%  (109.3 σ)
- rbmap_10                               task-clock       29.9%   (31.2 σ)
- rbmap_10                               wall-clock       29.9%   (31.1 σ)
- rbmap_fbip                             instructions      3.4% (1806.2 σ)
- rbmap_library                          instructions      3.1%  (468.4 σ)
- rbmap_library                          task-clock       19.8%   (48.0 σ)
- rbmap_library                          wall-clock       19.7%   (53.5 σ)
+ reduceMatch                            maxrss          -12.8%  (-26.1 σ)
- simp_arith1                            branches          3.8%  (224.0 σ)
- simp_arith1                            instructions      4.3%  (379.0 σ)
+ simp_arith1                            maxrss           -2.3% (-156.0 σ)
+ stdlib                                 maxrss           -1.3%  (-20.9 σ)
- stdlib                                 task-clock        2.5%   (96.5 σ)
+ tests/bench/ interpreted               maxrss           -2.3% (-157.9 σ)
+ unionfind                              maxrss          -14.5%  (-83.4 σ)
+ workspaceSymbols                       maxrss           -2.8%  (-37.4 σ)
+ workspaceSymbols                       task-clock       -5.0%  (-23.8 σ)
+ workspaceSymbols                       wall-clock       -5.0%  (-23.5 σ)

@leanprover-community-bot leanprover-community-bot added the builds-mathlib CI has verified that Mathlib builds against this PR label May 14, 2025
@Kha
Copy link
Member Author
Kha commented May 14, 2025

Hmm, that did help with rbmap_1 instructions, but otherwise a bit mixed? I updated the base branch as well to get in new benchmarks but I can rerun the previous mimalloc version on it if a direct comparison would be helpful!

clrpackages pushed a commit to clearlinux-pkgs/mimalloc that referenced this pull request Jun 11, 2025
…3.1.4

Daan (53):
      bump version to 1.9.4 for further development
      update readme
      fix release build warning (unused mi_stat_total_print)
      fix dynamic override test on non-windows platforms
      fix invalid pointer detection in release mode (issue #1051 and #1053)
      fix dynamic override test on non-windows platforms
      make dynamic override test verbose
      nicer cmake logic for windows override test
      add comment
      fix alpine compilation with prctl.h (issue #1059)
      typo
      fix prctl.h includes for alpine linux/musl (hopefully fixes #1065, #1066, #1067)
      validate pointer before assertion in mi_free_size (issue #754)
      fix assertion in mi_free_size (issue #754)
      fix cast on msvc
      fix mi_ctz/clz/bsf/bsr to avoid msvc compilation bug (issue #1071, pr #1081)
      revert use of selectany for msvc (issue #1078)
      add fixed TLS slot test to pipeline on Windows
      fix debug assertion for windows TLS
      make windows fixed TLS opt-in
      fix syntax error
      fix stats for pages and page_bins
      merge 3301ba0 to fix page_bin and pages statistics
      ensure page stats are done on the subproc stats in case the tld is NULL
      merge 3b567b1 to track pages statistics correctly
      fix merge conflict
      fix output for api test
      fix api test for 32-bits
      fix format specifier for numa nodes
      fix format specifier in stat output
      fix missing csize assignment in _mi_os_free_ex
      check all os_commit calls and return NULL on failure
      fix base address if commit fails on aligned overallocation
      fix missing csize assignment in _mi_os_free_ex
      check all os_commit calls and return NULL on failure
      fix base address if commit fails on aligned overallocation
      fix windows 32-bit compilation
      fix merge error
      check return value of mi_arena_commit, potential fix for #1098
      use main instead of master in readme
      add initial support for _mi_prim_reuse and MADV_FREE_REUSABLE on macOS (issue #1097)
      add MI_UNUSED for unix _mi_prim_reuse
      fall back to MADV_DONTNEED if MADV_FREE_REUSABLE fails on macOS; disable use of MADV_FREE_REUSE on a reset (issue #1097)
      add _mi_os_zalloc
      fix assertion in mi_os_ensure_zero
      fix assertion
      improve handling of commit failure in arena allocation
      make macOS interposes compile for older macOS versions (by @Noxybot, PR #1028)
      fix link error without static library build (by @fd00, PR #1082)
      fix build for TSAN tests
      fix armv7 detection
      update vcpkg config to 1.9.4
      update readme for upcoming release

Daan Leijen (3):
      change default page_reclaim_max and change reclamation test to potentially address perf regresion in mimalloc v3 with respect to v2 -- see also leanprover/lean4#7786
      add mi_process_done to the api
      do not automatically call mi_process_done if mi_option_destroy_on_exit > 1

Eduard Voronkin (1):
      fix recursion in TLS init on Android

daanx (41):
      nicer show arenas
      fix build error on msvc/clang-cl in C mode (issue #1060)
      refactor numa_node_count
      add atomic_cas_ptr_strong_acq_rel
      fix signed compare warning
      use C++ compilation with clang-cl (as well as msvc) on Windows
      fix typo
      allow size==0 for mi_prim_free (issue #1041)
      extend override test on windows
      Use second user TLS slot to avoid using reserved fields in the TEB (issue #1078)
      fix use of mi_bsr
      add unix large page size constant and adjust aligment to the large page size for large allocations
      improve TLS access on Windows with msvc (by Frank Richter, issue #1078)
      add more decl_hidden specifiers on extern variables to improve access on arm64
      improve TLS access on Windows with msvc (by Frank Richter, issue #1078)
      add more decl_hidden specifiers on extern variables to improve access on arm64
      use TlsAlloc with a dynamic offset for MI_WIN_USE_FIXED_TLS by default (issue #1078)
      fix MI_WIN_USE_FIXED_TLS conditions
      fix TLS initialization for MI_WIN_USE_FIXED_TLS with redirection
      destroy the page map as well if MIMALLOC_DESTROY_ON_EXIT is set; see issue #1041
      add page_cross_thread_max_reclaim option
      potential fix for low address allocations, issue #1087
      add alpine x86 docker file
      fix include of prctl.h on alpine linux x86
      fix format specifier (for alpine linux x86, issue #1086)
      fix guarded sample rate of 1 (issue #1085)
      add guarded TLS test for Windows fixed TLS
      refine test for allowing page reclaim on free to limit reclaim if the originating heap is part of a threadpool
      use proper atomics for the page_map
      fix valid pointer check for low addresses (#issue 1087)
      always use non-flat map in secure mode and validate pointers passed to free
      don't reset stats at process start so page_map commit is included
      get Windows version dynamically
      add _mi_stats_init to initialize the process start time
      fix OS allocation size tracking in the memid
      fix page map resolving for low addresses (issue #1087)
      mark assert_fail as cold and noreturn; move assert to internal.h (see issue #1091, and python/cpython#134586)
      define mi_decl_align separate from mi_decl_cache_align
      ensure use of subproc_main during unsafe destroy to avoid derefercing the heap
      use subproc_main for final statistics output to avoid dereferencing the heap
      enable building for xbox, based on pr #1084 by @maxbachmann
@daanx
Copy link
daanx commented Jun 20, 2025

Hi @Kha -- can you run the benchmarks again but with tag v3.1.5 (versus the v2 results) -- in my local experiments it does better than v2 and I think the previous results may be due to a bug. Great seeing you btw, all the best, Daan

@hargoniX
Copy link
Contributor

!bench

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Jun 20, 2025
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Jun 20, 2025
@leanprover-bot
Copy link
Collaborator

Here are the benchmark results for commit 8d9b69c.
There were significant changes against commit 29cc755:

  Benchmark                              Metric          Change
  =========================================================================
+ Init.Data.BitVec.Lemmas                maxrss          -17.9%   (-43.5 σ)
+ Init.Data.BitVec.Lemmas re-elab        maxrss           -6.9%   (-32.2 σ)
+ Init.Data.List.Sublist async           maxrss          -23.4%   (-50.2 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss          -29.0%  (-121.5 σ)
+ Init.Prelude async                     maxrss          -18.3%   (-42.4 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   maxrss           -8.9%   (-27.6 σ)
+ Std.Data.Internal.List.Associative     maxrss           -3.5%   (-22.8 σ)
- big_do                                 branches          1.1%   (131.4 σ)
- big_do                                 instructions      1.4%   (187.3 σ)
+ big_omega.lean                         maxrss           -2.0%   (-54.1 σ)
+ big_omega.lean MT                      maxrss           -2.4%   (-27.0 σ)
+ binarytrees                            instructions     -1.3% (-1769.7 σ)
+ binarytrees                            maxrss          -38.8%  (-228.0 σ)
- bv_decide_inequality.lean              maxrss            2.0%    (30.0 σ)
- bv_decide_large_aig.lean               branch-misses    62.5%   (156.1 σ)
- bv_decide_large_aig.lean               branches          3.4%   (111.5 σ)
- bv_decide_large_aig.lean               instructions      3.7%   (157.1 σ)
- bv_decide_mul                          maxrss            1.0%    (77.8 σ)
- channel.lean                           bounded0_spsc     4.7%    (26.6 σ)
+ channel.lean                           boundedn_spsc   -11.0%   (-24.6 σ)
+ channel.lean                           maxrss          -24.3%   (-34.6 σ)
- channel.lean                           unbounded_seq     9.8%    (23.0 σ)
- const_fold                             instructions      8.4%    (64.6 σ)
- deriv                                  instructions     10.4%   (185.3 σ)
- identifier auto-completion             branches          3.2%    (23.9 σ)
- identifier auto-completion             instructions      3.8%    (24.4 σ)
+ identifier auto-completion             maxrss           -4.2%   (-56.9 σ)
- identifier auto-completion             task-clock        9.0%    (28.0 σ)
- identifier auto-completion             wall-clock       12.5%    (26.7 σ)
- ilean roundtrip                        branches          2.4%    (99.7 σ)
- ilean roundtrip                        instructions      3.1%   (105.2 σ)
- ilean roundtrip                        parse            15.7%    (34.3 σ)
- ilean roundtrip                        task-clock       18.3%    (25.8 σ)
- ilean roundtrip                        wall-clock       18.3%    (25.9 σ)
- import Lean                            branches          5.8%   (376.3 σ)
- import Lean                            instructions      6.7%  (1090.2 σ)
+ import Lean                            maxrss           -1.4%   (-53.2 σ)
- lake build clean                       instructions      5.5%   (471.3 σ)
+ lake build clean                       maxrss          -16.5%  (-171.3 σ)
+ lake build no-op                       maxrss          -14.3%   (-48.8 σ)
- lake config elab                       instructions      4.1%   (114.2 σ)
+ lake config elab                       maxrss           -2.6%   (-64.5 σ)
- lake config import                     instructions      7.3%   (166.4 σ)
- lake config tree                       instructions      6.9%  (1506.2 σ)
+ lake config tree                       maxrss           -1.2%   (-20.2 σ)
- lake env                               instructions      7.4%   (150.7 σ)
+ lake env                               maxrss           -1.2%   (-39.5 σ)
- lake startup                           instructions      5.8%   (136.4 σ)
- language server startup                branches          4.6%    (79.5 σ)
- language server startup                instructions      5.4%    (85.1 σ)
+ language server startup                maxrss           -3.8%   (-36.2 σ)
- omega_stress.lean async                branch-misses     7.4%   (135.8 σ)
+ omega_stress.lean async                maxrss           -9.5%   (-50.0 σ)
- rbmap                                  instructions      2.7%   (183.8 σ)
- rbmap_1                                instructions     20.4%   (216.0 σ)
- rbmap_1                                task-clock       41.5%    (20.3 σ)
- rbmap_1                                wall-clock       41.5%    (20.3 σ)
- rbmap_10                               instructions      7.7%   (198.6 σ)
- rbmap_fbip                             instructions      3.5%   (331.1 σ)
- rbmap_fbip                             task-clock       18.4%    (34.2 σ)
- rbmap_fbip                             wall-clock       18.4%    (32.3 σ)
- rbmap_library                          instructions      3.1%   (605.6 σ)
+ reduceMatch                            maxrss          -13.1%  (-105.2 σ)
+ riscv-ast.lean                         maxrss           -3.3%   (-22.8 σ)
- simp_arith1                            branches          3.8%    (91.9 σ)
- simp_arith1                            instructions      4.3%   (119.7 σ)
+ simp_arith1                            maxrss           -2.3%   (-39.1 σ)
+ stdlib                                 maxrss           -1.3%   (-33.0 σ)
- stdlib                                 wall-clock        2.1%    (25.7 σ)
+ tests/bench/ interpreted               maxrss           -2.4%  (-161.5 σ)
- tests/bench/ interpreted               task-clock        3.0%    (24.8 σ)
+ unionfind                              maxrss          -14.4%  (-143.3 σ)
+ workspaceSymbols                       maxrss           -2.8%   (-52.9 σ)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
builds-mathlib CI has verified that Mathlib builds against this PR toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0