chore: update to mimalloc 3.0.3 beta #7786

Kha · 2025-04-02T11:11:38Z

For testing, not intended to be merged before a stable release of mimalloc 3

Kha · 2025-04-02T11:12:30Z

!bench

leanprover-community-bot · 2025-04-02T11:40:16Z

Mathlib CI status (docs):

❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase bd24ca3093b314b348c63826f7a868e6433b6acc --onto 911ea07a73733650eefa5240652404856bdcceb7. You can force Mathlib CI using the force-mathlib-ci label. (2025-04-02 11:40:16)
❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase dedfbaf521084ae78d0a826de8b3bab8fd5a3ebd --onto 911ea07a73733650eefa5240652404856bdcceb7. You can force Mathlib CI using the force-mathlib-ci label. (2025-04-02 13:52:57)
✅ Mathlib branch lean-pr-testing-7786 has successfully built against this PR. (2025-05-14 12:33:33) View Log
✅ Mathlib branch lean-pr-testing-7786 has successfully built against this PR. (2025-06-20 12:17:00) View Log

Kha · 2025-04-02T13:17:22Z

!bench

leanprover-bot · 2025-04-02T15:12:37Z

Here are the benchmark results for commit 5e45ad6.
There were significant changes against commit dedfbaf:

  Benchmark                              Metric          Change
  ==========================================================================
+ Init.Data.BitVec.Lemmas                branches         -1.2%    (-96.3 σ)
+ Init.Data.BitVec.Lemmas                maxrss           -7.2%    (-11.9 σ)
+ Init.Data.BitVec.Lemmas re-elab        branches         -1.1%    (-53.5 σ)
+ Init.Data.BitVec.Lemmas re-elab        maxrss           -7.7%   (-685.7 σ)
+ Init.Data.List.Sublist async           branches         -1.3%   (-150.0 σ)
+ Init.Data.List.Sublist async           maxrss          -12.1%    (-29.5 σ)
+ Init.Data.List.Sublist re-elab -j4     branches         -1.8%    (-15.9 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss          -35.3%    (-64.7 σ)
+ Init.Prelude async                     maxrss           -8.7%    (-22.5 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   branches         -1.8%   (-559.2 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   maxrss           -4.5%    (-11.1 σ)
+ Std.Data.Internal.List.Associative     branches         -1.8%    (-91.1 σ)
- big_do                                 instructions      1.5%     (84.0 σ)
+ big_do                                 maxrss           -6.3%    (-74.1 σ)
+ big_omega.lean                         maxrss           -1.8%   (-100.3 σ)
- big_omega.lean                         task-clock        5.3%     (10.1 σ)
- big_omega.lean                         wall-clock        5.4%     (13.0 σ)
+ big_omega.lean MT                      maxrss           -1.6%   (-363.5 σ)
- big_omega.lean MT                      task-clock        5.1%     (12.6 σ)
- big_omega.lean MT                      wall-clock        5.1%     (10.8 σ)
+ binarytrees                            instructions     -2.3%   (-870.4 σ)
+ binarytrees                            maxrss          -38.9%   (-348.6 σ)
+ binarytrees.st                         instructions     -1.5%   (-492.2 σ)
+ binarytrees.st                         maxrss           -1.0%
- bv_decide_inequality.lean              instructions      1.3%     (14.7 σ)
- bv_decide_inequality.lean              maxrss            5.3%     (38.4 σ)
- bv_decide_large_aig.lean               branch-misses    44.1%     (41.1 σ)
- bv_decide_large_aig.lean               branches          1.7%     (16.9 σ)
- bv_decide_large_aig.lean               instructions      2.5%     (24.8 σ)
+ bv_decide_large_aig.lean               maxrss          -10.4%    (-37.4 σ)
- bv_decide_mul                          maxrss            1.5%     (33.2 σ)
+ bv_decide_realworld                    maxrss          -12.1%    (-24.2 σ)
- bv_decide_realworld                    task-clock        4.8%     (11.3 σ)
- bv_decide_realworld                    wall-clock        4.5%     (14.6 σ)
- const_fold                             instructions     23.9%    (175.2 σ)
- const_fold                             task-clock       26.5%     (24.0 σ)
- const_fold                             wall-clock       26.5%     (23.9 σ)
- deriv                                  instructions     36.0%    (488.4 σ)
- deriv                                  task-clock       52.7%     (22.0 σ)
- deriv                                  wall-clock       52.6%     (22.0 σ)
- identifier auto-completion             bran
8000
ches          4.1%     (78.2 σ)
- identifier auto-completion             instructions      5.2%     (90.7 σ)
+ identifier auto-completion             maxrss           -4.2%    (-55.6 σ)
- identifier auto-completion             task-clock       10.7%     (16.8 σ)
- ilean roundtrip                        branches          1.1%     (30.2 σ)
- ilean roundtrip                        compress         11.2%     (80.1 σ)
- ilean roundtrip                        instructions      2.6%     (79.0 σ)
- ilean roundtrip                        task-clock       14.5%     (14.1 σ)
- ilean roundtrip                        wall-clock       14.5%     (14.2 σ)
- import Lean                            branches          5.1%    (136.6 σ)
- import Lean                            instructions      6.3%    (190.8 σ)
+ import Lean                            maxrss           -1.1%    (-14.4 σ)
- lake build clean                       instructions      4.5%    (118.4 σ)
+ lake build clean                       maxrss          -13.5%   (-139.3 σ)
- lake build no-op                       instructions      2.5%     (35.2 σ)
+ lake build no-op                       maxrss           -7.5%    (-95.1 σ)
- lake config elab                       instructions      2.5%    (494.6 σ)
+ lake config elab                       maxrss           -2.1%    (-60.1 σ)
- lake config import                     instructions      6.4%    (260.0 σ)
- lake config tree                       instructions      6.2%    (167.7 σ)
- lake env                               instructions      6.6%    (312.1 σ)
- lake env                               task-clock       20.8%     (20.9 σ)
- lake env                               wall-clock       20.8%     (21.0 σ)
- lake startup                           instructions      5.6%    (114.5 σ)
- language server startup                branches          3.8%     (45.5 σ)
- language server startup                instructions      4.9%     (51.1 σ)
+ language server startup                maxrss           -4.1%   (-121.9 σ)
+ nat_repr                               branches         -2.4%   (-450.5 σ)
+ nat_repr                               instructions     -1.3%   (-212.4 σ)
- omega_stress.lean async                instructions      1.2%     (45.7 σ)
+ omega_stress.lean async                maxrss           -6.1%    (-32.2 σ)
- rbmap                                  instructions      2.6%    (384.3 σ)
+ rbmap                                  maxrss           -1.6%
- rbmap_1                                instructions     57.8%    (632.1 σ)
- rbmap_1                                task-clock       50.2%     (21.4 σ)
- rbmap_1                                wall-clock       50.1%     (21.5 σ)
- rbmap_10                               instructions     18.6%    (585.6 σ)
- rbmap_10                               task-clock       28.5%     (11.3 σ)
- rbmap_10                               wall-clock       28.5%     (11.2 σ)
- rbmap_fbip                             instructions      1.7%    (149.9 σ)
+ rbmap_fbip                             maxrss           -1.4%    (-10.1 σ)
- rbmap_library                          instructions      3.0%    (397.3 σ)
- reduceMatch                            instructions      4.1%     (67.0 σ)
+ reduceMatch                            maxrss           -9.7%   (-139.8 σ)
- simp_arith1                            branches          5.6%    (459.8 σ)
- simp_arith1                            instructions      6.3%    (578.6 σ)
+ simp_arith1                            maxrss           -1.7%    (-79.7 σ)
+ stdlib                                 maxrss           -3.5%    (-59.1 σ)
+ tests/bench/ interpreted               maxrss           -2.0%   (-107.5 σ)
+ unionfind                              instructions     -1.7% (-16575.7 σ)
+ unionfind                              maxrss          -14.3%   (-143.0 σ)
- workspaceSymbols                       instructions      2.1%    (277.8 σ)
+ workspaceSymbols                       maxrss           -2.8%    (-86.5 σ)

Kha · 2025-04-02T15:16:00Z

@daanx I thought you might be interested in this data! The "Init.Data.List.Sublist re-elab -j4 maxrss -35.3%" is extremely encouraging for longer-running processes of our language server, though it looks like benchmarks that effectively spend most of their time in the allocator do see some significant regressions.

daanx · 2025-04-02T16:34:27Z

Hi Sebastian! Hope you are doing well (and we should catch up some time! come to PLDI this year :-)).
It is great to see the maxrss reduction -- the v3 version is developed specifically to reduce the commit of some (very) large long running services (having 600GiB+ working sets, 600+ threads etc.) so it is nice to see the new design is effective on Linux and smaller programs as well.

On my benchmark suite v3 is close to v2 in absolute performance as well -- but when I see the numbers above there is some serious regression on rbmap! Yikes. I wonder though if it is a matter of a bit more tuning -- like turning off purging or reducing page search etc. Is there a somewhat easy way for me to run (some of) these benchmarks and play with different settings for mimalloc? It would be great to figure out what is the cause of this.

daanx · 2025-04-02T16:36:05Z

Just to add: what does "instructions" mean? Did mimalloc execute 57.8% more instructions for rbmap_1 ?!I

Kha · 2025-04-03T09:33:46Z

come to PLDI this year :-)

I might just do that :) . Looking forward to seeing you again!

Just to add: what does "instructions" mean? Did mimalloc execute 57.8% more instructions for rbmap_1 ?!I

This is CPU instructions as per perf stat, yes. I can reproduce the instructions increase locally but I only get 6% wall-clock overhead, which makes it harder to profile. I am seeing ~7% spent in mi_arenas_page_alloc_fresh while allocations were basically free before, so that could explain it. https://share.firefox.dev/3YhNJuH

Is there a somewhat easy way for me to run (some of) these benchmarks and play with different settings for mimalloc? It would be great to figure out what is the cause of this.

You can run the benchmark this way without building Lean from source, via elan:

lean +leanprover/lean4-pr-releases:pr-release-7786 rbmap_checkpoint.lean -c rbmap_checkpoint.lean.c && leanc +leanprover/lean4-pr-releases:pr-release-7786 -o rbmap_checkpoint.lean.out rbmap_checkpoint.lean.c -O3 -DNDEBUG && perf stat ./rbmap_checkpoint.lean.out 2000000 1

That doesn't give you any control over mimalloc compile-time options of course. The code embedding it is here:

lean4/src/runtime/CMakeLists.txt

Lines 7 to 16 in bb6bfdb

    
           if (USE_MIMALLOC) 
        
             list(APPEND RUNTIME_OBJS ${LEAN_BINARY_DIR}/../mimalloc/src/mimalloc/src/static.c) 
        
             # Lean code includes it as `lean/mimalloc.h` but for compiling `static.c` itself, add original dir 
        
             include_directories(${LEAN_BINARY_DIR}/../mimalloc/src/mimalloc/include) 
        
             set_source_files_properties(${LEAN_BINARY_DIR}/../mimalloc/src/mimalloc/src/static.c PROPERTIES 
        
               # (C flags are incomplete, compile as C++ instead like everything else) 
        
               LANGUAGE CXX 
        
               # make all symbols visible, always build with optimizations as otherwise Lean becomes too slow 
        
               COMPILE_FLAGS "-DMI_SHARED_LIB -DMI_SHARED_LIB_EXPORT -O3 -DNDEBUG -DMI_WIN_NOREDIRECT") 
        
           endif()

. I found it easiest to include static.c directly. I hope I'm not missing some important dev3 flags this way?

daanx · 2025-04-04T16:58:19Z

come to PLDI this year :-)

I might just do that :) . Looking forward to seeing you again!

Great -- see you then!

Just to add: what does "instructions" mean? Did mimalloc execute 57.8% more instructions for rbmap_1 ?!I

This is CPU instructions as per perf stat, yes. I can reproduce the instructions increase locally but I only get 6% wall-clock overhead, which makes it harder to profile. I am seeing ~7% spent in mi_arenas_page_alloc_fresh while allocations were basically free before, so that could explain it. https://share.firefox.dev/3YhNJuH

Ah, that is interesting; that routine is the main new part that does more fine-grained memory sharing. I'll look into it more. (it might just be that I should optimize the initial allocation in pristine reserved memory)

Weird to see the mi_free_try_collect_mt in there as well -- I guess rbmap_1 first allocates a lot and only frees everything at the end? (this is the one where you insert N items but also keep every intermediate tree right?).

. I found it easiest to include static.c directly. I hope I'm not missing some important dev3 flags this way?

Yes, that is a great way to build it as part of anothor project/runtime. Koka does this too. I don't think you would miss any flags (but ensure you defined -DNDEBUG in release mode ?)

…ially address perf regresion in mimalloc v3 with respect to v2 -- see also leanprover/lean4#7786

daanx · 2025-05-14T01:21:46Z

@Kha : I may have fixed some of the perf regression you saw in these tests (in particular rb_map1) -- if you find time it would be nice to see if the latest dev3 improves things.

Kha · 2025-05-14T09:56:20Z

!bench

leanprover-bot · 2025-05-14T09:57:37Z

Here are the benchmark results for commit 484e3bc.
The entire run failed.
Found no significant differences.

Kha · 2025-05-14T11:05:53Z

!bench

leanprover-bot · 2025-05-14T11:40:53Z

Here are the benchmark results for commit 8f92420.
There were significant changes against commit 29cc755:

  Benchmark                              Metric          Change
  ========================================================================
+ Init.Data.BitVec.Lemmas                maxrss          -17.2% (-235.1 σ)
+ Init.Data.BitVec.Lemmas re-elab        maxrss           -7.0%  (-44.0 σ)
+ Init.Data.List.Sublist async           maxrss          -23.5%  (-37.2 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss          -28.3%  (-25.8 σ)
+ Init.Prelude async                     maxrss          -18.0%  (-95.6 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   maxrss           -8.6%  (-34.9 σ)
+ Std.Data.Internal.List.Associative     maxrss           -3.5%  (-38.1 σ)
- big_do                                 branches          1.1%  (104.5 σ)
- big_do                                 instructions      1.4%  (197.4 σ)
+ big_do                                 maxrss           -3.4%  (-97.8 σ)
+ big_omega.lean                         maxrss           -2.0%  (-27.9 σ)
- big_omega.lean                         task-clock        7.3%  (217.7 σ)
- big_omega.lean                         wall-clock        7.5%   (40.9 σ)
+ big_omega.lean MT                      maxrss           -2.3%  (-53.1 σ)
+ binarytrees                            instructions     -1.3%  (-29.7 σ)
- bv_decide_inequality.lean              maxrss            2.0%   (23.5 σ)
- bv_decide_large_aig.lean               branch-misses    62.6%  (385.6 σ)
- bv_decide_large_aig.lean               branches          3.4%   (87.3 σ)
- bv_decide_large_aig.lean               instructions      3.7%  (142.2 σ)
+ bv_decide_large_aig.lean               maxrss           -8.3% (-424.1 σ)
- bv_decide_mul                          maxrss            1.0%   (20.4 σ)
+ channel.lean                           maxrss          -24.3%  (-24.9 σ)
- const_fold                             instructions      8.4%   (55.4 σ)
- deriv                                  instructions     10.3%   (88.4 σ)
- identifier auto-completion             branches          3.0%   (25.5 σ)
- identifier auto-completion             instructions      3.6%   (27.1 σ)
+ identifier auto-completion             maxrss           -4.2%  (-30.9 σ)
- identifier auto-completion             task-clock        9.5%   (50.1 σ)
- ilean roundtrip                        instructions      3.0%   (24.2 σ)
- import Lean                            branches          5.9%  (114.1 σ)
- import Lean                            instructions      6.8%  (132.6 σ)
+ import Lean                            maxrss           -1.4%  (-53.5 σ)
- lake build clean                       instructions      5.5%  (347.0 σ)
+ lake build clean                       maxrss          -16.6% (-262.8 σ)
+ lake build no-op                       maxrss          -14.5%  (-34.1 σ)
- lake config elab                       instructions      4.1%  (137.6 σ)
+ lake config elab                       maxrss           -2.7%  (-33.6 σ)
- lake config elab                       task-clock        8.7%   (63.0 σ)
- lake config elab                       wall-clock        8.6%   (55.6 σ)
- lake config import                     instructions      7.3%   (90.9 σ)
+ lake config import                     maxrss           -1.1%  (-26.5 σ)
- lake config import                     task-clock       17.4%   (60.5 σ)
- lake config import                     wall-clock       17.3%   (60.0 σ)
- lake config tree                       instructions      6.9%  (160.5 σ)
+ lake config tree                       maxrss           -1.2%  (-21.1 σ)
- lake config tree                       task-clock       13.9%  (220.2 σ)
- lake config tree                       wall-clock       13.8%  (191.6 σ)
- lake env                               instructions      7.4%  (135.4 σ)
- lake env                               task-clock       13.4%   (46.2 σ)
- lake env                               wall-clock       13.2%   (50.9 σ)
- lake startup                           instructions      5.9%  (247.5 σ)
- language server startup                branches          4.7%  (213.6 σ)
- language server startup                instructions      5.4%  (223.7 σ)
+ language server startup                maxrss           -3.8% (-110.6 σ)
+ omega_stress.lean async                maxrss           -9.5%  (-61.8 σ)
- rbmap                                  instructions      2.7%  (269.5 σ)
- rbmap_1                                instructions     20.4%  (227.2 σ)
- rbmap_1                                task-clock       42.3%   (75.9 σ)
- rbmap_1                                wall-clock       42.3%   (76.3 σ)
- rbmap_10                               instructions      7.7%  (109.3 σ)
- rbmap_10                               task-clock       29.9%   (31.2 σ)
- rbmap_10                               wall-clock       29.9%   (31.1 σ)
- rbmap_fbip                             instructions      3.4% (1806.2 σ)
- rbmap_library                          instructions      3.1%  (468.4 σ)
- rbmap_library                          task-clock       19.8%   (48.0 σ)
- rbmap_library                          wall-clock       19.7%   (53.5 σ)
+ reduceMatch                            maxrss          -12.8%  (-26.1 σ)
- simp_arith1                            branches          3.8%  (224.0 σ)
- simp_arith1                            instructions      4.3%  (379.0 σ)
+ simp_arith1                            maxrss           -2.3% (-156.0 σ)
+ stdlib                                 maxrss           -1.3%  (-20.9 σ)
- stdlib                                 task-clock        2.5%   (96.5 σ)
+ tests/bench/ interpreted               maxrss           -2.3% (-157.9 σ)
+ unionfind                              maxrss          -14.5%  (-83.4 σ)
+ workspaceSymbols                       maxrss           -2.8%  (-37.4 σ)
+ workspaceSymbols                       task-clock       -5.0%  (-23.8 σ)
+ workspaceSymbols                       wall-clock       -5.0%  (-23.5 σ)

Kha · 2025-05-14T12:40:31Z

Hmm, that did help with rbmap_1 instructions, but otherwise a bit mixed? I updated the base branch as well to get in new benchmarks but I can rerun the previous mimalloc version on it if a direct comparison would be helpful!

@Noxybot

…3.1.4 Daan (53): bump version to 1.9.4 for further development update readme fix release build warning (unused mi_stat_total_print) fix dynamic override test on non-windows platforms fix invalid pointer detection in release mode (issue #1051 and #1053) fix dynamic override test on non-windows platforms make dynamic override test verbose nicer cmake logic for windows override test add comment fix alpine compilation with prctl.h (issue #1059) typo fix prctl.h includes for alpine linux/musl (hopefully fixes #1065, #1066, #1067) validate pointer before assertion in mi_free_size (issue #754) fix assertion in mi_free_size (issue #754) fix cast on msvc fix mi_ctz/clz/bsf/bsr to avoid msvc compilation bug (issue #1071, pr #1081) revert use of selectany for msvc (issue #1078) add fixed TLS slot test to pipeline on Windows fix debug assertion for windows TLS make windows fixed TLS opt-in fix syntax error fix stats for pages and page_bins merge 3301ba0 to fix page_bin and pages statistics ensure page stats are done on the subproc stats in case the tld is NULL merge 3b567b1 to track pages statistics correctly fix merge conflict fix output for api test fix api test for 32-bits fix format specifier for numa nodes fix format specifier in stat output fix missing csize assignment in _mi_os_free_ex check all os_commit calls and return NULL on failure fix base address if commit fails on aligned overallocation fix missing csize assignment in _mi_os_free_ex check all os_commit calls and return NULL on failure fix base address if commit fails on aligned overallocation fix windows 32-bit compilation fix merge error check return value of mi_arena_commit, potential fix for #1098 use main instead of master in readme add initial support for _mi_prim_reuse and MADV_FREE_REUSABLE on macOS (issue #1097) add MI_UNUSED for unix _mi_prim_reuse fall back to MADV_DONTNEED if MADV_FREE_REUSABLE fails on macOS; disable use of MADV_FREE_REUSE on a reset (issue #1097) add _mi_os_zalloc fix assertion in mi_os_ensure_zero fix assertion improve handling of commit failure in arena allocation make macOS interposes compile for older macOS versions (by @Noxybot, PR #1028) fix link error without static library build (by @fd00, PR #1082) fix build for TSAN tests fix armv7 detection update vcpkg config to 1.9.4 update readme for upcoming release Daan Leijen (3): change default page_reclaim_max and change reclamation test to potentially address perf regresion in mimalloc v3 with respect to v2 -- see also leanprover/lean4#7786 add mi_process_done to the api do not automatically call mi_process_done if mi_option_destroy_on_exit > 1 Eduard Voronkin (1): fix recursion in TLS init on Android daanx (41): nicer show arenas fix build error on msvc/clang-cl in C mode (issue #1060) refactor numa_node_count add atomic_cas_ptr_strong_acq_rel fix signed compare warning use C++ compilation with clang-cl (as well as msvc) on Windows fix typo allow size==0 for mi_prim_free (issue #1041) extend override test on windows Use second user TLS slot to avoid using reserved fields in the TEB (issue #1078) fix use of mi_bsr add unix large page size constant and adjust aligment to the large page size for large allocations improve TLS access on Windows with msvc (by Frank Richter, issue #1078) add more decl_hidden specifiers on extern variables to improve access on arm64 improve TLS access on Windows with msvc (by Frank Richter, issue #1078) add more decl_hidden specifiers on extern variables to improve access on arm64 use TlsAlloc with a dynamic offset for MI_WIN_USE_FIXED_TLS by default (issue #1078) fix MI_WIN_USE_FIXED_TLS conditions fix TLS initialization for MI_WIN_USE_FIXED_TLS with redirection destroy the page map as well if MIMALLOC_DESTROY_ON_EXIT is set; see issue #1041 add page_cross_thread_max_reclaim option potential fix for low address allocations, issue #1087 add alpine x86 docker file fix include of prctl.h on alpine linux x86 fix format specifier (for alpine linux x86, issue #1086) fix guarded sample rate of 1 (issue #1085) add guarded TLS test for Windows fixed TLS refine test for allowing page reclaim on free to limit reclaim if the originating heap is part of a threadpool use proper atomics for the page_map fix valid pointer check for low addresses (#issue 1087) always use non-flat map in secure mode and validate pointers passed to free don't reset stats at process start so page_map commit is included get Windows version dynamically add _mi_stats_init to initialize the process start time fix OS allocation size tracking in the memid fix page map resolving for low addresses (issue #1087) mark assert_fail as cold and noreturn; move assert to internal.h (see issue #1091, and python/cpython#134586) define mi_decl_align separate from mi_decl_cache_align ensure use of subproc_main during unsafe destroy to avoid derefercing the heap use subproc_main for final statistics output to avoid dereferencing the heap enable building for xbox, based on pr #1084 by @maxbachmann

daanx · 2025-06-20T10:57:49Z

Hi @Kha -- can you run the benchmarks again but with tag v3.1.5 (versus the v2 results) -- in my local experiments it does better than v2 and I think the previous results may be due to a bug. Great seeing you btw, all the best, Daan

hargoniX · 2025-06-20T11:02:37Z

!bench

leanprover-bot · 2025-06-20T11:37:42Z

Here are the benchmark results for commit 8d9b69c.
There were significant changes against commit 29cc755:

  Benchmark                              Metric          Change
  =========================================================================
+ Init.Data.BitVec.Lemmas                maxrss          -17.9%   (-43.5 σ)
+ Init.Data.BitVec.Lemmas re-elab        maxrss           -6.9%   (-32.2 σ)
+ Init.Data.List.Sublist async           maxrss          -23.4%   (-50.2 σ)
+ Init.Data.List.Sublist re-elab -j4     maxrss          -29.0%  (-121.5 σ)
+ Init.Prelude async                     maxrss          -18.3%   (-42.4 σ)
+ Std.Data.DHashMap.Internal.RawLemmas   maxrss           -8.9%   (-27.6 σ)
+ Std.Data.Internal.List.Associative     maxrss           -3.5%   (-22.8 σ)
- big_do                                 branches          1.1%   (131.4 σ)
- big_do                                 instructions      1.4%   (187.3 σ)
+ big_omega.lean                         maxrss           -2.0%   (-54.1 σ)
+ big_omega.lean MT                      maxrss           -2.4%   (-27.0 σ)
+ binarytrees                            instructions     -1.3% (-1769.7 σ)
+ binarytrees                            maxrss          -38.8%  (-228.0 σ)
- bv_decide_inequality.lean              maxrss            2.0%    (30.0 σ)
- bv_decide_large_aig.lean               branch-misses    62.5%   (156.1 σ)
- bv_decide_large_aig.lean               branches          3.4%   (111.5 σ)
- bv_decide_large_aig.lean               instructions      3.7%   (157.1 σ)
- bv_decide_mul                          maxrss            1.0%    (77.8 σ)
- channel.lean                           bounded0_spsc     4.7%    (26.6 σ)
+ channel.lean                           boundedn_spsc   -11.0%   (-24.6 σ)
+ channel.lean                           maxrss          -24.3%   (-34.6 σ)
- channel.lean                           unbounded_seq     9.8%    (23.0 σ)
- const_fold                             instructions      8.4%    (64.6 σ)
- deriv                                  instructions     10.4%   (185.3 σ)
- identifier auto-completion             branches          3.2%    (23.9 σ)
- identifier auto-completion             instructions      3.8%    (24.4 σ)
+ identifier auto-completion             maxrss           -4.2%   (-56.9 σ)
- identifier auto-completion             task-clock        9.0%    (28.0 σ)
- identifier auto-completion             wall-clock       12.5%    (26.7 σ)
- ilean roundtrip                        branches          2.4%    (99.7 σ)
- ilean roundtrip                        instructions      3.1%   (105.2 σ)
- ilean roundtrip                        parse            15.7%    (34.3 σ)
- ilean roundtrip                        task-clock       18.3%    (25.8 σ)
- ilean roundtrip                        wall-clock       18.3%    (25.9 σ)
- import Lean                            branches          5.8%   (376.3 σ)
- import Lean                            instructions      6.7%  (1090.2 σ)
+ import Lean                            maxrss           -1.4%   (-53.2 σ)
- lake build clean                       instructions      5.5%   (471.3 σ)
+ lake build clean                       maxrss          -16.5%  (-171.3 σ)
+ lake build no-op                       maxrss          -14.3%   (-48.8 σ)
- lake config elab                       instructions      4.1%   (114.2 σ)
+ lake config elab                       maxrss           -2.6%   (-64.5 σ)
- lake config import                     instructions      7.3%   (166.4 σ)
- lake config tree                       instructions      6.9%  (1506.2 σ)
+ lake config tree                       maxrss           -1.2%   (-20.2 σ)
- lake env                               instructions      7.4%   (150.7 σ)
+ lake env                               maxrss           -1.2%   (-39.5 σ)
- lake startup                           instructions      5.8%   (136.4 σ)
- language server startup                branches          4.6%    (79.5 σ)
- language server startup                instructions      5.4%    (85.1 σ)
+ language server startup                maxrss           -3.8%   (-36.2 σ)
- omega_stress.lean async                branch-misses     7.4%   (135.8 σ)
+ omega_stress.lean async                maxrss           -9.5%   (-50.0 σ)
- rbmap                                  instructions      2.7%   (183.8 σ)
- rbmap_1                                instructions     20.4%   (216.0 σ)
- rbmap_1                                task-clock       41.5%    (20.3 σ)
- rbmap_1                                wall-clock       41.5%    (20.3 σ)
- rbmap_10                               instructions      7.7%   (198.6 σ)
- rbmap_fbip                             instructions      3.5%   (331.1 σ)
- rbmap_fbip                             task-clock       18.4%    (34.2 σ)
- rbmap_fbip                             wall-clock       18.4%    (32.3 σ)
- rbmap_library                          instructions      3.1%   (605.6 σ)
+ reduceMatch                            maxrss          -13.1%  (-105.2 σ)
+ riscv-ast.lean                         maxrss           -3.3%   (-22.8 σ)
- simp_arith1                            branches          3.8%    (91.9 σ)
- simp_arith1                            instructions      4.3%   (119.7 σ)
+ simp_arith1                            maxrss           -2.3%   (-39.1 σ)
+ stdlib                                 maxrss           -1.3%   (-33.0 σ)
- stdlib                                 wall-clock        2.1%    (25.7 σ)
+ tests/bench/ interpreted               maxrss           -2.4%  (-161.5 σ)
- tests/bench/ interpreted               task-clock        3.0%    (24.8 σ)
+ unionfind                              maxrss          -14.4%  (-143.3 σ)
+ workspaceSymbols                       maxrss           -2.8%   (-52.9 σ)

Kha changed the title ~~try mimalloc 3.0.3~~ chore: update to mimalloc 3.0.3 beta Apr 2, 2025

Kha marked this pull request as draft April 2, 2025 11:12

github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Apr 2, 2025

leanprover deleted a comment from leanprover-bot Apr 2, 2025

Kha force-pushed the push-qnqpwnuqmnnu branch from eb7a7be to 5e45ad6 Compare April 2, 2025 13:16

daanx added a commit to microsoft/mimalloc that referenced this pull request May 13, 2025

change default page_reclaim_max and change reclamation test to potent…

36e1cbf

…ially address perf regresion in mimalloc v3 with respect to v2 -- see also leanprover/lean4#7786

try mimalloc 3.0.3

484e3bc

Kha force-pushed the push-qnqpwnuqmnnu branch from 5e45ad6 to 484e3bc Compare May 14, 2025 09:55

fix

8f92420

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request May 14, 2025

Update lean-toolchain for testing leanprover/lean4#7786

201a761

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request May 14, 2025

Update lean-toolchain for testing leanprover/lean4#7786

9c595d8

leanprover-community-bot added the builds-mathlib CI has verified that Mathlib builds against this PR label May 14, 2025

tag v3.1.5

8d9b69c

leanprover-community-mathlib4-bot added a commit to leanprover-community/batteries that referenced this pull request Jun 20, 2025

Update lean-toolchain for leanprover/lean4#7786

44641c9

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Jun 20, 2025

Update lean-toolchain for leanprover/lean4#7786

9049e5f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: update to mimalloc 3.0.3 beta #7786

chore: update to mimalloc 3.0.3 beta #7786

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chore: update to mimalloc 3.0.3 beta #7786

Are you sure you want to change the base?

chore: update to mimalloc 3.0.3 beta #7786

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!