Hi,
I ran the following benchmark on an Apple M3 Max:
Mix.install([:flow, :benchee])

Benchee.run(%{
  "sequential execution" => fn ->
    1..1_000_000
    |> Enum.map(fn _ -> :crypto.strong_rand_bytes(1000) end)
    |> Enum.map(&Base.encode32(&1, case: :lower))
  end,
  "parallel execution" => fn ->
    1..1_000_000
    |> Flow.from_enumerable()
    |> Flow.map(fn _ -> :crypto.strong_rand_bytes(1000) end)
    |> Flow.map(&Base.encode32(&1, case: :lower))
    |> Enum.to_list()
  end
})
With Erlang/OTP installed via asdf, the output is as follows:
Operating System: macOS
CPU Information: Apple M3 Max
Number of Available Cores: 16
Available memory: 128 GB
Elixir 1.17.3
Erlang 27.1.2
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s
Benchmarking parallel execution ...
Benchmarking sequential execution ...
Calculating statistics...
Formatting results...
Name ips average deviation median 99th %
parallel execution 2.59 0.39 s ±21.20% 0.35 s 0.53 s
sequential execution 0.35 2.89 s ±3.14% 2.89 s 2.95 s
Comparison:
parallel execution 2.59
sequential execution 0.35 - 7.46x slower +2.50 s
However, the log of the execution with the community-maintained pre-compiled Erlang/OTP for macOS is as follows:
Operating System: macOS
CPU Information: Apple M3 Max
Number of Available Cores: 16
Available memory: 128 GB
Elixir 1.17.3
Erlang 27.1.2
JIT enabled: true
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s
Benchmarking parallel execution ...
Benchmarking sequential execution ...
Calculating statistics...
Formatting results...
Name ips average deviation median 99th %
sequential execution 0.33 3.03 s ±3.33% 3.03 s 3.10 s
parallel execution 0.186 5.37 s ±0.00% 5.37 s 5.37 s
Comparison:
sequential execution 0.33
parallel execution 0.186 - 1.77x slower +2.34 s
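Parallel execution of this benchmark seems far too slow with the pre-compiled build: it is 1.77x slower than sequential execution, whereas with the asdf-installed build it is 7.46x faster. I believe this is an issue.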
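To help narrow down what differs between the two builds, a small diagnostic along these lines could be run under each installation. This is only a sketch: the module name `BuildInfo` is hypothetical, and whether any of these values actually explains the slowdown is an assumption, not something established above.

```elixir
# Hypothetical helper (not part of Flow or Benchee): collects runtime
# details that might differ between two Erlang/OTP builds.
defmodule BuildInfo do
  def collect do
    %{
      # Scheduler threads online; Flow's default :stages is based on this value.
      schedulers_online: System.schedulers_online(),
      # Logical processors detected by the runtime (may be :unknown on some systems).
      logical_processors: :erlang.system_info(:logical_processors),
      # :jit or :emu, depending on how the runtime was built.
      emu_flavor: :erlang.system_info(:emu_flavor),
      # Target triple the runtime was compiled for, converted from a charlist.
      system_architecture: List.to_string(:erlang.system_info(:system_architecture)),
      # Name and version of the crypto library that :crypto is linked against.
      crypto_lib: :crypto.info_lib()
    }
  end
end

IO.inspect(BuildInfo.collect(), label: "build info")
```

Comparing the two printed maps side by side would show whether the builds differ in scheduler count, target architecture, or the crypto library behind `:crypto.strong_rand_bytes/1`.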