Description
Hi, I'm exploring Rust as an extension language for Ruby to handle a lot of expensive calculations. I made a project that implements a simple financial algorithm in six ways (Ruby, C, Helix as both an instance method and a class method, ruru, and FFI). They are all implemented with the minimum viable code that lets Ruby call a Rust function, cash_flow.
The benchmark for Helix was surprising, and I'm curious what is unique about Helix that causes its function calls to return to Ruby so slowly compared with the other methods.
As you can see from the numbers below, Helix's iterations per second when called from Ruby are less than half those of ruru, C, and plain Ruby for a simple function:
Warming up --------------------------------------
ruby method            203479 i/100ms
rust helix instance    120885 i/100ms
rust helix class       121661 i/100ms
rust ffi class         161558 i/100ms
rust ruru class        199846 i/100ms
c class                221703 i/100ms
Calculating -------------------------------------
                       iterations per second      total iterations   time
ruby method            4966573.4 (±6.8%) i/s  -   24824438           in 5.022462s
rust helix instance    1875397.8 (±6.1%) i/s  -    9429030           in 5.046921s
rust helix class       1852779.7 (±5.9%) i/s  -    9246236           in 5.008588s
rust ffi class         3082134.8 (±8.1%) i/s  -   15348010           in 5.019943s
rust ruru class        4275527.6 (±6.2%) i/s  -   21383522           in 5.021156s
c class                5483016.5 (±6.0%) i/s  -   27491172           in 5.032074s
However, when running a criterion benchmark for the function within the Rust repository, the performance is superb:
Benchmarking cash_flow
Benchmarking cash_flow: Warming up for 3.0000 s
Benchmarking cash_flow: Collecting 100 samples in estimated 5.0000 s (955,252,950 iterations)
Benchmarking cash_flow: Analyzing
cash_flow time: [5.2014 ns 5.2626 ns 5.3245 ns]
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
slope [5.2014 ns 5.3245 ns] R^2 [0.8278649 0.8276527]
mean [5.2079 ns 5.3387 ns] std. dev. [268.40 ps 395.58 ps]
median [5.1706 ns 5.3213 ns] med. abs. dev. [208.08 ps 356.25 ps]
This is a significant gap between the cost of the function itself and the cost of whatever Helix does to connect Ruby to Rust. Obviously interop will always cost some performance, but as you can see, ruru and C were roughly on par with plain Ruby, while Helix was not.
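To put the two sets of numbers on the same scale, here is a rough sketch in Ruby that converts each iterations-per-second figure above into nanoseconds per call and subtracts the criterion point estimate for cash_flow (5.2626 ns). The helper name `ns_per_call` is just for illustration, and the result is only an approximation: each Ruby-side figure also includes the benchmark loop's own overhead, so "time outside cash_flow" is an upper bound on the interop cost rather than a precise measurement.

```ruby
# Throughput figures (iterations per second) copied from the Ruby benchmark above.
RATES = {
  "ruby method"         => 4_966_573.4,
  "rust helix instance" => 1_875_397.8,
  "rust helix class"    => 1_852_779.7,
  "rust ffi class"      => 3_082_134.8,
  "rust ruru class"     => 4_275_527.6,
  "c class"             => 5_483_016.5
}.freeze

# Criterion's point estimate for one cash_flow call inside Rust, in nanoseconds.
CRITERION_NS = 5.2626

# Hypothetical helper: convert iterations per second into nanoseconds per call.
def ns_per_call(ips)
  1_000_000_000.0 / ips
end

RATES.each do |label, ips|
  total = ns_per_call(ips)
  # Whatever is not spent inside cash_flow is call, conversion, and loop overhead.
  overhead = total - CRITERION_NS
  puts format("%-20s %7.1f ns/call  (%.1f ns outside cash_flow)", label, total, overhead)
end
```

By this estimate a Helix call takes on the order of 530 ns from Ruby, against roughly 230 ns for ruru and 180 ns for the C extension, so almost all of the time is spent outside the function itself.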
I want to dig deeper into this because Helix had the best API and usability of all the methods I tried, but I want to know exactly why its performance lags before we implement critical code with it. Any ideas? Thank you!