IO::Path.slurp: Read entire file contents at once #5867

ab5tract · 2025-04-28T07:15:00Z

We can easily pre-size the read in the call to nqp::readfh wherever it makes sense (i.e., for every path that isn't "-".IO).

We still fall back to the older behavior when the byte length is very small, or when the self.s value cannot be obtained at runtime for some other reason.
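
A minimal user-level Raku sketch of the idea. It uses `IO::Handle.read` as a stand-in for the nqp::readfh call the PR actually changes. `slurp-presized` and `FALLBACK-SLURP-SIZE` are hypothetical names; the latter stands in for the PR's fallback-slurp-size, whose real value isn't shown here. The sketch requests the whole file in one read when `.s` succeeds and falls back to fixed-size chunks otherwise:

```raku
# Sketch only: the PR itself changes the nqp::readfh call inside IO::Path.slurp.
# FALLBACK-SLURP-SIZE is a hypothetical stand-in for the PR's fallback-slurp-size.
constant FALLBACK-SLURP-SIZE = 0x10000;   # 64 KiB, an arbitrary choice

sub slurp-presized(IO::Path:D $path --> Blob:D) {
    # One extra stat() call; fall back to 0 if the size can't be determined.
    my $size = (try $path.s) // 0;
    # Never request less than the fallback chunk size.
    $size = $size max FALLBACK-SLURP-SIZE;

    my $fh  = $path.open(:bin);
    my $buf = Buf.new;
    # Usually one full-size read plus one empty read at EOF; the loop still
    # copes with files that grew after the stat() or whose size was unknown.
    loop {
        my $chunk = $fh.read($size);
        last unless $chunk.elems;
        $buf ~= $chunk;
    }
    $fh.close;
    $buf
}

my $demo = $*TMPDIR.add("presize-demo.txt");
$demo.spurt("hello, world");
say slurp-presized($demo).elems;   # 12
$demo.unlink;
```

Keeping the loop means a wrong or missing size only costs extra reads rather than truncating the result.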

@timo++ noticed this opportunity.

niner · 2025-04-28T08:04:11Z

This does another file system access though for the stat() call. Is that really faster than resizing the buffer as needed?

ab5tract · 2025-04-28T18:43:31Z

> This does another file system access though for the stat() call. Is that really faster than resizing the buffer as needed?

For large files I've seen a 2-3x reduction in ingestion time.

But indeed, a final version of such a feature would need some heuristics so that it only does this when it's "worth it".

Another option would be to put the onus entirely on the user, for instance with a new multi candidate for IO::Path.slurp (e.g., $path.IO.slurp(:presize)).
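
To illustrate the opt-in route, here is a hypothetical sketch of how a required named parameter can select its own multi candidate. `PathReader` and `:presize` are illustrative names that do not exist in Rakudo, and the method bodies only report which candidate ran:

```raku
# Hypothetical class: shows an opt-in :presize adverb routed to its own
# multi candidate. Neither PathReader nor :presize exists in Rakudo.
class PathReader {
    has IO::Path $.path is required;

    proto method slurp(|) {*}

    # Chosen only when :presize is passed; declared before the default,
    # like the required :$bin! candidate in IO::Path.slurp.
    multi method slurp(:$presize!) {
        $presize ?? "presized read of {$!path.basename}"
                 !! self.slurp                      # :!presize -> default path
    }

    # Default candidate: today's behavior, no extra stat() call.
    multi method slurp() {
        "chunked read of {$!path.basename}"
    }
}

my $r = PathReader.new(path => "AllPrintings.json".IO);
say $r.slurp;              # chunked read of AllPrintings.json
say $r.slurp(:presize);    # presized read of AllPrintings.json
```

With this shape, existing callers keep the current behavior and pay nothing extra; only code that explicitly asks for `:presize` takes on the stat() cost.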

ab5tract · 2025-04-28T18:52:18Z

With AllPrintings.json, a 512,160,792-byte file:

```
# r is the alias for the development raku executable
r -Mnqp -e 'my $start = INIT now; say "Loading file..."; my $content = "AllPrintings.json".IO.slurp; say "Took {now - $start} seconds to load file into memory.";'
Loading file...
Took 4.564159965 seconds to load file into memory.
```

vs

```
raku -e 'my $start = INIT now; say "Loading file..."; my $content = "AllPrintings.json".IO.slurp; say "Took {now - $start} seconds to load file into memory.";'
Loading file...
Took 10.147421301 seconds to load file into memory.
```

Of course, I expect that hardware- and software-based optimizations for caching frequently accessed data make this kind of benchmarking (already an inexact science) a bit trickier.

niner · 2025-04-28T19:00:25Z

As this is about a trade-off, we also need numbers on how this affects loading smaller files, which is arguably the far more common use case. What if the file is on remote storage like NFS, where round-trip times are much longer?

ugexe · 2025-04-28T22:12:56Z

You are referencing IO::Handle but all the code is IO::Path

ab5tract · 2025-04-29T06:19:00Z

> As this is about a trade-off we also need numbers for how this affects loading of smaller files - which is arguably the far more common use case.

Right, as I said in my previous post, we would need to do some extensive benchmarking.

> What if the file is on remote storage like NFS where roundtrip times are much larger?

Then I would think the base approach might also be suboptimal, in that you would want to request larger (but probably not entire) chunks at a time. But that's just a first reaction based on limited experience, so...

Anyway, this is probably a good reason to put the behavior behind an opt-in adverb. We could also emit some help text after any file read that takes longer than some threshold X (determined by the benchmarking above), informing the user that a faster read option may be available.
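
A rough sketch of the suggested hint, assuming a hypothetical `slurp-with-hint` wrapper and an arbitrary `SLOW-READ-SECONDS` threshold (the real X would come out of the benchmarking). It times an ordinary `.slurp` and writes the hint to STDERR via `note`:

```raku
# SLOW-READ-SECONDS is a hypothetical placeholder for the "X" threshold.
constant SLOW-READ-SECONDS = 5;

# Hypothetical wrapper: times an ordinary .slurp and suggests the opt-in mode.
sub slurp-with-hint(IO::Path:D $path, Real :$threshold = SLOW-READ-SECONDS --> Str:D) {
    my $start   = now;
    my $content = $path.slurp;
    my $elapsed = now - $start;
    # note() writes to STDERR, so the hint doesn't mix with normal output.
    note "Reading {$path.basename} took {$elapsed.round(0.01)}s; "
       ~ "a faster opt-in read mode may be available." if $elapsed > $threshold;
    $content
}

my $file = $*TMPDIR.add("hint-demo.txt");
$file.spurt("abc");
say slurp-with-hint($file);   # abc
$file.unlink;
```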

ab5tract · 2025-04-29T06:19:27Z

> You are referencing IO::Handle but all the code is IO::Path

Thanks for catching this. I've adjusted the title.

MasterDuke17 · 2025-04-30T22:23:27Z

src/core.c/IO/Path.rakumod

```diff
          ?? nqp::join("\n",nqp::split("\r\n",nqp::decode($blob,$encoding)))
          !! ""
    }

    proto method slurp() {*}
    multi method slurp(IO::Path:D: :$bin!) {
+        my $size = max try self.s, fallback-slurp-size;
```

FYI, infix max (i.e., self.s max fallback-slurp-size) is quite a bit faster than sub/method max.
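
A small illustration of the two spellings, using the AllPrintings.json size quoted above and a hypothetical fallback of 64 KiB (the real fallback-slurp-size value isn't shown). Both forms give the same result. Note that `try` as a statement prefix takes the whole expression after it, so the infix form needs parentheses to keep `max` outside the `try`:

```raku
my $stat-size     = 512160792;   # the AllPrintings.json size quoted above
my $fallback-size = 0x10000;     # hypothetical stand-in for fallback-slurp-size

say max($stat-size, $fallback-size);   # sub form:   512160792
say $stat-size max $fallback-size;     # infix form: 512160792

# `try $x max $y` would parse as `try ($x max $y)`, so parenthesize the try.
my $size = (try $stat-size) max $fallback-size;
say $size;                             # 512160792
```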

ab5tract added 2 commits April 28, 2025 09:11

Protect against self.s failures

d5c3dc9

ab5tract added IO performance labels Apr 28, 2025

ab5tract changed the title ~~IO::Handle.slurp: Read entire file contents at once~~ IO::Path.slurp: Read entire file contents at once Apr 29, 2025

MasterDuke17 reviewed Apr 30, 2025
