8000 IO::Path.slurp: Read entire file contents at once by ab5tract · Pull Request #5867 · rakudo/rakudo · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

IO::Path.slurp: Read entire file contents at once #5867

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ab5tract
8000 Copy link
Collaborator

We can easily pre-size the read volume in the call to nqp::readfh for the cases where it makes sense (ie., every path that isn't "-".IO).

We still preserve the older behavior in case the byte length is super low or if there is some other interruption to getting the self.s value at runtime.

@timo++ noticed this opportunity.

We can easily pre-size the read volume in the call to nqp::readfh
for the cases where it makes sense (ie., every path that isn't
"-".IO).

We still preserve the older behavior in case the byte length is
super low or if there is some other interruption to getting the
self.s value at runtime.
@niner
Copy link
Collaborator
niner commented Apr 28, 2025

This does another file system access though for the stat() call. Is that really faster than resizing the buffer as needed?

@ab5tract
Copy link
Collaborator Author

This does another file system access though for the stat() call. Is that really faster than resizing the buffer as needed?

For large files I've seen a 2-3x reduction in ingestion time.

But indeed, we would want a final version of such a feature to have some rubrics around only doing this when it's "worth it".

Another option would be to put the onus on the user entirely, for instance with a new multi candidate for IO::Path.slurp (eg, $path.IO.slurp(:presize)).

8000

@ab5tract
Copy link
Collaborator Author
ab5tract commented Apr 28, 2025

With AllPrintings.json, a 512160792 byte file.

# r is the alias for the development raku executable
r -Mnqp -e 'my $start = INIT now; say "Loading file..."; my $content = "AllPrintings.json".IO.slurp; say "Took {now - $start} seconds to load file into memory.";'
Loading file...
Took 4.564159965 seconds to load file into memory.

vs

raku -e 'my $start = INIT now; say "Loading file..."; my $content = "AllPrintings.json".IO.slurp; say "Took {now - $start} seconds to load file into memory.";'
Loading file...
Took 10.147421301 seconds to load file into memory.

Of course, I expect that hardware and software based optimizations around caching frequently accessed data from storage makes this kind of benchmarking (already an inexact science) a bit more tricky.

@niner
Copy link
Collaborator
niner commented Apr 28, 2025

As this is about a trade-off we also need numbers for how this affects loading of smaller files - which is arguably the far more common use case. What if the file is on remote storage like NFS where roundtrip times are much larger?

@ugexe
Copy link
Member
ugexe commented Apr 28, 2025

You are referencing IO::Handle but all the code is IO::Path

@ab5tract ab5tract changed the title IO::Handle.slurp: Read entire file contents at once IO::Path.slurp: Read entire file contents at once Apr 29, 2025
@ab5tract
Copy link
Collaborator Author

As this is about a trade-off we also need numbers for how this affects loading of smaller files - which is arguably the far more common use case.

Right, as I said in my previous post, we would need to do some extensive benchmarking.

What if the file is on remote storage like NFS where roundtrip times are much larger?

Then I would think the base approach might also be suboptimal in that you would want to request larger (but probably not entire) chunks at a time. But that's just a first reaction based on limited experience, so..

Anyway, this is probably a good reason to place the behavior behind an opt-in adverb. We could emit some help text at the end of a file read that takes > X (based on the benchmarking above) informing the user that there may be a faster read option.

@ab5tract
Copy link
Collaborator Author

You are referencing IO::Handle but all the code is IO::Path

Thanks for catching this. I've adjusted the title.

?? nqp::join("\n",nqp::split("\r\n",nqp::decode($blob,$encoding)))
!! ""
}

proto method slurp() {*}
multi method slurp(IO::Path:D: :$bin!) {
my $size = max try self.s, fallback-slurp-size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, infix max (i.e., self.s max fallback-slurp-size) is quite a bit faster than sub/method max.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0