-
-
Notifications
You must be signed in to change notification settings - Fork 375
IO::Path.slurp: Read entire file contents at once #5867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
We can easily pre-size the read volume in the call to nqp::readfh for the cases where it makes sense (ie., every path that isn't "-".IO). We still preserve the older behavior in case the byte length is super low or if there is some other interruption to getting the self.s value at runtime.
This does another file system access though for the stat() call. Is that really faster than resizing the buffer as needed? |
For large files I've seen a 2-3x reduction in ingestion time. But indeed, we would want a final version of such a feature to have some rubrics around only doing this when it's "worth it". Another option would be to put the onus on the user entirely, for instance with a new multi candidate for |
With
vs
Of course, I expect that hardware and software based optimizations around caching frequently accessed data from storage makes this kind of benchmarking (already an inexact science) a bit more tricky. |
As this is about a trade-off we also need numbers for how this affects loading of smaller files - which is arguably the far more common use case. What if the file is on remote storage like NFS where roundtrip times are much larger? |
You are referencing |
Right, as I said in my previous post, we would need to do some extensive benchmarking.
Then I would think the base approach might also be suboptimal in that you would want to request larger (but probably not entire) chunks at a time. But that's just a first reaction based on limited experience, so.. Anyway, this is probably a good reason to place the behavior behind an opt-in adverb. We could emit some help text at the end of a file read that takes > X (based on the benchmarking above) informing the user that there may be a faster read option. |
Thanks for catching this. I've adjusted the title. |
?? nqp::join("\n",nqp::split("\r\n",nqp::decode($blob,$encoding))) | ||
!! "" | ||
} | ||
|
||
proto method slurp() {*} | ||
multi method slurp(IO::Path:D: :$bin!) { | ||
my $size = max try self.s, fallback-slurp-size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, infix max (i.e., self.s max fallback-slurp-size
) is quite a bit faster than sub/method max.
We can easily pre-size the read volume in the call to nqp::readfh for the cases where it makes sense (ie., every path that isn't "-".IO).
We still preserve the older behavior in case the byte length is super low or if there is some other interruption to getting the self.s value at runtime.
@timo++ noticed this opportunity.