-
Notifications
You must be signed in to change notification settings - Fork 33
Read files with memory mapping #141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the c 8000 ommunity.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Turns out I was right with my I have also added a benchmark comparing using memory mapping to loading the whole file into memory, had to expose a bit of internals for that, hopefully that's acceptable. I also bumped iterations a bit so that the numbers are a bit more meaningful — let me know if you want that reduced again. Up to date benchmarksTimer precision: 20 ns
loader fastest │ slowest │ median │ mean │ samples │ iters
╰─ loader │ │ │ │ │
├─ file_via_loader 6.392 s │ 6.467 s │ 6.409 s │ 6.423 s │ 3 │ 9
╰─ file_via_loader_slow 7.078 s │ 7.145 s │ 7.085 s │ 7.103 s │ 3 │ 9
# Android build with memmap
n2: error: build.ninja:528568: input out/host/linux-x86/bin/go/androidmk/linux_glibc_x86_64/obj/androidmk missing
________________________________________________________
Executed in 6.98 secs fish external
usr time 4.84 secs 0.16 millis 4.84 secs
sys time 2.10 secs 1.06 millis 2.10 secs
# Android build without memmap
n2: error: build.ninja:528568: input out/host/linux-x86/bin/go/androidmk/linux_glibc_x86_64/obj/androidmk missing
________________________________________________________
Executed in 7.27 secs fish external
usr time 5.15 secs 0.03 millis 5.15 secs
sys time 2.09 secs 1.00 millis 2.09 secs |
pub struct Loader { | ||
pub graph: graph::Graph, | ||
_graph: Rc<RefCell<graph::Graph>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to wrap the graph like this, so I can share it with file_loader
. Not sure how kosher that is, but I felt not having to remember about converting FileId
to PathBuf
at the site is nice.
I've been looking into trying to see how
nix-ninja
(which re-usesn2
) would deal with building Android. Of course even if it works, it's ways off — but getting there would probably necessitate upstreaming their improvements. It turns out it probably won't happen on their end — pdtpartners/nix-ninja#27 (reply in thread) — so I've been looking into how hard it would be to do this myself. The less divergence from upstream, the better, after all.I've picked the lowest hanging fruit I could think of, which is reading files via memory mapping — the Android
build.ninja
I've generated is ~2GB, so it's bound to take a while to read (and it will also be a memory hog). Contrary to the Android fork I've used a pre-existing memory mapping crate, hopefully abstracting out any platform differences.I have also tried a few map-like data structures to associate
FileId
s to file handles and went with vanillaHashMap
, at least for now (the Android fork usedDashMap
for concurrency, but I guess I'll evaluate that when I start porting actual concurrency). The way the code is structured (split between aFileMap
trait and aFileLoader
that can be be parameterised by any implementation) stems from that (it made it easier to swap implementations for benchmarking). I think it might make sense to leave it like that, because I feel like it's going to be useful when comparing how different data structures work with concurrency, but if you feel strongly about inlining it at this point, I don't mind either. I have also based this onnix-ninja
changes, because I wanted to be able to test with it, but I don't mind rebasing it it on the master branch (or just waiting for those changes to be merged first)This is at least partially based on the existing Android change, so it shouldn't be too egregiously wrong (at least both
cargo test --release
and manual tests with the Android build file seem to work), but I also haven't done a lot of low-level coding since the university and while I've followed Rust for over a decade, I don't have a whole lot of practical experience in actually writing non-trivial code in it. So here there could be dragons. There's two things I'm aware might be iffy (but there could be more):MMAP_FIXED
(and it's as far as I can tell not cross-platform either). What I've done, is I also extended the mapping by a page, made all of it R/W, written 0 at the last byte of file and made it R/O again. While it works in tests, I'm not entirely sure it's correct — for one I have a nagging feeling that this only works as long as the file never ends at the last byte of the page. If it ever did and we had to write the null byte at the start of the next page, we'de probably get aSIGBUS
. Do you think that's a reasonable suspicion? If so, I guess I could first add support forMMAP_FIXED
to the memory mapping crate first and do exactly what the original PR had done,read_file
call site, so I've pushed resolving the file path fromFileId
to theFileLoader
. But that necessitated passing at least part of theLoader
below and to get it to work I had to useRc<RefCell<_>>
. Now, like I said my practical experience wit non-trivial Rust code is scarce, so I don't really have a good working intuition of borrow checker and when to use which smart pointer type. This is what had gotten the code working, but looks kinda eh with all thoseborrow_mut
s. If there's a better way to getFileLoader
to resolve paths itself (instead of having to remember to do that at each call site) without having to resort to smart pointer zoo, then I can do that instead, but I'm not quite sure what that would be. I guess I could pass a closure closing over the path resolve method instead, but I don't think that's allow me to sidestep the issue, it will just move theRc
s to a different place? But like I said, I don't have a working intution of the borrows checker, so maybe there's some obvious solution I'm not seeing.Here are some benchmarks I've done on my machine (Ryzen 9 5900HX, so not great, not terrible).
So it looks like there's about a second of a gain, for a relatively straightforward change. Hopefully this PR is not too terribly and acceptable for upstreaming. If there's something I could have done differently let me know, and I'll try to improve that.