-
Notifications
You must be signed in to change notification settings - Fork 807
fix: Concurrent map read/write in Tree methods #1238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: previous-main
Are you sure you want to change the base?
Conversation
a4ab95c
to
065f7d5
Compare
@pjbgf Hello! Excuse me for tagging you, but is there any chance this change could be reviewed? |
@motoki317 thank you for your PR. Changes like this are better off targeting the Would you be able to provide a benchmark test showing the performance improvements? |
@pjbgf Thanks for taking a look at this. I don't think this change is a breaking one, but should I still target the v6 branch? I think it'd be better to import this into v5 (and maybe to v6 as well) if v6 is going to take a bit longer. For benchmark, I could try adding one, but I'm not sure how to go about benchmarking this. |
At least in my usecase, this change resulted in at least a few times faster processing time (a few hundreds ms to a few seconds gain), in most of real projects (I'm mining a lot of OSS repositories), using parallel calls to these methods. |
@motoki317 the
It is common for performance improvements to have side-effects, sometimes the slower processing times causes higher allocations, or memory usage, which leads to slower long-running applications due to GC pressure. For example this commit made clones 15% slower, however it reduced the memory cost by 77%. By running Go benchmarks and pprof we can understand what are the trade-offs before making such decisions. |
065f7d5
to
d80d2dc
Compare
@pjbgf Thank you for the response. I see, I have updated the base branch to As for the benchmark, I added a (rough-cut) benchmark code above, but I seem to be getting
This doesn't occur in my tool's usecase, so I may be missing something here, or it may be just that go-git's implementation is not really thread-safe at many places other than this. Nevertheless, when I ignore these errors, I obtained the following results. I run benchmarks on both main and in my branch, and both on
which is telling me that this change is making the code slower, at least for this benchmark code. However, I have checked twice now that this change indeed improves performance in my usecase, by checking performance before and after my tool's commit here. So, actually I am not sure what to do now. I'll probably keep using this patched go-git in my tool, and I may improve the "preload" implementation so it doesn't become a bottleneck in my tool and report it back here (when I have bandwidth to do so). |
@motoki317 thank you again for the follow-up on this.
That is correct. There are thread-safety issues and the idea is to start finding and fixing them as part of
Unfortunately, the failures could actually mean that not all code is being executed, which invalidate the benchmark results. Let me try to take a look at this in the coming days. |
The "tree path cache" internal field in Tree struct is preventing Tree methods to be called from multiple goroutines concurrently - producing "concurrent map read and map write" panics.
This PR disables this internal cache and allows the same Tree object methods to be called concurrently.
IMHO, this type of cache is a bad practice in Go and it is better to utilize multiple goroutines to make processing things faster.
Of course, unless the performance gain by this cache was measured thoroughly and this cache definitely improves performance, but I don't think this is the case.
In my project usecase, disabling this cache resulted in a faster performance of my tool. (see commit.)
Thanks!