Parallel import scanning (python) #198
Conversation
CodSpeed Instrumentation Performance Report: merging #198 will improve performance by ×43.
Benchmarks breakdown
src/grimp/exceptions.py (outdated)

```python
def __reduce__(self):
    return SourceSyntaxError, (self.filename, self.lineno, self.text)
```
Probably worth adding a comment to explain why this is necessary.
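For context, the reason this is necessary can be shown with a small sketch (hypothetical class names, not grimp's actual code): an exception whose `__init__` takes several arguments but passes only a single message string to `Exception.__init__` cannot be unpickled, because pickle rebuilds exceptions by calling the class with `self.args`. A custom `__reduce__` fixes the round-trip between worker processes.

```python
import pickle

class BrokenSyntaxError(Exception):
    """Stores extra attributes but passes only a message to Exception."""
    def __init__(self, filename, lineno, text):
        super().__init__(f"{filename}:{lineno}: {text}")
        self.filename, self.lineno, self.text = filename, lineno, text

class PicklableSyntaxError(BrokenSyntaxError):
    def __reduce__(self):
        # Rebuild the exception from its original constructor arguments
        # so it survives being sent between processes.
        return PicklableSyntaxError, (self.filename, self.lineno, self.text)

error = PicklableSyntaxError("a.py", 3, "import )")
clone = pickle.loads(pickle.dumps(error))
print(clone.filename, clone.lineno)  # → a.py 3

try:
    pickle.loads(pickle.dumps(BrokenSyntaxError("a.py", 3, "import )")))
except TypeError:
    print("unpickling fails without __reduce__")
```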
@seddonym This doesn't seem believable ☝️ - I think it may have something to do with https://docs.codspeed.io/instruments/cpu/#system-calls. Benchmarks on my machine look more believable:
This is fantastic, thank you! A real game changer.
I would like to address the degraded performance when the cache is warm - but it looks like that can be fixed quite easily. I'll open a separate PR to benchmark using the cache for small numbers of changes, and we can rebase off this to check there isn't degraded performance there too.
I'm not yet sold on introducing joblib - would prefer not to introduce third party dependencies unless there's a really strong reason. Could we try this using the standard library instead?
src/grimp/application/usecases.py (outdated)

```python
for found_package in found_packages:
    for module_file in found_package.module_files:
        module = module_file.module
modules_files_to_scan = {
```
Should this be `module_files_to_scan`?
src/grimp/application/usecases.py (outdated)

```python
def _create_chunks(module_files: List[ModuleFile], *, n_chunks: int) -> List[List[ModuleFile]]:
```
🐼 `module_files` could be `Collection[ModuleFile]` for more flexibility.
🐼 Always sceptical when I see a list 😉. Maybe `tuple[tuple[ModuleFile], ...]` would be a slightly better return type as it's immutable? Also considered `frozenset[frozenset[ModuleFile]]` but I wonder if hashing might incur a penalty here.
src/grimp/application/usecases.py (outdated)

```python
def _scan_imports(
    import_scanner: AbstractImportScanner,
    exclude_type_checking_imports: bool,
    module_files: List[ModuleFile],
```
Could be `Iterable[ModuleFile]` for greater flexibility.
src/grimp/application/usecases.py (outdated)

```python
for module_file in modules_files_to_scan.difference(imports_by_module_file):
    imports_by_module_file[module_file] = import_scanner.scan_for_imports(
        module_file.module, exclude_type_checking_imports=exclude_type_checking_imports
remaining_modules_files_to_scan = list(
```
I think the `_scan_packages` function has tipped over into becoming too long now - would you mind breaking it up a bit?
src/grimp/application/usecases.py (outdated)

```python
chunked_remaining_modules_files_to_scan = _create_chunks(
    remaining_modules_files_to_scan, n_chunks=N_CPUS
)
with parallel_config(n_jobs=N_CPUS):
```
When the cache is full, `remaining_modules_files_to_scan` is empty - if that's the case it's a waste of time to spin up these processes (and it noticeably degrades performance). At the very least we should wrap this in `if remaining_modules_files_to_scan:`. I tried that locally and it recovered the degraded performance for a fully populated cache.

Also, `n_jobs` should be `min(N_CPUS, len(remaining_module_files_to_scan))`, right? I tried that locally and it improves the situation if I've only changed one file. It would be good to add a benchmark for that scenario as I don't think we currently measure it.

I'm not sure what the threshold is at which multiprocessing becomes worthwhile, but we could possibly improve on the logic a little by having a non-parallel track for this too, for small numbers of changes. That said, I tried the approach of spinning up only one extra process for one changed module and it wasn't noticeably slower, so it's possibly not worth it.
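The two guards described above can be sketched as a small helper (`plan_parallel_scan` is a hypothetical name for illustration, not from the PR):

```python
import os

N_CPUS = os.cpu_count() or 1

def plan_parallel_scan(n_remaining, n_cpus=N_CPUS):
    """Return how many worker processes to use for scanning."""
    # Zero workers means: skip multiprocessing entirely, e.g. the cache
    # is fully warm and there is nothing left to scan.
    if n_remaining == 0:
        return 0
    # Never spin up more workers than there are files left to scan.
    return min(n_cpus, n_remaining)

print(plan_parallel_scan(0, n_cpus=8))    # → 0 (warm cache: no processes)
print(plan_parallel_scan(1, n_cpus=8))    # → 1 (one changed file: one worker)
print(plan_parallel_scan(500, n_cpus=8))  # → 8 (big rebuild: all cores)
```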
Oh - just remembered one thing, would you mind adding a line to the CHANGELOG too?
@seddonym I've added a pure multiprocessing version as a fixup now. It does seem to be a bit slower.
- When running tests locally I notice the difference in the overall time of the test suite.
- When running benchmarks I see that it's quite a bit slower than joblib. Benchmarks on my machine:
```
------- benchmark 'test_build_django_from_cache': ---------------
Name (time in ms)                                   Mean
-----------------------------------------------------------------
test_build_django_from_cache (joblib)            37.9565
test_build_django_from_cache (master)            38.1530
test_build_django_from_cache (multiprocessing)   39.2882
-----------------------------------------------------------------

-------- benchmark 'test_build_django_uncached': ----------------
Name (time in ms)                                   Mean
-----------------------------------------------------------------
test_build_django_uncached (joblib)             250.2343
test_build_django_uncached (multiprocessing)    421.2143
test_build_django_uncached (master)           1,066.4864
-----------------------------------------------------------------
```
I think it's likely okay to use joblib here:
- joblib has 4,000 stars on GitHub, so isn't going anywhere too soon.
- joblib has no dependencies beyond Python and emphasises robustness over features (see "design choices": https://joblib.readthedocs.io/en/stable/why.html#design-choices).
- Absolute worst case, if this dependency breaks we can just remove it again and things get a bit slower.
So I'd suggest I remove that fixup commit and we go with the joblib version.
What do you think?
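For comparison, a pure-standard-library version of the chunked scan might look roughly like this (illustrative only - `scan_chunk` is a stand-in for the real import scanning, and the names are hypothetical):

```python
import multiprocessing

def scan_chunk(module_names):
    # Stand-in for scanning a chunk of module files for imports;
    # here each module just "imports" one fake dependency.
    return {name: {f"{name}.dependency"} for name in module_names}

def scan_in_parallel(module_names, n_cpus=None):
    n_cpus = n_cpus or multiprocessing.cpu_count()
    # Round-robin split into at most n_cpus non-empty chunks.
    chunks = [c for c in (module_names[i::n_cpus] for i in range(n_cpus)) if c]
    if not chunks:
        # Nothing left to scan (e.g. warm cache): avoid process spin-up.
        return {}
    with multiprocessing.Pool(processes=len(chunks)) as pool:
        results = pool.map(scan_chunk, chunks)
    # Merge the per-chunk results back into one mapping.
    imports_by_module = {}
    for result in results:
        imports_by_module.update(result)
    return imports_by_module

if __name__ == "__main__":
    print(sorted(scan_in_parallel(["pkg.a", "pkg.b", "pkg.c"], n_cpus=2)))
    # → ['pkg.a', 'pkg.b', 'pkg.c']
```

One design note: `multiprocessing.Pool` pickles each chunk once per job, so chunking per CPU (rather than one job per module file) keeps the pickling overhead bounded.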
This is needed to ensure that the error can be sent between processes. This ensures that the test `test_syntax_error_includes_module` still passes after using multiprocessing.
This is helpful, to avoid having to pass the large cache object to the multiprocessing code.
Great stuff, thanks so much for this.
As discussed elsewhere, we should probably add an optimization for small numbers of modules where it's not worth multiprocessing. That will speed up the functional tests, if nothing else.
src/grimp/application/usecases.py

```python
def _read_imports_from_cache(
```
Thanks for breaking this up, much easier to read.
src/grimp/application/usecases.py

```python
from ..domain.valueobjects import DirectImport, Module
from .config import settings

N_CPUS = multiprocessing.cpu_count()
```
This PR uses CPU parallelism to accelerate import scanning. For large codebases this should make building the graph much faster (on machines with multiple CPU cores).

On my machine (8 cores):

We hope to move more and more of this graph building code into Rust, so this change may not live very long. From my perspective the change can still be valuable though:

A previous iteration by @duzumaki is here: #142. Changes compared to that PR:

- Fixes `test_syntax_error_includes_module` by making `SourceSyntaxError` pickleable. Fixes the issue here: Speed up import scanning by 400-500% #142 (comment)
- Avoids pickling the `import_scanner` object many times, e.g. as would happen if we created a job per module file. I believe this will fix the issue observed here: Speed up import scanning by 400-500% #142 (comment)
- Benchmarks suggest building without the cache is much faster, but building with the cache is a little slower: #198 (comment)