8000 Parallel import scanning (python) by Peter554 · Pull Request #198 · seddonym/grimp · GitHub

Parallel import scanning (python) #198


Merged · 5 commits merged into seddonym:master on Apr 8, 2025

Conversation

@Peter554 (Contributor) commented on Apr 4, 2025:

This PR uses CPU parallelism to accelerate import scanning. For large codebases this should make building the graph much faster (on machines with multiple CPU cores).

On my machine (8 cores):

  • Before: Building graph for Kraken without cache took 46s.
  • After: Building graph for Kraken without cache takes 14s.

We hope to move more and more of this graph building code into rust, so this change may not live very long. From my perspective the change can still be valuable though:

  • It's a small change, so little effort.
  • It allows for a fairer comparison when moving this code to rust (is the speed up we later achieve simply due to parallelism, or something else more rust specific?).

A previous iteration by @duzumaki is here: #142. Compared to that PR:

  • Benchmarks suggest building without the cache is much faster, but building with the cache is a little slower #198 (comment)

@Peter554 Peter554 force-pushed the parallel-import-scanning branch 2 times, most recently from 2c2fa5d to 2352163 Compare April 4, 2025 08:18
codspeed-hq bot commented on Apr 4, 2025:

CodSpeed Instrumentation Performance Report

Merging #198 will improve performance by up to ×43

Comparing Peter554:parallel-import-scanning (ab2b66a) with master (23a1e85)

Summary

⚡ 3 improvements
✅ 19 untouched benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
test_build_django_from_cache_a_few_misses[15] 375.6 ms 173 ms ×2.2
test_build_django_from_cache_a_few_misses[350] 5,790.1 ms 295.7 ms ×20
test_build_django_uncached 6,147.9 ms 143.4 ms ×43

@Peter554 Peter554 force-pushed the parallel-import-scanning branch from 2352163 to bfc829a Compare April 4, 2025 08:52
@Peter554 Peter554 marked this pull request as ready for review April 4, 2025 09:09
@Peter554 Peter554 force-pushed the parallel-import-scanning branch from bfc829a to ee3f43c Compare April 4, 2025 09:10
Comment on lines 65 to 68
    def __reduce__(self):
        return SourceSyntaxError, (self.filename, self.lineno, self.text)
seddonym (Owner):
Probably worth adding a comment to explain why this is necessary.
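For context, here is a minimal sketch (not Grimp's actual class; the field names are assumed from the snippet above) of why a custom `__reduce__` is needed when an exception must cross process boundaries. Without it, the default exception pickling calls the class with `self.args`, which does not match the constructor signature:

```python
import pickle


class SourceSyntaxError(Exception):
    """Sketch of an exception with extra constructor arguments."""

    def __init__(self, filename: str, lineno: int, text: str) -> None:
        # Only a single message reaches Exception.__init__, so self.args
        # does not contain enough information for the default exception
        # pickling to rebuild the instance.
        super().__init__(f"Syntax error in {filename}, line {lineno}: {text}")
        self.filename = filename
        self.lineno = lineno
        self.text = text

    def __reduce__(self):
        # Tell pickle to call the class with the original constructor
        # arguments, so the error survives the round trip between a
        # worker process and the parent.
        return SourceSyntaxError, (self.filename, self.lineno, self.text)


err = pickle.loads(pickle.dumps(SourceSyntaxError("pkg/foo.py", 3, "import )")))
```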

@Peter554 (Contributor) commented on Apr 4, 2025:
[Screenshot: CodSpeed benchmark results, 2025-04-04]

@seddonym This doesn't seem believable ☝️ - I think it may have something to do with https://docs.codspeed.io/instruments/cpu/#system-calls

Benchmarks on my machine look more believable:

------- benchmark 'test_build_django_from_cache': 2 tests -------
Name (time in ms)                                  Mean
-----------------------------------------------------------------
test_build_django_from_cache (master)     36.0710 (1.0)
test_build_django_from_cache (NOW)              72.1804 (2.00)
-----------------------------------------------------------------

-------- benchmark 'test_build_django_uncached': 2 tests ---------
Name (time in ms)                                   Mean
------------------------------------------------------------------
test_build_django_uncached (NOW)                251.8374 (1.0)
test_build_django_uncached (master)     1,064.6040 (4.23)
------------------------------------------------------------------

@seddonym (Owner) left a comment:

This is fantastic, thank you! A real game changer.

I would like to address the degraded performance when the cache is warm - but it looks like that is really easily done. I'll open a separate PR to benchmark using the cache for small numbers of changes, and we can rebase off this to check there isn't degraded performance there too.

I'm not yet sold on introducing joblib - would prefer not to introduce third party dependencies unless there's a really strong reason. Could we try this using the standard library instead?
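For illustration, a hypothetical standard-library-only sketch of the fan-out (the helper names and the toy `scan_chunk` payload here are invented, not the PR's code), using `concurrent.futures`:

```python
from concurrent.futures import ProcessPoolExecutor


def scan_chunk(chunk: list[str]) -> dict[str, str]:
    # Stand-in for scanning a chunk of module files for their imports.
    return {module: f"imports of {module}" for module in chunk}


def scan_in_parallel(chunks: list[list[str]], max_workers: int) -> dict[str, str]:
    results: dict[str, str] = {}
    # executor.map pickles each chunk, runs scan_chunk in a worker
    # process, and yields the per-chunk results in order.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        for chunk_result in executor.map(scan_chunk, chunks):
            results.update(chunk_result)
    return results


if __name__ == "__main__":
    print(scan_in_parallel([["a", "b"], ["c"]], max_workers=2))
```

The trade-off discussed later in the thread is that `ProcessPoolExecutor`/`multiprocessing` worked but benchmarked slower than joblib for this workload.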


for found_package in found_packages:
    for module_file in found_package.module_files:
        module = module_file.module
modules_files_to_scan = {
seddonym (Owner):

Should this be module_files_to_scan?

@@ -184,3 +203,21 @@ def _is_external(module: Module, found_packages: Set[FoundPackage]) -> bool:
module.is_descendant_of(package_module) or module == package_module
for package_module in package_modules
)


def _create_chunks(module_files: List[ModuleFile], *, n_chunks: int) -> List[List[ModuleFile]]:
seddonym (Owner):

🐼 module_files could be Collection[ModuleFile] for more flexibility.

seddonym (Owner), also on _create_chunks:
🐼 Always sceptical when I see a list 😉 .

Maybe tuple[tuple[ModuleFile], ...] would be a slightly better return type as it's immutable?

Also considered frozenset[frozenset[ModuleFile]] but I wonder if hashing might incur a penalty here.

def _scan_imports(
    import_scanner: AbstractImportScanner,
    exclude_type_checking_imports: bool,
    module_files: List[ModuleFile],
seddonym (Owner):
Could be Iterable[ModuleFile] for greater flexibility.

for module_file in modules_files_to_scan.difference(imports_by_module_file):
    imports_by_module_file[module_file] = import_scanner.scan_for_imports(
        module_file.module, exclude_type_checking_imports=exclude_type_checking_imports
remaining_modules_files_to_scan = list(
seddonym (Owner):
I think the _scan_packages function has tipped over into becoming too long now - would you mind breaking it up a bit?

chunked_remaining_modules_files_to_scan = _create_chunks(
    remaining_modules_files_to_scan, n_chunks=N_CPUS
)
with parallel_config(n_jobs=N_CPUS):
seddonym (Owner):
When the cache is full, remaining_modules_files_to_scan is empty - if that's the case it's a waste of time to spin up these processes (and it noticeably degrades performance). At the very least we should wrap this in if remaining_modules_files_to_scan. I tried that locally and it recovered the degraded performance for a fully populated cache.

Also, n_jobs should be min(N_CPUS, len(remaining_module_files_to_scan)), right? I tried that locally and it improves the situation if I've only changed one file. It would be good to add a benchmark for that scenario as I don't think we currently measure it.

I'm not sure what the threshold is at which multiprocessing becomes worthwhile, but we could possibly improve on the logic a little by having a non-parallel track for small numbers of changes too. That said, I tried the approach of spinning up only one extra process for one changed module and it wasn't noticeably slower, so it's possibly not worth it.

@Peter554 Peter554 force-pushed the parallel-import-scanning branch from ee3f43c to 3f6fdc3 Compare April 4, 2025 16:30
@seddonym (Owner) commented on Apr 4, 2025:

Oh - just remembered one thing, would you mind adding a line to the CHANGELOG too?

@Peter554 Peter554 force-pushed the parallel-import-scanning branch 3 times, most recently from e6de21e to f16dff9 Compare April 4, 2025 19:27
@Peter554 (Contributor) left a comment:

@seddonym I've added a pure multiprocessing version as a fixup now. It does seem to be a bit slower.

  • When running tests locally I notice the difference in the overall time of the test suite.
  • When running benchmarks I see that it's quite a bit slower than joblib. Benchmarks on my machine:
------- benchmark 'test_build_django_from_cache':  --------------
Name (time in ms)                                         Mean
-----------------------------------------------------------------
test_build_django_from_cache (joblib)                     37.9565
test_build_django_from_cache (master)                     38.1530
test_build_django_from_cache (multiprocessing)            39.2882
-----------------------------------------------------------------

-------- benchmark 'test_build_django_uncached':  ----------------
Name (time in ms)                                         Mean
------------------------------------------------------------------
test_build_django_uncached (joblib)                       250.2343
test_build_django_uncached (multiprocessing)              421.2143
test_build_django_uncached (master)                     1,066.4864
------------------------------------------------------------------

I think it's likely okay to use joblib here:

  • joblib has 4000 stars on GitHub, so isn't going anywhere soon.
  • joblib has no dependencies beyond Python and emphasises robustness over features (see "design choices": https://joblib.readthedocs.io/en/stable/why.html#design-choices).
  • Worst case, if the dependency breaks we can simply remove it again and things get a bit slower.

So I'd suggest I remove that fixup commit and we go with the joblib version.

What do you think?

@Peter554 Peter554 requested a review from seddonym April 4, 2025 19:44
@Peter554 Peter554 force-pushed the parallel-import-scanning branch 6 times, most recently from 64c9f7d to 49ef093 Compare April 7, 2025 12:45
Peter554 added 3 commits April 8, 2025 13:36
This is needed to ensure that the error can be sent between processes.

This ensures that the test `test_syntax_error_includes_module` still passes
after using multiprocessing.
This is helpful, to avoid having to pass the large cache object
to the multiprocessing code.
@Peter554 Peter554 force-pushed the parallel-import-scanning branch 2 times, most recently from 613f0cd to 813881d Compare April 8, 2025 11:43
@Peter554 Peter554 force-pushed the parallel-import-scanning branch from 813881d to ab2b66a Compare April 8, 2025 14:31
@seddonym (Owner) left a comment:

Great stuff, thanks so much for this.

As discussed elsewhere, we should probably add an optimization for small numbers of modules where it's not worth multiprocessing. That will speed up the functional tests, if nothing else.

@@ -186,3 +182,38 @@ def _is_external(module: Module, found_packages: Set[FoundPackage]) -> bool:
module.is_descendant_of(package_module) or module == package_module
for package_module in package_modules
)


def _read_imports_from_cache(
seddonym (Owner):

Thanks for breaking this up, much easier to read.

seddonym (Owner):

🤔 I wonder what the overhead of this is. Probably tiny, but it might make sense to calculate it at runtime, only if we need it.

Maybe worth tweaking next time we're in this file.

@seddonym seddonym merged commit 30e36bc into seddonym:master Apr 8, 2025
18 checks passed
@@ -14,6 +15,8 @@
from ..domain.valueobjects import DirectImport, Module
from .config import settings

N_CPUS = multiprocessing.cpu_count()
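If the import-time `cpu_count()` call above ever proved costly, one way to defer it (a sketch, not the merged code) would be a cached function:

```python
import multiprocessing
from functools import cache


@cache
def n_cpus() -> int:
    # Computed on first call rather than at import time; cached so
    # repeated calls don't re-query the OS.
    return multiprocessing.cpu_count()
```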