Embed parallelization into the multi_voxel_fit decorator. #2593
Hello @arokem, thank you for updating!
Comment last updated at 2024-06-29 20:43:35 UTC
Oh yeah - that's a great pointer! I'll try to incorporate the ideas you implemented there into this PR.

On Sun, May 8, 2022, Serge Koudoro wrote:
> Thank you for starting this @arokem. Have you looked at #1418? I think some ideas can be reused here.
Codecov Report

```
@@            Coverage Diff             @@
##           master    #2593      +/-   ##
==========================================
- Coverage   83.65%   83.64%   -0.02%
==========================================
  Files         152      152
  Lines       21374    21395      +21
  Branches     3459     3465       +6
==========================================
+ Hits        17881    17895      +14
- Misses       2629     2636       +7
  Partials      864      864
```
I ran a benchmark on a beefy 24-CPU compute server with the recent commit. I get a roughly 13x speedup for fitting the fwdti model.
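For anyone who wants to reproduce this kind of measurement, here is a minimal timing harness. It is not DIPY code: `fit_voxel` and `benchmark` are toy stand-ins for a per-voxel model fit, and the speedup you see will depend on your machine and the real cost of each fit.

```python
import time
from multiprocessing import Pool


def fit_voxel(signal):
    # Toy stand-in for a per-voxel model fit (e.g. fwdti): just burns CPU.
    return sum(x * x for x in signal)


def benchmark(data, n_jobs):
    """Fit every 'voxel' in `data` and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    if n_jobs == 1:
        results = [fit_voxel(v) for v in data]
    else:
        with Pool(n_jobs) as pool:
            results = pool.map(fit_voxel, data)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    data = [list(range(500)) for _ in range(2000)]
    serial, t_serial = benchmark(data, n_jobs=1)
    parallel, t_parallel = benchmark(data, n_jobs=4)
    assert serial == parallel  # parallelism must not change the results
    print(f"serial: {t_serial:.2f}s, parallel: {t_parallel:.2f}s")
```

The key check is the assertion: the parallel path must return exactly the same fits as the serial path.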
Does anyone understand why half the CI actions are still pending? They have been pending since Friday!
No, but I will restart them first.
Hi @arokem, it seems we have a new issue with the DIPY installation. I do not know yet what changed; the CIs are failing in all PRs.
Just rebased on top of #2595
Does anyone understand these CI failures? I don't think they are related to the content of the PR, but I might be missing something.
Both failures are on the parallelization CIs, with a memory-leak issue. This might be due to some of the parallel packages changing environment variable flags, and those flags could have an impact on this parallelized function. This is all supposition; it is just what comes to mind first.
The failing functions are using
Hey @skoudoro, I noticed that you did not pin the ray version in #2600, instead pinning only protobuf, but I am seeing this again on the CI: https://github.com/dipy/dipy/runs/6634820045?check_suite_focus=true#step:9:119, which suggests to me that I should pin ray to 0.11 for now. Does that make sense to you? I'll give it a try here.
Or possibly 1.11.1
We're back to this failure: https://github.com/dipy/dipy/runs/6645881563?check_suite_focus=true#step:9:3751 Interestingly, I can't get this to fail locally on my machine (in an env with dask, ray and joblib installed). I also don't exactly understand how this is related to openmp. Does
dipy/reconst/multi_voxel.py

```python
single_voxel_with_self = partial(single_voxel_fit, self)
n_jobs = kwargs.get("n_jobs", multiprocessing.cpu_count() - 1)
vox_per_chunk = np.max([data_to_fit.shape[0] // n_jobs, 1])
chunks = [data_to_fit[ii:ii + vox_per_chunk]
```
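To see what this chunking does concretely, here is a self-contained sketch of the same logic (the `chunk_voxels` helper name and the toy data are mine; in the PR this logic lives inline in the decorator):

```python
import multiprocessing

import numpy as np


def chunk_voxels(data_to_fit, n_jobs=None):
    """Split a (n_voxels, n_measurements) array into roughly n_jobs chunks."""
    if n_jobs is None:
        n_jobs = max(multiprocessing.cpu_count() - 1, 1)
    vox_per_chunk = max(data_to_fit.shape[0] // n_jobs, 1)
    return [data_to_fit[ii:ii + vox_per_chunk]
            for ii in range(0, data_to_fit.shape[0], vox_per_chunk)]


data = np.zeros((10, 4))             # 10 "voxels", 4 measurements each
chunks = chunk_voxels(data, n_jobs=3)
print(len(chunks), chunks[0].shape)  # 4 (3, 4)
```

Note that each chunk is a numpy view into `data`, so building the list itself does not copy; any duplication would happen later, when chunks are pickled and shipped to worker processes, which is relevant to the memory concern raised in review.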
This might duplicate memory. Need to benchmark.
We can use this: https://pypi.org/project/memory-profiler/
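memory-profiler works well for line-by-line numbers; as a quick stdlib alternative, `tracemalloc` can check whether the chunking step itself allocates copies. A toy sketch (not DIPY code; `fit_chunks` is a stand-in for the chunked fit):

```python
import tracemalloc

import numpy as np


def fit_chunks(data):
    # Stand-in for the chunked fit: slicing a numpy array makes views, not copies.
    chunks = [data[i:i + 1000] for i in range(0, data.shape[0], 1000)]
    return sum(c.shape[0] for c in chunks)


data = np.zeros((100_000, 32))   # ~25 MB of signal data
tracemalloc.start()
n = fit_chunks(data)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(n, peak < data.nbytes)     # 100000 True: the views did not duplicate the array
```

If a chunking strategy did copy (e.g. `np.array_split` followed by `np.ascontiguousarray`), the peak would jump to roughly the size of the input.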
Plan to make progress here:
For now, it works with `engine in ["serial", "dask"]`:

- my laptop crashes with `ray`
- see below for the joblib issue.

I will share the timings when those two are fixed.
Thanks @arokem
dipy/reconst/multi_voxel.py

```python
_parallel_fit_worker,
chunks,
func_args=[single_voxel_with_self],
**kwargs)
```
`dask` did not complain, but `joblib` fails with this:

```
TypeError: __init__() got an unexpected keyword argument 'vox_per_chunk'
```

We need to update the `paramap` function.
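One generic way to fix this in `paramap` is to forward only the keyword arguments the backend actually accepts. A sketch of the idea (the `filter_kwargs` helper and `Backend` class are hypothetical stand-ins, not joblib's API):

```python
import inspect


def filter_kwargs(func, kwargs):
    """Keep only the keyword arguments that `func` accepts."""
    accepted = inspect.signature(func).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}


class Backend:
    # Stand-in for joblib.Parallel: accepts n_jobs but not vox_per_chunk.
    def __init__(self, n_jobs=1):
        self.n_jobs = n_jobs


opts = {"n_jobs": 4, "vox_per_chunk": 100}
backend = Backend(**filter_kwargs(Backend.__init__, opts))
print(backend.n_jobs)  # 4 -- 'vox_per_chunk' is dropped instead of raising TypeError
```

The caveat is that this silently drops misspelled options too; an explicit allow-list per backend is stricter.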
Co-authored-by: Oscar Esteban <code@oscaresteban.es>
Rebased. Could you help me with the CI? I am not sure how to check that we're in the "optional" case.
> Rebased. Could you help me with the CI? I am not sure how to check that we're in the "optional" case.

Thanks! It should be something like the suggestion below. I hope there is no typo in this syntax.
Co-authored-by: Serge Koudoro <skab12@gmail.com>
Thanks!

On Fri, Jun 28, 2024, Serge Koudoro commented on this pull request:

> Thanks! It should be something like the suggestion below. I hope there is no typo in this syntax.

In `.github/workflows/test_template.yml`:

```yaml
      - name: Install HDF5 and pytables on macOS
        if: runner.os == 'macOS'
```

Suggested change:

```diff
-        if: runner.os == 'macOS'
+        if: ${{ (runner.os == 'macOS') && (env.EXTRA_DEPENDS != '') && (env.INSTALL_TYPE == 'pip') }}
```
Hi @arokem,
I was going to merge, but I saw that a previous comment had been removed but not addressed. See the suggestion below, and then I can go ahead.
Thank you
Co-authored-by: Serge Koudoro <skab12@gmail.com>
Sorry about that and thanks for spotting this.
Okay, all good, I will go ahead and merge this PR.
Thank you for this amazing work @arokem and @asagilmore!
NOTE: we need to document this in detail somewhere. It will take time, so for now I recommend a follow-up PR to add a small recipe to the DIPY documentation (see https://docs.dipy.org/stable/recipes). Just add a small section, "How to accelerate the fitting in all DIPY reconstruction models?", or something similar.
We're writing up a report about this here and we'd be happy to have input on the results and ideas that we are developing there (the repo for that report is here: https://github.com/nrdg/2024-dipy-parallelization).
Where can we find the comparison with joblib, dask and ray? I see a lot of work with Ray, and it is clearly the recommended backend. However, it would be great to see the advantage over the other backends. Thank you for your feedback.
Following up with my questions above @arokem and @asagilmore
I've started playing around with the idea that the multi_voxel_fit decorator could use `paramap` instead of iterating over voxels. If we can make this work generally, that would be pretty cool. So far, I've only tested this with the fwdti model, and in that case the additional changes to the code are rather minimal, which gives me hope that we might be able to use this wherever we use this decorator: in csd, dsi, forecast, fwdti, gqi, ivim, mapmri, mcsd, qtdmri, and shore (!).
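For readers who have not seen `paramap`, the shape of the idea is roughly this. This is a simplified stand-in, not DIPY's actual implementation: `paramap_sketch` and `single_voxel_fit` here are toy names, and the real function supports multiple engines (serial, joblib, dask, ray) rather than just processes.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def paramap_sketch(func, in_list, n_jobs=1):
    """Apply `func` to each element of `in_list`, serially or across processes."""
    if n_jobs == 1:
        return [func(x) for x in in_list]
    with ProcessPoolExecutor(max_workers=n_jobs) as ex:
        return list(ex.map(func, in_list))


def single_voxel_fit(voxel):
    # Stand-in for a per-voxel model fit: return the mean signal.
    return float(np.mean(voxel))


if __name__ == "__main__":
    data = np.arange(24.0).reshape(6, 4)   # 6 "voxels", 4 measurements each
    params = paramap_sketch(single_voxel_fit, list(data), n_jobs=2)
    print(params == [float(v.mean()) for v in data])  # True
```

The decorator's job is then just bookkeeping: reshape the volume to a voxel list, call a paramap-style helper with the chosen engine, and reshape the fitted parameters back.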