Distributed IDR and ParILUT algorithms? · Issue #1793 · ginkgo-project/ginkgo · GitHub

Distributed IDR and ParILUT algorithms? #1793


Open
saustinp opened this issue Feb 19, 2025 · 3 comments

@saustinp
saustinp commented Feb 19, 2025

Hi there,

I'm new to Ginkgo, but I'm quite interested in its potential for incorporation into my lab's CFD code. We research DG FE methods for large-scale simulations, particularly on multi-GPU systems. Currently, we hand-roll all of our solvers, but I'm interested in adapting our solver workflow to use high-performance, off-the-shelf math libraries targeted at GPUs, which is how I discovered Ginkgo.

I came across the IDR(s) solver and was very encouraged by its performance, both in memory utilization and robustness compared to GMRES:
https://icl.utk.edu/newsletter/presentations/2015/Ponce-IDR-Solver-for-MAGMA%20Sparse-Iter-Package-2015-05-15.pdf
https://icl.utk.edu/files/publications/2017/icl-utk-1369-2017.pdf

I notice that Ginkgo supports IDR(s), but as far as I can tell the package does not currently offer multi-GPU/distributed support for the IDR(s) solver. I'm wondering if this is a development objective, or if there are fundamental limitations of the algorithm that keep it from scaling to distributed systems.

In addition, I'm also interested in the ParILUT preconditioner. I use ILU a lot for smaller cases but would love to be able to use it on larger problems, especially on the GPU. In the above references, I also see that it appears to significantly enhance the performance of IDR(s). I'm wondering if there is a plan to port ParILUT to multi-GPU/distributed memory as well?

In summary, I'd like to solicit advice from you all on the following:

  1. What is your experience with IDR? It seems quite promising from the linked references, but I haven't seen it incorporated into any other packages like PETSc or Trilinos. Is this just because the IDR concept is still new and hasn't "taken off" yet, or have people observed lackluster performance in real problems and passed over it in favor of established alternatives like GMRES?
  2. If so, would it be possible to port IDR to multi-GPU/distributed memory similar to the rest of the algorithms in the Ginkgo solver collection?
  3. The same for ParILUT?

I appreciate your time and look forward to hopefully incorporating Ginkgo into my research!

@yhmtsai
Member
yhmtsai commented Feb 19, 2025

I think IDR can work in a distributed environment from the code perspective.
However, some of the parameters, like
$\alpha = p_i^H g_i / m_{i,i}$ and $m_{i,k} = p_i^H g_k$, should be computed globally, not just locally, to match the algorithm.
Trying it in a distributed setting might still be interesting, but for the reason above the behavior might not be the same as in the non-distributed case. It should give better performance per iteration, because it would not use global operations for these two formulas.
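
To make the local-versus-global distinction concrete, here is a minimal sketch in plain C++ (not Ginkgo code). It assumes real-valued vectors split into contiguous row blocks, one block per rank, and it emulates the sum all-reduce with a simple loop. The names `local_dot` and `global_dot` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product over the rows [begin, end): the part of p^T g that a single
// rank can compute from the entries it owns, without any communication.
double local_dot(const std::vector<double>& p, const std::vector<double>& g,
                 std::size_t begin, std::size_t end)
{
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i) {
        sum += p[i] * g[i];
    }
    return sum;
}

// Emulates a sum all-reduce over the ranks: rank r owns the rows
// [offsets[r], offsets[r + 1]), and the partial dot products are added up.
// In a real distributed run this sum is the global communication step.
double global_dot(const std::vector<double>& p, const std::vector<double>& g,
                  const std::vector<std::size_t>& offsets)
{
    double total = 0.0;
    for (std::size_t r = 0; r + 1 < offsets.size(); ++r) {
        total += local_dot(p, g, offsets[r], offsets[r + 1]);
    }
    return total;
}
```

With two ranks, each rank's `local_dot` only sees its own rows, so the two ranks would compute different values for the same $m_{i,k}$; only `global_dot` reproduces the value the serial algorithm uses.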

@MarcelKoch
Member

I would disagree with @yhmtsai here. Right now, IDR will definitely not work in the distributed setting. It will throw an exception if you try to use it with distributed vectors.
Getting it to run might require a few more changes. There are the reductions that @yhmtsai mentioned, which need to be adjusted, but there are also some global dense matrix * dense matrix products in there, which don't work well so far. But we can definitely look into it, if this is something you would require.

As for the ParILUT, it requires an SpGEMM operation, which we also don't support in our distributed setting. There is also the issue with distributed triangular solves, which are also not available right now.
You could still use the ParILUT on each MPI rank locally, by wrapping it with a gko::experimental::distributed::preconditioner::Schwarz. TBH, I think this will be the best approach for a while, since getting distributed triangular solves to work might be quite challenging and perhaps not worth it due to the poor parallelization potential.
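
The idea of applying a preconditioner only locally on each rank can be sketched in plain C++ as follows. This is not Ginkgo code and does not use the Schwarz class; it assumes non-overlapping subdomains made of contiguous row blocks and a dense row-major matrix, and it uses an exact dense solve as a stand-in for the ParILUT factorization and its triangular solves. The names `local_solve` and `apply_block_preconditioner` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Solves the dense n-by-n system block * x = rhs by Gaussian elimination
// without pivoting. Stand-in for the local preconditioner of one rank.
Vec local_solve(Vec block, Vec rhs)
{
    const std::size_t n = rhs.size();
    // Forward elimination to upper triangular form.
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t i = k + 1; i < n; ++i) {
            const double factor = block[i * n + k] / block[k * n + k];
            for (std::size_t j = k; j < n; ++j) {
                block[i * n + j] -= factor * block[k * n + j];
            }
            rhs[i] -= factor * rhs[k];
        }
    }
    // Back substitution.
    for (std::size_t k = n; k-- > 0;) {
        for (std::size_t j = k + 1; j < n; ++j) {
            rhs[k] -= block[k * n + j] * rhs[j];
        }
        rhs[k] /= block[k * n + k];
    }
    return rhs;
}

// Applies z = M^{-1} r, where M keeps only the diagonal blocks of the
// n-by-n row-major matrix a. Rank r owns rows [offsets[r], offsets[r + 1]).
// Couplings between ranks are ignored, so no communication is needed.
Vec apply_block_preconditioner(const Vec& a, const Vec& r,
                               const std::vector<std::size_t>& offsets)
{
    const std::size_t n = r.size();
    Vec z(n, 0.0);
    for (std::size_t rank = 0; rank + 1 < offsets.size(); ++rank) {
        const std::size_t begin = offsets[rank];
        const std::size_t m = offsets[rank + 1] - begin;
        Vec block(m * m);
        Vec rhs(m);
        // Extract the diagonal block and the matching part of r.
        for (std::size_t i = 0; i < m; ++i) {
            rhs[i] = r[begin + i];
            for (std::size_t j = 0; j < m; ++j) {
                block[i * m + j] = a[(begin + i) * n + (begin + j)];
            }
        }
        const Vec local_z = local_solve(block, rhs);
        for (std::size_t i = 0; i < m; ++i) {
            z[begin + i] = local_z[i];
        }
    }
    return z;
}
```

Because the off-diagonal couplings between ranks are dropped, the result is only an approximation of the full solve, which is the trade-off for avoiding distributed triangular solves.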

@saustinp
Author

Hi all,

Thank you for your quick and informative feedback! @MarcelKoch, it sounds like IDR needs some work to support the distributed environment. Do you think this is a feasible task for your team? On what timeframe do you think it could be done, and how would I go about formally requesting the feature?

Thank you for the tip regarding handling the distributed ParILUT. I will try that first and let you know if I have any follow-up questions.

I am pretty excited about the IDR solver and would love the chance to test it and Ginkgo's other solvers at scale against other solvers (both home-rolled and from the PETSc library).
