Add Distributed Multigrid. by pratikvn · Pull Request #1269 · ginkgo-project/ginkgo · GitHub

Add Distributed Multigrid. #1269


Merged
merged 6 commits into develop on Apr 19, 2024

Conversation

pratikvn
Member
@pratikvn pratikvn commented Feb 6, 2023

This PR updates the multigrid class to handle distributed matrices, and hence allows using multigrid as a preconditioner or solver in a distributed setting.

Major changes

  1. Store row and column partition objects in the Matrix class to use within Multigrid.
  2. Template the memory allocation and multigrid core functions on VectorType and allow dynamic switching between the non-distributed and distributed vector types (a sketch of this dispatch follows below).
  3. Store matrix_data object in the distributed matrix class to be able to generate coarse matrices.

Of course, as there is no distributed coarse-grid generation method yet, we still cannot use distributed multigrid out of the box, but that will be added in a future PR.
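To make point 2 concrete, here is a minimal sketch (not the PR's actual implementation) of templating the work-vector allocation on VectorType and switching between the dense and the distributed vector at runtime. The helper names `create_workspace` and `is_distributed` are hypothetical, and MPI support (GINKGO_BUILD_MPI) is assumed.

```cpp
#include <memory>
#include <type_traits>

#include <ginkgo/ginkgo.hpp>

// Hypothetical helpers, for illustration only.
using dist_vec = gko::experimental::distributed::Vector<double>;
using dense_vec = gko::matrix::Dense<double>;
using dist_mtx = gko::experimental::distributed::Matrix<double, int, gko::int64>;

// Allocate a single-column work vector matching the given system matrix.
template <typename VectorType>
std::unique_ptr<VectorType> create_workspace(
    std::shared_ptr<const gko::Executor> exec, const gko::LinOp* mtx)
{
    if constexpr (std::is_same_v<VectorType, dist_vec>) {
        // Distributed path: global size from the matrix, local size from its
        // local block (this is what the local-size query is needed for).
        auto dist = gko::as<dist_mtx>(mtx);
        return dist_vec::create(
            exec, dist->get_communicator(),
            gko::dim<2>{dist->get_size()[0], 1},
            gko::dim<2>{dist->get_local_matrix()->get_size()[0], 1});
    } else {
        // Non-distributed path: a plain dense vector.
        return dense_vec::create(exec, gko::dim<2>{mtx->get_size()[0], 1});
    }
}

// Runtime switch between the two instantiations, based on the matrix type.
inline bool is_distributed(const gko::LinOp* mtx)
{
    return dynamic_cast<const gko::experimental::distributed::DistributedBase*>(
               mtx) != nullptr;
}
```

The solver core would then pick the distributed or dense instantiation based on `is_distributed(system_matrix)`.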

Points of discussion

  1. We probably need to store the partition objects in the distributed matrix class, but I am open to other alternatives.
  2. For convenience, we also probably need to store the matrix_data object (or device_matrix_data), but I am not very happy about this.

Issues

  1. The mixed precision version of distributed multigrid does not yet work and needs to be looked into.

@pratikvn pratikvn added mod:core This is related to the core module. type:multigrid This is related to multigrid 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. type:distributed-functionality labels Feb 6, 2023
@pratikvn pratikvn self-assigned this Feb 6, 2023
@ginkgo-bot ginkgo-bot added the type:solver This is related to the solvers label Feb 6, 2023
@MarcelKoch MarcelKoch self-requested a review February 6, 2023 13:07
@upsj upsj requested review from upsj, greole and yhmtsai February 6, 2023 13:29
@greole
Collaborator
greole commented Feb 9, 2023

Would that allow using Multigrid as a preconditioner without Schwarz?

@pratikvn pratikvn force-pushed the dist-schwarz branch 4 times, most recently from 8274219 to fc978a0 on February 9, 2023 14:15
@MarcelKoch
Member

Right now, you use the partition only to get the local size of the matrix, which you can also get from the local matrix. The stored matrix data is not used at all. I would suggest removing these until they are actually necessary.

@pratikvn pratikvn force-pushed the dist-schwarz branch 2 times, most recently from cf7cbcf to de2adf6 on February 10, 2023 14:13
Base automatically changed from dist-schwarz to develop February 11, 2023 12:27
@pratikvn
Member Author

@greole, yes, but a coarse-grid generation algorithm (one that is distributed-capable) is necessary. Meaning we need the equivalent of AMGX, which generates the triplet (R, A_c, P).
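As a rough illustration of the intended usage (assuming the existing Multigrid factory interface and a distributed-capable PGM level such as the one later proposed in #1403; the parameter values are placeholders):

```cpp
#include <memory>

#include <ginkgo/ginkgo.hpp>

using dist_mtx = gko::experimental::distributed::Matrix<double, int, gko::int64>;

// Build a distributed multigrid solver; only meaningful once a
// distributed-capable coarsening method (e.g. a distributed PGM) exists.
std::unique_ptr<gko::solver::Multigrid> make_distributed_mg(
    std::shared_ptr<const gko::Executor> exec, std::shared_ptr<dist_mtx> A)
{
    auto factory =
        gko::solver::Multigrid::build()
            // The mg_level generates the (R, A_c, P) triplet on each level.
            .with_mg_level(gko::multigrid::Pgm<double, int>::build())
            .with_max_levels(9u)
            .with_min_coarse_rows(64u)
            .with_criteria(gko::stop::Iteration::build().with_max_iters(100u))
            .on(exec);
    return factory->generate(A);
}
```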

@pratikvn
Member Author

@MarcelKoch, yes, I don't intend to merge this yet. I just wanted to show what changes could be necessary. At present, we do not need the partition or the matrix_data.

Member
yhmtsai commented Feb 19, 2023

@pratikvn, could you rebase this? I think some of the changes are related to the Schwarz PR.

@pratikvn pratikvn force-pushed the dist-mg-update branch 2 times, most recently from 55ad491 to ce4d26a on February 20, 2023 07:57
Comment on lines 525 to 526
this->run_mg_cycle<VectorType>(cycle, level + 1, next_level_matrix, g.get(),
e.get(), next_mode);
Member

It cannot be VectorType: different levels may have different precisions.

#endif


template <typename VectorType>
Member

Same issue here.

@MarcelKoch MarcelKoch removed their request for review October 25, 2023 09:07
@yhmtsai yhmtsai force-pushed the dist-mg-update branch 2 times, most recently from 3f09518 to a3f7ba4 on January 18, 2024 08:50
@MarcelKoch MarcelKoch requested a review from greole April 5, 2024 09:24
@MarcelKoch MarcelKoch added this to the Ginkgo 1.8.0 milestone Apr 5, 2024
@yhmtsai yhmtsai assigned yhmtsai and unassigned pratikvn Apr 8, 2024
@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Apr 8, 2024
Member Author
@pratikvn pratikvn left a comment

Some comments.


auto mg = mg_factory->generate(mtx);

ASSERT_NO_THROW(mg->apply(b, x));
Member Author

Maybe a test that shows that the result is the same as in the non-distributed case would also make sense?

Member

I think the distributed coarsening method is usually different from the non-distributed one?

Member Author

As you are testing the solver here, I think checking if the two solutions are within some tolerance makes sense.

Member

Ah, in this situation I cannot do that, because we do not have an actual distributed coarsening method yet.
I think the current test should be enough, because the distributed multigrid itself only handles the vector allocation. These tests ensure the allocation is correct and the corresponding apply works correctly.

Member Author

Ah, yes. Maybe make a note/TODO here for now; it should be okay to add it later.
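For reference, a sketch of what such a tolerance comparison could look like once a distributed coarsening method exists. The fixture members (dist_mg_factory, non_dist_mg_factory, dist_mtx, non_dist_mtx, dist_b, dist_x, b, x) and the local_rows span are hypothetical, modeled on Ginkgo's existing distributed solver tests; this is not part of this PR.

```cpp
// Sketch of a possible future test: compare the distributed solution
// against the non-distributed (replicated) one.
TEST_F(Multigrid, DistributedSolutionMatchesNonDistributed)
{
    auto dist_mg = dist_mg_factory->generate(dist_mtx);
    auto mg = non_dist_mg_factory->generate(non_dist_mtx);

    dist_mg->apply(dist_b, dist_x);
    mg->apply(b, x);

    // Compare the locally owned rows of the distributed solution against the
    // corresponding rows of the replicated solution, up to a
    // precision-dependent tolerance.
    GKO_ASSERT_MTX_NEAR(dist_x->get_local_vector(),
                        x->create_submatrix(local_rows, gko::span{0, 1}),
                        r<value_type>::value);
}
```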

* DistributedLocalSize is a feature class providing `get_local_size`, which
* returns the size of the local operator.
*/
class DistributedLocalSize {
Member Author

Why a separate interface class? Can this not be in DistributedBase?

Collaborator

Additionally, the question arises of what to do with the non_local_size.

Member

I was also thinking about putting it into DistributedBase, but I wonder whether that is out of scope, because DistributedBase only handles the communicator right now.
non_local_size cannot go into this base class because the vector has no notion of a non_local_size.
Alternatively, we could define the non_local_size of a vector to be dim<2>(0, 0).
I do not see a use for non_local_size outside of the matrix.

Collaborator

I don't mean putting non_local_size into the DistributedLocalSize interface class, but if we have a DistributedLocalSize, we should probably also implement a counterpart that handles non_local_size.

Member

No, I only mean the functionality, not putting it into DistributedLocalSize.
I agree that users may expect something similar to get_local_size() for the non_local_matrix().
For me, if we need get_non_local_size(), we would have to put get_local_size() into DistributedLocalSize and get_non_local_size() into DistributedNonLocalSize, because the vector does not have non-local information, so we cannot put them together into DistributedBase. Having one in DistributedBase and the other in DistributedNonLocalSize would also be inconsistent in my mind (unless we set non_local_size = 0 in the vector case).

Member Author

I think it should be okay to have only local_size in DistributedBase. A separate interface class is too clunky IMO. But given that this will probably be reworked anyway, I don't have a strong opinion, and we can discuss it later.

Member

I have moved it to DistributedBase now.
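For illustration, roughly the shape this ends up with (an assumed sketch, not the PR's exact declarations): the local-size query sits next to the communicator that DistributedBase already carries, and the distributed Matrix and Vector provide it.

```cpp
#include <ginkgo/ginkgo.hpp>

// Assumed sketch of the interface after the move; the real DistributedBase
// may differ in details such as constness and how the override is provided.
class DistributedBaseSketch {
public:
    explicit DistributedBaseSketch(gko::experimental::mpi::communicator comm)
        : comm_{comm}
    {}

    // Already present: access to the communicator.
    gko::experimental::mpi::communicator get_communicator() const
    {
        return comm_;
    }

    // Newly added: size of the locally owned block, implemented by the
    // distributed Matrix and Vector so Multigrid can size its work vectors.
    virtual gko::dim<2> get_local_size() const = 0;

    virtual ~DistributedBaseSketch() = default;

private:
    gko::experimental::mpi::communicator comm_;
};
```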

Collaborator
@greole greole left a comment

Some more comments from my side. Looks good to me in general. Maybe the tests can be simplified to avoid the implementation of the Dummy classes.


@yhmtsai
Member
yhmtsai commented Apr 14, 2024

@greole, yes, the coarsening may affect the non_local_matrix, but the coarsening method will take care of that, not the multigrid itself.
In distributed multigrid, the distributed matrix for each level has already been prepared, so we only need to handle the vectors according to each level's distributed matrix (that is where the local size is used, for creating the vectors).

@pratikvn pratikvn removed the 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. label Apr 16, 2024
Member Author
@pratikvn pratikvn left a comment

I think from my side, this looks good to be merged.

yhmtsai and others added 2 commits April 18, 2024 11:54
Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>
Member
@yhmtsai yhmtsai left a comment

Due to a GitHub PR limitation, I am approving on behalf of @pratikvn.

Collaborator
@greole greole left a comment

LGTM! Just a final question: in order to use it, we also need to merge a distributed coarsening method, like PGM?

@pratikvn
Member Author

@greole, yes, #1403 would need to be merged as well. @yhmtsai, please feel free to merge this when you are ready. If you need me to merge it, let me know.

@yhmtsai yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Apr 18, 2024
@yhmtsai yhmtsai force-pushed the dist-mg-update branch 2 times, most recently from e4387bc to 95ead72 on April 19, 2024 09:19
yhmtsai and others added 4 commits April 19, 2024 13:29
Co-authored-by: Gregor Olenik <gregor.olenik@web.de>
Co-authored-by: Gregor Olenik <gregor.olenik@web.de>
Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>
@yhmtsai yhmtsai merged commit c60c77e into develop Apr 19, 2024
12 of 15 checks passed
@yhmtsai yhmtsai deleted the dist-mg-update branch April 19, 2024 18:25
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. type:distributed-functionality type:multigrid This is related to multigrid type:solver This is related to the solvers
5 participants