8000 Add a CSR batched matrix format, CUDA, HIP and DPCPP kernels by pratikvn · Pull Request #1450 · ginkgo-project/ginkgo · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add a CSR batched matrix format, CUDA, HIP and DPCPP kernels #1450

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Dec 12, 2023

Conversation

pratikvn
Copy link
Member
@pratikvn pratikvn commented Nov 5, 2023

This PR adds a batched Csr matrix format, that stores the same sparsity pattern for all entries, but different values for each batch. Only simple and advanced apply to batch::MultiVector are supported for now.

TODO

  • Add CUDA/HIP kernels
  • Add DPCPP kernels
  • Add to batch::Bicgstab dispatch
  • Update docs

@pratikvn pratikvn added 1:ST:WIP This PR is a work in progress. Not ready for review. type:batched-functionality This is related to the batched functionality in Ginkgo labels Nov 5, 2023
@pratikvn pratikvn self-assigned this Nov 5, 2023
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. type:solver This is related to the solvers type:preconditioner This is related to the preconditioners type:matrix-format This is related to the Matrix formats type:stopping-criteria This is related to the stopping criteria mod:all This touches all Ginkgo modules. labels Nov 5, 2023
@pratikvn pratikvn force-pushed the batch-bicgstab-device branch 2 times, most recently from f600023 to 79e68b3 Compare November 5, 2023 16:17
upsj
upsj previously requested changes Nov 5, 2023
Copy link
Member
@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is pretty late, so I only went over the implementation, not the tests. Most of it LGTM, there is just a behavioral change in here that I believe overshoots in fixing a specific edge case.

@@ -37,7 +37,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


#include <ginkgo/core/base/batch_multi_vector.hpp>
#include <ginkgo/core/matrix/batch_dense.hpp>
#include <ginkgo/core/matrix/batch_ell.hpp>
8000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

@pratikvn
Copy link
Member Author
pratikvn commented Nov 5, 2023

@upsj , this was not meant to be in the release. I had a bit of time, so decided to work on it. Thanks for reviewing, but this is still a WIP, that is why I havent requested any reviews or marked it ready-for-review yet.

@upsj
Copy link
Member
upsj commented Nov 5, 2023

@pratikvn phew, thank you, that would have been a night shift otherwise 😄

@pratikvn
Copy link
Member Author
pratikvn commented Nov 5, 2023

format!

@pratikvn pratikvn added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels Nov 5, 2023
@pratikvn pratikvn requested a review from a team November 5, 2023 22:04
Base automatically changed from batch-bicgstab-device to develop November 5, 2023 23:44
@pratikvn
Copy link
Member Author
pratikvn commented Nov 6, 2023

format!

@pratikvn pratikvn force-pushed the batch-csr-upd branch 2 times, most recently from 4438972 to 87dbd50 Compare November 24, 2023 09:27
@MarcelKoch MarcelKoch self-requested a review November 24, 2023 10:11
Copy link
Member
@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, only some smaller comments

@@ -443,8 +443,26 @@ void Csr<ValueType, IndexType>::read(device_mat_data&& data)
auto arrays = data.empty_out();
this->row_ptrs_.resize_and_reset(size[0] + 1);
this->set_size(size);
this->values_ = std::move(arrays.values);
this->col_idxs_ = std::move(arrays.col_idxs);
// Need to copy instead of move, as move into view data is not supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand why the copy is necessary. Nothing in the array move-assignment depends on if the moved to array is a view or not.
But if the move still doesn't work, then the array should be fixed, instead of the code here.

Copy link
Member Author
@pratikvn pratikvn Nov 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, move into view is not supported. It does not make sense to move data from an owning array to a non-owning one. I think the behaviour in array is correct.

To have correct behaviour here, we can either have all cases just copy, or copy only when the arrays of this are non-owning, and I think it is better to avoid deep copies wherever possible, so I think the implementation here is a good solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think this boils down to the question, what is the correct behavior for read, when the matrix was created from views. First, that is an edge case to me. Users, who create a matrix from views, usually already have data in those views. Otherwise, what is the point?
Second, I think it would be better to fix the const device_mat_data& overload, so that it actually copies. Also, this is an issue in all of our matrix classes, and should probably be adjusted in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create that PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behaviour of const device_mat_data& overload does copy. This function is just a helper that reduces code duplication by only taking in a temporary object.

I agree that it is an edge-case, but it is very useful in implementing read for batched objects. If and when we implement non-uniform batch objects, I think we should follow a similar approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, const device_mat_data& copies the input, and that is then moved into the matrix's array, which I think is reasonable. I guess copying the arrays directly instead would also work, but adds a bit more duplication. But I think we cannot preserve the same move semantics, because moving data into views will always fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that the input itself is not copied, but instead only the arrays are copied. This should have no overhead.

Moving data into views doesn't fail, it does exactly what is should do. The move overload takes ownership of the input data, and thus the matrix takes ownership of the stored arrays. So afterwards the matrix can't have array views anymore. I agree that we should clarify this, but I think this is the correct behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree, reading into a view was never something we really considered before. This only matters for matrix types that reuse the arrays anyways (Csr and Coo), so it's not a notable maintenance overhead to duplicate a tiny bit of the implementation. We should have tests for both cases in place though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with copying the arrays instead of copying the input. As we probably do that in another PR, maybe we push this PR forward with these changes ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now #1476 which fixes this, so I would suggest merging #1476 first.


this->mtx_00->apply(this->b_00.get(), this->x_00.get());
this->mtx_01->apply(this->b_01.get(), this->x_01.get());
auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());
auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());

for AAA

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, AAA seems to not be followed correctly in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be. Act is only the batch matrix apply. The normal csr matrix apply comes in the assert block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I got a bit confused and thought the mtx_0->apply was the arrange part. But still I think we usually group applies of the to-test object together with the reference object. We do this in the common tests (reference and device apply) and in the MPI tests. So I would still suggest grouping all applies into the act category and then the unbatch into the assert caategory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that defeats the purpose of the AAA structure. IMO, the Act block should only contain function calls/declarations that are being tested, otherwise it is unclear what is being tested.

Copy link
Member
@yhmtsai yhmtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except for the core/matrix/csr, other LGTM

*
* @return the number of stored elements per batch item.
*/
size_type get_num_elements_per_item() const noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use get_num_stored_elements() in other place.
should it be get_num_stored_elements_per_item() for consistence? it's quite long though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think get_num_elements_per_item is clear here. With stored it gets too long. Maybe it can also be renamed to get_nnz_per_item to make it shorter ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we try to avoid nnz because the stored element can be zeros

Comment on lines +30 to +33
* Csr is a general sparse matrix format that stores the column indices for each
* nonzero entry and a cumulative sum of the number of nonzeros in each row. It
* is one of the most popular sparse matrix formats due to its versatility and
* ability to store a wide range of sparsity patterns in an efficient fashion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not update documentation with batch like batchEll?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be updated. Or are you seeing something here that is incorrect ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like CSR not batchCSR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The note below clarifies that this is a batched matrix fomat, and how the batches are stored. I think being within the namespace also makes that clear. It is similar to batch::Ell. But I think can add more clarification to make it clear.

auto max_num_elems = this->get_common_size()[0] *
this->get_common_size()[1] *
this->get_num_batch_items();
GKO_ASSERT(values_.get_size() <= max_num_elems);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need to check this? it is just unused from the batch csr

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, the get_num_stored_elements is based on values.size(), so it is required to only store the batch value

Copy link
codecov bot commented Dec 9, 2023

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (310bb4c) 89.33% compared to head (6adf1a6) 89.37%.

❗ Current head 6adf1a6 differs from pull request most recent head c30aae8. Consider uploading reports for the commit c30aae8 to get more accurate results

Files Patch % Lines
core/matrix/batch_csr.cpp 86.27% 7 Missing ⚠️
core/test/matrix/batch_csr.cpp 96.80% 4 Missing ⚠️
include/ginkgo/core/matrix/batch_csr.hpp 88.46% 3 Missing ⚠️
core/device_hooks/common_kernels.inc.cpp 0.00% 2 Missing ⚠️
core/solver/batch_dispatch.hpp 75.00% 1 Missing ⚠️
reference/test/matrix/batch_csr_kernels.cpp 98.43% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1450      +/-   ##
===========================================
+ Coverage    89.33%   89.37%   +0.04%     
===========================================
  Files          688      696       +8     
  Lines        56555    56944     +389     
===========================================
+ Hits         50524    50895     +371     
- Misses        6031     6049      +18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pratikvn and others added 17 commits December 11, 2023 21:32
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 14 Code Smells

68.6% 68.6% Coverage
16.2% 16.2% Duplication

warning The version of Java (11.0.3) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

Copy link
Member
@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think the testing strategy should be revisited at some point. A lot of tests in core test functionality that is composed of smaller parts, but the implementation is the same for all different matrix types. For example, batch::write only requires that the type has create_const_view_for_item. Because of that many tests are repetetive and could probably be unified.

@pratikvn pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Dec 12, 2023
@pratikvn
Copy link
Member Author

As the previous pipeline had fully passed, and I only removed a couple of tests, I will just go ahead and merge this PR, not waiting for the whole pipeline to complete.

@pratikvn pratikvn merged commit 830c289 into develop Dec 12, 2023
@pratikvn pratikvn deleted the batch-csr-upd branch December 12, 2023 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. 1:ST:run-full-test mod:all This touches all Ginkgo modules. reg:build This is related to the build system. reg:testing This is related to testing. type:batched-functionality This is related to the batched functionality in Ginkgo type:matrix-format This is related to the Matrix formats type:preconditioner This is related to the preconditioners type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria
Projects
Status: Completed
Development

Successfully merging this pull request may close these issues.

5 participants
0