Add a CSR batched matrix format, CUDA, HIP and DPCPP kernels #1450

Conversation
Force-pushed from 5ef3670 to cf08426
Force-pushed from f600023 to 79e68b3
Force-pushed from cf08426 to e694784
This PR is pretty late, so I only went over the implementation, not the tests. Most of it LGTM; there is just a behavioral change in here that I believe overshoots in fixing a specific edge case.
@@ -37,7 +37,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#include <ginkgo/core/base/batch_multi_vector.hpp>
#include <ginkgo/core/matrix/batch_dense.hpp>
#include <ginkgo/core/matrix/batch_ell.hpp>
leftover?
@upsj, this was not meant to be in the release. I had a bit of time, so I decided to work on it. Thanks for reviewing, but this is still a WIP, which is why I haven't requested any reviews or marked it ready-for-review yet.
@pratikvn phew, thank you, that would have been a night shift otherwise 😄
format!
Force-pushed from f8145d0 to 65d4ac9
format!
Force-pushed from 9f9cab4 to 1f8e583
Force-pushed from 4438972 to 87dbd50
looks good, only some smaller comments
core/matrix/csr.cpp (Outdated)
@@ -443,8 +443,26 @@ void Csr<ValueType, IndexType>::read(device_mat_data&& data)
    auto arrays = data.empty_out();
    this->row_ptrs_.resize_and_reset(size[0] + 1);
    this->set_size(size);
    this->values_ = std::move(arrays.values);
    this->col_idxs_ = std::move(arrays.col_idxs);
    // Need to copy instead of move, as move into view data is not supported.
I'm not sure I understand why the copy is necessary. Nothing in the array move-assignment depends on whether the moved-to array is a view or not.
But if the move still doesn't work, then the array should be fixed instead of the code here.
No, move into view is not supported. It does not make sense to move data from an owning array to a non-owning one. I think the behaviour in array is correct.
To have correct behaviour here, we can either have all cases just copy, or copy only when the arrays of this are non-owning, and I think it is better to avoid deep copies wherever possible, so I think the implementation here is a good solution.
So, I think this boils down to the question: what is the correct behavior for read when the matrix was created from views? First, that is an edge case to me. Users who create a matrix from views usually already have data in those views. Otherwise, what is the point?
Second, I think it would be better to fix the const device_mat_data& overload, so that it actually copies. Also, this is an issue in all of our matrix classes, and should probably be adjusted in a separate PR.
I will create that PR.
I think the const device_mat_data& overload does copy. This function is just a helper that reduces code duplication by only taking in a temporary object.
I agree that it is an edge case, but it is very useful in implementing read for batched objects. If and when we implement non-uniform batch objects, I think we should follow a similar approach.
Yes, const device_mat_data& copies the input, and that copy is then moved into the matrix's array, which I think is reasonable. I guess copying the arrays directly instead would also work, but adds a bit more duplication. But I think we cannot preserve the same move semantics, because moving data into views will always fail.
I would suggest that the input itself is not copied, but instead only the arrays are copied. This should have no overhead.
Moving data into views doesn't fail, it does exactly what it should do. The move overload takes ownership of the input data, and thus the matrix takes ownership of the stored arrays. So afterwards the matrix can't have array views anymore. I agree that we should clarify this, but I think this is the correct behavior.
I tend to agree, reading into a view was never something we really considered before. This only matters for matrix types that reuse the arrays anyways (Csr and Coo), so it's not a notable maintenance overhead to duplicate a tiny bit of the implementation. We should have tests for both cases in place though
I am fine with copying the arrays instead of copying the input. As we will probably do that in another PR, maybe we can push this PR forward with these changes?
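To make the alternatives concrete, here is a minimal sketch of the array semantics under discussion (an illustration only, assuming gko::array and gko::make_array_view from the public interface; it is not the code in this PR):

#include <ginkgo/ginkgo.hpp>
#include <vector>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    gko::array<double> owning(exec, {1.0, 2.0, 3.0, 4.0});

    std::vector<double> buffer(4);
    // a non-owning array wrapping user-provided memory; sizes must match below
    auto view = gko::make_array_view(exec, buffer.size(), buffer.data());

    // copy-assignment writes through the viewed memory and keeps it non-owning
    view = owning;

    // move-assignment is the contested case above: it either has to transfer
    // ownership (so the target stops being a view) or fall back to a copy
    // view = std::move(owning);
}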
this->mtx_00->apply(this->b_00.get(), this->x_00.get());
this->mtx_01->apply(this->b_01.get(), this->x_01.get());
auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());
Suggested change:
auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());

for AAA
More generally, AAA seems to not be followed correctly in this file.
I believe it should be. Act is only the batch matrix apply. The normal csr matrix apply comes in the assert block.
Ok, I got a bit confused and thought the mtx_0->apply was the arrange part. But still, I think we usually group applies of the to-test object together with the reference object. We do this in the common tests (reference and device apply) and in the MPI tests. So I would still suggest grouping all applies into the act category and then the unbatch into the assert category.
I think that defeats the purpose of the AAA structure. IMO, the Act block should only contain function calls/declarations that are being tested, otherwise it is unclear what is being tested.
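For illustration, a hypothetical test following the grouping argued for here (the fixture members are taken from the snippet above, but the test name and tolerance are placeholders, not the actual file):

TYPED_TEST(Csr, AppliesToMultiVector)
{
    using T = TypeParam;
    // Arrange: the fixture has already set up mtx_0 (batched), mtx_00/mtx_01
    // (unbatched reference matrices) and the vectors b_*, x_*

    // Act: only the batched apply that is under test
    this->mtx_0->apply(this->b_0.get(), this->x_0.get());

    // Assert: reference applies plus the comparison against the unbatched results
    this->mtx_00->apply(this->b_00.get(), this->x_00.get());
    this->mtx_01->apply(this->b_01.get(), this->x_01.get());
    auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());
    GKO_ASSERT_MTX_NEAR(res[0], this->x_00, 0.0);
    GKO_ASSERT_MTX_NEAR(res[1], this->x_01, 0.0);
}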
Force-pushed from 11285e6 to 9f00319
Except for core/matrix/csr, the rest LGTM.
 *
 * @return the number of stored elements per batch item.
 */
size_type get_num_elements_per_item() const noexcept
We use get_num_stored_elements() in other places. Should it be get_num_stored_elements_per_item() for consistency? It's quite long though.
Yes, I think get_num_elements_per_item is clear here. With stored it gets too long. Maybe it can also be renamed to get_nnz_per_item to make it shorter?
I think we try to avoid nnz because the stored elements can be zeros.
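As a rough sketch of how the two counts relate under a shared sparsity pattern (illustrative only; the actual member names and implementation may differ):

// all values of all batch items live in one contiguous array
size_type get_num_stored_elements() const noexcept
{
    return values_.get_size();
}

// every item shares the same sparsity pattern, so the per-item count is the
// total divided by the number of items (equivalently, the size of col_idxs_)
size_type get_num_elements_per_item() const noexcept
{
    return this->get_num_stored_elements() / this->get_num_batch_items();
}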
 * Csr is a general sparse matrix format that stores the column indices for each
 * nonzero entry and a cumulative sum of the number of nonzeros in each row. It
 * is one of the most popular sparse matrix formats due to its versatility and
 * ability to store a wide range of sparsity patterns in an efficient fashion.
Shouldn't the documentation be updated for the batched format, like batch::Ell?
I think it should be updated. Or are you seeing something here that is incorrect?
It reads like CSR, not batch CSR.
The note below clarifies that this is a batched matrix format, and how the batches are stored. I think being within the namespace also makes that clear. It is similar to batch::Ell. But I think I can add more clarification to make it clearer.
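To make that concrete, the storage layout implied by this thread looks roughly as follows (a sketch of sizes only, not the actual member names):

// For num_items batch items, each num_rows x num_cols with nnz stored elements:
//   row_ptrs : num_rows + 1 entries      (shared by all items)
//   col_idxs : nnz entries               (shared by all items)
//   values   : num_items * nnz entries   (values of item i start at offset i * nnz)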
auto max_num_elems = this->get_common_size()[0] *
                     this->get_common_size()[1] *
                     this->get_num_batch_items();
GKO_ASSERT(values_.get_size() <= max_num_elems);
Do you need to check this? It is just unused by the batch Csr.
Ah, get_num_stored_elements is based on values.size(), so the check is required to ensure only the batch values are stored.
Force-pushed from 99d94ec to 5c1bc41
Codecov Report

Attention:

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #1450     +/-   ##
===========================================
+ Coverage    89.33%   89.37%   +0.04%
===========================================
  Files          688      696        +8
  Lines        56555    56944      +389
===========================================
+ Hits         50524    50895      +371
- Misses        6031     6049       +18

☔ View full report in Codecov by Sentry.
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Force-pushed from 5c1bc41 to 6adf1a6
Kudos, SonarCloud Quality Gate passed!
LGTM. I think the testing strategy should be revisited at some point. A lot of tests in core test functionality that is composed of smaller parts, but the implementation is the same for all different matrix types. For example, batch::write only requires that the type has create_const_view_for_item. Because of that, many tests are repetitive and could probably be unified.
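As a sketch of that unification idea (the helper name and signature are assumptions, not existing Ginkgo API; only create_const_view_for_item is taken from the comment above):

// write one batch item by viewing it as a regular matrix and reusing the
// existing non-batched write
template <typename BatchMatrixType, typename MatrixData>
void write_item(const BatchMatrixType* mtx, gko::size_type item, MatrixData& data)
{
    auto item_view = mtx->create_const_view_for_item(item);
    item_view->write(data);
}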
As the previous pipeline had fully passed, and I only removed a couple of tests, I will just go ahead and merge this PR without waiting for the whole pipeline to complete.
This PR adds a batched Csr matrix format that stores the same sparsity pattern for all batch entries, but different values for each batch entry. Only simple and advanced apply to batch::MultiVector are supported for now.

TODO
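A hedged usage sketch of the new format (exact namespaces and create() signatures may differ from the merged API; the point is that all items share one sparsity pattern while each item has its own values):

#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    const gko::size_type num_items = 2;
    // 2 batch items, each 3x3 with 5 stored elements, sharing one sparsity pattern
    auto mtx = gko::batch::matrix::Csr<double, int>::create(
        exec, gko::batch_dim<2>(num_items, gko::dim<2>(3, 3)), 5);
    auto b = gko::batch::MultiVector<double>::create(
        exec, gko::batch_dim<2>(num_items, gko::dim<2>(3, 1)));
    auto x = gko::batch::MultiVector<double>::create(
        exec, gko::batch_dim<2>(num_items, gko::dim<2>(3, 1)));
    // ... fill mtx and b, then the simple apply described above:
    mtx->apply(b.get(), x.get());
}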