Add a CSR batched matrix format, CUDA, HIP and DPCPP kernels #1450

Conversation
Force-pushed from 5ef3670 to cf08426
Force-pushed from f600023 to 79e68b3
Force-pushed from cf08426 to e694784
This PR is pretty late, so I only went over the implementation, not the tests. Most of it LGTM; there is just a behavioral change in here that I believe overshoots in fixing a specific edge case.
@@ -37,7 +37,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#include <ginkgo/core/base/batch_multi_vector.hpp>
#include <ginkgo/core/matrix/batch_dense.hpp>
#include <ginkgo/core/matrix/batch_ell.hpp>
leftover?
@upsj, this was not meant to be in the release. I had a bit of time, so I decided to work on it. Thanks for reviewing, but this is still a WIP, which is why I haven't requested any reviews or marked it ready-for-review yet.
@pratikvn phew, thank you, that would have been a night shift otherwise 😄
format!
Force-pushed from f8145d0 to 65d4ac9
format!
Force-pushed from 9f9cab4 to 1f8e583
Force-pushed from 4438972 to 87dbd50
looks good, only some smaller comments
core/matrix/csr.cpp (Outdated)
@@ -443,8 +443,26 @@ void Csr<ValueType, IndexType>::read(device_mat_data&& data)
    auto arrays = data.empty_out();
    this->row_ptrs_.resize_and_reset(size[0] + 1);
    this->set_size(size);
    this->values_ = std::move(arrays.values);
    this->col_idxs_ = std::move(arrays.col_idxs);
    // Need to copy instead of move, as move into view data is not supported.
I'm not sure I understand why the copy is necessary. Nothing in the array move-assignment depends on whether the moved-to array is a view or not.
But if the move still doesn't work, then the array should be fixed instead of the code here.
No, move into view is not supported. It does not make sense to move data from an owning array to a non-owning one. I think the behaviour in array is correct.
To have correct behaviour here, we can either have all cases just copy, or copy only when the arrays of this are non-owning, and I think it is better to avoid deep copies wherever possible, so I think the implementation here is a good solution.
So, I think this boils down to the question: what is the correct behavior for read when the matrix was created from views? First, that is an edge case to me. Users who create a matrix from views usually already have data in those views. Otherwise, what is the point?
Second, I think it would be better to fix the const device_mat_data& overload, so that it actually copies. Also, this is an issue in all of our matrix classes, and should probably be adjusted in a separate PR.
I will create that PR.
I think the const device_mat_data& overload does copy. This function is just a helper that reduces code duplication by only taking in a temporary object.
I agree that it is an edge case, but it is very useful in implementing read for batched objects. If and when we implement non-uniform batch objects, I think we should follow a similar approach.
Yes, const device_mat_data& copies the input, and that copy is then moved into the matrix's array, which I think is reasonable. I guess copying the arrays directly instead would also work, but adds a bit more duplication. But I think we cannot preserve the same move semantics, because moving data into views will always fail.
I would suggest that the input itself is not copied, but instead only the arrays are copied. This should have no overhead.
Moving data into views doesn't fail, it does exactly what it should do. The move overload takes ownership of the input data, and thus the matrix takes ownership of the stored arrays. So afterwards the matrix can't have array views anymore. I agree that we should clarify this, but I think this is the correct behavior.
I tend to agree, reading into a view was never something we really considered before. This only matters for matrix types that reuse the arrays anyways (Csr and Coo), so it's not a notable maintenance overhead to duplicate a tiny bit of the implementation. We should have tests for both cases in place though
I am fine with copying the arrays instead of copying the input. As we will probably do that in another PR, maybe we can push this PR forward with these changes?
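To make the alternatives concrete, here is a minimal sketch of the array semantics under discussion (an illustration only, assuming gko::array and gko::make_array_view from the public interface; it is not the code in this PR):

#include <ginkgo/ginkgo.hpp>
#include <vector>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    gko::array<double> owning(exec, {1.0, 2.0, 3.0, 4.0});

    std::vector<double> buffer(4);
    // a non-owning array wrapping user-provided memory; sizes must match below
    auto view = gko::make_array_view(exec, buffer.size(), buffer.data());

    // copy-assignment writes through the viewed memory and keeps it non-owning
    view = owning;

    // move-assignment is the contested case above: it either has to transfer
    // ownership (so the target stops being a view) or fall back to a copy
    // view = std::move(owning);
}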
this->mtx_00->apply(this->b_00.get(), this->x_00.get());
this->mtx_01->apply(this->b_01.get(), this->x_01.get());
auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());
Suggested change:
auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());

for AAA
More generally, AAA seems to not be followed correctly in this file.
I believe it should be. Act is only the batch matrix apply. The normal csr matrix apply comes in the assert block.
Ok, I got a bit confused and thought the mtx_0->apply was the arrange part. But still, I think we usually group applies of the to-test object together with the reference object. We do this in the common tests (reference and device apply) and in the MPI tests. So I would still suggest grouping all applies into the act category and then the unbatch into the assert category.
I think that defeats the purpose of the AAA structure. IMO, the Act block should only contain function calls/declarations that are being tested, otherwise it is unclear what is being tested.
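For illustration, a hypothetical test following the grouping argued for here (the fixture members are taken from the snippet above, but the test name and tolerance are placeholders, not the actual file):

TYPED_TEST(Csr, AppliesToMultiVector)
{
    using T = TypeParam;
    // Arrange: the fixture has already set up mtx_0 (batched), mtx_00/mtx_01
    // (unbatched reference matrices) and the vectors b_*, x_*

    // Act: only the batched apply that is under test
    this->mtx_0->apply(this->b_0.get(), this->x_0.get());

    // Assert: reference applies plus the comparison against the unbatched results
    this->mtx_00->apply(this->b_00.get(), this->x_00.get());
    this->mtx_01->apply(this->b_01.get(), this->x_01.get());
    auto res = gko::batch::unbatch<gko::batch::MultiVector<T>>(this->x_0.get());
    GKO_ASSERT_MTX_NEAR(res[0], this->x_00, 0.0);
    GKO_ASSERT_MTX_NEAR(res[1], this->x_01, 0.0);
}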
Force-pushed from 11285e6 to 9f00319
Except for core/matrix/csr, the rest LGTM.
 *
 * @return the number of stored elements per batch item.
 */
size_type get_num_elements_per_item() const noexcept
We use get_num_stored_elements() in other places. Should it be get_num_stored_elements_per_item() for consistency? It's quite long though.
Yes, I think get_num_elements_per_item is clear here. With stored it gets too long. Maybe it can also be renamed to get_nnz_per_item to make it shorter?
I think we try to avoid nnz because the stored elements can be zeros.
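As a rough sketch of how the two counts relate under a shared sparsity pattern (illustrative only; the actual member names and implementation may differ):

// all values of all batch items live in one contiguous array
size_type get_num_stored_elements() const noexcept
{
    return values_.get_size();
}

// every item shares the same sparsity pattern, so the per-item count is the
// total divided by the number of items (equivalently, the size of col_idxs_)
size_type get_num_elements_per_item() const noexcept
{
    return this->get_num_stored_elements() / this->get_num_batch_items();
}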
 * Csr is a general sparse matrix format that stores the column indices for each
 * nonzero entry and a cumulative sum of the number of nonzeros in each row. It
 * is one of the most popular sparse matrix formats due to its versatility and
 * ability to store a wide range of sparsity patterns in an efficient fashion.
Shouldn't the documentation be updated for the batched format, like batch::Ell?
I think it should be updated. Or are you seeing something here that is incorrect?
It reads like CSR, not batch CSR.
The note below clarifies that this is a batched matrix format, and how the batches are stored. I think being within the namespace also makes that clear. It is similar to batch::Ell. But I think I can add more clarification to make it clearer.
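To make that concrete, the storage layout implied by this thread looks roughly as follows (a sketch of sizes only, not the actual member names):

// For num_items batch items, each num_rows x num_cols with nnz stored elements:
//   row_ptrs : num_rows + 1 entries      (shared by all items)
//   col_idxs : nnz entries               (shared by all items)
//   values   : num_items * nnz entries   (values of item i start at offset i * nnz)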
auto max_num_elems = this->get_common_size()[0] *
                     this->get_common_size()[1] *
                     this->get_num_batch_items();
GKO_ASSERT(values_.get_size() <= max_num_elems);
Do you need to check this? It is just unused by the batch Csr.
Ah, get_num_stored_elements is based on values.size(), so the check is required to ensure only the batch values are stored.
Force-pushed from 99d94ec to 5c1bc41
Codecov Report

Attention:

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #1450     +/-   ##
===========================================
+ Coverage    89.33%   89.37%   +0.04%
===========================================
  Files          688      696        +8
  Lines        56555    56944      +389
===========================================
+ Hits         50524    50895      +371
- Misses        6031     6049       +18

☔ View full report in Codecov by Sentry.
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Aditya Kashi <kashia@ornl.gov>
Co-authored-by: Phuong Nguyen <phuong.nguyen@icl.utk.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Force-pushed from 5c1bc41 to 6adf1a6
Kudos, SonarCloud Quality Gate passed!
LGTM. I think the testing strategy should be revisited at some point. A lot of tests in core test functionality that is composed of smaller parts, but the implementation is the same for all different matrix types. For example, batch::write only requires that the type has create_const_view_for_item. Because of that, many tests are repetitive and could probably be unified.
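As a sketch of that unification idea (the helper name and signature are assumptions, not existing Ginkgo API; only create_const_view_for_item is taken from the comment above):

// write one batch item by viewing it as a regular matrix and reusing the
// existing non-batched write
template <typename BatchMatrixType, typename MatrixData>
void write_item(const BatchMatrixType* mtx, gko::size_type item, MatrixData& data)
{
    auto item_view = mtx->create_const_view_for_item(item);
    item_view->write(data);
}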
As the previous pipeline had fully passed, and I only removed a couple of tests, I will just go ahead and merge this PR without waiting for the whole pipeline to complete.
This PR adds a batched Csr matrix format that stores the same sparsity pattern for all batch entries, but different values for each batch entry. Only simple and advanced apply to batch::MultiVector are supported for now.

TODO
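A hedged usage sketch of the new format (exact namespaces and create() signatures may differ from the merged API; the point is that all items share one sparsity pattern while each item has its own values):

#include <ginkgo/ginkgo.hpp>

int main()
{
    auto exec = gko::ReferenceExecutor::create();
    const gko::size_type num_items = 2;
    // 2 batch items, each 3x3 with 5 stored elements, sharing one sparsity pattern
    auto mtx = gko::batch::matrix::Csr<double, int>::create(
        exec, gko::batch_dim<2>(num_items, gko::dim<2>(3, 3)), 5);
    auto b = gko::batch::MultiVector<double>::create(
        exec, gko::batch_dim<2>(num_items, gko::dim<2>(3, 1)));
    auto x = gko::batch::MultiVector<double>::create(
        exec, gko::batch_dim<2>(num_items, gko::dim<2>(3, 1)));
    // ... fill mtx and b, then the simple apply described above:
    mtx->apply(b.get(), x.get());
}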