-
Notifications
You must be signed in to change notification settings - Fork 451
Implement sort_by_key #6801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement sort_by_key #6801
Conversation
if (keys.stride(0) != 1 && values.stride(0) != 1) { | ||
Kokkos::abort( | ||
"SYCL sort_by_key only supports rank-1 Views with stride(0) = 1."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to double check that.
c3330bd
to
241e60b
Compare
eaff439
to
8759de7
Compare
#if defined(KOKKOS_ENABLE_CUDA) | ||
template <class ComparatorType, class KeysDataType, class... KeysProperties, | ||
class ValuesDataType, class... ValuesProperties> | ||
void sort_by_key_device_view_with_comparator( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I should duplicate these or not. I followed the way it's written in Kokkos_SortImpl.hpp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure we have a test with strides layout
#if defined(KOKKOS_ENABLE_CUDA) | ||
|
||
// Workaround for `Instruction 'shfl' without '.sync' is not supported on | ||
// .target sm_70 and higher from PTX ISA version 6.4`. | ||
// Also see https://github.com/NVIDIA/cub/pull/170. | ||
#if !defined(CUB_USE_COOPERATIVE_GROUPS) | ||
#define CUB_USE_COOPERATIVE_GROUPS | ||
#endif | ||
|
||
#pragma GCC diagnostic push | ||
#pragma GCC diagnostic ignored "-Wshadow" | ||
|
||
#if defined(KOKKOS_COMPILER_CLANG) | ||
// Some versions of Clang fail to compile Thrust, failing with errors like | ||
// this: | ||
// <snip>/thrust/system/cuda/detail/core/agent_launcher.h:557:11: | ||
// error: use of undeclared identifier 'va_printf' | ||
// The exact combination of versions for Clang and Thrust (or CUDA) for this | ||
// failure was not investigated, however even very recent version combination | ||
// (Clang 10.0.0 and Cuda 10.0) demonstrated failure. | ||
// | ||
// Defining _CubLog here locally allows us to avoid that code path, however | ||
// disabling some debugging diagnostics | ||
#pragma push_macro("_CubLog") | ||
#ifdef _CubLog | ||
#undef _CubLog | ||
#endif | ||
#define _CubLog | ||
#include <thrust/device_ptr.h> | ||
#include <thrust/sort.h> | ||
#pragma pop_macro("_CubLog") | ||
#else | ||
#include <thrust/device_ptr.h> | ||
#include <thrust/sort.h> | ||
#endif | ||
|
||
#pragma GCC diagnostic pop | ||
|
||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think we might want to collocate the CUDA sort and sort_by_key implementation so we don't have to duplicate that beautiful thing.
(no action required on your part at this time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this badly needed feature. I think the PR was updated while I was making my review so if some comments are outdated please ignore
MaybeComparator&&... maybeComparator) { | ||
auto const n = keys.size(); | ||
|
||
Kokkos::View<unsigned int*, ExecutionSpace> permute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating memory here seems like it will add a significant performance penalty, depending on how frequently sort is called. Is there a way we could try and cache the allocation or let the user pass in a view of the same size as the keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not with the current interface imho. There needs to be a broader discussion of what helpful features of Kokkos' sort_by_key could be implemented (like, sorting multiple values by the same key, computing permutation, or caching this). The scope of the current PR is restricted to simply allow dispatching to available routines in Thrust/oneDPL and follows the same interface.
47fa4af
to
665e3a1
Compare
We need diff --git a/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp b/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp
index 13d8dd9b9..a8b2f2455 100644
--- a/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp
+++ b/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp
@@ -187,16 +187,24 @@ void sort_by_key_via_sort(
KOKKOS_LAMBDA(int i) { permute(i) = i; });
static_assert(sizeof...(MaybeComparator) <= 1);
+
+ // FIXME OPENMPTARGET The sort happens on the host so we have to copy keys there
+#ifdef KOKKOS_ENABLE_OPENMPTARGET
+ auto keys_ = Kokkos::create_mirror_view(Kokkos::view_alloc(Kokkos::HostSpace{}, Kokkos::WithoutInitializing), keys);
+ Kokkos::deep_copy(exec, keys_, keys);
+#else
+ auto keys_ = keys
+#endif
if constexpr (sizeof...(MaybeComparator) == 0) {
Kokkos::sort(
exec, permute,
- KOKKOS_LAMBDA(int i, int j) { return keys(i) < keys(j); });
+ KOKKOS_LAMBDA(int i, int j) { return keys_(i) < keys_(j); });
} else {
auto keys_comparator =
std::get<0>(std::tuple<MaybeComparator...>(maybeComparator...));
Kokkos::sort(
exec, permute, KOKKOS_LAMBDA(int i, int j) {
- return keys_comparator(keys(i), keys(j));
+ return keys_comparator(keys_(i), keys_(j));
});
} to fix |
For |
#6815 fixes the HPX failure. |
|
Can you rebase now that #6815 has been merged? |
That the HIP part is not tested. |
Ripped out ROCThrust version per @masterleinad's suggestion, will add it to #6793 once this PR is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me. We can combine duplications in the implementation details later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by review. Letting Christian referee.
@aprokop thanks for doing all this btw! |
@masterleinad Could you please help me figure out the SYCL build failures? There is some issue with comparator being device copyable. Perhaps, it would need the same treatment as OpenMPTarget to copy the keys to the host in certain situation. Edit: it seem that it would be pretty annoying to figure out when to copy comparator to the host or when not to. We would need to copy if sort copies, which means for any backend that goes through Which also makes me think that it is likely that the current sort implementation is broken for comparators if a user does not provide a host compatible comparator for some spaces. |
@@ -177,17 +177,37 @@ void sort_by_key_via_sort( | |||
|
|||
static_assert(sizeof...(MaybeComparator) <= 1); | |||
if constexpr (sizeof...(MaybeComparator) == 0) { | |||
#ifdef KOKKOS_ENABLE_SYCL | |||
auto* raw_keys_in_comparator = keys_in_comparator.data(); | |||
auto stride = keys_in_comparator.stride(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masterleinad I'm curious, why is it 0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extent(0)
described the distance between elements that have the same indices except for the first dimension and consecutive indices in the first dimension. Since we are only allowing one-dimensional Views here, that is the distance between consecutive elements in memory.
Yes, the implicit assumption is that the data used in the comparator is both host and device-accessible. Furthermore, for |
Is it because |
Using raw pointers here is a workaround for the explicit use of |
But does writing a specialization for Kokkos views resolve the issue here? |
diff --git a/core/src/Kokkos_View.hpp b/core/src/Kokkos_View.hpp
index 484a0e6f6..74f734728 100644
--- a/core/src/Kokkos_View.hpp
+++ b/core/src/Kokkos_View.hpp
@@ -1996,6 +1996,11 @@ KOKKOS_INLINE_FUNCTION DeducedCommonPropsType<Views...> common_view_alloc_prop(
} // namespace Kokkos
+#ifdef KOKKOS_ENABLE_SYCL
+template <class DataType, class... Properties>
+struct sycl::is_device_copyable<Kokkos::View<DataType, Properties...>> : std::true_type{};
+#endif
+
#include <impl/Kokkos_ViewUniformType.hpp>
#include <impl/Kokkos_Atomic_View.hpp> doesn't seem to be enough for the compiler to stop complaining. |
I'm happy to write a follow-up pull request that does the approach outlined in #6801 (comment) instead. |
Ignoring HIP failures. @aprokop thank you for contributing this! |
// FIXME OPENMPTARGET The sort happens on the host so we have to copy keys there | ||
#ifdef KOKKOS_ENABLE_OPENMPTARGET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be decided based on accessibility form the host
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure. Take HIP sort. When #6793 is merged, with the comparator sort we would need to check rocThrust presence, and copy to the host when it is absent. Essentially, we would need to copy to the host when sort does, and that is a more complex criteria than the accessibility.
For HIP, would need #6793.
Fixes #6668.