8000 Implement sort_by_key by aprokop · Pull Request #6801 · kokkos/kokkos · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Implement sort_by_key #6801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Feb 28, 2024
Merged

Implement sort_by_key #6801

merged 8 commits into from
Feb 28, 2024

Conversation

aprokop
Copy link
Collaborator
@aprokop aprokop commented Feb 9, 2024

For HIP, would need #6793.

Fixes #6668.

@aprokop aprokop requested a review from dalg24 February 9, 2024 21:08
Comment on lines +120 to +139
if (keys.stride(0) != 1 && values.stride(0) != 1) {
Kokkos::abort(
"SYCL sort_by_key only supports rank-1 Views with stride(0) = 1.");
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to double check that.

@aprokop aprokop marked this pull request as draft February 9, 2024 21:11
@aprokop aprokop added the Enhancement Improve existing capability; will potentially require voting label Feb 9, 2024
@aprokop aprokop force-pushed the sort_by_key branch 2 times, most recently from c3330bd to 241e60b Compare February 11, 2024 18:25
@aprokop aprokop changed the title [wip] SortByKey Implement sort_by_key Feb 11, 2024
@aprokop aprokop force-pushed the sort_by_key branch 4 times, most recently from eaff439 to 8759de7 Compare February 12, 2024 01:49
@aprokop aprokop marked this pull request as ready for review February 12, 2024 02:30
#if defined(KOKKOS_ENABLE_CUDA)
template <class ComparatorType, class KeysDataType, class... KeysProperties,
class ValuesDataType, class... ValuesProperties>
void sort_by_key_device_view_with_comparator(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I should duplicate these or not. I followed the way it's written in Kokkos_SortImpl.hpp.

Copy link
Member
@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure we have a test with strides layout

Comment on lines +24 to +60
#if defined(KOKKOS_ENABLE_CUDA)

// Workaround for `Instruction 'shfl' without '.sync' is not supported on
// .target sm_70 and higher from PTX ISA version 6.4`.
// Also see https://github.com/NVIDIA/cub/pull/170.
#if !defined(CUB_USE_COOPERATIVE_GROUPS)
#define CUB_USE_COOPERATIVE_GROUPS
#endif

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wshadow"

#if defined(KOKKOS_COMPILER_CLANG)
// Some versions of Clang fail to compile Thrust, failing with errors like
// this:
// <snip>/thrust/system/cuda/detail/core/agent_launcher.h:557:11:
// error: use of undeclared identifier 'va_printf'
// The exact combination of versions for Clang and Thrust (or CUDA) for this
// failure was not investigated, however even very recent version combination
// (Clang 10.0.0 and Cuda 10.0) demonstrated failure.
//
// Defining _CubLog here locally allows us to avoid that code path, however
// disabling some debugging diagnostics
#pragma push_macro("_CubLog")
#ifdef _CubLog
#undef _CubLog
#endif
#define _CubLog
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#pragma pop_macro("_CubLog")
#else
#include <thrust/device_ptr.h>
#include <thrust/sort.h>
#endif

#pragma GCC diagnostic pop

#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think we might want to collocate the CUDA sort and sort_by_key implementation so we don't have to duplicate that beautiful thing.
(no action required on your part at this time)

Copy link
Contributor
@nmm0 nmm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this badly needed feature. I think the PR was updated while I was making my review so if some comments are outdated please ignore

MaybeComparator&&... maybeComparator) {
auto const n = keys.size();

Kokkos::View<unsigned int*, ExecutionSpace> permute(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating memory here seems like it will add a significant performance penalty, depending on how frequently sort is called. Is there a way we could try and cache the allocation or let the user pass in a view of the same size as the keys?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not with the current interface imho. There needs to be a broader discussion of what helpful features of Kokkos' sort_by_key could be implemented (like, sorting multiple values by the same key, computing permutation, or caching this). The scope of the current PR is restricted to simply allow dispatching to available routines in Thrust/oneDPL and follows the same interface.

@aprokop aprokop force-pushed the sort_by_key branch 2 times, most recently from 47fa4af to 665e3a1 Compare February 13, 2024 00:22
@masterleinad
Copy link
Contributor

We need

diff --git a/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp b/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp
index 13d8dd9b9..a8b2f2455 100644
--- a/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp
+++ b/algorithms/src/sorting/impl/Kokkos_SortByKeyImpl.hpp
@@ -187,16 +187,24 @@ void sort_by_key_via_sort(
       KOKKOS_LAMBDA(int i) { permute(i) = i; });
 
   static_assert(sizeof...(MaybeComparator) <= 1);
+
+            // FIXME OPENMPTARGET The sort happens on the host so we have to copy keys there
+#ifdef KOKKOS_ENABLE_OPENMPTARGET
+  auto keys_ = Kokkos::create_mirror_view(Kokkos::view_alloc(Kokkos::HostSpace{}, Kokkos::WithoutInitializing), keys);
+                  Kokkos::deep_copy(exec, keys_, keys);
+#else
+  auto keys_ = keys
+#endif   
   if constexpr (sizeof...(MaybeComparator) == 0) {
     Kokkos::sort(
         exec, permute,
-        KOKKOS_LAMBDA(int i, int j) { return keys(i) < keys(j); });
+        KOKKOS_LAMBDA(int i, int j) { return keys_(i) < keys_(j); });
   } else {
     auto keys_comparator =
         std::get<0>(std::tuple<MaybeComparator...>(maybeComparator...));
     Kokkos::sort(
         exec, permute, KOKKOS_LAMBDA(int i, int j) {
-          return keys_comparator(keys(i), keys(j));
+          return keys_comparator(keys_(i), keys_(j));
         });
   }

to fix OpenMPTarget.

@masterleinad
Copy link
Contributor

For HPX, it seems to be a fencing/enqueuing issue. If we can't figure it out, we might want to ask @msimberg to have a look.

@masterleinad
Copy link
Contributor

#6815 fixes the HPX failure.

@aprokop
Copy link
Collaborator Author
aprokop commented Feb 13, 2024

SYCL and HIP fail for seemingly unrelated reasons. HPX fails as @masterleinad described. Other than those, everything else passes.

@masterleinad
Copy link
Contributor

Can you rebase now that #6815 has been merged?

@aprokop
Copy link
Collaborator Author
aprokop commented Feb 14, 2024

Can you rebase now that #6815 has been merged?

Do we want to wait till #6793 and rebase on that?

@masterleinad
Copy link
Contributor

What is your concern here? That the name KOKKOS_ENABLE_ROCTHRUST may change, or that the HIP path is not tested in CI?

That the HIP part is not tested.

@aprokop
Copy link
Collaborator Author
aprokop commented Feb 19, 2024

Ripped out ROCThrust version per @masterleinad's suggestion, will add it to #6793 once this PR is merged.

Copy link
Contributor
@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. We can combine duplications in the implementation details later.

Copy link
Member
@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by review. Letting Christian referee.

@fnrizzi
Copy link
Contributor
fnrizzi commented Feb 20, 2024

@aprokop thanks for doing all this btw!

@aprokop
Copy link
Collaborator Author
aprokop commented Feb 20, 2024

@masterleinad Could you please help me figure out the SYCL build failures? There is some issue with comparator being device copyable. Perhaps, it would need the same treatment as OpenMPTarget to copy the keys to the host in certain situation.

Edit: it seem that it would be pretty annoying to figure out when to copy comparator to the host or when not to. We would need to copy if sort copies, which means for any backend that goes through copy_to_host_run_stdsort_copy_back and for SYCL when strides is not 1.

Which also makes me think that it is likely that the current sort implementation is broken for comparators if a user does not provide a host compatible comparator for some spaces.

@@ -177,17 +177,37 @@ void sort_by_key_via_sort(

static_assert(sizeof...(MaybeComparator) <= 1);
if constexpr (sizeof...(MaybeComparator) == 0) {
#ifdef KOKKOS_ENABLE_SYCL
auto* raw_keys_in_comparator = keys_in_comparator.data();
auto stride = keys_in_comparator.stride(0);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masterleinad I'm curious, why is it 0 here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extent(0) described the distance between elements that have the same indices except for the first dimension and consecutive indices in the first dimension. Since we are only allowing one-dimensional Views here, that is the distance between consecutive elements in memory.

@masterleinad
Copy link
Contributor

Which also makes me think that it is likely that the current sort implementation is broken for comparators if a user does not provide a host compatible comparator for some spaces.

Yes, the implicit assumption is that the data used in the comparator is both host and device-accessible. Furthermore, for SYCL, the comparator must satisfy the sycl::is_device_copyable trait when using oneDPL. This is why using raw pointers instead of Views fixed compilation as a quick fix.

@dalg24
Copy link
Member
dalg24 commented Feb 28, 2024

Which also makes me think that it is likely that the current sort implementation is broken for comparators if a user does not provide a host compatible comparator for some spaces.

Yes, the implicit assumption is that the data used in the comparator is both host and device-accessible. Furthermore, for SYCL, the comparator must satisfy the sycl::is_device_copyable trait when using oneDPL. This is why using raw pointers instead of Views fixed compilation as a quick fix.

Is it because Kokkos::Views are not marked to be device copyable?

@masterleinad
Copy link
Contributor

Is it because Kokkos::Views are not marked to be device copyable?

sycl::is_device_copyable<View<...>>) is false. Our general approach to deal with non-device-copyable types is to just declare the whole functor as device-copyable by specializing sycl::is_device_copyable since it wouldn't be enough anyway to go through all Kokkos types for supporting all kernels users are running with other backends.

Using raw pointers here is a workaround for the explicit use of Kokkos::View. We could in general declare the comparator for sort and sort_by_key to be device-copyable (in the same way as we are doing that for all other kernels) but since we are not controlling the kernel launch, there is not much we can do if the type used for comparisons is not device-copyable. In that case, we would probably need to fall back to doing the sort on the host.

@dalg24
Copy link
Member
dalg24 commented Feb 28, 2024

But does writing a specialization for Kokkos views resolve the issue here?

@masterleinad
Copy link
Contributor

But does writing a specialization for Kokkos views resolve the issue here?

diff --git a/core/src/Kokkos_View.hpp b/core/src/Kokkos_View.hpp
index 484a0e6f6..74f734728 100644
--- a/core/src/Kokkos_View.hpp
+++ b/core/src/Kokkos_View.hpp
@@ -1996,6 +1996,11 @@ KOKKOS_INLINE_FUNCTION DeducedCommonPropsType<Views...> common_view_alloc_prop(
 
 }  // namespace Kokkos
 
+#ifdef KOKKOS_ENABLE_SYCL
+template <class DataType, class... Properties>
+struct sycl::is_device_copyable<Kokkos::View<DataType, Properties...>> : std::true_type{};
+#endif
+
 #include <impl/Kokkos_ViewUniformType.hpp>
 #include <impl/Kokkos_Atomic_View.hpp>

doesn't seem to be enough for the compiler to stop complaining.

@masterleinad
Copy link
Contributor

I'm happy to write a follow-up pull request that does the approach outlined in #6801 (comment) instead.

@dalg24
Copy link
Member
dalg24 commented Feb 28, 2024

Ignoring HIP failures.
The issues with SYCL with user-defined functors when deferring an algorithm implementation to oneDPL is already present in develop with Kokkos::sort and a custom comparator.
We decided we would document the issue and work on a fix in a follow up.

@aprokop thank you for contributing this!

Comment on lines +168 to +169
// FIXME OPENMPTARGET The sort happens on the host so we have to copy keys there
#ifdef KOKKOS_ENABLE_OPENMPTARGET
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be decided based on accessibility form the host

Copy link
Collaborator Author
@aprokop aprokop Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure. Take HIP sort. When #6793 is merged, with the comparator sort we would need to check rocThrust presence, and copy to the host when it is absent. Essentially, we would need to copy to the host when sort does, and that is a more complex criteria than the accessibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0