-
Notifications
You must be signed in to change notification settings - Fork 43
Distributed nearest callback #1075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: Daniel Arndt <arndtd@ornl.com>
bff4f8e
to
ab1eb5d
Compare
namespace Details | ||
{ | ||
|
||
template <typename Tree, typename Callback, typename OutValue, bool UseValues> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OutValue
argument is really annoying. Currently, no avoiding it. The issue is that we can't determine from the callback's signature what the output value type is going to be. The way we are doing things right now is that determine that from the storage arrays passed in by a user.
@@ -84,7 +84,7 @@ struct LegacyCallbackWrapper | |||
PairValueIndex<Value, Index> const &value, | |||
Output const &out) const | |||
{ | |||
_callback(predicate, value.index, out); | |||
_callback(predicate, (int)value.index, out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a more precise treatment for APIv1. I also needed this due to the default distributed callback now being templated on the value, resulting in the unsigned -> signed
loss of precision errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand that change nor your "a more precise treatment for APIv1" comment.
In any case we will also need to add a comment in the code to explain why we cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our definition of the original callback was operator()(Query const& query, int i)
. Without casting, the second argument is actually of type the default index of PairValueIndex
, which is unsigned
. That diverged from the original APIv1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and you prefer explicit cast than implicit for readability right, it is not needed per se?
(just trying to make sure I understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed for aggregate initialization
The problem with tags is that their behavior is not visible for users. If a previous version of ArborX ignores tags, it may just silently do something else. Instead, we want to fail compilation if the user indicates an intention we do not support. This is one way to do it: user has to wrap their callback into a class.
@@ -84,7 +84,7 @@ struct LegacyCallbackWrapper | |||
PairValueIndex<Value, Index> const &value, | |||
Output const &out) const | |||
{ | |||
_callback(predicate, value.index, out); | |||
_callback(predicate, (int)value.index, out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand that change nor your "a more precise treatment for APIv1" comment.
In any case we will also need to add a comment in the code to explain why we cast.
[[maybe_unused]] int count = 0; | ||
_callback(query, value, [&](OutValue const &ov) { | ||
out_value = ov; | ||
++count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this incrementing the counter need to be atomic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that each nearest query is executed by a single thread. If that assumption is wrong, we don't have an easy way to check this, unless we create a full view and do atomics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to that effect
@dalg24 Should have addressed all your comments with one exception. Regarding atomics, I think it is fine as it is. I did not do atomics, as I think it is fine as it is. The only way I can see us triggering it if we call callbacks on the on-node nearest neighbor search results in parallel. We don't currently do that, nor do we have any plans to do it, as nearest neighbors require priority trees. The only exception to the latter is k = 1 specialization in BruteForce (not currently implemented), but even that would result in a single answer and thus no parallel callback calls for the same query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few last comments
[[maybe_unused]] int count = 0; | ||
_callback(query, value, [&](OutValue const &ov) { | ||
out_value = ov; | ||
++count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to that effect
@@ -84,7 +84,7 @@ struct LegacyCallbackWrapper | |||
PairValueIndex<Value, Index> const &value, | |||
Output const &out) const | |||
{ | |||
_callback(predicate, value.index, out); | |||
_callback(predicate, (int)value.index, out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and you prefer explicit cast than implicit for readability right, it is not needed per se?
(just trying to make sure I understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge after you add the comments I asked for
4 builds (all CUDA and SYCL) timed out, though 2 of them made through the first make+ctest. 3 builds passed. Given that there's been only comment change since 94330e2, I'm comfortable merging this. |
[The 3-round rework of #737 will be pushed in that branch once this PR has been merged].
Works for both APIv1 and APIv2 (that was quite tricky).
Introduces two ways to do a nearest callback in the distributed setting:
Constrained meaning that
Constrained callback is indicated with a new tag.
Current limitations:
To be fair, even with the spatial callbacks I'm not sure what that means