-
Notifications
You must be signed in to change notification settings - Fork 96
TSAN fixes #1743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TSAN fixes #1743
Conversation
This is not really necessary, but it keeps the TSAN happy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - Thanks for finding and fixing these issues! Two points I would like to discuss (shouldn't impact the PR though):
- naming - the functions in CUDA and HIP are named
load_...
, where the fact that a memory ordering is specified makes it clear that we are dealing with atomics. For consistency it would be best if OpenMP and CUDA/HIP use the same names. They can become pretty long though (load_relaxed_local
), so I would like to avoid adding anotheratomic_
prefix to them. Do we need to use it? - With PGM being the only kernel that has this issue, I would probably suggest splitting it into OpenMP, SYCL and CUDA/HIP again, also since there is some optimization potential in terms of load balancing and memory access patterns. The alternative (putting atomic operations into the unified headers) seems to invite using the unified kernels for too much.
omp/components/atomic.hpp
Outdated
// implementation of double and float. The compiler will throw an error if the | ||
// templated version is implemented. GCC doesn't throw an error. | ||
template <typename T> | ||
struct atomic_store_helper; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this? Or can we get the same result with just overloaded functions? We only use atomics for a fixed set of types (value types and index types)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I first wanted to use the templated implementation, but as that doesn't work with clang, we can just use simple overloads.
@upsj regarding 1.: I will go with just Regarding 2.: I would also not support adding atomics for the unified kernels. I've added now the separate kernels, it now a bit more LOC, but the kernels are simple enough that it should be fine. |
I would consider still adding the |
I would rather leave it out to indicate that we can't make assumptions on the memory ordering at all. |
relaxed is as close to "no assumptions" as possible in terms of memory ordering. But not pushing any harder than that 😄 |
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1743 +/- ##
===========================================
+ Coverage 89.51% 89.52% +0.01%
===========================================
Files 797 797
Lines 65865 65889 +24
===========================================
+ Hits 58960 58989 +29
+ Misses 6905 6900 -5 ☔ View full report in Codecov by Sentry. |
05f6082
to
6ad5853
Compare
This PR includes fixes for errors reported by the TSAN. The changes include:
match_edge
function. I didn't fix this, since that would require to extract the common kernels implementations into each backend, which I found a bit overkill. There is also a data race in the non-deterministic part ofassign_to_exist_agg
, but I ignored that, since it is marked as non-deterministic anyway.