8000 TSAN fixes by MarcelKoch · Pull Request #1743 · ginkgo-project/ginkgo · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

TSAN fixes #1743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 8, 2024
Merged

TSAN fixes #1743

merged 6 commits into from
Dec 8, 2024

Conversation

MarcelKoch
Copy link
Member

This PR includes fixes for errors reported by the TSAN. The changes include:

  • RCM: locking the write mutex when checking the queue size
  • SparsityCsr: using atomic load/store when checking of other threads report that the matrix is unsorted
  • Par IC/ILU (t): use atomic load/stores in the same way as the CUDA/HIP implementations
  • MG: 'fix' a write-after-write race in the kcycle stopping criteria. Honestly, I don't think this is necessary, but it stops TSAN from complaining.
  • Batch Solvers: fixed the generation of the 3pt stencil matrix for tests, which caused data races.
  • NOT FIXED: the PGM contains data races in the match_edge function. I didn't fix this, since that would require to extract the common kernels implementations into each backend, which I found a bit overkill. There is also a data race in the non-deterministic part of assign_to_exist_agg, but I ignored that, since it is marked as non-deterministic anyway.

This is not really necessary, but it keeps the TSAN happy.
@MarcelKoch MarcelKoch self-assigned this Dec 6, 2024
@MarcelKoch MarcelKoch added this to the Ginkgo 1.9.0 milestone Dec 6, 2024
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:openmp This is related to the OpenMP module. type:solver This is related to the solvers type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. type:factorization This is related to the Factorizations type:reordering This is related to the matrix(LinOp) reordering type:multigrid This is related to multigrid labels Dec 6, 2024
@MarcelKoch MarcelKoch requested a review from a team December 6, 2024 16:19
Copy link
Member
@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Thanks for finding and fixing these issues! Two points I would like to discuss (shouldn't impact the PR though):

  1. naming - the functions in CUDA and HIP are named load_..., where the fact that a memory ordering is specified makes it clear that we are dealing with atomics. For consistency it would be best if OpenMP and CUDA/HIP use the same names. They can become pretty long though (load_relaxed_local), so I would like to avoid adding another atomic_ prefix to them. Do we need to use it?
  2. With PGM being the only kernel that has this issue, I would probably suggest splitting it into OpenMP, SYCL and CUDA/HIP again, also since there is some optimization potential in terms of load balancing and memory access patterns. The alternative (putting atomic operations into the unified headers) seems to invite using the unified kernels for too much.

// implementation of double and float. The compiler will throw an error if the
// templated version is implemented. GCC doesn't throw an error.
template <typename T>
struct atomic_store_helper;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this? Or can we get the same result with just overloaded functions? We only use atomics for a fixed set of types (value types and index types)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I first wanted to use the templated implementation, but as that doesn't work with clang, we can just use simple overloads.

@MarcelKoch
Copy link
Member Author

@upsj regarding 1.: I will go with just load|store, since OpenMP 4.5 doesn't support specifying the memory ordering.

Regarding 2.: I would also not support adding atomics for the unified kernels. I've added now the separate kernels, it now a bit more LOC, but the kernels are simple enough that it should be fine.

@upsj
Copy link
Member
upsj commented Dec 7, 2024

I would consider still adding the _relaxed suffix, because this is the weakest possible atomic memory ordering, and likely close to what the default would be (just to avoid accidentally relying on too much ordering between operations)

@MarcelKoch
Copy link
Member Author

I would rather leave it out to indicate that we can't make assumptions on the memory ordering at all.

@upsj
Copy link
Member
upsj commented Dec 7, 2024

relaxed is as close to "no assumptions" as possible in terms of memory ordering. But not pushing any harder than that 😄

Copy link
sonarqubecloud bot commented Dec 7, 2024

Copy link
codecov bot commented Dec 7, 2024

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.52%. Comparing base (f95fc48) to head (05f6082).

Files with missing lines Patch % Lines
omp/components/atomic.hpp 96.96% 1 Missing ⚠️
omp/factorization/par_ilut_kernels.cpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1743      +/-   ##
===========================================
+ Coverage    89.51%   89.52%   +0.01%     
===========================================
  Files          797      797              
  Lines        65865    65889      +24     
===========================================
+ Hits         58960    58989      +29     
+ Misses        6905     6900       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MarcelKoch MarcelKoch merged commit 059823f into develop Dec 8, 2024
8 of 11 checks passed
@MarcelKoch MarcelKoch deleted the tsan-fixes branch December 8, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:run-full-test mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:openmp This is related to the OpenMP module. reg:testing This is related to testing. type:factorization This is related to the Factorizations type:matrix-format This is related to the Matrix formats type:multigrid This is related to multigrid type:reordering This is related to the matrix(LinOp) reordering type:solver This is related to the solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0