8000 Investigate removing the sycl::reqd_sub_group_size kernel attribute in reduce-then-scan kernels · Issue #2134 · uxlfoundation/oneDPL · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
Investigate removing the sycl::reqd_sub_group_size kernel attribute in reduce-then-scan kernels #2134
Open
@mmichel11

Description

@mmichel11

Summary:
Currently, reduce-then-scan kernels in oneDPL are reliant on underlying hardware supporting sub-group sizes of 32 (and 16 for some special cases). We enforce this through the usage of the sycl::reqd_sub_group_size attribute and a fallback to the legacy scan-then-propagate implementation if sub-group sizes of 32 are not present.

Problem Statement:
Our reduce-then-scan implementation should ideally be able to support hardware with arbitrary sub-group sizes without sacrificing performance. This would enable us to use reduce-then-scan across all GPU accelerators instead of relying on our legacy implementation. It would also allow us to gracefully handle the scenarios which require special workarounds due to sub-group hardware bugs such as #2133.

Additional Context:
The decision to require a sub-group size of 32 was made to be able to use this sub-group size as a compile time constant throughout the reduce-then-scan implementation as opposed to relying on the sub_group class' get_local_range runtime query. Brief experimentation was performed during initial development with removing this sub-group size requirement and relying on run-time sub-group size queries, but large performance hits were observed.

The critical portion of code that is affected by this change is the sub-group level scan:

__inclusive_sub_group_masked_scan(const __dpl_sycl::__sub_group& __sub_group, _MaskOp __mask_fn,
. I had experimented with replacing the sub-group size integral template with a run-time sub-group size variable and enabling partial loop unrolling, but I saw large performance hits. We should take a deeper dive into this issue and what can be done to resolve it.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions

    0