AMD ReorderInstruction pass reorders global_load ahead of local_store and breaks the local_prefetch logic, which no longer matches TritonAMDGPULowerInstructionSchedHints::createLocalPrefetchSchedule · Issue #6750 · triton-lang/triton · GitHub
Closed
@chuxin12345

Description

Describe the bug

The TritonAMDGPULowerInstructionSchedHints::createLocalPrefetchSchedule() logic assumes the loop IR is ordered like this:

// Prefetch Schema cluster order and staging.
// for i in (...):
//   local_stores: stage=i+1
//   global_loads: stage=i+2
//   compute:      stage=i
//   local_load:   stage=i+1
//   tail:         stage=i
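To make the assumption concrete, here is a minimal sketch that encodes the cluster order and staging above and checks whether a loop body conforms to it. `OpKind`, `Cluster`, `expectedPrefetchOrder`, and `matchesPrefetchSchema` are hypothetical names for illustration only; they are not Triton or MLIR APIs, and the real pass works on MLIR operations rather than on a list like this.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, coarse-grained op kinds for one pipelined loop iteration.
// Illustrative only; these are not Triton or MLIR types.
enum class OpKind { LocalStore, GlobalLoad, Compute, LocalLoad, Tail };

// One cluster of the loop body plus the stage it works on, written as an
// offset from the current iteration i (e.g. 2 means stage = i + 2).
struct Cluster {
  OpKind kind;
  int stageOffset;
};

// Cluster order and staging that createLocalPrefetchSchedule() expects,
// transcribed from the comment quoted in this issue.
inline std::vector<Cluster> expectedPrefetchOrder() {
  return {{OpKind::LocalStore, 1},
          {OpKind::GlobalLoad, 2},
          {OpKind::Compute, 0},
          {OpKind::LocalLoad, 1},
          {OpKind::Tail, 0}};
}

// True only if `body` has exactly the expected clusters, in the expected
// order, with the expected stage offsets.
inline bool matchesPrefetchSchema(const std::vector<Cluster>& body) {
  const std::vector<Cluster> expected = expectedPrefetchOrder();
  if (body.size() != expected.size()) return false;
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (body[i].kind != expected[i].kind ||
        body[i].stageOffset != expected[i].stageOffset) {
      return false;
    }
  }
  return true;
}
```

A body that starts with `GlobalLoad` instead of `LocalStore` fails this check even though every cluster keeps its stage, which is the kind of mismatch this issue is about.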

However, TritonAMDGPUReorderInstructionsPass::scheduleGlobalLoadLocalStore() reorders the loop body into:

// for i in (...):
//   global_loads: stage=i+2
//   local_stores: stage=i+1
//   compute:      stage=i
//   local_load:   stage=i+1
//   tail:         stage=i
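The effect of that reordering can be modeled with a toy function that hoists global_loads ahead of the first local_stores cluster while keeping every other cluster in its relative order. This is a hypothetical sketch of the observed before/after order, not the implementation of scheduleGlobalLoadLocalStore(); `Body` and `hoistGlobalLoads` are illustrative names, and a real compiler pass must also respect data dependencies, which this toy ignores.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy loop body: each string is one cluster label from the issue.
using Body = std::vector<std::string>;

// Hypothetical model of the reordering described above: global_loads that
// follow the first local_stores cluster are moved ahead of it, and all other
// clusters keep their relative order. Dependencies are not checked here.
inline Body hoistGlobalLoads(Body body) {
  const std::string localStores = "local_stores";
  const std::string globalLoads = "global_loads";
  auto firstStore = std::find(body.begin(), body.end(), localStores);
  if (firstStore == body.end()) return body;  // nothing to hoist past
  // Stable partition of [firstStore, end): global_loads first, rest in order.
  std::stable_partition(firstStore, body.end(), [&](const std::string& op) {
    return op == globalLoads;
  });
  return body;
}
```

Applied to the schema order `local_stores, global_loads, compute, local_load, tail`, this yields `global_loads, local_stores, compute, local_load, tail`, matching the order shown above.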

As a result, TritonAMDGPULowerInstructionSchedHints::createLocalPrefetchSchedule() no longer works.

The cause is that Triton inserts a sync/barrier before local_store. After the reordering, the global_loads, local_stores, and compute cross that sync/barrier boundary, so the sched.group masks no longer take effect.
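The barrier argument can be checked with a small model: insert a barrier marker before every local_stores cluster, split the body into barrier-delimited regions, and ask whether global_loads, local_stores, and compute share a region. This is a hypothetical sketch that looks at a single iteration only (it ignores the loop back-edge) and assumes, following the issue, that the sched.group masks can only interleave instructions within one region; `insertBarriers`, `splitRegions`, and `sameRegion` are illustrative names, not Triton APIs.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy loop body: each string is one cluster label from the issue.
using Body = std::vector<std::string>;

// Insert a "barrier" marker before every local_stores cluster, mirroring the
// sync/barrier the issue says Triton places before local_store.
inline Body insertBarriers(const Body& body) {
  Body out;
  for (const std::string& op : body) {
    if (op == "local_stores") out.push_back("barrier");
    out.push_back(op);
  }
  return out;
}

// Split a body into the regions between barrier markers.
inline std::vector<Body> splitRegions(const Body& body) {
  std::vector<Body> regions(1);  // start with one empty region
  for (const std::string& op : body) {
    if (op == "barrier") {
      regions.emplace_back();  // a barrier opens a new region
    } else {
      regions.back().push_back(op);
    }
  }
  return regions;
}

// True if a single region contains every cluster in `ops`.
inline bool sameRegion(const std::vector<Body>& regions, const Body& ops) {
  for (const Body& region : regions) {
    bool all = true;
    for (const std::string& op : ops) {
      if (std::find(region.begin(), region.end(), op) == region.end()) {
        all = false;
        break;
      }
    }
    if (all) return true;
  }
  return false;
}
```

With the schema order, all three clusters land in the region after the barrier; with the reordered body, global_loads ends up in the region before the barrier, apart from local_stores and compute.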

Environment details

Triton tip code.
