[Feature] Support Hierarchical Group-Cast Collective Communication #26
- `all2all-v` and 2D cp device mesh (inter-node x intra-node).
- `MAGI_ATTENTION_HIERARCHICAL_COMM` controls whether hierarchical group-cast is enabled.
- To enable it, `export MAGI_ATTENTION_HIERARCHICAL_COMM=1` and pass a 2D cp device mesh.
- Use `if magi_attention.is_hierarchical_comm_enable():` to check whether it is enabled.
- `magi_attention.is_cuda_device_max_connections_one() == False`. `high_bandwidth_domain_size > 1` is also forbidden when it's enabled.
- `comm/primitive/utils.py` to reduce runtime overhead for range ops.
- `range_fill_` kernel for the `out_zero_fill` operation in each `attn_fwd_partial` call.
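
As a rough illustration of the toggle and the (inter-node x intra-node) layout mentioned in this PR, here is a minimal pure-Python sketch. It is not the code from this PR: `hierarchical_comm_enabled` is a hypothetical stand-in for `magi_attention.is_hierarchical_comm_enable()` and assumes the check is a plain comparison of the environment variable against `"1"`. `mesh_groups` is likewise a hypothetical helper that assumes ranks are laid out row-major, one row per node.

```python
import os

# Variable name taken from the PR description.
ENV_VAR = "MAGI_ATTENTION_HIERARCHICAL_COMM"


def hierarchical_comm_enabled() -> bool:
    """Hypothetical stand-in for magi_attention.is_hierarchical_comm_enable().

    Assumption: the feature is on only when the variable equals "1".
    """
    return os.environ.get(ENV_VAR, "0") == "1"


def mesh_groups(world_size: int, intra_node_size: int):
    """Return (intra_groups, inter_groups) for a 2D rank grid.

    Assumption: ranks are numbered row-major, so each row is one node
    (intra-node group) and each column links the same local rank
    across nodes (inter-node group).
    """
    if world_size % intra_node_size != 0:
        raise ValueError("world_size must be a multiple of intra_node_size")
    num_nodes = world_size // intra_node_size
    mesh = [
        [node * intra_node_size + local for local in range(intra_node_size)]
        for node in range(num_nodes)
    ]
    intra_groups = mesh
    inter_groups = [list(column) for column in zip(*mesh)]
    return intra_groups, inter_groups
```

With 8 ranks and 4 ranks per node, `mesh_groups(8, 4)` yields two intra-node groups of four ranks and four inter-node groups of two ranks. A two-stage group-cast could then exchange data over the inter-node groups first and fan out within each node afterwards, though the ordering actually used by this PR is not stated in the description.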