make src.build -j<n> NCCL_HOME=path-to-occl/build
make src_simple.build -j<n> NCCL_HOME=path-to-occl/build
make src_chaos_order.build -j<n> NCCL_HOME=path-to-occl/build
For experiments with MPI, build with:
make src.build -j<n> MPI=1 MPI_HOME=path-to-mpi NCCL_HOME=path-to-occl/build
make src_simple.build -j<n> MPI=1 MPI_HOME=path-to-mpi NCCL_HOME=path-to-occl/build
By default, NVCC_GENCODE is set to -gencode=arch=compute_86,code=sm_86, which targets GPUs of compute capability 8.6. Set the NVCC_GENCODE environment variable to build for a different GPU architecture.
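NVCC_GENCODE holds one or more nvcc -gencode flags. As a minimal sketch, a hypothetical helper gencode_for (not part of OCCL) can build that value from compute capabilities, assuming, as stated above, that the Makefiles read NVCC_GENCODE from the environment:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OCCL): turn compute capabilities such as
# "80" or "86" into the nvcc -gencode flags expected in NVCC_GENCODE.
gencode_for() {
  local cc
  local -a flags=()
  for cc in "$@"; do
    flags+=("-gencode=arch=compute_${cc},code=sm_${cc}")
  done
  # "${flags[*]}" joins the entries with single spaces, so several
  # architectures end up in one NVCC_GENCODE value.
  echo "${flags[*]}"
}

# Example: target compute capability 8.0 (A100-class) in addition to the default 8.6.
export NVCC_GENCODE="$(gencode_for 80 86)"
# make src.build -j<n> NCCL_HOME=path-to-occl/build
```

Listing several capabilities produces a binary that runs natively on each of them, at the cost of longer compile times. On recent NVIDIA drivers, nvidia-smi --query-gpu=compute_cap --format=csv,noheader prints the capability of each installed GPU (for example 8.6, which corresponds to the argument 86 here).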
bash nccl_tests.sh <NUM_GPUS> <COLL_FUNC> <BUFFER_SIZE>
bash occl_tests.sh <NUM_GPUS> <COLL_FUNC> <BUFFER_SIZE>
- Supported values of COLL_FUNC are AR, AG, RS, R, and B, selecting all-reduce, all-gather, reduce-scatter, reduce, and broadcast, respectively.
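To benchmark every supported collective in one go, a loop over the COLL_FUNC values is enough. This is a minimal sketch: sweep_collectives and coll_name are hypothetical helpers, and BUFFER_SIZE is passed through unchanged because its accepted format is defined by the test scripts themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a COLL_FUNC abbreviation to the collective it selects.
coll_name() {
  case "$1" in
    AR) echo "all-reduce" ;;
    AG) echo "all-gather" ;;
    RS) echo "reduce-scatter" ;;
    R)  echo "reduce" ;;
    B)  echo "broadcast" ;;
    *)  echo "unsupported COLL_FUNC: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: run one test script (e.g. nccl_tests.sh or occl_tests.sh)
# once per supported COLL_FUNC, with the same GPU count and buffer size,
# stopping at the first failing run.
sweep_collectives() {
  local script=$1 num_gpus=$2 buffer_size=$3 coll
  for coll in AR AG RS R B; do
    echo "== $(coll_name "$coll") ($coll) =="
    bash "$script" "$num_gpus" "$coll" "$buffer_size" || return 1
  done
}

# Usage, with <BUFFER_SIZE> left as a placeholder:
# sweep_collectives occl_tests.sh 8 <BUFFER_SIZE>
```

Since mpi_nccl_tests.sh and mpi_occl_tests.sh take their arguments in the same order (with the GPU count interpreted per node), the same function should work for them as well.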
For experiments with MPI, run:
bash mpi_nccl_tests.sh <NUM_GPUS_PER_NODE> <COLL_FUNC> <BUFFER_SIZE>
bash mpi_occl_tests.sh <NUM_GPUS_PER_NODE> <COLL_FUNC> <BUFFER_SIZE>
To demonstrate OCCL's deadlock-prevention capability:
export BINARY=CHAOS
bash occl_tests.sh 8
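Because export keeps BINARY=CHAOS set for the rest of the shell session, later occl_tests.sh runs in the same shell would keep using the chaos variant, assuming the script selects its binary from BINARY as the example suggests. A minimal sketch that scopes the variable to a single invocation instead (run_chaos_demo is a hypothetical name; its defaults mirror the example above):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: a VAR=value prefix puts BINARY=CHAOS into the
# environment of this one command only, so the caller's shell is unchanged.
run_chaos_demo() {
  local script=${1:-occl_tests.sh}
  local num_gpus=${2:-8}
  BINARY=CHAOS bash "$script" "$num_gpus"
}

# Usage (8 GPUs, as in the example above):
# run_chaos_demo occl_tests.sh 8
# If BINARY=CHAOS was exported instead, clear it before running normal tests:
# unset BINARY
```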