Tags: crownk1997/nccl
Tags
2.5.6-1 (NVIDIA#255) Add LL128 Protocol. Rewrite the topology detection and tree/ring creation (NVIDIA#179). Improve tree performance by sending/receiving from different GPUs. Add model-based tuning to switch between the different algorithms and protocols. Rework P2P/SHM detection in containers (NVIDIA#155, NVIDIA#248). Detect duplicated devices and return an error (NVIDIA#231). Add tuning for GCP
2.4.8-1 Fix NVIDIA#209: improve socket transport performance Split transfers over multiple sockets Launch multiple threads to drive sockets Detect AWS NICs and set nsockets/nthreads accordingly
NCCL 2.4.6-1 Added detection of IBM/Power NVLink bridge device. Add NUMA support to PCI distance calculations. Added NCCL_IGNORE_CPU_AFFINITY env var. Fix memory leaks; GithubIssue#180 Compiler warning fix; GithubIssue#178 Replace non-standard variable length arrays. GithubIssue#171 Fix Tree+Shared Memory crash. GithubPR#185 Fix LL cleanup hang during long running DL jobs. Fix NCCL_RINGS environment variable handling. Added extra checks to catch repeat calls to ncclCommDestroy() GithubIssue#191 Improve bootstrap socket connection reliability at scale. Fix hostname hashing issue. GithubIssue#187 Code cleanup to rename all non device files from *.cu to *.cc
PreviousNext