- The University of Hong Kong
- kahypar Public
  Forked from kahypar/kahypar
  KaHyPar (Karlsruhe Hypergraph Partitioning) is a multilevel hypergraph partitioning framework providing direct k-way and recursive bisection based partitioning algorithms that compute solutions of …
  C++ · GNU General Public License v3.0 · Updated Jun 15, 2025
- kahypar-shared-resources Public
  Forked from kahypar/kahypar-shared-resources
  This repository contains resources shared between KaHyPar and Mt-KaHyPar under MIT license.
  C++ · MIT License · Updated Mar 18, 2025
- TransformerEngine Public
  Forked from NVIDIA/TransformerEngine
  A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
  Python · Apache License 2.0 · Updated Mar 10, 2025
- flash-attention Public
  Forked from Dao-AILab/flash-attention
  Fast and memory-efficient exact attention
  Python · BSD 3-Clause "New" or "Revised" License · Updated Mar 7, 2025
- mt-kahypar Public
  Forked from kahypar/mt-kahypar
  Mt-KaHyPar (Multi-Threaded Karlsruhe Hypergraph Partitioner) is a shared-memory multilevel graph and hypergraph partitioner equipped with parallel implementations of techniques used in the best seq…
  C++ · MIT License · Updated Feb 10, 2025
- nsys2json Public
  A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format.
- pytorch Public
  Forked from pytorch/pytorch
  Tensors and Dynamic neural networks in Python with strong GPU acceleration
  Python · Other · Updated Jan 14, 2025
- Megatron-LM Public
  Forked from NVIDIA/Megatron-LM
  Artifact for DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
- Lancet Public
  Forked from awslabs/Lancet-Accelerating-MoE-Training-via-Whole-Graph-Computation-Communication-Overlapping
  Official implementation for the paper Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping, published in MLSys'24.
  C++ · Apache License 2.0 · Updated Sep 25, 2024
- pymetis Public
  Forked from inducer/pymetis
  A Python wrapper around Metis, a graph partitioning package
  C · Other · Updated Sep 3, 2024
- vllm Public
  Forked from vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs
  Python · Apache License 2.0 · Updated Jul 14, 2024
- grouped_gemm Public
  Forked from tgale96/grouped_gemm
  PyTorch bindings for CUTLASS grouped GEMM.
  Cuda · Apache License 2.0 · Updated Jul 8, 2024
- LLaVA Public
  Forked from haotian-liu/LLaVA
  [NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
  Python · Apache License 2.0 · Updated Nov 20, 2023
- text-to-text-transfer-transformer Public
  Forked from google-research/text-to-text-transfer-transformer
  Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
  Python · Apache License 2.0 · Updated Sep 30, 2023
- DeepSpeed Public
  Forked from deepspeedai/DeepSpeed
  DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  Python · Apache License 2.0 · Updated May 25, 2023
- ccf-deadlines Public
  Forked from ccfddl/ccf-deadlines
  ⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~
  Vue · MIT License · Updated Apr 3, 2023
- elkai Public
  Forked from fikisipi/elkai
  Python 3 travelling salesman (TSP) approx solver based on LKH (cross platform)
  Python · Other · Updated Dec 4, 2022
- byteps Public
  Forked from joapolarbear/byteps
  A high performance and generic framework for distributed DNN training
  Python · Other · Updated Sep 26, 2022
- horovod Public
  Forked from horovod/horovod
  Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  C++ · Other · Updated Sep 26, 2022
- aws-ofi-nccl Public
  Forked from aws/aws-ofi-nccl
  This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications.
  C · Apache License 2.0 · Updated Aug 11, 2022
- perftest Public
  Forked from linux-rdma/perftest
  Infiniband Verbs Performance Tests
- cutlass Public
  Forked from NVIDIA/cutlass
  CUDA Templates for Linear Algebra Subroutines
  C++ · BSD 3-Clause "New" or "Revised" License · Updated Jul 28, 2022
- BayesianOptimization Public
  Forked from bayesian-optimization/BayesianOptimization
  A Python implementation of global optimization with gaussian processes.
  Python · MIT License · Updated Mar 30, 2022
- tutel Public
  Forked from microsoft/Tutel
  Tutel MoE: An Optimized Mixture-of-Experts Implementation
  Python · MIT License · Updated Mar 30, 2022
- tensorflow Public
  Forked from tensorflow/tensorflow
  An Open Source Machine Learning Framework for Everyone
  C++ · Apache License 2.0 · Updated Jan 27, 2021