-
TransformerEngine Public
Forked from NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Python Apache License 2.0 Updated Mar 17, 2025 -
Pai-Megatron-Patch Public
Forked from alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Python Apache License 2.0 Updated Feb 10, 2025 -
Megatron-LM Public
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Python Other Updated Jan 23, 2025 -
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Nov 6, 2024 -
flux Public
Forked from bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
C++ Apache License 2.0 Updated Oct 30, 2024 -
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Sep 10, 2024 -
FastDeploy Public
Forked from PaddlePaddle/FastDeploy
⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud 📱Mobile and 📹Edge. Including Image, Video, Text and Audio 20+ main stream scenarios and 150+ SOTA models with end-to-end…
C++ Apache License 2.0 Updated Aug 30, 2024 -
Paddle Public
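Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』 (PaddlePaddle) core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
C++ Apache License 2.0 Updated Jul 25, 2024 -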
flash-attention Public
Forked from PaddlePaddle/flash-attention
Fast and memory-efficient exact attention
C++ BSD 3-Clause "New" or "Revised" License Updated Jun 25, 2024 -
PaddleNLP Public
Forked from PaddlePaddle/PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search…
Python Updated Jun 18, 2024 -
flash-attention-hip Public
Flash Attention 2 C API for Paddle-ROCM
-
composable_kernel Public
Forked from ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
C++ Other Updated Dec 7, 2023 -
hipBLASLt Public
Forked from ROCm/hipBLASLt
hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library
-
ChatGLM-6B-in-DeepSpeed-Chat Public
ChatGLM-6B in DeepSpeed-Chat for DCU
-
GLM-Pretrain in Megatron-Deepspeed for DCU
-
VkFFT Public
Forked from DTolm/VkFFT
Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
C++ MIT License Updated Aug 22, 2023 -
Tensile Public
Forked from ROCm/Tensile
Stretching GPU performance for GEMMs and tensor contractions.
Python MIT License Updated Aug 22, 2023 -
docs Public
Forked from PaddlePaddle/docs
Documentations for PaddlePaddle
Python Apache License 2.0 Updated Jul 18, 2023 -
oneflow Public
Forked from Oneflow-Inc/oneflow
OneFlow is a performance-centered and open-source deep learning framework.
C++ Apache License 2.0 Updated Jul 18, 2023