Stars
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Question and Answer based on Anything.
Open-Sora: Democratizing Efficient Video Production for All
AISystem 主要是指AI系统,包括AI芯片、AI编译器、AI推理和训练框架等AI全栈底层技术
xiaotianzhang01 / Efficient_Foundation_Model_Survey
Forked from UbiquitousLearning/Efficient_Foundation_Model_SurveyA high-performance inference system for large language models, designed for production environments.
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.