Stars
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Example models using DeepSpeed
Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
A tensor-aware point-to-point communication primitive for machine learning
Pretrain and finetune ANY AI model of ANY size on multiple GPUs and TPUs with zero code changes.
Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains
A benchmark for evaluating and comparing various NLP tasks in the Persian language.
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Development repository for the Triton language and compiler
VQ-VAE + Transformer based synthesis of 3D anatomical imaging data
Medical Imaging Deep Learning library to train and deploy 3D segmentation models on Azure Machine Learning
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
PyTorch extensions for high performance and large scale training.
Optimized primitives for collective multi-GPU communication
Collection of various algorithms in mathematics, machine learning, computer science, physics, etc., implemented in C for educational purposes.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) and sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime.
Real-time pose estimation accelerated with NVIDIA TensorRT
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) library for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Dask, Flink and Dataflow.
Python package built to ease deep learning on graphs, on top of existing DL frameworks.