Stars
💥💻💥 A data-parallel functional programming language
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Mirror of the Glasgow Haskell Compiler. Please submit issues and patches to GHC's Gitlab instance (https://gitlab.haskell.org/ghc/ghc). First time contributors are encouraged to get started with th…
Universal LLM Deployment Engine with ML Compilation
eBPF Developer Tutorial: Learning eBPF Step by Step with Examples
A debugging and profiling tool that can trace and visualize Python code execution
nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for do…
Neural Code Intelligence Survey 2024; Reading lists and resources
Fast and memory-efficient exact attention
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Ongoing research on training transformer models at scale
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
scikit-learn: machine learning in Python
A Python package with bindings to the "Virtual Instrument Software Architecture" VISA library, in order to control measurement devices and test equipment via GPIB, RS232, or USB.
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
"Probabilistic Machine Learning" - a book series by Kevin Murphy