Stars
PyTorch native quantization and sparsity for training and inference
Triton-based implementation of Sparse Mixture of Experts.
GPU operators for sparse tensor operations
Accelerating Diffusion Transformers with Token-wise Feature Caching
terashuf shuffles multi-terabyte text files using limited memory
Cramming the training of a (BERT-type) language model into limited compute.
Minimal pretraining script for language modeling in PyTorch. Supports torch compilation and DDP, and includes a model implementation and data preprocessing.
A framework for few-shot evaluation of language models.
Measuring Massive Multitask Language Understanding | ICLR 2021
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Official inference repo for FLUX.1 models
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Implementations of attention with the softpick function, naive and FlashAttention-2
ian4hu / Clipy
Forked from Clipy/Clipy. Clipboard extension app for macOS.
EleutherAI / nanoGPT-mup
Forked from karpathy/nanoGPT. The simplest, fastest repository for training/finetuning medium-sized GPTs.