Stars
FlashInfer: Kernel Library for LLM Serving
✨ Light and fast AI assistant. Supports: Web | iOS | macOS | Android | Linux | Windows
Easy-to-use headless React Hooks to run LLMs in the browser with WebGPU. Just useLLM().
🗣️ Chat with LLMs like Vicuna entirely in your browser with WebGPU: safe, private, and with no server. Powered by WebLLM.
Universal LLM Deployment Engine with ML Compilation
High-performance In-browser LLM Inference Engine
Bringing Stable Diffusion models to web browsers. Everything runs inside the browser, with no server required.
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
Transformer-related optimizations, including BERT and GPT
Development repository for the Triton language and compiler (a minimal kernel sketch appears after this list)
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
A collection of resources and papers on Diffusion Models
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more (see the composition sketch after this list)
Training and serving large-scale neural networks with auto parallelization.
Tensors and dynamic neural networks in Python with strong GPU acceleration (see the autograd sketch after this list)
An Open Source Machine Learning Framework for Everyone
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in their own right, aiming to help readers thoroughly master metaprogramming. (Work in progress)
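For a sense of what the Triton language looks like, here is a minimal sketch of the classic vector-add kernel from the project's own tutorials; the kernel name, vector size, and block size are illustrative choices, not project defaults.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(98432, device="cuda")
y = torch.rand(98432, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```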
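The JAX entry names its three core transformations; a minimal sketch composing them, where the loss function and shapes are made up for illustration:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.sum((x @ w) ** 2)

grad_loss = jax.grad(loss)                        # differentiate
batched = jax.vmap(grad_loss, in_axes=(None, 0))  # vectorize over a batch of x
fast = jax.jit(batched)                           # JIT-compile for CPU/GPU/TPU

w = jnp.ones(3)
xs = jnp.arange(12.0).reshape(4, 3)
print(fast(w, xs).shape)  # (4, 3): one gradient per batch element
```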
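The "dynamic" part of the PyTorch entry refers to define-by-run autograd: the graph is built as ordinary Python executes, so data-dependent control flow just works. A minimal sketch with made-up tensors:

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x.sin() if x.sum() > 0 else x.cos()  # branch decided at runtime
loss = (y ** 2).sum()
loss.backward()                          # reverse-mode autodiff through the taken branch
print(x.grad.shape)                      # torch.Size([4])
```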