Stars
A library for researching neural network compression and acceleration methods.
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
User-friendly LLaMA: train or run the model using PyTorch. Nothing else.
A Python package that extends official PyTorch to easily obtain extra performance on Intel platforms
Efficient Retrieval Augmentation and Generation Framework
⚡ Build your chatbot within minutes on your favorite device, with SOTA compression techniques for LLMs and efficient LLM inference on Intel platforms ⚡
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Efficient few-shot learning with Sentence Transformers
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Sparsity-aware deep learning inference runtime for CPUs
🐶 Kubernetes CLI To Manage Your Clusters In Style!
Code for the paper "How to Train BERT with an Academic Budget"
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks