A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
Minimal GRPO implementation from scratch
Minimal reproduction of DeepSeek R1-Zero
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
A Python library that transfers PyTorch tensors between CPU and NVMe
Mini versions of GPT-2, Llama 3, etc., for pre-training
Everything about the SmolLM2 and SmolVLM family of models
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
OLMoE: Open Mixture-of-Experts Language Models
VideoSys: An easy and efficient system for video generation
Development repository for the Triton language and compiler
Latency and Memory Analysis of Transformer Models for Training and Inference
RTP: Rethinking Tensor Parallelism with Memory Deduplication
Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
A validation and profiling tool for AI infrastructure
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)