Stars
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
A scalable, end-to-end training pipeline for general-purpose agents
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
A Comprehensive Toolkit for High-Quality PDF Content Extraction
An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as some practical experiences and conclusions.
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
Qihoo360 / 360-LLaMA-Factory
Forked from hiyouga/LLaMA-Factory; adds Sequence Parallelism to LLaMA-Factory
slime is an LLM post-training framework aimed at scaling RL.
✔ (Completed) The most comprehensive deep learning notes [Tudui PyTorch] [Li Mu: Dive into Deep Learning] [Andrew Ng: Deep Learning]
The simplest, fastest repository for training/finetuning medium-sized GPTs.
TransMLA: Multi-Head Latent Attention Is All You Need
A PyTorch tutorial on Conditional Flow Matching [Lipman22] using the MNIST dataset.
Serverless LLM Serving for Everyone.
PyTorch code and models for VJEPA2 self-supervised learning from video.
Model Context Protocol Servers
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
ACL 2025: Synthetic data generation pipelines for text-rich images.
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs