Stars
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
Using GPT to organize and access information, and generate questions. Long-term goal is to make an agent-like research assistant.
Wan: Open and Advanced Large-Scale Video Generative Models
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A Datacenter Scale Distributed Inference Serving Framework
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Open-Sora: Democratizing Efficient Video Production for All
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Analyze computation-communication overlap in V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
The Triton TensorRT-LLM Backend
FlashInfer: Kernel Library for LLM Serving
AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the software stack that supports training and inference of large AI models.
AISystem mainly refers to AI systems, covering the full stack of low-level AI technologies, including AI chips, AI compilers, and AI inference and training frameworks.