Stars
TradingAgents: Multi-Agents LLM Financial Trading Framework
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…
Cost-efficient and pluggable Infrastructure components for GenAI inference
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
🚀 PR-Agent (Qodo Merge open-source): An AI-Powered 🤖 Tool for Automated Pull Request Analysis, Feedback, Suggestions and More! 💻🔍
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
Fast and memory-efficient exact attention
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Backup & export all Evernote notes and notebooks
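Hello 算法 (Hello Algo): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; English version in translation.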
RayLLM - LLMs on Ray (Archived). Read the README for more info.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
A high-throughput and memory-efficient inference and serving engine for LLMs
A guidance language for controlling large language models.
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
LlamaIndex is the leading framework for building LLM-powered agents over your data.
Several simple examples for popular neural network toolkits calling custom CUDA operators.