Stars
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Making large AI models cheaper, faster and more accessible
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
Simple, scalable AI model deployment on GPU clusters
Integrate the DeepSeek API into popular software
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
DeepSeek-VL: Towards Real-World Vision-Language Understanding
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
A flexible framework powered by ComfyUI for generating personalized Nobel Prize images.
Slim(toolkit): Don't change anything in your container image and minify it by up to 30x (and for compiled languages even more) making it secure too! (free and open source)
GLM-4 series: Open Multilingual Multimodal Chat LMs
ncnn is a high-performance neural network inference framework optimized for the mobile platform
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
SGLang is a fast serving framework for large language models and vision language models.
Universal LLM Deployment Engine with ML Compilation
Development repository for the Triton language and compiler