-
CUDA-Learn-Notes Public
Forked from xlite-dev/LeetCUDA
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
Cuda GNU General Public License v3.0 Updated Apr 21, 2025 -
llm-compressor Public
Forked from vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Python Apache License 2.0 Updated Dec 11, 2024 -
lm-evaluation-harness Public
Forked from EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
Python MIT License Updated Nov 28, 2024 -
Mooncake Public
Forked from kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
C++ Apache License 2.0 Updated Nov 28, 2024 -
detectron2 Public
Forked from facebookresearch/detectron2
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Python Apache License 2.0 Updated Nov 17, 2024 -
entropix Public
Forked from xjdr-alt/entropix
Entropy Based Sampling and Parallel CoT Decoding
Python Apache License 2.0 Updated Nov 13, 2024 -
unilm Public
Forked from microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Python MIT License Updated Nov 9, 2024 -
HAMi Public
Forked from Project-HAMi/HAMi
Heterogeneous AI Computing Virtualization Middleware
Go Apache License 2.0 Updated Oct 31, 2024 -
bigcode-evaluation-harness Public
Forked from bigcode-project/bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Python Apache License 2.0 Updated Oct 31, 2024 -
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Oct 25, 2024 -
outlines Public
Forked from dottxt-ai/outlines
Structured Text Generation
Python Apache License 2.0 Updated Sep 4, 2024 -
sglang Public
Forked from sgl-project/sglang
SGLang is yet another fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Jul 29, 2024 -
gin Public
Forked from gin-gonic/gin
Gin is a HTTP web framework written in Go (Golang). It features a Martini-like API with much better performance -- up to 40 times faster. If you need smashing performance, get yourself some Gin.
Go MIT License Updated Jul 28, 2024 -
triton Public
Forked from triton-lang/triton
Development repository for the Triton language and compiler
C++ MIT License Updated Jul 25, 2024 -
deepsparse Public
Forked from neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
Python Other Updated Jul 19, 2024 -
modelmesh-serving Public
Forked from kserve/modelmesh-serving
Controller for ModelMesh
Go Apache License 2.0 Updated Jul 16, 2024 -
FlagEmbedding Public
Forked from FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
Python MIT License Updated Jun 26, 2024 -
dify Public
Forked from langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
TypeScript Other Updated Jun 21, 2024 -
lectures Public
Forked from gpu-mode/lectures
Material for cuda-mode lectures
Jupyter Notebook Apache License 2.0 Updated Jun 13, 2024 -
Qwen-Agent Public
Forked from QwenLM/Qwen-Agent
Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
Python Other Updated Jun 6, 2024 -
marlin Public
Forked from IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
Python Apache License 2.0 Updated Apr 22, 2024 -
lightllm Public
Forked from ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Python Apache License 2.0 Updated Apr 15, 2024 -
lmdeploy Public
Forked from InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Python Apache License 2.0 Updated Mar 22, 2024 -
KnowLM Public
Forked from zjunlp/KnowLM
An Open-sourced Knowledgable Large Language Model Framework.
Python MIT License Updated Mar 16, 2024 -
transformer-debugger Public
Forked from openai/transformer-debugger
Python MIT License Updated Mar 13, 2024 -
LLaMA-Pro Public
Forked from TencentARC/LLaMA-Pro
Progressive LLaMA with Block Expansion.
Python Apache License 2.0 Updated Mar 12, 2024 -
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Mar 8, 2024 -
dolma Public
Forked from allenai/dolma
Data and tools for generating and inspecting OLMo pre-training data.
Python Apache License 2.0 Updated Feb 27, 2024 -
LookaheadDecoding Public
Forked from hao-ai-lab/LookaheadDecoding
Python Apache License 2.0 Updated Feb 14, 2024 -
LLaMA-Factory Public
Forked from hiyouga/LLaMA-Factory
Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM3)
Python Apache License 2.0 Updated Nov 1, 2023