herocouple

1 follower · 1 following

CUDA-Learn-Notes Public
Forked from xlite-dev/LeetCUDA

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

Cuda GNU General Public License v3.0 Updated Apr 21, 2025
llm-compressor Public
Forked from vllm-project/llm-compressor

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python Apache License 2.0 Updated Dec 11, 2024
lm-evaluation-harness Public
Forked from EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python MIT License Updated Nov 28, 2024
Mooncake Public
Forked from kvcache-ai/Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ Apache License 2.0 Updated Nov 28, 2024
detectron2 Public
Forked from facebookresearch/detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Python Apache License 2.0 Updated Nov 17, 2024
entropix Public
Forked from xjdr-alt/entropix

Entropy Based Sampling and Parallel CoT Decoding

Python Apache License 2.0 Updated Nov 13, 2024
unilm Public
Forked from microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python MIT License Updated Nov 9, 2024
HAMi Public
Forked from Project-HAMi/HAMi

Heterogeneous AI Computing Virtualization Middleware

Go Apache License 2.0 Updated Oct 31, 2024
bigcode-evaluation-harness Public
Forked from bigcode-project/bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Python Apache License 2.0 Updated Oct 31, 2024
vllm Public
Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python Apache License 2.0 Updated Oct 25, 2024
outlines Public
Forked from dottxt-ai/outlines

Structured Text Generation

Python Apache License 2.0 Updated Sep 4, 2024
sglang Public
Forked from sgl-project/sglang

SGLang is yet another fast serving framework for large language models and vision language models.

Python Apache License 2.0 Updated Jul 29, 2024
gin Public
Forked from gin-gonic/gin

Gin is an HTTP web framework written in Go (Golang). It features a Martini-like API with much better performance -- up to 40 times faster. If you need smashing performance, get yourself some Gin.

Go MIT License Updated Jul 28, 2024
triton Public
Forked from triton-lang/triton

Development repository for the Triton language and compiler

C++ MIT License Updated Jul 25, 2024
deepsparse Public
Forked from neuralmagic/deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Python Other Updated Jul 19, 2024
modelmesh-serving Public
Forked from kserve/modelmesh-serving

Controller for ModelMesh

Go Apache License 2.0 Updated Jul 16, 2024
FlagEmbedding Public
Forked from FlagOpen/FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Python MIT License Updated Jun 26, 2024
dify Public
Forked from langgenius/dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript Other Updated Jun 21, 2024
lectures Public
Forked from gpu-mode/lectures

Material for cuda-mode lectures

Jupyter Notebook Apache License 2.0 Updated Jun 13, 2024
Qwen-Agent Public
Forked from QwenLM/Qwen-Agent

Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.

Python Other Updated Jun 6, 2024
marlin Public
Forked from IST-DASLab/marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python Apache License 2.0 Updated Apr 22, 2024
lightllm Public
Forked from ModelTC/lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python Apache License 2.0 Updated Apr 15, 2024
lmdeploy Public
Forked from InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python Apache License 2.0 Updated Mar 22, 2024
KnowLM Public
Forked from zjunlp/KnowLM

An Open-sourced Knowledgeable Large Language Model Framework.

Python MIT License Updated Mar 16, 2024
transformer-debugger Public
Forked from openai/transformer-debugger

Python MIT License Updated Mar 13, 2024
LLaMA-Pro Public
Forked from TencentARC/LLaMA-Pro

Progressive LLaMA with Block Expansion.

Python Apache License 2.0 Updated Mar 12, 2024
flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda Apache License 2.0 Updated Mar 8, 2024
dolma Public
Forked from allenai/dolma

Data and tools for generating and inspecting OLMo pre-training data.

Python Apache License 2.0 Updated Feb 27, 2024
LookaheadDecoding Public
Forked from hao-ai-lab/LookaheadDecoding

Python Apache License 2.0 Updated Feb 14, 2024
LLaMA-Factory Public
Forked from hiyouga/LLaMA-Factory

Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM3)

Python Apache License 2.0 Updated Nov 1, 2023
