herocouple (herocouple) / Repositories · GitHub

  • 📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

    Cuda GNU General Public License v3.0 Updated Apr 21, 2025
  • Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python Apache License 2.0 Updated Dec 11, 2024
  • A framework for few-shot evaluation of language models.

    Python MIT License Updated Nov 28, 2024
  • Mooncake Public

    Forked from kvcache-ai/Mooncake

    Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

    C++ Apache License 2.0 Updated Nov 28, 2024
  • Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

    Python Apache License 2.0 Updated Nov 17, 2024
  • entropix Public

    Forked from xjdr-alt/entropix

    Entropy Based Sampling and Parallel CoT Decoding

    Python Apache License 2.0 Updated Nov 13, 2024
  • unilm Public

    Forked from microsoft/unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Python MIT License Updated Nov 9, 2024
  • HAMi Public

    Forked from Project-HAMi/HAMi

    Heterogeneous AI Computing Virtualization Middleware

    Go Apache License 2.0 Updated Oct 31, 2024
  • A framework for the evaluation of autoregressive code generation language models.

    Python Apache License 2.0 Updated Oct 31, 2024
  • vllm Public

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python Apache License 2.0 Updated Oct 25, 2024
  • outlines Public

    Forked from dottxt-ai/outlines

    Structured Text Generation

    Python Apache License 2.0 Updated Sep 4, 2024
  • sglang Public

    Forked from sgl-project/sglang

    SGLang is yet another fast serving framework for large language models and vision language models.

    Python Apache License 2.0 Updated Jul 29, 2024
  • gin Public

    Forked from gin-gonic/gin

    Gin is an HTTP web framework written in Go (Golang). It features a Martini-like API with much better performance -- up to 40 times faster. If you need smashing performance, get yourself some Gin.

    Go MIT License Updated Jul 28, 2024
  • triton Public

    Forked from triton-lang/triton

    Development repository for the Triton language and compiler

    C++ MIT License Updated Jul 25, 2024
  • Sparsity-aware deep learning inference runtime for CPUs

    Python Other Updated Jul 19, 2024
  • Controller for ModelMesh

    Go Apache License 2.0 Updated Jul 16, 2024
  • Retrieval and Retrieval-augmented LLMs

    Python MIT License Updated Jun 26, 2024
  • dify Public

    Forked from langgenius/dify

    Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features, and more.

    TypeScript Other Updated Jun 21, 2024
  • lectures Public

    Forked from gpu-mode/lectures

    Material for cuda-mode lectures

    Jupyter Notebook Apache License 2.0 Updated Jun 13, 2024
  • Qwen-Agent Public

    Forked from QwenLM/Qwen-Agent

    Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.

    Python Other Updated Jun 6, 2024
  • marlin Public

    Forked from IST-DASLab/marlin

    FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

    Python Apache License 2.0 Updated Apr 22, 2024
  • lightllm Public

    Forked from ModelTC/lightllm

    LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

    Python Apache License 2.0 Updated Apr 15, 2024
  • lmdeploy Public

    Forked from InternLM/lmdeploy

    LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

    Python Apache License 2.0 Updated Mar 22, 2024
  • KnowLM Public

    Forked from zjunlp/KnowLM

    An open-source, knowledgeable Large Language Model framework.

    Python MIT License Updated Mar 16, 2024
  • Python MIT License Updated Mar 13, 2024
  • LLaMA-Pro Public

    Forked from TencentARC/LLaMA-Pro

    Progressive LLaMA with Block Expansion.

    Python Apache License 2.0 Updated Mar 12, 2024
  • FlashInfer: Kernel Library for LLM Serving

    Cuda Apache License 2.0 Updated Mar 8, 2024
  • dolma Public

    Forked from allenai/dolma

    Data and tools for generating and inspecting OLMo pre-training data.

    Python Apache License 2.0 Updated Feb 27, 2024
  • Python Apache License 2.0 Updated Feb 14, 2024
  • Easy-to-use LLM fine-tuning framework (LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, ChatGLM3)

    Python Apache License 2.0 Updated Nov 1, 2023