gty111 (Tianyu Guo) · GitHub
gty111/README.md
  • Ph.D. student at Sun Yat-sen University

  • AI Infra, MLSys, Simulators, GPU architecture

  • Visit my personal website

News

  • [2025/4/27] We have released gLLM, an efficient pipeline-parallel inference engine for LLMs.

PRs to Projects

  • sglang: Fix port number overflow
  • xDiT: Enable warm up for VAE
  • xDiT: Fix parallel vae
  • DistVAE: Fix batch dimension
  • vLLM: [Benchmark] Refactor sample_requests in benchmark_throughput
  • vLLM: [Bugfix] fix automatic prefix args and add log info
  • vLLM: [Minor Fix] Fix comments in benchmark_serving
  • vLLM: [Minor Fix] Remove unused code in benchmark_prefix_caching.py
  • TVM: [Doc] Fix minor error in "Expressions in Relay"
  • TVM: [Doc] Fix minor error in doc (Add an operator to Relay)

Pinned

  1. gLLM (Public)

    gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

    Python · 11 stars · 1 fork

  2. PTX-EMU (Public)

    PTX-EMU is a simple emulator for CUDA programs.

    C++ · 31 stars · 5 forks

  3. GEMM_MMA (Public)

    Optimize GEMM with Tensor Cores step by step

    26 stars · 5 forks

  4. SimpleUseGpgpuSim (Public)

    GPGPU-Sim usage guide

    Shell · 14 stars · 1 fork

  5. GEMM_WMMA (Public)

    GEMM by WMMA (Tensor Cores)

    Cuda · 12 stars · 7 forks

  6. ConvNN (Public)

    A simple CNN training framework that supports CPU and GPU (cuDNN)

    C++ 3
