Starred repositories
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Heterogeneous AI Computing Virtualization Middleware
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
Distributed Triton for Parallel Systems
An easy-to-understand TensorOp Matmul tutorial
neuralmagic / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
A low-level OpenQASM benchmark suite for NISQ evaluation and simulation. Please see our paper for details.
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
CUDA Python: Performance meets Productivity
This package contains the original 2012 AlexNet code.
neuralmagic / nm-vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A PyTorch native library for large-scale model training
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
torch_musa is an open-source repository based on PyTorch that makes full use of the computing power of MooreThreads graphics cards.
FlashInfer: Kernel Library for LLM Serving
Monitor Linux processes without root permissions
Making large AI models cheaper, faster, and more accessible
Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang