zfy3000163 (zfy3000) / Starred · GitHub

Starred repositories

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

Go 1,172 190 Updated Apr 16, 2025

Heterogeneous AI Computing Virtualization Middleware

Go 1,547 291 Updated Apr 30, 2025

CUDA Library Samples

Cuda 1,910 377 Updated May 1, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ 303 60 Updated May 2, 2025

Open Source Graphical Programming for Design

HTML 1,848 648 Updated May 1, 2025

Perplexity GPU Kernels

C++ 267 26 Updated May 1, 2025
JavaScript 1 Updated Apr 7, 2025
Python 1 Updated Apr 15, 2025

rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.

C++ 78 16 Updated May 1, 2025

Distributed Triton for Parallel Systems

MLIR 629 37 Updated May 2, 2025

An Easy-to-understand TensorOp Matmul Tutorial

C++ 346 43 Updated Sep 21, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 12 3 Updated May 2, 2025

A low-level OpenQASM benchmark suite for NISQ evaluation and simulation. Please see our paper for details.

OpenQASM 112 36 Updated Jan 20, 2025

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda 256 55 Updated Mar 20, 2025

CUDA Python: Performance meets Productivity

Python 2,570 159 Updated May 2, 2025

This package contains the original 2012 AlexNet code.

Cuda 2,574 332 Updated Mar 12, 2025

Shared memory IPC on BMC

C++ 6 Updated Apr 27, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 7,392 1,209 Updated May 1, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 263 11 Updated Oct 11, 2024

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 912 58 Updated Apr 15, 2025

A PyTorch native library for large-scale model training

Python 3,652 351 Updated May 2, 2025
Python 69 4 Updated Dec 27, 2024

Sample Codes using NVSHMEM on Multi-GPU

8 Updated Jan 22, 2023

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,745 452 Updated Oct 9, 2023

torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics cards.

Python 389 28 Updated Apr 27, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,767 285 Updated Apr 30, 2025

Monitor linux processes without root permissions

Go 5,392 540 Updated Jan 17, 2023

Making large AI models cheaper, faster and more accessible

Python 40,840 4,499 Updated May 2, 2025

Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang

Python 50 6 Updated Nov 8, 2024