StudyingShao (NVJiangShao) / Starred · GitHub

CUDA Templates for Linear Algebra Subroutines

C++ 2 Updated Apr 1, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 7,654 1,258 Updated Jun 7, 2025

A PyTorch Toolbox for Grouped GEMM in MoE Model Training

5 1 Updated May 28, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,464 428 Updated Jun 7, 2025

The Triton TensorRT-LLM Backend

Shell 846 122 Updated Jun 5, 2025

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 125 39 Updated Jan 2, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 3 Updated May 19, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,679 1,481 Updated Jun 8, 2025

A plugin to use Nvidia GPU in PySCF package

Cuda 204 37 Updated Jun 7, 2025

Main Web Site (Online Books)

HTML 9,517 923 Updated Apr 28, 2025

A code security guide compiled for developers

13,447 1,947 Updated Mar 20, 2023