nox-410

Yining Shi nox-410

Working on AI compiler at Nvidia; Peking University

43 followers · 10 following

Nvidia
Shanghai

Stars

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 1,224 94 Updated May 31, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python 11,839 1,488 Updated Apr 24, 2025

gem5 / gem5

The official repository for the gem5 computer-system architecture simulator.

C++ 2,019 1,424 Updated May 31, 2025

Cambricon / triton-linalg

Development repository for the Triton-Linalg conversion

C++ 189 21 Updated Feb 7, 2025

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,667 917 Updated Jul 1, 2024

microsoft / autogen

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 45,332 6,861 Updated May 30, 2025

nox-410 / tvm.tl

An extension of TVMScript to write simple and high performance GPU kernels with tensorcore.

Python 50 2 Updated Jul 23, 2024

BBuf / tvm_mlir_learn

A collection of compiler learning resources.

Python 2,411 347 Updated Mar 19, 2025

nox-410 / cutlass

Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 2 2 Updated Aug 24, 2023

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 15,730 2,006 Updated May 31, 2025

Kensuke-Hinata / statistic

collecting books, papers and docs.

2,761 1,298 Updated Oct 24, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 17,602 1,710 Updated May 22, 2025

nox-410 / Welder

OSDI 2023 Welder, deeplearning compiler

Python 19 5 Updated Nov 24, 2023

ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 400 188 Updated May 31, 2025

microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 986 166 Updated Sep 19, 2024

PKU-DAIR / Hetu

Forked from Hsword/Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Python 307 34 Updated Apr 21, 2025

Hsword / Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, please visit/star/fork https://github.com/PKU-DAIR/Hetu

Python 112 51 Updated Dec 18, 2023

PKU-DAIR / open-box

Forked from thomas-young-2013/open-box

Generalized and Efficient Blackbox Optimization System

Python 407 55 Updated Oct 17, 2024

KarypisLab / METIS

NVIDIA / thrust

facebookresearch / shaDow_GNN

chenshuo / muduo

pybind / pybind11

wubinzzu / NeuRec

qixianyu-buaa / EigenChineseDocument

bytedance / byteps

snap-stanford / ogb

ryandougherty / Introduction-to-the-Theory-of-Computation-Solutions

eugeneyan / applied-ml

GraphSAINT / GraphSAINT