nox-410 (Yining Shi) / Starred · GitHub
  • Nvidia
  • Shanghai


Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 1,224 94 Updated May 31, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,839 1,488 Updated Apr 24, 2025

The official repository for the gem5 computer-system architecture simulator.

C++ 2,019 1,424 Updated May 31, 2025

Development repository for the Triton-Linalg conversion

C++ 189 21 Updated Feb 7, 2025

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,667 917 Updated Jul 1, 2024

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 45,332 6,861 Updated May 30, 2025

An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.

Python 50 2 Updated Jul 23, 2024

A collection of compiler learning resources.

Python 2,411 347 Updated Mar 19, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 2 2 Updated Aug 24, 2023

Development repository for the Triton language and compiler

MLIR 15,730 2,006 Updated May 31, 2025

A collection of books, papers, and docs.

2,761 1,298 Updated Oct 24, 2024

Fast and memory-efficient exact attention

Python 17,602 1,710 Updated May 22, 2025

OSDI 2023 Welder, a deep learning compiler

Python 19 5 Updated Nov 24, 2023

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 400 188 Updated May 31, 2025

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 986 166 Updated Sep 19, 2024

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Python 307 34 Updated Apr 21, 2025

A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you are interested, please visit, star, or fork https://github.com/PKU-DAIR/Hetu

Python 112 51 Updated Dec 18, 2023

Generalized and Efficient Blackbox Optimization System

Python 407 55 Updated Oct 17, 2024

METIS - Serial Graph Partitioning and Fill-reducing Matrix Ordering

C 850 170 Updated Oct 27, 2023

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,965 762 Updated Feb 8, 2024

[NeurIPS 2021]: Improve the GNN expressivity and scalability by decoupling the depth and receptive field of state-of-the-art GNN architectures

Python 134 15 Updated Mar 18, 2022

Event-driven network library for multi-threaded Linux server in C++11

C++ 15,381 5,254 Updated Feb 28, 2025

Seamless operability between C++11 and Python

C++ 16,676 2,181 Updated May 30, 2025

Next RecSys Library

Python 1,057 220 Updated Mar 24, 2023

This is my Chinese translation of the Eigen documentation

C++ 915 198 Updated Aug 15, 2023

A high performance and generic framework for distributed DNN training

Python 3,681 492 Updated Oct 3, 2023

Benchmark datasets, data loaders, and evaluators for graph machine learning

Python 2,008 407 Updated May 6, 2025

Solutions to Michael Sipser's Introduction to the Theory of Computation Book (3rd Edition).

TeX 357 143 Updated Oct 12, 2021

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

27,969 3,756 Updated Jul 18, 2024

[ICLR 2020; IPDPS 2019] Fast and accurate minibatch training for deep GNNs and large graphs (GraphSAINT: Graph Sampling Based Inductive Learning Method).

Python 485 87 Updated Aug 12, 2022