Stars
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Minimal reproduction of DeepSeek R1-Zero
The official repository for the gem5 computer-system architecture simulator.
Development repository for the Triton-Linalg conversion
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.
nox-410 / cutlass
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Development repository for the Triton language and compiler
Fast and memory-efficient exact attention
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
PKU-DAIR / Hetu
Forked from Hsword/Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you are interested, please visit, star, or fork https://github.com/PKU-DAIR/Hetu
PKU-DAIR / open-box
Forked from thomas-young-2013/open-box
Generalized and Efficient Blackbox Optimization System
METIS - Serial Graph Partitioning and Fill-reducing Matrix Ordering
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
[NeurIPS 2021]: Improve the GNN expressivity and scalability by decoupling the depth and receptive field of state-of-the-art GNN architectures
Event-driven network library for multi-threaded Linux server in C++11
Seamless operability between C++11 and Python
My Chinese translation of the Eigen documentation
A high performance and generic framework for distributed DNN training
Benchmark datasets, data loaders, and evaluators for graph machine learning
Solutions to Michael Sipser's Introduction to the Theory of Computation Book (3rd Edition).
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
[ICLR 2020; IPDPS 2019] Fast and accurate minibatch training for deep GNNs and large graphs (GraphSAINT: Graph Sampling Based Inductive Learning Method).