MaoZiming (Ziming Mao) / Starred · GitHub
🔭

Organizations

@Y-Hack @Yale-LILY @skypilot-org @berkeley-cs168 @Trinity-data-store @uccl-project

Microsoft Collective Communication Library

C++ · 351 stars · 32 forks · Updated Sep 20, 2023

Analyze computation-communication overlap in V3/R1.

1,076 stars · 144 forks · Updated Mar 21, 2025

DeepSeek-V3/R1 inference performance simulator

Jupyter Notebook · 154 stars · 21 forks · Updated Mar 27, 2025

A High-Throughput Parallel Lossless Compressor for Scientific Data

C++ · 70 stars · 14 forks · Updated Jan 22, 2023

Expert Parallelism Load Balancer

Python · 1,228 stars · 195 forks · Updated Mar 24, 2025

llm-d is a Kubernetes-native high-performance distributed LLM inference framework

Makefile · 1,322 stars · 105 forks · Updated Jun 25, 2025

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ · 1,151 stars · 162 forks · Updated Jun 5, 2025

Unified Collective Communication Library

C · 259 stars · 112 forks · Updated Jul 8, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python · 2,531 stars · 301 forks · Updated Jul 8, 2025

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

C · 1,387 stars · 472 forks · Updated Jun 30, 2025

PyTorch Single Controller

Rust · 303 stars · 47 forks · Updated Jul 9, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ · 451 stars · 109 forks · Updated Jul 9, 2025

GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.

Cuda · 341 stars · 28 forks · Updated Jun 18, 2025

A plugin that lets EC2 developers use libfabric as the network provider when running NCCL applications.

C++ · 177 stars · 69 forks · Updated Jul 9, 2025

Mellanox libibverbs

C++ · 70 stars · 14 forks · Updated Aug 28, 2019

Machnet gives applications such as databases and financial systems an easy way to access low-latency, DPDK-based messaging on public cloud VMs: 750K RPS on Azure at 61 µs P99.9 latency.

C++ · 120 stars · 22 forks · Updated Jan 28, 2025

RDMA core userspace libraries and daemons

C · 1,856 stars · 759 forks · Updated Jul 8, 2025

Lossless compressor of multidimensional floating-point arrays

C++ · 114 stars · 16 forks · Updated Jun 5, 2020

Collective communications library with various primitives for multi-machine training.

C++ · 1,325 stars · 333 forks · Updated Jun 17, 2025

Optimized primitives for collective multi-GPU communication

C++ · 3,844 stars · 955 forks · Updated Jun 18, 2025

ROCm Communication Collectives Library (RCCL)

C++ · 346 stars · 159 forks · Updated Jul 8, 2025

NCCL Tests

Cuda · 1,174 stars · 294 forks · Updated Jun 6, 2025

Python · 110 stars · 13 forks · Updated Oct 9, 2024

Intelligent Storage Acceleration Library

C · 1,009 stars · 326 forks · Updated Jun 6, 2025

Rust implementation of RaptorQ (RFC6330)

Rust · 318 stars · 52 forks · Updated May 1, 2025

Next-generation datacenter OS built on kernel bypass to speed up unmodified code while improving platform density and security

C++ · 102 stars · 13 forks · Updated May 29, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 8,262 stars · 841 forks · Updated Jul 8, 2025

A programmer's guide to cooking at home (Simplified Chinese only).

Dockerfile · 90,927 stars · 10,361 forks · Updated Jul 8, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 3,526 stars · 298 forks · Updated Jul 9, 2025