- UC Berkeley
- Berkeley, CA
- https://maoziming.github.io/
- @ziming_mao
- in/maoziming
Stars
Analyze computation-communication overlap in V3/R1.
DeepSeek-V3/R1 inference performance simulator
A High-Throughput Parallel Lossless Compressor for Scientific Data
llm-d is a Kubernetes-native high-performance distributed LLM inference framework
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
Supercharge Your LLM with the Fastest KV Cache Layer
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications.
Machnet provides applications like databases and finance an easy way to access low-latency DPDK-based messaging on public cloud VMs. 750K RPS on Azure at 61 us P99.9.
Lossless compressor of multidimensional floating-point arrays
Collective communications library with various primitives for multi-machine training.
Optimized primitives for collective multi-GPU communication
Next-generation datacenter OS built on kernel bypass to speed up unmodified code while improving platform density and security
DeepEP: an efficient expert-parallel communication library
A programmer's guide to cooking at home (Simplified Chinese only).
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.