Stars
Expert Kit is an efficient foundation for Expert Parallelism (EP) for MoE model inference on heterogeneous hardware
A low-latency, billion-scale, and updatable graph-based vector store on SSD.
[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
High-speed and easy-to-use LLM serving framework for local deployment
Fast and memory-efficient exact attention
AutoMQ is a stateless/diskless Kafka on S3. 10x Cost-Effective. No Cross-AZ Traffic Cost. Autoscale in seconds. Single-digit ms latency. Multi-AZ Availability.
A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
Next-generation datacenter OS built on kernel bypass to speed up unmodified code while improving platform density and security
High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
Tools and reference code for Intel optimizations (e.g., large pages)
Pacman: An Efficient Compaction Approach for Log-Structured Key-Value Store on Persistent Memory
Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory
Dynamic (Temporal) Knowledge Graph Completion (Reasoning)
Deep Learning Zero to All - PyTorch