Hello! I am using GitHub.
Stars
2 stars written in C++
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention