- Principal AI Researcher at a stealth startup
- Inference Lead for SGLang at LMSYS, working closely with Lianmin Zheng and Ying Sheng to co-lead the project. Responsible for releases, optimization, and the roadmap. Led major version development and blog posts, including Llama 3, DeepSeek V3, large-scale EP, and GB200 NVL72. Co-authored the FlashInfer paper (MLSys 2025 Best Paper). Committer for FlashInfer and LMDeploy. Previously Lead Software Engineer at Baseten, where I co-authored the DeepSeek V3 and Qwen 3 launch blogs and The Baseten Inference Stack ebook. Earlier, at Meituan, I led the development of CTR GPU inference and a vector retrieval system, and co-authored the QQQ paper (ICLR 2025 Workshop).
- Check out my talks on SGLang at GPU MODE, the CAMEL-AI Hackathon, the CUDA Tech Briefing at NVIDIA GTC 2025, and the AI Engineer World's Fair 2025
- DeepSeek V3 related: SGLang day-one support, Latent Space Podcast, The New York Times (first article, second article)
- Contact: me@zhyncs.com | Telegram
- More: LinkedIn | Homepage
- The best way to contact me is via the SGLang Slack. We're looking for open-source enthusiasts and learners to help grow the SGLang project and community.
Pinned repositories:
- sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
- flashinfer-ai/flashinfer: FlashInfer: Kernel Library for LLM Serving