- Principal AI Researcher at a stealth startup
- Inference Lead for SGLang at LMSYS, working closely with Lianmin Zheng and Ying Sheng to co-lead the project. Responsible for releases, optimization, and the roadmap. Led major version development and blog posts, including Llama 3, DeepSeek V3, large-scale EP, and GB200 NVL72. Co-authored the FlashInfer paper (MLSys 2025 Best Paper). Committer for FlashInfer and LMDeploy. Previously Lead Software Engineer at Baseten, where I co-authored the DeepSeek V3 and Qwen 3 launch blogs and The Baseten Inference Stack ebook. Earlier, at Meituan, I led the development of CTR GPU inference and a vector retrieval system, and co-authored the QQQ paper (ICLR 2025 Workshop).
- Check out my talks on SGLang at GPU MODE, the CAMEL-AI Hackathon, the CUDA Tech Briefing at NVIDIA GTC 2025, and the AI Engineer World's Fair 2025
- DeepSeek V3 related: SGLang day-one support, Latent Space Podcast, The New York Times (first article, second article)
- Contact: me@zhyncs.com | Telegram
- More: LinkedIn | Homepage
- The best way to contact me is via the SGLang Slack. We're looking for open-source enthusiasts and learners to help grow the SGLang project and community.
Pinned repositories:
- sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
- flashinfer-ai/flashinfer: FlashInfer: Kernel Library for LLM Serving