Stars
[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[RA-L 2025, accepted without revision] A stereo visual-inertial odometry system based on a voxel map
The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[ICCV 2023] You Only Look at One Partial Sequence
A Unified Driving World Model for Future Generation and Perception
Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
Official Repository for "HydraViT: Stacking Heads for a Scalable ViT" (NeurIPS'24)
MoBA: Mixture of Block Attention for Long-Context LLMs
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
Implementation of the conditionally routed attention from the CoLT5 architecture, in PyTorch
A partial Chinese translation of the book "Mathematics for Machine Learning".
Companion webpage to the book "Mathematics For Machine Learning"
⚡️Optimizing einsum functions in NumPy, TensorFlow, Dask, and more with contraction order optimization (a usage sketch follows this list).
Fast and memory-efficient exact attention
Development repository for the Triton language and compiler
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
[CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
[AAAI 2025] Linear-complexity Visual Sequence Learning with Gated Linear Attention
(CVPR 2023 / TPAMI 2024) Integrally Pre-Trained Transformer Pyramid Networks: A Hierarchical Vision Transformer for Masked Image Modeling
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers (see the sketch after this list).
[CVPR 2025] MINIMA: Modality Invariant Image Matching
This project aims to share the technical principles behind large language models along with hands-on experience (LLM engineering and real-world LLM application deployment).
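For the einsum-optimization item above, a minimal usage sketch assuming the standard `opt_einsum` Python API (`contract`, `contract_path`); the array shapes are illustrative only:

```python
import numpy as np
import opt_einsum as oe

a = np.random.rand(32, 64)
b = np.random.rand(64, 128)
c = np.random.rand(128, 16)

# Equivalent to np.einsum('ij,jk,kl->il', a, b, c), but opt_einsum first
# chooses a cheap pairwise contraction order instead of evaluating the
# chain naively from left to right.
out = oe.contract('ij,jk,kl->il', a, b, c)

# Inspect the chosen contraction path and its estimated FLOP cost.
path, info = oe.contract_path('ij,jk,kl->il', a, b, c)
print(info)
```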
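For the memory-layers item above, a minimal PyTorch sketch of the idea, not the repo's implementation: `SimpleMemoryLayer`, the slot count, and `top_k` are all hypothetical, and this naive version scores every slot for clarity, whereas real memory layers use a product-key index to keep the lookup itself cheap.

```python
import torch
import torch.nn as nn

class SimpleMemoryLayer(nn.Module):
    """Trainable key-value lookup: many stored parameters, few active per token."""
    def __init__(self, dim: int, num_slots: int = 4096, top_k: int = 4):
        super().__init__()
        # The parameter count grows with num_slots, independent of top_k.
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim). Score all slots (a product-key index would
        # avoid this full scoring), then keep only the top-k slots per
        # token, so only k value rows are mixed into the output.
        scores = x @ self.keys.t()                    # (b, s, num_slots)
        topv, topi = scores.topk(self.top_k, dim=-1)  # (b, s, k)
        gates = topv.softmax(dim=-1)                  # sparse gating weights
        picked = self.values[topi]                    # (b, s, k, dim)
        return (gates.unsqueeze(-1) * picked).sum(dim=-2)

x = torch.randn(2, 16, 512)
print(SimpleMemoryLayer(512)(x).shape)  # torch.Size([2, 16, 512])
```

The point of the k-sparse lookup is that capacity scales with the number of slots while only k value vectors contribute per token; with a product-key index the scoring cost also becomes sublinear in the slot count, which is what lets memory layers add parameters without a matching FLOP increase.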