- Huazhong University of Science and Technology
- Wuhan, China
Stars
Enhancing Low-Resource Relation Representations through Multi-View Decoupling (AAAI 2024)
[NeurIPS 2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation"
[CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
[CVPR 2025] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
[ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models
Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
[arXiv '24] Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels
[AAAI 2025] Linear-complexity Visual Sequence Learning with Gated Linear Attention
Official code of "ViTGaze: Gaze Following with Interaction Features in Vision Transformers"
[arXiv'24] EVA-X: A foundation model for general chest X-ray analysis with self-supervised learning
Strong and Open Vision Language Assistant for Mobile Devices
[CVPR 2024] VCoder: Versatile Vision Encoders for Multimodal Large Language Models
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
3DV 2024: Fast High Dynamic Range Radiance Fields for Dynamic Scenes
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
[NeurIPS 2023] CircuitFormer: Circuit as Set of Points
Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation
A high-throughput and memory-efficient inference and serving engine for LLMs
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editin…