Stars
Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Code for "Multi-view Reconstruction via SfM-guided Monocular Depth Estimation". CVPR 2025 (Oral Presentation)
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
[OpenPAR] An open-source framework for Pedestrian Attribute Recognition, based on PyTorch
A collection of awesome works on reasoning models like O1/R1 in the visual domain
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently and pull request…
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".
A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
High-Resolution 3D Human Digitization from A Single Image.
The official code for "ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations" presented at CVPR 2022, along with its extended version ImFace++.
3D version of the MNIST database of handwritten digits
[MICCAI 2024] Easy diffusion models (optionally with segmentation guidance) for medical images and beyond.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Automatic segmentation of CBCT scans with a 3D Unet
Janus-Series: Unified Multimodal Understanding and Generation Models
Diffusion Models in Medical Imaging (Published in Medical Image Analysis Journal)
A collection of resources on applications of multi-modal learning in medical imaging.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.