Nanyang Technological University - Singapore - https://xiaoaoran.github.io

Stars
SARLANG-1M is a large-scale benchmark tailored for multimodal SAR image understanding, with a primary focus on integrating SAR imagery with the textual modality.
[IEEE GRSS DFC 2025 Track II] BRIGHT: A globally distributed multimodal VHR dataset for all-weather disaster response
[IEEE TGRS 2024] ChangeMamba: Remote Sensing Change Detection Based on Spatio-Temporal State Space Model
Official repo for "Foundation Models for Remote Sensing and Earth Observation: A Survey"
[NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
The official implementation of "Segment Anything with Multiple Modalities".
[ECCV 2024 Oral] The official implementation of "CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model".
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Make your models invariant to changes in scale.
[ICLR 2024] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
[ICLR 2024 Spotlight] Curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; [CVPR 2024] MoDE: CLIP Data Experts via Clustering
An open source implementation of CLIP.
[NeurIPS 2023] Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation
ImageBind One Embedding Space to Bind Them All
Align 3D Point Cloud with Multi-modalities for Large Language Models
Per-Pixel Classification is Not All You Need for Semantic Segmentation (NeurIPS 2021, spotlight)
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
[NeurIPS 2023 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
Painter & SegGPT Series: Vision Foundation Models from BAAI
Segment Anything in High Quality [NeurIPS 2023]
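Several of the starred repositories above build on CLIP's symmetric contrastive objective: image and text embeddings of matching pairs are pulled together while mismatched pairs in the batch are pushed apart. The following is a minimal NumPy sketch of that loss; the function name `clip_contrastive_loss` and the random vectors standing in for real encoder outputs are illustrative, not taken from any of the listed codebases.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so dot products become cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.T / temperature
    # Matching image/text pairs sit on the diagonal.
    labels = np.arange(len(logits))

    def xent(l):
        # Numerically stable log-softmax cross-entropy with diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))  # stand-in for image encoder outputs
txt = rng.standard_normal((4, 8))  # stand-in for text encoder outputs
loss = clip_contrastive_loss(img, txt)
```

Because the loss averages both directions, it is unchanged when the image and text arguments are swapped; the real CLIP training loop additionally learns the temperature as a parameter.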