Stars
Implementation of 'DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation' [CVPR 2022]
Implementation of 'Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization' [CVPR 2021]
Official implementations for paper: VACE: All-in-One Video Creation and Editing
Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
Official PyTorch implementation of SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities
The official implementation of the paper "LEGION: Learning to Ground and Explain for Synthetic Image Detection"
FakeVLM: Advancing Synthetic Image Detection through Explainable Multimodal Models and Fine-Grained Artifact Analysis
This repo collects research papers that apply AI tools to scientific research (including computer science, agronomy, chemistry, physics, etc.). We call this method Deep-Research.
The first large-scale multimodal dialogue dataset focusing on Synthetic Aperture Radar (SAR) imagery.
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Code for "Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation"
Janus-Series: Unified Multimodal Understanding and Generation Models
google-deepmind / streetlearn
A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a TensorFlow implementation of goal-driven navigation agents solving the task published in “L…
Multimodal Large Language Models for Remote Sensing (RS-MLLMs): A Survey
[AAAI 2025] This repo contains evaluation code for the paper “UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios”
Awesome-Remote-Sensing-Vision-Language-Models
The official PyTorch implementation of Exploring the Interactive Guidance for Unified and Effective Image Matting [arXiv]
The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”
✨✨Latest Advances on Multimodal Large Language Models
[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
This is the repo for the paper Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.
Awesome lists about framework figures in papers
[ECCV 2024] The official implementation of the paper "Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network".
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
Official code for CVPR 2022 paper "Rethinking Visual Geo-localization for Large-Scale Applications"