
Cross-View Completion Models are
Zero-shot Correspondence Estimators

CVPR 2025 Highlight

Honggyu An1* · Jin Hyeon Kim2* · Seonghoon Park3 · Jaewoo Jung1
Jisang Han1 · Sunghwan Hong2† · Seungryong Kim1†

1KAIST    2Korea University    3Samsung Electronics
*: Equal Contribution      †: Corresponding Author

ZeroCo is a zero-shot correspondence model that demonstrates the effectiveness of cross-attention maps, learned through cross-view completion training, in capturing correspondences.

🔍 Overview

In this work, we explore a novel perspective on cross-view completion learning by drawing an analogy to self-supervised correspondence learning. Through our analysis, we show that cross-attention maps in cross-view completion capture correspondences more effectively than correlations derived from encoder or decoder features.

This repository introduces ZeroCo, a zero-shot correspondence model designed to demonstrate that cross-attention maps encode rich correspondences. Additionally, we provide ZeroCo-Flow and ZeroCo-Depth, which extend ZeroCo for learning-based matching and multi-frame depth estimation, respectively.
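
To make this concrete, the snippet below sketches how a single cross-attention map can be read off as dense correspondences: each target-view token is matched to the source-view token it attends to most strongly. This is a minimal illustration, not the repository code; the function name, tensor shapes, and the argmax readout are assumptions made for exposition.

    # Minimal sketch (not the repository code): reading correspondences off a
    # cross-attention map. Shapes and names are illustrative assumptions.
    import torch

    def correspondences_from_ca_map(ca_map, h, w):
        """ca_map: [h*w, h*w] attention of target-view queries over source-view keys."""
        # For every target token, pick the source token it attends to most strongly.
        match_idx = ca_map.argmax(dim=-1)             # [h*w]
        src_y, src_x = match_idx // w, match_idx % w  # source grid coordinates
        tgt_y, tgt_x = torch.meshgrid(
            torch.arange(h), torch.arange(w), indexing="ij")
        # Flow from target to source: one 2D offset per target token.
        flow = torch.stack(
            [src_x.view(h, w) - tgt_x, src_y.view(h, w) - tgt_y], dim=-1).float()
        return flow                                   # [h, w, 2]

    # Example on a random map over a 14x14 token grid (224 / patch size 16):
    flow = correspondences_from_ca_map(torch.rand(14 * 14, 14 * 14), 14, 14)
    print(flow.shape)  # torch.Size([14, 14, 2])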

🛠️ What to expect

  • Release ZeroCo code
  • Release ZeroCo-Flow and ZeroCo-Depth code
  • Release pretrained weights

Environment

  • Create and activate a conda environment with Python 3.10.

    conda create -n ZeroCo python=3.10.15
    conda activate ZeroCo
  • Our code is developed with PyTorch 2.1.2 and CUDA 12.1. Please refer to the requirements.txt file to install the necessary dependencies. (A quick optional sanity check is sketched after this list.)

    pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
    
  • Create admin/local by running the following command, then update the dataset paths in it.

    python -c "from admin.environment import create_default_local_file; create_default_local_file()"
    

Evaluation Datasets

  • For the evaluation of the zero-shot correspondence task, we used the HPatches and ETH3D datasets.

  • You can download and preprocess both datasets using the following bash scripts.

    bash download_ETH3D.sh
    bash download_hpatches.sh

Prepare Pretrained Weights

  • Since evaluation relies on a pretrained cross-view completion model, you first need to download its pretrained weights.

  • The models currently implemented in our code are listed below. Please visit each repository to obtain the pretrained weights and download them into the ./pretrained_weights folder. (A quick loading check is sketched after this list.)

    • CroCo: Cross-view completion pretrained model (Our baseline).
    • DUSt3R: 3D pointmap regressor model based on CroCo.
    • MASt3R: Feature matching model based on CroCo and DUSt3R.
  • Additionally, you can directly evaluate models with the same architecture as DUSt3R, such as MonST3R.
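
As a quick check that a checkpoint ended up in the right place, the snippet below tries to load it on CPU. The filename is the CroCo v2 checkpoint referenced in the example script further down; adapt the path for other models. This is an illustrative check, not part of the repository code.

    # Illustrative check that a downloaded checkpoint loads (adjust the filename
    # for DUSt3R, MASt3R, or MonST3R checkpoints).
    import torch

    ckpt = torch.load("./pretrained_weights/CroCo_V2_ViTLarge_BaseDecoder.pth",
                      map_location="cpu")
    print(list(ckpt.keys()))  # CroCo releases typically store weights under a 'model' key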

Zero-shot Evaluation

The scripts folder contains multiple bash files for evaluating models on either the HPatches or ETH3D datasets. Most experiments were conducted on HPatches. For each model, you can perform zero-shot evaluation of geometric matching performance using one of three methods:

Available Methods

  1. Encoder Correlation: builds a correlation map from encoder features
  2. Decoder Correlation: builds a correlation map from decoder features
  3. Cross-Attention Maps: uses the cross-attention maps themselves as the correlation

For detailed explanations of each method, please refer to our paper.
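
To make the distinction between the three options concrete, the rough sketch below contrasts a correlation built from encoder or decoder features with one taken directly from a cross-attention map. It is not the repository implementation; the names, shapes, and the reciprocity combination shown are assumptions.

    # Illustrative sketch of the correlation sources; not the repository code.
    import torch
    import torch.nn.functional as F

    def feature_correlation(feat_tgt, feat_src):
        # Encoder/decoder features ([N, C] tokens per view): cosine-similarity
        # correlation between L2-normalised features.
        return F.normalize(feat_tgt, dim=-1) @ F.normalize(feat_src, dim=-1).T  # [N, N]

    def cross_attention_correlation(ca_map, reciprocal_ca_map=None):
        # A cross-attention map already has the shape of a correlation ([N, N]).
        # One simple way to use the reciprocal map (cf. --reciprocity) is to
        # combine both directions; the paper's exact formulation may differ.
        if reciprocal_ca_map is not None:
            return ca_map * reciprocal_ca_map.transpose(-2, -1)
        return ca_map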

Example Commands

# HPatches (Original Resolution) - CroCov2
bash scripts/run_hp_crocov2_Largebase.sh

# HPatches (240 Resolution) - CroCov2
bash scripts/run_hp240_crocov2_LargeBase.sh

# ETH3D - CroCov2 
bash scripts/run_eth3d_crocov2_LargeBase.sh
Script Configuration Details

Each evaluation script contains several key parameters that can be customized:
# Example evaluation script
CUDA=0  # GPU device rank
CUDA_VISIBLE_DEVICES=${CUDA} python -u eval_matching.py \
    --seed 2024 \
    --dataset hp \
    --model_img_size 224 224 \
    --model crocov2 \
    --pre_trained_models croco \
    --croco_ckpt /path/to/croco/ckpts/CroCo_V2_ViTLarge_BaseDecoder.pth \
    --output_mode ca_map \
    --output_ca_map \
    --reciprocity \
    --save_dir /path/to/save/images/for/visualisation/

# Flag reference:
#   --seed                 random seed
#   --dataset              hp (HPatches), hp-240 (HPatches 240x240), eth3d (ETH3D)
#   --model_img_size       input image dimensions for the CVC model
#   --model                model type: crocov1, crocov2, dust3r, mast3r
#   --pre_trained_models   pre-trained model type
#   --croco_ckpt           path to the CroCo checkpoint
#   --output_mode          correlation source: enc_feat, dec_feat, ca_map
#   --output_ca_map        enable cross-attention map output
#   --reciprocity          enable reciprocal cross-attention maps
#   --save_dir             directory for saving visualisations

🙏 Acknowledgements

This code is heavily based on DenseMatching. We highly appreciate the authors for their great work.

📚 Citation

If you find this code useful, please consider citing our paper:

@article{an2024cross,
  title={Cross-View Completion Models are Zero-shot Correspondence Estimators},
  author={An, Honggyu and Kim, Jinhyeon and Park, Seonghoon and Jung, Jaewoo and Han, Jisang and Hong, Sunghwan and Kim, Seungryong},
  journal={arXiv preprint arXiv:2412.09072},
  year={2024}
}
