PallottaEnrico/SyncVP: [CVPR'25] SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction

SyncVP

Official implementation of CVPR 2025 paper:

"SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction"

Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li, Olga Zatsarynna, Juergen Gall

arXiv | Project Page

How to

Training

To train your model, you can use the predefined config files or define your own.

Training proceeds in three stages:

1. Train an autoencoder (ideally one per modality)

python3 main.py --config configs/run/train/ae_city_rgb.yaml

(Optional) GAN fine-tuning: a few iterations of adversarial fine-tuning may improve the autoencoder's reconstruction quality.

2. Train a single-modality diffusion model

This model can already be used on its own for video prediction.

python3 main.py --config configs/run/train/ddpm_city_rgb.yaml

3. Train the multi-modal diffusion model (SyncVP)

You can either initialize it with the pre-trained modality-specific diffusion models from step 2 or train it from scratch. As discussed in the paper, we recommend the first option.

python3 main.py --config configs/run/train/sync_city.yaml
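The three stages above can be chained in a single script. This is a minimal sketch, assuming it is run with bash from the repository root. The README only names the `_rgb` configs, so any other entry in `MODALITIES` (for example `depth`) assumes a hypothetical config such as `ae_city_depth.yaml` that you would have to create. `DRY_RUN`, `MODALITIES`, `run` and `train_syncvp` are illustrative names, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: run the SyncVP training stages in order (hypothetical helper, not in the repo).
# Assumption: executed from the repository root. Only the *_rgb configs are named
# in the README; any other modality name relies on a hypothetical config file.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

train_syncvp() {
  local modalities="${MODALITIES:-rgb}"
  local m
  # Stage 1: one autoencoder per modality.
  for m in $modalities; do
    run python3 main.py --config "configs/run/train/ae_city_${m}.yaml" || return 1
  done
  # Stage 2: one single-modality diffusion model per modality
  # (these are the recommended initialization for SyncVP).
  for m in $modalities; do
    run python3 main.py --config "configs/run/train/ddpm_city_${m}.yaml" || return 1
  done
  # Stage 3: the joint multi-modal diffusion model.
  run python3 main.py --config configs/run/train/sync_city.yaml
}
```

For example, `DRY_RUN=1 train_syncvp` prints the three commands without running them; without `DRY_RUN` they are executed in order. Each stage returns early on failure, so a later stage never starts after an earlier one has failed.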

Evaluation

python3 main.py --config configs/run/eval/sync_city.yaml

Model checkpoints

Checkpoints for the Cityscapes autoencoders and the multi-modal model can be downloaded with:

bash download.sh
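Combining the checkpoint download with the evaluation command gives a short "evaluate from released weights" routine. This is a minimal sketch, assuming bash and the repository root as working directory. It also assumes that `configs/run/eval/sync_city.yaml` points at the files fetched by `download.sh`, which the README does not state, so check the checkpoint paths in the config first. `DRY_RUN`, `run` and `eval_pretrained` are illustrative names, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: evaluate SyncVP from the released Cityscapes checkpoints (hypothetical helper).
# Assumption: the eval config references the checkpoints that download.sh fetches.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

eval_pretrained() {
  # Fetch the checkpoints first; skip evaluation if the download fails.
  run bash download.sh || return 1
  run python3 main.py --config configs/run/eval/sync_city.yaml
}
```

Running `DRY_RUN=1 eval_pretrained` prints both commands in order without executing them.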

Datasets

A preprocessed version of Cityscapes at 128x128 resolution, with disparity (depth) maps, can be downloaded here.

📋 TODO List

  • Non-1:1 aspect ratio implementation
  • Full evaluation code release
  • Training code released

Cite

@InProceedings{Pallotta_2025_CVPR,
    author    = {Pallotta, Enrico and Azar, Sina Mokhtarzadeh and Li, Shuai and Zatsarynna, Olga and Gall, Juergen},
    title     = {SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {13787-13797}
}

Reference

This repository is mainly based on the PVDM codebase.
