Official implementation of the CVPR 2025 paper
"SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction"
by Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li, Olga Zatsarynna, and Juergen Gall.
To train your model, you can use the predefined config files or define custom ones; a sketch for deriving a custom config is shown after the steps below. Training proceeds in the following steps:
1. Train an autoencoder for each modality:

   python3 main.py --config configs/run/train/ae_city_rgb.yaml
2. (Optional) GAN fine-tuning: a few fine-tuning iterations may improve the autoencoder's reconstruction performance.
3. Train a video prediction diffusion model for each modality; this can already be used as a standalone model for video prediction:

   python3 main.py --config configs/run/train/ddpm_city_rgb.yaml
4. Train the joint SyncVP model. You can either initialize it with the pre-trained modality-specific diffusion models or train it from scratch; we recommend the first option, as discussed in the paper:

   python3 main.py --config configs/run/train/sync_city.yaml
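If you want to define a custom config, one simple pattern is to load a predefined one and override a few fields programmatically. The sketch below is only illustrative: the overridden keys (`batch_size`, `lr`) and the output file name are hypothetical and should be matched to the actual schema of the YAML files under `configs/run/train/`.

```python
# Minimal sketch: derive a custom config from a predefined one.
# NOTE: the overridden keys below are hypothetical examples; check the
# real schema of the files under configs/run/train/ before using them.
import yaml

with open("configs/run/train/ddpm_city_rgb.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["batch_size"] = 8   # hypothetical key
cfg["lr"] = 1e-4        # hypothetical key

with open("configs/run/train/ddpm_custom.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```

The resulting file can then be passed to training as usual, e.g. `python3 main.py --config configs/run/train/ddpm_custom.yaml`.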
To evaluate a trained model, run:

python3 main.py --config configs/run/eval/sync_city.yaml
The Cityscapes autoencoder and multi-modal model checkpoints can be downloaded with:
bash download.sh
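To sanity-check the downloaded files, one option is to load a checkpoint and inspect its top-level keys. This is a generic PyTorch sketch, not part of the repository; the checkpoint path is a placeholder for wherever download.sh stores its files.

```python
# Inspect a downloaded checkpoint (the path is a placeholder).
import torch

ckpt = torch.load("checkpoints/sync_city.pt", map_location="cpu")
# Checkpoints are usually dicts holding a state_dict plus metadata,
# so printing the top-level keys shows what was saved.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
else:
    print(type(ckpt))
```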
A preprocessed version of Cityscapes at 128x128 resolution, including disparity (depth) maps, can be downloaded here.
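As a rough illustration of what the preprocessed data contains, the sketch below loads one RGB frame together with its disparity map and checks the 128x128 resolution. The directory layout and file names are assumptions, not the actual archive structure; adapt them once you have unpacked the data.

```python
# Load one RGB frame and its disparity (depth) map from the
# preprocessed dataset. The paths are assumed, not the real layout.
import numpy as np
from PIL import Image

rgb = np.array(Image.open("cityscapes_128/rgb/seq_0000/frame_00.png"))
disp = np.array(Image.open("cityscapes_128/disparity/seq_0000/frame_00.png"))

assert rgb.shape[:2] == (128, 128)   # frames are preprocessed to 128x128
assert disp.shape[:2] == (128, 128)  # disparity is aligned with the RGB frames
print(rgb.shape, disp.shape, disp.dtype)
```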
- Non-1:1 aspect ratio implementation
- Full evaluation code release
- Training code released
@InProceedings{Pallotta_2025_CVPR,
author = {Pallotta, Enrico and Azar, Sina Mokhtarzadeh and Li, Shuai and Zatsarynna, Olga and Gall, Juergen},
title = {SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
month = {June},
year = {2025},
pages = {13787-13797}
}
This repository is mainly based on the PVDM codebase.