This repository contains the official implementation of DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion. The paper has been accepted to CVPR 2025.
Project Page | arXiv
- 2025.05.04: Initial code release.
tl;dr Given a set of multi-view images, DiffusionSfM represents scene geometry and cameras as pixel-wise ray origins and endpoints in a global frame. It learns a denoising diffusion model to infer these elements directly from multi-view inputs.
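For intuition, here is a minimal sketch (not from this codebase; all names are illustrative) of how per-pixel ray origins and endpoints in a global frame can be derived from a pinhole camera and a depth map, assuming world-to-camera extrinsics R, t:

import numpy as np

def rays_from_camera(K, R, t, depth):
    # K: (3, 3) intrinsics; R: (3, 3) rotation, t: (3,) translation
    # (world-to-camera); depth: (H, W) depth along the camera z-axis.
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project each pixel to a camera-frame point at its observed depth.
    cam_pts = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Move points to the world frame: X_world = R^T (X_cam - t).
    world_pts = (cam_pts - t) @ R
    # Every ray starts at the camera center and ends at the surface point.
    center = -R.T @ t
    origins = np.broadcast_to(center, world_pts.shape)
    return origins.reshape(H, W, 3), world_pts.reshape(H, W, 3)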
- Clone DiffusionSfM:
git clone https://github.com/QitaoZhao/DiffusionSfM.git
cd DiffusionSfM
- Create the environment and install packages:
conda create -n diffusionsfm python=3.9
conda activate diffusionsfm
# make the CUDA compiler (nvcc) available
conda install -c conda-forge cudatoolkit-dev
### PyTorch
# CUDA 11.7
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
### PyTorch3D
# CUDA 11.7
conda install https://anaconda.org/pytorch3d/pytorch3d/0.7.7/download/linux-64/pytorch3d-0.7.7-py39_cu117_pyt201.tar.bz2
# xformers
conda install xformers -c xformers
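To sanity-check the installation, the following one-liner should print your PyTorch version and True for CUDA availability:

python -c "import torch, pytorch3d, xformers; print(torch.__version__, torch.cuda.is_available())"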
Tested on:
- Springdale Linux 8.6 with torch 2.0.1 & CUDA 11.7 on A6000 GPUs.
Note: If you encounter the error
ImportError: .../libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent
when importing PyTorch, refer to this related issue or try installing Intel MKL explicitly with:
conda install mkl==2024.0
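You can check which MKL build is currently installed with:

conda list mkl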
Check out our interactive demo on Hugging Face:
Download the model weights manually from Hugging Face:
from huggingface_hub import snapshot_download
# Fetch all model files from the Hugging Face repo into the local cache.
local_dir = snapshot_download(repo_id="qitaoz/DiffusionSfM")
or Google Drive:
gdown https://drive.google.com/uc\?id\=1NBdq7A1QMFGhIbpK1HT3ATv2S1jXWr2h
unzip models.zip
Then run the demo:
# the first run may take longer than usual
python gradio_app.py
You can run our model in two ways:
- Upload Images — upload your own multi-view images.
- Use a Preprocessed Example — select one of the pre-collected examples.
Set up wandb for experiment logging during training:
wandb login
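If you want to confirm that logging works before launching a long training run, here is a quick smoke test (the project name is arbitrary):

import wandb

run = wandb.init(project="diffusionsfm-smoke-test")  # arbitrary project name
run.log({"sanity_check": 1.0})
run.finish()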
See docs/train.md for more detailed instructions on training.
See docs/eval.md for instructions on how to run evaluation code.
This project builds upon RayDiffusion. Amy Lin and Jason Y. Zhang developed the initial codebase during the early stages of this project.
If you find this code helpful, please cite:
@inproceedings{zhao2025diffusionsfm,
title={DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion},
author={Qitao Zhao and Amy Lin and Jeff Tan and Jason Y. Zhang and Deva Ramanan and Shubham Tulsiani},
booktitle={CVPR},
year={2025}
}