
Pippo: High-Resolution Multi-View Humans from a Single Image


Project Page · Paper PDF · Spaces · Visuals (Drive)

CVPR, 2025 (Highlight)

Pippo

Yash Kant1,2,3 · Ethan Weber1,4 · Jin Kyu Kim1 · Rawal Khirodkar1 · Su Zhaoen1 · Julieta Martinez1
Igor Gilitschenski*2,3 · Shunsuke Saito*1 · Timur Bagautdinov*1

* Joint Advising

1 Meta Reality Labs · 2 University of Toronto · 3 Vector Institute · 4 UC Berkeley

We present Pippo, a generative model capable of producing 1K resolution dense turnaround videos of a person from a single casually clicked photo. Pippo is a multi-view diffusion transformer and does not require any additional inputs — e.g., a fitted parametric model or camera parameters of the input image.

This is a code-only release without pre-trained weights. We provide models, configs, inference, and sample training code on Ava-256.

Setup code

Clone and add repository to your path:

git clone git@github.com:facebookresearch/pippo.git
cd pippo
export PATH=$PATH:$PWD

Prerequisites and Dependencies

conda create -n pippo python=3.10.1 -c conda-forge
conda activate pippo

# adjust as required (we tested the configuration below)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.0 -c pytorch -c nvidia

pip install -r requirements.txt

Download and Sample Training

You can launch a sample training run on a few samples of the Ava-256 dataset. We provide pre-packaged samples for this training, stored as npy files, here. Ensure you are authenticated to Hugging Face (e.g., via `huggingface-cli login`) so you can download the samples.

# download packaged Ava-256 samples
python scripts/pippo/download_samples.py

We provide the exact model configs used to train Pippo at resolutions of 128, 512, and 1024, located in the config/full/ directory.

# launch training (tested on a single 80GB A100 GPU): full-sized model
python train.py config/full/128_4v.yml

Additionally, we provide a tiny model config for training on a smaller GPU:

# launch training (tested on a single 16GB T4 GPU): tiny model
python train.py config/tiny/128_4v_tiny.yml

Training on a custom dataset (see #9):

You will need to prepare your custom dataset in the same format as the provided Ava-256 samples, stored as numpy files.

The trickiest parts are creating the Plücker ray and spatial anchor images; we provide our implementations of these methods (using Ava-256 and Goliath data) in this gist. You can adapt them to create these fields for your own custom dataset.
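For intuition, here is a rough sketch of what a Plücker ray image contains. This is a hypothetical helper (the name `plucker_rays` and the pinhole/world-to-camera conventions are assumptions; the gist above is authoritative): each pixel stores the unit ray direction in world space plus the ray's moment about the origin, giving a 6-channel image.

```python
import numpy as np

def plucker_rays(K, R, t, H, W):
    """Per-pixel Plucker ray map of shape (H, W, 6) for a pinhole camera.

    K: (3, 3) intrinsics; R, t: world-to-camera extrinsics such that
    x_cam = R @ x_world + t. A sketch only -- the exact conventions used
    by Pippo may differ (see the authors' gist).
    """
    # Camera center in world coordinates.
    o = -R.T @ t                                        # (3,)
    # Pixel-center grid.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)    # (H, W, 3)
    # Back-project pixels to camera-space directions, rotate to world frame.
    d = (pix @ np.linalg.inv(K).T) @ R                  # == R.T @ d_cam per pixel
    d /= np.linalg.norm(d, axis=-1, keepdims=True)      # unit directions
    # Moment m = o x d encodes the ray's offset from the world origin.
    m = np.cross(np.broadcast_to(o, d.shape), d)
    return np.concatenate([d, m], axis=-1)              # (H, W, 6)
```

The direction/moment pair is invariant to the choice of point along the ray, which is what makes it a convenient per-pixel camera encoding for a diffusion transformer.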

Re-projection Error

To compute the re-projection error between generated images and ground truth images, run the following command:

python scripts/pippo/reprojection_error.py
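For intuition, the generic recipe behind a re-projection error metric is to triangulate a 3D point from 2D observations across views (e.g., via linear DLT) and then measure the pixel distance between each observation and the reprojected point. The sketch below illustrates that recipe under standard projection-matrix conventions; `triangulate_dlt` and `reprojection_error` are illustrative names, not the script's API.

```python
import numpy as np

def triangulate_dlt(P_list, pts2d):
    """Linear (DLT) triangulation of one 3D point from N views.

    P_list: list of (3, 4) projection matrices; pts2d: list of (x, y) pixels.
    """
    A = []
    for P, (x, y) in zip(P_list, pts2d):
        A.append(x * P[2] - P[0])   # each observation gives two linear
        A.append(y * P[2] - P[1])   # constraints on the homogeneous point
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                      # null vector of A (least-squares sense)
    return X[:3] / X[3]

def reprojection_error(P_list, pts2d):
    """Mean pixel error between observations and the reprojected point."""
    Xh = np.append(triangulate_dlt(P_list, pts2d), 1.0)
    errs = []
    for P, (x, y) in zip(P_list, pts2d):
        proj = P @ Xh
        errs.append(np.hypot(proj[0] / proj[2] - x, proj[1] / proj[2] - y))
    return float(np.mean(errs))
```

Low error across generated views indicates the views are geometrically consistent with a single underlying 3D scene.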

Useful Pointers

Here is a list of useful things to borrow from this codebase:

  • ControlMLP to inject spatial control in Diffusion Transformers: see here
  • Attention Biasing to run inference on 5x longer sequences: see here
  • Re-projection Error Metric: see here
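As a generic illustration of the idea behind attention biasing (a sketch only; the `biased_attention` helper and the log-ratio scale are assumptions here, not necessarily Pippo's exact formulation — see the pointer above), one common recipe rescales the pre-softmax logits so attention stays sharp when inference sequences are much longer than those seen in training:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(q, k, v, train_len):
    """Scaled dot-product attention with an entropy-stabilizing bias.

    Multiplies the pre-softmax logits by log(n) / log(train_len), so the
    scale exceeds 1 when the inference sequence length n is longer than
    the training length, counteracting the entropy growth of softmax over
    more keys. Illustrative only.
    """
    n, d = q.shape
    scale = np.log(n) / np.log(train_len)     # > 1 when n > train_len
    logits = scale * (q @ k.T) / np.sqrt(d)
    return softmax(logits) @ v
```

Without such a bias, attention distributions flatten as the number of keys grows, which is one reason naive length extrapolation degrades quality.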

Todos

We plan to add and update the following in the future:

  • Cleaning up fluff in pippo.py and dit.py
  • Inference script for pretrained models.

License

See LICENSE file for details.

Citation

If you benefit from this codebase, consider citing our work:

@article{Kant2024Pippo,
  title={Pippo: High-Resolution Multi-View Humans from a Single Image},
  author={Yash Kant and Ethan Weber and Jin Kyu Kim and Rawal Khirodkar and Su Zhaoen and Julieta Martinez and Igor Gilitschenski and Shunsuke Saito and Timur Bagautdinov},
  year={2025},
}
