
Pippo: High-Resolution Multi-View Humans from a Single Image


Project Page · Paper PDF · Spaces · Visuals (Drive)

CVPR, 2025 (Highlight)

Pippo

Yash Kant1,2,3 · Ethan Weber1,4 · Jin Kyu Kim1 · Rawal Khirodkar1 · Su Zhaoen1 · Julieta Martinez1
Igor Gilitschenski*2,3 · Shunsuke Saito*1 · Timur Bagautdinov*1

* Joint Advising

1 Meta Reality Labs · 2 University of Toronto · 3 Vector Institute · 4 UC Berkeley

We present Pippo, a generative model capable of producing 1K resolution dense turnaround videos of a person from a single casually clicked photo. Pippo is a multi-view diffusion transformer and does not require any additional inputs — e.g., a fitted parametric model or camera parameters of the input image.

This is a code-only release without pre-trained weights. We provide models, configs, inference, and sample training code on Ava-256.

Setup code

Clone and add repository to your path:

git clone git@github.com:facebookresearch/pippo.git
cd pippo
export PATH=$PATH:$PWD

Prerequisites and Dependencies

conda create -n pippo python=3.10.1 -c conda-forge
conda activate pippo

# adjust as required (we tested the configuration below)
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.0 -c pytorch -c nvidia

pip install -r requirements.txt

Download and Sample Training

You can launch a sample training run on a few samples of the Ava-256 dataset. We provide pre-packaged samples for this training, stored as npy files, here. Ensure you are authenticated to Hugging Face (e.g., via `huggingface-cli login`) so you can download the samples.

# download packaged Ava-256 samples
python scripts/pippo/download_samples.py

We provide the exact model configs used to train Pippo at resolutions of 128, 512, and 1024, located in the config/full/ directory.

# launch training (tested on a single 80GB A100 GPU): full-sized model
python train.py config/full/128_4v.yml

Additionally, we provide a tiny model config for training on a smaller GPU:

# launch training (tested on a single 16GB T4 GPU): tiny model
python train.py config/tiny/128_4v_tiny.yml

Training on a custom dataset (see #9):

You will need to prepare your custom dataset in the same format as the provided Ava-256 samples, stored as numpy files.

The trickiest parts are creating the Plücker ray and spatial anchor images; we provide our implementations of these methods (using Ava-256 and Goliath data) in this gist. You can adapt them to create these fields for your own custom dataset.
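For intuition, here is a rough sketch of what a Plücker ray image contains. This is a hypothetical helper (the name `plucker_rays` and the pinhole/world-to-camera conventions are assumptions; the gist above is authoritative): each pixel stores the unit ray direction in world space plus the ray's moment about the origin, giving a 6-channel image.

```python
import numpy as np

def plucker_rays(K, R, t, H, W):
    """Per-pixel Plucker ray map of shape (H, W, 6) for a pinhole camera.

    K: (3, 3) intrinsics; R, t: world-to-camera extrinsics such that
    x_cam = R @ x_world + t. A sketch only -- the exact conventions used
    by Pippo may differ (see the authors' gist).
    """
    # Camera center in world coordinates.
    o = -R.T @ t                                        # (3,)
    # Pixel-center grid.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)    # (H, W, 3)
    # Back-project pixels to camera-space directions, rotate to world frame.
    d = (pix @ np.linalg.inv(K).T) @ R                  # == R.T @ d_cam per pixel
    d /= np.linalg.norm(d, axis=-1, keepdims=True)      # unit directions
    # Moment m = o x d encodes the ray's offset from the world origin.
    m = np.cross(np.broadcast_to(o, d.shape), d)
    return np.concatenate([d, m], axis=-1)              # (H, W, 6)
```

The direction/moment pair is invariant to the choice of point along the ray, which is what makes it a convenient per-pixel camera encoding for a diffusion transformer.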

Re-projection Error

To compute the re-projection error between generated images and ground truth images, run the following command:

python scripts/pippo/reprojection_error.py
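For intuition, the generic recipe behind a re-projection error metric is to triangulate a 3D point from 2D observations across views (e.g., via linear DLT) and then measure the pixel distance between each observation and the reprojected point. The sketch below illustrates that recipe under standard projection-matrix conventions; `triangulate_dlt` and `reprojection_error` are illustrative names, not the script's API.

```python
import numpy as np

def triangulate_dlt(P_list, pts2d):
    """Linear (DLT) triangulation of one 3D point from N views.

    P_list: list of (3, 4) projection matrices; pts2d: list of (x, y) pixels.
    """
    A = []
    for P, (x, y) in zip(P_list, pts2d):
        A.append(x * P[2] - P[0])   # each observation gives two linear
        A.append(y * P[2] - P[1])   # constraints on the homogeneous point
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                      # null vector of A (least-squares sense)
    return X[:3] / X[3]

def reprojection_error(P_list, pts2d):
    """Mean pixel error between observations and the reprojected point."""
    Xh = np.append(triangulate_dlt(P_list, pts2d), 1.0)
    errs = []
    for P, (x, y) in zip(P_list, pts2d):
        proj = P @ Xh
        errs.append(np.hypot(proj[0] / proj[2] - x, proj[1] / proj[2] - y))
    return float(np.mean(errs))
```

Low error across generated views indicates the views are geometrically consistent with a single underlying 3D scene.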

Useful Pointers

Here is a list of useful things to borrow from this codebase:

  • ControlMLP to inject spatial control in Diffusion Transformers: see here
  • Attention Biasing to run inference on 5x longer sequences: see here
  • Re-projection Error Metric: see here
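As a generic illustration of the idea behind attention biasing (a sketch only; the `biased_attention` helper and the log-ratio scale are assumptions here, not necessarily Pippo's exact formulation — see the pointer above), one common recipe rescales the pre-softmax logits so attention stays sharp when inference sequences are much longer than those seen in training:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def biased_attention(q, k, v, train_len):
    """Scaled dot-product attention with an entropy-stabilizing bias.

    Multiplies the pre-softmax logits by log(n) / log(train_len), so the
    scale exceeds 1 when the inference sequence length n is longer than
    the training length, counteracting the entropy growth of softmax over
    more keys. Illustrative only.
    """
    n, d = q.shape
    scale = np.log(n) / np.log(train_len)     # > 1 when n > train_len
    logits = scale * (q @ k.T) / np.sqrt(d)
    return softmax(logits) @ v
```

Without such a bias, attention distributions flatten as the number of keys grows, which is one reason naive length extrapolation degrades quality.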

Todos

We plan to add and update the following in the future:

  • Cleaning up fluff in pippo.py and dit.py
  • Inference script for pretrained models.

License

See LICENSE file for details.

Citation

If you benefit from this codebase, consider citing our work:

@article{Kant2024Pippo,
  title={Pippo: High-Resolution Multi-View Humans from a Single Image},
  author={Yash Kant and Ethan Weber and Jin Kyu Kim and Rawal Khirodkar and Su Zhaoen and Julieta Martinez and Igor Gilitschenski and Shunsuke Saito and Timur Bagautdinov},
  year={2025},
}
