- [2025.4.30] 🔥🔥🔥 We have released our Panoramic Animator model and the inference code for the full HoloTime pipeline. You are welcome to download it from Huggingface and try it out!
We propose HoloTime, a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image, along with a 360-degree 4D scene reconstruction method that seamlessly transforms the generated panoramic video into 4D assets, enabling a fully immersive 4D experience for users.
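The pipeline operates on equirectangular panoramas, where each pixel corresponds to a direction on the viewing sphere. As background, here is a minimal sketch of the standard pixel-to-view-direction mapping for such images (our own illustration under the conventional layout, not code from this repository):

```python
import math

def equirect_dir(u: float, v: float, width: int, height: int):
    """Map an equirectangular pixel (u, v) to a unit view direction.

    Assumes the conventional layout: longitude spans [-pi, pi] from left
    to right, latitude spans [pi/2, -pi/2] from top to bottom.
    """
    lon = (u / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - v / height) * math.pi
    # Unit vector: x forward, y up, z right (one common convention).
    return (math.cos(lat) * math.cos(lon),
            math.sin(lat),
            math.cos(lat) * math.sin(lon))
```

The image center maps to the forward direction `(1, 0, 0)`, and every output vector has unit length, which is what makes a panoramic video a full 360-degree observation of the scene.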
Panorama → 4D Scene demos: ocean2.mp4, cyberpunk.mp4, temple.mp4
Panorama → Panoramic Video demos: car.mp4, aurora.mp4, fire.mp4, firework.mp4
```shell
git clone https://github.com/PKU-YuanGroup/HoloTime --recursive
cd HoloTime

conda create -n holotime python=3.10 -y
conda activate holotime
conda install -c nvidia cuda-toolkit=12.4 -y

pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```
After installation, please follow the instructions provided here to modify a few lines in some of the installed libraries.
- The input directory structure should look like:

```
📦 input/
├── 🖼️ panorama1.png
├── 🖼️ panorama2.png
├── 🖼️ panorama3.png
├── ...
└── 📄 text_prompts.txt
```
The txt file contains one text description per line, each line corresponding to one panorama, with the lines ordered by the natural sort order of the PNG filenames. You can use a text-driven panorama generation model (PanFusion or FLUX) to create the input data, or you can use the files we provide.
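"Natural sort order" means `panorama2.png` comes before `panorama10.png`, unlike plain lexicographic sorting. A minimal sketch of pairing prompt lines with panoramas under that ordering (`pair_inputs` is a hypothetical helper, not part of this repository):

```python
import re
from pathlib import Path

def natural_key(name: str):
    # Split into digit and non-digit runs so "panorama10" sorts after "panorama2".
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r"(\d+)", name)]

def pair_inputs(input_dir: str):
    """Pair each PNG (in natural sort order) with its prompt line."""
    root = Path(input_dir)
    pngs = sorted((p.name for p in root.glob("*.png")), key=natural_key)
    prompts = (root / "text_prompts.txt").read_text().splitlines()
    assert len(pngs) == len(prompts), "one prompt line per panorama"
    return list(zip(pngs, prompts))
```

If the pairing looks wrong, check that the prompt lines were written in the same natural order as the filenames.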
- Download the Panoramic Animator model from Huggingface and put the checkpoint in the `checkpoints/holotime` directory. (This step is optional, as the download can happen automatically.)

- Run the following command:

```shell
sh run_animator.sh
```
The Panoramic Animator needs 24 GB of GPU memory. VEnhancer, which performs the optional super-resolution and frame-interpolation pass, needs 80 GB of GPU memory.
After generating the panoramic video, you can transform it into a 4D scene by running the following command:

```shell
sh run_reconstruction.sh
```
Reconstruction from the refined video needs 24 GB of GPU memory; reconstruction from the enhanced video needs 48 GB.
Run the following command:

```shell
sh run_render.sh
```
We provide some preset trajectories here.
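A rendering trajectory is just a sequence of per-frame camera poses inside the reconstructed scene. As an illustration of what a preset might encode, here is a minimal sketch of a small circular orbit around the scene center (`orbit_trajectory` is a hypothetical helper and does not reflect this repository's actual trajectory format):

```python
import math

def orbit_trajectory(num_frames: int = 60, radius: float = 0.3):
    """Camera positions on a small horizontal circle around the origin,
    one pose per frame; each camera looks back at the scene center."""
    poses = []
    for i in range(num_frames):
        theta = 2.0 * math.pi * i / num_frames
        pos = (radius * math.cos(theta), 0.0, radius * math.sin(theta))
        poses.append({"position": pos, "look_at": (0.0, 0.0, 0.0)})
    return poses
```

Keeping the radius small relative to the scene matters for a panoramic reconstruction, since the scene is only observed from (near) a single viewpoint.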
Special thanks to DynamiCrafter, 360DVD, VEnhancer, DreamScene360, and Spacetime Gaussian for their codebases and pre-trained weights.
```bibtex
@misc{zhou2025holotimetamingvideodiffusion,
  title={HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation},
  author={Haiyang Zhou and Wangbo Yu and Jiawen Guan and Xinhua Cheng and Yonghong Tian and Li Yuan},
  year={2025},
  eprint={2504.21650},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.21650},
}
```