Zeqi Xiao1
Yushi Lan1
Yifan Zhou1
Wenqi Ouyang1
Shuai Yang2
Yanhong Zeng3
Xingang Pan1
1S-Lab, Nanyang Technological University,
2Wangxuan Institute of Computer Technology, Peking University,
3Shanghai AI Laboratory
```bash
conda create python=3.10 -n worldmem
conda activate worldmem
pip install -r requirements.txt
conda install -c conda-forge ffmpeg=4.3.2
```

Then launch the interactive demo:

```bash
python app.py
```
To enable cloud logging with Weights & Biases (wandb), follow these steps:

- Sign up for a [wandb](https://wandb.ai) account.
- Run the following command to log in:

  ```bash
  wandb login
  ```

- Open `configurations/training.yaml` and set the `entity` field to your wandb username and the `project` field to your project name.
Download pretrained weights from Oasis.
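If the Oasis weights are hosted on Hugging Face, one way to fetch them is with `huggingface-cli`; the repository id below is a placeholder, so substitute the one linked from the Open-oasis project:

```bash
# Hypothetical example: replace <oasis-repo-id> with the repository
# referenced by the Open-oasis project.
huggingface-cli download <oasis-repo-id> --local-dir checkpoints/oasis
```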
We train the model on 4 H100 GPUs; it converges after approximately 500K steps. We observe that gradually increasing task difficulty improves performance, so we adopt a multi-stage training strategy:
```bash
sh train_stage_1.sh # Small range, no vertical turning
sh train_stage_2.sh # Large range, no vertical turning
sh train_stage_3.sh # Large range, with vertical turning
```
To resume training from a previous checkpoint, configure the `resume` and `output_dir` variables in the corresponding `.sh` script, as in the sketch below.
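A minimal sketch of how this might look inside one of the stage scripts (the variable names `resume` and `output_dir` come from this README; the paths are hypothetical):

```bash
# Hypothetical values; point these at your own run.
resume=outputs/stage_1/checkpoints/last.ckpt   # checkpoint to resume from
output_dir=outputs/stage_2                     # where new checkpoints and logs go
```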
To run inference:
```bash
sh infer.sh
```
You can either load the diffusion model and VAE separately:
```bash
+diffusion_model_path=yslan/worldmem_checkpoints/diffusion_only.ckpt \
+vae_path=yslan/worldmem_checkpoints/vae_only.ckpt \
+customized_load=true \
+seperate_load=true \
```
Or load a combined checkpoint:
```bash
+load=your_model_path \
+customized_load=true \
+seperate_load=false \
```
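Assuming `infer.sh` forwards extra arguments to the underlying Python entry point (an assumption worth verifying in the script itself), a complete invocation might look like:

```bash
# Hypothetical invocation; the override names are from this README,
# but the argument-forwarding behavior of infer.sh is an assumption.
sh infer.sh \
  +load=your_model_path \
  +customized_load=true \
  +seperate_load=false
```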
Download the Minecraft dataset from Hugging Face, then place it in the following directory structure:

```
data/
└── minecraft/
    ├── training/
    └── validation/
```
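One way to fetch the dataset is with `huggingface-cli`; the repository id below is a placeholder, so substitute the dataset actually linked from this README:

```bash
# Hypothetical example: replace <dataset-repo-id> with the real repository id.
huggingface-cli download <dataset-repo-id> \
  --repo-type dataset \
  --local-dir data/minecraft
```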
- Release inference models and weights;
- Release training pipeline on Minecraft;
- Release training data on Minecraft;
If you find our work helpful, please cite:
```bibtex
@misc{xiao2025worldmemlongtermconsistentworld,
      title={WORLDMEM: Long-term Consistent World Simulation with Memory},
      author={Zeqi Xiao and Yushi Lan and Yifan Zhou and Wenqi Ouyang and Shuai Yang and Yanhong Zeng and Xingang Pan},
      year={2025},
      eprint={2504.12369},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.12369},
}
```
- Diffusion Forcing: Diffusion Forcing provides the flexible training and inference strategies used in our method.
- MineDojo: We collect our Minecraft dataset using MineDojo.
- Open-oasis: Our model architecture is based on Open-oasis, and we use its pretrained VAE and DiT weights.