
WorldMem: Long-term Consistent World Simulation with Memory

Zeqi Xiao1 Yushi Lan1 Yifan Zhou1 Wenqi Ouyang1 Shuai Yang2 Yanhong Zeng3 Xingang Pan1
1S-Lab, Nanyang Technological University,
2Wangxuan Institute of Computer Technology, Peking University,
3Shanghai AI Laboratory

(Demo video: demo.1.1.mp4)

Installation

conda create python=3.10 -n worldmem
conda activate worldmem
pip install -r requirements.txt
conda install -c conda-forge ffmpeg=4.3.2

Quick start

python app.py

Training and Inference

To enable cloud logging with Weights & Biases (wandb), follow these steps:

  1. Sign up for a wandb account.

  2. Run the following command to log in:

    wandb login
  3. Open configurations/training.yaml and set the entity and project fields to match your wandb account.
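
For reference, the relevant section of configurations/training.yaml might look like the sketch below. Only the entity and project field names come from the steps above; the exact nesting under a wandb key is an assumption, so match whatever structure your copy of the file already has.

```yaml
wandb:
  entity: your-wandb-username   # set to your wandb username or team
  project: worldmem             # the project name under your account
```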


Training

Download pretrained weights from Oasis.

We train the model on 4 H100 GPUs; it converges after approximately 500K steps. We observe that gradually increasing task difficulty improves performance, so we adopt a multi-stage training strategy:

sh train_stage_1.sh   # Small range, no vertical turning
sh train_stage_2.sh   # Large range, no vertical turning
sh train_stage_3.sh   # Large range, with vertical turning

To resume training from a previous checkpoint, configure the resume and output_dir variables in the corresponding .sh script.
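
As an illustration, the resume setup inside one of the stage scripts might look like the fragment below. The variable names resume and output_dir come from this README; the paths are placeholders for your own runs.

```shell
# Illustrative fragment of train_stage_2.sh (placeholder paths):
resume=outputs/stage_1/checkpoints/last.ckpt   # checkpoint to resume from
output_dir=outputs/stage_2                     # where new checkpoints are written
```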


Inference

To run inference:

sh infer.sh

You can either load the diffusion model and VAE separately:

+diffusion_model_path=yslan/worldmem_checkpoints/diffusion_only.ckpt \
+vae_path=yslan/worldmem_checkpoints/vae_only.ckpt \
+customized_load=true \
+seperate_load=true \

Or load a combined checkpoint:

+load=your_model_path \
+customized_load=true \
+seperate_load=false \
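
The +key=value flags above are Hydra-style command-line overrides appended to the launch command in infer.sh. As a rough illustration of how such overrides map onto a configuration dictionary, here is a toy parser; it is not the project's actual loader, which uses Hydra/OmegaConf and additionally handles nesting, interpolation, and richer type coercion.

```python
def parse_overrides(argv):
    """Parse Hydra-style '+key=value' arguments into a plain dict.

    Toy sketch only: the real project relies on Hydra/OmegaConf.
    """
    cfg = {}
    for arg in argv:
        if arg.startswith("+") and "=" in arg:
            key, value = arg[1:].split("=", 1)
            # Coerce the boolean literals used in the README's examples.
            if value in ("true", "false"):
                value = value == "true"
            cfg[key] = value
    return cfg

overrides = parse_overrides([
    "+load=your_model_path",
    "+customized_load=true",
    "+seperate_load=false",
])
print(overrides)
```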

Dataset

Download the Minecraft dataset from Hugging Face.

Place the dataset in the following directory structure:

data/
└── minecraft/
    ├── training/
    └── validation/
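
If you want to create or sanity-check this layout programmatically, the short Python sketch below does so, assuming the repository root is the working directory.

```python
from pathlib import Path

# Create (or verify) the dataset layout described above. The
# data/minecraft root comes from this README; adjust it if your
# dataset lives elsewhere.
root = Path("data/minecraft")
for split in ("training", "validation"):
    (root / split).mkdir(parents=True, exist_ok=True)

missing = [s for s in ("training", "validation") if not (root / s).is_dir()]
print("layout ok" if not missing else f"missing splits: {missing}")
# prints: layout ok
```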

TODO

  • Release inference models and weights;
  • Release training pipeline on Minecraft;
  • Release training data on Minecraft;

🔗 Citation

If you find our work helpful, please cite:

@misc{xiao2025worldmemlongtermconsistentworld,
      title={WORLDMEM: Long-term Consistent World Simulation with Memory}, 
      author={Zeqi Xiao and Yushi Lan and Yifan Zhou and Wenqi Ouyang and Shuai Yang and Yanhong Zeng and Xingang Pan},
      year={2025},
      eprint={2504.12369},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.12369}, 
}

👏 Acknowledgements

  • Diffusion Forcing: Diffusion Forcing provides flexible training and inference strategies for our method.
  • Minedojo: We collect our Minecraft dataset with Minedojo.
  • Open-oasis: Our model architecture is based on Open-oasis. We also use its pretrained VAE and DiT weights.
