LucidSim: Learning Visual Parkour from Generated Images
We bring realistic and diverse visual data from generative models to classical physics simulators, enabling robots to learn highly dynamic tasks like parkour without requiring depth.
weaver contains our text-to-image generation code. If you're looking for how to apply this to the simulated robot environments (in MuJoCo), please check out the lucidsim repo!
Alan Yu*1, Ge Yang*1,2, Ran Choi1, Yajvan Ravan1, John Leonard1, Phillip Isola1
1 MIT CSAIL, 2 Institute for Artificial Intelligence and Fundamental Interactions (IAIFI)
* Indicates equal contribution
CoRL 2024
Setup
conda create -n lucidsim python=3.10
conda activate lucidsim
For consistency, we recommend using the ComfyUI commit pinned in the commands below.
# Choose the CUDA version that your GPU supports. We will use CUDA 12.1
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --extra-index-url https://download.pytorch.org/whl/cu121
# Installing ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
git checkout ed2fa105ae29af6621232dd8ef622ff1e3346b3f
pip install -r requirements.txt
# Installing the weaver package (from the parent directory, outside the ComfyUI repo)
cd ..
git clone https://github.com/lucidsim/weaver.git weaver
cd weaver
pip install -e .
We recommend placing your models outside the ComfyUI repo for better housekeeping. For this, you'll need to link your model paths through a config file. Check out the configs folder for a template, where you'll specify locations for checkpoints, controlnets, and VAEs (a sketch of such a config follows the list below). For the provided three_mask_workflow example, these are the models you'll need:
- SDXL Turbo 1.0: place under checkpoints
- SDXL Depth ControlNet: place under controlnet
- SDXL VAE: place under vae
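For reference, a minimal path config following ComfyUI's extra_model_paths.yaml convention might look like the sketch below; the exact keys expected by the template in the configs folder may differ, and all paths are placeholders.
# hypothetical extra_model_paths.yaml -- adjust to match the template in configs
comfyui:
    base_path: /path/to/your/models/
    checkpoints: checkpoints/
    controlnet: controlnet/
    vae: vae/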
After cloning this repository, you'll need to add ComfyUI to your $PYTHONPATH and link your model paths. We recommend managing these in a local .env file. Then, link the config file you just created:
export PYTHONPATH=/path/to/ComfyUI:$PYTHONPATH
# See the `configs` folder for a template
export COMFYUI_CONFIG_PATH=/path/to/extra_model_paths.yaml
Weaver is organized by workflows. We include our main workflow, three_mask_workflow, which generates an image given a depth map along with three semantic masks, each with its own prompt (for example, foreground/background/object).
We provide example conditioning images and prompts for three_mask_workflow under the examples folder, grouped by scene. To try it out, use:
python weaver/scripts/demo_three_mask_workflow.py [--example-name] [--seed] [--save]
where example-name corresponds to one of the scenes in the examples/three_mask_workflow folder, and the save flag writes the output to the corresponding examples/three_mask_workflow/[example-name]/samples folder. The script randomly selects one of our provided prompts.
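For example, a typical invocation might look like this, where the scene name is a placeholder for any folder under examples/three_mask_workflow:
python weaver/scripts/demo_three_mask_workflow.py --example-name <scene-name> --seed 42 --save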
The graphical interface for ComfyUI is very helpful for designing your own workflows; please see their documentation for how to do this. By using a workflow-to-Python conversion tool, you can script your workflows as we've done with weaver/workflows/three_mask_workflow.py.
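Scripts produced this way typically drive ComfyUI's node classes directly. Below is a minimal, hedged text-to-image sketch of that pattern (not the actual contents of three_mask_workflow.py); it assumes ComfyUI is on your PYTHONPATH, uses a placeholder checkpoint filename, and node signatures may differ slightly across ComfyUI versions.
import torch
from nodes import (CheckpointLoaderSimple, CLIPTextEncode, EmptyLatentImage,
                   KSampler, VAEDecode)

with torch.inference_mode():
    # Load an SDXL Turbo checkpoint visible to ComfyUI's model paths (placeholder filename).
    model, clip, vae = CheckpointLoaderSimple().load_checkpoint(
        ckpt_name="sd_xl_turbo_1.0_fp16.safetensors")
    positive = CLIPTextEncode().encode(clip=clip, text="a cobblestone alley, photorealistic")[0]
    negative = CLIPTextEncode().encode(clip=clip, text="")[0]
    latent = EmptyLatentImage().generate(width=512, height=512, batch_size=1)[0]
    # SDXL Turbo needs very few sampling steps and a cfg around 1.0.
    samples = KSampler().sample(model=model, seed=42, steps=1, cfg=1.0,
                                sampler_name="euler", scheduler="normal",
                                positive=positive, negative=negative,
                                latent_image=latent, denoise=1.0)[0]
    images = VAEDecode().decode(vae=vae, samples=samples)[0]  # [B, H, W, C] tensor in [0, 1]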
In LucidSim, we use a distributed setup to generate images at scale. Rendering nodes, launched independently on many machines, receive rendering requests (containing prompts and conditioning images) from the physics engine through a task queue (see Zaku) and fulfill them. We hope to release setup instructions for this in the future, but we have included weaver/render_node.py for your reference.
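To make the data flow concrete, here is a minimal, hypothetical sketch of a render node's main loop; the queue client and workflow function are passed in as plain Python objects, and the actual interfaces used by weaver/render_node.py and Zaku may differ.
import time

def serve(queue, run_three_mask_workflow):
    """Pull rendering jobs off a task queue and publish generated images back."""
    while True:
        job = queue.take()  # hypothetical client method: returns a job dict, or None if empty
        if job is None:
            time.sleep(0.5)
            continue
        # Each job carries a depth map, three semantic masks, and one prompt per mask,
        # all produced by the physics engine.
        image = run_three_mask_workflow(
            depth=job["depth"],
            masks=job["masks"],
            prompts=job["prompts"],
            seed=job.get("seed", 0),
        )
        queue.publish(job["job_id"], image)  # hypothetical: return the generated frame to the requester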
If you find our work useful, please consider citing:
@inproceedings{yu2024learning,
title={Learning Visual Parkour from Generated Images},
author={Alan Yu and Ge Yang and Ran Choi and Yajvan Ravan and John Leonard and Phillip Isola},
booktitle={8th Annual Conference on Robot Learning},
year={2024},
}