LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis (TOMM 2025)
Following Taming Transformers, create a conda environment named layoutenc and activate it:
conda env create -f environment.yaml
conda activate layoutenc
Download the first-stage model COCO-8k-VQGAN.
Change ckpt_path in configs/coco.yaml to point to the downloaded first-stage model.
Download the full COCO dataset and adapt data_path in the same file, unless the 100 files provided for training and validation already suit your needs.
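The two config changes above (ckpt_path and data_path) can also be scripted. The following is a minimal sketch, assuming the config has been loaded into nested Python dicts and lists (for example with a YAML loader). The sample layout, the helper name set_key_everywhere, and the /path/to/... values are hypothetical placeholders and do not reflect the actual structure of configs/coco.yaml.

```python
from typing import Any


def set_key_everywhere(node: Any, key: str, value: Any) -> int:
    """Set every occurrence of `key` in a nested dict/list structure.

    Returns the number of entries that were updated.
    """
    updated = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                updated += 1
            else:
                updated += set_key_everywhere(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            updated += set_key_everywhere(item, key, value)
    return updated


# Hypothetical config layout, for illustration only.
config = {
    "model": {"params": {"first_stage_config": {"params": {"ckpt_path": "old.ckpt"}}}},
    "data": {"params": {"train": {"params": {"data_path": "old/train"}}}},
}

# Placeholder paths: replace them with your own download locations.
n_ckpt = set_key_everywhere(config, "ckpt_path", "/path/to/first_stage.ckpt")
n_data = set_key_everywhere(config, "data_path", "/path/to/coco")
print(n_ckpt, n_data)
```

Checking the returned counts is a cheap way to notice that a key name did not match anything in the loaded config.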
Training can be run with the following command (the trailing comma in --gpus 0, is part of the argument, not a typo):
python main.py --base configs/coco.yaml -t True --gpus 0,
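If you launch training from another script, the command above can be assembled programmatically. This is a small sketch, not part of the repository: build_train_command is a hypothetical helper, and passing several GPU ids in the same comma-terminated form is an assumption extrapolated from the single-GPU command.

```python
import shlex
from typing import List, Sequence


def build_train_command(config_path: str, gpu_ids: Sequence[int]) -> List[str]:
    """Build the argument list for the training command shown above."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # Keep the trailing comma so that a single id is written as "0,".
    gpus = ",".join(str(i) for i in gpu_ids) + ","
    return ["python", "main.py", "--base", config_path, "-t", "True", "--gpus", gpus]


cmd = build_train_command("configs/coco.yaml", [0])
print(shlex.join(cmd))  # python main.py --base configs/coco.yaml -t True --gpus 0,
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root.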
Refer to Taming Transformers for further options.
To launch the Gradio app, you only need to run this script. Have fun!
python launch_gradio_app.py
Our repo is built upon Frido and Taming Transformers; thanks to their authors for open-sourcing their code!
@article{cui2025layoutenc,
title={LayoutEnc: Leveraging Enhanced Layout Representations for Transformer-based Complex Scene Synthesis},
author={Cui, Xiao and Sun, Qi and Wang, Min and Li, Li and Zhou, Wengang and Li, Houqiang},
journal={ACM Transactions on Multimedia Computing, Communications and Applications},
year={2025},
publisher={ACM New York, NY}
}