Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
📑 Paper | 🤗 Hugging Face Models | 🤗 Spaces Demo | 📝 Slides | 🕹️ OpenBayes Demo | 🤗 Datasets | 💬 X (Twitter) | 🖥️ Computer Use | 📖 GUI Paper List | 🤖 ModelScope
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou
Show Lab @ National University of Singapore, Microsoft
- [2025.1.20] Support navigation tasks: Mind2Web, AITW, and Miniwob training and evaluation.
- [2025.1.17] Support API calling via the Gradio client; simply run `python3 api.py`.
- [2025.1.5] Release the `ShowUI-web` dataset.
- [2024.12.28] Update GPT-4o annotation recaptioning scripts.
- [2024.12.27] Update training code and instructions.
- [2024.12.23] Update `showui` with the UI-guided token selection implementation.
- [2024.12.15] ShowUI received the Outstanding Paper Award at the NeurIPS 2024 Open-World Agents workshop.
- [2024.12.9] Support int8 quantization.
- [2024.12.5] Major update: ShowUI is integrated into OOTB for local runs!
- [2024.12.1] We support iterative refinement to improve grounding accuracy. Try it at the HF Spaces demo.
- [2024.11.27] We release the arXiv paper, the HF Spaces demo, and `ShowUI-desktop`.
- [2024.11.16] `showlab/ShowUI-2B` is available on Hugging Face.
Run `python3 api.py`, providing a screenshot and a query.
Since this builds on the Hugging Face Gradio client, you don't need a GPU to deploy the model locally 🤗
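For example, a minimal call through the `gradio_client` package might look like the sketch below; the Space name, endpoint, and argument order are assumptions, so check `client.view_api()` against your deployment:

```python
from gradio_client import Client, handle_file

# Point the client at the hosted demo; "showlab/ShowUI" is an assumed
# Space name -- substitute your own deployment if it differs.
client = Client("showlab/ShowUI")

# The endpoint name and argument order are assumptions; inspect the
# available endpoints with `client.view_api()` before calling.
result = client.predict(
    handle_file("screenshot.png"),   # path to a local screenshot
    "Click the search button.",      # natural-language query
    api_name="/predict",
)
print(result)
```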
See Computer Use OOTB for using ShowUI to control your PC.
See Quick Start for local model usage.
See Gradio for installation.
Our training codebase supports:
- Grounding and Navigation training: Mind2Web, AITW, Miniwob
- Self-customized model: ShowUI, Qwen2VL
- Efficient Training: DeepSpeed, BF16, QLoRA, SDPA / FlashAttention2, Liger-Kernel
- Mixed training over multiple datasets
- Interleaved data streaming (a minimal sketch follows this list)
- Random image resizing (crop, pad)
- Wandb training monitoring
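As a rough illustration of the mixed-dataset interleaved streaming above, here is a minimal sketch; the dataset names and sampling weights are hypothetical, and the repo's actual pipeline lives in the training code:

```python
import random
from torch.utils.data import IterableDataset

class InterleavedStream(IterableDataset):
    """Endless stream that mixes several datasets by sampling weight.

    Dataset names and weights are hypothetical stand-ins for the
    repo's actual grounding/navigation mixtures; each source dataset
    is assumed to be non-empty.
    """

    def __init__(self, datasets, weights):
        self.names = list(datasets)
        self.datasets = datasets
        self.weights = [weights[name] for name in self.names]

    def __iter__(self):
        iterators = {name: iter(self.datasets[name]) for name in self.names}
        while True:  # endless mixed stream; stop via a max-steps budget
            name = random.choices(self.names, weights=self.weights, k=1)[0]
            try:
                yield next(iterators[name])
            except StopIteration:
                # Restart an exhausted source so mixing ratios stay stable.
                iterators[name] = iter(self.datasets[name])
                yield next(iterators[name])
```

A stream like `InterleavedStream({"grounding": ds_a, "mind2web": ds_b}, {"grounding": 0.7, "mind2web": 0.3})` can then be fed to a standard `DataLoader`.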
See Train for training setup.
Try `test.ipynb`, which seamlessly supports Qwen2VL models.
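If you just want a starting point outside the notebook, the sketch below follows the standard Qwen2VL inference path with `showlab/ShowUI-2B` (it assumes `transformers` and `qwen_vl_utils` are installed; the query text and `screenshot.png` path are illustrative):

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "showlab/ShowUI-2B", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("showlab/ShowUI-2B")

# A single image + text turn; the query wording is illustrative.
messages = [{"role": "user", "content": [
    {"type": "image", "image": "screenshot.png"},
    {"type": "text", "text": "Locate the search box."},
]}]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding.
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```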
Try `recaption.ipynb`, where we provide instructions on how to recaption the original annotations using GPT-4o.
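As a rough sketch of what such a recaptioning call can look like with the OpenAI Python SDK (the prompt wording and `recaption` helper here are ours, not the notebook's):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def recaption(image_path: str, original: str) -> str:
    """Ask GPT-4o to rewrite a UI element caption (illustrative prompt)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text",
             "text": f"Rewrite this UI element caption to be clear and specific: {original}"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ]}],
    )
    return response.choices[0].message.content

print(recaption("element.png", "button"))
```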
We extend our gratitude to SeeClick for providing their code and datasets.
Special thanks to Siyuan for assistance with the Gradio demo and OOTB support.
If you find our work helpful, please kindly consider citing our paper.
```bibtex
@misc{lin2024showui,
      title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
      author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
      year={2024},
      eprint={2411.17465},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.17465},
}
```
If you like our project, please give us a star ⭐ on GitHub for the latest updates.