🌐 UniVLA: Unified Vision-Language-Action Model

A general-purpose VLA model designed to unify vision, language, and action for robotics and autonomous driving.


📜 [technical report] 🤗 [model weights] 🤖 [project page]

🚀 News

  • 2025.6.27: Code released for robotic simulations.
  • 2025.6.25: Paper released on arXiv.

🧪 Highlights

  • Unified Vision-Language-Action Model: supports image grounding, video generation, and action prediction.
  • Strong Performance on Several Robotics Benchmarks: supports CALVIN, LIBERO, and SimplerEnv.
  • Interleaved Video Training: supports interleaved vision-action training formulated as a Markov decision process.
  • Broader Applications: real-robot ALOHA and autonomous driving.

🔧 REPO TODO List

  • Policy learning for CALVIN, LIBERO, and SimplerEnv.
  • Support for evaluation.
  • World model pretraining for video generation.
  • Support for real-robot ALOHA.
  • Support for autonomous driving.
  • Support for general grounding.

📚 Experiments

Emu3 Pretraining Models

You can download the pretrained Emu3 models from Hugging Face via the links below (an example download command is sketched after the links).

Emu3-base

Emu3-vision
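
As an illustration, the weights can be fetched with the Hugging Face CLI. The repository IDs and target directories below are placeholders, not confirmed by this repo; substitute the repositories linked above.

# Sketch only: replace the repo IDs and paths with the ones from the links above.
pip install -U "huggingface_hub[cli]"
huggingface-cli download BAAI/Emu3-Base --local-dir ./ckpts/Emu3-Base
huggingface-cli download BAAI/Emu3-VisionTokenizer --local-dir ./ckpts/Emu3-VisionTokenizer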

World Model Training

# train the world model
bash scripts/pretrain/train_video_1node.sh

world model pretraining ckpts

This checkpoint serves as the pretrained model for the downstream policy learning tasks on CALVIN, LIBERO, and SimplerEnv.
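
How the world-model checkpoint is wired into the downstream scripts is script-specific; the environment variable below is purely hypothetical and only illustrates the intended flow.

# Hypothetical pattern: point a downstream run at the world-model checkpoint.
# The actual variable/argument name used inside the scripts may differ.
export WORLD_MODEL_CKPT=/path/to/world_model_pretraining_ckpt
bash scripts/simulator/calvin/train_calvin_abcd_video.sh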

1. CALVIN Benchmark

| Method | Mode | Setting | AVG | CKPT |
| ------ | ---- | ------- | --- | ---- |
| UniVLA | video sft | ABCD->D | 4.63 (5×: 4.71) | huggingface |

Note: 5× means 5× inference steps, i.e., 180 steps total.

Training

  • Here we provide a single-node training script; multi-node training is recommended (a multi-node launch sketch follows the command below).
# video sft
bash scripts/simulator/calvin/train_calvin_abcd_video.sh
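
For multi-node runs, one common option is to launch the same training entry point with torchrun; the node counts, rendezvous settings, and entry-point path below are assumptions, not taken from this repo.

# Illustrative 2-node x 8-GPU launch; adapt the provided single-node script accordingly.
torchrun --nnodes=2 --nproc_per_node=8 \
  --node_rank=$NODE_RANK --master_addr=$MASTER_ADDR --master_port=29500 \
  train/train_calvin.py   # hypothetical entry point; see the script above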

2. LIBERO Benchmark

| Method | Mode | SPATIAL | OBJECT | GOAL | LONG (10) | AVG | CKPT |
| ------ | ---- | ------- | ------ | ---- | --------- | --- | ---- |
| UniVLA | img sft | 97.0 | 99.0 | 92.6 | 90.8 | 94.8 | huggingface |
| UniVLA | video sft | 95.4 | 98.8 | 93.6 | 94.0 | 95.5 | huggingface |

Training

bash scripts/simulator/libero/train_libero_video.sh

3. SimplerEnv Benchmark

| Method | Robot | Mode | Put Spoon | Put Carrot | Stack Block | Put Eggplant | AVG | CKPT |
| ------ | ----- | ---- | --------- | ---------- | ----------- | ------------ | --- | ---- |
| UniVLA | Bridge (WidowX) | video sft | 83.3 | 66.7 | 33.3 | 95.8 | 69.8 | huggingface |

Training

bash scripts/simulator/simplerenv/train_simplerenv_bridge_video.sh

Setup

Here we provide a conda environment setup for the project.

conda create -n emu_vla python=3.10
conda activate emu_vla
pip install -r requirements.txt
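
Before launching training, a quick sanity check that the environment resolves PyTorch and sees the GPUs can save a failed run; this assumes the requirements install a CUDA-enabled PyTorch build.

# Optional sanity check (assumes CUDA-enabled PyTorch from requirements.txt)
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"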

Benchmark setup, training and evaluation

📁 Code Structure

OmniSim/
├── configs/       # Model configuration files
├── models/        # Tokenizer and diffusion test
├── train/         # Training dataset and pipeline
├── reference/     # Reference code
│   ├── Emu3/      # Base code
│   └── RoboVLMs/  # Evaluation code
├── scripts/       # Shell scripts for training & evaluation
├── tools/         # Data preprocessing tools
└── README.md      # Project description and user guide
    

❤️ Acknowledgement

Our work is built upon the following projects. Thanks for their great open-source work!
