This README is functionally complete as of 2025-06-26. If you find something missing, please open an issue and we will take care of it as soon as possible.
🆕 [2025-06-26] Tutorial for Evaluation Updated
🆕 [2025-06-15] Model Checkpoints Uploaded. Tutorial for Training/Fine-tuning Updated
🆕 [2025-06-12] Made Public.
This is the official implementation of *From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models* (INT-ACT).

TODO:
- Add more complete documentation for training and evaluation. Currently, the code is all there, but the documentation is sparse.
- Release all relevant model checkpoints on HF.
Install this codebase by first cloning it:

```bash
git clone --recurse-submodules https://github.com/ai4ce/INT-ACT.git
cd INT-ACT
```
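If you forgot `--recurse-submodules` when cloning, you can fetch the submodules afterwards:

```bash
git submodule update --init --recursive
```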
> [!IMPORTANT]
> See the How to Set ENV Variables section for setting up the environment variables.
> [!NOTE]
> This codebase relies on uv to manage virtual environments. It's not strictly required, but the authors can only provide support for this environment management system.
Now simply run:

```bash
uv sync
```
> [!IMPORTANT]
> This only installs the dependencies for training and the inference server. Full-scale inference requires installing the inference client (simulator) dependencies.
Inference under different environments, such as Simpler, Simpler-ManiSkill3, Libero, or the real world, requires installing each environment's own dependencies in a separate virtual environment.
> [!NOTE]
> The server refers to the policy; the client refers to the simulators (e.g., ManiSkill) and real-world robots. The simulator/client feeds its observations to the server to retrieve the action to execute. We do this to allow a server-client architecture that separates the very different compute demands of training vs. running experiments on a robot.
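As a rough illustration of this exchange, here is a minimal sketch assuming a hypothetical HTTP interface; the endpoint, port, and JSON fields below are illustrative placeholders, not this codebase's actual protocol:

```bash
# Hypothetical sketch: the client posts its current observation to the policy
# server and receives the action to execute in the response.
# The URL, port, and payload schema are placeholders.
curl -X POST http://localhost:8000/act \
  -H "Content-Type: application/json" \
  -d '{"image": "<base64 observation>", "instruction": "put the carrot on the plate"}'
```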
The following walkthrough uses Simpler as an example.
- Create a separate virtual environment for this simulator:

```bash
cd src/experiments/envs/simpler
uv venv --python=3.10 # the Python version can be changed to accommodate your simulator's needs
```
- Activate the inference virtual environment. This is important because we don't want to install the simulator dependencies in the training environment:

```bash
source .venv/bin/activate
```
- Install the simulator dependencies using `pyproject.toml`:

```bash
uv pip install -r pyproject.toml
```
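When the installation finishes, you can leave the simulator environment and return to your previous shell state:

```bash
deactivate  # exit the simulator venv; re-run `source .venv/bin/activate` before launching the client
```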
Octo and Magma both require specialized policy environments due to conflicting dependencies. The following walkthrough uses Octo as an example.
- Create a separate virtual environment for this policy:

```bash
cd src/experiments/policies/octo_policy_server
uv venv --python=3.10 # the Python version can be changed to accommodate your policy's needs
```
- Activate the inference virtual environment. This is important because we don't want to install the policy dependencies in the training environment:

```bash
source .venv/bin/activate
```
- Install the policy dependencies using `pyproject.toml`:

```bash
uv pip install -r pyproject.toml
```
For now, we refer you to Allen Ren's README.
We have released our trained Pi0 variants on Hugging Face. You can find them under the INTACT collection. Specifically, they are:
| Model | Notes | Download Link |
|---|---|---|
| Pi0 finetune | Pi0 finetuned on BridgeV2 | HF hub |
| Pi0 finetune rephrase | Pi0 finetuned on BridgeV2 with task paraphrase | HF hub |
| Pi0 scratch | Pi0 trained from scratch on BridgeV2 | HF hub |
You can find the details in each checkpoint's model card.
For convenience, we also include links to the baselines, which have been generously provided by their original authors:
| Model | Reference | Download Link |
|---|---|---|
| Magma | Magma: A Foundation Model for Multimodal AI Agents (CVPR 2025) | HF hub |
| SpatialVLA | SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model (RSS 2025) | HF hub |
| Octo models | Octo: An Open-Source Generalist Robot Policy (RSS 2024) | Small (HF) / Base (HF) |
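If you want to pull a checkpoint to local disk, one option is the Hugging Face CLI. The repository ID below is a placeholder; copy the real one from the model card in the INTACT collection:

```bash
# <org>/<repo> is a placeholder: substitute the actual repo ID from the collection
huggingface-cli download <org>/<repo> --local-dir checkpoints/pi0_finetune
```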
The training/fine-tuning documentation can be found in `doc/training_finetuning.md`.

The evaluation documentation can be found in `doc/evaluation.md`.
- Create a `set_path.sh` file in the project's root directory.
- Fill out the following variables:
```bash
#!/bin/bash
# Used to sync paths on HPC with data from collaborators and models from the
# baseline directory, to avoid redundant data downloads.

# Training dataset
export VLA_DATA_DIR=
# Logging for trained models, logs, etc.
export VLA_LOG_DIR=
# WandB
export VLA_WANDB_ENTITY=
# HF cache. TRANSFORMERS_CACHE is deprecated, but still read by some libraries.
export TRANSFORMERS_CACHE=
export HF_HOME=
# SIMPLER
export MS2_REAL2SIM_ASSET_DIR=
export MS_ASSET_DIR=
export XLA_PYTHON_CLIENT_PREALLOCATE=false
# uv (optional if you don't mind uv using your home directory, which may be
# undesirable on HPC)
export UV_CACHE_DIR=
export UV_PYTHON_INSTALL_DIR=
# Singularity (optional if you don't use Singularity)
export OVERLAY_EXT3=
```
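After filling in the variables, source the file in each shell before launching training or evaluation, e.g.:

```bash
source set_path.sh
```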