To train the ARP-based model on the offline Furniture-Bench datasets, `cd` into the `furni-rlb` directory and run:
python3 train.py config=./configs/arp_plus.yaml hydra.job.name=<JOB_NAME_YOU_LIKE> train.num_gpus=1 train.seq_len=10
Since we use Hydra, you can override config entries on the command line, e.g. add `train.bs=48` to change the batch size or `wandb=<WANDB_PROJECT_NAME>` to log metrics to Weights & Biases.
You can also choose which vision encoder to use by overriding `model.hp.vision_encoder`; the currently available choices are "resnet18", "resnet50", and "dinov2".
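For example, a single run that combines these overrides could look like the following; the job name and W&B project name here are placeholders to replace with your own.

```bash
# hydra.job.name and wandb values below are placeholders, not names shipped with the repo
python3 train.py config=./configs/arp_plus.yaml \
    hydra.job.name=arp-furniture-lamp \
    train.num_gpus=1 train.seq_len=10 train.bs=48 \
    wandb=my-arp-project \
    model.hp.vision_encoder=dinov2
```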
To evaluate the model in the Furniture-Bench simulator, `cd` into the `rpl-final-eval` directory and run:
python -m run run_prefix=$(date "+%Y-%m-%d-%H-%M-%S") rolf.demo_path=furniture_dataset_processed/low/lamp/ env.furniture=lamp gpu=0 is_train=False init_ckpt_path=<PATH_TO_CHECKPOINT>
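The same command, split across lines for readability and with a hypothetical checkpoint path filled in (point `init_ckpt_path` at your own checkpoint):

```bash
# init_ckpt_path below is a made-up example path; substitute your own checkpoint.
python -m run run_prefix=$(date "+%Y-%m-%d-%H-%M-%S") \
    rolf.demo_path=furniture_dataset_processed/low/lamp/ \
    env.furniture=lamp gpu=0 is_train=False \
    init_ckpt_path=checkpoints/arp_lamp.pth
```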
We present an imitation learning architecture based on autoregressive action sequence learning. We demonstrate strong results on Push-T, ALOHA, RLBench, and real robot experiments. For details, please check our paper.
(Demo video: github-demo.mp4)
- To install, clone this repository, recreate the Python environment according to ENV.md, and download datasets and pretrained models according to Download.md.
- To evaluate or run demonstrations with pretrained models, follow the instructions in Eval.md.
- To train ARP on Push-T, ALOHA, or RLBench, follow the instructions in Train.md.
- To count MACs and parameters, please check profile.ipynb (a minimal counting sketch is also included after this list).
- To run baselines and ablation studies, please check Experiments.md. We also provide a much cleaner implementation of RVT-2.
- Please check real-robot/readme.ipynb if you want to learn more about the real robot experiment.
- For visualization of likelihood inference and prediction with human guidance, please check pusht/qualitative-visualize.ipynb.
- If you are looking for the supplementary videos, please check the videos folder at https://rutgers.box.com/s/uzozemx67kje58ycy3lyzf1zgddz8tyq.
- `arp.py` is a single-file implementation of our autoregressive policy. Directly running this file from the command line will train an ARP model to generate binary MNIST images.
    - The only hairy part of the code is the `generate` function, which is simple in principle but has some engineering details.
    - Note that the action decoder (in the paper) is named predictor in this file.
    - Here is my ongoing documentation.
- We provide 2d-waypoints-real-robot.ipynb, which shows how to obtain 2D waypoints or 2D joint locations (which can be used as guidance for low-level actions) from the URDF, camera parameters, and joint positions of a real robot (a generic projection sketch is included after this list).
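As a rough illustration of the projection step only (not the notebook's actual code), the sketch below assumes you already have 3D joint positions in the robot base frame, e.g. from forward kinematics on the URDF, along with the base-to-camera extrinsics and the camera intrinsic matrix, and maps them to pixel coordinates with a standard pinhole model. All function and variable names here are hypothetical.

```python
import numpy as np

def project_joints_to_image(joints_base, T_cam_base, K):
    """Project 3D joint positions (robot base frame) to 2D pixel coordinates.

    joints_base: (N, 3) joint positions in the robot base frame
    T_cam_base:  (4, 4) homogeneous transform from the base frame to the camera frame
    K:           (3, 3) camera intrinsic matrix
    """
    # Homogeneous coordinates, then move the points into the camera frame.
    pts = np.hstack([joints_base, np.ones((len(joints_base), 1))])  # (N, 4)
    pts_cam = (T_cam_base @ pts.T).T[:, :3]                         # (N, 3)

    # Pinhole projection: apply the intrinsics, then divide by depth.
    uv = (K @ pts_cam.T).T                                          # (N, 3)
    return uv[:, :2] / uv[:, 2:3]                                   # (N, 2) pixel coordinates

# Hypothetical usage, given FK results and calibrated camera parameters:
# joints_2d = project_joints_to_image(joints_base, T_cam_base, K)
```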
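Similarly, here is a minimal sketch of one common way to count parameters and MACs for a PyTorch model. It is only an illustration with a stand-in model and a dummy input shape, not necessarily what profile.ipynb does, and the MACs count assumes the third-party thop package.

```python
import torch
import torch.nn as nn

# Stand-in model; replace with the actual policy you want to profile.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))

# Parameter count needs only PyTorch itself.
num_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {num_params / 1e6:.2f} M")

# MACs usually require a profiler; thop is one common choice (pip install thop).
try:
    from thop import profile
    dummy_input = torch.randn(1, 128)  # replace with the model's real input shape
    macs, _ = profile(model, inputs=(dummy_input,))
    print(f"MACs: {macs / 1e6:.2f} M")
except ImportError:
    pass  # thop not installed; the parameter count above still works
```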
If this work is helpful for your research, please cite:
@misc{zhang2024arp,
  title={Autoregressive Action Sequence Learning for Robotic Manipulation},
  author={Xinyu Zhang and Yuhan Liu and Haonan Chang and Liam Schramm and Abdeslam Boularias},
  year={2024},
  eprint={2410.03132},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2410.03132},
}