Description
Hi,
Thanks for your excellent work. We used the provided script to finetune the openvla/openvla-7b model on LIBERO-Spatial:
```shell
torchrun --standalone --nnodes 1 --nproc-per-node X vla-scripts/finetune.py \
  --vla_path openvla/openvla-7b \
  --data_root_dir /PATH/TO/RLDS/DATASETS/DIR/ \
  --dataset_name libero_spatial_no_noops \
  --run_root_dir /YOUR/CHECKPOINTS/AND/LOG/DIR/ \
  --use_l1_regression True \
  --use_diffusion False \
  --use_film False \
  --num_images_in_input 2 \
  --use_proprio True \
  --batch_size 8 \
  --learning_rate 5e-4 \
  --num_steps_before_decay 100000 \
  --max_steps 150005 \
  --save_freq 10000 \
  --save_latest_checkpoint_only False \
  --image_aug True \
  --lora_rank 32 \
  --wandb_entity "YOUR_WANDB_ENTITY" \
  --wandb_project "YOUR_WANDB_PROJECT" \
  --run_id_note parallel_dec--8_acts_chunk--continuous_acts--L1_regression--3rd_person_img--wrist_img--proprio_state
```
However, after 150K steps, our loss is still at 0.008 and does not drop below 0.006.
When we evaluate this checkpoint on LIBERO-Spatial, we only achieve a ~92.6% success rate, which is noticeably lower than the reported 96.2%. Could you please help us figure out what the problem is? Thank you.
The only difference may be that we used 8 A800 GPUs to finetune the model. Could that be the cause?
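For what it's worth, GPU count can matter because `--batch_size` is typically the per-GPU batch, so the effective (global) batch size grows with the number of GPUs, which shifts the optimizer dynamics for a fixed learning rate. Below is a minimal sketch of that arithmetic; the helper names and the linear learning-rate scaling rule are illustrative assumptions, not something the finetuning script itself does.

```python
# Hypothetical helpers illustrating how GPU count changes the effective
# batch size, and one common (assumed) way to compensate: linear LR scaling.

def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum: int = 1) -> int:
    """Global batch size seen by the optimizer per step under data parallelism."""
    return per_gpu_batch * num_gpus * grad_accum


def linearly_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear learning-rate scaling rule: scale LR proportionally to batch size."""
    return base_lr * new_batch / base_batch


if __name__ == "__main__":
    # --batch_size 8 from the command above, run on 8 A800s:
    ours = effective_batch_size(per_gpu_batch=8, num_gpus=8)
    print(ours)  # 64

    # If the reference run used fewer GPUs (e.g. 4), its global batch was 32,
    # and keeping LR fixed at 5e-4 with batch 64 is effectively a smaller LR
    # per sample; linear scaling would suggest adjusting accordingly.
    print(linearly_scaled_lr(base_lr=5e-4, base_batch=32, new_batch=64))
```

This is only a sanity-check on the arithmetic; whether LR scaling is appropriate here depends on how many GPUs the reported 96.2% run actually used.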