Cannot achieve reported loss (<0.006) and performance on LIBERO with the finetuned model by given script. · Issue #74 · moojink/openvla-oft · GitHub
Cannot achieve reported loss (<0.006) and performance on LIBERO with the finetuned model by given script. #74
Open
@Liang-ZX


Hi,

Thanks for your excellent work. We used the provided script to fine-tune the openvla/openvla-7b model on LIBERO-Spatial:

torchrun --standalone --nnodes 1 --nproc-per-node X vla-scripts/finetune.py \
  --vla_path openvla/openvla-7b \
  --data_root_dir /PATH/TO/RLDS/DATASETS/DIR/ \
  --dataset_name libero_spatial_no_noops \
  --run_root_dir /YOUR/CHECKPOINTS/AND/LOG/DIR/ \
  --use_l1_regression True \
  --use_diffusion False \
  --use_film False \
  --num_images_in_input 2 \
  --use_proprio True \
  --batch_size 8 \
  --learning_rate 5e-4 \
  --num_steps_before_decay 100000 \
  --max_steps 150005 \
  --save_freq 10000 \
  --save_latest_checkpoint_only False \
  --image_aug True \
  --lora_rank 32 \
  --wandb_entity "YOUR_WANDB_ENTITY" \
  --wandb_project "YOUR_WANDB_PROJECT" \
  --run_id_note parallel_dec--8_acts_chunk--continuous_acts--L1_regression--3rd_person_img--wrist_img--proprio_state

But after 150K steps, our loss is still 0.008 and never drops below 0.006.
With this checkpoint, we evaluated the model on LIBERO-Spatial and only achieved a ~92.6% success rate, which is considerably lower than the reported 96.2%. Could you please help us check what the problem is? Thank you.

(screenshot attached)

The only difference may be that we used 8 A800 GPUs to fine-tune the model. Could this be the problem?
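For reference, one thing that changes with the GPU count is the effective (global) batch size: torchrun launches one process per GPU, and each process consumes `--batch_size` samples per step, so the global batch grows linearly with `--nproc-per-node`. A minimal sketch of that arithmetic for our setup (the variable names below are just illustrative, not script parameters):

```shell
# Global batch size = per-GPU batch size x number of data-parallel processes.
PER_GPU_BATCH=8   # --batch_size in the command above
NUM_GPUS=8        # --nproc-per-node with 8x A800
echo $((PER_GPU_BATCH * NUM_GPUS))   # → 64
```

So if the reported runs used a different number of GPUs with the same `--batch_size`, the global batch size (and thus the effective training dynamics at a fixed `--learning_rate`) would differ, which might explain part of the gap.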
