Extend reward classifier for multiple camera views #626
Conversation
1- Extended the reward classifier to support multiple camera views
2- Added the classifier to the `eval_on_robot.py` script to be able to predict the reward during the rollout on the robot
3- General fixes and optimizations to the code
@ChorntonYoel could you give it a quick review?
The part related to the reward classifier looks good, I have checked locally - it converges. The only remaining issue is a failing test:
pytest tests/test_train_hilserl_classifier.py
I think that needs to be fixed before merging.
Thanks @helper2424! The test issue was due to the addition of multi-camera training. I updated the test files and added an additional test function for training with two camera sources.
Looks great.
Merged commit 3bb5ed5 into user/michel-aractingi/2024-11-27-port-hil-serl
What this does
(A) This PR adds the possibility to train the reward classifier with multiple camera images. The architecture is as follows (see the sketch after this list):
1- one pretrained ResNet encoder, shared across all images, with frozen parameters
2- each image is passed through the encoder to get a compressed representation
3- the representations are concatenated and passed through an MLP that is trained to predict the reward
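A minimal PyTorch sketch of this architecture, assuming torchvision's resnet18 as a stand-in for the pretrained encoder (the class and argument names here are illustrative, not the PR's actual ones):

```python
import torch
import torch.nn as nn
import torchvision.models as models


class MultiCameraRewardClassifier(nn.Module):
    """Sketch: one frozen pretrained encoder shared across camera views,
    followed by a trainable MLP head that predicts the reward."""

    def __init__(self, num_cameras: int = 2, hidden_dim: int = 256):
        super().__init__()
        # Shared pretrained encoder with frozen parameters
        # (torchvision resnet18 stands in for the PR's encoder).
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.encoder.parameters():
            p.requires_grad = False
        feat_dim = backbone.fc.in_features  # 512 for resnet18

        # Trainable MLP over the concatenated per-camera features.
        self.head = nn.Sequential(
            nn.Linear(num_cameras * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one logit for the binary reward
        )

    def forward(self, images: list[torch.Tensor]) -> torch.Tensor:
        # images: one (B, 3, H, W) tensor per camera view.
        feats = [self.encoder(img).flatten(start_dim=1) for img in images]
        return self.head(torch.cat(feats, dim=-1)).squeeze(-1)
```

Training would then minimize a binary cross-entropy loss (e.g. `torch.nn.BCEWithLogitsLoss`) between the logits and the 0/1 reward labels.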
(B) This PR also extends the `eval_on_robot.py` script to load the reward classifier and label each timestep with the predicted reward in real time; a sketch of this loop is given below.
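Conceptually, the real-time labeling amounts to a loop like the following (illustrative only: `robot`, `policy`, and `classifier` are stand-ins for the script's actual objects, and the observation keys are assumptions):

```python
import torch


def rollout_with_reward(robot, policy, classifier, num_steps, device="cpu"):
    """Sketch of a rollout where each timestep is labeled with the
    classifier's predicted reward in real time."""
    rewards = []
    for _ in range(num_steps):
        observation = robot.capture_observation()   # stand-in API
        action = policy.select_action(observation)  # stand-in API
        robot.send_action(action)                   # stand-in API

        # Gather one tensor per camera view; assumes the image entries are
        # already preprocessed (3, H, W) float tensors keyed by "image".
        images = [
            observation[key].unsqueeze(0).to(device)
            for key in observation
            if "image" in key
        ]
        with torch.no_grad():
            logit = classifier(images)
        rewards.append(int(torch.sigmoid(logit) > 0.5))
    return rewards
```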
(C) In `lerobot/common/robot_devices/control_utils.py`, this PR adds a new function that resets the follower arm to the initial position it was in at the start of the control loop. This is important for collecting a dataset with rewards: it avoids redundant frames in the trajectory (from manually resetting the robot) after the task has terminated.
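Such a reset could look roughly like this (a sketch under assumptions, not the PR's implementation; the function name, the `"observation.state"` key, and the interpolation are all illustrative):

```python
def reset_follower_position(robot, initial_position, num_steps: int = 30):
    """Sketch: move the follower arm back to the joint position recorded
    at the start of control, interpolating to avoid an abrupt jump."""
    # "observation.state" is an assumed key for the current joint positions.
    current = robot.capture_observation()["observation.state"]
    for i in range(1, num_steps + 1):
        alpha = i / num_steps
        # Linearly interpolate from the current to the initial position.
        robot.send_action(current * (1 - alpha) + initial_position * alpha)
```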
How to test
1- Train the reward classifier with `lerobot/scripts/train_hilserl_classifier.py`
2- Run `eval_on_robot.py` with the trained classifier
NOTE: you can use a dataset I collected with the so100: `aractingi/pick_place_lego_cube_1`