[WIP] HIL SERL port grasp critic #937

AdilZouitine · 2025-04-04T07:58:53Z

- Removed GraspCriticNetworkConfig class and integrated its parameters into SACConfig.
- Added num_discrete_actions parameter to SACConfig for better action handling.
- Updated SACPolicy to conditionally create grasp critic networks based on num_discrete_actions.
- Enhanced grasp critic forward pass to handle discrete actions and compute losses accordingly.
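To make the discrete-action handling above concrete, here is a minimal sketch of a DQN-style temporal-difference loss over `num_discrete_actions` gripper choices. It is an illustration under stated assumptions, not the code in this PR: the function name `discrete_critic_td_loss`, the plain-list Q-value layout, the three-way gripper encoding, and the max over target Q-values are all hypothetical.

```python
def discrete_critic_td_loss(
    q_values, target_next_q_values, action, reward, done, discount=0.99
):
    """Squared TD error for one transition of a discrete-action critic.

    q_values: Q(s, a) for every discrete action (hypothetical layout).
    target_next_q_values: Q_target(s', a') from a target network.
    """
    # Bootstrapped target: no future value once the episode has ended.
    next_value = 0.0 if done else max(target_next_q_values)
    td_target = reward + discount * next_value
    td_error = q_values[action] - td_target
    return td_error**2


# Three discrete gripper actions, e.g. close / stay / open (assumed encoding).
loss = discrete_critic_td_loss(
    q_values=[0.2, 0.5, 0.1],
    target_next_q_values=[0.0, 1.0, 0.5],
    action=1,
    reward=1.0,
    done=False,
    discount=0.5,
)
```

A real implementation would operate on batched tensors and average the loss over the batch; the scalar version only shows how the target is formed.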

- Updated SACPolicy to conditionally compute losses for grasp critic based on num_discrete_actions.
- Simplified forward method to return loss outputs as a dictionary for better clarity.
- Adjusted learner_server to handle both main and grasp critic losses during training.
- Ensured optimizers are created conditionally for grasp critic based on configuration settings.

- Introduced mock_gripper parameter in ManiskillEnvConfig to enable gripper simulation.
- Added ManiskillMockGripperWrapper to adjust action space for environments with discrete actions.
- Updated SACPolicy to compute continuous action dimensions correctly, ensuring compatibility with the new gripper setup.
- Refactored action handling in the training loop to accommodate the changes in action dimensions.

…ling

- Cleaned up code formatting for better readability, including consistent spacing and removal of unnecessary blank lines.
- Consolidated continuous action dimension calculation to enhance clarity and maintainability.
- Simplified loss return statements in the forward method to improve code structure.
- Ensured grasp critic parameters are included conditionally based on configuration settings.

…ation

- Updated SACPolicy to conditionally compute grasp critic losses based on the presence of discrete actions.
- Refactored the forward method to handle grasp critic model selection and loss computation more clearly.
- Adjusted learner server to utilize optimized parameters for grasp critic during training.
- Improved action handling in the ManiskillMockGripperWrapper to accommodate both tuple and single action inputs.

…hing support

- Added async_prefetch parameter to SACConfig for improved buffer management.
- Implemented get_iterator method in ReplayBuffer to support asynchronous prefetching of batches.
- Updated learner_server to utilize the new iterator for online and offline sampling, enhancing training efficiency.
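The asynchronous prefetching idea can be sketched with a background thread feeding a bounded queue. This is a generic stdlib illustration, not the `get_iterator` implementation from this PR: `prefetch_iterator`, `sample_fn`, `num_batches`, and `queue_size` are hypothetical names chosen for the example.

```python
import queue
import threading


def prefetch_iterator(sample_fn, num_batches, queue_size=2):
    """Yield `num_batches` batches while a background thread samples ahead."""
    batches = queue.Queue(maxsize=queue_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        for _ in range(num_batches):
            batches.put(sample_fn())  # blocks while the queue is full
        batches.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        batch = batches.get()
        if batch is sentinel:
            return
        yield batch


# Stand-in for buffer sampling: each "batch" is just the next integer.
counter = iter(range(100))
fetched = list(prefetch_iterator(lambda: next(counter), num_batches=5))
```

The bounded queue keeps the producer at most `queue_size` batches ahead, so sampling overlaps with the consumer's work without unbounded memory growth. If the consumer stops early, the producer stays blocked on `put`; the sketch relies on the daemon flag to avoid holding the process open.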

s1lent4gnt and others added 20 commits March 31, 2025 17:35

Add grasp critic

4a1c26d

- Implemented grasp critic to evaluate gripper actions
- Added corresponding config parameters for tuning

Add complementary info in the replay buffer

007fee9

- Added complementary info in the add method
- Added complementary info in the sample method

Add gripper penalty wrapper

7452f9b

Add get_gripper_action method to GamepadController

2c1e5fa

Add grasp critic to the training loop

c774bbe

- Integrated the grasp critic gradient update to the training loop in learner_server
- Added Adam optimizer and configured grasp critic learning rate in configuration_sac
- Added target critics networks update after the critics gradient step
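The commit message does not say how the target critics are updated. A common choice in SAC-style training is a Polyak (soft) average applied after each critic gradient step, sketched below on plain lists of floats. The function name `soft_update` and the `tau` parameter are assumptions for illustration, not identifiers from this PR.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Return target parameters moved a fraction `tau` toward the online ones."""
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_params, online_params)
    ]


# With tau=0.5 each target value moves halfway to its online counterpart.
target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5)
```

A small `tau` makes the target network lag behind the online critic, which tends to stabilize the bootstrapped TD targets.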

[pre-commit.ci] auto fixes from pre-commit.com hooks

7983baf

for more information, see https://pre-commit.ci

Added Gripper quantization wrapper and grasp penalty

fe2ff51

- Removed complementary info from buffer and learner server
- Removed get_gripper_action function
- Added gripper parameters to `common/envs/configs.py`
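As a rough picture of what a gripper quantization step and a grasp penalty might look like, the sketch below maps a continuous gripper command to three discrete choices and charges a small penalty whenever the command would change the gripper state. Everything here is hypothetical: the function names, the `[-1, 1]` command range, the `threshold` and `penalty` values, and the close/stay/open encoding are assumptions, not the wrappers added in this commit.

```python
def quantize_gripper(command, threshold=0.5):
    """Map a continuous command in [-1, 1] to -1 (close), 0 (stay), or 1 (open)."""
    if command >= threshold:
        return 1
    if command <= -threshold:
        return -1
    return 0


def grasp_penalty(discrete_command, gripper_is_open, penalty=-0.1):
    """Penalize commands that would change the gripper state; others cost nothing."""
    wants_to_close = discrete_command == -1 and gripper_is_open
    wants_to_open = discrete_command == 1 and not gripper_is_open
    return penalty if (wants_to_close or wants_to_open) else 0.0


# Closing an open gripper is penalized; asking an open gripper to open is not.
closing_cost = grasp_penalty(quantize_gripper(-0.9), gripper_is_open=True)
redundant_cost = grasp_penalty(quantize_gripper(0.8), gripper_is_open=True)
```

The penalty discourages needless open/close toggling, which is the usual motivation for adding such a term to the reward.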

Enhance SACPolicy to support shared encoder and optimize action selection

51f1625

- Cached encoder output in select_action method to reduce redundant computations.
- Updated action selection and grasp critic calls to utilize cached encoder features when available.
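The benefit of caching encoder output can be shown with a toy policy in which the actor head and the grasp critic head share a single encoder pass. The class `SharedEncoderPolicy` and its callables are stand-ins invented for this example; the actual SACPolicy uses neural network modules and its own method signatures.

```python
class SharedEncoderPolicy:
    """Toy policy whose actor and grasp critic reuse one encoder pass."""

    def __init__(self, encoder, actor_head, grasp_head):
        self.encoder = encoder
        self.actor_head = actor_head
        self.grasp_head = grasp_head
        self.encoder_calls = 0  # counts how often the encoder runs

    def select_action(self, observation):
        # Encode once and hand the cached features to both heads.
        self.encoder_calls += 1
        features = self.encoder(observation)
        return self.actor_head(features), self.grasp_head(features)


policy = SharedEncoderPolicy(
    encoder=lambda obs: [2.0 * x for x in obs],
    actor_head=lambda feats: sum(feats),
    # Greedy discrete choice: index of the largest feature.
    grasp_head=lambda feats: max(range(len(feats)), key=feats.__getitem__),
)
action = policy.select_action([0.5, 1.5, 1.0])
```

Without the shared features, each head would call the encoder itself, doubling the cost of the most expensive part of action selection when the encoder processes images.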

fix indentation issue

e86fe66

[pre-commit.ci] auto fixes from pre-commit.com hooks

037ecae

for more information, see https://pre-commit.ci

fix caching

7741526

Handle gripper penalty

4621f4e

Refactor complementary_info handling in ReplayBuffer

6c10390

fix sign issue

632b2b4

AdilZouitine force-pushed the user/adil_zouitine/2025-4-1-port-grasp-critic branch from 4dc8ac6 to 632b2b4 Compare April 7, 2025 15:48

pre-commit-ci bot and others added 9 commits April 7, 2025 15:48

[pre-commit.ci] auto fixes from pre-commit.com hooks

a7be613

for more information, see https://pre-commit.ci

Add rounding for safety

a813562

fix caching and dataset stats is optional

d948b95

[pre-commit.ci] auto fixes from pre-commit.com hooks

e7edf2a

for more information, see https://pre-commit.ci

General fixes in code, removed delta action, fixed grasp penalty, added logic to put gripper reward in info

5428ab9

[pre-commit.ci] auto fixes from pre-commit.com hooks

ba09f44

for more information, see https://pre-commit.ci

fix encoder training

854bfb4

Refactor modeling_sac and parameter handling for clarity and reusability.

320a1a9

Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>

stick to hil serl nn architecture

35ecaae

Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>

AdilZouitine and others added 6 commits April 16, 2025 16:45

match target entropy hil serl

a850d43

Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>

change the tanh distribution to match hil serl

157f719

Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>

Handle caching

0c9a3ec

Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>

fix caching

d4f341e

Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>

Update log_std_min type to float in PolicyConfig for consistency

a6f612e

Fix init temp

7191bbb

Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>

AdilZouitine merged commit dc1548f into user/adil-zouitine/2025-1-7-port-hil-serl-new Apr 16, 2025
1 of 7 checks passed

AdilZouitine deleted the user/adil_zouitine/2025-4-1-port-grasp-critic branch April 16, 2025 14:46
