8000 [WIP] HIL SERL port grasp critic by AdilZouitine · Pull Request #937 · huggingface/lerobot · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[WIP] HIL SERL port grasp critic #937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AdilZouitine
Copy link
Member

What this does

Explain what this PR does. Feel free to tag your PR with the appropriate label(s).

Examples:

Title Label
Fixes #[issue] (🐛 Bug)
Adds new dataset (🗃️ Dataset)
Optimizes something (⚡️ Performance)

How it was tested

Explain/show how you tested your changes.

Examples:

  • Added test_something in tests/test_stuff.py.
  • Added new_feature and checked that training converges with policy X on dataset/environment Y.
  • Optimized some_function, it now runs X times faster than previously.

How to checkout & try? (for the reviewer)

Provide a simple way for the reviewer to try out your changes.

Examples:

pytest -sx tests/test_stuff.py::test_something
python lerobot/scripts/train.py --some.option=true

SECTION TO REMOVE BEFORE SUBMITTING YOUR PR

Note: Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR. Try to avoid tagging more than 3 people.

Note: Before submitting this PR, please read the contributor guideline.

s1lent4gnt and others added 20 commits March 31, 2025 17:35
- Implemented grasp critic to evaluate gripper actions
- Added corresponding config parameters for tuning
- Added complementary info in the add method
- Added complementary info in the sample method
- Integrated the grasp critic gradient update to the training loop in learner_server
- Added Adam optimizer and configured grasp critic learning rate in configuration_sac
- Added target critics networks update after the critics gradient step
removed complementary info from buffer and learner server
removed get_gripper_action function
added gripper parameters to `common/envs/configs.py`
- Removed GraspCriticNetworkConfig class and integrated its parameters into SACConfig.
- Added num_discrete_actions parameter to SACConfig for better action handling.
- Updated SACPolicy to conditionally create grasp critic networks based on num_discrete_actions.
- Enhanced grasp critic forward pass to handle discrete actions and compute losses accordingly.
- Updated SACPolicy to conditionally compute losses for grasp critic based on num_discrete_actions.
- Simplified forward method to return loss outputs as a dictionary for better clarity.
- Adjusted learner_server to handle both main and grasp critic losses during training.
- Ensured optimizers are created conditionally for grasp critic based on configuration settings.
- Introduced mock_gripper parameter in ManiskillEnvConfig to enable gripper simulation.
- Added ManiskillMockGripperWrapper to adjust action space for environments with discrete actions.
- Updated SACPolicy to compute continuous action dimensions correctly, ensuring compatibility with the new gripper setup.
- Refactored action handling in the training loop to accommodate the changes in action dimensions.
…ling

- Cleaned up code formatting for better readability, including consistent spacing and removal of unnecessary blank lines.
- Consolidated continuous action dimension calculation to enhance clarity and maintainability.
- Simplified loss return statements in the forward method to improve code structure.
- Ensured grasp critic parameters are included conditionally based on configuration settings.
…ation

- Updated SACPolicy to conditionally compute grasp critic losses based on the presence of discrete actions.
- Refactored the forward method to handle grasp critic model selection and loss computation more clearly.
- Adjusted learner server to utilize optimized parameters for grasp critic during training.
- Improved action handling in the ManiskillMockGripperWrapper to accommodate both tuple and single action inputs.
…tion

- Cached encoder output in select_action method to reduce redundant computations.
- Updated action selection and grasp critic calls to utilize cached encoder features when available.
…hing support

- Added async_prefetch parameter to SACConfig for improved buffer management.
- Implemented get_iterator method in ReplayBuffer to support asynchronous prefetching of batches.
- Updated learner_server to utilize the new iterator for online and offline sampling, enhancing training efficiency.
@AdilZouitine AdilZouitine force-pushed the user/adil_zouitine/2025-4-1-port-grasp-critic branch from 4dc8ac6 to 632b2b4 Compare April 7, 2025 15:48
AdilZouitine and others added 6 commits April 16, 2025 16:45
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
Co-authored-by: s1lent4gnt <kmeftah.khalil@gmail.com>
@AdilZouitine AdilZouitine merged commit dc1548f into user/adil-zouitine/2025-1-7-port-hil-serl-new Apr 16, 2025
1 of 7 checks passed
@AdilZouitine AdilZouitine deleted the user/adil_zouitine/2025-4-1-port-grasp-critic branch April 16, 2025 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0