Helper2424/check hilserl #1047

helper2424 · 2025-04-28T18:17:37Z

What this does

Explain what this PR does. Feel free to tag your PR with the appropriate label(s).

Examples:

Title	Label
Fixes #[issue]	(🐛 Bug)
Adds new dataset	(🗃️ Dataset)
Optimizes something	(⚡️ Performance)

How it was tested

Explain/show how you tested your changes.

Examples:

Added test_something in tests/test_stuff.py.
Added new_feature and checked that training converges with policy X on dataset/environment Y.
Optimized some_function, it now runs X times faster than previously.

How to checkout & try? (for the reviewer)

Provide a simple way for the reviewer to try out your changes.

Examples:

pytest -sx tests/test_stuff.py::test_something

python lerobot/scripts/train.py --some.option=true

SECTION TO REMOVE BEFORE SUBMITTING YOUR PR

Note: Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR. Try to avoid tagging more than 3 people.

Note: Before submitting this PR, please read the contributor guideline.

Co-authored-by: Daniel Ritchie <daniel@brainwavecollective.ai>
Co-authored-by: resolver101757 <kelster101757@hotmail.com>
Co-authored-by: Jannik Grothusen <56967823+J4nn1K@users.noreply.github.com>
Co-authored-by: Remi <re.cadene@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>

…licy on the robot (huggingface#541) Co-authored-by: Yoel <yoel.chornton@gmail.com>

Co-authored-by: Yoel <yoel.chornton@gmail.com>

Co-authored-by: KeWang1017 <ke.wang@helloleap.ai>

…ing logic
- Added `num_subsample_critics`, `critic_target_update_weight`, and `utd_ratio` to SACConfig.
- Implemented target entropy calculation in SACPolicy if not provided.
- Introduced subsampling of critics to prevent overfitting during updates.
- Updated temperature loss calculation to use the new target entropy.
- Added comments for future UTD update implementation.

These changes improve the flexibility and performance of the SAC implementation.
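For readers unfamiliar with these SAC options, here is a minimal sketch of the three ideas named in the commit message above. It is not the code from this PR. The function names are hypothetical, the default target entropy of minus the action dimension is a common heuristic assumed here rather than taken from the PR, and the temperature loss is the standard SAC form written over plain Python floats.

```python
import math
import random


def default_target_entropy(action_dim, target_entropy=None):
    """Return the configured target entropy, or a default if none is given.

    Assumption: the default is -action_dim, a widely used SAC heuristic.
    The PR may use a different formula.
    """
    if target_entropy is not None:
        return target_entropy
    return -float(action_dim)


def subsampled_min_q(q_values, num_subsample_critics=None, rng=random):
    """Take the minimum over a random subset of the critic ensemble.

    q_values holds one Q estimate per critic for the same (state, action).
    If num_subsample_critics is None, or not smaller than the ensemble size,
    every critic is used.
    """
    q_values = list(q_values)
    if num_subsample_critics is None or num_subsample_critics >= len(q_values):
        chosen = q_values
    else:
        chosen = rng.sample(q_values, num_subsample_critics)
    return min(chosen)


def temperature_loss(log_alpha, log_probs, target_entropy):
    """Standard SAC temperature loss: mean of -alpha * (log_prob + target_entropy)."""
    alpha = math.exp(log_alpha)
    terms = [-alpha * (lp + target_entropy) for lp in log_probs]
    return sum(terms) / len(terms)
```

In this sketch, the loss is positive when the policy's entropy is below the target (log-probabilities are high), so gradient descent on `log_alpha` raises the temperature, and the reverse holds when entropy exceeds the target.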

…s & check script (huggingface#578)

…n handling
- Updated action selection to use distribution sampling and log probabilities for better stochastic behavior.
- Enhanced standard deviation clamping to prevent extreme values, ensuring stability in policy outputs.
- Cleaned up code by removing unnecessary comments and improving readability.

These changes aim to refine the SAC implementation, enhancing its robustness and performance during training and inference.

- Updated standard deviation parameterization in SACConfig to 'softplus' with defined min and max values for improved stability.
- Modified action sampling in SACPolicy to use reparameterized sampling, ensuring better gradient flow and log probability calculations.
- Cleaned up log probability calculations in TanhMultivariateNormalDiag for clarity and efficiency.
- Increased evaluation frequency in YAML configuration to 50000 for more efficient training cycles.

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.

…d stability
- Updated SACConfig to replace standard deviation parameterization with log_std_min and log_std_max for better control over action distributions.
- Modified SACPolicy to streamline action selection and log probability calculations, enhancing stochastic behavior.
- Removed deprecated TanhMultivariateNormalDiag class to simplify the codebase and improve maintainability.

These changes aim to enhance the robustness and performance of the SAC implementation during training and inference.
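As an illustration of what clamping with `log_std_min` and `log_std_max` typically looks like in a tanh-squashed Gaussian policy, here is a minimal sketch in plain Python. It is not the SACPolicy code from this PR. The function name and the default bounds of -5.0 and 2.0 are assumptions chosen for the example, and the log-probability uses the usual change-of-variables correction for tanh.

```python
import math
import random


def sample_squashed_action(mean, log_std, log_std_min=-5.0, log_std_max=2.0, rng=random):
    """Sample a tanh-squashed action and its log-probability.

    mean and log_std are per-dimension lists. The bounds are hypothetical
    defaults for illustration, not values taken from the PR.
    """
    actions = []
    log_prob = 0.0
    for mu, ls in zip(mean, log_std):
        # Clamp the log standard deviation to keep the distribution well behaved.
        ls = min(max(ls, log_std_min), log_std_max)
        std = math.exp(ls)
        # Reparameterized sample: noise is drawn independently of mu and std.
        eps = rng.gauss(0.0, 1.0)
        pre_tanh = mu + std * eps
        action = math.tanh(pre_tanh)
        # Log-density of the Gaussian sample before squashing.
        log_prob += -0.5 * eps * eps - ls - 0.5 * math.log(2.0 * math.pi)
        # Change-of-variables correction for the tanh squashing.
        log_prob -= math.log(1.0 - action * action + 1e-6)
        actions.append(action)
    return actions, log_prob
```

Because the noise `eps` is sampled separately from the policy outputs, an autograd framework could differentiate through `mu` and `std` in the equivalent tensor code, which is the point of reparameterized sampling.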

for more information, see https://pre-commit.ci

…ology with 'discrete_critic'. Update related methods and comments for clarity and consistency in handling discrete actions.

The test was inadvertently comparing uninitialized parts of the array, which could lead to inconsistent or undefined results. This fix ensures only the relevant, properly initialized sections are checked.

Co-authored-by: Eugene Mironov <helper2424@gmail.com>

… lerobot (huggingface#1018) Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>

for more information, see https://pre-commit.ci

…bot into helper2424/check_hilserl

ChorntonYoel and others added 30 commits April 18, 2025 15:02

Add human intervention mechanism and eval_robot script to evaluate po…

30a808c

…licy on the robot (huggingface#541) Co-authored-by: Yoel <yoel.chornton@gmail.com>

Fixup

d78cef1

Update lerobot/scripts/train_hilserl_classifier.py

d1f76cb

Co-authored-by: Yoel <yoel.chornton@gmail.com>

nit in control_robot.py

b57d6a7

completed losses

9d48d23

Port SAC WIP (huggingface#581)

be3adda

Co-authored-by: KeWang1017 <ke.wang@helloleap.ai>

added comments from kewang

1a8b99e

[Port Hil-SERL] Add unit tests for the reward classifier & fix import…

17a3a31

…s & check script (huggingface#578)

[HIL-SERL PORT] Fix linter issues (huggingface#588)

22a1899

added optimizer and sac to factory.py

ad7eea1

Added normalization schemes and style checks

4624a83

trying to get sac running

63d8c96

style fixes

642e3a3

split encoder for critic and actor

c6ca952

added temporary fix for missing task_index key in online environment

e5801f4

[Port HIL_SERL] Final fixes for the Reward Classifier (huggingface#598)

d1d6ffd

Extend reward classifier for multiple camera views (huggingface#626)

181727c

[WIP] correct sac implementation

a0e2be8

remove breakpoint

e8449e9

SAC works

2fd7887

Add rlpd tricks

46827fb

[WIP] correct sac implementation

57344bf

SAC works

875c027

Change SAC policy implementation with configuration and modeling classes

760d60a

Add type annotations and restructure SACConfig class fields

ef77799

michel-aractingi and others added 30 commits April 18, 2025 16:18

Fixes for the reward classifier

3b24ad3

Added option to add current readings to the state of the policy

9886520

nits in configuration classifier and control_robot

c1ee25d

[pre-commit.ci] auto fixes from pre-commit.com hooks

0d70f0b

for more information, see https://pre-commit.ci

Refactor SACPolicy and configuration to replace 'grasp_critic' termin…

a7a51cf

…ology with 'discrete_critic'. Update related methods and comments for clarity and consistency in handling discrete actions.

Refactor crop_dataset_roi

dc726cb

[HIL-SERl PORT] Unit tests for Replay Buffer (huggingface#966)

0030ff3

Fix linter issue

c5845ee

Fix linter issue part 2

6230840

Fixup linter (huggingface#1017)

4ce3362

Ignore spellcheck for ik variable

b77cee7

fix install ci

ecc960b

allow to install prerelease for maniskill

cf03ca9

fix ci

a001824

[HIL-SERL] Update CI to allow installation of prerelease versions for…

299effe

… lerobot (huggingface#1018) Co-authored-by: imstevenpmwork <steven.palma@huggingface.co>

Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new

671ac34

[HIL-SERL]Remove overstrict pre-commit modifications (huggingface#1028)

c58b504

Clean the code and remove todo

b8c2b0b

Clean the code

a8da4a3

[Port HIl-Serl] Refactor gym-manipulator (huggingface#1034)

bd4db8d

Merge branch 'main' into user/adil-zouitine/2025-1-7-port-hil-serl-new

1d4f660

cleaning

50e9a8e

checkout normalize.py to prev commit

ea89b29

rename reward classifier

4257fe5

configs

8330637

[pre-commit.ci] auto fixes from pre-commit.com hooks

63fde2a

for more information, see https://pre-commit.ci

Debug

91f35f5

Move back configs

d319a11

Merge branch 'helper2424/check_hilserl' of github.com:helper2424/lero…

1f98b8b

…bot into helper2424/check_hilserl
