Reward classifier and training #528
Conversation
Nice work @ChorntonYoel! Could you move the classifier directory to […]? Since now we will only use the reward classifier for […].
lerobot/common/policies/factory.py (outdated)
elif name == "classifier":
    from lerobot.common.policies.classifier.configuration_classifier import ClassifierConfig
    from lerobot.common.policies.classifier.modeling_classifier import Classifier

    return Classifier, ClassifierConfig
I think it's not ideal to put the classifier in the factory.py of policies. I think we can remove it and, instead of relying on make_policy in the training script, directly define the classifier there, since the training script of the classifier is not train.py.
What do you think?
Is this better now that the classifier has the policy name "hilserl/classifier"?
Or do you still think it's confusing and we should initialize it in a different way?
done
lerobot/scripts/train_classifier.py (outdated)
from lerobot.common.datasets.factory import resolve_delta_timestamps
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.logger import Logger
from lerobot.common.policies.factory import make_policy
We could remove make_policy and manually define it later.
Suggested change:
- from lerobot.common.policies.factory import make_policy
+ from lerobot.common.policies.classifier.configuration_classifier import ClassifierConfig
+ from lerobot.common.policies.classifier.modeling_classifier import Classifier
lerobot/scripts/train_classifier.py (outdated)
model = make_policy(
    hydra_cfg=cfg,
    dataset_stats=dataset.meta.stats if not cfg.resume else None,
    pretrained_policy_name_or_path=str(logger.last_pretrained_model_dir) if cfg.resume else None,
).to(device)
We can define the classifier here:
Suggested change:
- model = make_policy(
-     hydra_cfg=cfg,
-     dataset_stats=dataset.meta.stats if not cfg.resume else None,
-     pretrained_policy_name_or_path=str(logger.last_pretrained_model_dir) if cfg.resume else None,
- ).to(device)
+ from lerobot.common.policies.factory import _policy_cfg_from_hydra_cfg
+ classifier_cfg = _policy_cfg_from_hydra_cfg(ClassifierConfig, cfg)
+ if not cfg.resume:
+     model = Classifier(classifier_cfg, dataset.meta.stats)
+ else:
+     model = Classifier(classifier_cfg)
+     model.load_state_dict(Classifier.from_pretrained(str(logger.last_pretrained_model_dir)).state_dict())
+ model = model.to(device)
Outdated, but would you still prefer I do that? I don't mind
Merged commit 6490927 into huggingface:user/michel-aractingi/2024-11-27-port-hil-serl
Co-authored-by: Daniel Ritchie <daniel@brainwavecollective.ai>
Co-authored-by: resolver101757 <kelster101757@hotmail.com>
Co-authored-by: Jannik Grothusen <56967823+J4nn1K@users.noreply.github.com>
Co-authored-by: Remi <re.cadene@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@huggingface.co>
What this does
This PR adds a reward classifier (used to classify whether an image of a robot performing a task should receive a reward or not), a training script for the classifier (with logging and resuming), a config.yaml file that can be used to start a training run, and a few tests for the training loop.
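For orientation, here is a stripped-down sketch of what the classifier training loop boils down to. It reuses the Classifier, ClassifierConfig, and LeRobotDataset names from the snippets in this thread, but the dataset id, the batch keys (observation.image, next.reward), the default-constructed config, and the model's forward signature are assumptions for illustration; the real train_classifier.py additionally handles the Hydra config, logging, checkpointing, and resuming.

import torch
from torch.utils.data import DataLoader

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
from lerobot.common.policies.classifier.configuration_classifier import ClassifierConfig
from lerobot.common.policies.classifier.modeling_classifier import Classifier

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder dataset id: a small set of episodes labeled with the reward system of #518.
dataset = LeRobotDataset("user/reward_classifier_dataset")
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Constructor mirrors the suggestion above: config plus dataset stats for normalization.
model = Classifier(ClassifierConfig(), dataset.meta.stats).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for batch in loader:
    images = batch["observation.image"].to(device)    # assumed image key
    labels = batch["next.reward"].float().to(device)  # assumed binary reward label

    logits = model(images)  # assumed to return one logit per sample
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits.squeeze(-1), labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()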
How it was tested
Using 10 episodes recorded with the reward system from PR #518.
I also added a test file for the classifier training script. A lot of things are mocked, but I believe it covers the basics.
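As an illustration of the kind of mocked training-step test this refers to (not the PR's actual tests; the batch keys, shapes, and stand-in model are made up), a single step can be exercised without any pretrained backbone:

import torch
import torch.nn.functional as F


def test_training_step_runs():
    # Fake batch with assumed keys; shapes chosen arbitrarily for speed.
    batch = {
        "observation.image": torch.rand(4, 3, 64, 64),
        "next.reward": torch.tensor([0.0, 1.0, 1.0, 0.0]),
    }

    # Tiny stand-in for the real classifier so the test needs no downloaded weights.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 1))

    logits = model(batch["observation.image"]).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, batch["next.reward"])
    loss.backward()

    assert logits.shape == (4,)
    assert loss.item() >= 0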
How to checkout & try? (for the reviewer)
With the wandb entity and the dataset name adapted.
I was able to reproduce 95%+ accuracy after a few epochs with facebook/convnext-base-224 as backbone and a dataset of 10 episodes of ~15 sec.
This branch was built on top of the branch from #518, so it will need to wait for that one to be merged before merging.
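Once a checkpoint exists, using the classifier to turn a camera frame into a binary reward might look roughly like this. The from_pretrained call and class names come from the snippets above, but the checkpoint path, input shape, and forward/threshold interface are assumptions for illustration:

import torch

from lerobot.common.policies.classifier.modeling_classifier import Classifier

# Hypothetical checkpoint directory produced by train_classifier.py.
model = Classifier.from_pretrained("outputs/train_classifier/last/pretrained_model")
model.eval()

# Stand-in for a single camera frame, shape (C, H, W), values in [0, 1].
frame = torch.rand(3, 224, 224)

with torch.no_grad():
    logit = model(frame.unsqueeze(0))  # assumed to return a single "success" logit

# Reward is 1 when the classifier judges the task state as rewarded.
reward = int(torch.sigmoid(logit).item() > 0.5)
print(reward)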