8000 [RLlib] Examples folder do-over (vol 52): Custom action distribution example (new script, replaces existing Catalogs-based one). by sven1977 · Pull Request #53262 · ray-project/ray · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[RLlib] Examples folder do-over (vol 52): Custom action distribution example (new script, replaces existing Catalogs-based one). #53262

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

sven1977
Copy link
Contributor
@sven1977 sven1977 commented May 23, 2025

Examples folder do-over (vol 52): Custom action distribution example

  • Adds new custom action dist class + RLModule using this class
  • Adds new example script (also to CI tests)
  • replaces existing Catalogs-based one.
  • fix docs

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 requested a review from a team as a code owner May 23, 2025 12:17
@sven1977 sven1977 added rllib RLlib related issues rllib-models An issue related to RLlib (default or custom) Models. rllib-docs-or-examples Issues related to RLlib documentation or rllib/examples labels May 23, 2025
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 requested a review from a team as a code owner May 23, 2025 13:10
Copy link
Collaborator
@simonsays1980 simonsays1980 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. APproved with a kind request for including temperature decay.

@@ -670,7 +665,7 @@ def from_logits(
child_distribution_cls_struct, child_distribution_list
)

return TorchMultiDistribution(
return cls(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing it here the other way around?

@@ -0,0 +1,118 @@
"""Example on how to define and run an experiment with a custom action distribution.

The example uses an additional `temperature` parameter on top of the built-in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great example of how to introduce temperature into the action sampling. Could we also show how to decay this temperature. Temperature decay over the course of training is a common practice in RL.

# to None, its default value.
self.action_dist_cls = _make_categorical_with_temperature(
self.model_config.get("action_dist_temperature", 1.0),
)
Copy link
Contributor
@ArturNiederfahrenhorst ArturNiederfahrenhorst May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sven1977 I think, for the purpose of this PR, using this API still makes sense.
But I'd like to propose a (backward-compatible) change to RL Modules:
RLModule.get_inference_action_dist_cls should be a getter method RLModule.inference_action_dist_cls to make it look like an attribute but does the same thing as today. If user now wants to override, they set that attribute in the setup method. Because today, we have a mixture of attributes and these getter methods to modify RLModules.

That way, the default way to change all action distributions would be the setup method, while the old path of overriding RLModule.get_inference_action_dist_cls would still be available through overriding the RLModule.inference_action_dist_cls getter method. So we get to a state where user does not have to mix inheritance-based definition of components with setup().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also CC @simonsays1980

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are almost there, already. The default implementation of get_inference_action_dist_cls today is:

    def get_inference_action_dist_cls(self) -> Type[TorchDistribution]:
        if self.action_dist_cls is not None:
            return self.action_dist_cls
        elif isinstance(self.action_space, gym.spaces.Discrete):
            return TorchCategorical
        elif isinstance(self.action_space, gym.spaces.Box):
            return TorchDiagGaussian
        else:
            raise ValueError(...)

Are you suggesting to just make the attributes more granular, like introduce self.inference_action_dist_cls, self.exploration_action_dist_cls, and self.train_action_dist_cls?

I'm not sure. Maybe this would complicate things and give users too many options.

Counter suggestion:

  • We deprecate the option to set any dist-cls attribute. Everything has to be done through overriding methods.
  • Analogous to overriding _forward vs _forward_[inference|exploration|train], we should introduce the methods: _get_action_dist_cls() <- for all cases, _get_action_dist_cls_inference, etc.. <- for the specific cases. By default, all the specific cases simply call the generic _get_action_dist_cls(). Again, completely analogous to behavior of the _forward methods. This way, if users just need one class, they override _get_action_dist_cls, if they need more granularity for some phases, they override the phase-specific methods.

# your custom class(es) from these. In this case, leave self.action_dist_cls set
# to None, its default value.
self.action_dist_cls = _make_categorical_with_temperature(
self.model_config.get("action_dist_temperature", 1.0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please not default to 1.0 just for the purpose of making this a bit safer?
With how it is now, this example would not fail if user sets model_config["action_dist_temp"] or some other wrong index of the model dict making user believe that die temperature has negligible impact because the failure is silent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This my my only nit, the rest are just "thoughts for future PRs"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I generally agree that hidden defaults should be avoided. Will fix ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

this RLModule is subject to. Note that the observation space might not be the
exact space from your env, but that it might have already gone through
preprocessing through a connector pipeline (for example, flattening,
frame-stacking, mean/std-filtering, etc..).
Copy link
Contributor
@ArturNiederfahrenhorst ArturNiederfahrenhorst May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I think we should, at some point, disambiguate the word observation_space by changing it to input_space or something similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have been thinking about this for some time as well.

I think a contra-argument could be:

  • In 99% of the cases, a sub-module within a MultiRLModule is some form of policy, mapping agent-observations to agent-actions.
  • Yes, there are sometimes sub-modules in a MultiRLModule that are NOT policies, like a world model or a shared encoder. But even in these cases, they normally take observations as inputs, or - and that would still require observation_space information to be present - a combination of observations and (last n) rewards and (last n) actions.
  • Yes, you could also have a sub-module that's some sort of head, getting its input from an intermediary embedding layer, but then in that case, I would think that the size of that embedding layer (probably some 1D tensor) would be given in self.model_config.

@override(Distribution)
def from_logits(cls, logits: TensorType, **kwargs) -> "TorchDistribution":
return cls(logits=logits, **kwargs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor
@ArturNiederfahrenhorst ArturNiederfahrenhorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit. Thanks!

sven1977 added 3 commits May 28, 2025 10:25
…nup_examples_folder_52_custom_action_distribution
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 enabled auto-merge (squash) May 28, 2025 10:43
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label May 28, 2025
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@github-actions github-actions bot disabled auto-merge May 28, 2025 11:24
@sven1977 sven1977 enabled auto-merge (squash) May 28, 2025 11:24
@sven1977 sven1977 merged commit c2135cf into ray-project:master May 29, 2025
6 checks passed
chris-ray-zhang pushed a commit to chris-ray-zhang/ray that referenced this pull request May 29, 2025
…example (new script, replaces existing Catalogs-based one). (ray-project#53262)

Signed-off-by: Chris Zhang <chris@anyscale.com>
vickytsang pushed a commit to ROCm/ray that referenced this pull request Jun 3, 2025
…example (new script, replaces existing Catalogs-based one). (ray-project#53262)

Signed-off-by: Vicky Tsang <vtsang@amd.com>
iamjustinhsu pushed a commit to iamjustinhsu/ray that referenced this pull request Jun 12, 2025
…example (new script, replaces existing Catalogs-based one). (ray-project#53262)
rebel-scottlee pushed a commit to rebellions-sw/ray that referenced this pull request Jun 21, 2025
…example (new script, replaces existing Catalogs-based one). (ray-project#53262)

Signed-off-by: Scott Lee <scott.lee@rebellions.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests rllib RLlib related issues rllib-docs-or-examples Issues related to RLlib documentation or rllib/examples rllib-models An issue related to RLlib (default or custom) Models.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0