[RLlib] Examples folder do-over (vol 52): Custom action distribution example (new script, replaces existing Catalogs-based one). #53262
Conversation
LGTM. Approved with a kind request for including temperature decay.
@@ -670,7 +665,7 @@ def from_logits(
             child_distribution_cls_struct, child_distribution_list
         )

-        return TorchMultiDistribution(
+        return cls(
Why change it here the other way around?
@@ -0,0 +1,118 @@
"""Example on how to define and run an experiment with a custom action distribution.

The example uses an additional `temperature` parameter on top of the built-in
Great example of how to introduce temperature into the action sampling. Could we also show how to decay this temperature? Temperature decay over the course of training is a common practice in RL.
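A minimal sketch of what such a decay could look like, in plain Python (the helper name and hyperparameters are illustrative, not RLlib API; wiring the value into the module each iteration, e.g. from a callback, is left open):

def decayed_temperature(
    training_iteration: int,
    initial_temp: float = 1.5,
    min_temp: float = 0.1,
    decay_rate: float = 0.99,
) -> float:
    # Exponentially anneal from `initial_temp` (more exploration early on)
    # down to a floor of `min_temp` (near-greedy sampling late in training).
    return max(min_temp, initial_temp * decay_rate**training_iteration)

# For example: iteration 0 -> 1.5, iteration 100 -> ~0.55, iteration 500 -> 0.1 (floor).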
        # to None, its default value.
        self.action_dist_cls = _make_categorical_with_temperature(
            self.model_config.get("action_dist_temperature", 1.0),
        )
@sven1977 I think, for the purpose of this PR, using this API still makes sense.
But I'd like to propose a (backward-compatible) change to RL Modules: `RLModule.get_inference_action_dist_cls` should become a getter `RLModule.inference_action_dist_cls` that looks like an attribute but does the same thing as today. If users then want to override it, they set that attribute in the `setup` method. Today, we have a mixture of attributes and these getter methods to modify RLModules.
That way, the default way to change all action distributions would be the `setup` method, while the old path of overriding `RLModule.get_inference_action_dist_cls` would still be available through overriding the `RLModule.inference_action_dist_cls` getter. So we would get to a state where users don't have to mix inheritance-based definition of components with `setup()`.
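A rough sketch of the proposed pattern (hypothetical, not current RLlib API; the `TorchCategorical` import path is assumed from RLlib's torch distributions):

from ray.rllib.models.torch.torch_distributions import TorchCategorical

class SketchRLModule:
    # Hypothetical stand-in for RLModule, just to illustrate the proposal.

    def setup(self):
        # Default path: users simply assign the attribute here ...
        self.action_dist_cls = None

    @property
    def inference_action_dist_cls(self):
        # ... or override this getter; it reads like an attribute but does
        # the same work as today's get_inference_action_dist_cls().
        return self.action_dist_cls or TorchCategorical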
Also CC @simonsays1980
I think we are almost there already. The default implementation of `get_inference_action_dist_cls` today is:

def get_inference_action_dist_cls(self) -> Type[TorchDistribution]:
    if self.action_dist_cls is not None:
        return self.action_dist_cls
    elif isinstance(self.action_space, gym.spaces.Discrete):
        return TorchCategorical
    elif isinstance(self.action_space, gym.spaces.Box):
        return TorchDiagGaussian
    else:
        raise ValueError(...)
Are you suggesting to just make the attributes more granular, like introducing `self.inference_action_dist_cls`, `self.exploration_action_dist_cls`, and `self.train_action_dist_cls`?
I'm not sure. Maybe this would complicate things and give users too many options.
Counter suggestion:
- We deprecate the option to set any dist-cls attribute. Everything has to be done through overriding methods.
- Analogous to overriding `_forward` vs. `_forward_[inference|exploration|train]`, we should introduce the methods `_get_action_dist_cls()` (for all cases) and `_get_action_dist_cls_inference()`, etc. (for the specific cases). By default, all the specific cases simply call the generic `_get_action_dist_cls()`, completely analogous to the behavior of the `_forward` methods. This way, if users just need one class, they override `_get_action_dist_cls`; if they need more granularity for some phases, they override the phase-specific methods.
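A sketch of that layout (hypothetical method names from the suggestion above, not current RLlib API):

from ray.rllib.models.torch.torch_distributions import TorchCategorical

class SketchModule:
    def _get_action_dist_cls(self):
        # Generic default, used by all phases unless overridden.
        return TorchCategorical

    # The phase-specific hooks default to the generic method, exactly like
    # _forward_[inference|exploration|train] default to _forward().
    def _get_action_dist_cls_inference(self):
        return self._get_action_dist_cls()

    def _get_action_dist_cls_exploration(self):
        return self._get_action_dist_cls()

    def _get_action_dist_cls_train(self):
        return self._get_action_dist_cls()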
        # your custom class(es) from these. In this case, leave self.action_dist_cls set
        # to None, its default value.
        self.action_dist_cls = _make_categorical_with_temperature(
            self.model_config.get("action_dist_temperature", 1.0),
Can we please not default to `1.0`, just to make this a bit safer? As it is now, this example would not fail if the user set `model_config["action_dist_temp"]` or some other wrong key of the model dict, making the user believe that the temperature has negligible impact, because the failure is silent.
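One way to make the failure loud, reusing the example's `_make_categorical_with_temperature` helper (a sketch; indexing the key directly raises a KeyError on a misspelled key instead of silently using 1.0):

self.action_dist_cls = _make_categorical_with_temperature(
    self.model_config["action_dist_temperature"],
)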
This is my only nit; the rest are just "thoughts for future PRs".
Good point! I generally agree that hidden defaults should be avoided. Will fix ...
done
this RLModule is subject to. Note that the observation space might not be the
exact space from your env, but that it might have already gone through
preprocessing through a connector pipeline (for example, flattening,
frame-stacking, mean/std-filtering, etc.).
Note: I think we should, at some point, disambiguate the word `observation_space` by changing it to `input_space` or something similar.
Have been thinking about this for some time as well. I think a contra-argument could be:
- In 99% of the cases, a sub-module within a MultiRLModule is some form of policy, mapping agent observations to agent actions.
- Yes, there are sometimes sub-modules in a MultiRLModule that are NOT policies, like a world model or a shared encoder. But even in these cases, they normally take observations as inputs, or - and that would still require `observation_space` information to be present - a combination of observations and (last n) rewards and (last n) actions.
- Yes, you could also have a sub-module that's some sort of head, getting its input from an intermediary embedding layer, but then in that case, I would think that the size of that embedding layer (probably some 1D tensor) would be given in `self.model_config`.
    @override(Distribution)
    def from_logits(cls, logits: TensorType, **kwargs) -> "TorchDistribution":
        return cls(logits=logits, **kwargs)
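This is what makes the `return cls(...)` pattern pay off: subclasses inherit `from_logits` as a correctly-typed factory. A small illustration (`TemperatureCategorical` is a hypothetical subclass, not part of the PR; the import path is assumed from RLlib's torch distributions):

import torch
from ray.rllib.models.torch.torch_distributions import TorchCategorical

class TemperatureCategorical(TorchCategorical):
    # Hypothetical subclass; inherits from_logits unchanged.
    pass

dist = TemperatureCategorical.from_logits(torch.randn(2, 4))
assert isinstance(dist, TemperatureCategorical)  # not merely TorchCategorical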
Nice!
Just one nit. Thanks!
Examples folder do-over (vol 52): Custom action distribution example
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.