[RLlib; Offline RL] Implement Offline Policy Evaluation (OPE) via Importance Sampling. #53702
base: master
Conversation
Pull Request Overview
This pull request implements Offline Policy Evaluation (OPE) via Importance Sampling for the Offline RL API. Key changes include the introduction of new runner and pre-evaluator classes for OPE, updates to the evaluation configuration and processing in the algorithm logic, and adjustments to logging and state management for offline evaluation.
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Summary per file:

| File | Description |
| --- | --- |
| `rllib/utils/runners/runner_group.py` | Added forwarding of `kwargs` in runner creation. |
| `rllib/tuned_examples/bc/cartpole_bc_with_offline_evaluation.py` | Configured the offline evaluation type for the CartPole example. |
| `rllib/offline/offline_evaluation_runner_group.py` | Updated runner-class selection logic and introduced pre-learner/evaluator assignment. |
| `rllib/offline/offline_evaluation_runner.py` | Applied override annotations and removed state updates for deprecated connectors. |
| `rllib/env/single_agent_env_runner.py` | Renamed metric keys for per-agent and per-module returns. |
| `rllib/algorithms/algorithm.py` | Updated the offline evaluation runner setup and return-value structure while introducing a local-runner fallback. |
| `rllib/algorithms/algorithm_config.py` | Added and validated new offline evaluation configuration attributes. |
Comments suppressed due to low confidence (2)
rllib/algorithms/algorithm_config.py:2992
- The attribute name used here ('offline_eval_runner_cls') is inconsistent with the previously defined 'offline_eval_runner_class'. Consider using the same attribute name for consistency.
self.offline_eval_runner_cls = offline_eval_runner_class
rllib/algorithms/algorithm.py:1159
- The return structure of 'evaluate_offline' has changed compared to the previous nested format. Please ensure that downstream consumers are updated to handle the new dictionary structure.
return {OFFLINE_EVAL_RUNNER_RESULTS: eval_results}
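For illustration, here is a minimal consumer-side sketch of the new flat result format. The import location of `OFFLINE_EVAL_RUNNER_RESULTS` and the helper name are assumptions, not taken from this PR's diff.

```python
# Assumption: the result key lives next to the other result-key constants in
# ray.rllib.utils.metrics; adjust the import to wherever this PR defines it.
from ray.rllib.utils.metrics import OFFLINE_EVAL_RUNNER_RESULTS


def read_offline_eval_results(algo):
    """Unpacks Algorithm.evaluate_offline() under the new flat return format."""
    results = algo.evaluate_offline()
    # New structure: {OFFLINE_EVAL_RUNNER_RESULTS: eval_results}. Callers that
    # previously walked a nested result dict now index one top-level key.
    return results[OFFLINE_EVAL_RUNNER_RESULTS]
```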
@@ -2829,6 +2833,13 @@ def evaluation(
                for parallel evaluation. Setting this to 0 forces sampling to be done in the
                local OfflineEvaluationRunner (main process or the Algorithm's actor when
                using Tune).
            offline_evaluation_type: Type of offline evaluation to run. Either `"eval_loss"`
Question: So, if a user provides `offline_eval_runner_class`, then the value of this field is ignored?
For more explicitness, should we not provide these 3 built-ins ("eval_loss", "is", "pdis") as classes as well and show users where to find them in the repo? Then this config setting would be superfluous. Or do you think it's too complicated to explain?
This is a good one. Let me think about this. Both solutions have their advantages.
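To make the two options in this thread concrete, here is a hedged sketch of both configuration paths. Only `offline_evaluation_type` is confirmed by the diff hunk above; passing `offline_eval_runner_class` through `evaluation()` and the `MyOPERunner` class are assumptions for illustration.

```python
from ray.rllib.algorithms.bc import BCConfig

# Option A (string-based, as implemented here): select a built-in evaluation
# type via the new `offline_evaluation_type` argument.
config = BCConfig().evaluation(offline_evaluation_type="pdis")  # or "is", "eval_loss"

# Option B (class-based, as raised in the question): hand RLlib a runner class
# directly. `MyOPERunner` is a hypothetical custom subclass; providing it would
# presumably make the string setting above superfluous.
# config = BCConfig().evaluation(offline_eval_runner_class=MyOPERunner)
```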
@@ -1363,6 +1366,38 @@ def _evaluate_with_custom_eval_function(self) -> Tuple[ResultDict, int, int]:

        return eval_results, env_steps, agent_steps

    def _evaluate_offline_on_local_runner(self):
        # if hasattr(env_runner, "input_reader") and env_runner.input_reader is None:
remove this comment?
Oh yeah! How did this even get in there?
Approved with one question. Thanks @simonsays1980 !
Why are these changes needed?
The Offline RL API in the new API stack still lacks offline policy evaluation, although it already offers a validation loss. This PR introduces OPE and implements the following:
- An `OfflinePolicyEvaluationRunner` that derives from our `OfflineEvaluationRunner` and can be scheduled by our `OfflineEvaluationRunnerGroup` (users can also implement their own runner class for custom evaluation).
- An `OfflinePolicyPreEvaluator` that preprocesses data for OPE.
- New attributes in the `AlgorithmConfig` to control offline evaluation:
  - `offline_evaluation_type`: the evaluation type. Can be either `"eval_loss"`, `"pdis"`, or `"is"`.
  - `offline_eval_runner_class`: the runner class to use for offline evaluation. This can be a custom class. If no class is given, the standard classes are used for the different evaluation types.
  - `"is"`: ordinary importance sampling.
  - `"pdis"`: per-decision importance sampling, which usually exhibits a lower variance than simple IS (a standalone sketch of both estimators follows below).

What is still missing:
- `SingleAgentEpisode` data.
- The `EnvToModule` pipeline inside of the `OfflinePolicyPreEvaluator`.
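For reference, here is a small standalone sketch (not RLlib code) of the two estimators named above, ordinary IS and per-decision IS, on a single episode. It assumes per-step action log-probabilities of the target and behavior policies are available, which is roughly the information the pre-evaluator would have to provide; names and signature are illustrative only.

```python
import numpy as np


def is_and_pdis_estimates(target_logp, behavior_logp, rewards, gamma=0.99):
    """Ordinary IS and per-decision IS return estimates for one episode.

    target_logp / behavior_logp: per-step log-probs log pi(a_t|s_t) under the
    target and behavior policies; rewards: per-step rewards. Standalone
    illustration, not part of this PR.
    """
    target_logp = np.asarray(target_logp, dtype=np.float64)
    behavior_logp = np.asarray(behavior_logp, dtype=np.float64)
    rewards = np.asarray(rewards, dtype=np.float64)

    # Per-step importance ratios rho_t = pi(a_t|s_t) / mu(a_t|s_t).
    step_ratios = np.exp(target_logp - behavior_logp)
    # Cumulative products rho_{0:t} for every step t.
    cum_ratios = np.cumprod(step_ratios)
    discounts = gamma ** np.arange(len(rewards))

    # Ordinary IS: the full-trajectory ratio weights the entire return.
    is_estimate = cum_ratios[-1] * np.sum(discounts * rewards)
    # Per-decision IS: the reward at step t is weighted only by rho_{0:t},
    # which usually gives a lower-variance estimate than ordinary IS.
    pdis_estimate = np.sum(discounts * cum_ratios * rewards)
    return is_estimate, pdis_estimate
```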
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.