Fixes default_dqn_torch_rl_module assuming the device is 'cpu' by maxhwardg · Pull Request #54004 · ray-project/ray

Fixes default_dqn_torch_rl_module assuming the device is 'cpu' #54004


Open
maxhwardg wants to merge 2 commits into master
Conversation

@maxhwardg commented Jun 23, 2025

Previously, this code would produce 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!' when the module was being run on GPU.
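For context, the failure can be reproduced with a small sketch along these lines (the setup here is illustrative, not the module's actual code):

    import torch

    # Hypothetical repro of the pre-fix epsilon-greedy selection: the
    # action tensors live on the GPU, but torch.rand() defaults to CPU.
    if torch.cuda.is_available():
        B = 4
        epsilon = 0.1
        exploit_actions = torch.tensor([0, 1, 2, 3], device="cuda:0")
        random_actions = torch.randint(0, 4, (B,), device="cuda:0")

        # torch.rand((B,)) is created on the CPU, so torch.where mixes
        # devices and raises: "RuntimeError: Expected all tensors to be
        # on the same device, but found at least two devices, cuda:0
        # and cpu!"
        actions = torch.where(
            torch.rand((B,)) < epsilon,  # CPU tensor
            random_actions,              # CUDA tensor
            exploit_actions,             # CUDA tensor
        )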

Why are these changes needed?

The device mismatch breaks the DQN algorithm when the GPU is used to run inference.

Related issue number

N/A

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Previously, this code would produce 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!' when the module was being run on GPU.

Signed-off-by: Max Ward <maxhwardg@users.noreply.github.com>
Copilot AI review requested due to automatic review settings · June 23, 2025 08:32
maxhwardg requested a review from a team as a code owner · June 23, 2025 08:32
@Copilot (Contributor) left a comment


Pull Request Overview

This PR fixes the device mismatch issue in the DQN module by ensuring that operations are performed on the same device.

  • Adds a pre-condition check that raises an error if random_actions and exploit_actions are on different devices.
  • Converts the output of torch.rand() to the device of exploit_actions for consistency.
Comments suppressed due to low confidence (1)

rllib/algorithms/dqn/torch/default_dqn_torch_rl_module.py:103

  • [nitpick] The explicit device conversion reinforces device consistency; verify that this matches the intended design of always aligning random_actions with exploit_actions.
            torch.rand((B,)).to(exploit_actions.device) < epsilon,
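Putting the two bullets from the overview together, the patched selection logic presumably looks roughly like the following sketch (the helper name and error message are assumptions; random_actions, exploit_actions, epsilon, and B come from the quoted diff):

    import torch

    def epsilon_greedy(random_actions: torch.Tensor,
                       exploit_actions: torch.Tensor,
                       epsilon: float) -> torch.Tensor:
        # Pre-condition added by the PR: both action tensors must
        # already be on the same device.
        if random_actions.device != exploit_actions.device:
            raise ValueError(
                f"random_actions ({random_actions.device}) and "
                f"exploit_actions ({exploit_actions.device}) must be "
                "on the same device."
            )
        B = exploit_actions.shape[0]
        # The quoted change: move the uniform draws onto the action
        # tensors' device before comparing against epsilon.
        return torch.where(
            torch.rand((B,)).to(exploit_actions.device) < epsilon,
            random_actions,
            exploit_actions,
        )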

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Max Ward <maxhwardg@users.noreply.github.com>