Fixes default_dqn_torch_rl_module assuming the device is 'cpu' by maxhwardg · Pull Request #54004 · ray-project/ray

Fixes default_dqn_torch_rl_module assuming the device is 'cpu' #54004


Open
maxhwardg wants to merge 2 commits into master
Conversation

@maxhwardg commented Jun 23, 2025

Previously, this code would produce 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!' when the module was being run on GPU.
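For context, the failure can be reproduced with a small sketch along these lines (the setup here is illustrative, not the module's actual code):

    import torch

    # Hypothetical repro of the pre-fix epsilon-greedy selection: the
    # action tensors live on the GPU, but torch.rand() defaults to CPU.
    if torch.cuda.is_available():
        B = 4
        epsilon = 0.1
        exploit_actions = torch.tensor([0, 1, 2, 3], device="cuda:0")
        random_actions = torch.randint(0, 4, (B,), device="cuda:0")

        # torch.rand((B,)) is created on the CPU, so torch.where mixes
        # devices and raises: "RuntimeError: Expected all tensors to be
        # on the same device, but found at least two devices, cuda:0
        # and cpu!"
        actions = torch.where(
            torch.rand((B,)) < epsilon,  # CPU tensor
            random_actions,              # CUDA tensor
            exploit_actions,             # CUDA tensor
        )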

Why are these changes needed?

The device mismatch breaks the DQN algorithm when the GPU is used to run inference.

Related issue number

N/A

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Previously, this code would produce 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!' when the module was being run on GPU.

Signed-off-by: Max Ward <maxhwardg@users.noreply.github.com>
Copilot AI review requested due to automatic review settings · June 23, 2025 08:32
maxhwardg requested a review from a team as a code owner · June 23, 2025 08:32
@Copilot (Contributor) left a comment


Pull Request Overview

This PR fixes the device mismatch issue in the DQN module by ensuring that operations are performed on the same device.

  • Adds a pre-condition check that raises an error if random_actions and exploit_actions are on different devices.
  • Converts the output of torch.rand() to the device of exploit_actions for consistency.
Comments suppressed due to low confidence (1)

rllib/algorithms/dqn/torch/default_dqn_torch_rl_module.py:103

  • [nitpick] The explicit device conversion reinforces device consistency; verify that this matches the intended design of always aligning random_actions with exploit_actions.
            torch.rand((B,)).to(exploit_actions.device) < epsilon,
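Putting the two bullets from the overview together, the patched selection logic presumably looks roughly like the following sketch (the helper name and error message are assumptions; random_actions, exploit_actions, epsilon, and B come from the quoted diff):

    import torch

    def epsilon_greedy(random_actions: torch.Tensor,
                       exploit_actions: torch.Tensor,
                       epsilon: float) -> torch.Tensor:
        # Pre-condition added by the PR: both action tensors must
        # already be on the same device.
        if random_actions.device != exploit_actions.device:
            raise ValueError(
                f"random_actions ({random_actions.device}) and "
                f"exploit_actions ({exploit_actions.device}) must be "
                "on the same device."
            )
        B = exploit_actions.shape[0]
        # The quoted change: move the uniform draws onto the action
        # tensors' device before comparing against epsilon.
        return torch.where(
            torch.rand((B,)).to(exploit_actions.device) < epsilon,
            random_actions,
            exploit_actions,
        )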

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Max Ward <maxhwardg@users.noreply.github.com>