[core] CUDA_VISIBLE_DEVICES is not being set for PlacementGroups #53643
Closed
@BabyChouSr


What happened + What you expected to happen

I want to run a vLLM workload where I use a placement group to schedule the vLLM task onto a GPU node, with the placement_group_capture_child_tasks flag set so that the task's child tasks land in the same GPU placement group. However, I noticed that placement groups don't set CUDA_VISIBLE_DEVICES correctly: Ray sets the variable to an empty string instead of mapping it to the appropriate physical device id.
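
For comparison, here is a minimal sketch of what I would expect, assuming a plain GPU task without any placement group: Ray populates CUDA_VISIBLE_DEVICES for the worker, and ray.get_gpu_ids() reports the same reserved device.

import os

import ray

ray.init(num_gpus=1, num_cpus=1)

@ray.remote(num_gpus=1)
def plain_gpu_task():
    # For a task that just requests num_gpus=1, Ray maps the reserved GPU into
    # the worker's environment, e.g. CUDA_VISIBLE_DEVICES="0".
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
    print("ray.get_gpu_ids():", ray.get_gpu_ids())

ray.get(plain_gpu_task.remote())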

Versions / Dependencies

ray==2.46.0

Reproduction script

Repro code:

import os

from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

import ray

ray.init(num_gpus=1, num_cpus=1)

# A single-bundle placement group with one GPU and one CPU; f is scheduled into
# it and should inherit the reserved GPU.
@ray.remote(
    scheduling_strategy=PlacementGroupSchedulingStrategy(
        placement_group=placement_group(
            [{"GPU": 1, "CPU": 1}] * 1,
            strategy="STRICT_PACK",
        ),
        placement_group_capture_child_tasks=True,
    ),
)
def f():
    # Expected: the physical device id of the reserved GPU. Actual: an empty string.
    print(f"CUDA_VISIBLE_DEVICES in environment: {os.environ['CUDA_VISIBLE_DEVICES']}")
    print(f"CUDA_VISIBLE_DEVICES is empty: {os.environ.get('CUDA_VISIBLE_DEVICES') == ''}")

ray.get(f.remote())

Output from code:

(marin) cychou@sphinx3:/nlp/scr/cychou/marin$ python  experiments/sched_strategy.py
2025-06-07 20:13:21,052	INFO worker.py:1879 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
(f pid=481782) CUDA_VISIBLE_DEVICES in environment: 
(f pid=481782) CUDA_VISIBLE_DEVICES is empty: True

nvidia-smi output:

(marin) cychou@sphinx3:/nlp/scr/cychou/marin$ nvidia-smi
Sat Jun  7 20:14:24 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:44:00.0 Off |                    0 |
| N/A   24C    P0              60W / 400W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Issue Severity

I have to work around it by running del os.environ['CUDA_VISIBLE_DEVICES'] before the task's workload runs.
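
For reference, a minimal sketch of that workaround applied to the repro above (this assumes the del goes at the top of the task body, before anything touches CUDA):

import os

from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

import ray

ray.init(num_gpus=1, num_cpus=1)

pg = placement_group([{"GPU": 1, "CPU": 1}], strategy="STRICT_PACK")

@ray.remote(
    scheduling_strategy=PlacementGroupSchedulingStrategy(
        placement_group=pg,
        placement_group_capture_child_tasks=True,
    ),
)
def f():
    # Workaround for this issue: the empty CUDA_VISIBLE_DEVICES hides every GPU
    # from the CUDA runtime, so drop the variable and let it rediscover the
    # physical devices before the real workload (e.g. vLLM) starts.
    if os.environ.get("CUDA_VISIBLE_DEVICES") == "":
        del os.environ["CUDA_VISIBLE_DEVICES"]
    # ... run the GPU workload here ...

ray.get(f.remote())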

    Labels

    bug: Something that is supposed to be working; but isn't
    core: Issues that should be addressed in Ray Core
    gpu: GPU related issues
    stability
    triage: Needs triage (eg: priority, bug/not-bug, and owning component)
