[Core] Ray CUDA Images on 2.45+ are missing required NVIDIA driver · Issue #53266 · ray-project/ray · GitHub
[Core] Ray CUDA Images on 2.45+ are missing required NVIDIA driver #53266
Closed
@andrewsykim

Description

What happened + What you expected to happen

It seems like Ray 2.45+ CUDA images are missing required NVIDIA tools such as nvidia-smi. The Ray Dashboard is also unable to recognize GPUs.

For example, on rayproject/ray:2.44.1-py39-cu128, nvidia-smi works:

$ nvidia-smi
Fri May 23 07:28:08 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          Off | 00000000:05:00.0 Off |                    0 |
| N/A   27C    P0              66W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off | 00000000:0B:00.0 Off |                    0 |
| N/A   28C    P0              66W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off | 00000000:0C:00.0 Off |                    0 |
| N/A   27C    P0              68W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off | 00000000:84:00.0 Off |                    0 |
| N/A   27C    P0              67W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

On rayproject/ray:2.45.0-py39-cu128 and rayproject/ray:2.46.0-py39-cu128:

$ nvidia-smi
bash: nvidia-smi: command not found

The Ray Dashboard does not show GPU metrics either:

Image

Ray Dashboard shows GPU info on the exact same setup but on 2.44.1:

Image

Versions / Dependencies

Working version: rayproject/ray:2.44.1-py39-cu128

Not working versions: rayproject/ray:2.45.0-py39-cu128, rayproject/ray:2.46.0-py39-cu128

Reproduction script

Deploy a GPU-enabled RayCluster with KubeRay using the rayproject/ray:2.45.0-py39-cu128 image.
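
To compare images quickly, a check along these lines could be run inside a Ray container on each version. It is only a minimal sketch using the Python standard library: the function name check_tool and the shape of its result are hypothetical and not part of Ray or KubeRay. It reports whether nvidia-smi is on PATH (the "command not found" symptom above) and, if it is, the exit code it returns.

```python
import shutil
import subprocess


def check_tool(name="nvidia-smi", timeout=10):
    """Report whether `name` is on PATH and, if so, its exit code.

    Hypothetical helper for illustration; not a Ray or KubeRay API.
    """
    path = shutil.which(name)
    if path is None:
        # Corresponds to "bash: nvidia-smi: command not found".
        return {"tool": name, "found": False, "path": None, "returncode": None}
    try:
        result = subprocess.run(
            [path],
            stdin=subprocess.DEVNULL,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        returncode = result.returncode
    except (OSError, subprocess.TimeoutExpired):
        # The binary exists but could not be run to completion.
        returncode = None
    return {"tool": name, "found": True, "path": path, "returncode": returncode}


if __name__ == "__main__":
    print(check_tool())
```

Running this in a 2.44.1 container and again in a 2.45.0 container should make the difference visible without reading the full nvidia-smi table.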

Issue Severity

None

Metadata

Assignees

No one assigned

    Labels

    P0: Issues that should be fixed in short order
    bug: Something that is supposed to be working; but isn't
    core: Issues that should be addressed in Ray Core
    regression
    release-blocker: P0 Issue that blocks the release
    stability
