Description
What happened + What you expected to happen
It seems like Ray 2.45+ CUDA images are missing required NVIDIA tools such as nvidia-smi. The Ray Dashboard is also unable to recognize GPUs.
For example, on rayproject/ray:2.44.1-py39-cu128, nvidia-smi works:
$ nvidia-smi
Fri May 23 07:28:08 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.230.02             Driver Version: 535.230.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          Off | 00000000:05:00.0 Off |                    0 |
| N/A   27C    P0              66W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off | 00000000:0B:00.0 Off |                    0 |
| N/A   28C    P0              66W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off | 00000000:0C:00.0 Off |                    0 |
| N/A   27C    P0              68W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off | 00000000:84:00.0 Off |                    0 |
| N/A   27C    P0              67W / 700W |      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
On rayproject/ray:2.45.0-py39-cu128 and rayproject/ray:2.46.0-py39-cu128:
$ nvidia-smi
bash: nvidia-smi: command not found
The Ray Dashboard does not show GPU metrics on these images either, whereas on the exact same setup with 2.44.1 it shows GPU info.
Versions / Dependencies
Working version: rayproject/ray:2.44.1-py39-cu128
Not working versions: rayproject/ray:2.45.0-py39-cu128, rayproject/ray:2.46.0-py39-cu128
Reproduction script
Deploy a RayCluster with KubeRay on GPU nodes using the rayproject/ray:2.45.0-py39-cu128 image, then run nvidia-smi inside a Ray container.
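Once the cluster is up, a quick check along the following lines can be run in a shell inside a Ray container to confirm whether nvidia-smi is reachable. This is only a minimal sketch: check_tool is a hypothetical helper name introduced here for illustration, not part of Ray or KubeRay, and it only tests whether the binary is on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ray or KubeRay): reports whether a
# command is available on PATH in the current container.
check_tool() {
  local tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
    return 0
  fi
  echo "$tool: not found on PATH"
  return 1
}

# On the affected images this is expected to report "not found on PATH",
# matching the "command not found" error shown above.
check_tool nvidia-smi || true
```

Running the same check in a 2.44.1 container and a 2.45.0 container on the same node should make the difference between the two images easy to see.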
Issue Severity
None