Open
Description
What happened + What you expected to happen
ray.cluster_resources()
returns 2 GPUs as ray cluster resources on an ec2 instance with exactly 1 gpu. This is on a docker based setup with 2 containers: ray-head
and ray-worker
. The cluster is booted manually using ray_scripts.start
.
Versions / Dependencies
OS: amazon/Deep Learning AMI (Ubuntu 18.04) Version 56.1
Ray: 1.13.0
Reproduction script
current output:
>>> ray.cluster_resources()
{'GPU': 2.0, 'node:172.18.0.9': 1.0, 'object_store_memory': 19601575526.0, 'memory': 42487119054.0, 'accelerator_type:T4': 2.0, 'CPU': 8.0, 'allows_expensive': 4.0, 'node:172.18.0.5': 1.0}
expected output
>>> ray.cluster_resources()
{'GPU': 1.0, 'node:172.18.0.9': 1.0, 'object_store_memory': 19601575526.0, 'memory': 42487119054.0, 'accelerator_type:T4': 2.0, 'CPU': 8.0, 'allows_expensive': 4.0, 'node:172.18.0.5': 1.0}
Issue Severity
Medium: It is a significant difficulty but I can work around it.