[Core] Ray fails to fulfill request due to node being annotated by IP address · Issue #54150 · ray-project/ray · GitHub
Closed
@dayeol

Description

Hello, I set up a Ray cluster with 16 GPUs for vLLM (two nodes, each with 8 H100 GPUs). Here's what it looks like:

(vllm) <dayeol@node0> benchmarks ray list nodes

======== List: 2025-06-26 11:28:29.307494 ========
Stats:
------------------------------
Total: 2

Table:
------------------------------
    NODE_ID                                                   NODE_IP         IS_HEAD_NODE    STATE    STATE_MESSAGE    NODE_NAME       RESOURCES_TOTAL                   LABELS
 0  05dc9fcb13c3975706b29e51c173fee8987d3d33c90603c4462c8a92  node0.internal True            ALIVE                     node0.internal CPU: 224.0                        ray.io/node_id: 05dc9fcb13c3975706b29e51c173fee8987d3d33c90603c4462c8a92
                                                                                                                                        GPU: 8.0
                                                                                                                                        accelerator_type:H100: 1.0
                                                                                                                                        memory: 1477.507 GiB
                                                                                                                                        node:__internal_head__: 1.0
                                                                                                                                        node:node0.internal: 1.0
                                                                                                                                        object_store_memory: 186.265 GiB
 1  6808cee8a2bdc20ad1ec345a2dfe05712c3b80676bbd796e84b82d20  node1.internal  False           ALIVE                     node1.internal  CPU: 224.0                        ray.io/node_id: 6808cee8a2bdc20ad1ec345a2dfe05712c3b80676bbd796e84b82d20
                                                                                                                                        GPU: 8.0
                                                                                                                                        accelerator_type:H100: 1.0
                                                                                                                                        memory: 1505.393 GiB
                                                                                                                                        node:node1.internal: 1.0
                                                                                                                                        object_store_memory: 186.265 GiB

Each node has only an IPv6 address, but its `node:` resource is annotated with its domain name (e.g. `node:node0.internal`), not with its IP address.

However, when I run my workload, it requests {'node:2d27c:1f9f:d698:63e7:94a4:bcd2:b36a:1b3f': 0.001, 'GPU': 1.0}, where 2d27c:1f9f:d698:63e7:94a4:bcd2:b36a:1b3f is the IPv6 address of node0.internal.
This request can never be fulfilled, because no node advertises a `node:` resource keyed by that IPv6 address.
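The mismatch can be illustrated with a simplified feasibility check. This is a minimal sketch under stated assumptions, not Ray's actual scheduler code: it assumes a request is feasible only if a single node's *total* resources cover every requested key, and the `feasible_nodes` helper and `NODE_TOTALS` dict are hypothetical names whose values are copied from the `ray list nodes` output above.

```python
# Simplified model (an assumption, NOT Ray's scheduler implementation) of the
# check behind "No available node types can fulfill resource request".
# Totals are copied from the `ray list nodes` output; note that the custom
# `node:<name>` resource is keyed by hostname, not by IPv6 address.
NODE_TOTALS = {
    "node0.internal": {"CPU": 224.0, "GPU": 8.0,
                       "node:node0.internal": 1.0, "node:__internal_head__": 1.0},
    "node1.internal": {"CPU": 224.0, "GPU": 8.0, "node:node1.internal": 1.0},
}


def feasible_nodes(request: dict[str, float],
                   nodes: dict[str, dict[str, float]]) -> list[str]:
    """Hypothetical helper: names of nodes whose total resources cover `request`."""
    return [
        name
        for name, totals in nodes.items()
        # A resource key the node does not advertise counts as 0.0 available.
        if all(totals.get(key, 0.0) >= amount for key, amount in request.items())
    ]


# What Actor1 asks for: node resource keyed by the IPv6 address.
ipv6_request = {"CPU": 1.0, "GPU": 0.1,
                "node:2d27c:1f9f:d698:63e7:94a4:bcd2:b36a:1b3f": 0.01}
# What Actor2 asks for: node resource keyed by the hostname.
hostname_request = {"CPU": 1.0, "GPU": 0.1, "node:node0.internal": 0.01}

print(feasible_nodes(ipv6_request, NODE_TOTALS))      # → []
print(feasible_nodes(hostname_request, NODE_TOTALS))  # → ['node0.internal']
```

Under this model the IPv6-keyed request matches no node at all, which is consistent with the autoscaler reporting that no node type can fulfill it, while the hostname-keyed request lands on node0.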

To confirm, I tried the following:

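import ray

ray.init(address="auto")  # connect to the already-running cluster

# Actor1 targets node0 through its IPv6-based node resource
@ray.remote(resources={"node:2d27c:1f9f:d698:63e7:94a4:bcd2:b36a:1b3f": 0.01}, num_gpus=0.1)
class Actor1:
    def __init__(self):
        pass

actor1 = Actor1.remote()

# Actor2 targets node0 through its hostname-based node resource
@ray.remote(resources={"node:node0.internal": 0.01}, num_gpus=0.1)
class Actor2:
    def __init__(self):
        pass

actor2 = Actor2.remote()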

Only Actor2 shows up in `ray list actors`; Actor1 fails to start with the following error:

(autoscaler +1m24s, ip=node0.internal) Error: No available node types can fulfill resource request {'CPU': 1.0, 'node:2d27c:1f9f:d698:63e7:94a4:bcd2:b36a:1b3f': 0.01, 'GPU': 0.1}. Add suitable node types to this cluster to resolve this issue.

What would be a quick fix for this?
I'm not sure whether this is a vLLM issue or a Ray issue.
Please help!
