[Azure] Ray up for Azure fails #48976
Open
@alhparsa

What happened + What you expected to happen

I followed the instructions in "Launching Ray Clusters on Azure" and used the example YAML files from the Azure example YAMLs directory, but running ray up example-minimal.yaml fails.

Below are the logs generated by this command:

2024-11-27 16:35:40,613	INFO util.py:382 -- setting max workers for head node type to 0
2024-11-27 16:35:40,613	INFO util.py:386 -- setting max workers for ray.worker.default to 1
Checking Azure environment settings
2024-11-27 16:35:40,879	INFO config.py:52 -- Using subscription id: SUBSCRIPTION-ID
2024-11-27 16:35:40,879	INFO config.py:67 -- Creating/Updating resource group: RESOURCE_GROUP
2024-11-27 16:35:41,349 - INFO - AzureCliCredential.get_token succeeded
2024-11-27 16:35:42,110	INFO config.py:79 -- Using cluster name: minimal
2024-11-27 16:35:42,110	INFO config.py:90 -- Using unique id: fe8e
2024-11-27 16:35:42,111	INFO config.py:98 -- Using subnet mask: 10.97.0.0/16
2024-11-27 16:35:42,489	INFO config.py:144 -- Using msi_name: ray-minimal-fe8e-msi from msi_resource_group: RESOURCE_GROUP
2024-11-27 16:36:16,277 - INFO - No environment configuration found.
2024-11-27 16:36:16,280 - INFO - ManagedIdentityCredential will use IMDS
2024-11-27 16:36:16,847 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
2024-11-27 16:36:17,626 - INFO - DefaultAzureCredential acquired a token from AzureCliCredential
Traceback (most recent call last):
  File "CONDA_PATH/miniconda3/bin/ray", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/scripts/scripts.py", line 2658, in main
    return cli()
           ^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/cli_logger.py", line 856, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/scripts/scripts.py", line 1376, in up
    create_or_update_cluster(
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/commands.py", line 317, in create_or_update_cluster
    get_or_create_head_node(
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/commands.py", line 681, in get_or_create_head_node
    nodes = provider.non_terminated_nodes(head_node_tags)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/_azure/node_provider.py", line 182, in non_terminated_nodes
    nodes = self._get_filtered_nodes(tag_filters=tag_filters)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/_azure/node_provider.py", line 47, in wrapper
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/_azure/node_provider.py", line 110, in _get_filtered_nodes
    return {k: v for k, v in self.cached_nodes.items() if match_tags(v["tags"])}
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/_azure/node_provider.py", line 110, in <dictcomp>
    return {k: v for k, v in self.cached_nodes.items() if match_tags(v["tags"])}
                                                          ^^^^^^^^^^^^^^^^^^^^^
  File "CONDA_PATH/miniconda3/lib/python3.11/site-packages/ray/autoscaler/_private/_azure/node_provider.py", line 93, in match_tags
    if tags.get(k) != v:
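
The traceback above is cut off before the final exception line, so the actual error is not shown. One possible cause, and this is only an assumption, is that a VM in the resource group has no tags, so v["tags"] is None when match_tags calls tags.get(k). The sketch below illustrates a None-safe version of that check. The names match_tags_safe and filter_nodes are hypothetical and are not Ray's implementation.

```python
from typing import Any, Dict, Optional


def match_tags_safe(tags: Optional[Dict[str, str]], tag_filters: Dict[str, str]) -> bool:
    """Return True if every filter key/value is present in tags.

    Hypothetical helper: treats a missing or None tag mapping as empty,
    instead of calling .get() on None.
    """
    tags = tags or {}
    for key, value in tag_filters.items():
        if tags.get(key) != value:
            return False
    return True


def filter_nodes(
    cached_nodes: Dict[str, Dict[str, Any]], tag_filters: Dict[str, str]
) -> Dict[str, Dict[str, Any]]:
    """Keep only the nodes whose tags satisfy all filters (hypothetical helper)."""
    return {
        name: node
        for name, node in cached_nodes.items()
        if match_tags_safe(node.get("tags"), tag_filters)
    }


if __name__ == "__main__":
    # Assumed example data: one untagged VM and one tagged head node.
    nodes = {
        "vm-untagged": {"tags": None},
        "vm-head": {"tags": {"ray-node-type": "head"}},
    }
    print(list(filter_nodes(nodes, {"ray-node-type": "head"})))  # ['vm-head']
```

If this assumption holds, untagged VMs that already exist in the same resource group would be skipped rather than crashing the lookup.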

Versions / Dependencies

Below are my Ray, azure-identity, and azure-cli versions:

ray==2.39.0
azure-identity==1.14.0
azure-cli==2.61.0

Reproduction script

az login --use-device-code
wget https://raw.githubusercontent.com/ray-project/ray/refs/heads/master/python/ray/autoscaler/azure/example-minimal.yaml
# Modify the YAML file to match the public and private keys on your machine, plus the region and resource group
ray up example-minimal.yaml

Issue Severity

None

Labels

P1, bug, community-backlog, core-clusters