[Dashboard] Decoupling dashboard and dashboard lifetime from Ray Cluster · Issue #46444 · ray-project/ray · GitHub
[Dashboard] Decoupling dashboard and dashboard lifetime from Ray Cluster  #46444
Open
@Superskyyy

Description

Now that Ray is starting to support the virtual cluster (vCluster) concept, and we are seeing advanced setups with multiple clusters per user, the Ray dashboard components should no longer be bound to a single Ray cluster's lifetime. That coupling makes multi-tenant sharing and telemetry data persistence complex to implement. In addition, the dashboard goes down together with the head node (fate-sharing), which makes it difficult to trace back what happened (and what was executing) during a major incident. @liuxsh9 @Bye-legumes @nemo9cby

Use case

Decoupling would bring the following benefits:

  1. The dashboard can optionally read from a persistent history server (observability database) instead of pulling directly from a running GCS. (The GCS/HA Redis writes to the persistence store.)
  2. Dashboard-side overhead will not accidentally bring down the head node.
  3. Users can attach their own external monitoring platforms, in the same way as the job dashboard, to manage a large number of clusters.
  4. Each user gets their own dashboard, which can span multiple physical clusters or vClusters.
  5. The dashboard can still be checked after a cluster has been preempted or shut down.
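
To make benefits 1, 4, and 5 more concrete, here is a minimal sketch of the read path this proposal implies. It is only an illustration under stated assumptions: `HistoryStore`, `LiveCluster`, and `Dashboard` are hypothetical in-memory stand-ins, not Ray APIs, and a real design would put an actual observability database and the GCS behind them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HistoryStore:
    """Hypothetical stand-in for a persistent observability database."""
    jobs: Dict[str, List[dict]] = field(default_factory=dict)

    def record(self, cluster_id: str, job: dict) -> None:
        self.jobs.setdefault(cluster_id, []).append(job)

    def list_jobs(self, cluster_id: str) -> List[dict]:
        return list(self.jobs.get(cluster_id, []))


@dataclass
class LiveCluster:
    """Hypothetical stand-in for a running cluster's GCS.

    Every write is mirrored to the history store, so the data
    outlives the cluster itself.
    """
    cluster_id: str
    history: HistoryStore
    alive: bool = True
    jobs: List[dict] = field(default_factory=list)

    def submit(self, job: dict) -> None:
        self.jobs.append(job)
        self.history.record(self.cluster_id, job)

    def list_jobs(self) -> List[dict]:
        if not self.alive:
            raise ConnectionError(f"cluster {self.cluster_id} is unreachable")
        return list(self.jobs)


@dataclass
class Dashboard:
    """Hypothetical dashboard that lives outside any single cluster."""
    history: HistoryStore
    clusters: Dict[str, LiveCluster] = field(default_factory=dict)

    def attach(self, cluster: LiveCluster) -> None:
        self.clusters[cluster.cluster_id] = cluster

    def list_jobs(self, cluster_id: str) -> List[dict]:
        cluster = self.clusters.get(cluster_id)
        if cluster is not None:
            try:
                return cluster.list_jobs()  # prefer live state when reachable
            except ConnectionError:
                pass  # head node is gone: fall back to persisted history
        return self.history.list_jobs(cluster_id)


store = HistoryStore()
dash = Dashboard(history=store)
a = LiveCluster("cluster-a", store)
b = LiveCluster("cluster-b", store)
dash.attach(a)
dash.attach(b)

a.submit({"job_id": "j1", "status": "RUNNING"})
b.submit({"job_id": "j2", "status": "RUNNING"})

a.alive = False  # simulate preemption of cluster-a
jobs_after_preemption = dash.list_jobs("cluster-a")
```

In this sketch one dashboard instance serves several clusters, and `jobs_after_preemption` still contains the job submitted to `cluster-a` because the read falls back to the history store once the cluster stops responding.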

Metadata

Assignees

No one assigned

    Labels

    P1: Issue that should be fixed within a few weeks
    dashboard: Issues specific to the Ray Dashboard
    enhancement: Request for new feature and/or capability
