[Dashboard] Decoupling dashboard and dashboard lifetime from Ray Cluster

@liuxsh9

Description

As Ray begins to support the virtual cluster (vCluster) concept, and as advanced setups with multiple clusters per user become more common, the Ray dashboard components should no longer be bound to a single Ray cluster's lifetime. That coupling makes multi-tenant sharing and telemetry data persistence complex to implement. In addition, the dashboard fate-shares with the head node: if the head node goes down, the dashboard goes down with it, which makes it difficult to trace what happened (and what was executing) during a major incident. @liuxsh9 @Bye-legumes @nemo9cby

Use case

Decoupling the dashboard would bring the following benefits:

The dashboard can optionally read from a persistent history server (observability database) instead of pulling directly from a running GCS. (The GCS or HA Redis writes to the persistent store.)
Dashboard-side overhead can no longer accidentally bring down the head node.
Users can attach their own external monitoring platforms, in the same way as the job dashboard, to manage a large number of clusters.
Each user gets their own dashboard, which can span multiple physical clusters or vClusters.
The dashboard remains available even after a cluster has been preempted or shut down.
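
To make the proposed read path concrete, here is a minimal sketch of a dashboard that prefers live cluster state and falls back to a persistent history store once the cluster is gone. It is illustrative only: `StateSource`, `LiveClusterSource`, `HistoryStoreSource`, `DecoupledDashboard`, and the `vcluster-a` identifier are hypothetical names invented for this example. They are not existing Ray APIs, and the in-memory dictionaries stand in for the GCS and the observability database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


class StateSource(Protocol):
    """Hypothetical interface: anything the dashboard can read job state from."""

    def list_jobs(self, cluster_id: str) -> List[dict]: ...


@dataclass
class LiveClusterSource:
    """Stand-in for a running cluster's GCS; fails once the head node is gone."""

    jobs: Dict[str, List[dict]] = field(default_factory=dict)
    alive: bool = True

    def list_jobs(self, cluster_id: str) -> List[dict]:
        if not self.alive:
            raise ConnectionError(f"cluster {cluster_id} is unreachable")
        return list(self.jobs.get(cluster_id, []))


@dataclass
class HistoryStoreSource:
    """Stand-in for a persistent observability store that outlives clusters."""

    records: Dict[str, List[dict]] = field(default_factory=dict)

    def write(self, cluster_id: str, job: dict) -> None:
        # In the proposal, the GCS / HA Redis side would perform this write.
        self.records.setdefault(cluster_id, []).append(job)

    def list_jobs(self, cluster_id: str) -> List[dict]:
        return list(self.records.get(cluster_id, []))


class DecoupledDashboard:
    """Reads live state when reachable, otherwise serves persisted history."""

    def __init__(self, history: StateSource, live: Optional[StateSource] = None):
        self.history = history
        self.live = live

    def list_jobs(self, cluster_id: str) -> List[dict]:
        if self.live is not None:
            try:
                return self.live.list_jobs(cluster_id)
            except ConnectionError:
                pass  # Cluster is down; fall through to the history store.
        return self.history.list_jobs(cluster_id)


# Usage: the cluster records a job, then gets preempted.
job = {"job_id": "01", "status": "RUNNING"}
history = HistoryStoreSource()
live = LiveClusterSource(jobs={"vcluster-a": [job]})
history.write("vcluster-a", job)

dashboard = DecoupledDashboard(history=history, live=live)
before = dashboard.list_jobs("vcluster-a")  # served from the live cluster
live.alive = False                          # simulate head node loss
after = dashboard.list_jobs("vcluster-a")   # served from the history store
```

Because the dashboard object holds no reference that depends on the cluster being alive, its lifetime is independent of any single cluster, which is the property the issue asks for.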
