Feature request: graceful shutdown for actors

In Serve, we want to shut down actors gracefully (e.g., when downscaling) without failing requests. This is difficult to do correctly in the application layer because we have no way to prevent an actor from accepting and queuing new messages. Today we handle this on a best-effort basis: we signal clients to stop sending requests, and each replica waits up to a given timeout for its pending queries to finish.
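The current best-effort drain can be sketched in plain Python asyncio. This is a minimal, hypothetical illustration: Replica, handle, and drain are invented names, not Ray Serve's actual code. The replica polls its pending-query count until it reaches zero or the timeout expires. Nothing in this loop stops new queries from arriving while it waits, which is why the approach is only best-effort.

```python
import asyncio
from typing import List, Tuple


class Replica:
    """Hypothetical stand-in for a Serve replica (not Ray Serve's code)."""

    def __init__(self) -> None:
        self.pending = 0  # number of queries currently being processed

    async def handle(self, query: str) -> str:
        self.pending += 1
        try:
            await asyncio.sleep(0.05)  # simulate work
            return query.upper()
        finally:
            self.pending -= 1

    async def drain(self, timeout_s: float, poll_s: float = 0.01) -> bool:
        """Wait until no queries are pending or the timeout expires.

        Returns True if the replica drained in time. New queries can still
        arrive during the wait; only the clients' cooperation prevents that.
        """
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout_s
        while self.pending > 0:
            if loop.time() >= deadline:
                return False  # gave up: some queries may still fail
            await asyncio.sleep(poll_s)
        return True


async def main() -> Tuple[bool, List[str]]:
    replica = Replica()
    # Two queries are in flight when shutdown begins.
    tasks = [asyncio.create_task(replica.handle(q)) for q in ("a", "b")]
    await asyncio.sleep(0)  # let both tasks start running
    drained = await replica.drain(timeout_s=1.0)
    return drained, await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```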

Ideally, Ray would provide an API similar to ray.kill that signals an actor to shut down gracefully. Once the actor receives this signal, it would reject any further method calls with a GracefulShutdownError, which lets clients know they can safely retry.
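The proposed semantics can be sketched in plain asyncio rather than Ray. GracefulActor, call, graceful_shutdown, and call_with_retry are hypothetical names, and GracefulShutdownError is only the name suggested in this request, not an existing Ray exception. The sketch also assumes that calls already in flight are allowed to finish after shutdown begins, which the request does not specify.

```python
import asyncio
from typing import Sequence, Tuple


class GracefulShutdownError(Exception):
    """Raised for calls that arrive after graceful shutdown has started.

    Hypothetical: the name follows the proposal, not an existing Ray API.
    """


class GracefulActor:
    """Hypothetical actor illustrating the proposed shutdown semantics."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.shutting_down = False
        self.in_flight = 0
        self._idle = asyncio.Event()  # set whenever no calls are in flight
        self._idle.set()

    async def call(self, query: str) -> str:
        # Reject before doing any work, so the caller knows a retry is safe.
        if self.shutting_down:
            raise GracefulShutdownError(self.name)
        self.in_flight += 1
        self._idle.clear()
        try:
            await asyncio.sleep(0.01)  # simulate work
            return f"{self.name}:{query}"
        finally:
            self.in_flight -= 1
            if self.in_flight == 0:
                self._idle.set()

    async def graceful_shutdown(self) -> None:
        # Stop accepting new calls, then wait for in-flight calls to finish.
        self.shutting_down = True
        await self._idle.wait()


async def call_with_retry(replicas: Sequence[GracefulActor], query: str) -> str:
    """Try replicas in order, retrying only on GracefulShutdownError."""
    for replica in replicas:
        try:
            return await replica.call(query)
        except GracefulShutdownError:
            continue  # the rejecting replica did no work for this call
    raise RuntimeError("no replica accepted the call")


async def main() -> Tuple[str, str]:
    old, new = GracefulActor("old"), GracefulActor("new")
    in_flight = asyncio.create_task(old.call("q1"))
    await asyncio.sleep(0)  # q1 is now running on "old"
    shutdown = asyncio.create_task(old.graceful_shutdown())
    await asyncio.sleep(0)  # "old" now rejects new calls
    retried = await call_with_retry([old, new], "q2")
    await shutdown  # completes once q1 has finished
    return await in_flight, retried


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The rejection happens before any work is done, so the client can retry on another replica without risking duplicate side effects. Other exceptions are deliberately not retried, since the failed call may have partially executed.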

This could be implemented either as a new RPC to the actor or by changing the behavior of the existing __ray_terminate__.
