Description
In Serve, we want to be able to support graceful shutdown when actors are being shut down (e.g., due to downscaling) without failing requests. This is difficult to do correctly in the application layer because we have no way of preventing the actor from accepting and queuing new messages. We currently do this best-effort by signaling the clients to stop sending requests and having the replicas wait for a given timeout until there are no more pending queries.
Ideally, we would have an API similar to ray.kill
that would signal the actor to gracefully shut down. Once this message is received, the actor would start rejecting further method calls with a GracefulShutdownError
that would enable clients to safely retry.
This could either be implemented as a new RPC to the actor or by changing the behavior of the existing __ray_terminate__
.