Closed
Description
The current implementation of ray.serve.batch executes batches synchronously. This throttles throughput for asynchronous methods wrapped in ray.serve.batch.
Use case
Executing batches concurrently could significantly improve usability when making I/O calls to an endpoint that expects batching. It would also increase throughput for router-style composed actors that dispatch to sub-actors, by avoiding the bubble that arises from waiting for the slowest sub-actor.
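To illustrate the difference, here is a minimal sketch in plain asyncio, not Ray Serve's actual implementation. `handle_batch`, `run_sequential`, `run_concurrent`, and `max_concurrent` are hypothetical names, and the 0.1 s sleep stands in for an I/O call to a batching endpoint.

```python
import asyncio
import time


async def handle_batch(batch):
    # Hypothetical stand-in for an async I/O call to a batching endpoint.
    await asyncio.sleep(0.1)
    return [x * 2 for x in batch]


async def run_sequential(batches):
    # Each batch starts only after the previous one has finished.
    results = []
    for batch in batches:
        results.append(await handle_batch(batch))
    return results


async def run_concurrent(batches, max_concurrent=4):
    # Up to max_concurrent batches are in flight at the same time
    # (max_concurrent is an assumed knob, not an existing Ray Serve option).
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(batch):
        async with sem:
            return await handle_batch(batch)

    return await asyncio.gather(*(guarded(b) for b in batches))


def timed(coro_fn, batches):
    # Run the coroutine function to completion and measure wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_fn(batches))
    return results, time.perf_counter() - start


batches = [[1, 2], [3, 4], [5, 6], [7, 8]]
seq_results, seq_time = timed(run_sequential, batches)
con_results, con_time = timed(run_concurrent, batches)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both variants return the same results. The sequential one pays the I/O latency once per batch, while the concurrent one overlaps the waits, which is the throughput gain this request is after.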