Description
Ray starts the AutoscalingRequester actor as soon as any Ray Data workload is processed, even when autoscaling is handled through KubeRay with enableInTreeAutoscaling. This was not the case in previous versions of Ray (at least not in 2.37.0), and it does not happen when running tasks/actors through Ray Core.
This prevents the cluster from scaling down, because the actor stays alive after processing finishes.
Current workaround: run the following code before the workload finishes:

import ray

autoscaling_requester = ray.get_actor(
    name="AutoscalingRequester", namespace="AutoscalingRequester"
)
ray.kill(autoscaling_requester)
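The workaround can be made safer by tolerating the case where the actor was never created (ray.get_actor raises ValueError when no actor matches). A minimal sketch, assuming Ray's public ray.get_actor / ray.kill API; the helper name and the injectable get_actor/kill parameters are illustrative only, added so the logic can be exercised without a live Ray cluster:

```python
def kill_autoscaling_requester(get_actor=None, kill=None):
    """Best-effort cleanup of the stray AutoscalingRequester actor.

    Returns True if an actor was found and killed, False if none existed.
    get_actor/kill default to ray.get_actor/ray.kill but are injectable
    so the control flow can be tested without a running cluster.
    """
    if get_actor is None or kill is None:
        import ray  # deferred so importing this helper does not require Ray
        get_actor = get_actor or ray.get_actor
        kill = kill or ray.kill
    try:
        handle = get_actor(
            name="AutoscalingRequester", namespace="AutoscalingRequester"
        )
    except ValueError:
        # No AutoscalingRequester actor exists; nothing to clean up.
        return False
    kill(handle)
    return True
```

Calling this at the end of the job (or in a finally block) makes the cleanup idempotent, so it is safe even on runs where the actor never started.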
Versions / Dependencies
- Ray 2.43.0
- KubeRay 1.3.0
- Python 3.12.9
- OS: official Ray Docker image
Reproduction script
(base) ray@ray-cluster-head-tprgk:/opt$ ray list actors | grep AutoscalingRequester
(base) ray@ray-cluster-head-tprgk:/opt$ python
Python 3.12.9 | packaged by conda-forge | (main, Feb 14 2025, 08:00:06) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ray
>>> ray.__version__
'2.43.0'
>>> ds = ray.data.from_items([1, 2, 3, 4, 5])
>>> ds.take(1)
[{'item': 1}]
>>>
(base) ray@ray-cluster-head-tprgk:/opt$ ray list actors | grep AutoscalingRequester
12 20daa0ff5770f0520104019f0a000000 AutoscalingRequester ALIVE 0a000000 AutoscalingRequester 97972fc26eef4855de27431df5f24db28d9d2dc788be19805ba1d5cf 477754 AutoscalingRequester
Issue Severity
None