What happened + What you expected to happen
raylet crashes on startup with ASSERTION FAILED: queue.num_items() == 0. Log output:
(autoscaler +5m10s) Adding 2 node(s) of type worker-r6a.48xlarge.
(pid=32403, ip=10.10.10.1) E0603 09:59:46.144885565 32403 completion_queue.cc:257] ASSERTION FAILED: queue.num_items() == 0
(pid=32403, ip=10.10.10.1) *** SIGABRT received at time=1748915986 on cpu 8 ***
(pid=32403, ip=10.10.10.1) PC: @ 0x7505092969fc (unknown) pthread_kill
(pid=32403, ip=10.10.10.1) @ 0x750509242520 (unknown) (unknown)
(pid=32403, ip=10.10.10.1) [2025-06-03 09:59:46,146 E 32403 32403] logging.cc:496: *** SIGABRT received at time=1748915986 on cpu 8 ***
(pid=32403, ip=10.10.10.1) [2025-06-03 09:59:46,146 E 32403 32403] logging.cc:496: PC: @ 0x7505092969fc (unknown) pthread_kill
(pid=32403, ip=10.10.10.1) [2025-06-03 09:59:46,146 E 32403 32403] logging.cc:496: @ 0x750509242520 (unknown) (unknown)
(pid=32403, ip=10.10.10.1) Fatal Python error: Aborted
(pid=32403, ip=10.10.10.1)
(pid=32403, ip=10.10.10.1) Stack (most recent call first):
(pid=32403, ip=10.10.10.1) File "/usr/local/lib/python3.10/dist-packages/ray/_private/worker.py", line 946 in main_loop
(pid=32403, ip=10.10.10.1) File "/usr/local/lib/python3.10/dist-packages/ray/_private/workers/default_worker.py", line 330 in <module>
(pid=32403, ip=10.10.10.1)
(pid=32403, ip=10.10.10.1) Extension modules: msgpack._cmsgpack, google._upb._message, psutil._psutil_linux, psutil._psutil_posix, setproctitle, yaml._yaml, charset_normalizer.md, ray._raylet (total: 8)
Versions / Dependencies
os: ubuntu 22.04
python: 3.10
ray: 2.46.0
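For triage, the versions above can be collected on an affected node with a minimal sketch like the one below. The helper name `collect_versions` is hypothetical, and including `grpcio` is an assumption: the failed assertion is reported from completion_queue.cc, which looks like gRPC code, but Ray may ship its own gRPC build, so the pip-installed grpcio version is not necessarily the relevant one.

```python
import platform
from importlib import metadata


def collect_versions(packages=("ray", "grpcio")):
    """Return OS, Python, and package versions for a bug report.

    `packages` lists pip distribution names to look up; "grpcio" is an
    assumption about what may be relevant, not a confirmed cause.
    """
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            # Reads the installed distribution metadata without importing
            # the package, so this is safe to run even if importing ray fails.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Reading metadata instead of importing the packages avoids touching the native extension that is aborting.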
Reproduction script
No reproduction script, unfortunately. I don't think it's related to user code: the crash happens while raylet is starting up, before any user code has run.
Issue Severity
Low: It annoys or frustrates me.