Description
What happened + What you expected to happen
Below is the complete startup and test process for the master (head) and slave (worker) nodes. After the slave node joins, running a task fails with the error "Could not connect to socket /tmp/ray/session...". In this example the master node is ln02 and the slave node is cpunode01. Both nodes mount the same /tmp/ray directory, and I don't know whether this is the cause of the error:
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# ray start --head --dashboard-host=0.0.0.0
Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
Local node IP: 10.93.0.58
--------------------
Ray runtime started.
--------------------
Next steps
To add another node to this Ray cluster, run
ray start --address='10.93.0.58:6379'
To connect to this Ray cluster:
import ray
ray.init()
To submit a Ray job using the Ray Jobs CLI:
RAY_ADDRESS='http://10.93.0.58:8265' ray job submit --working-dir . -- python my_script.py
See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
for more information on submitting Ray jobs to the Ray cluster.
To terminate the Ray runtime, run
ray stop
To view the status of the cluster, use
ray status
To monitor and debug Ray, view the dashboard at
10.93.0.58:8265
If connection to the dashboard fails, check your firewall settings and network configuration.
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# cat demo.py
import ray
ray.init(address='auto')
@ray.remote
def hello():
    return "Hello, Ray!"
future = hello.remote()
result = ray.get(future)
print(result)
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# python demo.py
2025-06-25 11:56:50,183 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.93.0.58:6379...
2025-06-25 11:56:50,194 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.93.0.58:8265
Hello, Ray!
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# ssh -p 2222 cpunode01 "/opt/conda/bin/ray start --address='10.93.0.58:6379'"
Warning: Permanently added '[cpunode01]:2222' (ED25519) to the list of known hosts.
2025-06-25 11:57:12,029 WARNING services.py:766 -- The node IP address of the current host recorded in node_ip_address.json (10.93.0.58) is different from the current IP address: 10.93.16.54. Ray will use 10.93.16.54 as the current node's IP address. Creating 2 instances in the same host with different IP address is not supported. Please create an enhnacement request tohttps://github.com/ray-project/ray/issues.
2025-06-25 11:57:11,993 INFO scripts.py:1042 -- Local node IP: 10.93.16.54
2025-06-25 11:57:12,185 SUCC scripts.py:1058 -- --------------------
2025-06-25 11:57:12,185 SUCC scripts.py:1059 -- Ray runtime started.
2025-06-25 11:57:12,185 SUCC scripts.py:1060 -- --------------------
2025-06-25 11:57:12,185 INFO scripts.py:1062 -- To terminate the Ray runtime, run
2025-06-25 11:57:12,185 INFO scripts.py:1063 -- ray stop
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# ray status
======== Autoscaler status: 2025-06-25 11:57:18.410351 ========
Node status
---------------------------------------------------------------
Active:
1 node_55e745f900c652b2f0cf59a07c6ab8d1b88e80522868190fa30edc33
1 node_98b3fc535ff63b62febe57ad7b7db29bfd020f51262abe35a9776302
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
0.0/368.0 CPU
0B/770.36GiB memory
0B/286.23GiB object_store_memory
Demands:
(no resource demands)
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# python demo.py
2025-06-25 11:58:35,822 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.93.0.58:6379...
2025-06-25 11:58:35,854 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.93.0.58:8265
[2025-06-25 11:58:45,937 C 458406 458406] raylet_connection.cc:33: Could not connect to socket /tmp/ray/session_2025-06-25_11-56-12_829320_438270/sockets/raylet.1
*** StackTrace Information ***
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0x13de47a) [0x7fde7324347a] ray::operator<<()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray6RayLogD1Ev+0x479) [0x7fde73245ef9] ray::RayLog::~RayLog()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0xa7b905) [0x7fde728e0905] ray::raylet::RayletConnection::RayletConnection()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorkerC2ENS0_17CoreWorkerOptionsERKNS_8WorkerIDE+0xf06) [0x7fde72804c96] ray::core::CoreWorker::CoreWorker()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImplC2ERKNS0_17CoreWorkerOptionsE+0x3de) [0x7fde72818cbe] ray::core::CoreWorkerProcessImpl::CoreWorkerProcessImpl()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core17CoreWorkerProcess10InitializeERKNS0_17CoreWorkerOptionsE+0x34) [0x7fde7281a114] ray::core::CoreWorkerProcess::Initialize()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0x801111) [0x7fde72666111] __pyx_pw_3ray_7_raylet_10CoreWorker_1__cinit__()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0x802319) [0x7fde72667319] __pyx_tp_new_3ray_7_raylet_CoreWorker()
python(_PyObject_MakeTpCall+0x199) [0x561b3f3ec999] _PyObject_MakeTpCall
python(_PyEval_EvalFrameDefault+0x54a6) [0x561b3f3e89d6] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6c) [0x561b3f3f3a2c] _PyFunction_Vectorcall
python(_PyEval_EvalFrameDefault+0x13ca) [0x561b3f3e48fa] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6c) [0x561b3f3f3a2c] _PyFunction_Vectorcall
python(PyObject_Call+0xbc) [0x561b3f3fff1c] PyObject_Call
python(_PyEval_EvalFrameDefault+0x2d83) [0x561b3f3e62b3] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6c) [0x561b3f3f3a2c] _PyFunction_Vectorcall
python(_PyEval_EvalFrameDefault+0x13ca) [0x561b3f3e48fa] _PyEval_EvalFrameDefault
python(+0x1d7c60) [0x561b3f486c60] _PyEval_Vector
python(PyEval_EvalCode+0x87) [0x561b3f486ba7] PyEval_EvalCode
python(+0x20812a) [0x561b3f4b712a] run_eval_code_obj
python(+0x203523) [0x561b3f4b2523] run_mod
python(+0x9a6f5) [0x561b3f3496f5] pyrun_file.cold
python(_PyRun_SimpleFileObject+0x1ae) [0x561b3f4ac9fe] _PyRun_SimpleFileObject
python(_PyRun_AnyFileObject+0x44) [0x561b3f4ac594] _PyRun_AnyFileObject
python(Py_RunMain+0x38b) [0x561b3f4a978b] Py_RunMain
python(Py_BytesMain+0x37) [0x561b3f47a1f7] Py_BytesMain
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fde73d00d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fde73d00e40] __libc_start_main
python(+0x1cb0f1) [0x561b3f47a0f1]
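One way to test the shared-/tmp/ray theory is to inspect the Ray temp directory on each node and compare the results. Below is a minimal diagnostic sketch, not part of Ray. All function names are hypothetical helpers. It assumes Linux (so /proc/mounts is readable) and that node_ip_address.json sits at the top of each session directory. That location is an assumption: the warning from cpunode01 above only names the file. Suppose both ln02 and cpunode01 list the same session directory and socket files, and /tmp/ray sits on a network filesystem type. That would support the idea that the two nodes share one Ray session directory.

```python
import glob
import json
import os
import socket


def mount_for_path(path, mounts_text):
    """Return (mount_point, fs_type) of the longest mount point containing path.

    mounts_text is the content of /proc/mounts; octal escapes such as \\040
    (a space in a mount path) are not decoded in this sketch.
    """
    best = ("", "")
    path = os.path.abspath(path)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        if contains and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def list_session_sockets(root="/tmp/ray"):
    """Map each session_* directory under root to its sorted socket file names.

    The session_latest symlink is skipped so each session is reported once.
    """
    result = {}
    for session_dir in sorted(glob.glob(os.path.join(root, "session_*"))):
        name = os.path.basename(session_dir)
        if name == "session_latest":
            continue
        sockets_dir = os.path.join(session_dir, "sockets")
        entries = sorted(os.listdir(sockets_dir)) if os.path.isdir(sockets_dir) else []
        result[name] = entries
    return result


def read_cached_node_ip(session_dir):
    """Return the parsed node_ip_address.json in session_dir, or None if absent.

    Assumption: the file lives directly in the session directory.
    """
    path = os.path.join(session_dir, "node_ip_address.json")
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return json.load(f)


def main(root="/tmp/ray"):
    # Print the hostname so output from ln02 and cpunode01 can be told apart.
    print("host:", socket.gethostname())
    try:
        with open("/proc/mounts") as f:
            mount_point, fs_type = mount_for_path(root, f.read())
        print(f"{root} is on mount {mount_point!r} (type {fs_type})")
    except OSError:
        print("/proc/mounts not readable; skipping mount check")
    for name, sockets in list_session_sockets(root).items():
        cached_ip = read_cached_node_ip(os.path.join(root, name))
        print(f"{name}: sockets={sockets} node_ip_address.json={cached_ip}")


if __name__ == "__main__":
    main()
```

Run it once on ln02 and once on cpunode01 (for example over the same `ssh -p 2222 cpunode01` connection used above). Then compare the session names, socket lists, and reported filesystem types.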
Versions / Dependencies
Ray 2.44.1 (Python 3.10 under /opt/conda, per the stack trace)
Reproduction script
None beyond demo.py and the commands above.
Issue Severity
None