[Core] Could not connect to socket · Issue #54067 · ray-project/ray · GitHub
[Core] Could not connect to socket #54067
Closed
@vitslam

Description


What happened + What you expected to happen

Below is the complete startup and test sequence for a head node and a worker node. After the worker node joins, submitting a task fails with the error "Could not connect to socket /tmp/ray/session...". In this example the head node is ln02 and the worker node is cpunode01. Both nodes mount the same /tmp/ray directory; I don't know whether this is the cause of the error:
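One way to check the shared-mount hypothesis is to look up which filesystem actually backs /tmp/ray on each node. The sketch below reads Linux's `/proc/mounts`, finds the longest mount point containing the path, and reports its filesystem type. The set of "shared" filesystem types and the helpers `mount_for` and `is_shared` are illustrative assumptions, not part of Ray; run it on both ln02 and cpunode01 and compare.

```python
import os

# Assumed, non-exhaustive list of network/cluster filesystem types (heuristic only).
SHARED_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "lustre", "gpfs", "beegfs", "ceph", "glusterfs"}


def mount_for(path, mounts_text):
    """Return (mount_point, fstype) of the longest mount point containing path."""
    path = os.path.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts format: device mount_point fstype options dump pass;
        # spaces inside mount points are escaped as \040.
        mount_point = fields[1].replace("\\040", " ")
        fstype = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win, since it shadows earlier ones.
            if len(mount_point) >= len(best[0]):
                best = (mount_point, fstype)
    return best


def is_shared(path, mounts_text):
    """Hypothetical helper: True if path appears to live on a network filesystem."""
    return mount_for(path, mounts_text)[1] in SHARED_FS_TYPES


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = f.read()
    target = os.path.realpath("/tmp/ray")  # follow symlinks to the real location
    mount_point, fstype = mount_for(target, mounts)
    print(f"{target} is on {mount_point} ({fstype}); shared: {is_shared(target, mounts)}")
```

A bind mount of a shared directory onto /tmp/ray still reports the underlying network filesystem type in `/proc/mounts`, so the check should catch that case as well as a direct NFS mount.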

(base) root@ln02:/shared2/home/wzh/t2v_data/ray# ray start --head --dashboard-host=0.0.0.0 
Usage stats collection is enabled. To disable this, add `--disable-usage-stats` to the command that starts the cluster, or run the following command: `ray disable-usage-stats` before starting the cluster. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.

Local node IP: 10.93.0.58

--------------------
Ray runtime started.
--------------------

Next steps
  To add another node to this Ray cluster, run
    ray start --address='10.93.0.58:6379'
  
  To connect to this Ray cluster:
    import ray
    ray.init()
  
  To submit a Ray job using the Ray Jobs CLI:
    RAY_ADDRESS='http://10.93.0.58:8265' ray job submit --working-dir . -- python my_script.py
  
  See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html 
  for more information on submitting Ray jobs to the Ray cluster.
  
  To terminate the Ray runtime, run
    ray stop
  
  To view the status of the cluster, use
    ray status
  
  To monitor and debug Ray, view the dashboard at 
    10.93.0.58:8265
  
  If connection to the dashboard fails, check your firewall settings and network configuration.
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# cat demo.py 
import ray

ray.init(address='auto') 

@ray.remote
def hello():
    return "Hello, Ray!"

future = hello.remote()
result = ray.get(future)

print(result)
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# python demo.py 
2025-06-25 11:56:50,183 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.93.0.58:6379...
2025-06-25 11:56:50,194 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.93.0.58:8265 
Hello, Ray!
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# ssh -p 2222 cpunode01 "/opt/conda/bin/ray start --address='10.93.0.58:6379'"
Warning: Permanently added '[cpunode01]:2222' (ED25519) to the list of known hosts.
2025-06-25 11:57:12,029 WARNING services.py:766 -- The node IP address of the current host recorded in node_ip_address.json (10.93.0.58) is different from the current IP address: 10.93.16.54. Ray will use 10.93.16.54 as the current node's IP address. Creating 2 instances in the same host with different IP address is not supported. Please create an enhnacement request tohttps://github.com/ray-project/ray/issues.
2025-06-25 11:57:11,993 INFO scripts.py:1042 -- Local node IP: 10.93.16.54
2025-06-25 11:57:12,185 SUCC scripts.py:1058 -- --------------------
2025-06-25 11:57:12,185 SUCC scripts.py:1059 -- Ray runtime started.
2025-06-25 11:57:12,185 SUCC scripts.py:1060 -- --------------------
2025-06-25 11:57:12,185 INFO scripts.py:1062 -- To terminate the Ray runtime, run
2025-06-25 11:57:12,185 INFO scripts.py:1063 --   ray stop
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# ray status
======== Autoscaler status: 2025-06-25 11:57:18.410351 ========
Node status
---------------------------------------------------------------
Active:
 1 node_55e745f900c652b2f0cf59a07c6ab8d1b88e80522868190fa30edc33
 1 node_98b3fc535ff63b62febe57ad7b7db29bfd020f51262abe35a9776302
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0.0/368.0 CPU
 0B/770.36GiB memory
 0B/286.23GiB object_store_memory

Demands:
 (no resource demands)
(base) root@ln02:/shared2/home/wzh/t2v_data/ray# python demo.py 
2025-06-25 11:58:35,822 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 10.93.0.58:6379...
2025-06-25 11:58:35,854 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at 10.93.0.58:8265 
[2025-06-25 11:58:45,937 C 458406 458406] raylet_connection.cc:33: Could not connect to socket /tmp/ray/session_2025-06-25_11-56-12_829320_438270/sockets/raylet.1
*** StackTrace Information ***
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0x13de47a) [0x7fde7324347a] ray::operator<<()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray6RayLogD1Ev+0x479) [0x7fde73245ef9] ray::RayLog::~RayLog()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0xa7b905) [0x7fde728e0905] ray::raylet::RayletConnection::RayletConnection()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorkerC2ENS0_17CoreWorkerOptionsERKNS_8WorkerIDE+0xf06) [0x7fde72804c96] ray::core::CoreWorker::CoreWorker()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImplC2ERKNS0_17CoreWorkerOptionsE+0x3de) [0x7fde72818cbe] ray::core::CoreWorkerProcessImpl::CoreWorkerProcessImpl()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core17CoreWorkerProcess10InitializeERKNS0_17CoreWorkerOptionsE+0x34) [0x7fde7281a114] ray::core::CoreWorkerProcess::Initialize()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0x801111) [0x7fde72666111] __pyx_pw_3ray_7_raylet_10CoreWorker_1__cinit__()
/opt/conda/lib/python3.10/site-packages/ray/_raylet.so(+0x802319) [0x7fde72667319] __pyx_tp_new_3ray_7_raylet_CoreWorker()
python(_PyObject_MakeTpCall+0x199) [0x561b3f3ec999] _PyObject_MakeTpCall
python(_PyEval_EvalFrameDefault+0x54a6) [0x561b3f3e89d6] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6c) [0x561b3f3f3a2c] _PyFunction_Vectorcall
python(_PyEval_EvalFrameDefault+0x13ca) [0x561b3f3e48fa] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6c) [0x561b3f3f3a2c] _PyFunction_Vectorcall
python(PyObject_Call+0xbc) [0x561b3f3fff1c] PyObject_Call
python(_PyEval_EvalFrameDefault+0x2d83) [0x561b3f3e62b3] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6c) [0x561b3f3f3a2c] _PyFunction_Vectorcall
python(_PyEval_EvalFrameDefault+0x13ca) [0x561b3f3e48fa] _PyEval_EvalFrameDefault
python(+0x1d7c60) [0x561b3f486c60] _PyEval_Vector
python(PyEval_EvalCode+0x87) [0x561b3f486ba7] PyEval_EvalCode
python(+0x20812a) [0x561b3f4b712a] run_eval_code_obj
python(+0x203523) [0x561b3f4b2523] run_mod
python(+0x9a6f5) [0x561b3f3496f5] pyrun_file.cold
python(_PyRun_SimpleFileObject+0x1ae) [0x561b3f4ac9fe] _PyRun_SimpleFileObject
python(_PyRun_AnyFileObject+0x44) [0x561b3f4ac594] _PyRun_AnyFileObject
python(Py_RunMain+0x38b) [0x561b3f4a978b] Py_RunMain
python(Py_BytesMain+0x37) [0x561b3f47a1f7] Py_BytesMain
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fde73d00d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fde73d00e40] __libc_start_main
python(+0x1cb0f1) [0x561b3f47a0f1]
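The path in the error is a Unix domain socket, which is a host-local IPC endpoint: a process can only connect to it on the machine where the listening process runs, even if the socket file itself is visible to other hosts through a shared mount. A rough way to see which sockets in a session directory are actually reachable from a given node is to try connecting to each one. The sketch below assumes the default session symlink `/tmp/ray/session_latest` (or a session path passed as the first argument) and stream-type sockets; `probe_unix_socket` and `probe_session_sockets` are hypothetical helpers, not Ray APIs.

```python
import errno
import glob
import os
import socket
import stat
import sys


def probe_unix_socket(path, timeout=1.0):
    """Try to connect to a Unix domain socket; return "ok" or an errno name."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # assumes a stream socket
    s.settimeout(timeout)
    try:
        s.connect(path)
        return "ok"
    except OSError as e:
        # e.g. ECONNREFUSED when no process on *this* host listens on the file
        return errno.errorcode.get(e.errno, str(e))
    finally:
        s.close()


def probe_session_sockets(session_dir):
    """Probe every socket file under <session_dir>/sockets (hypothetical helper)."""
    results = {}
    for path in sorted(glob.glob(os.path.join(session_dir, "sockets", "*"))):
        # lstat: only consider entries that are themselves socket files
        if stat.S_ISSOCK(os.lstat(path).st_mode):
            results[os.path.basename(path)] = probe_unix_socket(path)
    return results


if __name__ == "__main__":
    session = sys.argv[1] if len(sys.argv) > 1 else "/tmp/ray/session_latest"
    for name, status in probe_session_sockets(session).items():
        print(f"{name}: {status}")
```

Running it on ln02 with the session directory named in the error would show whether `raylet.1` is connectable from that host. `ECONNREFUSED` on a socket file that exists usually means no listener for it is running on the current host, for example because it was created by a process on the other node or the process has exited.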

Versions / Dependencies

Ray 2.44.1, Python 3.10

Reproduction script

No separate script; `demo.py` above reproduces the error once cpunode01 has joined the cluster.

Issue Severity

None

Metadata

Assignees: No one assigned
Labels: bug, core, question, stability, triage
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
