ray.wait causes node to hang if there are too many object IDs

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.6 LTS
Ray installed from (source or binary): binary
Ray version: 0.7.6
Python version: 3.7.3
Exact command to reproduce:

Describe the problem

I have a fixed number of workers that each produce a couple hundred thousand results. I pool all the object IDs into a single list of roughly 10,000,000 items. If I call ray.wait on this list, the node hangs. Passing a timeout doesn't help; it hangs either way. Below is a minimal example.

Source code / logs

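import time
import ray

ray.init()

# Each call returns 1,000,000 separate object IDs (one per list element).
@ray.remote(num_return_vals=1000000)
def test():
    time.sleep(60)
    return list(range(1000000))

# Pool the object IDs of 10 tasks into one list of 10,000,000 IDs.
results = []
for i in range(10):
    results.extend(test.remote())

# Hangs here, even though a 20-second timeout is given.
ray.wait(results, timeout=20)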

2019-12-09 17:27:45,501 WARNING worker.py:1619 -- The node with client ID 8bf2d3fbdbe7ae98544e0222f85c3cdb6f5f6f11 has been marked dead because the monitor has missed too many heartbeats from it.

(pid=raylet) F1209 17:28:00.863570 16313 node_manager.cc:487]  Check failed: client_id != gcs_client_->client_table().GetLocalClientId() Exiting because this node manager has mistakenly been marked dead by the monitor.
(pid=raylet) *** Check failure stack trace: ***
(pid=raylet)     @           0x6f8d1a  google::LogMessage::Fail()
(pid=raylet)     @           0x6fa103  google::LogMessage::SendToLog()
(pid=raylet)     @           0x6f8a42  google::LogMessage::Flush()
(pid=raylet)     @           0x6f8c31  google::LogMessage::~LogMessage()
(pid=raylet)     @           0x52b112  ray::RayLog::~RayLog()
(pid=raylet)     @           0x466482  ray::raylet::NodeManager::ClientRemoved()
(pid=raylet)     @           0x4b76ee  ray::gcs::ClientTable::HandleNotification()
(pid=raylet)     @           0x4d304b  _ZNSt17_Function_handlerIFvPN3ray3gcs14RedisGcsClientERKNS0_8ClientIDERKSt6vectorINS0_3rpc11GcsNodeInfoESaIS9_EEEZZNS1_11ClientTable7ConnectERKS9_ENKUlS3_RKNS0_8UniqueIDESH_E_clES3_SK_SH_EUlS3_SK_SD_E_E9_M_invokeERKSt9_Any_dataS3_S6_SD_
(pid=raylet)     @           0x4d2706  _ZNSt17_Function_handlerIFvPN3ray3gcs14RedisGcsClientERKNS0_8ClientIDENS0_3rpc13GcsChangeModeERKSt6vectorINS7_11GcsNodeInfoESaISA_EEEZNS1_3LogIS4_SA_E9SubscribeERKNS0_5JobIDES6_RKSt8functionIFvS3_S6_SE_EERKSL_IFvS3_EEEUlS3_S6_S8_SE_E_E9_M_invokeERKSt9_Any_dataS3_S6_S8_SE_
(pid=raylet)     @           0x4b5673  _ZZN3ray3gcs3LogINS_8ClientIDENS_3rpc11GcsNodeInfoEE9SubscribeERKNS_5JobIDERKS2_RKSt8functionIFvPNS0_14RedisGcsClientESA_NS3_13GcsChangeModeERKSt6vectorIS4_SaIS4_EEEERKSB_IFvSD_EEENKUlRKNS0_13CallbackReplyEE_clESU_
(pid=raylet)     @           0x4dacb9  ray::gcs::GlobalRedisCallback()
(pid=raylet)     @           0x4df9cb  redisProcessCallbacks
(pid=raylet)     @           0x4de726  RedisAsioClient::handle_read()
(pid=raylet)     @           0x4dd958  boost::asio::detail::reactive_null_buffers_op<>::do_complete()
(pid=raylet)     @           0x425bcd  boost::asio::detail::scheduler::run()
(pid=raylet)     @           0x40fb1d  main
(pid=raylet)     @     0x7f03ca300830  __libc_start_main
(pid=raylet)     @           0x4207e1  (unknown)

After this the raylet dies.

Any ideas?
Is there any other way to receive results in the order they're ready aside from this?
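
One possible direction for the question above is to never hand the full list to ray.wait, and instead poll a bounded window of pending IDs at a time. The sketch below is only an illustration under assumptions: `iter_ready_in_chunks`, `chunk_size`, and the injected `wait_fn` parameter are hypothetical names, and it has not been verified that smaller ray.wait calls actually avoid the hang on 0.7.6. The only thing it relies on is that `wait_fn` behaves like ray.wait, i.e. accepts `num_returns` and `timeout` and returns a `(ready, not_ready)` pair. Results come out in approximate readiness order (per window), not strict global order.

```python
from collections import deque


def iter_ready_in_chunks(object_ids, wait_fn, chunk_size=1000, timeout=1.0):
    """Yield object IDs as they become ready, polling at most
    `chunk_size` pending IDs per call to `wait_fn`.

    `wait_fn` is assumed to follow the ray.wait calling convention:
    wait_fn(ids, num_returns=..., timeout=...) -> (ready, not_ready).
    """
    pending = deque(object_ids)
    while pending:
        # Take a bounded window from the front of the queue.
        window = [pending.popleft() for _ in range(min(chunk_size, len(pending)))]
        ready, not_ready = wait_fn(window, num_returns=len(window), timeout=timeout)
        # Rotate unfinished IDs to the back so later IDs also get polled.
        pending.extend(not_ready)
        for object_id in ready:
            yield object_id


# Hypothetical usage with Ray (not executed here):
#   for object_id in iter_ready_in_chunks(results, ray.wait, chunk_size=1000):
#       value = ray.get(object_id)
```

Passing the wait function in as an argument keeps the helper independent of Ray itself, so it can be exercised with a stub. Note that if no task ever finishes, the loop keeps polling indefinitely.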
