optimization: Client blocks on releasing references due to detached actor race condition · Issue #14137 · ray-project/ray · GitHub
optimization: Client blocks on releasing references due to detached actor race condition #14137
Open
@barakmich

Description

This is the comment as mentioned here https://github.com/ray-project/ray/pull/14122/files#diff-10f3fda5ddb0ff3dbb8f347dd7fc53101d2dd140585e72f2d55be831bd5455dbR134

What is the problem?

In most cases, the lifetime of a client-side object matches the lifetime of its ID, but this does not hold for named actors. If that case is handled, performance can be improved by reverting the release call linked above to non-blocking.

Reproduction outline

Here's how named actors fail:

import ray

# Assumes a Ray Client connection has already been established.

@ray.remote
class Actor:
    def do_it(self):
        pass

a = Actor.options(name="my_actor", lifetime="detached").remote()
# a has ActorRef 123, which is held on the server

del a
# The client sends a (non-blocking) message to release 123 on the server side... sometime later

b = ray.get_actor("my_actor")
# The server marks 123 as held in its set, which it already is!

# Now the non-blocking release arrives! It releases 123 on the server side, but the client still holds b as a reference

b.do_it.remote()
# Crashes here because the server side no longer has a reference to 123.
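
The ordering problem above can be replayed without Ray at all. The following is a minimal sketch, not Ray code: `ToyServer`, `replay`, and the `pending` list are hypothetical names that model the server's set of held references and the client's queue of not-yet-delivered releases. It assumes, as the comments above describe, that holding an already-held ID is a no-op.

```python
class ToyServer:
    """Hypothetical stand-in for the server's set of held references."""

    def __init__(self):
        self.held = set()

    def hold(self, ref_id):
        # Holding is idempotent: adding an ID that is already held changes nothing.
        self.held.add(ref_id)

    def release(self, ref_id):
        self.held.discard(ref_id)

    def call(self, ref_id):
        if ref_id not in self.held:
            raise KeyError(f"server holds no reference to {ref_id}")
        return "ok"


def replay(flush_before_get_actor):
    """Replay the sequence from the issue, optionally flushing releases first."""
    server = ToyServer()
    pending = []  # releases the client has queued but not yet delivered

    server.hold(123)      # a = Actor.options(...).remote()
    pending.append(123)   # del a: the release is queued, not delivered

    if flush_before_get_actor:
        while pending:
            server.release(pending.pop())

    server.hold(123)      # b = ray.get_actor("my_actor")

    while pending:        # any late release arrives only now
        server.release(pending.pop())

    try:
        return server.call(123)   # b.do_it.remote()
    except KeyError:
        return "crashed"
```

In this model, `replay(False)` follows the failing order and `replay(True)` delivers the queued release before the lookup, so the reference is removed and then recreated.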

Potential fixes include: attempting to reattach actor references if they've been cleaned up; better client logic around when and how to release objects; or finishing all queued releases before get_actor() runs (so the reference is removed and then recreated, instead of the late release landing on top of the new hold).

Of these, the last is probably the most flexible. On client release (__del__), take a lock that is released once the release message finishes (soft-blocking anything that needs all releases flushed, including other release messages), and have get_actor require that lock. All other messages proceed as usual, and in the common case the release lock doesn't block execution.
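
The release-lock idea might look roughly like the sketch below. This is an illustration under stated assumptions, not the Ray Client implementation: `ToyServer`, `ToyClient`, `run_demo`, and the 50 ms delay are all hypothetical, and the real client talks to the server over RPC rather than calling methods directly. The sketch relies on the fact that a plain `threading.Lock` may be released by a different thread than the one that acquired it.

```python
import threading
import time


class ToyServer:
    """Hypothetical server: tracks held reference IDs and named actors."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.held = set()
        self.names = {}

    def hold(self, ref_id):
        with self._mutex:
            self.held.add(ref_id)

    def release(self, ref_id):
        time.sleep(0.05)  # simulate a slow release message
        with self._mutex:
            self.held.discard(ref_id)

    def is_held(self, ref_id):
        with self._mutex:
            return ref_id in self.held


class ToyClient:
    """Hypothetical client that serializes releases against get_actor."""

    def __init__(self, server):
        self._server = server
        self._release_lock = threading.Lock()

    def create_actor(self, name, ref_id):
        self._server.names[name] = ref_id
        self._server.hold(ref_id)
        return ref_id

    def release(self, ref_id):
        # Taken here, released by the worker thread once the message is done.
        # A second release waits here until the first one has finished.
        self._release_lock.acquire()

        def _send():
            try:
                self._server.release(ref_id)
            finally:
                self._release_lock.release()

        threading.Thread(target=_send, daemon=True).start()

    def get_actor(self, name):
        # Waits for any in-flight release, so the hold happens after it.
        with self._release_lock:
            ref_id = self._server.names[name]
            self._server.hold(ref_id)
        return ref_id

    def call(self, ref_id):
        # Ordinary messages do not touch the release lock.
        if not self._server.is_held(ref_id):
            raise KeyError(f"server holds no reference to {ref_id}")
        return "ok"


def run_demo():
    server = ToyServer()
    client = ToyClient(server)
    a = client.create_actor("my_actor", 123)
    client.release(a)                  # del a
    b = client.get_actor("my_actor")   # blocks until the release has landed
    return client.call(b), server.is_held(123)
```

Only `release` and `get_actor` contend for the lock, so other calls are unaffected, and `get_actor` waits only when a release is actually in flight.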

Labels

P3: Issue moderate in impact or severity
enhancement: Request for new feature and/or capability
good-first-issue: Great starter issue for someone just starting to contribute to Ray
pending-cleanup: This issue is pending cleanup. It will be removed in 2 weeks after being assigned.