[Bug] Detached actor exceptions are not logged. · Issue #21810 · ray-project/ray · GitHub
[Bug] Detached actor exceptions are not logged. #21810
Open
@rkooo567

Description

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core

What happened + What you expected to happen

Currently, Ray's error handling works as follows:

  • If a remote call fails, the returned object ref contains the exception.
  • If the returned ref is never passed to ray.get and goes out of scope, the error message is printed on the caller's side.
  • If the returned ref is passed to ray.get, ray.get raises the exception.
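The three rules above can be illustrated with a toy model in plain Python. This is only a sketch of the behavior described here, not Ray's implementation: `ToyObjectRef`, its `stream` parameter, and `fail` are hypothetical names made up for the illustration, and it relies on CPython's reference counting to run `__del__` promptly.

```python
import sys


class ToyObjectRef:
    """Toy stand-in for an object ref (hypothetical, not Ray's code)."""

    def __init__(self, func, *args, stream=None):
        self._stream = stream if stream is not None else sys.stderr
        self._retrieved = False
        self._value = None
        self._error = None
        try:
            self._value = func(*args)
        except Exception as exc:
            # Rule 1: the failure is stored in the ref, not raised here.
            # The traceback is dropped so the ref is not kept alive by a
            # reference cycle through the frame that created it.
            self._error = exc.with_traceback(None)

    def get(self):
        """Rule 3: retrieving the result raises the stored exception."""
        self._retrieved = True
        if self._error is not None:
            raise self._error
        return self._value

    def __del__(self):
        # Rule 2: an unretrieved failure is reported only when the ref
        # goes out of scope, and only in the caller's process.
        if self._error is not None and not self._retrieved:
            print(f"Unhandled error: {self._error!r}", file=self._stream)


def fail():
    raise ValueError("abc")
```

In this model the error report depends on the caller still being alive when the ref is dropped. If the caller has already exited, nothing is left to print the message.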

Note that in the past, we handled this by always logging exceptions to log files. We removed that behavior so that we could have a better error handling model (we don't want to print errors before ray.get is called).

The problem is that this model doesn't work well with detached actors. For example:

  • Create a detached actor.
  • A method of the detached actor raises an exception.
  • The driver exits before the method raises the exception.

In this case, the detached actor raises an exception, but there is no way to know because the exception is not logged. From the user's perspective, everything appears to have gone well, when in fact the actor method failed. I think this can be problematic for some detached-actor-based workloads, especially when detached actors are used for service-like applications.
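Until this is handled by Ray itself, one possible user-side workaround is to log the exception inside the actor process before re-raising it. The sketch below is plain Python: `log_exceptions` and the logger name `actor_errors` are hypothetical, and it assumes (not verified here) that a `functools.wraps`-based decorator on a method is compatible with a class decorated with `@ray.remote`.

```python
import functools
import logging

logger = logging.getLogger("actor_errors")  # hypothetical logger name


def log_exceptions(method):
    """Log any exception raised by `method`, then re-raise it unchanged."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("Unhandled exception in %s", method.__qualname__)
            raise

    return wrapper


class A:  # in a Ray program this class would also carry @ray.remote
    @log_exceptions
    def s(self):
        raise ValueError("abc")
```

With no logging configuration, Python's last-resort handler writes the message to the process's stderr, so the record should end up wherever the actor process's stderr is captured. The exception is still re-raised, so callers that do call ray.get see the same behavior as before.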

cc @edoakes @simon-mo

Versions / Dependencies

master

Reproduction script

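import time

import ray

ray.init("auto")  # connect to an already-running cluster

@ray.remote
class A:
    def r(self):
        pass

    def s(self):
        # Fail only after the driver below has already exited.
        time.sleep(10)
        raise ValueError("abc")

a = A.options(lifetime="detached").remote()
ray.get(a.r.remote())  # wait until the actor is up
a.s.remote()  # fire and forget; the returned ref is never passed to ray.get
time.sleep(2)  # the driver exits about 8 seconds before s raises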

After 10 seconds, the detached actor's s method fails with ValueError, but there is no way to know because the exception is not logged.

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Metadata

Assignees

No one assigned

    Labels

      • P2: Important issue, but not time-critical
      • bug: Something that is supposed to be working; but isn't
      • dashboard: Issues specific to the Ray Dashboard
      • observability: Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling
      • pending-cleanup: This issue is pending cleanup. It will be removed in 2 weeks after being assigned.