Mark tasks as failed if complete() is false when run finishes #2710

ThePletch · 2019-05-16T15:36:32Z

Description

Adds the check_complete_on_run setting to workers, which changes the logic used to determine job status in TaskProcess to mark a task as FAILED if its run() method executes successfully but its complete() method still returns false. This setting is false by default, to avoid breaking existing workflows.
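
The behaviour described above can be sketched in plain Python without Luigi installed. This is only an illustration under stated assumptions: `StubTask`, `resolve_status`, and the string statuses are hypothetical stand-ins, not the actual TaskProcess code. Only the `check_complete_on_run` flag, the exception message, and the `run()`/`complete()` pairing come from this PR.

```python
DONE = "DONE"
FAILED = "FAILED"


class TaskException(Exception):
    """Stand-in for the exception raised in the PR's diff."""


class StubTask:
    """Hypothetical task whose run() succeeds but may not produce its output."""

    def __init__(self, produces_output):
        self.produces_output = produces_output
        self.output_exists = False

    def run(self):
        # run() returns normally either way; it may simply skip the output.
        if self.produces_output:
            self.output_exists = True

    def complete(self):
        return self.output_exists


def resolve_status(task, check_complete_on_run=False):
    """Run the task, then optionally re-check complete() before reporting DONE."""
    try:
        task.run()
        if not check_complete_on_run or task.complete():
            return DONE
        raise TaskException("Task finished running, but complete() is still returning false.")
    except Exception:
        # Any exception, including the one raised above, marks the task as failed.
        return FAILED
```

With the flag left at its default, `resolve_status(StubTask(produces_output=False))` reports DONE, as before this change. With `check_complete_on_run=True`, the same task reports FAILED.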

Motivation and Context

Currently, if a task runs but does not create all of its outputs (or does not satisfy its custom complete() method), it is marked as DONE, but tasks that depend on it later fail with a cryptic "missing dependency" error. This change associates the failure with the responsible task, rather than marking as failed the dependent tasks that were unable to begin execution. Additionally, it greatly simplifies detecting whether root tasks (i.e. tasks upon which no other tasks depend) have actually completed successfully. Without this change, they are marked as DONE and there is no built-in method to detect that their outputs have not been created.

Have you tested this? If so, how?

I have included unit tests.

ThePletch · 2019-05-20T16:16:56Z

Tests are now passing and this PR is ready for review.

dlstadther

I've not experienced this situation for non-external tasks. Could you provide an example where your task's run completes, but still has a complete evaluation of false?

dlstadther · 2019-05-21T11:16:56Z

luigi/worker.py

+                    if not self.check_complete_on_run or self.task.complete():
+                        status = DONE
+                    else:
+                        raise TaskException("Task finished running, but complete() is still returning false.")


Do we want to raise an exception in this case? Or just mark the task as FAILED?

Or will the task already be FAILED? And we just want to raise an exception for the reason why?

Initially I had it just marked as 'failed', but it seemed that raising an exception was the standard way to do this (given that the default on_failure callback doesn't do anything with its arguments and just processes the currently-being-handled exception). I can switch back to manually marking the failure if that's what's preferred.

dlstadther · 2019-05-21T11:20:41Z

luigi/worker.py

@@ -447,6 +455,11 @@ class worker(Config):
    check_unfulfilled_deps = BoolParameter(default=True,
                                           description='If true, check for completeness of '
                                           'dependencies before running a task')
+    check_complete_on_run = BoolParameter(default=False,
+                                          config_path=dict(section='core', name='check-complete-on-run'),


i don't see the reason to allow this to be included in the legacy core section with an alt name. I'd recommend just removing the config_path option and forcing the inclusion of check_complete_on_run in a [worker] section.

could you clarify what you mean by "force the inclusion in a [worker] section"? i had assumed that specifying config_path was required to read values from the config file, so the solution here would be to just change section to worker.

Currently, this new configuration option could be set as either:

    [worker]
    check_complete_on_run: True

because it is a python variable named check_complete_on_run within the worker luigi.Config class

or

    [core]
    check-complete-on-run: True

because you've specified a config_path which differs from the Config class name and variable name.

I joined the Luigi project after Config classes were already introduced. But i understood the config_path parameter option was a compatibility utility to transition the dash-delimited naming to underscore-delimited naming, as well as changing the default configuration section where the variable could be set.

So when i say "force the inclusion in the [worker] section", i just mean remove the config_path=...., thus requiring the variable to be set only from the [worker] section (and not allowing check-complete-on-run in the [core] section).
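
To make the two lookup paths concrete, here is a rough sketch using Python's standard configparser rather than Luigi's own config machinery. The helper `read_flag` and its `legacy` argument are hypothetical. They only mimic the idea that a config_path adds a second (section, name) pair to consult, and the lookup order shown is an assumption.

```python
import configparser


def read_flag(cfg_text, legacy=None, default=False):
    """Look up check_complete_on_run in [worker]; optionally fall back to a legacy (section, name)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_option("worker", "check_complete_on_run"):
        return parser.getboolean("worker", "check_complete_on_run")
    # The fallback is consulted only when a legacy location is supplied,
    # which is roughly the role config_path plays in this discussion.
    if legacy is not None and parser.has_option(*legacy):
        return parser.getboolean(*legacy)
    return default


NEW_STYLE = "[worker]\ncheck_complete_on_run: True\n"
OLD_STYLE = "[core]\ncheck-complete-on-run: True\n"
LEGACY = ("core", "check-complete-on-run")
```

Calling `read_flag(OLD_STYLE)` without `legacy=LEGACY` ignores the [core] entry, which corresponds to dropping config_path; passing `legacy=LEGACY` picks it up again.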

ThePletch · 2019-05-21T13:44:53Z

We have a set of Luigi tasks that execute jobs on external platforms, e.g. SGE, that produce various file outputs. We've run into cases where the job will exit successfully, but for whatever reason (e.g. didn't find values in an external database) will not produce the outputs we expect. It's infeasible to modify the external platform so that these cases cause the job to fail, so our best solution is to treat a lack of existing outputs as an immediate Luigi task failure.
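
A minimal sketch of that scenario, assuming nothing about SGE itself: the external job is simulated by a child Python process that exits with status 0 but writes nothing. `ExternalJobTask`, `run_and_check`, and the job snippets are hypothetical stand-ins, not Luigi classes.

```python
import os
import subprocess
import sys
import tempfile


class ExternalJobTask:
    """Hypothetical wrapper around an external job that should write one output file."""

    def __init__(self, output_path, job_code):
        self.output_path = output_path
        self.job_code = job_code  # Python source standing in for the external job

    def run(self):
        # check=True raises only on a non-zero exit code; a job that exits 0
        # without writing its output passes silently.
        subprocess.run([sys.executable, "-c", self.job_code, self.output_path], check=True)

    def complete(self):
        # Completeness is defined purely by the presence of the expected output.
        return os.path.exists(self.output_path)


def run_and_check(job_code):
    """Run the simulated job in a temp directory and report complete() afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        task = ExternalJobTask(os.path.join(tmp, "result.txt"), job_code)
        task.run()
        return task.complete()


SILENT_JOB = "pass"  # exits 0, produces nothing
GOOD_JOB = "import sys; open(sys.argv[1], 'w').write('ok')"
```

In both cases run() returns without error, so only a complete() check after the run, which is what check_complete_on_run enables, distinguishes the silent job from the good one.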

ThePletch · 2019-05-23T18:29:53Z

@dlstadther I've made the requested changes to configuration. Is there anything else I should modify?

dlstadther

I think i'm good with this now. Thanks!

Joao-M-Almeida · 2019-11-08T13:27:40Z

This feature is not documented in https://luigi.readthedocs.io/en/stable/configuration.html#worker. Is there a reason for that, or is the documentation just out of date?

Tarrasch · 2019-11-29T06:08:55Z

Nice feature! But yes, please add docs for this feature. Would you mind doing so @ThePletch?

honnix · 2020-01-02T18:37:58Z

I think we should document this flag. It's so useful that I have a feeling it's a missing feature from the beginning.

ThePletch · 2020-01-03T14:35:43Z

Could have sworn I had documented this as part of the PR. I'll look into addressing that this evening.

Mark tasks as failed if complete() is false when run finishes

cb71b10

ThePletch requested review from dlstadther, honnix and Tarrasch as code owners May 16, 2019 15:36

Steve Pletcher added 6 commits May 16, 2019 11:39

linting

34ab319

properly reference child task

2b57fef

Add check_complete_on_run setting to gate new feature

6ef563e

lint roller

4370cd7

fix use of misleading call signature

ee5e9c5

last param fix

d326c6f

dlstadther reviewed May 21, 2019

remove config_path param

a64cd52

dlstadther approved these changes May 24, 2019

dlstadther merged commit 41e40fc into spotify:master May 28, 2019
