-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Mark tasks as failed if complete() is false when run finishes #2710
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tests are now passing and this PR is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not experienced this situation for non-external tasks. Could you provide an example where your task's run
completes, but still has a complete
evaluation of false?
if not self.check_complete_on_run or self.task.complete(): | ||
status = DONE | ||
else: | ||
raise TaskException("Task finished running, but complete() is still returning false.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to raise an exception in this case? Or saying the task FAILED
?
Or will the task already be FAILED? And we just want to raise an exception for the reason why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I had it just marked as 'failed', but it seemed that raising an exception was the standard way to do this (given that the default on_failure
callback doesn't do anything with its ar
8000
guments and just processes the currently-being-handled exception). I can switch back to manually marking the failure if that's what's preferred.
luigi/worker.py
Outdated
@@ -447,6 +455,11 @@ class worker(Config): | |||
check_unfulfilled_deps = BoolParameter(default=True, | |||
description='If true, check for completeness of ' | |||
'dependencies before running a task') | |||
check_complete_on_run = BoolParameter(default=False, | |||
config_path=dict(section='core', name='check-complete-on-run'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see the reason to allow this to be included in the legacy core
section with alt name. I'd recommend just removing the config_path option and force the inclusion of check_complete_on_run
in a [worker]
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you clarify what you mean by "force the inclusion in a [worker]
section"? i had assumed that specifying config_path
was required to read values from the config file, so the solution here would be to just change section
to worker
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this new configuration option could be set as either:
[worker]
check_complete_on_run: True
bc it is a python variable named check_complete_on_run
and within the worker
luigi.Config
class
or
[core]
check-complete-on-run: True
because you've specified a config_path
which differs from the Config class name and variable name.
I joined the Luigi project after Config classes were already introduced. But i understood the config_path
parameter option was a compatibility utility to transition the dash-delimited naming to underscore-delimited naming, as well as changing the default configuration section where the variable could be set.
So when i say "force the inclusion in the [worker] section", i just mean remove the config_path=...
. Thus, requiring the variable to be set only from the [worker] section (and not allowing check-complete-on-run
in the [core]`.
We have a set of Luigi tasks that execute jobs on external platforms, e.g. SGE, that produce various file outputs. We've run into cases where the job will exit successfully, but for whatever reason (e.g. didn't find values in an external database) will not produce the outputs we expect. It's infeasible to modify the external platform so that these cases cause the job to fail, so our best solution is to treat a lack of existing outputs as an immediate Luigi task failure. |
@dlstadther I've made the requested changes to configuration. Is there anything else I should modify? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i'm good with this now. Thanks!
This feature is not documented in https://luigi.readthedocs.io/en/stable/configuration.html#worker, is there any reason, or is just the documentation out of date? |
Nice feature! But yes, please add docs for this feature. Would you mind doing so @ThePletch? |
I think we should document this flag. It's so useful that I have a feeling it's a missing feature from the beginning. |
Could have sworn I had documented this as part of the PR. I'll look into addressing that this evening. |
Description
Adds the
check_complete_on_run
setting to workers, which changes the logic used to determine job status inTaskProcess
to mark a task as FAILED if itsrun()
method executes successfully but itscomplete()
method still returns false. This setting is false by default, to avoid breaking existing workflows.Motivation and Context
Currently, if a task runs but does not create all of its outputs (or does not satisfy its custom complete() method), it is marked as DONE, but tasks that depend on it later fail with a cryptic "missing dependency" error. This associates the failure with the responsible task rather than marking dependent tasks that were unable to begin execution as failed. Additionally, this greatly simplifies detecting whether root tasks (i.e. tasks upon which no other tasks depend) have actually completed successfully. Without this change, they are marked as DONE and there is no built-in method to detect that their outputs have not been created.
Have you tested this? If so, how?
I have included unit tests.