8000 Mark tasks as failed if complete() is false when run finishes by ThePletch · Pull Request #2710 · spotify/luigi · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Mark tasks as failed if complete() is false when run finishes #2710

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 28, 2019

Conversation

ThePletch
Copy link
Contributor
@ThePletch ThePletch commented May 16, 2019

Description

Adds the check_complete_on_run setting to workers, which changes the logic used to determine job status in TaskProcess to mark a task as FAILED if its run() method executes successfully but its complete() method still returns false. This setting is false by default, to avoid breaking existing workflows.

Motivation and Context

Currently, if a task runs but does not create all of its outputs (or does not satisfy its custom complete() method), it is marked as DONE, but tasks that depend on it later fail with a cryptic "missing dependency" error. This associates the failure with the responsible task rather than marking dependent tasks that were unable to begin execution as failed. Additionally, this greatly simplifies detecting whether root tasks (i.e. tasks upon which no other tasks depend) have actually completed successfully. Without this change, they are marked as DONE and there is no built-in method to detect that their outputs have not been created.

Have you tested this? If so, how?

I have included unit tests.

@ThePletch
Copy link
Contributor Author

Tests are now passing and this PR is ready for review.

Copy link
Collaborator
@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not experienced this situation for non-external tasks. Could you provide an example where your task's run completes, but still has a complete evaluation of false?

if not self.check_complete_on_run or self.task.complete():
status = DONE
else:
raise TaskException("Task finished running, but complete() is still returning false.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to raise an exception in this case? Or saying the task FAILED?

Or will the task already be FAILED? And we just want to raise an exception for the reason why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I had it just marked as 'failed', but it seemed that raising an exception was the standard way to do this (given that the default on_failure callback doesn't do anything with its ar 8000 guments and just processes the currently-being-handled exception). I can switch back to manually marking the failure if that's what's preferred.

luigi/worker.py Outdated
@@ -447,6 +455,11 @@ class worker(Config):
check_unfulfilled_deps = BoolParameter(default=True,
description='If true, check for completeness of '
'dependencies before running a task')
check_complete_on_run = BoolParameter(default=False,
config_path=dict(section='core', name='check-complete-on-run'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't see the reason to allow this to be included in the legacy core section with alt name. I'd recommend just removing the config_path option and force the inclusion of check_complete_on_run in a [worker] section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you clarify what you mean by "force the inclusion in a [worker] section"? i had assumed that specifying config_path was required to read values from the config file, so the solution here would be to just change section to worker.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this new configuration option could be set as either:

[worker]
check_complete_on_run: True

bc it is a python variable named check_complete_on_run and within the worker luigi.Config class

or

[core]
check-complete-on-run: True

because you've specified a config_path which differs from the Config class name and variable name.


I joined the Luigi project after Config classes were already introduced. But i understood the config_path parameter option was a compatibility utility to transition the dash-delimited naming to underscore-delimited naming, as well as changing the default configuration section where the variable could be set.

So when i say "force the inclusion in the [worker] section", i just mean remove the config_path=.... Thus, requiring the variable to be set only from the [worker] section (and not allowing check-complete-on-run in the [core]`.

@ThePletch
Copy link
Contributor Author

We have a set of Luigi tasks that execute jobs on external platforms, e.g. SGE, that produce various file outputs. We've run into cases where the job will exit successfully, but for whatever reason (e.g. didn't find values in an external database) will not produce the outputs we expect. It's infeasible to modify the external platform so that these cases cause the job to fail, so our best solution is to treat a lack of existing outputs as an immediate Luigi task failure.

@ThePletch
Copy link
Contributor Author

@dlstadther I've made the requested changes to configuration. Is there anything else I should modify?

Copy link
Collaborator
@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i'm good with this now. Thanks!

@dlstadther dlstadther merged commit 41e40fc into spotify:master May 28, 2019
@Joao-M-Almeida
Copy link

This feature is not documented in https://luigi.readthedocs.io/en/stable/configuration.html#worker, is there any reason, or is just the documentation out of date?

@Tarrasch
Copy link
Contributor

Nice feature! But yes, please add docs for this feature. Would you mind doing so @ThePletch?

@honnix
Copy link
Member
honnix commented Jan 2, 2020

I think we should document this flag. It's so useful that I have a feeling it's a missing feature from the beginning.

@ThePletch
Copy link
Contributor Author

Could have sworn I had documented this as part of the PR. I'll look into addressing that this evening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0