10000 Fix Scheduler.add_task to overwrite accepts_messages attribute. by riga · Pull Request #2469 · spotify/luigi · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix Scheduler.add_task to overwrite accepts_messages attribute. #2469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 23, 2018

Conversation

riga
Copy link
Contributor
@riga riga commented Jul 23, 2018

Description

When the scheduler receives tasks via add_task(**kwargs), by default it does not overwrite task properties from kwargs when the task already exists. I wasn't aware of that when I implemented the accepts_messages flag in #2426. As a consequence, the message feature is only enabled on the triggered default task. This PR provides a fix for that.

Motivation and Context

When the triggered default_task is added, its deps are added as dummy tasks at the end of add_task to prevent some race condition during task pruning:

luigi/luigi/scheduler.py

Lines 897 to 901 in 459a260

# Task dependencies might not exist yet. Let's create dummy tasks for them for now.
# Otherwise the task dependencies might end up being pruned if scheduling takes a long time
for dep in task.deps or []:
t = self._state.get_task(dep, setdefault=self._make_task(task_id=dep, status=UNKNOWN, deps=None, priority=priority))
t.stakeholders.add(worker_id)

So when the dependency tasks are added again with their full config via add_task, an entry already exists in the task state object. As a result, the accepts_messages flag is not updated and sticks to the default value (False). For the same reason, other boolean flags like batchable are updated manually:

luigi/luigi/scheduler.py

Lines 840 to 841 in 459a260

if batchable is not None:
task.batchable = batchable

accepts_messages is treated the same way now.

Copy link
Collaborator
@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (assuming tests pass - which i imagine they should)

@riga
Copy link
Contributor Author
riga commented Jul 23, 2018

Tests pass, there is some issue in the codecov changes which I don't understand though.

@dlstadther
Copy link
Collaborator

Not concerned about that codecov failure

@dlstadther dlstadther merged commit 1d5a05e into spotify:master Jul 23, 2018
@riga
Copy link
Contributor Author
riga commented Jul 23, 2018

That was fast, thanks! =)

@dlstadther
Copy link
Collaborator

No problem! Thanks for your many luigi contributions and bug fixes :)

@Tarrasch
Copy link
Contributor

Thanks for maintaining the features you build. :)

dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 14, 2018
* upstream-master: (82 commits)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
  Add stale config
  Move github templates to .github dir
  Fix transfer config import (spotify#2458)
  Additions to provide support for the Load Sharing Facility (LSF) job scheduler (spotify#2373)
  Version 2.7.6
  ...
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 16, 2018
* upstream-master:
  Remove long-deprecated scheduler config variable alternatives (spotify#2491)
  Bump tornado milestone version (spotify#2490)
  Update moto to 1.x milestone version (spotify#2471)
  Use passed password when create a redis connection (spotify#2489)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2
6DAD
474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0