-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Add max_retrials_to_get_pods #2664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
74d7e4c
to
d347779
Compare
luigi/contrib/kubernetes.py
Outdated
break | ||
|
||
# If pod was not returned, sleep to wait for pod to be created. | ||
time.sleep(self.__POLL_TIME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're going to allow the number of retries to be configurable, shouldn't we also allow the wait time to be configurable as well? You may want to try a lot in a short period or only a few times over a longer duration
# If pod was not returned, sleep to wait for pod to be created. | ||
time.sleep(self.__POLL_TIME) | ||
|
||
return pods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i correct if the pod isn't available this will return None
? (just want to be sure i understand)
luigi/contrib/kubernetes.py
Outdated
@@ -62,6 +62,9 @@ class kubernetes(luigi.Config): | |||
kubernetes_namespace = luigi.OptionalParameter( | |||
default=None, | |||
description="K8s namespace in which the job will run") | |||
max_retrials_to_get_pods = luigi.IntParameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
retrials
isn't the word you intend. I think you mean retries
and to that end, can the var name just be max_retries
?
Add max_retrials_to_get_pods Fix retry number
acf5baf
to
505736c
Compare
@dlstadther |
return self.kubernetes_config.max_retries | ||
|
||
@property | ||
def poll_time_for_retry(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think poll_time_for_retry
should replace __POLL_TIME
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't check the code but please add context on which module you change. I sometimes just puet
[kubernetes] Add max_retrials_to_get_pods
:)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions. |
It seems that the PR was abandoned, however this issue is steel persist and we're using ugly hacks to mitigate it. I would like to own this PR to resolve all remaining code review comments and deliver this fix to next release. Does anyone have any objections? @keisuke-umezawa @dlstadther @Tarrasch |
Feel free to submit a PR resolving this issue. I can provide feedback as my time allows. |
Description
I added
max_retrials_to_get_pods
.Motivation and Context
To fix #2570 .
Have you tested this? If so, how?
"I ran my jobs with this code and it works for me."