[SQS] Cancel pending and future tasks on shutdown #12228
Conversation
LocalStack Community integration with Pro: 2 files, 2 suites, 1h 15m 16s ⏱️. Results for commit d74089c. ♻️ This comment has been updated with latest results.
```python
if self.is_shutdown:
    raise Empty
```
Is there a specific reason why we need this at the beginning here as well?
Just allows for an earlier exit if we enter the poll loop in the time between `shutdown()` being called and `get()` being called. Just a very minor optimisation.
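For illustration, a minimal sketch of what such an early-exit check could look like; the class and attribute names here are assumptions for the sketch, not the actual provider code:

```python
import threading
from collections import deque
from queue import Empty


class InterruptibleQueue:
    """Sketch: a queue whose get() unblocks once shutdown() is called."""

    def __init__(self):
        self.is_shutdown = False
        self._items = deque()
        self._cond = threading.Condition()

    def get(self, timeout=None):
        # Early exit: shutdown() may already have been called in the time
        # between the caller deciding to poll and entering this method.
        if self.is_shutdown:
            raise Empty
        with self._cond:
            if not self._items:
                self._cond.wait(timeout)
            # Re-check after waking: shutdown() notifies the condition
            # to release any consumers blocked in wait().
            if self.is_shutdown or not self._items:
                raise Empty
            return self._items.popleft()

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def shutdown(self):
        self.is_shutdown = True
        with self._cond:
            self._cond.notify_all()
```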
We are, for example, also not raising exceptions on timeouts < 0 here, but that might not be too important.
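For comparison, the stdlib `queue.Queue` does validate negative timeouts, which is the behaviour being alluded to:

```python
import queue

q = queue.Queue()
try:
    q.get(timeout=-1)  # stdlib Queue rejects negative timeouts
except ValueError as err:
    print(err)  # "'timeout' must be a non-negative number"
```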
```diff
@@ -192,7 +194,7 @@ def __init__(self, num_thread: int = 3):
         )

     def shutdown(self):
-        self.executor.shutdown(wait=False)
+        self.executor.shutdown(wait=True, cancel_futures=True)
```
Did you check if we get any issues when doing a `wait=True` here? I am fine with cancelling the futures, but this could block the rest of the shutdown quite early (depending on the shutdown order), in contrast to at the very end of the interpreter shutdown.
I noticed these dispatched jobs were running while the SQS provider was already supposed to have shut down, and while other services were in the process of shutting down. The `wait=True` was to prevent this behaviour, meaning fewer exceptions being raised and no botocore retrying happening simultaneously with LS exiting.
If we're OK with the dispatched jobs running while other services are shutting down, then I'm happy to make this false again, since the `cancel_futures` behaviour is definitely more important.
It is fine, I am just worried about it blocking too long without a timeout after which we call it quits.
Fair. Let's allow the interpreter to tear these down instead. I'll change it back to `wait=False`.
I think this is innocent enough, LGTM!
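To make the trade-off discussed above concrete, here is a small self-contained demo of `ThreadPoolExecutor.shutdown` with `cancel_futures=True` (available since Python 3.9). This is an illustration of the stdlib semantics, not the provider code:

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
futures = [executor.submit(time.sleep, 1) for _ in range(5)]
time.sleep(0.1)  # let the first task start running

start = time.monotonic()
# cancel_futures=True drops the tasks still sitting in the work queue;
# wait=True additionally blocks until the already-running task finishes,
# which is the blocking behaviour the review was concerned about.
executor.shutdown(wait=True, cancel_futures=True)
print(f"shutdown blocked for ~{time.monotonic() - start:.1f}s")  # ~0.9s

print([f.cancelled() for f in futures])  # [False, True, True, True, True]
```

With `wait=False` instead, `shutdown()` returns immediately and the running task is left for interpreter teardown, which is the compromise the thread settled on.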
Motivation
Under high volumes of SQS receive/send/delete calls, the internal SQS `ThreadPoolExecutor`s can execute tasks long after the service has completed its shutdown/stop lifecycle hooks. That is, when we try to stop the SQS service, currently running and pending future tasks are not cleared from the executor pool, so they can still be executed long after the service has signalled that it stopped.
This PR ensures that all enqueued tasks are cancelled on shutdown.
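A minimal reproduction of the underlying problem (illustrative only, not the provider code): with a plain `shutdown(wait=False)` and no `cancel_futures`, tasks already queued keep executing after the call returns:

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
for i in range(3):
    executor.submit(lambda n=i: (time.sleep(0.5), print(f"task {n} ran")))

executor.shutdown(wait=False)  # returns immediately...
print("shutdown() returned")
# ...but all three queued tasks still run to completion afterwards,
# mirroring how SQS tasks outlived the service's stop hook.
time.sleep(2)
```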
Changes
Updated the `CloudWatchDispatcher.shutdown` and `MessageMoveTaskManager.close` methods to cancel all pending/future tasks that are currently enqueued for execution.
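A hedged sketch of what the change amounts to in those methods; the constructor and method bodies here are assumptions based on the diff above, not the exact provider code:

```python
from concurrent.futures import ThreadPoolExecutor


class MessageMoveTaskManager:
    """Illustrative only: an executor owner whose close() cancels queued tasks."""

    def __init__(self, num_thread: int = 3):
        self.executor = ThreadPoolExecutor(max_workers=num_thread)

    def close(self):
        # Per the review thread: cancel everything still queued, but do not
        # block on currently running tasks (wait=False), leaving those to
        # interpreter teardown.
        self.executor.shutdown(wait=False, cancel_futures=True)
```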