8000 [SQS] Cancel pending and future tasks on shutdown by gregfurman · Pull Request #12228 · localstack/localstack · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[SQS] Cancel pending and future tasks on shutdown #12228

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 10, 2025

Conversation

gregfurman
Copy link
Contributor
@gregfurman gregfurman commented Feb 4, 2025

Motivation

Under high volumes of SQS receive/send/delete calls, the internal SQS ThreadPoolExecutors can execute tasks long after a service has completed their shutdown/stop lifecycle hooks.

That is, when we try stop the SQS service, currently running and future pending tasks are not cleared from the executor pool -- potentially being executed long after the service signalled to have stopped.

This PR ensures that all enqueued tasks are cancelled on shutdown.

Changes

  • Changed the CloudWatchDispatcher.shutdown and MessageMoveTaskManager.close methods to cancel all pending/future tasks that are currently enqueued for execution.

@gregfurman gregfurman added aws:sqs Amazon Simple Queue Service semver: patch Non-breaking changes which can be included in patch releases labels Feb 4, 2025
@gregfurman gregfurman requested review from dfangl and bentsku February 4, 2025 14:48
@gregfurman gregfurman self-assigned this Feb 4, 2025
Copy link
github-actions bot commented Feb 4, 2025

LocalStack Community integration with Pro

    2 files      2 suites   1h 15m 16s ⏱️
3 004 tests 2 873 ✅ 131 💤 0 ❌
3 006 runs  2 873 ✅ 133 💤 0 ❌

Results for commit d74089c.

♻️ This comment has been updated with latest results.

@gregfurman gregfurman marked this pull request as ready for review February 4, 2025 15:52
Comment on lines +15 to +16
if self.is_shutdown:
raise Empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason why we need this at the beginning here as well?

Copy link
Contributor Author
@gregfurman gregfurman Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just allows for an earlier exit if we enter the poll loop in the time it takes between shutdown() being called and get() being called. Just a very minor optimisation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are for example also not raising exceptions on timeouts < 0 in this case here, but that might not be all too important.

@@ -192,7 +194,7 @@ def __init__(self, num_thread: int = 3):
)

def shutdown(self):
self.executor.shutdown(wait=False)
self.executor.shutdown(wait=True, cancel_futures=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check if we get any issues when doing a wait=True here? I am fine with cancelling the futures, but this could block the rest of the shutdown quite early (depending on the shutdown order), in contrast to at the very end of the interpreter shutdown.

Copy link
Contributor Author
@gregfurman gregfurman Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed these dispatched jobs were running while the SQS provider was already supposed to have shut down + while others were in the process of shutting down.

The wait=True was to prevent this behaviour, meaning fewer exceptions being raised and no botocore retrying happening simultaneously to LS exiting.

If we're OK with the dispatched jobs running while other services are shutting down then then happy to make this false again since the cancel_futures behaviour is definitely more important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine, I am just worried about it blocking too long without a timeout after which we call it quits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. Let's allow the interpreter to tear these down instead. I'll change wait=False

@gregfurman gregfurman requested a review from dfangl February 7, 2025 14:39
Copy link
Member
@dfangl dfangl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is innocent enough, LGTM!

Comment on lines +15 to +16
if self.is_shutdown:
raise Empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are for example also not raising exceptions on timeouts < 0 in this case here, but that might not be all too important.

8000
@@ -192,7 +194,7 @@ def __init__(self, num_thread: int = 3):
)

def shutdown(self):
self.executor.shutdown(wait=False)
self.executor.shutdown(wait=True, cancel_futures=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine, I am just worried about it blocking too long without a timeout after which we call it quits.

@gregfurman gregfurman changed the title [SQS] Wait for running tasks to complete and cancel pending on shutdown [SQS] Cancel pending and future tasks on shutdown Feb 10, 2025
@gregfurman gregfurman merged commit afb3931 into master Feb 10, 2025
35 checks passed
@gregfurman gregfurman deleted the fix/sqs/hanging-threads branch February 10, 2025 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws:sqs Amazon Simple Queue Service semver: patch Non-breaking changes which can be included in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0