-
Notifications
You must be signed in to change notification settings - Fork 616
call terminate after the last prediction or after a timeout if shutting down #1843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
28ec85a
to
5db1cfe
Compare
@@ -198,8 +198,12 @@ async def inner() -> SetupResult: | |||
except Exception: | |||
logs.append(traceback.format_exc()) | |||
status = schema.Status.FAILED | |||
except asyncio.CancelledError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just for the log line?
python/cog/server/runner.py
Outdated
@@ -381,16 +394,25 @@ def shutdown(self) -> None: | |||
if self._child.is_alive(): | |||
self.log.info("child is alive during shutdown, sending Shutdown event") | |||
self._events.send(Shutdown()) | |||
asyncio.create_task(self.terminate_later()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense to me. Generally in this codebase shutdown
has meant "graceful shutdown" and terminate
has meant "stop what you're doing right now". It doesn't really make sense for the former to call the latter.
It's also more than a little hairy for this kind of stuff to be coordinated in Worker
rather than at the top level of the program as it is in main
(see my comment above).
9ac588c
to
adaf31c
Compare
…nate. also call terminate 10s after shutting down Signed-off-by: technillogue <technillogue@gmail.com>
…behavior of Server.stop, should_exit, force_exit, and app shutdown handler,
fe6f3c6
to
0abbe2a
Compare
* start with just changing Exception to BaseException to catch cancellation * add much more shutdown logging move runner.terminate into runner.shutdown after waiting for predictions to complete * move runner.terminate into shutdown, make it async, and document the behavior of Server.stop, should_exit, force_exit, and app shutdown handler, * fix tests --------- Signed-off-by: technillogue <technillogue@gmail.com>
this addresses the problem described in https://www.notion.so/replicate/Short-Term-Async-Fixes-9d11cfc15bb84878aedd9d8ddf31840d?pvs=4#8a86d46ada954ff5b509231224c24f2f
prior to 974675f9#diff-136cc6506e8aaea24081862f89e1cb778251b7726180fe237c8aaf560ae7ec69R315-R338, runner.shutdown() would call worker.terminate(). as far as I understand, neither in that branch nor in main is worker.shutdown() ever called. in that commit, I split runner.shutdown into runner.shutdown and runner.terminate, but never made anything call terminate. because I also removed the
should_exit = True
line, the uvicorn server would never exit, so the shutdown handler would eventually notice this and kill the whole process, so we never noticed that terminate was never called.runner.terminate does: