8000 call terminate after the last prediction or after a timeout if shutting down by technillogue · Pull Request #1843 · replicate/cog · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

call terminate after the last prediction or after a timeout if shutting down #1843

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 2, 2024

Conversation

technillogue
Copy link
Contributor
@technillogue technillogue commented Aug 1, 2024

this addresses the problem described in https://www.notion.so/replicate/Short-Term-Async-Fixes-9d11cfc15bb84878aedd9d8ddf31840d?pvs=4#8a86d46ada954ff5b509231224c24f2f

prior to 974675f9#diff-136cc6506e8aaea24081862f89e1cb778251b7726180fe237c8aaf560ae7ec69R315-R338, runner.shutdown() would call worker.terminate(). as far as I understand, neither in that branch nor in main is worker.shutdown() ever called. in that commit, I split runner.shutdown into runner.shutdown and runner.terminate, but never made anything call terminate. because I also removed the should_exit = True line, the uvicorn server would never exit, so the shutdown handler would eventually notice this and kill the whole process, so we never noticed that terminate was never called.

runner.terminate does:

  • cancel any predict tasks
  • mark us as defunct
  • if the child is alive, terminate it and then waitpid
  • close the events pipe
  • cancel the read events task

@technillogue technillogue requested review from a team and tempusfrangit August 1, 2024 20:01
@technillogue technillogue force-pushed the syl/call-terminate-after-shutdown branch 2 times, most recently from 28ec85a to 5db1cfe Compare August 1, 2024 20:48
@@ -198,8 +198,12 @@ async def inner() -> SetupResult:
except Exception:
logs.append(traceback.format_exc())
status = schema.Status.FAILED
except asyncio.CancelledError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just for the log line?

@@ -381,16 +394,25 @@ def shutdown(self) -> None:
if self._child.is_alive():
self.log.info("child is alive during shutdown, sending Shutdown event")
self._events.send(Shutdown())
asyncio.create_task(self.terminate_later())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. Generally in this codebase shutdown has meant "graceful shutdown" and terminate has meant "stop what you're doing right now". It doesn't really make sense for the former to call the latter.

It's also more than a little hairy for this kind of stuff to be coordinated in Worker rather than at the top level of the program as it is in main (see my comment above).

@technillogue technillogue force-pushed the syl/call-terminate-after-shutdown branch from 9ac588c to adaf31c Compare August 2, 2024 18:45
…nate. also call terminate 10s after shutting down

Signed-off-by: technillogue <technillogue@gmail.com>
…behavior of Server.stop, should_exit, force_exit, and app shutdown handler,
@technillogue technillogue force-pushed the syl/call-terminate-after-shutdown branch from fe6f3c6 to 0abbe2a Compare August 2, 2024 19:00
@technillogue technillogue merged commit 0976a05 into async Aug 2, 2024
9 checks passed
@technillogue technillogue deleted the syl/call-terminate-after-shutdown branch August 2, 2024 19:09
technillogue added a commit that referenced this pull request Aug 6, 2024
* start with just changing Exception to BaseException to catch cancellation
* add much more shutdown logging

move runner.terminate into runner.shutdown after waiting for predictions to complete

* move runner.terminate into shutdown, make it async, and document the behavior of Server.stop, should_exit, force_exit, and app shutdown handler,
* fix tests

---------

Signed-off-by: technillogue <technillogue@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
0