8000 [Serve.llm][P/D] Fix health check in prefill disagg by kouroshHakha · Pull Request #53937 · ray-project/ray · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[Serve.llm][P/D] Fix health check in prefill disagg #53937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 22, 2025

Conversation

kouroshHakha
Copy link
Contributor
@kouroshHakha kouroshHakha commented Jun 18, 2025

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@@ -466,25 +469,10 @@ async def __init__(

self.response_postprocessor = ResponsePostprocessor()

@property
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useless. I don't know why we had it. removing.

@@ -816,9 +816,9 @@ async def check_health(self) -> None:
raise RuntimeError(f"{type(self.engine)} does not support health check.")

try:
return await asyncio.wait_for(self.engine.check_health(), timeout=15)
await asyncio.wait_for(self.engine.check_health(), timeout=15)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the timeout time since ray serve has an adjustable timeout per deployment anyways.

@@ -160,13 +151,6 @@ async def _predict(
):
yield chunk

async def check_health(self) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These must be removed. In general the health check of a deployment is not bounded to the health check of its child deployments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @lk-chen fyi

@@ -232,12 +232,6 @@ async def _setup_handle_and_config_maps(

async def check_health(self):
await self._init_completed.wait()
await asyncio.gather(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing applies here.

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@@ -27,6 +27,7 @@ async def test_engine_metrics():
model="Qwen/Qwen2.5-0.5B-Instruct",
dtype="auto",
disable_log_stats=False,
enforce_eager=True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the tests faster

@kouroshHakha kouroshHakha marked this pull request as ready for review June 19, 2025 02:50
@kouroshHakha kouroshHakha requested a review from a team as a code owner June 19, 2025 02:50
@kouroshHakha kouroshHakha requested a review from eicherseiji June 19, 2025 02:52
@eicherseiji eicherseiji added the go add ONLY when ready to merge, run all tests label Jun 19, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
@@ -170,8 +170,6 @@ async def test_check_health(self, llm_config: LLMConfig):

await router.check_health()

assert server.check_health.remote.call_count == 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing router's health check has nothing to do with server's health check.

@kouroshHakha kouroshHakha merged commit 55b8ce9 into ray-project:master Jun 22, 2025
5 checks passed
minerharry pushed a commit to minerharry/ray that referenced this pull request Jun 27, 2025
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0