[core] Use GetResourceLoadRequest as a substitute liveness check by dayshah · Pull Request #52971 · ray-project/ray · GitHub

[core] Use GetResourceLoadRequest as a substitute liveness check #52971

Draft: dayshah wants to merge 5 commits into master from resource-as-liveness

dayshah (Contributor) commented May 13, 2025

Why are these changes needed?

The GCS sends a GetResourceLoad request to every alive node once per second. If a raylet receives this request, the GCS considers it alive, so the raylet can skip its CheckAlive call. In most cases the raylet should never need to send a CheckAlive unless the GCS event loop becomes very slow. This should also help scalability, since the GCS will have to handle fewer RPCs.

Context for why the maximum time between liveness checks should go back to 60 seconds, rather than 5, can be found in #52945.
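
The idea in the description can be sketched roughly as follows. This is only an illustration in Python, not the PR's code (Ray's raylet is written in C++): the class `LivenessTracker`, its method names, and the `max_silence_s` parameter are all hypothetical, and the 60-second value simply echoes the figure mentioned above without claiming how the PR applies it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LivenessTracker:
    """Hypothetical raylet-side helper; not Ray's actual NodeManager code."""

    # Assumed threshold: how long the raylet tolerates silence from the GCS
    # before falling back to an explicit CheckAlive.
    max_silence_s: float
    # Time of the most recent GetResourceLoad request, if any was received.
    last_gcs_contact_s: Optional[float] = None

    def on_get_resource_load(self, now_s: float) -> None:
        # Receiving this request implies the GCS still lists the node as alive.
        self.last_gcs_contact_s = now_s

    def should_send_check_alive(self, now_s: float) -> bool:
        # With no contact yet, there is no evidence of liveness to rely on.
        if self.last_gcs_contact_s is None:
            return True
        # Only fall back to CheckAlive when the GCS has been silent too long.
        return now_s - self.last_gcs_contact_s > self.max_silence_s


tracker = LivenessTracker(max_silence_s=60.0)  # illustrative value
print(tracker.should_send_check_alive(now_s=0.0))   # True: nothing received yet
tracker.on_get_resource_load(now_s=1.0)
print(tracker.should_send_check_alive(now_s=30.0))  # False: heard from GCS 29 s ago
print(tracker.should_send_check_alive(now_s=62.0))  # True: 61 s of silence
```

Passing the clock in as `now_s` keeps the sketch deterministic and easy to unit test, which fits the kind of NodeManager test discussed later in this thread.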

@dayshah dayshah marked this pull request as ready for review May 14, 2025 06:01
@dayshah dayshah requested review from jjyao, israbbani and edoakes May 14, 2025 06:04
@dayshah dayshah added the go add ONLY when ready to merge, run all tests label May 14, 2025
@dayshah dayshah force-pushed the resource-as-liveness branch from 9c15ee2 to 0ec78f7 Compare May 14, 2025 21:13
edoakes (Collaborator) commented May 15, 2025

testing plan @dayshah ?

dayshah (Contributor, Author) commented May 15, 2025

> testing plan @dayshah ?

I can write a node manager unit test to test this logic, but there is no NodeManager test that actually creates a NodeManager so there might be a little bit of work to actually set that up... 😃

edoakes (Collaborator) commented May 16, 2025

> I can write a node manager unit test to test this logic, but there is no NodeManager test that actually creates a NodeManager so there might be a little bit of work to actually set that up... 😃

let's do it 😈

@dayshah dayshah requested review from a team and removed request for a team May 16, 2025 21:28
@dayshah dayshah marked this pull request as draft May 16, 2025 21:28
@dayshah dayshah force-pushed the resource-as-liveness branch from 8d35d8f to 3aa358a Compare May 17, 2025 19:46
dayshah added 5 commits May 24, 2025 12:11
Signed-off-by: dayshah <dhyey2019@gmail.com>
@dayshah dayshah force-pushed the resource-as-liveness branch from 3aa358a to 85a075e Compare May 24, 2025 19:11

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 10, 2025
@dayshah dayshah removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 10, 2025
@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 25, 2025
@dayshah dayshah removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 25, 2025