-
Notifications
You must be signed in to change notification settings - Fork 6.5k
[core] Use GetResourceLoadRequest as a substitute liveness check #52971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
9c15ee2
to
0ec78f7
Compare
testing plan @dayshah ? |
I can write a node manager unit test to test this logic, but there is no NodeManager test that actually creates a NodeManager so there might be a little bit of work to actually set that up... 😃 |
let's do it 😈 |
8d35d8f
to
3aa358a
Compare
Signed-off-by: dayshah <dhyey2019@gmail.com>
Signed-off-by: dayshah <dhyey2019@gmail.com>
Signed-off-by: dayshah <dhyey2019@gmail.com>
3aa358a
to
85a075e
Compare
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
Why are these changes needed?
The GCS sends over GetResourceLoad every second to all alive nodes. If the raylet got this request, it means the gcs considers it to be alive and we can skip the CheckAlive. Most times the raylet should never need to send a check alive unless the gcs event loop gets super slow. This should help scalability too since the gcs will have to handle less rpc's.
Context for why the max time between liveness checks should be 60 seconds again, not 5 can be found here #52945.