8000 [refactor][operator]: make `RayStartParams` optional by davidxia · Pull Request #3202 · ray-project/kuberay · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[refactor][operator]: make RayStartParams optional #3202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 11, 2025

Conversation

davidxia
Copy link
Contributor
@davidxia davidxia commented Mar 17, 2025

in RayCluster CRD by marking it with // +optional.
Make it not serialized when empty with omitempty.

This change is backwards compatible on K8s clusters with existing
RayCluster CRD and RayCluster CRs.

  • RayClusters with non-empty RayStartParams will serialize the same as
    before this change.
  • RayClusters with empty RayStartParams will serialize the same as
    before this change: as an empty map, {} in YAML.
  • New behavior: we can kubectl create|apply RayClusters that lack
    RayStartParams in both head and worker groups. Previously the K8s API
    would return validation errors requiring this field to be present.
    When we get these RayClusters back with kubectl get raycluster -o yaml, there's no rayStartParams field present.

See Optional vs. Required in K8s API Conventions doc.

Optional fields have the following properties:

  • They have the +optional comment tag in Go.
  • They are a pointer type in the Go definition (e.g. AwesomeFlag *SomeFlag)
    or have a built-in nil value (e.g. maps and slices).
  • The API server should allow POSTing and PUTing a resource with this field
    unset.

The required changes to the RayCluster CRD YAML can be achieved with only
omitempty, but we also add // +optional as recommended by the doc above.

Signed-off-by: David Xia david@davidxia.com

@davidxia davidxia force-pushed the omitempty-raystartparams branch from a8efa58 to a1e53d9 Compare March 17, 2025 15:57
@davidxia davidxia changed the title addresses TODO in kubectl-plugin about not being able to set empty map [refactor][operator]: set omitempty on RayStartParams Mar 17, 2025
@davidxia davidxia force-pushed the omitempty-raystartparams branch 2 times, most recently from c9781e1 to 80eb350 Compare March 17, 2025 17:32
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the important changes are in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update tests

Comment on lines 89 to 149
// TODO: Look for better workaround/fixes for RayStartParams. Currently using `WithRayStartParams()` requires
// a non-empty map with valid key value pairs and will not populate the field with empty/nil values. This
// isn't ideal as it forces the generated RayCluster yamls to use those parameters.
headRayStartParams := map[string]string{
"dashboard-host": "0.0.0.0",
}
workerRayStartParams := map[string]string{
"metrics-export-port": "8080",
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove this now since the map can be empty

@davidxia davidxia force-pushed the omitempty-raystartparams branch 3 times, most recently from 98625b2 to 973af72 Compare March 21, 2025 12:20
@davidxia davidxia force-pushed the omitempty-raystartparams branch 13 times, most recently from aa3243a to 8b0650b Compare March 23, 2025 17:06
@davidxia davidxia changed the title [refactor][operator]: set omitempty on RayStartParams [refactor][operator]: make RayStartParams optional Mar 23, 2025
@davidxia davidxia force-pushed the omitempty-raystartparams branch from 8b0650b to 6359479 Compare March 23, 2025 19:47
Comment on lines 57 to 60
expectedWorkerRayStartParams := map[string]string{
"metrics-export-port": "8080",
}
maps.Copy(expectedWorkerRayStartParams, testRayClusterYamlObject.WorkerRayStartParams)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the hard-coded metrics-export-port now.

Comment on lines -184 to -346
rayStartParams:
metrics-export-port: "8080"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shows that when rayStartParams aren't set in the RayClusterApplyConfiguration, the field is absent from the serialized YAML.

@davidxia davidxia force-pushed the omitempty-raystartparams branch 3 times, most recently from 05f6a13 to 557b90a Compare March 24, 2025 15:36
@davidxia
Copy link
Contributor Author

Is test-e2e-rayservice-nightly-operator expected to fail in PRs like this where the CRD changes?

hm this is still failing 🤔

rueian
rueian approved these changes 8000 Apr 1, 2025
@rueian
Copy link
Contributor
rueian commented Apr 1, 2025

The changes look good to me. I got the failed test from logs:

--- RUN   TestAutoscalingRayService
[2025-03-31T19:12:16Z]     rayservice_ha_test.go:82: [2025-03-31T19:06:25Z] Created ConfigMap test-ns-rtl5s/locust-runner-script successfully
[2025-03-31T19:12:16Z]     rayservice_ha_test.go:85: [2025-03-31T19:06:25Z] Successfully applied testdata/rayservice.autoscaling.yaml to namespace test-ns-rtl5s
[2025-03-31T19:12:16Z]     rayservice_ha_test.go:88: [2025-03-31T19:06:25Z] Created RayService test-ns-rtl5s/test-rayservice successfully
[2025-03-31T19:12:16Z]     rayservice_ha_test.go:90: [2025-03-31T19:06:25Z] Waiting for RayService test-ns-rtl5s/test-rayservice to running
[2025-03-31T19:12:16Z]     rayservice_ha_test.go:92: 
[2025-03-31T19:12:16Z]         Timed out after 300.001s.
[2025-03-31T19:12:16Z]         Expected
[2025-03-31T19:12:16Z]             <bool>: false
[2025-03-31T19:12:16Z]         to be true
[2025-03-31T19:12:16Z]     test.go:112: [2025-03-31T19:11:25Z] Retrieving Pod Container test-ns-rtl5s/test-rayservice-8dspv-head/ray-head logs
[2025-03-31T19:12:16Z]     test.go:100: [2025-03-31T19:11:25Z] Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
[2025-03-31T19:12:16Z]     test.go:103: [2025-03-31T19:11:25Z] Output directory has been created at: /tmp/TestAutoscalingRayService3172182483/001
[2025-03-31T19:12:16Z]     test.go:112: [2025-03-31T19:11:25Z] Retrieving Pod Container test-ns-rtl5s/test-rayservice-8dspv-head/autoscaler logs
[2025-03-31T19:12:16Z] ### FAIL: TestAutoscalingRayService (300.24s)

Hope that logs help.

@davidxia davidxia force-pushed the omitempty-raystartparams branch from cbe7c1f to dd7033e Compare April 2, 2025 23:25
davidxia added a commit to davidxia/ray that referenced this pull request Apr 3, 2025
We are planning on making `rayStartParams` optional
and not present in the RayCluster K8s custom resource output
if it's empty. ray-project/kuberay#3202

We need to make its autoscaler code not error when it's absent.
This change is also needed for the integration tests in that PR to pass.

Signed-off-by: David Xia <david@davidxia.com>
@davidxia
Copy link
Contributor Author
davidxia commented Apr 3, 2025

Need a review on ray-project/ray#51954. It should fix some of the buildkite/ray-ecosystem-ci-kuberay-ci/test-e2e-rayservice-nightly-operator failures

davidxia added a commit to davidxia/ray that referenced this pull request Apr 3, 2025
We are planning on making `rayStartParams` optional
and not present in the RayCluster K8s custom resource output
if it's empty. ray-project/kuberay#3202

We need to make its autoscaler code not error when it's absent.
This change is also needed for the integration tests in that PR to pass.

Signed-off-by: David Xia <david@davidxia.com>
@davidxia
Copy link
Contributor Author
davidxia commented Apr 4, 2025

PR is blocked until ray releases a new version with ray-project/ray#51954 and we upgrade to that version in this repo

@davidxia
Copy link
Contributor Author
davidxia commented May 5, 2025

depends on #3547

@kevin85421
Copy link
Member

@davidxia, how about adding a key-value pair so that the CI tests won’t fail and I can merge the PR? I’ll bump the Ray version for the v1.4.0 release, which is a pretty heavyweight process. We can remove the key-value pair 8000 after bumping the version but before the release.

@kevin85421 kevin85421 mentioned this pull request May 9, 2025
4 tasks
davidxia added 2 commits May 9, 2025 15:53
used in `kubectl ray create cluster`.

Signed-off-by: David Xia <david@davidxia.com>
in `RayCluster` CRD by marking it with `// +optional`.
Make it not serialized when empty with `omitempty`.

This change is backwards compatible on K8s clusters with existing
RayCluster CRD and RayCluster CRs.

* RayClusters with non-empty RayStartParams will serialize the same as
  before this change.
* RayClusters with empty RayStartParams will serialize the same as
  before this change: as an empty map, `{}` in YAML.
* New behavior: we can `kubectl create|apply` RayClusters that lack
  RayStartParams in both head and worker groups. Previously the K8s API
  would return validation errors requiring this field to be present.
  When we get these RayClusters back with `kubectl get raycluster -o
  yaml`, there's no `rayStartParams` field present.

See [Optional vs. Required][1] in K8s API Conventions doc.

> Optional fields have the following properties:

> * They have the `+optional` comment tag in Go.
> * They are a pointer type in the Go definition (e.g. AwesomeFlag *SomeFlag)
>   or have a built-in nil value (e.g. maps and slices).
> * The API server should allow POSTing and PUTing a resource with this field
>   unset.

The required changes to the RayCluster CRD YAML can be achieved with only
`omitempty`, but we also add `// +optional` as recommended by the doc above.

[1]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#optional-vs-required

Signed-off-by: David Xia <david@davidxia.com>
@davidxia davidxia force-pushed the omitempty-raystartparams branch from dd7033e to e940dbe Compare May 9, 2025 20:45
@davidxia
Copy link
Contributor Author
davidxia commented May 9, 2025

@kevin85421 makes sense, updated in e940dbe

628C

@kevin85421
Copy link
Member

Does this mean that the Autoscaler will fail if users specify rayStartParams: {} in the YAML when using Ray versions older than 2.45.0?

@davidxia
Copy link
Contributor Author
davidxia commented May 9, 2025

Good point. Yes. We should make this backwards compatible. I’ll think a bit more on this.

* `RayClusters` with `rayStartParams: {}` will continue to be serialized with
  the empty map.
* `RayClusters` without `rayStartParams` will be serialized without the field.

This prevents a backwards-incompatible change with RayClusters that use the
autoscaler with Ray versions earlier than 2.45.0 which lack this change
ray-project/ray#51954.

Signed-off-by: David Xia <david@davidxia.com>
@davidxia davidxia force-pushed the omitempty-raystartparams branch from e940dbe to b1c0d28 Compare May 10, 2025 14:43
@davidxia
Copy link
Contributor Author

@kevin85421 I think b1c0d28 makes backwards compatible.

@kevin85421 kevin85421 merged commit bf8a931 into ray-project:master May 11, 2025
24 checks passed
@kevin85421
Copy link
Member

Open a follow up issue: #3580

davidxia added a commit to davidxia/kuberay that referenced this pull request May 13, 2025
No longer needed after ray-project#3202

Signed-off-by: David Xia <david@davidxia.com>
davidxia added a commit to davidxia/kuberay that referenced this pull request May 13, 2025
No longer needed after ray-project#3202

Signed-off-by: David Xia <david@davidxia.com>
davidxia added a commit to davidxia/kuberay that referenced this pull request May 13, 2025
No longer needed after ray-project#3202

Signed-off-by: David Xia <david@davidxia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0