[Doc][KubeRay] verl example #54114
Conversation
Pull Request Overview
This PR adds documentation for integrating the open-source verl framework with KubeRay for RLHF training of large language models.
- Introduces a new example walkthrough ("verl-post-training.md") detailing step-by-step instructions (a rough job-submission sketch follows this list).
- Updates the examples index ("examples.md") to reference the new example.
- Expands the accepted vocabulary in Vale's configuration to include new terms related to verl and RLHF.
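For orientation, here is a minimal, hypothetical sketch (not taken from the new guide) of how an RLHF training run like the one the example describes could be submitted to a KubeRay-managed cluster with the Ray Jobs SDK. The dashboard address, verl entrypoint, flags, and data path below are assumptions; the actual walkthrough may use different commands.

```python
# Hypothetical sketch only -- the guide's actual steps may differ.
# Assumes the KubeRay cluster's dashboard is port-forwarded to localhost:8265
# and that the verl PPO entrypoint and flags below match the installed verl version.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    # Placeholder entrypoint and data path; check the guide for the real command.
    entrypoint="python3 -m verl.trainer.main_ppo data.train_files=/data/train.parquet",
    runtime_env={"pip": ["verl"]},
)
print("Submitted verl training job:", job_id)
```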
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| doc/source/cluster/kubernetes/examples/verl-post-training.md | New guide demonstrating RLHF training with verl on KubeRay. |
| doc/source/cluster/kubernetes/examples.md | Updated examples index to include the new verl example. |
| .vale/styles/config/vocabularies/General/accept.txt | Added new accepted vocabulary terms ("open-source", "RLHF", "verl"). |
Comments suppressed due to low confidence (1)
doc/source/cluster/kubernetes/examples.md:36
- The reference identifier 'kuberay-verl' may be confusing given that the example file is named 'verl-post-training.md'. Consider aligning the naming for consistency.
- {ref}`kuberay-verl`
Some small questions and comments, but generally LGTM
@@ -0,0 +1,150 @@
(kuberay-verl)=
# Reinforcement Learning with Human Feedback (RLHF) for LLMs with verl on KubeRay
Maybe a naive question, but where does the human feedback come into this example? Is there an interactive step that I'm missing where the user is rating the output?
"human feedback" happens during training the reward model. You can read https://arxiv.org/pdf/2203.02155 for more details.
Ahh, makes sense. So the human feedback is already built into the model, not active at this stage. Thanks for enlightening me!
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.