[Doc][KubeRay] verl example by kevin85421 · Pull Request #54114 · ray-project/ray · GitHub

[Doc][KubeRay] verl example #54114


Merged: 5 commits, Jun 27, 2025

Conversation

kevin85421
Member

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
@kevin85421 kevin85421 marked this pull request as ready for review June 26, 2025 06:09
@Copilot Copilot AI review requested due to automatic review settings June 26, 2025 06:09
@kevin85421 kevin85421 requested review from pcmoritz and a team as code owners June 26, 2025 06:09
Contributor
@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds documentation for integrating the open-source verl framework with KubeRay for RLHF training of large language models.

  • Introduces a new example walkthrough ("verl-post-training.md") detailing step-by-step instructions.
  • Updates the examples index ("examples.md") to reference the new example.
  • Expands the accepted vocabulary in Vale's configuration to include new terms related to verl and RLHF.

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| doc/source/cluster/kubernetes/examples/verl-post-training.md | New guide demonstrating RLHF training with verl on KubeRay. |
| doc/source/cluster/kubernetes/examples.md | Updated examples index to include the new verl example. |
| .vale/styles/config/vocabularies/General/accept.txt | Added new accepted vocabulary terms ("open-source", "RLHF", "verl"). |

Comments suppressed due to low confidence (1)

doc/source/cluster/kubernetes/examples.md:36

  • The reference identifier 'kuberay-verl' may be confusing given that the example file is named 'verl-post-training.md'. Consider aligning the naming for consistency.
- {ref}`kuberay-verl`

Contributor
@dstrodtman dstrodtman left a comment

Some small questions and comments, but generally LGTM

@@ -0,0 +1,150 @@
(kuberay-verl)=
# Reinforcement Learning with Human Feedback (RLHF) for LLMs with verl on KubeRay
Contributor

Suggested change
# Reinforcement Learning with Human Feedback (RLHF) for LLMs with verl on KubeRay
# Reinforcement Learning with Human Feedback (RLHF) for LLMs with verl on KubeRay

Maybe a naive question, but where does the human feedback come into this example? Is there an interactive step that I'm missing where the user is rating the output?

Member Author

The "human feedback" comes in when training the reward model. See https://arxiv.org/pdf/2203.02155 for more details.

Contributor

Ahh, makes sense. So the human feedback is already built into the model, not active at this stage. Thanks for enlightening me!
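
To make the point in this thread concrete, here is a minimal sketch of where human preference labels typically enter: a pairwise comparison loss for a reward model, of the kind described in the paper linked above. The function name and the scalar inputs are hypothetical and chosen for illustration only; this is not verl's API and not code from this PR.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper: `reward_chosen` is the reward model's score for the
    response a human labeler preferred, `reward_rejected` the score for the
    response the labeler ranked lower.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign of the margin
    # so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the model agrees with the human ranking
# and large when it disagrees.
agree = pairwise_preference_loss(2.0, 0.0)
disagree = pairwise_preference_loss(0.0, 2.0)
print(agree < disagree)
```

Once a reward model has been trained this way, the RL stage can score generated responses with it, so no labeler needs to be in the loop at that point, which matches the conclusion reached in the comment above.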

kevin85421 and others added 3 commits June 26, 2025 11:47
Co-authored-by: Douglas Strodtman <douglas@anyscale.com>
Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Signed-off-by: Kai-Hsun Chen <kaihsun@anyscale.com>
@kevin85421 kevin85421 added the go add ONLY when ready to merge, run all tests label Jun 26, 2025
@kevin85421
Member Author

cc @jjyao @edoakes would you mind merging this PR? Thanks.

@edoakes edoakes merged commit cd6bbbb into ray-project:master Jun 27, 2025
6 checks passed
Labels
go add ONLY when ready to merge, run all tests
3 participants