[Doc][KubeRay] verl example #54114
Conversation
Pull Request Overview
This PR adds documentation for integrating the open-source verl framework with KubeRay for RLHF training of large language models.
- Introduces a new example walkthrough ("verl-post-training.md") detailing step-by-step instructions (a rough job-submission sketch follows this list).
- Updates the examples index ("examples.md") to reference the new example.
- Expands the accepted vocabulary in Vale's configuration to include new terms related to verl and RLHF.
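For orientation, here is a minimal, hypothetical sketch (not taken from the new guide) of how an RLHF training run like the one the example describes could be submitted to a KubeRay-managed cluster with the Ray Jobs SDK. The dashboard address, verl entrypoint, flags, and data path below are assumptions; the actual walkthrough may use different commands.

```python
# Hypothetical sketch only -- the guide's actual steps may differ.
# Assumes the KubeRay cluster's dashboard is port-forwarded to localhost:8265
# and that the verl PPO entrypoint and flags below match the installed verl version.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    # Placeholder entrypoint and data path; check the guide for the real command.
    entrypoint="python3 -m verl.trainer.main_ppo data.train_files=/data/train.parquet",
    runtime_env={"pip": ["verl"]},
)
print("Submitted verl training job:", job_id)
```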
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| doc/source/cluster/kubernetes/examples/verl-post-training.md | New guide demonstrating RLHF training with verl on KubeRay. |
| doc/source/cluster/kubernetes/examples.md | Updated examples index to include the new verl example. |
| .vale/styles/config/vocabularies/General/accept.txt | Added new accepted vocabulary terms ("open-source", "RLHF", "verl"). |
Comments suppressed due to low confidence (1)
doc/source/cluster/kubernetes/examples.md:36
- The reference identifier 'kuberay-verl' may be confusing given that the example file is named 'verl-post-training.md'. Consider aligning the naming for consistency.
- {ref}`kuberay-verl`
Some small questions and comments, but generally LGTM
@@ -0,0 +1,150 @@
(kuberay-verl)=
# Reinforcement Learning with Human Feedback (RLHF) for LLMs with verl on KubeRay
Maybe a naive question, but where does the human feedback come into this example? Is there an interactive step that I'm missing where the user is rating the output?
"human feedback" happens during training the reward model. You can read https://arxiv.org/pdf/2203.02155 for more details.
Ahh, makes sense. So the human feedback is already built into the model, not active at this stage. Thanks for enlightening me!
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.