-
Notifications
You must be signed in to change notification settings - Fork 25
fix: fix accumulation of loss across microbatches #266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
terrykong
reviewed
Apr 25, 2025
terrykong
reviewed
Apr 25, 2025
terrykong
reviewed
Apr 25, 2025
terrykong
reviewed
Apr 25, 2025
terrykong
reviewed
Apr 25, 2025
terrykong
previously approved these changes
May 5, 2025
SahilJain314
previously approved these changes
May 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the many repeats of masked_mean for token vs seq-level may be candidates for simplification, but I'm happy with this.
terrykong
previously approved these changes
May 7, 2025
terrykong
previously approved these changes
May 7, 2025
commit c9b29c8 Author: Terry Kong <terryk@nvidia.com> Date: Wed May 7 15:13:49 2025 -0700 fix tests Signed-off-by: Terry Kong <terryk@nvidia.com> commit 2834320 Author: ashors1 <ashors@nvidia.com> Date: Tue May 6 21:54:56 2025 -0700 fix hyperlink Signed-off-by: ashors1 <ashors@nvidia.com> commit 702786a Author: ashors1 <ashors@nvidia.com> Date: Tue May 6 20:44:11 2025 -0700 fix config and failing tests Signed-off-by: ashors1 <ashors@nvidia.com> commit f093628 Author: ashors1 <ashors@nvidia.com> Date: Tue May 6 20:17:37 2025 -0700 add loss functions doc to index Signed-off-by: ashors1 <ashors@nvidia.com> commit 68b4122 Author: Terry Kong <terryk@nvidia.com> Date: Tue May 6 18:50:34 2025 -0700 bump the recipes Signed-off-by: Terry Kong <terryk@nvidia.com> commit 6b7474f Author: Terry Kong <terryk@nvidia.com> Date: Tue May 6 18:46:22 2025 -0700 bumped all convergence runs Signed-off-by: Terry Kong <terryk@nvidia.com> commit 4fc5601 Author: Terry Kong <terryk@nvidia.com> Date: Tue May 6 18:21:03 2025 -0700 fix test Signed-off-by: Terry Kong <terryk@nvidia.com> commit b01c149 Merge: 742ba42 6fb1c5f Author: ashors1 <ashors@nvidia.com> Date: Tue May 6 17:50:37 2025 -0700 Merge branch 'main' of github.com:NVIDIA/nemo-rl into ashors/sft-mb-loss-bug commit 742ba42 Author: ashors1 <ashors@nvidia.com> Date: Tue May 6 17:48:21 2025 -0700 remove old code Signed-off-by: ashors1 <ashors@nvidia.com> commit 20e725f Author: ashors1 <ashors@nvidia.com> Date: Fri May 2 10:30:57 2025 -0700 cleanup, fix logging of normalization factor Signed-off-by: ashors1 <ashors@nvidia.com> commit 3427775 Author: ashors1 <ashors@nvidia.com> Date: Thu May 1 18:15:41 2025 -0700 move metric to cpu Signed-off-by: ashors1 <ashors@nvidia.com> commit 3480271 Author: ashors1 <ashors@nvidia.com> Date: Thu May 1 18:03:14 2025 -0700 scale only the loss to avoid erroneously scaling metrics logged to wandb Signed-off-by: ashors1 <ashors@nvidia.com> commit a4196a9 Author: ashors1 <ashors@nvidia.com> Date: Thu May 1 14:53:30 2025 -0700 make loss type configurable from yaml Signed-off-by: ashors1 <ashors@nvidia.com> commit fd5207b Merge: 868f94b ebb46c3 Author: ashors1 <ashors@nvidia.com> Date: Thu May 1 14:49:12 2025 -0700 Merge branch 'main' of github.com:NVIDIA/nemo-rl into ashors/sft-mb-loss-bug commit 868f94b Author: ashors1 <ashors@nvidia.com> Date: Wed Apr 30 19:08:39 2025 -0700 bug fixes Signed-off-by: ashors1 <ashors@nvidia.com> commit 6931ee1 Merge: 25a93b0 5069 8000 10a Author: ashors1 <ashors@nvidia.com> Date: Tue Apr 29 14:53:40 2025 -0700 Merge branch 'main' of github.com:NVIDIA/reinforcer into ashors/sft-mb-loss-bug commit 25a93b0 Author: ashors1 <ashors@nvidia.com> Date: Tue Apr 29 14:45:36 2025 -0700 address comments Signed-off-by: ashors1 <ashors@nvidia.com> commit 5f0a624 Author: ashors1 <ashors@nvidia.com> Date: Tue Apr 29 08:52:11 2025 -0700 fix tests Signed-off-by: ashors1 <ashors@nvidia.com> commit 64a094f Author: ashors1 <ashors@nvidia.com> Date: Mon Apr 28 17:03:19 2025 -0700 fix metric logging with multiple global batches per rollout Signed-off-by: ashors1 <ashors@nvidia.com> commit 671f464 Author: ashors1 <ashors@nvidia.com> Date: Mon Apr 28 13:43:34 2025 -0700 fix normalization when rollout bs > gbs Signed-off-by: ashors1 <ashors@nvidia.com> commit 51f9e5c Author: ashors1 <ashors@nvidia.com> Date: Mon Apr 28 09:54:41 2025 -0700 add doctest Signed-off-by: ashors1 <ashors@nvidia.com> commit 43e51f2 Merge: c3266b1 09f5416 Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 25 16:18:16 2025 -0700 Merge branch 'main' of github.com:NVIDIA/reinforcer into ashors/sft-mb-loss-bug commit c3266b1 Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 25 13:18:37 2025 -0700 fix some typos Signed-off-by: ashors1 <ashors@nvidia.com> commit 83ed569 Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 25 13:13:56 2025 -0700 add documentation Signed-off-by: ashors1 <ashors@nvidia.com> commit 2eeb9cc Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 25 12:42:56 2025 -0700 cleanup pt 2 Signed-off-by: ashors1 <ashors@nvidia.com> commit 2a38649 Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 25 12:42:05 2025 -0700 cleanup Signed-off-by: ashors1 <ashors@nvidia.com> commit 65c48c3 Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 25 12:32:33 2025 -0700 fix metric logging Signed-off-by: ashors1 <ashors@nvidia.com> commit d4fe6d7 Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 25 11:56:23 2025 -0700 fix tests Signed-off-by: ashors1 <ashors@nvidia.com> commit b054dff Author: ashors1 <ashors@nvidia.com> Date: Thu Apr 24 17:57:11 2025 -0700 fix accumulation over microbatches to account for cases where rollout batch size > gbs Signed-off-by: ashors1 <ashors@nvidia.com> commit cf11dde Author: ashors1 <ashors@nvidia.com> Date: Thu Apr 24 12:53:08 2025 -0700 consider loss multiplier in token count, small fixes Signed-off-by: ashors1 <ashors@nvidia.com> commit 7f22a5f Merge: 7f99933 c8f0a01 Author: ashors1 <ashors@nvidia.com> Date: Thu Apr 24 12:18:48 2025 -0700 Merge branch 'main' of github.com:NVIDIA/reinforcer into ashors/sft-mb-loss-bug commit 7f99933 Author: ashors1 <ashors@nvidia.com> Date: Thu Apr 24 10:39:53 2025 -0700 fix tests Signed-off-by: ashors1 <ashors@nvidia.com> commit 840fda7 Author: ashors1 <ashors@nvidia.com> Date: Wed Apr 23 20:52:16 2025 -0700 fix tests Signed-off-by: ashors1 <ashors@nvidia.com> commit 20eb1ef Author: ashors1 <ashors@nvidia.com> Date: Wed Apr 23 16:54:32 2025 -0700 small fixes Signed-off-by: ashors1 <ashors@nvidia.com> commit ffc5245 Author: ashors1 <ashors@nvidia.com> Date: Wed Apr 23 16:51:28 2025 -0700 fixes, update GRPO Signed-off-by: ashors1 <ashors@nvidia.com> commit a867283 Author: ashors1 <ashors@nvidia.com> Date: Wed Apr 23 12:27:03 2025 -0700 fixes Signed-off-by: ashors1 <ashors@nvidia.com> commit 7716cd7 Author: ashors1 <ashors@nvidia.com> Date: Fri Apr 18 16:05:51 2025 -0700 wip fix for accumulating sft loss across microbathes Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: Terry Kong <terryk@nvidia.com>
c9b29c8
to
c881174
Compare
terrykong
previously approved these changes
May 7, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
terrykong
previously approved these changes
May 8, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
terrykong
previously approved these changes
May 9, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
terrykong
approved these changes
May 9, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
closes #212
closes #213
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information