8000 fix: fix accumulation of loss across microbatches by ashors1 · Pull Request #266 · NVIDIA/NeMo-RL · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix: fix accumulation of loss across microbatches #266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 9, 2025

Conversation

ashors1
Copy link
Collaborator
@ashors1 ashors1 commented Apr 25, 2025

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

closes #212
closes #213

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 25, 2025
@ashors1 ashors1 marked this pull request as ready for review May 5, 2025 21:42
terrykong
terrykong previously approved these changes May 5, 2025
SahilJain314
SahilJain314 previously approved these changes May 7, 2025
Copy link
Collaborator
@SahilJain314 SahilJain314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the many repeats of masked_mean for token vs seq-level may be candidates for simplification, but I'm happy with this.

@ashors1 ashors1 dismissed stale reviews from SahilJain314 and terrykong via 742ba42 May 7, 2025 00:48
@terrykong terrykong enabled auto-merge May 7, 2025 02:56
terrykong
terrykong previously approved these changes May 7, 2025
@terrykong terrykong added this pull request to the merge queue May 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 7, 2025
terrykong
terrykong previously approved these changes May 7, 2025
@terrykong terrykong added this pull request to the merge queue May 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 7, 2025
@ashors1 ashors1 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels May 7, 2025
commit c9b29c8
Author: Terry Kong <terryk@nvidia.com>
Date:   Wed May 7 15:13:49 2025 -0700

    fix tests

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 2834320
Author: ashors1 <ashors@nvidia.com>
Date:   Tue May 6 21:54:56 2025 -0700

    fix hyperlink

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 702786a
Author: ashors1 <ashors@nvidia.com>
Date:   Tue May 6 20:44:11 2025 -0700

    fix config and failing tests

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit f093628
Author: ashors1 <ashors@nvidia.com>
Date:   Tue May 6 20:17:37 2025 -0700

    add loss functions doc to index

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 68b4122
Author: Terry Kong <terryk@nvidia.com>
Date:   Tue May 6 18:50:34 2025 -0700

    bump the recipes

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 6b7474f
Author: Terry Kong <terryk@nvidia.com>
Date:   Tue May 6 18:46:22 2025 -0700

    bumped all convergence runs

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 4fc5601
Author: Terry Kong <terryk@nvidia.com>
Date:   Tue May 6 18:21:03 2025 -0700

    fix test

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit b01c149
Merge: 742ba42 6fb1c5f
Author: ashors1 <ashors@nvidia.com>
Date:   Tue May 6 17:50:37 2025 -0700

    Merge branch 'main' of github.com:NVIDIA/nemo-rl into ashors/sft-mb-loss-bug

commit 742ba42
Author: ashors1 <ashors@nvidia.com>
Date:   Tue May 6 17:48:21 2025 -0700

    remove old code

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 20e725f
Author: ashors1 <ashors@nvidia.com>
Date:   Fri May 2 10:30:57 2025 -0700

    cleanup, fix logging of normalization factor

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 3427775
Author: ashors1 <ashors@nvidia.com>
Date:   Thu May 1 18:15:41 2025 -0700

    move metric to cpu

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 3480271
Author: ashors1 <ashors@nvidia.com>
Date:   Thu May 1 18:03:14 2025 -0700

    scale only the loss to avoid erroneously scaling metrics logged to wandb

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit a4196a9
Author: ashors1 <ashors@nvidia.com>
Date:   Thu May 1 14:53:30 2025 -0700

    make loss type configurable from yaml

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit fd5207b
Merge: 868f94b ebb46c3
Author: ashors1 <ashors@nvidia.com>
Date:   Thu May 1 14:49:12 2025 -0700

    Merge branch 'main' of github.com:NVIDIA/nemo-rl into ashors/sft-mb-loss-bug

commit 868f94b
Author: ashors1 <ashors@nvidia.com>
Date:   Wed Apr 30 19:08:39 2025 -0700

    bug fixes

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 6931ee1
Merge: 25a93b0 5069
8000
10a
Author: ashors1 <ashors@nvidia.com>
Date:   Tue Apr 29 14:53:40 2025 -0700

    Merge branch 'main' of github.com:NVIDIA/reinforcer into ashors/sft-mb-loss-bug

commit 25a93b0
Author: ashors1 <ashors@nvidia.com>
Date:   Tue Apr 29 14:45:36 2025 -0700

    address comments

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 5f0a624
Author: ashors1 <ashors@nvidia.com>
Date:   Tue Apr 29 08:52:11 2025 -0700

    fix tests

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 64a094f
Author: ashors1 <ashors@nvidia.com>
Date:   Mon Apr 28 17:03:19 2025 -0700

    fix metric logging with multiple global batches per rollout

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 671f464
Author: ashors1 <ashors@nvidia.com>
Date:   Mon Apr 28 13:43:34 2025 -0700

    fix normalization when rollout bs > gbs

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 51f9e5c
Author: ashors1 <ashors@nvidia.com>
Date:   Mon Apr 28 09:54:41 2025 -0700

    add doctest

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 43e51f2
Merge: c3266b1 09f5416
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 25 16:18:16 2025 -0700

    Merge branch 'main' of github.com:NVIDIA/reinforcer into ashors/sft-mb-loss-bug

commit c3266b1
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 25 13:18:37 2025 -0700

    fix some typos

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 83ed569
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 25 13:13:56 2025 -0700

    add documentation

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 2eeb9cc
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 25 12:42:56 2025 -0700

    cleanup pt 2

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 2a38649
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 25 12:42:05 2025 -0700

    cleanup

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 65c48c3
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 25 12:32:33 2025 -0700

    fix metric logging

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit d4fe6d7
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 25 11:56:23 2025 -0700

    fix tests

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit b054dff
Author: ashors1 <ashors@nvidia.com>
Date:   Thu Apr 24 17:57:11 2025 -0700

    fix accumulation over microbatches to account for cases where rollout batch size > gbs

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit cf11dde
Author: ashors1 <ashors@nvidia.com>
Date:   Thu Apr 24 12:53:08 2025 -0700

    consider loss multiplier in token count, small fixes

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 7f22a5f
Merge: 7f99933 c8f0a01
Author: ashors1 <ashors@nvidia.com>
Date:   Thu Apr 24 12:18:48 2025 -0700

    Merge branch 'main' of github.com:NVIDIA/reinforcer into ashors/sft-mb-loss-bug

commit 7f99933
Author: ashors1 <ashors@nvidia.com>
Date:   Thu Apr 24 10:39:53 2025 -0700

    fix tests

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 840fda7
Author: ashors1 <ashors@nvidia.com>
Date:   Wed Apr 23 20:52:16 2025 -0700

    fix tests

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 20eb1ef
Author: ashors1 <ashors@nvidia.com>
Date:   Wed Apr 23 16:54:32 2025 -0700

    small fixes

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit ffc5245
Author: ashors1 <ashors@nvidia.com>
Date:   Wed Apr 23 16:51:28 2025 -0700

    fixes, update GRPO

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit a867283
Author: ashors1 <ashors@nvidia.com>
Date:   Wed Apr 23 12:27:03 2025 -0700

    fixes

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 7716cd7
Author: ashors1 <ashors@nvidia.com>
Date:   Fri Apr 18 16:05:51 2025 -0700

    wip fix for accumulating sft loss across microbathes

    Signed-off-by: ashors1 <ashors@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong force-pushed the ashors/sft-mb-loss-bug branch from c9b29c8 to c881174 Compare May 7, 2025 22:33
Signed-off-by: Terry Kong <terryk@nvidia.com>
terrykong
terrykong previously approved these changes May 7, 2025
@terrykong terrykong added this pull request to the merge queue May 7, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch May 7, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels May 8, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels May 8, 2025
@terrykong terrykong enabled auto-merge May 8, 2025 21:32
terrykong
terrykong previously approved these changes May 8, 2025
@terrykong terrykong added CI:docs Run doctest and removed CI:L1 Run doctests, unit tests, and functional tests labels May 9, 2025
@terrykong terrykong added this pull request to the merge queue May 9, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch May 9, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
@terrykong terrykong enabled auto-merge May 9, 2025 18:26
terrykong
terrykong previously approved these changes May 9, 2025
@terrykong terrykong added this pull request to the merge queue May 9, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 9, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
@terrykong terrykong added this pull request to the merge queue May 9, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 9, 2025
@terrykong terrykong added this pull request to the merge queue May 9, 2025
Merged via the queue into main with commit af9f6e8 May 9, 2025
13 checks passed
@terrykong terrykong deleted the ashors/sft-mb-loss-bug branch May 9, 2025 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI:docs Run doctest documentation Improvements or additions to documentation r0.2.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix gradient accumulation bug in SFT Correctly handle loss normalization over microbatches
3 participants
0