fix: Fixed max seqlen not respected correctly by SahilJain314 · Pull Request #299 · NVIDIA/NeMo-RL · GitHub

fix: Fixed max seqlen not respected correctly #299

Merged: 2 commits into main from sahilj/multiturn_seqlen_fix, Apr 30, 2025

SahilJain314 (Collaborator)

No description provided.

Signed-off-by: Sahil Jain <sahilj@nvidia.com>
parthchadha previously approved these changes Apr 30, 2025
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
SahilJain314 added this pull request to the merge queue Apr 30, 2025
Merged via the queue into main with commit 04f30bb Apr 30, 2025
12 checks passed
SahilJain314 deleted the sahilj/multiturn_seqlen_fix branch April 30, 2025 20:20
terrykong added a commit that referenced this pull request May 1, 2025
commit ebb46c3
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 30 15:03:46 2025 -0700

    fix: fix dtype of empty `token_ids` for consistency (#290)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit cf8f045
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 30 15:03:19 2025 -0700

    chore: Remove outdated comment in DPO config (#293)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 04f30bb
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Wed Apr 30 12:19:47 2025 -0700

    fix: Fixed max seqlen not respected correctly (#299)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit daac5d9
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 29 17:30:05 2025 -0700

    chore: Remove online hf checkpointing (#285)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 3cd8be8
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Tue Apr 29 15:18:37 2025 -0700

    feat: Remove 'last 100' hack for math verifier (#287)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Co-authored-by: Terry Kong <terryk@nvidia.com>

commit 506910a
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 29 11:29:22 2025 -0700

    test: add a test that checks if recipes can be merged into the base config (#288)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit af43261
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 29 09:18:14 2025 -0700

    chore: add isort rules and pyflakes in ruff/precommit (#291)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8b0837c
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 29 23:57:41 2025 +0800

    ci: add eval functional test (#269)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit 68beb6d
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Mon Apr 28 23:35:01 2025 -0700

    feat: rename ratio_eps_{min/max} to ratio_clip_{min/max} for clarity (#283)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 2f5d22f
Author: Hemil Desai <hemild@nvidia.com>
Date:   Mon Apr 28 16:09:00 2025 -0700

    feat: Add hydra style overrides to SFT (#208)

    Signed-off-by: Hemil Desai <hemild@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Co-authored-by: ashors1 <ashors@nvidia.com>

commit 8a22c44
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 28 15:11:03 2025 -0700

    feat: publish convergence/release runs (#214)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit af94d43
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 28 15:02:19 2025 -0700

    fix: fixes #264 where tied weights check didn't work on fsdp1 (#284)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Parth Chadha <parth29@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 1363dba
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 28 12:44:56 2025 -0700

    fix: improve port selection and exiting early from ray.sub (#272)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 044f385
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Mon Apr 28 14:22:55 2025 -0500

    docs: Correcting build issues and CI (#270)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 0fae6bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Mon Apr 28 11:08:51 2025 -0700

    feat: Updated Name to NeMo RL (#265)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 34cae3a
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 28 08:16:51 2025 -0700

    fix: add bibtex entry (#273)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit ee0d2c8
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sat Apr 26 20:15:38 2025 -0700

    docs: instruct users to git clone before beginning (#257)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 09f5416
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Fri Apr 25 13:46:41 2025 -0700

    feat: E2E multi-turn RL example with a sliding puzzle game (#242)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 47e51d3
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Fri Apr 25 10:13:59 2025 -0700

    chore: better logging when insufficient resources (#271)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 98473c6
Author: Anna Shors <ashors@nvidia.com>
Date:   Thu Apr 24 22:28:05 2025 -0700

    fix: Update DPO and SFT configs to use dtensor (#256)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit 2558444
Author: Anna Shors <ashors@nvidia.com>
Date:   Thu Apr 24 11:02:26 2025 -0700

    fix: Fix fsdp1 grad clipping and log grad norm (#251)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit c8f0a01
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Wed Apr 23 17:58:43 2025 -0700

    docs: add qwen 32b instruction and add 0.3 planned features (#255)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0a5f31d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 23 17:49:06 2025 -0400

    fix: fix broken eval script (#253)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2f8a140
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Wed Apr 23 12:47:18 2025 -0700

    ci: L1 default and increase test time (#252)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 1c7cbd9
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 23 12:52:13 2025 -0400

    fix: use find_tied_parameters api from HF for tied weight keys (#250)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 1788e4c
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 22 22:05:53 2025 -0400

    fix: raise error if tied weights model is being trained with fsdp1 or… (#229)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: mckimn <nmckimpson@nvidia.com>

commit 1fa4c7a
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 16:38:50 2025 -0700

    fix: Fix indent in dtensor policy (#248)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit ed546ae
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Wed Apr 23 07:29:47 2025 +0800

    feat: streaming each dtensor in refit (#176)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>

commit 5c62657
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Tue Apr 22 14:14:40 2025 -0700

    feat: Importance sampling trick (#174)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit deaece6
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Tue Apr 22 12:39:35 2025 -0700

    feat: Add support for multi-turn generations and RL (tools, games, etc) (#218)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit 1245c50
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 12:19:42 2025 -0700

    fix: Speed up DPO functional test (#241)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit af369a3
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 22 12:17:03 2025 -0700

    fix: Move ray worker port range start from 20001 to 53001 (#235)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 756152c
Author: Anna Shors <ashors@nvidia.com>
Date:   Tue Apr 22 10:02:34 2025 -0700

    feat: Support multi-epoch training in SFT (#177)

    Signed-off-by: ashors1 <ashors@nvidia.com>

commit bbdd671
Author: Anna Shors <ashors@nvidia.com>
Date:   Mon Apr 21 22:16:15 2025 -0700

    feat: DPO (#180)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 88bc0fd
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 21 17:31:23 2025 -0700

    ci: Remove external config from project (#200)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 4a2e126
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 21 17:34:59 2025 -0400

    fix: skip vllm p2p check since its flaky (#238)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 22af21c
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:29 2025 -0700

    feat: FSDP2 SFT (#206)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit e36f488
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Mon Apr 21 12:41:24 2025 -0700

    fix: Fix missing import (#222)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>

commit 98b7a90
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 10:06:09 2025 -0700

    docs: update docs everywhere to remove uv pip install which isn't reliable (#217)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit da191b4
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Sun Apr 20 07:56:55 2025 -0700

    feat: introduce a debug API for backoff and retries for RayVirtualCluster (#234)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8780093
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Fri Apr 18 17:06:54 2025 -0700

    feat: Add total logging of generations in training (#172)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

commit ce2d121
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Sat Apr 19 00:22:11 2025 +0800

    fix: fix chat_template in eval (#210)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>

commit f8b6ba9
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 12:52:19 2025 -0700

    fix: grpo func test 10 step -> 3 step to speed up CI (#209)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 4a6f62b
Author: Gerald Shen <119401249+gshennvm@users.noreply.github.com>
Date:   Thu Apr 17 11:06:05 2025 -0700

    feat: Add FSDP2, DTensor SP/TP, activation checkpointing support (#131)

    Signed-off-by: Gerald Shen <geshen@nvidia.com>

commit 78a9834
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Thu Apr 17 10:03:34 2025 -0700

    fix: ci uses umask (#211)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 5ff10f6
Author: alexchiu <qiuzhaopeng@foxmail.com>
Date:   Thu Apr 17 08:38:45 2025 +0800

    fix: prevent division by zero in ClippedPGLossFn calculation (#166)

    Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
    Signed-off-by: Alex Qiu <alexq@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit 6db2f7a
Author: Yi-Fu Wu <yifu.wu@gmail.com>
Date:   Wed Apr 16 15:53:12 2025 -0700

    feat: Fix CPU offloading + add options for FSDP offload and activation ckpting (#123)

    Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit 62ac8d2
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 15:38:53 2025 -0500

    ci: Only include dependencies in test container (#203)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

commit b00fcc8
Author: Anna Shors <ashors@nvidia.com>
Date:   Wed Apr 16 13:23:40 2025 -0700

    fix: chat template improvements (#148)

    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Parth Chadha <pchadha@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit df31f50
Author: Charlie Truong <chtruong@nvidia.com>
Date:   Wed Apr 16 13:13:58 2025 -0500

    ci: Run tests only in merge queue or when labeled (#159)

    Signed-off-by: Charlie Truong <chtruong@nvidia.com>

commit e3af337
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Wed Apr 16 09:23:30 2025 -0700

    feat: Upgrade to vllm v1 runtime (#170)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: Anna Shors <ashors@nvidia.com>
    Signed-off-by: Terry Kong <terryk@nvidia.com>
    Signed-off-by: Sahil Jain <sahilj@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
    Co-authored-by: yuki <48991475+yuki-666@users.noreply.github.com>
    Co-authored-by: Anna Shors <ashors@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit dd7c2d7
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 16:00:04 2025 -0700

    fix: unit test script halts on first failure (#189)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 92c3f1d
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Tue Apr 15 15:42:01 2025 -0700

    feat: add a unique seed for each vllm llm engine (#171)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit 2ae8935
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 14:41:21 2025 -0700

    docs: remove backticks from uv.md title (#179)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 9ac4e62
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 12:37:35 2025 -0700

    fix: convert DCP to HF script works without ray cluster (#185)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 8213014
Author: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Date:   Tue Apr 15 13:55:54 2025 -0500

    docs: Correcting file names (#161)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 4db3167
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 11:07:51 2025 -0700

    fix: default to less verbose logging + uv-venv log once per worker  (#141)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit bda6522
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 22:31:56 2025 -0700

    docs: run tests with --group test to avoid missing test deps (#188)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit c1fc972
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 20:43:51 2025 -0700

    ci: Update to include public/ folder for pages deployment (#182)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit e9812f1
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 14 20:05:46 2025 -0700

    fix: don't use cuda-graphs for vllm generation (#187)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit d7d4cd6
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 15:46:13 2025 -0700

    ci: labels for docs/L0/L1/L2 and run even if only doc test (#181)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0637511
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 15 05:24:07 2025 +0800

    feat: support arbitrary end_strings (#96)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit c99585c
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 14:44:43 2025 -0700

    fix: allow configuring ray ports in ray.sub in case conflict on cluster (#173)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit a5547f2
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 09:18:02 2025 -0700

    docs: Fix doc build warnings and add external CI config (#157)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 32953be
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Fri Apr 11 10:18:03 2025 -0700

    fix: always test vllm (#167)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit c00b8bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Thu Apr 10 22:38:40 2025 -0700

    test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>
terrykong pushed a commit that referenced this pull request May 1, 2025
Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Tech pubs updates to file

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

fix typo

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Incorporated Reviewer Comments in ReadMe

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs updates to file

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech pups updates to resolve some threads

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech pubs updates to resolve some threads

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>

Tech Pubs minor edits to files

Signed-off-by: Jennifer Gerhold <163925524+jgerh@users.noreply.github.com>


    docs: Correcting file names (#161)

    Signed-off-by: Andrew Schilling <aschilling@nvidia.com>

commit 4db3167
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Tue Apr 15 11:07:51 2025 -0700

    fix: default to less verbose logging + uv-venv log once per worker (#141)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit bda6522
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 22:31:56 2025 -0700

    docs: run tests with --group test to avoid missing test deps (#188)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit c1fc972
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 20:43:51 2025 -0700

    ci: Update to include public/ folder for pages deployment (#182)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit e9812f1
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Mon Apr 14 20:05:46 2025 -0700

    fix: don't use cuda-graphs for vllm generation (#187)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit d7d4cd6
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 15:46:13 2025 -0700

    ci: labels for docs/L0/L1/L2 and run even if only doc test (#181)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit 0637511
Author: yuki <48991475+yuki-666@users.noreply.github.com>
Date:   Tue Apr 15 05:24:07 2025 +0800

    feat: support arbitrary end_strings (#96)

    Signed-off-by: Yuki Huang <yukih@nvidia.com>
    Co-authored-by: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>

commit c99585c
Author: Terry Kong <terrycurtiskong@gmail.com>
Date:   Mon Apr 14 14:44:43 2025 -0700

    fix: allow configuring ray ports in ray.sub in case conflict on cluster (#173)

    Signed-off-by: Terry Kong <terryk@nvidia.com>

commit a5547f2
Author: mckimn <nmckimpson@nvidia.com>
Date:   Mon Apr 14 09:18:02 2025 -0700

    docs: Fix doc build warnings and add external CI config (#157)

    Signed-off-by: Nathan McKimpson <nmckimpson@nvidia.com>

commit 32953be
Author: Parth Chadha <pchadha@nvidia.com>
Date:   Fri Apr 11 10:18:03 2025 -0700

    fix: always test vllm (#167)

    Signed-off-by: Parth Chadha <pchadha@nvidia.com>

commit c00b8bc
Author: Sahil Jain <48468750+SahilJain314@users.noreply.github.com>
Date:   Thu Apr 10 22:38:40 2025 -0700

    test: Add grpo/reinforce/ppo loss tests (prep for incoming vocab parallel changes) (#162)

    Signed-off-by: Sahil Jain <sahilj@nvidia.com>

Signed-off-by: Terry Kong <terryk@nvidia.com>