Pulse · NVIDIA-NeMo/RL · GitHub

June 20, 2025 – June 27, 2025

Overview

35 Active pull requests

16 Active issues

16 Pull requests merged by 9 people

feat: support async in non-colocated
#523 merged Jun 27, 2025
fix: add dynamic_batching key to SFT OpenMathInstruct config
#570 merged Jun 27, 2025
feat: Log code in wandb
#175 merged Jun 27, 2025
feat: vllm Model diagnostic test checking long generation quality
#516 merged Jun 27, 2025
Allow uneven shards for multi-GPU inference in vllm backend
#494 merged Jun 27, 2025
fix: remove visualization code
#566 merged Jun 27, 2025
fix: Add assertion if async is disabled when using pp with vllm
#565 merged Jun 26, 2025
fix: remove reference_model_buffers in fsdp2
#558 merged Jun 26, 2025
fix: fix pytest -k test usage
#556 merged Jun 26, 2025
feat: Multi turn async
#506 merged Jun 26, 2025
ci: Reduce expected mem usage for sft-llama3.1-8b-instruct-1n8g-fsdp2tp1-long
#548 merged Jun 26, 2025
docs: release runs on front page readme
#550 merged Jun 25, 2025
docs: enable the mcore instructions
#546 merged Jun 25, 2025
feat: make torch index explicit to support grace-hopper/GH200/aarch64
#533 merged Jun 25, 2025
fix: fix Ray typing to not use internal package
#537 merged Jun 24, 2025
fix: increase test timeout 2hr -> 3hr
#542 merged Jun 24, 2025

19 Pull requests opened by 11 people

feat: MLFlow Integration for experiment tracking
#534 opened Jun 21, 2025
feat: Support pass@k
#536 opened Jun 23, 2025
fix: Fix checkpoint overriding
#538 opened Jun 23, 2025
feat: improve worker group args/kwargs
#539 opened Jun 24, 2025
draft: fp8 block scaling
#543 opened Jun 24, 2025
fix: load HF model only on rank 0
#544 opened Jun 24, 2025
feat: add flash-attn to core dependencies
#545 opened Jun 24, 2025
docs: Add a note on supported backends
#553 opened Jun 25, 2025
feat: Add megatron to hf converter
#555 opened Jun 25, 2025
feat: supports evaluation of multiple-choice benchmarks
#559 opened Jun 26, 2025
docs: fix some typos on nsys/model-quirk pages
#560 opened Jun 26, 2025
fix: fix overlap param gather
#561 opened Jun 26, 2025
feat: Reduce number of Cuda IPC in Refit
#568 opened Jun 26, 2025
feat: Megatron EP + Deepseek
#571 opened Jun 27, 2025
fix: correct mcore dtype + assertion on activation_func
#572 opened Jun 27, 2025
feat: Qwen3 MoE support
#573 opened Jun 27, 2025
fix: move core ray port from 6379 -> 54258 to reduce port collision freq
#574 opened Jun 27, 2025
feat: decouple checkpointing from validation
#575 opened Jun 27, 2025
fix: Megatron config fixes
#576 opened Jun 27, 2025

7 Issues closed by 2 people

Add non-colocated refit
#394 closed Jun 27, 2025
support async non-colocated vllm
#508 closed Jun 27, 2025
save diffs when running code automatically
#146 closed Jun 27, 2025
Support un-even dispatches
#125 closed Jun 27, 2025
Number of eval samples has to be divisible by `gpus_per_node`
#562 closed Jun 26, 2025
Megatron Training + In-framework inference
#48 closed Jun 24, 2025
Megatron Training + vLLM inference
#47 closed Jun 24, 2025

9 Issues opened by 7 people

Holistic Profiling Tool in RL Workflow
#569 opened Jun 26, 2025
Does this project require CUDA 12.8?
#567 opened Jun 26, 2025
NCCL error when using non-colocated generation and set_model_state_dict apis
#564 opened Jun 26, 2025
Evaluation results differ when `num_prompts_per_step` is different
#563 opened Jun 26, 2025
Support non-colocated in mcore worker
#557 opened Jun 26, 2025
Fix overlap param gather + distributed optimizer in Megatron path
#552 opened Jun 25, 2025
Parallel get logprobs when CP/TP mixed case for FSDP2.
#549 opened Jun 25, 2025
Support cons@k in eval
#541 opened Jun 24, 2025
Support pass@k in eval
#540 opened Jun 24, 2025

17 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

v0 VLM support + GRPO pipeline
#521 commented on Jun 26, 2025 • 15 new comments
feat: optimize get logprobs when cp enabled.
#528 commented on Jun 26, 2025 • 3 new comments
docs: Update guide to include minimum compute requirement
#505 commented on Jun 27, 2025 • 1 new comment
Add code environment
#497 commented on Jun 27, 2025 • 1 new comment
feat: guide to configure custom vllm version
#529 commented on Jun 27, 2025 • 0 new comments
fix: Mcore: remove explicit refit buffer sizing and added functional grpo test
#527 commented on Jun 26, 2025 • 0 new comments
Speedup refit
#519 commented on Jun 25, 2025 • 0 new comments
chore: Update github url after org transfer
#512 commented on Jun 26, 2025 • 0 new comments
docs: Add missing arguments to DeepScaler evaluation
#502 commented on Jun 27, 2025 • 0 new comments
feat: Enable vLLM cudagraphs
#498 commented on Jun 27, 2025 • 0 new comments
ci: Add python and ray to the base build stage
#483 commented on Jun 24, 2025 • 0 new comments
[diff only do not merge] tracks local against nemo-posttraining
#458 commented on Jun 27, 2025 • 0 new comments
feat: add data shuffle option
#334 commented on Jun 26, 2025 • 0 new comments
Added sequence packing
#300 commented on Jun 26, 2025 • 0 new comments
Allow saving checkpoints in sft without running validation
#441 commented on Jun 27, 2025 • 0 new comments
Support for MLflow
#514 commented on Jun 21, 2025 • 0 new comments
Install on GH200 - Torch not compiled with CUDA enabled
#532 commented on Jun 21, 2025 • 0 new comments
