Insights: NVIDIA-NeMo/RL
Overview
16 Pull requests merged by 9 people
- feat: support async in non-colocated (#523 merged Jun 27, 2025)
- fix: add dynamic_batching key to SFT OpenMathInstruct config (#570 merged Jun 27, 2025)
- feat: Log code in wandb (#175 merged Jun 27, 2025)
- feat: vllm Model diagnostic test checking long generation quality (#516 merged Jun 27, 2025)
- Allow uneven shards for multi-GPU inference in vllm backend (#494 merged Jun 27, 2025)
- fix: remove visualization code (#566 merged Jun 27, 2025)
- fix: Add assertion if async is disabled when using pp with vllm (#565 merged Jun 26, 2025)
- fix: remove reference_model_buffers in fsdp2 (#558 merged Jun 26, 2025)
- fix: fix pytest -k test usage (#556 merged Jun 26, 2025)
- feat: Multi turn async (#506 merged Jun 26, 2025)
- ci: Reduce expected mem usage for sft-llama3.1-8b-instruct-1n8g-fsdp2tp1-long (#548 merged Jun 26, 2025)
- docs: release runs on front page readme (#550 merged Jun 25, 2025)
- docs: enable the mcore instructions (#546 merged Jun 25, 2025)
- feat: make torch index explicit to support grace-hopper/GH200/aarch64 (#533 merged Jun 25, 2025)
- fix: fix Ray typing to not use internal package (#537 merged Jun 24, 2025)
- fix: increase test timeout 2hr -> 3hr (#542 merged Jun 24, 2025)
19 Pull requests opened by 11 people
- feat: MLFlow Integration for experiment tracking (#534 opened Jun 21, 2025)
- feat: Support pass@k (#536 opened Jun 23, 2025)
- fix: Fix checkpoint overriding (#538 opened Jun 23, 2025)
- feat: improve worker group args/kwargs (#539 opened Jun 24, 2025)
- draft: fp8 block scaling (#543 opened Jun 24, 2025)
- fix: load HF model only on rank 0 (#544 opened Jun 24, 2025)
- feat: add flash-attn to core dependencies (#545 opened Jun 24, 2025)
- docs: Add a note on supported backends (#553 opened Jun 25, 2025)
- feat: Add megatron to hf converter (#555 opened Jun 25, 2025)
- feat: supports evaluation of multiple-choice benchmarks (#559 opened Jun 26, 2025)
- docs: fix some typos on nsys/model-quirk pages (#560 opened Jun 26, 2025)
- fix: fix overlap param gather (#561 opened Jun 26, 2025)
- feat: Reduce number of Cuda IPC in Refit (#568 opened Jun 26, 2025)
- feat: Megatron EP + Deepseek (#571 opened Jun 27, 2025)
- fix: correct mcore dtype + assertion on activation_func (#572 opened Jun 27, 2025)
- feat: Qwen3 MoE support (#573 opened Jun 27, 2025)
- fix: move core ray port from 6379 -> 54258 to reduce port collision freq (#574 opened Jun 27, 2025)
- feat: decouple checkpointing from validation (#575 opened Jun 27, 2025)
- fix: Megatron config fixes (#576 opened Jun 27, 2025)
7 Issues closed by 2 people
- Add non-colocated refit (#394 closed Jun 27, 2025)
- support async non-colocated vllm (#508 closed Jun 27, 2025)
- save diffs when running code automatically (#146 closed Jun 27, 2025)
- Support un-even dispatches (#125 closed Jun 27, 2025)
- Number of eval samples has to be divisible by `gpus_per_node` (#562 closed Jun 26, 2025)
- Megatron Training + In-framework inference (#48 closed Jun 24, 2025)
- Megatron Training + vLLM inference (#47 closed Jun 24, 2025)
9 Issues opened by 7 people
- Holistic Profiling Tool in RL Workflow (#569 opened Jun 26, 2025)
- Does this project require CUDA 12.8? (#567 opened Jun 26, 2025)
- NCCL error when using non-colocated generation and set_model_state_dict apis (#564 opened Jun 26, 2025)
- Evaluation results differ when `num_prompts_per_step` is different (#563 opened Jun 26, 2025)
- Support non-colocated in mcore worker (#557 opened Jun 26, 2025)
- Fix overlap param gather + distributed optimizer in Megatron path (#552 opened Jun 25, 2025)
- Parallel get logprobs when CP/TP mixed case for FSDP2. (#549 opened Jun 25, 2025)
- Support cons@k in eval (#541 opened Jun 24, 2025)
- Support pass@k in eval (#540 opened Jun 24, 2025)
17 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- v0 VLM support + GRPO pipeline (#521 commented on Jun 26, 2025 • 15 new comments)
- feat: optimize get logprobs when cp enabled. (#528 commented on Jun 26, 2025 • 3 new comments)
- docs: Update guide to include minimum compute requirement (#505 commented on Jun 27, 2025 • 1 new comment)
- Add code environment (#497 commented on Jun 27, 2025 • 1 new comment)
- feat: guide to configure custom vllm version (#529 commented on Jun 27, 2025 • 0 new comments)
- fix: Mcore: remove explicit refit buffer sizing and added functional grpo test (#527 commented on Jun 26, 2025 • 0 new comments)
- Speedup refit (#519 commented on Jun 25, 2025 • 0 new comments)
- chore: Update github url after org transfer (#512 commented on Jun 26, 2025 • 0 new comments)
- docs: Add missing arguments to DeepScaler evaluation (#502 commented on Jun 27, 2025 • 0 new comments)
- feat: Enable vLLM cudagraphs (#498 commented on Jun 27, 2025 • 0 new comments)
- ci: Add python and ray to the base build stage (#483 commented on Jun 24, 2025 • 0 new comments)
- [diff only do not merge] tracks local against nemo-posttraining (#458 commented on Jun 27, 2025 • 0 new comments)
- feat: add data shuffle option (#334 commented on Jun 26, 2025 • 0 new comments)
- Added sequence packing (#300 commented on Jun 26, 2025 • 0 new comments)
- Allow saving checkpoints in sft without running validation (#441 commented on Jun 27, 2025 • 0 new comments)
- Support for MLflow (#514 commented on Jun 21, 2025 • 0 new comments)
- Install on GH200 - Torch not compiled with CUDA enabled (#532 commented on Jun 21, 2025 • 0 new comments)