Release v0.1.0
- ✅ Fast Generation - vLLM backend for optimized inference
- ✅ HuggingFace Integration - Works with 1-8B models (Qwen1.5, Llama)
- ✅ Distributed Training - FSDP support and Ray-based infrastructure
- ✅ Environment Support - Multi-environment training
- ✅ Learning Algorithms - GRPO (Group Relative Policy Optimization) and SFT (Supervised Fine-Tuning); a brief GRPO sketch follows this list
- ✅ Worker Isolation - Process isolation between RL Actors (no worries about global state)
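For context on the GRPO algorithm listed above, here is a minimal, illustrative sketch of its core idea: advantages are computed relative to a group of responses sampled for the same prompt instead of using a learned critic. This is not the repository's implementation; the function name and the epsilon constant are assumptions made for the example.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # rewards: scalar rewards for a group of responses sampled from the same prompt.
    # GRPO normalizes each reward against the group's mean (and std),
    # so no separate value network (critic) is needed.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four sampled responses to one prompt, scored by some reward function.
rewards = torch.tensor([1.0, 0.0, 0.5, 1.0])
print(group_relative_advantages(rewards))
```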
What's Changed
- ci: Add initial GHA by @chtruong814 in #1
- feat: reinforcer initial commit by @terrykong in #3
- Checkpointing fixes by @ashors1 in #9
- docs: Move adding_new_models doc to guides section by @parthchadha in #11
- fix: disable mixed precision training until #13 is resolved by @parthchadha in #14
- docs: Small update to sft documentation by @ashors1 in #12
- ci: Update unit tests to run on self-hosted runner by @chtruong814 in #6
- feat: SFT improvements: refactor and add validation and checkpointing by @ashors1 in #15
- docs: GRPO documentation and Configuration cleanup by @SahilJain314 in #7
- feat: lots of fixes by @terrykong in #17
- feat: Configurable precision by @SahilJain314 in #19
- ci: OPTIONAL -> IS_OPTIONAL by @terrykong in #22
- feat: disable ray usage collection stats by default by @terrykong in #24
- docs: refresh our PR template by @terrykong in #23
- docs: micro doc update with a helpful reminder on environment variables by @SahilJain314 in #20
- fix: disable usage stats more forcefully since container env took precedence by @terrykong in #25
- feat: Enable amp with autocast (fix poor bf16 convergence on GRPO) by @SahilJain314 in #26
- feat: Use openmathinstruct2 training in grpo math example by @parthchadha in #18
- docs: Updated adding models docs to fix latex rendering errors and fix math by @SahilJain314 in #28
- fix: updated stale cluster.md by @terrykong in #30
- feat: SFT convergence run changes by @yfw in #21
- docs: Add SFT quickstart by @ashors1 in #29
- feat: Change vllm frac to 0.6 by @parthchadha in #31
New Contributors
- @chtruong814 made their first contribution in #1
- @terrykong made their first contribution in #3
- @ashors1 made their first contribution in #9
- @parthchadha made their first contribution in #11
- @yfw made their first contribution in #21
Known Issues
- There is a known bug with SFT checkpointing that requires the full model to be gathered on GPU before a checkpoint is saved, which causes OOM for larger models. If you run into OOM during checkpointing, disable checkpointing by adding `checkpointing.enabled=False` to your run command.
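As an illustration, the override is appended directly to the launch command; the script path and launcher below are assumptions for the example, not a prescribed invocation:

```
python examples/run_sft.py checkpointing.enabled=False
```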
Full Changelog: https://github.com/NVIDIA/NeMo-RL/commits/v0.1.0