Release v0.1.0
- ✅ Fast Generation - vLLM backend for optimized inference
- ✅ HuggingFace Integration - Works with 1-8B models (Qwen1.5, Llama)
- ✅ Distributed Training - FSDP support and Ray-based infrastructure
- ✅ Environment Support - Multi-environment training
- ✅ Learning Algorithms - GRPO (Group Relative Policy Optimization) and SFT (Supervised Fine-Tuning); a brief GRPO sketch follows this list
- ✅ Worker Isolation - Process isolation between RL Actors (no worries about global state)
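For context on the GRPO algorithm listed above, here is a minimal, illustrative sketch of its core idea: advantages are computed relative to a group of responses sampled for the same prompt instead of using a learned critic. This is not the repository's implementation; the function name and the epsilon constant are assumptions made for the example.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # rewards: scalar rewards for a group of responses sampled from the same prompt.
    # GRPO normalizes each reward against the group's mean (and std),
    # so no separate value network (critic) is needed.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four sampled responses to one prompt, scored by some reward function.
rewards = torch.tensor([1.0, 0.0, 0.5, 1.0])
print(group_relative_advantages(rewards))
```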
What's Changed
- ci: Add initial GHA by @chtruong814 in #1
- feat: reinforcer initial commit by @terrykong in #3
- Checkpointing fixes by @ashors1 in #9
- docs: Move adding_new_models doc to guides section by @parthchadha in #11
- fix: disable mixed precision training until #13 is resolved by @parthchadha in #14
- docs: Small update to sft documentation by @ashors1 in #12
- ci: Update unit tests to run on self-hosted runner by @chtruong814 in #6
- feat: SFT improvements: refactor and add validation and checkpointing by @ashors1 in #15
- docs: GRPO documentation and Configuration cleanup by @SahilJain314 in #7
- feat: lots of fixes by @terrykong in #17
- feat: Configurable precision by @SahilJain314 in #19
- ci: OPTIONAL -> IS_OPTIONAL by @terrykong in #22
- feat: disable ray usage collection stats by default by @terrykong in #24
- docs: refresh our PR template by @terrykong in #23
- docs: micro doc update with a helpful reminder on environment variables by @SahilJain314 in #20
- fix: disable usage stats more forcefully since container env took precedence by @terrykong in #25
- feat: Enable amp with autocast (fix poor bf16 convergence on GRPO) by @SahilJain314 in #26
- feat: Use openmathinstruct2 training in grpo math example by @parthchadha in #18
- docs: Updated adding models docs to fix latex rendering errors and fix math by @SahilJain314 in #28
- fix: updated stale cluster.md by @terrykong in #30
- feat: SFT convergence run changes by @yfw in #21
- docs: Add SFT quickstart by @ashors1 in #29
- feat: Change vllm frac to 0.6 by @parthchadha in #31
New Contributors
- @chtruong814 made their first contribution in #1
- @terrykong made their first contribution in #3
- @ashors1 made their first contribution in #9
- @parthchadha made their first contribution in #11
- @yfw made their first contribution in #21
Known Issues
- There is a known bug with SFT checkpointing that requires the full model to be gathered on GPU before a checkpoint is saved, which causes OOM for larger models. If you run into OOM during checkpointing, disable checkpointing by adding `checkpointing.enabled=False` to your run command.
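As an illustration, the override is appended directly to the launch command; the script path and launcher below are assumptions for the example, not a prescribed invocation:

```
python examples/run_sft.py checkpointing.enabled=False
```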
Full Changelog: https://github.com/NVIDIA/NeMo-RL/commits/v0.1.0