Release v0.1.1 · NVIDIA/NeMo-RL · GitHub

Release v0.1.1

@terrykong terrykong released this 25 Mar 16:28
· 157 commits to main since this release
fbe85c5

Patch release on top of v0.1.0

🛠️ Stabilizes mixed-precision configurations and resolves OOMs observed with Llama 8B
🛠️ Fixes a race condition in ray.sub where pyxis can fail if subsequent srun commands (with --overlap) are run too early
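
For illustration only, and not the change made in #39: one generic way to avoid starting a dependent command too early is to poll a readiness condition before launching it. The sketch below assumes a caller-supplied check; `wait_until` and `is_ready` are hypothetical names and are not part of ray.sub, pyxis, or Slurm.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Hypothetical helper: is_ready stands in for whatever signal shows
    that the first command has finished its setup.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A caller would start the follow-up command only if `wait_until(...)` returns `True`, and otherwise treat the timeout as a failure.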

What's Changed

  • fix: ray.sub race condition when overlapping srun commands on same node by @terrykong in #39
  • feat: add gpu mem and util logging to wandb/tensorboard by @terrykong in #37
  • ci: tests now run with HF_DATASETS_CACHE to speed up e2e time by @terrykong in #41
  • fix: update the instructions for multi-node setup; change the title f… by @parthchadha in #78
  • fix: Mixed Prec memory improvements and better default configs (converge-able) by @SahilJain314 in #32

Known Issues

  • GPU memory and utilization logging to wandb/tensorboard has a bug when enabled. This is tracked in #83

Full Changelog: v0.1.0...v0.1.1
