Release v0.1.1
Patch release on top of v0.1.0
🛠️ Makes mixed precision configurations more stable and resolves OOMs observed in Llama 8B
🛠️ Fixes a race condition in ray.sub where pyxis can fail if subsequent srun commands (with --overlap) are run too early
What's Changed
- fix: ray.sub race condition when overlapping srun commands on same node by @terrykong in #39
- feat: add gpu mem and util logging to wandb/tensorboard by @terrykong in #37
- ci: tests now run with HF_DATASETS_CACHE to speed up e2e time by @terrykong in #41
- fix: update the instructions for multi-node setup; change the title f… by @parthchadha in #78
- fix: Mixed Prec memory improvements and better default configs (converge-able) by @SahilJain314 in #32
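The HF_DATASETS_CACHE change in #41 relies on the Hugging Face datasets library reading that environment variable to decide where downloaded datasets are stored. As a rough illustration only (not the project's actual CI setup), the sketch below builds an environment for a test subprocess that points the cache at a persistent directory so repeated runs can reuse it. The helper name `env_with_dataset_cache` and the example cache path are hypothetical.

```python
import os
from pathlib import Path


def env_with_dataset_cache(cache_dir, base_env=None):
    """Return a copy of the environment with HF_DATASETS_CACHE set.

    Hypothetical helper for illustration; cache_dir is any directory
    that survives between test runs.
    """
    # Copy so the caller's environment is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    cache_path = Path(cache_dir).expanduser().resolve()
    # Create the directory up front so the first run does not fail on a missing path.
    cache_path.mkdir(parents=True, exist_ok=True)
    env["HF_DATASETS_CACHE"] = str(cache_path)
    return env


# Example usage (assumed test command and cache path, adjust to your setup):
# import subprocess
# subprocess.run(["pytest", "tests/"], env=env_with_dataset_cache("~/.cache/hf_datasets_ci"))
```

Passing the environment to the subprocess, rather than mutating `os.environ`, keeps the setting scoped to the test run.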
Known Issues
- GPU memory and utilization logging to wandb/tensorboard has a bug when enabled. This is tracked in #83.
Full Changelog: v0.1.0...v0.1.1