Release v0.1.1

@terrykong

Patch release on top of v0.1.0

🛠️ Provides more stable mixed-precision configurations and resolves OOMs observed with Llama 8B
🛠️ Fixes a race condition in ray.sub where pyxis can fail if subsequent srun commands (with --overlap) are run too early

What's Changed

fix: ray.sub race condition when overlapping srun commands on same node by @terrykong in #39
feat: add gpu mem and util logging to wandb/tensorboard by @terrykong in #37
ci: tests now run with HF_DATASETS_CACHE to speed up e2e time by @terrykong in #41
fix: update the instructions for multi-node setup; change the title f… by @parthchadha in #78
fix: Mixed Prec memory improvements and better default configs (converge-able) by @SahilJain314 in #32

Known Issues

GPU memory and utilization logging to wandb/tensorboard has a bug when enabled. This is tracked in #83.

Full Changelog: v0.1.0...v0.1.1
