## What's new

### Added 🎉
- Added 50B Dolmino 11/24 mix.
- Added support for auxiliary-loss-free MoE load-balancing, similar to DeepSeek-v3. You can activate this by setting `bias_gamma` to a non-zero float in your `MoERouter` config (see the first sketch after this list).
- Added support for sequence-level MoE load balancing loss.
- Compatibility with B200s.
- Added support for `warmup_fraction` as an alternative to `warmup_steps` in all schedulers, allowing warmup to be specified as a fraction of total training steps (see the second sketch after this list).
- A better config for the 1B model, ported from the old OLMo trainer.
- Added `auto_resume` option to `CometCallback` for resuming an existing run.
- (BETA) Added methods `load_hf_model` and `save_hf_model` for saving supported OLMo Core models to HF transformers format. Also added lower-level methods for converting state between the formats.
- Added the ability to run the evaluator callback on `.pre_train()` by setting `eval_on_startup=True`, and to cancel the run after the first time evals run by setting `cancel_after_first_eval=True`.
- Added support for label mask files with numpy FSL datasets.
- Added a `git` configuration to `BeakerLaunchConfig`.
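
For background on the auxiliary-loss-free load-balancing item above: the DeepSeek-v3-style strategy keeps a per-expert bias that is added to the routing scores only when selecting the top-k experts (not when computing the gating weights), and nudges each bias after every step based on how loaded that expert was. The sketch below illustrates the update rule only; the function and variable names are invented for this example and are not OLMo-core's internal API.

```python
import torch


def update_expert_bias(
    expert_bias: torch.Tensor,        # shape (num_experts,), added to scores for top-k selection only
    tokens_per_expert: torch.Tensor,  # shape (num_experts,), tokens routed to each expert this step
    bias_gamma: float,
) -> torch.Tensor:
    """One auxiliary-loss-free balancing update (illustrative only).

    Experts that received more tokens than the batch average get their bias
    nudged down by `bias_gamma`; under-loaded experts get nudged up, which
    steers future tokens toward them without adding any extra loss term.
    """
    load = tokens_per_expert.float()
    return expert_bias + bias_gamma * torch.sign(load.mean() - load)
```

In OLMo-core you don't implement any of this yourself; per the note above, you just set `bias_gamma` to a non-zero float in your `MoERouter` config.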
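
The `warmup_fraction` option is simply a relative way of expressing the same quantity as `warmup_steps`. A minimal sketch of the intended semantics in plain Python (this is not the scheduler code itself, and the exact rounding behavior is an assumption):

```python
def resolve_warmup_steps(
    total_steps: int,
    warmup_steps: int | None = None,
    warmup_fraction: float | None = None,
) -> int:
    """Resolve an absolute warmup length from either an absolute or a relative spec."""
    if (warmup_steps is None) == (warmup_fraction is None):
        raise ValueError("specify exactly one of warmup_steps or warmup_fraction")
    if warmup_steps is not None:
        return warmup_steps
    return round(warmup_fraction * total_steps)


# For example, a 1% warmup over a 10,000-step run resolves to 100 warmup steps.
assert resolve_warmup_steps(10_000, warmup_fraction=0.01) == 100
```
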
### Changed ⚠️
- `TransformerTrainModuleConfig` can now be used to build a `TransformerPipelineTrainModule` by adding a `pp_config` spec. This makes the `TransformerPipelineTrainModuleConfig` redundant, but it will be kept around for backwards compatibility until the next major release.
- Several state dict methods in `TrainModule` now take an `optim` option, which can disable the use of optimizer state.
- Updated `Float8Config` for the latest version of `torchao` (see the sketch after this list).
- Reverted a fix applied to `olmo_core.data.numpy_dataset.NumpyFSLDatasetMixture` that was generating a mismatch between the shape of instances in the dataset and the shape of instances in the data loader.
- Made the 1B and 7B scripts more similar to each other.
- Changed the underlying logic and top-level arguments of `convert_checkpoint_from_hf.py` and `convert_checkpoint_to_hf.py`.
- Beaker experiments launched with the `BeakerLaunchConfig` will now log with ANSI colors enabled.
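
Regarding the `Float8Config` update above: recent `torchao` releases expose float8 training through `torchao.float8.convert_to_float8_training`, which is what `Float8Config` builds on. For context only, raw usage looks roughly like the sketch below; it assumes a recent `torchao` release and float8-capable hardware (e.g. H100 or newer), and within OLMo-core you should keep configuring this through `Float8Config` rather than calling `torchao` directly.

```python
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

# A toy stand-in for a transformer's linear layers; training in float8 requires
# a float8-capable GPU.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()


def module_filter_fn(module: nn.Module, fqn: str) -> bool:
    # Skip linear layers whose dimensions aren't divisible by 16, since the
    # underlying float8 matmul kernels require that alignment.
    if isinstance(module, nn.Linear):
        return module.in_features % 16 == 0 and module.out_features % 16 == 0
    return True


# Swap eligible nn.Linear modules for float8 training variants, in place.
convert_to_float8_training(model, module_filter_fn=module_filter_fn)
```
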
### Fixed ✅
- Fixed calculation of total steps based on epochs at the end of a training job.
- Fixed a bug where the trainer might try to save a duplicate final checkpoint if a run that had already completed was restarted.
- When submitting a Beaker job from a branch that's tracking a GitHub fork, OLMo-core now instructs Beaker to pull from the fork instead of from the main repo.
- Made Beaker image resolution more robust.
- Having `t_max` overrides in the default model configs is confusing and error-prone, so we removed them.
- The Beaker launcher will now only clone a single branch at runtime when possible, which can be much faster.
## Commits
b8070fb (chore) prepare for release v2.1.0
7bc8aa2 remove erroneous license in test file
db91b7f Add a git config to BeakerLaunchConfig (#251)
36b791a [HF Converter] Expect model and optim state in model_and_optim subdirectory (#253)
1f2f6f9 Log with ANSI colors in Beaker (#252)
d0ab790 No more `t_max` (#247)
5653c92 rename `* (unscaled)` metrics to `* unscaled`
60a19c3 clone single branch when possible (#250)
e9a34e8 More MoE updates (#246)
c149b73 Update images for torch 2.7.0 (#249)
6d2bb0a Added 50B Dolmino-1124 mix (#248)
53e67ce Add option to cancel run after first evals (#244)
b493d50 fix in-loop normalization with v2 (#243)
a07ef78 Add a self-contained template train script (#242)
746408e Port the 1B from old OLMo (#234)
4ec0866 Add support for label masks with numpy datasets (#241)
ecb14e0 only resume if name matches (#240)
fc84edc Add option to auto resume Comet experiments (#239)
f5d85a9 OLMo Core to HF conversion refactor (#226)
23c6cb1 clean up logging output from source mixture tests
d502b7e Mapping new ladder to old ladder (#146)
a135883 fix calculation of max steps based on epoch at the end (#236)
2f66fd9 Added warmup_fraction to all schedulers (#235)
be06aa0 B200 compatibility (#232)
0973d4d make beaker image resolution more robust (#233)
78be552 Pick the correct remote (#230)
590138d Temp disables custom read_chunk_from_array in SourceMixture (#231)
082e0b1 Fix bug when restarting a completed run (#229)
6c626f2 Update float8 API for latest torchao (#228)
8919dff Some MoE changes/additions to support auxiliary-loss-free load-balancing (#227)
26e9476 Allow train modules to not load/save optimizer state (#225)
8c20a64 run cuda gc at the end of training
a907892 Merge transformer train module configs (#224)
b47e01c Added 32B stage2 checkpoints .csv (#220)