## What's new

### Added 🎉
- Added 50B Dolmino 11/24 mix.
- Added support for auxiliary-loss-free MoE load-balancing, similar to DeepSeek-v3. You can activate this by setting `bias_gamma` to a non-zero float in your `MoERouter` config (see the first sketch after this list).
- Added support for sequence-level MoE load balancing loss.
- Compatibility with B200s.
- Added support for `warmup_fraction` as an alternative to `warmup_steps` in all schedulers, allowing warmup to be specified as a fraction of total training steps (see the second sketch after this list).
- A better config for the 1B model, ported from the old OLMo trainer.
- Added `auto_resume` option to `CometCallback` for resuming an existing run.
- (BETA) Added methods `load_hf_model` and `save_hf_model` for saving supported OLMo Core models to HF transformers format. Also added lower-level methods for converting state between the formats.
- Added the ability to run the evaluator callback on `.pre_train()` by setting `eval_on_startup=True`, and to cancel the run after the first time evals run by setting `cancel_after_first_eval=True`.
- Added support for label mask files with numpy FSL datasets.
- Added a `git` configuration to `BeakerLaunchConfig`.
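
For background on the auxiliary-loss-free load-balancing item above: the DeepSeek-v3-style strategy keeps a per-expert bias that is added to the routing scores only when selecting the top-k experts (not when computing the gating weights), and nudges each bias after every step based on how loaded that expert was. The sketch below illustrates the update rule only; the function and variable names are invented for this example and are not OLMo-core's internal API.

```python
import torch


def update_expert_bias(
    expert_bias: torch.Tensor,        # shape (num_experts,), added to scores for top-k selection only
    tokens_per_expert: torch.Tensor,  # shape (num_experts,), tokens routed to each expert this step
    bias_gamma: float,
) -> torch.Tensor:
    """One auxiliary-loss-free balancing update (illustrative only).

    Experts that received more tokens than the batch average get their bias
    nudged down by `bias_gamma`; under-loaded experts get nudged up, which
    steers future tokens toward them without adding any extra loss term.
    """
    load = tokens_per_expert.float()
    return expert_bias + bias_gamma * torch.sign(load.mean() - load)
```

In OLMo-core you don't implement any of this yourself; per the note above, you just set `bias_gamma` to a non-zero float in your `MoERouter` config.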
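
The `warmup_fraction` option is simply a relative way of expressing the same quantity as `warmup_steps`. A minimal sketch of the intended semantics in plain Python (this is not the scheduler code itself, and the exact rounding behavior is an assumption):

```python
def resolve_warmup_steps(
    total_steps: int,
    warmup_steps: int | None = None,
    warmup_fraction: float | None = None,
) -> int:
    """Resolve an absolute warmup length from either an absolute or a relative spec."""
    if (warmup_steps is None) == (warmup_fraction is None):
        raise ValueError("specify exactly one of warmup_steps or warmup_fraction")
    if warmup_steps is not None:
        return warmup_steps
    return round(warmup_fraction * total_steps)


# For example, a 1% warmup over a 10,000-step run resolves to 100 warmup steps.
assert resolve_warmup_steps(10_000, warmup_fraction=0.01) == 100
```
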
### Changed ⚠️
- `TransformerTrainModuleConfig` can now be used to build a `TransformerPipelineTrainModule` by adding a `pp_config` spec. This makes the `TransformerPipelineTrainModuleConfig` redundant, but it will be kept around for backwards compatibility until the next major release.
- Several state dict methods in `TrainModule` now take an `optim` option, which can disable the use of optimizer state.
- Updated `Float8Config` for the latest version of `torchao` (see the sketch after this list).
- Reverted a fix applied to `olmo_core.data.numpy_dataset.NumpyFSLDatasetMixture` that was generating a mismatch between the shape of instances in the dataset and the shape of instances in the data loader.
- Made the 1B and 7B scripts more similar to each other.
- Changed the underlying logic and top-level arguments of `convert_checkpoint_from_hf.py` and `convert_checkpoint_to_hf.py`.
- Beaker experiments launched with the `BeakerLaunchConfig` will now log with ANSI colors enabled.
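
Regarding the `Float8Config` update above: recent `torchao` releases expose float8 training through `torchao.float8.convert_to_float8_training`, which is what `Float8Config` builds on. For context only, raw usage looks roughly like the sketch below; it assumes a recent `torchao` release and float8-capable hardware (e.g. H100 or newer), and within OLMo-core you should keep configuring this through `Float8Config` rather than calling `torchao` directly.

```python
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

# A toy stand-in for a transformer's linear layers; training in float8 requires
# a float8-capable GPU.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()


def module_filter_fn(module: nn.Module, fqn: str) -> bool:
    # Skip linear layers whose dimensions aren't divisible by 16, since the
    # underlying float8 matmul kernels require that alignment.
    if isinstance(module, nn.Linear):
        return module.in_features % 16 == 0 and module.out_features % 16 == 0
    return True


# Swap eligible nn.Linear modules for float8 training variants, in place.
convert_to_float8_training(model, module_filter_fn=module_filter_fn)
```
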
### Fixed ✅
- Fixed calculation of total steps based on epochs at the end of a training job.
- Fixed a bug where the trainer might try to save a duplicate final checkpoint if a run that had already completed was restarted.
- When submitting a Beaker job from a branch that's tracking a GitHub fork, OLMo-core now instructs Beaker to pull from the fork instead of from the main repo.
- Made Beaker image resolution more robust.
- Having `t_max` overrides in the default model configs is confusing and error-prone, so we removed them.
- The Beaker launcher will now only clone a single branch at runtime when possible, which can be much faster.
## Commits
b8070fb (chore) prepare for release v2.1.0
7bc8aa2 remove erroneous license in test file
db91b7f Add a git config to BeakerLaunchConfig (#251)
36b791a [HF Converter] Expect model and optim state in model_and_optim subdirectory (#253)
1f2f6f9 Log with ANSI colors in Beaker (#252)
d0ab790 No more `t_max` (#247)
5653c92 rename `* (unscaled)` metrics to `* unscaled`
60a19c3 clone single branch when possible (#250)
e9a34e8 More MoE updates (#246)
c149b73 Update images for torch 2.7.0 (#249)
6d2bb0a Added 50B Dolmino-1124 mix (#248)
53e67ce Add option to cancel run after first evals (#244)
b493d50 fix in-loop normalization with v2 (#243)
a07ef78 Add a self-contained template train script (#242)
746408e Port the 1B from old OLMo (#234)
4ec0866 Add support for label masks with numpy datasets (#241)
ecb14e0 only resume if name matches (#240)
fc84edc Add option to auto resume Comet experiments (#239)
f5d85a9 OLMo Core to HF conversion refactor (#226)
23c6cb1 clean up logging output from source mixture tests
d502b7e Mapping new ladder to old ladder (#146)
a135883 fix calculation of max steps based on epoch at the end (#236)
2f66fd9 Added warmup_fraction to all schedulers (#235)
be06aa0 B200 compatibility (#232)
0973d4d make beaker image resolution more robust (#233)
78be552 Pick the correct remote (#230)
590138d Temp disables custom read_chunk_from_array in SourceMixture (#231)
082e0b1 Fix bug when restarting a completed run (#229)
6c626f2 Update float8 API for latest torchao (#228)
8919dff Some MoE changes/additions to support auxiliary-loss-free load-balancing (#227)
26e9476 Allow train modules to not load/save optimizer state (#225)
8c20a64 run cuda gc at the end of training
a907892 Merge transformer train module configs (#224)
b47e01c Added 32B stage2 checkpoints .csv (#220)