Releases: allenai/OLMo-core
v2.1.0
What's new
Added
- Added the 50B Dolmino 11/24 mix.
- Added support for auxiliary-loss-free MoE load balancing, similar to DeepSeek-v3. You can activate this by setting `bias_gamma` to a non-zero float in your `MoERouter` config (see the sketch after this list).
- Added support for a sequence-level MoE load-balancing loss.
- Compatibility with B200s.
- Added support for `warmup_fraction` as an alternative to `warmup_steps` in all schedulers, allowing warmup to be specified as a fraction of total training steps.
- A better config for the 1B model, ported from the old OLMo trainer.
- Added an `auto_resume` option to `CometCallback` for resuming an existing run.
- (BETA) Added methods `load_hf_model` and `save_hf_model` for loading and saving supported OLMo Core models in HF transformers format. Also added lower-level methods for converting state between the two formats.
- Added the ability to run the evaluator callback on `.pre_train()` by setting `eval_on_startup=True`, and to cancel the run after the first round of evals by setting `cancel_after_first_eval=True`.
- Added support for label mask files with numpy FSL datasets.
- Added a `git` configuration to `BeakerLaunchConfig`.
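The two most hands-on additions above are `bias_gamma` and `warmup_fraction`. A minimal sketch of how they might be wired up; the import paths and every field not named in the notes above (e.g. `top_k`) are assumptions, not the exact OLMo-core API:

```python
# Minimal sketch, not the exact OLMo-core API: import paths and fields other
# than `bias_gamma` and `warmup_fraction` are assumptions.
from olmo_core.nn.moe import MoERouterConfig          # assumed import path
from olmo_core.optim.scheduler import CosWithWarmup   # assumed import path

# Auxiliary-loss-free load balancing (DeepSeek-v3 style): a non-zero
# `bias_gamma` on the router config turns on the bias-update mechanism.
router_config = MoERouterConfig(
    top_k=2,           # hypothetical value
    bias_gamma=1e-3,   # non-zero float enables aux-loss-free balancing
)

# Warmup expressed as a fraction of total training steps rather than a step count.
lr_scheduler = CosWithWarmup(warmup_fraction=0.01)
```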
Changed
- `TransformerTrainModuleConfig` can now be used to build a `TransformerPipelineTrainModule` by adding a `pp_config` spec (see the sketch after this list). This makes `TransformerPipelineTrainModuleConfig` redundant, but it will be kept around for backwards compatibility until the next major release.
- Several state dict methods in `TrainModule` now take an `optim` option, which can disable the use of optimizer state.
- Updated `Float8Config` for the latest version of `torchao`.
- Undid a fix applied to `olmo_core.data.numpy_dataset.NumpyFSLDatasetMixture` that was generating a mismatch between the shape of instances in the dataset and the shape of instances in the data loader.
- Made the 1B and 7B scripts more similar to each other.
- Changed the underlying logic and top-level arguments of `convert_checkpoint_from_hf.py` and `convert_checkpoint_to_hf.py`.
- Beaker experiments launched with the `BeakerLaunchConfig` will now log with ANSI colors enabled.
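A rough sketch of the merged config path: `TransformerTrainModuleConfig` and the `pp_config` field come from the note above, while the pipeline-parallel config class, its fields, and the `build()` signature are assumptions:

```python
# Sketch only. A TransformerTrainModuleConfig without `pp_config` builds a
# regular TransformerTrainModule; adding a pipeline-parallel spec makes it
# build a TransformerPipelineTrainModule instead. Names marked "assumed"
# below are not verified against the actual API.
from olmo_core.train.train_module import (  # assumed import path
    TransformerPipelineParallelConfig,      # assumed class name
    TransformerTrainModuleConfig,
)

config = TransformerTrainModuleConfig(
    rank_microbatch_size=2 * 4096,                           # hypothetical value
    pp_config=TransformerPipelineParallelConfig(degree=2),   # assumed fields
)
train_module = config.build(model, device=device)            # signature assumed
```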
Fixed
- Fixed the calculation of total steps based on epochs at the end of a training job.
- Fixed a bug where the trainer might try to save a duplicate final checkpoint when a run that had already completed was restarted.
- When submitting a Beaker job from a branch that's tracking a GitHub fork, OLMo-core now instructs Beaker to pull from the fork instead of from the main repo.
- Made Beaker image resolution more robust.
- Having `t_max` overrides in the default model configs was confusing and error prone, so we removed them.
- The Beaker launcher will now clone only a single branch at runtime when possible, which can be much faster.
Commits
b8070fb (chore) prepare for release v2.1.0
7bc8aa2 remove erroneous license in test file
db91b7f Add a git config to BeakerLaunchConfig (#251)
36b791a [HF Converter] Expect model and optim state in model_and_optim subdirectory (#253)
1f2f6f9 Log with ANSI colors in Beaker (#252)
d0ab790 No more `t_max` (#247)
5653c92 rename `* (unscaled)` metrics to `* unscaled`
60a19c3 clone single branch when possible (#250)
e9a34e8 More MoE updates (#246)
c149b73 Update images for torch 2.7.0 (#249)
6d2bb0a Added 50B Dolmino-1124 mix (#248)
53e67ce Add option to cancel run after first evals (#244)
b493d50 fix in-loop normalization with v2 (#243)
a07ef78 Add a self-contained template train script (#242)
746408e Port the 1B from old OLMo (#234)
4ec0866 Add support for label masks with numpy datasets (#241)
ecb14e0 only resume if name matches (#240)
fc84edc Add option to auto resume Comet experiments (#239)
f5d85a9 OLMo Core to HF conversion refactor (#226)
23c6cb1 clean up logging output from source mixture tests
d502b7e Mapping new ladder to old ladder (#146)
a135883 fix calculation of max steps based on epoch at the end (#236)
2f66fd9 Added warmup_fraction to all schedulers (#235)
be06aa0 B200 compatibility (#232)
0973d4d make beaker image resolution more robust (#233)
78be552 Pick the correct remote (#230)
590138d Temp disables custom read_chunk_from_array in SourceMixture (#231)
082e0b1 Fix bug when restarting a completed run (#229)
6c626f2 Update float8 API for latest torchao (#228)
8919dff Some MoE changes/additions to support auxiliary-loss-free load-balancing (#227)
26e9476 Allow train modules to not load/save optimizer state (#225)
8c20a64 run cuda gc at the end of training
a907892 Merge transformer train module configs (#224)
b47e01c Added 32B stage2 checkpoints .csv (#220)
v2.0.1
What's new
Added
- Added information about the official 32B training run.
- Added automatic support for LL128 when running on Augusta.
Fixed
- The official config for the 32B had unrealistic batch size settings.
- Ignore `group_overrides` for frozen parameters instead of throwing an error.
Removed
- Removed the "fused" cross-entropy loss variant. It had a bug and consistently under-performed the native PyTorch version when compiled. See Post Incident Report: bug with fused CE loss for more information.
Commits
27b1ae8 (chore) prepare for release v2.0.1
79ebc7f Add hybrid MoE transformer architecture (#223)
bce2b5b authenticate with Docker Hub to avoid rate limits
b1e0bbd Remove fused CE loss, reorganize MoE kernels/ops (#221)
56e06ee Ignore `group_overrides` for frozen params (#219)
9d80e8d Update logo for README header. (#218)
974e555 fix some typos, consistent naming
45fe007 Updated documentation (#217)
51aedcf More working config (#216)
47b2ad5 add release PR comments back in
v2.0.0
What's new
This major release introduces a few breaking changes. We've provided more information here: OLMo-core v2 design and upgrade guide.
Added
- Added a `TrainModule` abstraction with a `TransformerTrainModule` implementation, which encapsulates both a model and an optimizer.
- Added a `namespace` argument to `Trainer.record_metric()`.
- Added support for context parallelism.
- Added support for expert parallelism with MoE models.
- Added in-loop evals for Minerva, GSM, HumanEval, and MBPP (`ai2-olmo-eval==0.7.0`).
- Added the `CosWithWarmupAndLinearDecay` learning rate scheduler.
- Added the `WSD` learning rate scheduler.
Changed
- The `Trainer` now takes a `TrainModule` instead of a model and optimizer, and several configuration options have been moved to `TransformerTrainModule`, including `rank_microbatch_size`, `fused_loss`, `compile_loss`, `z_loss_multiplier`, and `autocast_precision` (see the sketch after this list).
- Several `TransformerModelConfig` options have been moved to `TransformerTrainModule` / `TransformerTrainModuleConfig`, including `dp_config`, `tp_config`, `float8_config`, and `compile`.
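In rough terms the v2 wiring looks like the sketch below; the class names are taken from the notes above, but the constructor arguments and `build()` signatures shown are assumptions:

```python
# Sketch of the v2 split between TrainModule and Trainer. Only the overall
# shape follows the release notes; argument names and values are assumptions.
from olmo_core.train import TrainerConfig                              # assumed path
from olmo_core.train.train_module import TransformerTrainModuleConfig  # assumed path

train_module_config = TransformerTrainModuleConfig(
    rank_microbatch_size=2 * 4096,  # moved here from the Trainer in v2
    compile_loss=True,              # moved here from the Trainer in v2
    z_loss_multiplier=1e-5,         # hypothetical value
)
train_module = train_module_config.build(model)  # signature assumed

# The Trainer now receives the train module instead of a (model, optimizer) pair.
trainer = TrainerConfig(save_folder="/tmp/olmo-run").build(train_module, data_loader)  # assumed
trainer.fit()
```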
Removed
- Removed the following callbacks: `MoEHandlerCallback`, `SchedulerCallback`, `MatrixNormalizerCallback`, `GradClipperCallback`, and `Float8HandlerCallback`. The functionality from all of those callbacks has been moved to the `TransformerTrainModule` class.
- Removed the callback methods `.pre_eval_batch()` and `.post_eval_batch()`.
Fixed
- Fixed the model ladder code when training on an MPS or CPU device.
Commits
dfa8f2b (chore) prepare for release v2.0.0
95fb084 add work-around for pytorch/ao#1871 (#205)
3ce0c58 32B Documentation (#210)
41f8ddc Add a public "official" version of our 32B train script (#214)
7e58d12 Update data paths in example to public URLs (#213)
4327bb9 upload data to r2 and updated their paths (#208)
0e6ea23 Assorted improvements (#207)
9ceb1e4 Add CUDA 12.6 images (#209)
eda3afb guard against wrapping MoE modules for AC (#206)
6e5b16f Bump ai2-olmo-eval==0.7.0 (in-loop Minerva, GSM, HumanEval, MBPP) (#204)
eccdc00 Make it easier for external users to run train scripts (#203)
da33f5b fix entrypoint steps
947a293 clean up changelog
725adf3 V2 (#202)
v1.9.0
What's new
Added
- Added an `instance_filter_config` field to `NumpyDatasetConfig`.
- Added a conversion script for OLMo 2 checkpoints to Hugging Face format.
- Added `BeakerCallback`.
- Added logging for in-loop eval throughput.
Fixed
- Ensure certain optimizer param group fields are not overridden by the values in a checkpoint.
- Fixed an issue where non-zero ranks would report partially-reduced values for training metrics.
Commits
41a7dbd (chore) prepare for release v1.9.0
d7301e6 32B scripts (#201)
d55562c Log in-loop eval throughput (#200)
260dafd Add support for BF16 optim state in `SkipStepAdamW` (#148)
e522437 fix inferring sequence length
0bef5aa allow dynamic batch sizes (#170)
fa11a40 Port over instance filtering from old codebase (#157)
8ef038a update formatting of bucket distribution
c9ca78a Add a `BeakerCallback` (#177)
e1cd8f6 use effective sequence length
32cb0fa Conversion script for OLMo 2 models trained with OLMo core to HuggingFace (#158)
feb57eb all-reduce train metrics (#166)
2b43d59 reset initial LR to configured value after loading (#163)
2902a9c Improve `Config.from_dict` (#156)
b4cee6d ignore class name field when config from dict
c1d1a53 update DTensor imports to use public module (#153)
4594231 activate virtual env before running script
v1.8.0
What's new
Added
- Added support for tensor parallelism. See the `TransformerConfig` class for usage.
- Added more downstream tasks from the model ladder.
- Added an `io.copy_dir()` function.
- Added new LR schedulers: `LinearWithWarmup`, `InvSqrtWithWarmup`, `ConstantWithWarmup`, and `SequentialScheduler` (see the sketch after this list).
- Added an option to pre-download checkpoint files from remote storage before trying to load a checkpoint.
- Added a callback for sending Slack notifications.
- Made the MPS device work on Apple Silicon.
- Added the `SkipStepAdamW` optimizer.
- The trainer can now load model-only checkpoints.
- Added an option to throttle checkpoint uploads to one rank from each node at a time.
- Added support for logging rich `Table` objects as text in source mixture datasets.
- Added an `unshard_strategy` parameter to the `unshard_checkpoint()` function in `olmo_core.distributed.checkpoint`.
- Added the function `load_keys()` to `olmo_core.distributed.checkpoint`.
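For example, the new schedulers might be composed like this; the class names are from the list above, but the constructor fields and the way `SequentialScheduler` chains its children are assumptions:

```python
# Sketch only: scheduler class names come from the release notes; the
# constructor arguments (warmup_steps, alpha_f, schedulers_max) are assumptions.
from olmo_core.optim import (  # assumed import path
    ConstantWithWarmup,
    LinearWithWarmup,
    SequentialScheduler,
)

# e.g. hold the LR constant after warmup, then switch to a linear decay.
lr_scheduler = SequentialScheduler(
    schedulers=[
        ConstantWithWarmup(warmup_steps=2000),
        LinearWithWarmup(warmup_steps=0, alpha_f=0.0),  # hypothetical fields
    ],
    schedulers_max=[50_000],  # hypothetical: steps spent in each scheduler before the last
)
```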
Changed
- Changed storage of shared shard state in sharded checkpoints from smallest shard to lowest rank (normally 0).
- Changed how the trainer handles loading a checkpoint when `load_path` is provided. Now `load_path` is only used if no checkpoint is found in the `save_folder`.
Fixed
- Added the missing `weights_only=False` argument to fix loading train checkpoints with newer versions of PyTorch.
- Fixed a bug where GCS uploads did not retry on transient failures.
- Fixed a bug where source mixture datasets were truncating source files instead of randomly sampling.
- Fixed a bug in source mixture datasets where sampling from small .npy files raised an mmap exception due to 0 instances in the sampled index.
Commits
7899e7c (chore) prepare for release v1.8.0
907b9c5 Send Slack notification on releases (#151)
1ef7851 fix `get_mock_batch()` when training on MPS again
29a468d Fix mixture dataset class (#147)
98ccb67 remove ganymede cluster
205fe90 remove deleted cluster
7ec9114 always make mock batch on CPU
7122b1d save max steps to trainer state (#143)
9a78829 Log elapsed time per eval (#149)
075a36a Make training on the MPS device work (#131)
b4a195b Add more options to the `unshard_checkpoint` function to help scale (#145)
16885ab fix merge list with prefix
7b755c9 minor logging improvement
212108f Add option to throttle checkpoint uploads to one rank from each node at a time (#142)
7633461 pull fixes from 32B branch (#139)
48abe8c checkpoint hot fix (#140)
0c096e2 Handle model-only checkpoints with the trainer
9818232 move release scripts to subfolder (#137)
05ab673 update cluster list (#136)
7ccf726 add pr comments on release
0ff19d7 update citation
7519e0a Change the way `load_path` is handled (#132)
03a597a limit the number of exception lines posted to Slack
c634066 include link to Beaker job with Slack noties
3505660 Make context manager set original state correctly (#126)
9e0992b Add a callback for sending Slack notifications (#125)
6d60464 fix
ee27348 Sync eval changes in OLMo/ladder-1xC to here (#122)
0789479 Add option to pre-download checkpoint to load (#123)
1380f0e add `copy_dir()` io function
5cc704f Add learning rate schedulers (#119)
de5be27 don't check for beaker-py upgrades
b0103f0 Fix loading train state for newer versions of torch
5de774f updates
8474ee8 update docker image tags
d3f6f01 Update PyTorch and other deps in Docker images, change naming scheme of images (#120)
10c4978 Publish Docker images to GHCR (#118)
d6981b3 Add support for tensor parallelism and add OLMo2-26B model config / train script (#117)
aa4d188 Update table formatting
v1.7.0
What's new
Added
- Added a `key_mapping` argument to `olmo_core.distributed.checkpoint.load_model_and_optim_state()` for loading checkpoints with different key names.
- Added a `load_key_mapping` field to the trainer, same idea as the new `key_mapping` argument above.
- Added an implementation of nGPT called `NormalizedTransformer`.
- Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
- Added an API for scaling RoPE embeddings.
- Added a `ModelLadder` API.
Changed
- The `w_out` and `norm` top-level children of the `Transformer` model are now wrapped together in an `lm_head` module. Training scripts will have backwards compatibility with older checkpoints due to the `load_key_mapping` explained above (see the sketch after this list).
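For standalone checkpoint loading, the same rename can be handled with the new `key_mapping` argument; the function name comes from the notes above, while the exact key strings and the optimizer argument name are assumptions:

```python
# Sketch only: load_model_and_optim_state() and key_mapping are from the
# release notes; the specific old/new key names below are assumptions based
# on the w_out/norm -> lm_head re-parenting described above.
from olmo_core.distributed.checkpoint import load_model_and_optim_state

key_mapping = {
    "w_out.weight": "lm_head.w_out.weight",  # assumed key names
    "norm.weight": "lm_head.norm.weight",    # assumed key names
}

load_model_and_optim_state(
    "/path/to/v1.6-checkpoint",  # hypothetical path
    model,
    optim=optimizer,             # argument name assumed
    key_mapping=key_mapping,
)
```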
Fixed
- (Optimization) Mark model input sizes as dynamic for `torch.compile()` to avoid recompiles during evals or variable-sequence / batch-size training. This doesn't seem to hurt throughput.
- Made HTTPS and GCS IO functions more robust.
- Fixed a bug where we were always getting dolma2-tokenized validation data when generating a config with `DataMix.v3_small_ppl_validation`.
Commits
62d2c9e (chore) prepare for release v1.7.0
cb77039 mark model ladder as a beta feature
08c8073 Adapt conversion script to work with OLMo2 models (#116)
8e716b5 Add model ladder building blocks (#114)
1647f78 Add some more tests for nGPT (#113)
37e0e88 improve docs
d68d47a Make nn configs more flexible (#112)
0bcc840 RoPE scaling, document how to convert HuggingFace checkpoints (#111)
7655a3b Add template variable to ppl validation file manifest (#110)
ca44cf4 Implement nGPT (#108)
c47df7c make IO functions more robust (#109)
4f2c8ef Update README.md
57b38ad Mark model input as dynamically sized (#105)
776e235 remove duplicate script
v1.6.3
What's new
Added
- Added an `olmo_core.distributed.checkpoint.get_checkpoint_metadata()` function.
- (BETA) Added a flag to compile the optimizer step. So far only tested with AdamW; it may not work with other optimizers.
Fixed
- Old ephemeral checkpoints won't be removed until after the latest ephemeral checkpoint is saved successfully.
- Made GCS uploads more robust.
- Fixed single-node training on the Google Augusta cluster.
- `numpy.random.dirichlet()` does not always sum to 1.0, so allow for a small tolerance when validating domain weights.
Commits
9c52bea (chore) prepare for release v1.6.3
ad5e9e5 Upgrade flash-attn to v2.7.0 (#104)
b9e9193 [beta] Enable compiling optimizer step (tested with AdamW) (#103)
fdbb76e Use allclose for comparing sum of small numbers (#102)
3284742 make GCS uploads more robust (#101)
63b3f43 Update isort requirement from <5.13,>=5.12 to >=5.12,<5.14 (#93)
dcbd988 update docs and theme version
6615ba9 Bump actions/download-artifact from 3 to 4 (#100)
2e2b35b Add function to get checkpoint metadata
c0e47cc clean up Dockerfile (#99)
6300bc7 replace printing table with logging table (#98)
e522886 Don't prematurely delete old ephemeral checkpoints (#97)
dea10fd Bump actions/upload-artifact from 3 to 4 (#90)
c2fe2db skip another test when creds missing
3ea9fa2 Bump softprops/action-gh-release from 1 to 2 (#87)
5a5c17f Bump actions/checkout from 3 to 4 (#91)
9c99b9c skip some tests when missing relevant credentials (#96)
53efa8c Bump actions/setup-python from 4 to 5 (#88)
d548d3b Bump actions/cache from 3 to 4 (#86)
ab80395 add depandabot config
v1.6.2
What's new
Added
- Added an option to disable `GarbageCollectorCallback`; not that you'd usually want to do this, but I needed to run an experiment to show how important that callback is.
Fixed
- Fixed a bug where some default callbacks could be added twice if given a different name by the user.
- Fixed a bug where some `Trainer` bookkeeping tasks might not complete before `.fit()` returns.
Commits
2384472 (chore) prepare for release v1.6.2
f721fa1 Ensure all bookkeeping tasks complete (#85)
26a2c63 Some callback improvements (#84)
v1.6.1
What's new
Added
- Added a `retries` field to `BeakerLaunchConfig`.
- Allow running on the Augusta cluster with existing train scripts.
- Added an `olmo_core.utils.logging_configured()` function to check if logging has been configured.
Fixed
- Fixed a potential distributed deadlock bug when training without a separate CPU-only bookkeeping backend.
- Removed some unnecessary host-device syncs in `olmo_core.distributed.utils`.
- Added a `Trainer(Config).async_bookkeeping` field to toggle async bookkeeping.
Commits
cae88f5 (chore) prepare for release v1.6.1
83db5f7 Some fixes/improvements around synchronous bookkeeping operations (#83)
c435c94 increase timeout for CI checks
4a56200 update cluster list (#82)
e27ba74 Update throughput numbers, add `logging_configured()` util function (#81)
bec0a3c Allow running on Augusta cluster (#80)
c7c3a5a Set env vars for Augusta cluster
b9351e2 Add `retries` field to `BeakerLaunchConfig` (#79)
v1.6.0
What's new
Added
- Added an option to compile the trainer's loss function (`Trainer.compile_loss`).
- Added `SourceMixtureDataset` for composing a training mixture based on ratios of source datasets (see the sketch after this list).
- Added `NumpyFSLDatasetMixture` for constructing a `NumpyDatasetBase` from a `SourceMixtureDataset`. Note that this is only supported for FSL datasets.
- Added tests for `SourceMixture*` and `NumpyFSLDatasetMixture`.
- Added a `DownstreamEvaluatorCallbackConfig` class for running in-loop downstream eval via OLMo-in-loop-evals.
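The source mixture pieces might be wired together roughly like this; the general shape (per-source paths plus a target ratio) follows the description above, but the config class and field names other than `SourceMixtureDataset` itself are assumptions for illustration:

```python
# Sketch only: SourceMixtureConfig / SourceMixtureDatasetConfig and their
# fields are hypothetical names chosen to illustrate ratio-based mixing.
from olmo_core.data.source_mixture import (  # assumed import path
    SourceMixtureConfig,
    SourceMixtureDatasetConfig,
)

mixture_config = SourceMixtureDatasetConfig(
    max_tokens=50_000_000_000,  # hypothetical total token budget
    source_configs=[
        SourceMixtureConfig(source_name="web", paths=["s3://bucket/web/*.npy"], target_ratio=0.7),
        SourceMixtureConfig(source_name="code", paths=["s3://bucket/code/*.npy"], target_ratio=0.3),
    ],
    sequence_length=4096,
    seed=42,
)
source_mixture_dataset = mixture_config.build()  # yields a SourceMixtureDataset (assumed)
```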
Changed
- Moved some types into `olmo_core.data.types` to avoid some circular dependencies.
Fixed
- Made GCS client more robust by automatically retrying timeout errors for most operations.
Commits
29e1276 (chore) prepare for release v1.6.0
da39e97 Add note about optional dependencies
81b1249 Missed _bust_index_cache in one spot (#78)
00d34f6 Add option to compile loss function, move logits FP32 casting into loss function (#77)
4928f82 Adds mixing loader for FSL datasets (#70)
ecb0686 Allow stopping the experiment on keyboard int
41400c4 Add Llama 8B config (#76)
282c120 Update Docker build (#75)
55d261e Make GCS client more robust (#74)
3fe59b6 Add a callback for downstream evals, update Docker builds (#73)
ecd523e include release chore commit in release notes