Release v2.1.0 · allenai/OLMo-core · GitHub

v2.1.0

Latest
github-actions released this 14 Apr 18:08
· 48 commits to main since this release

What's new

Added 🎉

  • Added 50B Dolmino 11/24 mix.
  • Added support for auxiliary-loss-free MoE load-balancing, similar to DeepSeek-v3. You can activate this by setting bias_gamma to a non-zero float in your MoERouter config (see the sketch after this list).
  • Added support for sequence-level MoE load balancing loss.
  • Compatibility with B200s.
  • Added support for warmup_fraction as an alternative to warmup_steps in all schedulers, allowing warmup to be specified as a fraction of total training steps (also shown in the sketch after this list).
  • A better config for the 1B model, ported from the old OLMo trainer.
  • Added an auto_resume option to CometCallback for resuming an existing run.
  • (BETA) Added methods load_hf_model and save_hf_model for loading and saving supported OLMo Core models in HF transformers format.
    Also added lower-level methods for converting state between the formats.
  • Added the ability to run the evaluator callback on .pre_train() by setting eval_on_startup=True, and to cancel the run after the first time evals run by setting cancel_after_first_eval=True.
  • Added support for label mask files with numpy FSL datasets.
  • Added a git configuration to BeakerLaunchConfig.
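
A minimal sketch of how two of the new options above might look in Python. Only the bias_gamma and warmup_fraction fields are named in these notes; the MoERouterConfig and SchedulerConfig classes below are illustrative stand-ins, not the exact OLMo-core API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoERouterConfig:
    """Assumed stand-in for the real MoERouter config."""
    num_experts: int = 8
    # A non-zero bias_gamma enables DeepSeek-v3-style auxiliary-loss-free
    # load balancing (per the release notes above).
    bias_gamma: float = 0.0


@dataclass
class SchedulerConfig:
    """Assumed stand-in for a scheduler config."""
    # warmup_fraction is an alternative to warmup_steps: warmup is given as a
    # fraction of total training steps rather than an absolute step count.
    warmup_fraction: Optional[float] = None
    warmup_steps: Optional[int] = None


router = MoERouterConfig(num_experts=8, bias_gamma=1e-3)  # aux-loss-free balancing on
scheduler = SchedulerConfig(warmup_fraction=0.01)         # warm up over 1% of total steps
```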

Changed ⚠️

  • TransformerTrainModuleConfig can now be used to build a TransformerPipelineTrainModule by adding a pp_config spec (see the sketch after this list). This makes the TransformerPipelineTrainModuleConfig redundant, but it will be kept around for backwards compatibility until the next major release.
  • Several state dict methods in TrainModule now take an optim option, which can disable the use of optimizer state.
  • Updated Float8Config for latest version of torchao.
  • Undid a fix applied to olmo_core.data.numpy_dataset.NumpyFSLDatasetMixture that was generating a mismatch between the shape of instances in the dataset and the shape of instances in the data loader.
  • Made the 1B and 7B scripts more similar to each other.
  • Changed underlying logic and top-level arguments of convert_checkpoint_from_hf.py and convert_checkpoint_to_hf.py.
  • Beaker experiments launched with the BeakerLaunchConfig will now log with ANSI colors enabled.
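
A rough sketch of the train-module config merge described above. Only the names TransformerTrainModuleConfig, TransformerPipelineTrainModule, and the pp_config field come from these notes; the class shapes below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineParallelSpec:
    """Hypothetical stand-in for the pipeline-parallel (pp) spec."""
    num_stages: int = 2


@dataclass
class TransformerTrainModuleConfig:
    """Illustrative shape only; the real config has many more fields."""
    rank_microbatch_size: int = 4
    # When pp_config is set, building the config yields a pipeline train
    # module instead of the plain one (per the release notes above).
    pp_config: Optional[PipelineParallelSpec] = None

    def build(self) -> str:
        if self.pp_config is not None:
            return "TransformerPipelineTrainModule"  # placeholder for the real module
        return "TransformerTrainModule"


config = TransformerTrainModuleConfig(pp_config=PipelineParallelSpec(num_stages=2))
assert config.build() == "TransformerPipelineTrainModule"
```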

Fixed ✅

  • Fixed calculation of total steps based on epochs at the end of a training job.
  • Fixed a bug where the trainer might try to save a duplicate final checkpoint if a run that had already completed was restarted.
  • When submitting a Beaker job from a branch that's tracking a GitHub fork, OLMo-core now instructs Beaker to pull from the fork instead of from the main repo.
  • Made Beaker image resolution more robust.
  • Removed the t_max overrides from the default model configs, since having them there was confusing and error-prone.
  • The Beaker launcher now clones only a single branch at runtime when possible, which can be much faster.

Commits

b8070fb (chore) prepare for release v2.1.0
7bc8aa2 remove erroneous license in test file
db91b7f Add a git config to BeakerLaunchConfig (#251)
36b791a [HF Converter] Expect model and optim state in model_and_optim subdirectory (#253)
1f2f6f9 Log with ANSI colors in Beaker (#252)
d0ab790 No more t_max (#247)
5653c92 rename * (unscaled) metrics to * unscaled
60a19c3 clone single branch when possible (#250)
e9a34e8 More MoE updates (#246)
c149b73 Update images for torch 2.7.0 (#249)
6d2bb0a Added 50B Dolmino-1124 mix (#248)
53e67ce Add option to cancel run after first evals (#244)
b493d50 fix in-loop normalization with v2 (#243)
a07ef78 Add a self-contained template train script (#242)
746408e Port the 1B from old OLMo (#234)
4ec0866 Add support for label masks with numpy datasets (#241)
ecb14e0 only resume if name matches (#240)
fc84edc Add option to auto resume Comet experiments (#239)
f5d85a9 OLMo Core to HF conversion refactor (#226)
23c6cb1 clean up logging output from source mixture tests
d502b7e Mapping new ladder to old ladder (#146)
a135883 fix calculation of max steps based on epoch at the end (#236)
2f66fd9 Added warmup_fraction to all schedulers (#235)
be06aa0 B200 compatibility (#232)
0973d4d make beaker image resolution more robust (#233)
78be552 Pick the correct remote (#230)
590138d Temp disables custom read_chunk_from_array in SourceMixture (#231)
082e0b1 Fix bug when restarting a completed run (#229)
6c626f2 Update float8 API for latest torchao (#228)
8919dff Some MoE changes/additions to support auxiliary-loss-free load-balancing (#227)
26e9476 Allow train modules to not load/save optimizer state (#225)
8c20a64 run cuda gc at the end of training
a907892 Merge transformer train module configs (#224)
b47e01c Added 32B stage2 checkpoints .csv (#220)
