I'm trying to unshard and load some trained OLMo2 base 13B checkpoints that were saved with the `torch_new` sharded checkpointer in the following format:

```
step100/
├── config.yaml          # Model configuration file
├── train/               # Directory containing rank files
│   ├── rank0.pt         # Rank 0 checkpoint file
│   ├── rank1.pt         # Rank 1 checkpoint file
│   └── ...              # Additional rank files
└── model_and_optim/     # Directory containing distributed checkpoint files
    ├── .metadata
    ├── __0_0.distcp     # Distributed checkpoint file
    ├── __0_1.distcp     # Distributed checkpoint file
    └── ...              # Additional distcp files
```
I tried using `scripts/unshard.py`, but it seems to be incompatible with checkpointer type `torch_new`, since that checkpointer does not have an `unshard_checkpoint` function.
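
As a workaround I was considering something along the lines of the sketch below. It assumes that `model_and_optim/` is a plain `torch.distributed.checkpoint` (DCP) directory and that PyTorch >= 2.2 is available; I'm not sure the resulting state dict matches what `scripts/unshard.py` would normally produce for other checkpointer types.

```python
# Sketch of a possible workaround (my assumption, not a confirmed OLMo workflow):
# consolidate the .distcp shards with PyTorch's own DCP format utilities.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_dir = "step100/model_and_optim"         # sharded DCP checkpoint directory
output_path = "step100/model_and_optim.pt"  # single consolidated torch.save file

# Convert the sharded DCP checkpoint into one unsharded torch.save file.
dcp_to_torch_save(dcp_dir, output_path)

# Inspect the consolidated state dict to see how the keys are laid out.
state = torch.load(output_path, map_location="cpu")
print(list(state.keys())[:10])
```

Is something like this the recommended way to unshard `torch_new` checkpoints, or is there a dedicated script I'm missing?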