Currently nemo-rl always tries to resume from the last checkpoint in the checkpoint path. When the policy model is changed, the new model silently fails to load the old checkpoints, with two negative consequences:

- New checkpoints overwrite the old checkpoints, which came from a different model.
- The training step counter continues from the old checkpoint, even though the new model is actually trained from scratch.

It would be better to fail explicitly when the policy model doesn't match the checkpoints, to prevent this undefined behavior.
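One way to implement the explicit failure: store the policy model name in a small metadata file next to each checkpoint, and compare it against the current config before resuming. A minimal sketch — the file name `model_info.json`, its `model_name` field, and the function itself are hypothetical, not nemo-rl's actual checkpoint layout:

```python
import json
import os


def check_checkpoint_compat(checkpoint_dir: str, policy_model_name: str) -> None:
    """Fail fast if the checkpoint was produced by a different policy model.

    Assumes a hypothetical `model_info.json` with a `model_name` field is
    saved alongside each checkpoint; nemo-rl's real layout may differ.
    """
    meta_path = os.path.join(checkpoint_dir, "model_info.json")
    if not os.path.exists(meta_path):
        # Old checkpoints without metadata: nothing to validate against.
        return
    with open(meta_path) as f:
        saved_name = json.load(f).get("model_name")
    if saved_name is not None and saved_name != policy_model_name:
        raise ValueError(
            f"Checkpoint at {checkpoint_dir!r} was saved with model "
            f"{saved_name!r}, but the current policy model is "
            f"{policy_model_name!r}. Refusing to resume, to avoid "
            "overwriting checkpoints and miscounting training steps."
        )
```

Calling this at the top of the resume path would surface the mismatch immediately instead of letting the load fail silently.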