Currently nemo-rl always tries to resume from the last checkpoint in the checkpoint path. When the policy model is changed, the new model silently fails to load the old checkpoints, with two negative consequences:

- New checkpoints overwrite the old checkpoints, which came from a different model.
- The training step counter continues from the old checkpoint, even though the new model is actually trained from scratch.

It would be better to fail explicitly when the policy model doesn't match the checkpoints, to prevent this undefined behavior.
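One way to implement the explicit failure: store the policy model name in a small metadata file next to each checkpoint, and compare it against the current config before resuming. A minimal sketch — the file name `model_info.json`, its `model_name` field, and the function itself are hypothetical, not nemo-rl's actual checkpoint layout:

```python
import json
import os


def check_checkpoint_compat(checkpoint_dir: str, policy_model_name: str) -> None:
    """Fail fast if the checkpoint was produced by a different policy model.

    Assumes a hypothetical `model_info.json` with a `model_name` field is
    saved alongside each checkpoint; nemo-rl's real layout may differ.
    """
    meta_path = os.path.join(checkpoint_dir, "model_info.json")
    if not os.path.exists(meta_path):
        # Old checkpoints without metadata: nothing to validate against.
        return
    with open(meta_path) as f:
        saved_name = json.load(f).get("model_name")
    if saved_name is not None and saved_name != policy_model_name:
        raise ValueError(
            f"Checkpoint at {checkpoint_dir!r} was saved with model "
            f"{saved_name!r}, but the current policy model is "
            f"{policy_model_name!r}. Refusing to resume, to avoid "
            "overwriting checkpoints and miscounting training steps."
        )
```

Calling this at the top of the resume path would surface the mismatch immediately instead of letting the load fail silently.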