Hi all, What are some best practices for using wandb to log multiple rounds of fine-tuning? For example, let's say I have one run logged for 100 epochs. I want to spawn 3 fine-tuning runs with different hyperparameters or data that start from the last epoch of the initial run. I've used wandb.init(id=<id>, resume="must") to resume runs, but this doesn't let me create multiple fine-tuning runs.
Thanks for the help!
Thanks for reaching out! I understand you're looking to create multiple fine-tuning runs that start from the same checkpoint of an initial run.
Based on your description, you're trying to:
Complete an initial run for 100 epochs
Create 3 different fine-tuning runs starting from that 100th epoch
Use different hyperparameters or data for each of these new runs
The "resume" functionality you've been using (wandb.init("id":<id>,"resume":"must")) is designed to continue the exact same run, which is why it doesn't support multiple branching runs. What you're trying to do is actually a perfect use case for our "fork" feature!
The forking feature allows you to create new runs that branch off from a specific point in an existing run. This lets you explore different parameters or models without affecting the original run - exactly what you need!
Here's how you would implement it:
import wandb

# Assuming your initial run is complete:
original_run_id = "your_initial_run_id"

# Create your first fine-tuning run, branching from epoch 100
fine_tune_run1 = wandb.init(
    project="your_project_name",
    fork_from=f"{original_run_id}?_step=100",  # specify the step/epoch to fork from
    # Add the new hyperparameters for this specific fine-tuning run
    config={"learning_rate": 0.001, "batch_size": 32},
)

# Continue training with the new hyperparameters
# ...

# Similarly for your other fine-tuning runs with different configs
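To make the "similarly for your other runs" part concrete, here is a hedged sketch of spawning all three fine-tuning runs from the same step. The specific hyperparameter values and run names are placeholders, and the commented-out fine-tuning loop stands in for your own training code:

import wandb

original_run_id = "your_initial_run_id"

# Illustrative hyperparameter sets for the three fine-tuning runs
fine_tune_configs = [
    {"learning_rate": 1e-3, "batch_size": 32},
    {"learning_rate": 5e-4, "batch_size": 64},
    {"learning_rate": 1e-4, "batch_size": 32},
]

for i, cfg in enumerate(fine_tune_configs):
    run = wandb.init(
        project="your_project_name",
        name=f"fine-tune-{i}",
        fork_from=f"{original_run_id}?_step=100",  # all runs branch from the same step
        config=cfg,
    )
    # ... run your fine-tuning loop here, logging metrics with run.log(...) ...
    run.finish()  # close this run before starting the next fork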
Note that the forking feature is currently in private preview.
Alternative Approach
If you need an immediate solution while waiting for fork access, you could:
Save model checkpoints at epoch 100 of your initial run
Start completely new runs that load this checkpoint
Use W&B's Groups feature to organize these related runs
Let me know if you have any questions or if you'd like more details on either approach!