[Q]: Fine-tuning best practices · Issue #9877 · wandb/wandb · GitHub

[Q]: Fine-tuning best practices #9877


Closed
echen1214 opened this issue May 19, 2025 · 2 comments
Labels
ty:question type of issue is a question

Comments

@echen1214

Ask your question

Hi all, what are some best practices for using wandb to log multiple rounds of fine-tuning? For example, let's say I have one run logged for 100 epochs. I want to spawn 3 fine-tuning runs with different hyperparameters or data that start from the last epoch of the initial run. I've used wandb.init(id=<id>, resume="must") to resume runs, but this doesn't let me create multiple fine-tuning runs.

Thanks for the help!

@echen1214 echen1214 added the ty:question label May 19, 2025
@JasonArkens17

Hi @echen1214,

Thanks for reaching out! I understand you're looking to create multiple fine-tuning runs that start from the same checkpoint of an initial run.

Based on your description, you're trying to:

  1. Complete an initial run for 100 epochs
  2. Create 3 different fine-tuning runs starting from that 100th epoch
  3. Use different hyperparameters or data for each of these new runs

The resume functionality you've been using (wandb.init(id=<id>, resume="must")) is designed to continue the exact same run, which is why it doesn't support branching into multiple new runs. What you're trying to do is actually a perfect use case for our fork feature!

The forking feature allows you to create new runs that branch off from a specific point in an existing run. This lets you explore different parameters or models without affecting the original run - exactly what you need!

Here's how you would implement it:

import wandb

# Assuming your initial run is complete:
original_run_id = "your_initial_run_id"

# Create your first fine-tuning run branching from epoch 100
fine_tune_run1 = wandb.init(
    project="your_project_name",
    fork_from=f"{original_run_id}?_step=100",  # fork from the logged W&B _step (here, the step for epoch 100)
    # Add your new hyperparameters for this specific fine-tuning run
    config={"learning_rate": 0.001, "batch_size": 32}
)

# Continue training with new hyperparameters
# ...

# Similarly for your other fine-tuning runs with different configs

Note that the forking feature is currently in private preview.

Alternative Approach
If you need an immediate solution while waiting for fork access, you could (see the sketch after this list):

  1. Save model checkpoints at epoch 100 of your initial run
  2. Start completely new runs that load this checkpoint
  3. Use W&B's Groups feature to organize these related runs

Let me know if you have any questions or if you'd like more details on either approach!

Best,
Jason

@echen1214
Copy link
Author

Hi Jason, thanks for the tip. I've implemented the alternative approach and will keep a lookout for the forking feature.
