Resuming grid sweep does not launch missing runs #1787
Comments
Hey @ExpectationMax

Code

I tried a very basic example to showcase resuming a grid sweep.

import wandb

# Set up your default hyperparameters before wandb.init
# so they get properly set in the sweep
hyperparameter_defaults = {
    'epoch': 2
}

# Pass your defaults to wandb.init
wandb.init(config=hyperparameter_defaults)
config = wandb.config

# Log metrics inside your training loop
metrics = {'custom_metric': config["epoch"] / 2}
wandb.log(metrics)

program: train.py
method: grid
metric:
  name: custom_metric
  goal: minimize
parameters:
  epoch:
    values:
      - 2
      - 4

Steps
I think the detailed walkthrough would make it feasible for you to combat your issues. Feel free to write in if this does not help you.
When following these instructions, I did not see a "Resume" button on the sweep control page. There was only Pause / Unpause. I tried pausing and unpausing, but the agents still refused to restart the deleted jobs.
@jvlmdr, can you please provide a link to your workspace for review? Thanks.
@jvlmdr, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
Question: if some runs fail or are deleted, is the (grid) sweep supposed to go through them again and relaunch?
@jpcbertoldo, it depends on the search strategy. If you are using a grid search, yes, the agent will create a run for the exact same run configuration (since a grid search is an exhaustive search of your config). Bayes and random search sample from a distribution, so there is some probability of running the same config again, but for most configs that control a neural network, the chance of reaching the same config is low, especially if you have a continuous distribution like normal or uniform defined in your config.
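To make the "exhaustive search" point concrete, here is a minimal sketch of how the set of missing grid configurations could be worked out from a parameter spec and the runs that remain in a sweep. This is only an illustration of the expected behavior, not how wandb implements it; the helper names `grid_configs` and `missing_configs` and the `remaining_runs` list are hypothetical.

```python
from itertools import product


def grid_configs(parameters):
    """Enumerate every combination of a grid-style parameter spec."""
    names = sorted(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def missing_configs(parameters, existing_runs):
    """Return the grid combinations that no existing run covers."""
    names = sorted(parameters)
    seen = {tuple(run[name] for name in names) for run in existing_runs}
    return [
        cfg for cfg in grid_configs(parameters)
        if tuple(cfg[name] for name in names) not in seen
    ]


# Parameter spec mirroring the sweep config posted in this thread.
parameters = {"epoch": {"values": [2, 4]}}

# Assumption for illustration: the run with epoch=4 was deleted.
remaining_runs = [{"epoch": 2}]

print(missing_configs(parameters, remaining_runs))  # [{'epoch': 4}]
```

With a grid search the missing set is well defined, so a resumed sweep would be expected to launch exactly these configurations. With random or Bayes search there is no such fixed set to compare against.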
So, for the record, I think there is a bug on that as well. |
I bumped into the same issue. |
yeah, this is busted. same experience |
Have tested based on original feedback (#1787 (comment)) and this is working as intended. |
Describe the bug
After removing some runs from a completed grid sweep and resuming the sweep, no new runs are passed to the agents and the sweep goes back into the complete state.
This contradicts the documentation, which states that the missing configurations will be launched when the sweep is resumed: https://docs.wandb.ai/sweeps/faq#rerun-grid-search
To Reproduce
Steps to reproduce the behavior:
completed
Expected behavior
The missing run should be launched by the agent after it has been deleted from the sweep.