[Q]: UsageError: Unable to attach to run ... · Issue #9948 · wandb/wandb · GitHub

[Q]: UsageError: Unable to attach to run ... #9948


Open
Sascha-Roe opened this issue Jun 1, 2025 · 2 comments
Labels
ty:question type of issue is a question

Comments

@Sascha-Roe
Sascha-Roe commented Jun 1, 2025

Hey everyone,

I have trained a TFT model using WandB, which worked just fine. But when I try to predict using the trained model, I get this error:

WandbAttachFailedError: Failed to attach because the run does not belong to the current service process, or because the service process is busy (unlikely)
UsageError: Unable to attach to run g43vyyi7

Has anyone encountered a similar error or knows how to fix this?
I am using wandb 0.19.11 with Python 3.12.10.

A small example of how I try to make predictions:

from darts.models import TFTModel

model_best = TFTModel.load_from_checkpoint(work_dir=work_dir, model_name=model_name, best=True)

I then prepare my data and make the predictions using:

pred_series = model_best.predict(n=pred_size,
                        series=ts_ttest_temp[pred_idxs[0]:pred_idxs[1]],
                        future_covariates= tcox_test_future[pred_idxs[0]:pred_idxs[3]],
                        past_covariates=tcov_test[pred_idxs[0]:pred_idxs[1]],
                        num_samples=1,   
                        n_jobs=-1)

The entire traceback looks as follows:

---------------------------------------------------------------------------
WandbAttachFailedError                    Traceback (most recent call last)
File /srv/jupyterhub/lib/python3.12/site-packages/wandb/sdk/wandb_init.py:1186, in _attach(attach_id, run_id, run)
   1185 try:
-> 1186     attach_settings = service.inform_attach(attach_id=attach_id)
   1187 except Exception as e:

File /srv/jupyterhub/lib/python3.12/site-packages/wandb/sdk/lib/service_connection.py:182, in ServiceConnection.inform_attach(self, attach_id)
    181 except TimeoutError:
--> 182     raise WandbAttachFailedError(
    183         "Failed to attach because the run does not belong to"
    184         " the current service process, or because the service"
    185         " process is busy (unlikely)."
    186     ) from None

WandbAttachFailedError: Failed to attach because the run does not belong to the current service process, or because the service process is busy (unlikely).

The above exception was the direct cause of the following exception:

UsageError                                Traceback (most recent call last)
Cell In[19], line 28
     25 print('pred_idx: ',pred_idxs)
     26 #print(tcov_test.start_time())
     27 #print(tcov_test.end_time())
---> 28 pred_t = evalTFT.pred_multi(model_best, pred_size, pred_idxs, ts_ttest_temp, tcov_test, tcov_test_future)
     29 print("PREDICTED!")
     30 print(pred_t.start_time().weekday())

File ~/Documents/Code/Giaco/Evaluation/evalTFThelper.py:63, in pred_multi(model, pred_size, pred_idxs, ts_ttest_temp, tcov_test, tcox_test_future)
     62 def pred_multi(model, pred_size, pred_idxs, ts_ttest_temp, tcov_test, tcox_test_future):
---> 63     pred_series = model.predict(n=pred_size,
     64                             series=ts_ttest_temp[pred_idxs[0]:pred_idxs[1]],
     65                             future_covariates= tcox_test_future[pred_idxs[0]:pred_idxs[3]],
     66                             past_covariates=tcov_test[pred_idxs[0]:pred_idxs[1]],
     67                             num_samples=1,   
     68                             n_jobs=-1)
     69     return pred_series

File /srv/jupyterhub/lib/python3.12/site-packages/darts/utils/torch.py:80, in random_method.<locals>.decorator(self, *args, **kwargs)
     78 with fork_rng():
     79     manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
---> 80     return decorated(self, *args, **kwargs)

File /srv/jupyterhub/lib/python3.12/site-packages/darts/models/forecasting/torch_forecasting_model.py:1530, in TorchForecastingModel.predict(self, n, series, past_covariates, future_covariates, trainer, batch_size, verbose, n_jobs, roll_size, num_samples, dataloader_kwargs, mc_dropout, predict_likelihood_parameters, show_warnings)
   1511 super().predict(
   1512     n,
   1513     series,
   (...)   1518     show_warnings=show_warnings,
   1519 )
   1521 dataset = self._build_inference_dataset(
   1522     target=series,
   1523     n=n,
   (...)   1527     bounds=None,
   1528 )
-> 1530 predictions = self.predict_from_dataset(
   1531     n,
   1532     dataset,
   1533     trainer=trainer,
   1534     verbose=verbose,
   1535     batch_size=batch_size,
   1536     n_jobs=n_jobs,
   1537     roll_size=roll_size,
   1538     num_samples=num_samples,
   1539     dataloader_kwargs=dataloader_kwargs,
   1540     mc_dropout=mc_dropout,
   1541     predict_likelihood_parameters=predict_likelihood_parameters,
   1542 )
   1544 return predictions[0] if called_with_single_series else predictions

File /srv/jupyterhub/lib/python3.12/site-packages/darts/utils/torch.py:80, in random_method.<locals>.decorator(self, *args, **kwargs)
     78 with fork_rng():
     79     manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
---> 80     return decorated(self, *args, **kwargs)

File /srv/jupyterhub/lib/python3.12/site-packages/darts/models/forecasting/torch_forecasting_model.py:1679, in TorchForecastingModel.predict_from_dataset(self, n, input_series_dataset, trainer, batch_size, verbose, n_jobs, roll_size, num_samples, dataloader_kwargs, mc_dropout, predict_likelihood_parameters)
   1674 self.trainer = self._setup_trainer(
   1675     trainer=trainer, model=self.model, verbose=verbose, epochs=self.n_epochs
   1676 )
   1678 # prediction output comes as nested list: list of predicted `TimeSeries` for each batch.
-> 1679 predictions = self.trainer.predict(model=self.model, dataloaders=pred_loader)
   1680 # flatten and return
   1681 return [ts for batch in predictions for ts in batch]

File /srv/jupyterhub/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:887, in Trainer.predict(self, model, dataloaders, datamodule, return_predictions, ckpt_path)
    885 self.state.status = TrainerStatus.RUNNING
    886 self.predicting = True
--> 887 return call._call_and_handle_interrupt(
    888     self, self._predict_impl, model, dataloaders, datamodule, return_predictions, ckpt_path
    889 )

File /srv/jupyterhub/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:48, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     46     if trainer.strategy.launcher is not None:
     47         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 48     return trainer_fn(*args, **kwargs)
     50 except _TunerExitException:
     51     _call_teardown_hook(trainer)

File /srv/jupyterhub/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:928, in Trainer._predict_impl(self, model, dataloaders, datamodule, return_predictions, ckpt_path)
    924     download_model_from_registry(ckpt_path, self)
    925 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    926     self.state.fn, ckpt_path, model_provided=model_provided, model_connected=self.lightning_module is not None
    927 )
--> 928 results = self._run(model, ckpt_path=ckpt_path)
    930 assert self.state.stopped
    931 self.predicting = False

File /srv/jupyterhub/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:974, in Trainer._run(self, model, ckpt_path)
    971 log.debug(f"{self.__class__.__name__}: preparing data")
    972 self._data_connector.prepare_data()
--> 974 call._call_setup_hook(self)  # allow user to set up LightningModule in accelerator environment
    975 log.debug(f"{self.__class__.__name__}: configuring model")
    976 call._call_configure_model(self)

File /srv/jupyterhub/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:101, in _call_setup_hook(trainer)
     99 # Trigger lazy creation of experiment in loggers so loggers have their metadata available
    100 for logger in loggers:
--> 101     if hasattr(logger, "experiment"):
    102         _ = logger.experiment
    104 trainer.strategy.barrier("pre_setup")

File /srv/jupyterhub/lib/python3.12/site-packages/lightning_fabric/loggers/logger.py:118, in rank_zero_experiment.<locals>.experiment(self)
    116 if rank_zero_only.rank > 0:
    117     return _DummyExperiment()
--> 118 return fn(self)

File /srv/jupyterhub/lib/python3.12/site-packages/pytorch_lightning/loggers/wandb.py:404, in WandbLogger.experiment(self)
    401     self._experiment = wandb.run
    402 elif attach_id is not None and hasattr(wandb, "_attach"):
    403     # attach to wandb process referenced
--> 404     self._experiment = wandb._attach(attach_id)
    405 else:
    406     # create new wandb process
    407     self._experiment = wandb.init(**self._wandb_init)

File /srv/jupyterhub/lib/python3.12/site-packages/wandb/sdk/wandb_init.py:1188, in _attach(attach_id, run_id, run)
   1186     attach_settings = service.inform_attach(attach_id=attach_id)
   1187 except Exception as e:
-> 1188     raise UsageError(f"Unable to attach to run {attach_id}") from e
   1190 settings: Settings = copy.copy(_wl._settings)
   1192 settings.update_from_dict(
   1193     {
   1194         "run_id": attach_id,
   (...)   1197     }
   1198 )

UsageError: Unable to attach to run g43vyyi7
@Sascha-Roe Sascha-Roe added the ty:question type of issue is a question label Jun 1, 2025

Thomas Drayton commented:
Hi @Sascha-Roe,

Thanks for reaching out, and for the detail you've provided on this issue.

Based on the traceback, it looks like our service is trying to re-attach to run g43vyyi7 but can't, because the run it is attempting to connect to doesn't match the current workspace/run ID.

Would you mind sharing:

  • How the original training run was configured? A link to the run in your workspace would be great, i.e. the exact run ID and workspace (project/entity) you're trying to attach to.
  • The run ID you are passing to your prediction script.
  • The PyTorch Lightning and Darts versions you used.
  • A minimal working example of your prediction script, if possible.
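In the meantime, one thing you could try — this is only a sketch, and it assumes nothing in your prediction pipeline needs a live wandb run — is to disable wandb in the prediction process before it is first imported, so the restored WandbLogger never attempts to attach to the finished training run:

```python
import os

# Hedged workaround sketch: set this before wandb is imported or initialized
# anywhere in the prediction process. In "disabled" mode, wandb.init() returns
# a no-op run, so no attach to the old service process is attempted.
os.environ["WANDB_MODE"] = "disabled"

print(os.environ["WANDB_MODE"])  # → disabled
```

Alternatively, since the traceback shows that `TorchForecastingModel.predict` accepts a `trainer` argument, passing a freshly constructed `pytorch_lightning.Trainer(logger=False)` via `model_best.predict(..., trainer=...)` should also skip the logger re-attach, though I haven't verified that against your exact setup.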

Thanks in advance!

Best,
Thomas

@Sascha-Roe

Hey Thomas,

I am using Darts 0.35.0 and PyTorch Lightning 2.5.1.post0.
The workspace is private, but the run that I'm trying to attach to is visible in my workspace.
In my prediction script I load the model using:

model_best = model.load_from_checkpoint(work_dir=work_dir, model_name=model_name, best=True)

where

work_dir = "./models/first_runs/"
model_name = "warm-waterfall-26"

The warm-waterfall-26 model is actually located in ./models/first_runs/
A minimal example for the prediction can be found in the original post.

The training is configured as follows:

First, all arguments are read in; then the training is started using:

run_name = wandb_go(args)
SAVE = '/models/first_runs/' + run_name +'.pth.tar'
model = define_model(args, run_name)

model.fit(ts_ttrain_list,
          future_covariates=[tcov_train_future] * num_knoten,
          past_covariates=[tcov_train] * num_knoten,
          verbose=True,
          val_series=ts_ttest_list,
          val_future_covariates=[tcov_test_future] * num_knoten,
          val_past_covariates=[tcov_test] * num_knoten
          )
wandb.finish()

def wandb_go(args):
    '''Start wandb session with parameters'''
    wandb.init(project=args.project_name, entity="MY_ENTITY", sync_tensorboard=True, config=args)
    name = wandb.run.name
    print("Name of run for wandb: ", name)
    return name

def define_model(args, model_name):
    wandb_logger = WandbLogger() 
    lr_monitor = LearningRateMonitor(logging_interval='step')
    n_categories = 70  # how many nodes exist
    embedding_size = 70  # embed the categorical variable into a numeric vector of this size
    categorical_embedding_sizes = {"Knoten": (n_categories, embedding_size)}

    model = TFTModel(input_chunk_length=args.back_window,
                output_chunk_length=args.horizon,
                hidden_size=args.hidden,
                lstm_layers=args.lstm_layers,
                num_attention_heads=args.att_heads,
                full_attention=args.full_att,
                dropout=args.dropout,
                batch_size=args.batch_size,
                n_epochs=args.epochs,
                likelihood=args.likelihood, 
                loss_fn=args.loss,
                lr_scheduler_cls=args.decay_lr_class,
                lr_scheduler_kwargs={"gamma":0.1},
                random_state=args.rand, 
                force_reset=True,
                log_tensorboard=True,
                save_checkpoints=True,
                model_name=model_name,
                categorical_embedding_sizes=categorical_embedding_sizes,
                work_dir = "./models/first_runs",
                pl_trainer_kwargs={
                    "accelerator": "gpu",
                    "devices": -1, 
                    "logger":[wandb_logger],
                    "callbacks":[lr_monitor]
                }) 
    return model

I hope this helps.

Thank you for your assistance.
