[Q] Sync offline metrics on another machine #3098
Comments
Leslie commented: |
I tried to do this, but got this error: |
@THU-syh because you logged an artifact in the original run, we must sync from the same machine where that artifact existed. Were the rest of the metrics synced or did this error cause the entire sync process to crash? If it crashed, it's a bug and we should handle this error in a future release. |
Thanks! Yes, that is indeed the cause. However, the rest of the metrics were not synced, and the entire sync process crashed. |
Hi @THU-syh, |
This issue is stale because it has been open 60 days with no activity. |
Hi, is there any movement on this case? I am facing the exact same issue. |
Deleting the files in wandb/file worked for me. |
Hi all, looks like this hasn't been addressed yet. I'll raise this again so we can get this assigned to the engineering team. Thank you, |
Hi, has there been any progress regarding this issue? I'm currently experiencing the same situation. |
Hey @sfxgxexo, just noticed — if uploading artifacts isn't necessary, feel free to ignore the error. The rest of the metrics will still go through. |
Hey @sfxgxexo, we've had a couple of backend changes on our end that could improve this behavior, but it doesn't look like it's an easy fix. I'll let you know if I get any updates on this from our eng team. Does this happen every time you try to sync a run that logged an artifact in offline mode? |
Hi, I encountered this issue when trying to sync offline run results from my Linux server on my local Windows machine: ERROR Error uploading "/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c": FileNotFoundError, [Errno 2] No such file or directory: '/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c' |
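For anyone debugging the same error, here is a small sketch that pulls the missing path out of an error line like the one above and checks whether it exists on the machine running the sync. The helper name `missing_path` is hypothetical and the regex only assumes the `No such file or directory: '...'` wording shown in the report; it is not part of wandb.

```python
import os
import re

# Error line as reported in the comment above.
log_line = (
    'ERROR Error uploading "/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c": '
    "FileNotFoundError, [Errno 2] No such file or directory: "
    "'/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c'"
)


def missing_path(line):
    """Return the path named in a 'No such file or directory' message, or None."""
    match = re.search(r"No such file or directory: '([^']+)'", line)
    return match.group(1) if match else None


path = missing_path(log_line)
# On a machine other than the one that created the run, this is expected to be False.
print(path, "exists here:", os.path.exists(path))
```

If the path does not exist locally, the artifact staging file was most likely left behind on the machine that created the run.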
To confirm, you are running your experiments on Linux, but then you are trying to sync them to the UI from Windows? |
Hey @ArtsiomWB . Yes, that’s correct — thank you for confirming. |
If you have to sync from a Windows machine, are you able to boot up a Linux VM on it and sync that way instead? I think what's happening is that the directory structure on Linux is completely different from the one on Windows, and Windows uses "\"s rather than "/"s in its directory paths. So when wandb tries to sync on Windows, it expects Windows-style paths, but since you created your runs on Linux, the recorded paths and env vars look different. |
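To illustrate the path difference described above, here is a minimal sketch using only Python's standard `pathlib`. It takes the staging path from the error message in this thread and shows how the same string is interpreted under POSIX and Windows rules. It does not reproduce what wandb does internally; that part is an assumption of the explanation above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Path taken from the error message reported in this thread.
staging = "/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c"

posix_view = PurePosixPath(staging)
windows_view = PureWindowsPath(staging)

print(posix_view.is_absolute())    # True: rooted at "/" under POSIX rules
print(windows_view.is_absolute())  # False: no drive letter under Windows rules
print(str(windows_view))           # same path rendered with backslash separators
```

Even with the separators converted, the file would still have to exist at that location on the Windows machine, which matches the earlier comment that artifacts must be synced from the machine where they were created.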
If I train on a GPU machine that cannot be connected to the Internet and use offline wandb to record metrics, can I move the generated offline folder to another machine and sync it to the cloud? Which files must I save and move?
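A minimal sketch of the workflow being asked about, assuming the offline runs are stored as `offline-run-*` directories inside a `wandb` folder and that each run directory is copied over in full. The helper name `build_sync_commands` is hypothetical; `wandb sync <run_dir>` is the CLI command for uploading an offline run. As the comments in this thread note, runs that logged artifacts may still fail when synced from a different machine.

```python
from pathlib import Path


def build_sync_commands(wandb_dir):
    """Return one `wandb sync` command (as an argv list) per offline run directory."""
    root = Path(wandb_dir)
    # Assumption: offline runs live in wandb/offline-run-<timestamp>-<id>/
    run_dirs = sorted(p for p in root.glob("offline-run-*") if p.is_dir())
    return [["wandb", "sync", str(p)] for p in run_dirs]
```

Each returned list can be passed to `subprocess.run(cmd, check=False)` on the machine that has Internet access, so that one failing run does not stop the remaining runs from syncing.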