[Q]Sync offline metrics on another machine · Issue #3098 · wandb/wandb · GitHub
[Q]Sync offline metrics on another machine #3098

Open
THU-syh opened this issue Jan 3, 2022 · 17 comments
Labels
c:artifacts Candidate for artifact branch
c:sdk:sync Component: Synchronization - wandb sync (syncing runs that ran in offline mode)

Comments

@THU-syh
THU-syh commented Jan 3, 2022

If I train on a gpu machine that cannot be connected to the Internet and use offline wandb to record metrics, can I move the generated offline folder to another machine and synchronize it to the cloud? Which files must I save and move?

@exalate-issue-sync

Leslie commented:
Hi! You can use wandb sync <run_path> on an offline run on another machine
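
As a rough illustration of that suggestion, the sketch below lists the offline run directories inside a `wandb` folder that has been copied to another machine and builds the `wandb sync <run_path>` command for each one. It assumes the offline runs sit in directories named `offline-run-*`; the helper names `find_offline_runs` and `build_sync_command` are hypothetical and not part of the wandb SDK.

```python
from pathlib import Path


def find_offline_runs(wandb_dir):
    """Return offline run directories under wandb_dir, sorted by name.

    Assumption: offline runs live in directories named `offline-run-*`.
    """
    root = Path(wandb_dir)
    return sorted(p for p in root.glob("offline-run-*") if p.is_dir())


def build_sync_command(run_dir):
    """Build the argument list for `wandb sync <run_path>`."""
    return ["wandb", "sync", str(run_dir)]
```

Each returned list can be passed to `subprocess.run(...)` on a machine that has network access and is logged in to wandb. Copying the whole run directory, rather than individual files, avoids guessing which files the sync needs.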

@THU-syh
Author
THU-syh commented Jan 4, 2022

Leslie commented: Hi! You can use wandb sync <run_path> on an offline run on another machine

I tried to do this, but got this error:
wandb: ERROR Error uploading "***/.cache/wandb/artifacts/obj/md5/ee/cd3b3cf65f625269a2b7377b89***": FileNotFoundError,
wandb: ERROR Uploading artifact file failed. Artifact won't be committed.
Does it seem that I still need to move some files to another machine?

@vanpelt
Contributor
vanpelt commented Jan 4, 2022

@THU-syh because you logged an artifact in the original run, we must sync from the same machine where that artifact existed. Were the rest of the metrics synced or did this error cause the entire sync process to crash? If it crashed, it's a bug and we should handle this error in a future release.
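
To see which artifact files a failed sync was looking for, one option is to scan the sync output for the error lines quoted above. The sketch below is a minimal helper under that assumption: the regular expression is based only on the message format shown in this thread, which may differ between wandb versions, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as:
#   wandb: ERROR Error uploading "<path>": FileNotFoundError, ...
# Assumption: the message format is the one quoted in this thread.
UPLOAD_ERROR = re.compile(r'Error uploading "(?P<path>[^"]+)": FileNotFoundError')


def missing_artifact_paths(sync_output):
    """Return the file paths that the sync output reported as not found."""
    return [m.group("path") for m in UPLOAD_ERROR.finditer(sync_output)]


def paths_absent_here(paths):
    """Keep only the paths that do not exist on the current machine."""
    return [p for p in paths if not Path(p).exists()]
```

If every reported path is absent on the machine running the sync, that is consistent with the artifact files having stayed behind on the training machine.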

@THU-syh
Author
THU-syh commented Jan 6, 2022

@THU-syh because you logged an artifact in the original run, we must sync from the same machine where that artifact existed. Were the rest of the metrics synced or did this error cause the entire sync process to crash? If it crashed, it's a bug and we should handle this error in a future release.

Thanks! Yes, that is indeed the cause. However, the rest of the metrics were not synced, and the entire sync process crashed.

@nate-wandb
Contributor

Hi @THU-syh,
Thank you for reporting this issue. I have created an internal ticket to address it, and I'll update you when there's some movement on this case.

@github-actions
Contributor

This issue is stale because it has been open 60 days with no activity.

@github-actions github-actions bot added the stale label Mar 14, 2022
@sydholl sydholl removed the stale label May 9, 2022
@WasifurRahman

Hi, is there any movement on this case? I am facing the exact same issue.

@kptkin kptkin added the c:sdk:sync Component: Synchronization - wandb sync (syncing runs that ran in offline mode) label Mar 2, 2023
@HenryHZY
HenryHZY commented Mar 3, 2023

Deleting the files in wandb/file works for me.

@nate-wandb
Contributor

Hi all, looks like this hasn't been addressed yet. I'll raise this again so we can get this assigned to the engineering team.

Thank you,
Nate

@kptkin kptkin added the c:artifacts Candidate for artifact branch label Dec 19, 2023
@sfxgxexo
sfxgxexo commented Jun 1, 2025

Hi, has there been any progress regarding this issue? I'm currently experiencing the same situation.

@hellopahe

Hey @sfxgxexo, just noticed — if uploading artifacts isn't necessary, feel free to ignore the error. The rest of the metrics will still go through.

@ArtsiomWB
Contributor
ArtsiomWB commented Jun 4, 2025

Hey @sfxgxexo, we've had a couple of backend changes on our end that could improve this behavior, but it doesn't look like it's an easy fix. I'll let you know if I get any updates on this from our eng team.

Does this happen to you anytime you try to sync a run with an artifact in offline mode?

@sfxgxexo
sfxgxexo commented Jun 5, 2025

Hi, I encountered this issue when trying to sync the offline run results from my Linux server on my local Windows machine:
ERROR Error uploading "/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c": FileNotFoundError, [Errno 2] No such file or directory: '/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c'
wandb: ERROR Failed to upload artifact file. The artifact will not be committed. Only system information is available on wandb.

@sfxgxexo
sfxgxexo commented Jun 5, 2025

It didn't go smoothly. This is the information I saw. Image

@ArtsiomWB
Contributor

To confirm, you are running your experiments on Linux, but then you are trying to sync them to the UI from Windows?

@sfxgxexo
sfxgxexo commented Jun 6, 2025

Hey @ArtsiomWB . Yes, that’s correct — thank you for confirming.

@ArtsiomWB
Contributor

If you have to sync from a Windows machine, are you able to boot up a Linux VM on it and sync that way instead? I think what's happening is that when you move a run from Linux to Windows, the directory structure is completely different, and Windows uses "\"s rather than "/"s in its directory paths. So when wandb tries to sync on Windows, it expects different paths, but since you created your runs on Linux, the paths, env vars, and "/"s look different.
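
The path mismatch suggested here can be illustrated with Python's `pathlib`, which can evaluate a path string under either platform's rules on any machine. This is only a sketch of the general Linux/Windows difference, using the path from the error message quoted earlier in this thread; it does not show how wandb resolves paths internally.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Path taken from the error message quoted earlier in this thread.
recorded = "/root/.local/share/wandb/artifacts/staging/tmpf0djvf9c"

as_posix = PurePosixPath(recorded)
as_windows = PureWindowsPath(recorded)

# Under Linux rules the string is a complete absolute path.
print(as_posix.is_absolute())    # True
# Under Windows rules it has a root but no drive letter, so it is not
# absolute and would be resolved against whatever the current drive is.
print(as_windows.is_absolute())  # False
print(repr(as_windows.drive))    # ''
print(str(as_windows))           # \root\.local\share\wandb\artifacts\staging\tmpf0djvf9c
```

Either way, the staging file recorded by the run exists only on the Linux server, so a sync started elsewhere has nothing to upload at that location.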


10 participants