[Bug]: No wandb-summary.json generated and empty output.log after offline run #9646
Comments
Hey @chufanchen! Thank you for writing in. Very interesting that you are seeing this. I just ran your code, and I am seeing the output and the summary files. Could you please provide the debug.log and debug-internal.log files associated with the run where you are running into this issue? These files should be located in the wandb folder relative to your working directory. |
Thanks for the quick reply, @ArtsiomWB. I’ve attached the debug.log and debug-internal.log files from the run where I encountered the issue. I wonder whether WandB behaves differently when running in offline mode on a server with internet access versus on a completely offline server. It's just a possibility I'm considering.
offline-run-20250328_052127-dis3duzl.zip, debug.log, debug-internal.log
I encountered the same problem. Have you solved it?
I haven't resolved it yet. Let's wait for a response from the WandB developers. |
(_run_job pid=2748835) message_loop has been closed
(_run_job pid=2748835) Traceback (most recent call last):
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 27, in _read_message
(_run_job pid=2748835)     return self._sock_client.read_server_response(timeout=1)
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 235, in read_server_response
(_run_job pid=2748835)     data = self._read_packet_bytes(timeout=timeout)
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 220, in _read_packet_bytes
(_run_job pid=2748835)     raise SockClientClosedError
(_run_job pid=2748835) wandb.sdk.lib.sock_client.SockClientClosedError
(_run_job pid=2748835)
(_run_job pid=2748835) The above exception was the direct cause of the following exception:
(_run_job pid=2748835)
(_run_job pid=2748835) Traceback (most recent call last):
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router.py", line 56, in message_loop
(_run_job pid=2748835)     msg = self._read_message()
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 29, in _read_message
(_run_job pid=2748835)     raise MessageRouterClosedError from e
(_run_job pid=2748835) wandb.sdk.interface.router.MessageRouterClosedError

I don't know if it's related. It seems like wandb is still trying to connect to a server even in offline mode.
Thank you for the follow-up. Unfortunately, I do not see much inside of the debug logs above. Our offline mode should not be trying to access the internet. What kind of data are you currently logging? At some point, we had an issue where logging wandb tables in offline mode would cause the SDK to throw a network error.
@chufanchen is the error in your most recent message from the same run? Was there anything else printed? That particular error is a little confusing, but it likely means that the internal service process ( I don't see anything surprising in the log files. |
No, the error in my most recent message is from the actual reinforcement learning experiment. The complete output from the minimal example is attached here for reference: offline-run-20250328_052127-dis3duzl.zip. I ran some tests using a minimal example and observed different behaviors depending on whether the host machine had internet access.
The run completed without error, but:
Here is the exact setup I used:

Dockerfile

wandb_offline_test.py

import os
import time
import wandb

# Explicitly tell wandb to work offline
os.environ["WANDB_MODE"] = "offline"

run = wandb.init(
    mode="offline",
    project="test-offline-project",
    name="offline-debug-run",
)

# Log some dummy metrics
for step in range(5):
    wandb.log({
        "loss": 1.0 / (step + 1),
        "accuracy": step * 0.1
    })
    print(f"Logged step {step}")
    time.sleep(0.1)

wandb.finish()

Run commands

docker build -t wandb-offline-test .
mkdir -p wandb_logs
docker run --rm \
    --network none \
    -v $(pwd)/wandb_logs:/workspace/wandb \
    wandb-offline-test

@timoffex Let me know if I can provide any additional logs or environment info. It seems like the behavior may differ depending on whether the host machine has internet access, even in offline mode.
Hey @chufanchen! Thank you for the repro. I was also able to reproduce this by running your script with the wifi on my machine turned off entirely. I see that wandb-summary.json is not generated and output.log is empty. Will create a ticket for our SDK team!
@chufanchen Hi, I encountered the same issue, have you solved it? |
Hey @liuxuexun, doesn't look like it has been fixed yet. |
Describe the bug
Minimal example:
After running this code:
python test.py
I got an empty output.log in wandb/offline-run-xxxxx/. Also, wandb-summary.json is not generated as it is in online mode.
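For anyone trying to confirm whether they are hitting the same symptom, a check along these lines may help. It is only a sketch using the Python standard library: the function names `check_offline_run` and `check_all_offline_runs` are hypothetical, and the only things taken from this thread are the `wandb` folder, the `offline-run-*` directory naming, and the two file names. It searches each run directory recursively, so it makes no assumption about where inside the run directory the files are written.

```python
from pathlib import Path


def check_offline_run(run_dir):
    """Report on the two files discussed in this issue for one run directory."""
    run_dir = Path(run_dir)
    return {
        # True if a wandb-summary.json exists anywhere under the run directory
        "summary_exists": any(run_dir.rglob("wandb-summary.json")),
        # True if an output.log exists under the run directory and has content
        "output_log_nonempty": any(
            p.stat().st_size > 0 for p in run_dir.rglob("output.log")
        ),
    }


def check_all_offline_runs(wandb_dir="wandb"):
    """Check every offline-run-* directory under the given wandb folder."""
    root = Path(wandb_dir)
    return {
        p.name: check_offline_run(p)
        for p in sorted(root.glob("offline-run-*"))
        if p.is_dir()
    }


if __name__ == "__main__":
    for name, status in check_all_offline_runs().items():
        print(name, status)
```

Run from the working directory that contains the `wandb` folder. A run affected by this issue would presumably report both values as `False`.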