[Bug]: No wandb-summary.json generated and empty output.log after offline run · Issue #9646 · wandb/wandb · GitHub

[Bug]: No wandb-summary.json generated and empty output.log after offline run #9646

Open
chufanchen opened this issue Mar 27, 2025 · 11 comments
Assignees
Labels
a:sdk Area: sdk related issues ty:bug type of the issue is a bug

Comments

@chufanchen

Describe the bug

  • Wandb version: 0.19.8
  • Python version: 3.9

Minimal example:

import os
import time
import wandb

run = wandb.init(
    mode="offline",
    project="test-offline-project",
    name="offline-debug-run",
)

# Log some dummy metrics
for step in range(5):
    wandb.log({
        "loss": 1.0 / (step + 1),
        "accuracy": step * 0.1
    })
    print(f"Logged step {step}")
    time.sleep(0.1)

wandb.finish()

After running this code:

python test.py

I get an empty output.log in wandb/offline-run-xxxxx/. Also, wandb-summary.json is not generated, unlike in online mode.
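To make the symptom easy to check, here is a minimal sketch that looks for the two files named above under a run directory and reports whether each is missing, empty, or non-empty. The helper name `check_run_files` is hypothetical, and the recursive search is an assumption chosen so that the sketch does not depend on the exact subdirectory layout, which may vary between wandb versions.

```python
from pathlib import Path

# File names taken from the report above.
EXPECTED_FILES = ("output.log", "wandb-summary.json")


def check_run_files(run_dir, names=EXPECTED_FILES):
    """Return {name: "missing" | "empty" | "non-empty"} for each expected file.

    The run directory is searched recursively, so no fixed layout is assumed.
    """
    run_dir = Path(run_dir)
    status = {}
    for name in names:
        matches = [p for p in run_dir.rglob(name) if p.is_file()]
        if not matches:
            status[name] = "missing"
        elif all(p.stat().st_size == 0 for p in matches):
            status[name] = "empty"
        else:
            status[name] = "non-empty"
    return status


if __name__ == "__main__":
    # Check every offline run under ./wandb (this location is an assumption).
    for run in sorted(Path("wandb").glob("offline-run-*")):
        print(run.name, check_run_files(run))
```

Run from the directory that contains the wandb folder, this prints one status dictionary per offline run.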

@chufanchen chufanchen added a:sdk Area: sdk related issues ty:bug type of the issue is a bug labels Mar 27, 2025
@ArtsiomWB
Contributor

Hey @chufanchen! Thank you for writing in.

Very interesting that you are seeing this. I just ran your code, and I am seeing the output and the summary files.

Could you please provide the debug.log and debug-internal.log files associated with the run where you are running into this issue? These files should be located in the wandb folder relative to your working directory.
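One way to gather those files is sketched below: it picks the most recently modified run directory under the wandb folder and zips its debug logs. The helper name `bundle_latest_logs`, the output file name, and the assumption that logs sit in a `logs/` subfolder of each run directory are illustrative, not part of the wandb API.

```python
import zipfile
from pathlib import Path


def bundle_latest_logs(wandb_dir="wandb", out_zip="wandb-debug-logs.zip"):
    """Zip debug*.log files from the most recently modified run directory.

    Returns the archived file names (an empty list if nothing was found).
    """
    wandb_dir = Path(wandb_dir)
    # Assumes run directories are named like "offline-run-<timestamp>-<id>".
    runs = [p for p in wandb_dir.glob("*run-*") if p.is_dir()]
    if not runs:
        return []
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    # is_file() follows symlinks, so dangling links are skipped.
    logs = sorted(p for p in (latest / "logs").glob("debug*.log") if p.is_file())
    if not logs:
        return []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for log in logs:
            zf.write(log, arcname=f"{latest.name}/{log.name}")
    return [log.name for log in logs]
```

Calling `bundle_latest_logs()` from the working directory of the script writes the archive next to the wandb folder; the resulting zip can then be attached to the issue.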

@chufanchen
Author
chufanchen commented Mar 28, 2025

> Hey @chufanchen! Thank you for writing in.
>
> Very interesting that you are seeing this. I just ran your code, and I am seeing the output and the summary files.
>
> Could you please provide the debug.log and debug-internal.log files associated with the run where you are running into this issue? These files should be located in the wandb folder relative to your working directory.

Thanks for the quick reply, @ArtsiomWB.

I’ve attached the debug.log and debug-internal.log files from the run where I encountered the issue.

I wonder if it's possible that there's a difference in how WandB behaves when running in offline mode on an online server versus on a completely offline server. It's just a possibility I'm considering.

offline-run-20250328_052127-dis3duzl.zip

debug.log

2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Current SDK version is 0.19.8
2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Configure stats pid to 1242465
2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Loading settings from /home/zju/.config/wandb/settings
2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Loading settings from /home/zju/QT-main/wandb/settings
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Loading settings from environment variables
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:setup_run_log_directory():647] Logging user logs to /home/zju/QT-main/wandb/offline-run-20250328_052127-dis3duzl/logs/debug.log
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:setup_run_log_directory():648] Logging internal logs to /home/zju/QT-main/wandb/offline-run-20250328_052127-dis3duzl/logs/debug-internal.log
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:init():761] calling init triggers
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:init():766] wandb.init called with sweep_config: {}
config: {'_wandb': {}}
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:init():784] starting backend
2025-03-28 05:21:27,484 INFO    MainThread:1242465 [wandb_init.py:init():788] sending inform_init request
2025-03-28 05:21:27,494 INFO    MainThread:1242465 [backend.py:_multiprocessing_setup():101] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-03-28 05:21:27,494 INFO    MainThread:1242465 [wandb_init.py:init():798] backend started and connected
2025-03-28 05:21:27,496 INFO    MainThread:1242465 [wandb_init.py:init():891] updated telemetry
2025-03-28 05:21:27,496 INFO    MainThread:1242465 [wandb_init.py:init():915] communicating run to backend with 90.0 second timeout
2025-03-28 05:21:27,608 INFO    MainThread:1242465 [wandb_init.py:init():990] starting run threads in backend
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_console_start():2375] atexit reg
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_redirect():2227] redirect: wrap_raw
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_redirect():2292] Wrapping output streams.
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_redirect():2315] Redirects installed.
2025-03-28 05:21:27,710 INFO    MainThread:1242465 [wandb_init.py:init():1032] run started, returning control to user process
2025-03-28 05:21:28,214 INFO    MainThread:1242465 [wandb_run.py:_finish():2112] finishing run test-offline-project/dis3duzl
2025-03-28 05:21:28,214 INFO    MainThread:1242465 [wandb_run.py:_atexit_cleanup():2340] got exitcode: 0
2025-03-28 05:21:28,214 INFO    MainThread:1242465 [wandb_run.py:_restore():2322] restore
2025-03-28 05:21:28,215 INFO    MainThread:1242465 [wandb_run.py:_restore():2328] restore done
2025-03-28 05:21:28,219 INFO    MainThread:1242465 [wandb_run.py:_footer_history_summary_info():3956] rendering history
2025-03-28 05:21:28,219 INFO    MainThread:1242465 [wandb_run.py:_footer_history_summary_info():3988] rendering summary

debug-internal.log

{"time":"2025-03-28T05:21:27.496291626Z","level":"INFO","msg":"stream: starting","core version":"0.19.8","symlink path":"/home/zju/QT-main/wandb/offline-run-20250328_052127-dis3duzl/logs/debug-core.log"}
{"time":"2025-03-28T05:21:27.606365417Z","level":"INFO","msg":"created new stream","id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606406643Z","level":"INFO","msg":"stream: started","id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606435737Z","level":"INFO","msg":"writer: Do: started","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606451005Z","level":"INFO","msg":"handler: started","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606453169Z","level":"INFO","msg":"sender: started","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.612953498Z","level":"INFO","msg":"Starting system monitor"}
{"time":"2025-03-28T05:21:28.215373809Z","level":"INFO","msg":"Stopping system monitor"}
{"time":"2025-03-28T05:21:28.215971364Z","level":"INFO","msg":"Stopped system monitor"}
{"time":"2025-03-28T05:21:28.216665657Z","level":"INFO","msg":"handler: operation stats","stats":{}}
{"time":"2025-03-28T05:21:28.220270623Z","level":"INFO","msg":"stream: closing","id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.22029105Z","level":"INFO","msg":"handler: closed","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.220301259Z","level":"INFO","msg":"writer: Close: closed","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.220309965Z","level":"INFO","msg":"sender: closed","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.220352744Z","level":"INFO","msg":"stream: closed","id":"dis3duzl"}
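Since debug-internal.log holds one JSON object per line, it can be summarized programmatically. The sketch below reports how many records were parsed, the time between the first and last record, and any records whose level is not INFO. The helper names are hypothetical; the field names (`time`, `level`, `msg`) are the ones visible in the excerpt above, and other wandb versions may use a different log format.

```python
import json
from datetime import datetime, timezone


def parse_time(ts):
    """Parse a timestamp like 2025-03-28T05:21:27.496291626Z (truncated to microseconds)."""
    head, _, frac = ts.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]  # strptime's %f accepts at most 6 digits
    parsed = datetime.strptime(f"{head}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def summarize_internal_log(lines):
    """Summarize JSON-lines log records; lines that are not valid JSON are skipped."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    times = [parse_time(r["time"]) for r in records if "time" in r]
    elapsed = (max(times) - min(times)).total_seconds() if times else 0.0
    problems = [r.get("msg", "") for r in records if r.get("level") != "INFO"]
    return {"records": len(records), "elapsed_seconds": elapsed, "problems": problems}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py path/to/debug-internal.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            print(summarize_internal_log(fh))
```

Applied to the excerpt above, this finds no non-INFO records and less than a second between the first and last entries, which is consistent with the log giving no direct hint about the missing files.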

@laozhanger

I encountered the same problem. Have you solved it?

@chufanchen
Author

> I encountered the same problem. Have you solved it?

I haven't resolved it yet. Let's wait for a response from the WandB developers.

@chufanchen
Author
(_run_job pid=2748835) message_loop has been closed
(_run_job pid=2748835) Traceback (most recent call last):
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 27, in _read_message
(_run_job pid=2748835)     return self._sock_client.read_server_response(timeout=1)
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 235, in read_server_response
(_run_job pid=2748835)     data = self._read_packet_bytes(timeout=timeout)
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 220, in _read_packet_bytes
(_run_job pid=2748835)     raise SockClientClosedError
(_run_job pid=2748835) wandb.sdk.lib.sock_client.SockClientClosedError
(_run_job pid=2748835)
(_run_job pid=2748835) The above exception was the direct cause of the following exception:
(_run_job pid=2748835)
(_run_job pid=2748835) Traceback (most recent call last):
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router.py", line 56, in message_loop
(_run_job pid=2748835)     msg = self._read_message()
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 29, in _read_message
(_run_job pid=2748835)     raise MessageRouterClosedError from e
(_run_job pid=2748835) wandb.sdk.interface.router.MessageRouterClosedError

I don't know if it's related, but it seems like wandb is still trying to connect to a server even in offline mode.

@ArtsiomWB
Contributor

Thank you for the follow-up.
Could you please talk a bit about the environment you are running your experiments on?

Unfortunately, I do not see much inside of the debug logs above. Our offline mode should not be trying to access the internet.

What kind of data are you currently logging? At one point, we had an issue where logging wandb tables in offline mode would cause the SDK to throw a network error.

@timoffex
Contributor
timoffex commented Apr 2, 2025

@chufanchen is the error in your most recent message from the same run? Was there anything else printed?

That particular error is a little confusing, but it likely means that the internal service process (wandb-core) crashed.

I don't see anything surprising in the log files.

@timoffex timoffex self-assigned this Apr 2, 2025
@chufanchen
Author
chufanchen commented Apr 2, 2025

> is the error in your most recent message from the same run? Was there anything else printed?

No, the error in my most recent message is from the actual reinforcement learning experiment. The complete output from the minimal example is attached here for reference: offline-run-20250328_052127-dis3duzl.zip.

I ran some tests using a minimal example and observed different behaviors depending on whether the host machine had internet access.

  1. On a server with internet access (still using WANDB_MODE=offline):
    Everything worked as expected: the run produced both wandb-history.jsonl and wandb-summary.json.

  2. On a fully offline server (true network isolation using Docker --network none):
    The run completed without error, but:

  • No wandb-summary.json was generated
  • wandb-history.jsonl was empty
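A comparison like the one above can also be done mechanically. The following sketch, under the assumption that both run directories are available locally, lists files that the working run produced but the isolated run either lacks or left empty. The function names `file_sizes` and `compare_runs` are hypothetical helpers, not part of wandb.

```python
from pathlib import Path


def file_sizes(run_dir):
    """Map each file's path relative to run_dir (POSIX style) to its size in bytes."""
    run_dir = Path(run_dir)
    return {
        p.relative_to(run_dir).as_posix(): p.stat().st_size
        for p in run_dir.rglob("*")
        if p.is_file()
    }


def compare_runs(good_run, bad_run):
    """List files that the good run has but the bad run lacks or left empty."""
    good, bad = file_sizes(good_run), file_sizes(bad_run)
    missing = sorted(name for name in good if name not in bad)
    empty = sorted(name for name in good if good[name] > 0 and bad.get(name) == 0)
    return {"missing_in_bad": missing, "empty_in_bad": empty}
```

Files whose names embed the run ID will be reported as missing even when both runs produced them, so those entries should be read with that in mind.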

Here is the exact setup I used:

Dockerfile

FROM python:3.8-slim

RUN pip install --no-cache-dir wandb==0.19.8

WORKDIR /workspace

COPY wandb_offline_test.py .

ENV WANDB_MODE=offline

VOLUME ["/workspace/wandb"]

CMD ["python", "wandb_offline_test.py"]

wandb_offline_test.py

import os
import time
import wandb

# Explicitly tell wandb to work offline
os.environ["WANDB_MODE"] = "offline"

run = wandb.init(
    mode="offline",
    project="test-offline-project",
    name="offline-debug-run",
)

# Log some dummy metrics
for step in range(5):
    wandb.log({
        "loss": 1.0 / (step + 1),
        "accuracy": step * 0.1
    })
    print(f"Logged step {step}")
    time.sleep(0.1)

wandb.finish()

Run commands

docker build -t wandb-offline-test .
mkdir -p wandb_logs
docker run --rm \
  --network none \
  -v "$(pwd)/wandb_logs:/workspace/wandb" \
  wandb-offline-test

@timoffex Let me know if I can provide any additional logs or environment info. It seems like the behavior may differ depending on whether the host machine has internet access, even in offline mode.

@ArtsiomWB
Contributor

Hey @chufanchen! Thank you for the repro. I was also able to reproduce this by running your script with the Wi-Fi on my machine turned off entirely. I see that wandb-summary.json is not generated and output.log is empty.

Will create a ticket for our SDK team!

@liuxuexun

@chufanchen Hi, I encountered the same issue. Have you solved it?

@ArtsiomWB
Contributor

Hey @liuxuexun, it doesn't look like this has been fixed yet.


5 participants