[Bug]: No wandb-summary.json generated and empty output.log after offline run #9646

chufanchen · 2025-03-27T08:18:35Z

Describe the bug

Wandb version: 0.19.8
Python version: 3.9

Minimal example:

import os
import time
import wandb

run = wandb.init(
    mode="offline",
    project="test-offline-project",
    name="offline-debug-run",
)

# Log some dummy metrics
for step in range(5):
    wandb.log({
        "loss": 1.0 / (step + 1),
        "accuracy": step * 0.1
    })
    print(f"Logged step {step}")
    time.sleep(0.1)

wandb.finish()

After running this code:

python test.py

I got an empty output.log in wandb/offline-run-xxxxx/. Also, wandb-summary.json is not generated, unlike in online mode.
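A minimal sketch for checking a run directory for the two symptoms described above. It assumes the run's user-facing files live in a `files/` subdirectory of the run directory (for example `wandb/offline-run-xxxxx/files/output.log`); the helper name `check_run_dir` is hypothetical and not part of wandb.

```python
from pathlib import Path


def check_run_dir(run_dir):
    """Return a list of problems found with a run's output files.

    Assumes (hypothetically) that output files live in <run_dir>/files/.
    An empty list means both files exist and output.log is non-empty.
    """
    files_dir = Path(run_dir) / "files"
    summary = files_dir / "wandb-summary.json"
    output_log = files_dir / "output.log"

    problems = []
    if not summary.is_file():
        problems.append("missing wandb-summary.json")
    if not output_log.is_file():
        problems.append("missing output.log")
    elif output_log.stat().st_size == 0:
        # The file exists but nothing captured from stdout/stderr was written.
        problems.append("empty output.log")
    return problems
```

For example, `check_run_dir("wandb/offline-run-20250328_052127-dis3duzl")` would be expected to report both problems for the run described in this issue.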


ArtsiomWB · 2025-03-27T15:35:59Z

Hey @chufanchen! Thank you for writing in.

Very interesting that you are seeing this. I just ran your code, and I am seeing the output and the summary files.

Could you please provide the debug.log and debug-internal.log files associated with the run where you are running into this issue? These files should be located in the wandb folder relative to your working directory.
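A minimal sketch for locating those two files for the most recent offline run. It assumes the `wandb/offline-run-<YYYYMMDD_HHMMSS>-<run id>/logs/` layout that appears in the debug.log excerpt later in this thread; the function name `latest_offline_run_logs` is hypothetical.

```python
from pathlib import Path


def latest_offline_run_logs(wandb_dir="wandb"):
    """Return (debug.log, debug-internal.log) paths for the newest offline run.

    Assumes run directories are named offline-run-<YYYYMMDD_HHMMSS>-<run id>,
    so sorting by name also sorts them chronologically.
    """
    runs = sorted(Path(wandb_dir).glob("offline-run-*"))
    if not runs:
        raise FileNotFoundError(f"no offline-run-* directories under {wandb_dir!r}")
    logs = runs[-1] / "logs"
    return logs / "debug.log", logs / "debug-internal.log"
```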

chufanchen · 2025-03-28T05:24:58Z


Thanks for the quick reply, @ArtsiomWB.

I’ve attached the debug.log and debug-internal.log files from the run where I encountered the issue.

I wonder whether WandB behaves differently in offline mode on a server that has internet access than on a completely offline server. It's just a possibility I'm considering.
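One way to tell the two kinds of server apart is to test whether a TCP connection to a W&B endpoint succeeds. The sketch below assumes `api.wandb.ai` on port 443 as a representative endpoint; the helper `can_reach` is hypothetical and only checks reachability, not what wandb itself does.

```python
import socket


def can_reach(host="api.wandb.ai", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    api.wandb.ai:443 is an assumed example endpoint; any host the
    server would normally contact can be checked the same way.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections, and timeouts all subclass OSError.
        return False
```

On a host isolated with `docker run --network none`, this would be expected to return False.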

offline-run-20250328_052127-dis3duzl.zip

debug.log

2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Current SDK version is 0.19.8
2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Configure stats pid to 1242465
2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Loading settings from /home/zju/.config/wandb/settings
2025-03-28 05:21:27,271 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Loading settings from /home/zju/QT-main/wandb/settings
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_setup.py:_flush():67] Loading settings from environment variables
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:setup_run_log_directory():647] Logging user logs to /home/zju/QT-main/wandb/offline-run-20250328_052127-dis3duzl/logs/debug.log
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:setup_run_log_directory():648] Logging internal logs to /home/zju/QT-main/wandb/offline-run-20250328_052127-dis3duzl/logs/debug-internal.log
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:init():761] calling init triggers
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:init():766] wandb.init called with sweep_config: {}
config: {'_wandb': {}}
2025-03-28 05:21:27,272 INFO    MainThread:1242465 [wandb_init.py:init():784] starting backend
2025-03-28 05:21:27,484 INFO    MainThread:1242465 [wandb_init.py:init():788] sending inform_init request
2025-03-28 05:21:27,494 INFO    MainThread:1242465 [backend.py:_multiprocessing_setup():101] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2025-03-28 05:21:27,494 INFO    MainThread:1242465 [wandb_init.py:init():798] backend started and connected
2025-03-28 05:21:27,496 INFO    MainThread:1242465 [wandb_init.py:init():891] updated telemetry
2025-03-28 05:21:27,496 INFO    MainThread:1242465 [wandb_init.py:init():915] communicating run to backend with 90.0 second timeout
2025-03-28 05:21:27,608 INFO    MainThread:1242465 [wandb_init.py:init():990] starting run threads in backend
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_console_start():2375] atexit reg
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_redirect():2227] redirect: wrap_raw
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_redirect():2292] Wrapping output streams.
2025-03-28 05:21:27,709 INFO    MainThread:1242465 [wandb_run.py:_redirect():2315] Redirects installed.
2025-03-28 05:21:27,710 INFO    MainThread:1242465 [wandb_init.py:init():1032] run started, returning control to user process
2025-03-28 05:21:28,214 INFO    MainThread:1242465 [wandb_run.py:_finish():2112] finishing run test-offline-project/dis3duzl
2025-03-28 05:21:28,214 INFO    MainThread:1242465 [wandb_run.py:_atexit_cleanup():2340] got exitcode: 0
2025-03-28 05:21:28,214 INFO    MainThread:1242465 [wandb_run.py:_restore():2322] restore
2025-03-28 05:21:28,215 INFO    MainThread:1242465 [wandb_run.py:_restore():2328] restore done
2025-03-28 05:21:28,219 INFO    MainThread:1242465 [wandb_run.py:_footer_history_summary_info():3956] rendering history
2025-03-28 05:21:28,219 INFO    MainThread:1242465 [wandb_run.py:_footer_history_summary_info():3988] rendering summary

debug-internal.log

{"time":"2025-03-28T05:21:27.496291626Z","level":"INFO","msg":"stream: starting","core version":"0.19.8","symlink path":"/home/zju/QT-main/wandb/offline-run-20250328_052127-dis3duzl/logs/debug-core.log"}
{"time":"2025-03-28T05:21:27.606365417Z","level":"INFO","msg":"created new stream","id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606406643Z","level":"INFO","msg":"stream: started","id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606435737Z","level":"INFO","msg":"writer: Do: started","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606451005Z","level":"INFO","msg":"handler: started","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.606453169Z","level":"INFO","msg":"sender: started","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:27.612953498Z","level":"INFO","msg":"Starting system monitor"}
{"time":"2025-03-28T05:21:28.215373809Z","level":"INFO","msg":"Stopping system monitor"}
{"time":"2025-03-28T05:21:28.215971364Z","level":"INFO","msg":"Stopped system monitor"}
{"time":"2025-03-28T05:21:28.216665657Z","level":"INFO","msg":"handler: operation stats","stats":{}}
{"time":"2025-03-28T05:21:28.220270623Z","level":"INFO","msg":"stream: closing","id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.22029105Z","level":"INFO","msg":"handler: closed","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.220301259Z","level":"INFO","msg":"writer: Close: closed","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.220309965Z","level":"INFO","msg":"sender: closed","stream_id":"dis3duzl"}
{"time":"2025-03-28T05:21:28.220352744Z","level":"INFO","msg":"stream: closed","id":"dis3duzl"}

laozhanger · 2025-03-28T09:54:00Z

I encountered the same problem. Have you solved it?

chufanchen · 2025-03-29T02:25:33Z


I haven't resolved it yet. Let's wait for a response from the WandB developers.

chufanchen · 2025-03-31T01:24:50Z

(_run_job pid=2748835) message_loop has been closed
(_run_job pid=2748835) Traceback (most recent call last):
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 27, in _read_message
(_run_job pid=2748835)     return self._sock_client.read_server_response(timeout=1)
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 235, in read_server_response
(_run_job pid=2748835)     data = self._read_packet_bytes(timeout=timeout)
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/lib/sock_client.py", line 220, in _read_packet_bytes
(_run_job pid=2748835)     raise SockClientClosedError
(_run_job pid=2748835) wandb.sdk.lib.sock_client.SockClientClosedError
(_run_job pid=2748835)
(_run_job pid=2748835) The above exception was the direct cause of the following exception:
(_run_job pid=2748835)
(_run_job pid=2748835) Traceback (most recent call last):
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router.py", line 56, in message_loop
(_run_job pid=2748835)     msg = self._read_message()
(_run_job pid=2748835)   File "/home/zju/erqt/lib/python3.9/site-packages/wandb/sdk/interface/router_sock.py", line 29, in _read_message
(_run_job pid=2748835)     raise MessageRouterClosedError from e
(_run_job pid=2748835) wandb.sdk.interface.router.MessageRouterClosedError

I don't know if it's related, but it seems like wandb is still trying to connect to the server even in offline mode.

ArtsiomWB · 2025-04-01T17:53:47Z

Thank you for the follow-up.
Could you please talk a bit about the environment you are running your experiments on?

Unfortunately, I do not see much inside of the debug logs above. Our offline mode should not be trying to access the internet.

What kind of data are you currently logging? At one point, we had an issue where logging wandb tables in offline mode caused the SDK to throw a network error.

timoffex · 2025-04-02T00:09:22Z

@chufanchen is the error in your most recent message from the same run? Was there anything else printed?

That particular error is a little confusing, but it likely means that the internal service process (wandb-core) crashed.

I don't see anything surprising in the log files.
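A minimal sketch for scanning a debug-internal.log for anything other than INFO entries, which is one quick way to look for traces of a wandb-core crash. It assumes one JSON object per line with `level` and `msg` keys, as in the debug-internal.log excerpt earlier in this thread; the function name `non_info_entries` is hypothetical.

```python
import json


def non_info_entries(path):
    """Yield (line number, record) for debug-internal.log lines not at INFO level.

    Assumes one JSON object per line with a "level" key. Lines that fail
    to parse are reported with level "UNPARSED" instead of being skipped.
    """
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A crash may leave a truncated or non-JSON line behind.
                yield lineno, {"level": "UNPARSED", "msg": line}
                continue
            if record.get("level") != "INFO":
                yield lineno, record
```

For the debug-internal.log posted above, every line is INFO, so this would yield nothing.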

chufanchen · 2025-04-02T02:40:15Z


No, the error in my most recent message is from the actual reinforcement learning experiment. The complete output from the minimal example is attached here for reference: offline-run-20250328_052127-dis3duzl.zip.

I ran some tests using a minimal example and observed different behaviors depending on whether the host machine had internet access.

On a server with internet access (still using WANDB_MODE=offline):
Everything worked as expected: the run produced both wandb-history.jsonl and wandb-summary.json.
On a fully offline server (true network isolation using Docker --network none):

The run completed without error, but:

No wandb-summary.json was generated
wandb-history.jsonl was empty
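By default a run's summary holds the last value logged for each key, so for the minimal example a correct wandb-summary.json should contain loss 0.2 and accuracy 0.4. The sketch below recomputes those values from the example's loop and compares them with a summary file; the helper names are hypothetical, and any extra keys in the file (such as wandb's own underscore-prefixed fields) are simply ignored.

```python
import json
import math


def expected_summary(num_steps=5):
    """Last value of each metric logged by the minimal example's loop."""
    last = {}
    for step in range(num_steps):
        # Same formulas as the wandb.log(...) call in the minimal example.
        last["loss"] = 1.0 / (step + 1)
        last["accuracy"] = step * 0.1
    return last


def summary_mismatches(summary_path, num_steps=5):
    """Return {key: (found, expected)} for metrics missing or wrong in the file."""
    with open(summary_path, encoding="utf-8") as f:
        actual = json.load(f)
    mismatches = {}
    for key, want in expected_summary(num_steps).items():
        got = actual.get(key)
        if got is None or not math.isclose(got, want):
            mismatches[key] = (got, want)
    return mismatches
```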

Here is the exact setup I used:

Dockerfile

FROM python:3.8-slim

RUN pip install --no-cache-dir wandb==0.19.8

WORKDIR /workspace

COPY wandb_offline_test.py .

ENV WANDB_MODE=offline

VOLUME ["/workspace/wandb"]

CMD ["python", "wandb_offline_test.py"]

wandb_offline_test.py

import os
import time
import wandb

# Explicitly tell wandb to work offline
os.environ["WANDB_MODE"] = "offline"

run = wandb.init(
    mode="offline",
    project="test-offline-project",
    name="offline-debug-run",
)

# Log some dummy metrics
for step in range(5):
    wandb.log({
        "loss": 1.0 / (step + 1),
        "accuracy": step * 0.1
    })
    print(f"Logged step {step}")
    time.sleep(0.1)

wandb.finish()

Run commands

docker build -t wandb-offline-test .
mkdir -p wandb_logs
docker run --rm \
  --network none \
  -v $(pwd)/wandb_logs:/workspace/wandb \
  wandb-offline-test
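To compare the two cases side by side, the same container can be run once with Docker's default network and once with `--network none`, each writing to its own log directory. This is a minimal sketch that assumes Docker is installed and the image was built with the `docker build` command above; the function names and the `wandb_logs_default` / `wandb_logs_none` directory names are hypothetical.

```python
import os
import subprocess


def docker_run_cmd(logs_dir, isolated, image="wandb-offline-test"):
    """Build the docker run argv used in the repro above."""
    cmd = ["docker", "run", "--rm"]
    if isolated:
        # Same flag as the repro: the container gets only a loopback interface.
        cmd += ["--network", "none"]
    # Bind mounts need an absolute host path.
    cmd += ["-v", f"{os.path.abspath(logs_dir)}:/workspace/wandb", image]
    return cmd


def run_both(base_dir="."):
    """Run the container with and without network isolation."""
    for isolated in (False, True):
        name = "wandb_logs_none" if isolated else "wandb_logs_default"
        logs_dir = os.path.join(base_dir, name)
        os.makedirs(logs_dir, exist_ok=True)
        subprocess.run(docker_run_cmd(logs_dir, isolated), check=True)
```

After `run_both()`, the two `wandb_logs_*` directories can be inspected to see whether only the isolated run is missing wandb-summary.json.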

@timoffex Let me know if I can provide any additional logs or environment info. It seems like the behavior may differ depending on whether the host machine has internet access, even in offline mode.

ArtsiomWB · 2025-04-04T19:42:23Z

Hey @chufanchen! Thank you for the repro. I was also able to reproduce this by running your script with Wi-Fi turned off entirely on my machine. I see that wandb-summary.json is not generated and output.log is empty.

Will create a ticket for our SDK team!

liuxuexun · 2025-06-02T08:14:28Z

@chufanchen Hi, I encountered the same issue. Have you solved it?

ArtsiomWB · 2025-06-02T16:39:55Z

Hey @liuxuexun, doesn't look like it has been fixed yet.

chufanchen added the a:sdk (Area: sdk related issues) and ty:bug (type of the issue is a bug) labels on Mar 27, 2025

timoffex self-assigned this Apr 2, 2025
