Intermittent "unexpected EOF" while downloading container layers when built with go 1.24 #49513
Comments
Thanks for reporting; unfortunately there isn't a lot to go on yet. Are you able to obtain logs from the daemon when this happens? Ideally with the daemon running with debug enabled (which will log requests together with other events), so that we can correlate what happened with which request; but if something went really wrong, the logs would likely contain some information in either case.
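For anyone wanting to capture those logs: a minimal sketch of enabling daemon debug logging, assuming a systemd-based host and the default `/etc/docker/daemon.json` location:

```console
$ cat /etc/docker/daemon.json
{
  "debug": true
}
$ sudo systemctl restart docker
$ journalctl -u docker.service -f    # follow the daemon's (now debug-level) logs
```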
I should point out a couple of things I noticed in your `docker version` output: the daemon's git commit looks to have been overridden, because b0f5bc3 is a commit from 4 years ago for a 20.10 release of docker (#42352).
All of that makes it possible that OpenMandriva is shipping modifications of the source code that could be relevant; we should check the daemon logs for sure to narrow down what's happening, but it might be worth reporting this to the OpenMandriva packagers as well.
I am the OpenMandriva packager -- this error is happening while testing whether our updated package is ready to go out to end users (FWIW the status is "almost ready" -- it works perfectly once the containers are installed; the seemingly random crashes while unpacking layers are the only problem). We aren't currently applying any patches. The wrong git commit being listed is indeed an oversight (good catch!); we're usually building from release tarballs (which obviously don't carry a commit ID).
Unfortunately the backtrace doesn't look very useful (at least to me); it looks like a crash during memory allocation with no indicator of what is being allocated.
Ah! Sorry, didn't notice that 🙈 - I've been bitten by some cases where packaging changes were relevant and very subtly breaking things (most recently we had issues with the Debian packages, due to breaking changes in one of our dependencies that happened to be updated in their packaging pipeline). Thanks for trying to get more info; it's indeed unfortunately not providing a lot to go on. I do see things seem to go bad around the invocation at line 196 in 459686b.
Disabling pigz makes the log output slightly different, but doesn't make the problem go away.
The backtrace remains the same too. I'm starting to suspect go 1.24 may have something to do with this -- I've started bisecting, and unless I messed something up, if I rebuild the known-good 27.5.0 package in today's build environment, it starts showing the same breakage. The main difference between the environment the known-good package was built in and today's build environment is a Go update from 1.23.something to 1.24. Will run some more checks on that.
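Assuming pigz was disabled via the `MOBY_DISABLE_PIGZ` environment variable (the knob the daemon checks before shelling out to `unpigz`), a systemd drop-in is one way to set it:

```console
$ sudo systemctl edit docker.service     # opens an override file; add the lines below
[Service]
Environment=MOBY_DISABLE_PIGZ=true

$ sudo systemctl restart docker
```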
Oh! I missed that you're building with go1.24; yes, I've seen cases in other repositories/projects where go1.24 broke things. We are currently still on go1.23 (we usually wait before upgrading to the latest Go, as 9 times out of 10 we run into subtle regressions in areas the Go maintainers didn't expect things to be used).
I have a draft PR that was used to do initial testing with go1.24; we'll likely be updating our master/main branch once we have the most urgent v28.0.0 kinks fixed, but currently our master/main (and release branch) is still on go1.23.
I've confirmed that the problem with 28.0.0 goes away if I rebuild it with go 1.23.6 -- so this is definitely either a Go bug or a use of Go APIs that is no longer valid with 1.24.
Thanks! Hm.. so now the challenge is indeed to find where the problem lies. I'm looking to see if I can spot anything suspicious; the last log entry around that issue corresponds to lines 220 to 253 in 459686b.
That code also involves tar-split, pinned at https://github.com/moby/moby/blob/v27.5.1/vendor.mod#L95; diff between the versions: vbatts/tar-split@v0.11.5...v0.11.6. One thing I did notice is that we're not on the latest version of that dependency.
I can reproduce the "unexpected EOF" issue if I rebuild 27.5.1 with go 1.24.0 (I initially thought this was a 28.0 regression because we updated Go between the release of 27.5.1 and 28.0.0 -- so the bug happened to surface with the 28.0 update -- but it's actually present in 27.5.1 too), so the regression is tied to the Go toolchain rather than to changes between 27.5.1 and 28.0.0.
Thank you! That's a useful datapoint; much appreciated.
FWIW, I tested 28.0.0 built with go1.24 on Arch Linux and didn't experience this. Does it happen with all images?
I experienced exactly the same issue with docker 28.0.0 shipped in the Arch Linux official repository, which is believed to be built with go 1.24.0, judging by the 'last update' timestamp of the package.
Besides, when the docker daemon (
@leo9800 Can you provide an example image that this happens with?
It seems to happen more or less at random (which is why just running the pull/compose command on the same container a few times in a row usually "fixes" it). |
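As a stopgap, that "run it again" workaround can be scripted; a minimal sketch, with `myimage:latest` standing in for whatever image fails:

```console
$ until docker pull myimage:latest; do echo "pull failed, retrying..."; sleep 2; done
```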
Virtually any image can trigger this problem, and there's no stable reproduction. Say, I upgraded an image on host A, while on another host B (a virtual machine) mimicking host A's environment, the pull failed. After restoring a snapshot of host B, rebooting, and pulling ollama and nats again in the same sequence, both succeeded without any error. Besides, I'll try a chroot rebuild as well.
@leo9800 are you on a ...? From the ...
@sipsma ironically, the commit below is what I was testing overnight in my latest bisect, and I was unable to reproduce the crash over 12 hours of continuous testing.
I'll try your minimal reproduction too; here's where I'm at so far, with two more bisect steps to go:
It wouldn't shock me if that vdso change by itself made this "possible but rare" and then golang/go@8678196 (the one your git bisect hit) made it "possible and common", since it further changed some of the parameters relevant to when that code path gets exercised.
Thanks! @Doridian's comment here is possibly relevant too: #49513 (comment); you may want to limit the number of processors/processes in order to hit it more consistently, depending on your hardware.
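If limiting Go's scheduler parallelism is what's meant here (an assumption on my part), the minimal reproducer runs later in the thread as `go run .`, and it can be constrained like this:

```console
$ GOMAXPROCS=2 go run .        # cap the Go scheduler's worker threads
$ taskset -c 0,1 go run .      # or restrict the process to two CPUs
```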
OH MAN, y'all make me happy here! Thanks everyone for helping out on this one 🤞 hope that's indeed gonna help fix the issue, but this looks REALLY hopeful!
I've just managed to reproduce it with Docker built against the same Go commit as well. Updated the bisect below; I'll close this off by testing both with Docker and the minimal repro from here on.
I'm fairly confident that @sipsma is correct. I had to do a local replace of one dependency; anything before golang/go@eb6f2c2 will obviously fail to compile. Also, the mildly educated guess, which is tied to what I initially thought was the culprit commit, makes sense too. Final bisect below:
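The minimal reproducer itself isn't quoted in this thread; purely as an assumption about the shape of the load involved (Go 1.24 routes `crypto/rand` through the vDSO getrandom path on recent kernels, and the daemon forks helper processes while extracting layers), a stress sketch might look like this:

```go
package main

// Hypothetical stress sketch: hammer crypto/rand reads while constantly
// spawning short-lived child processes from many goroutines. This is an
// illustration of the suspected trigger conditions, not the actual
// reproducer referenced in this thread.

import (
	"crypto/rand"
	"log"
	"os/exec"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 32)
			for j := 0; j < 200; j++ {
				if _, err := rand.Read(buf); err != nil {
					log.Fatal(err)
				}
				// Constant process churn, similar to the daemon shelling
				// out during layer decompression.
				if err := exec.Command("/bin/true").Run(); err != nil {
					log.Fatal(err)
				}
			}
		}()
	}
	wg.Wait()
}
```

Whether this actually trips the bug depends on the kernel and Go patch level; it's only meant to illustrate the conditions discussed above.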
Arch will push a patched go package.
Rebuilt:

λ go » time go run .
signal: segmentation fault (core dumped)
go run .  1.21s user 0.14s system 514% cpu 0.262 total
λ go » sudo pacman -U /var/cache/pacman/pkg/go-2:1.24.1-2-x86_64.pkg.tar.zst
[...snip...]
λ go » time go run .
^Csignal: interrupt
go run .  72.98s user 0.22s system 1808% cpu 4.046 total
Feedback: Applied
Unfortunately it just happened again to me while pulling an image on an AArch64 machine running Arch Linux ARM with docker 28.0.4-2.
Happens to me quite frequently as well on the same setup. Can we be sure that docker 28.0.4-2 on AArch64 is actually built against a patched Go version? If not, that would explain why we're still experiencing this problem.
One can verify which version of Go was used to build a particular binary with the go toolchain; see the example below.
Will give 28.0.4-2 a try some time later. |
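For example, assuming the daemon binary lives at `/usr/bin/dockerd`, the `go` tool can report the toolchain a binary was built with:

```console
$ go version /usr/bin/dockerd      # prints the Go release the binary was built with
$ go version -m /usr/bin/dockerd   # -m additionally lists the embedded module/build info
```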
Seems to be working for me at the moment. The next release of Immich will be the acid test, as that's what failed previously; I'll report back later this week, as they tend to run on a weekly/bi-weekly release schedule. Many of my other images/containers have been pulling fine so far.
ALARM is not Arch, so I would not bet on that being the case. You can check by running the command below, or something similar.
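One option (assuming the standard `docker version` template fields) is to ask the daemon directly; note this reports the upstream Go release, not whether the distro applied extra patches:

```console
$ docker version --format '{{.Server.GoVersion}}'
```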
Not sure if it's totally related, but I experience crashes of dockerd as well when pulling some images (not 100% sure which ones exactly).
Docker Info
Re-running the command after docker restarted works. The kernel log shows segfaults when the service crashes (a collection of segfaults from the last few days):
Edit: Filed a bug on the Ubuntu docker.io package: https://bugs.launchpad.net/ubuntu/+source/docker.io/+bug/2109499
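For anyone wanting to collect the same evidence on a systemd host (the unit name and time window are assumptions), the kernel ring buffer and the service journal are the places to look:

```console
$ journalctl -k --since "7 days ago" | grep -i segfault   # kernel-side segfault records
$ journalctl -u docker.service --since "7 days ago"       # daemon logs around the crashes
```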
My system is running a Ryzen 5 3600 with around 40 different docker images, and it happened randomly on any of the container pull extractions. The system basically acts as a home server with dockerized services like Nextcloud, Mailcow, Immich, Jellyfin, Paperless, Vaultwarden, Home Assistant and many more, so I rolled back to the 27.5.1-1 package at the time, as I could not have outages of essential stuff. At that time it also occasionally happened on my desktop, which uses a Ryzen 7 5800X3D, but I don't use docker there often. I just updated the home server to docker 28.1.1-1 and the latest packages, also updated the whole system and set docker debug to true; I will report back if I have issues during updates. Update: Updated all the available images, so far no crashes; the new package also seems to use go version 1.24.2:
Is there a hint (apart from the packages existing in e.g. Arch Linux) whether it's safe to upgrade to Docker 28 on ...?
I just downgraded to ...
To anyone who wants to add a comment saying they still have this bug -- check that your Docker is compiled with go1.24.3 (by running e.g. one of the commands shown earlier in this thread). If you see a Go 1.24 version older than go1.24.3, file a bug with your distro vendor.
I think Arch Linux is the exception. They are at 1.24.2, but they patched the fix in.
I'm on Arch Linux and still seeing the issue (latest docker version, and it's built with 1.24.2). I've commented over there to see if the maintainer can update the build.
@englut Really? Hmm, I had the issue but haven't had it since. I'm on docker 1:28.1.1-1.
Same here, I'm on Arch and I had the issue, but since the patch it's been solved for me.
Description
Since updating to 28.0.0, I'm getting a lot of "unexpected EOF" errors when bringing up a series of containers.
Unfortunately this seems to happen at random, so there's no reliable reproducer. The connection is fast and reliable, so a timeout is unlikely to be involved.
I've seen it happening both with `docker pull` and `docker compose up` while pulling layers.

Setting `"max-download-attempts": 5000` (or even more ridiculous values) in `/etc/docker/daemon.json` doesn't fix it; chances are that since the error is something other than `connection timed out` or so, docker doesn't recognize this as a download failure and therefore doesn't make another download attempt.

Simply running the same `docker pull` or `docker compose` command again "fixes" it most of the time (and when it doesn't, surely running it a third or fourth time does).
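For completeness, a minimal sketch of the `/etc/docker/daemon.json` retry setting described above (the value is deliberately absurd):

```json
{
  "max-download-attempts": 5000
}
```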
Reproduce
Expected behavior
it works
docker version
Client:
 Version:           28.0.0
 API version:       1.48
 Go version:        go1.24.0
 Git commit:
 Built:             Thu Feb 20 22:16:09 2025
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          28.0.0
  API version:      1.48 (minimum version 1.24)
  Go version:       go1.24.0
  Git commit:       b0f5bc3
  Built:            Thu Feb 20 22:15:42 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          2.0.2
  GitCommit:        .m
 runc:
  Version:          1.20 [crun]
  GitCommit:        1.20-1
 docker-init:
  Version:          0.19.0
  GitCommit:
docker info
Additional Info
No response