Sporadic RPC error "containerd: container did not start before the specified timeout" · Issue #22226 · moby/moby · GitHub
Sporadic RPC error "containerd: container did not start before the specified timeout" #22226
Closed
@ghost

Description

Output of docker version:

Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:17:17 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:17:17 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 38
 Running: 23
 Paused: 0
 Stopped: 15
Images: 37
Server Version: 1.11.0
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 4.4.0-0.bpo.1-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.24 GiB
Name: talon.one
ID: 4S3Y:6F4A:EYIB:DWKQ:C5C7:5RLX:YAM6:O426:BLGU:HP47:KN6J:FA5N
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
Physical box:
Debian 8.4
Linux talon.one 4.4.0-0.bpo.1-amd64 #1 SMP Debian 4.4.6-1~bpo8+1 (2016-03-20) x86_64 GNU/Linux

Steps to reproduce the issue:

We build our image using Drone, and at the end of the build the new docker image is automatically deployed on the same server:

$ docker run --restart=always --name talon-master-api -e POSTGRES_PORT_5432_TCP_ADDR=talon-master-postgres --env-file=/home/talonone/secrets/salesforce --net=talon-master-nw --expose=9000 -d docker.talon.one/talon-api/master:latest
27625ea8e6ca8d59c2b501451846d476a0ce50f8c6d87a7e465be255d5a3de7a

Sometimes we get the response:

docker: Error response from daemon: rpc error: code = 2 desc = "containerd: container did not start before the specified timeout".

The issue is similar to #22053 but we don't use docker compose.

I can also reproduce this issue by manually doing:

docker restart talon-master-api

The command then hangs for a while, prints intermittent kernel messages, and finally fails:

Message from syslogd@talon at Apr 21 13:59:30 ...
 kernel:[71227.458071] unregister_netdevice: waiting for lo to become free. Usage count = 1
Error response from daemon: Cannot restart container talon-master-api: rpc error: code = 2 desc = "containerd: container did not start before the specified timeout"

So this issue might be triggered by, or connected to, the open issue #5618.

I restarted the docker daemon with systemctl after adding the --debug flag in the systemd unit. This made the problem go away temporarily; I suspect this is because the restart clears the "waiting for lo to become free" condition.

After around 2 minutes, the "waiting for lo to become free" problem reappears.
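To see when the kernel message comes back, a small filter over the kernel log can help. This is only a sketch: `count_netdev_waits` is a hypothetical helper name, and it assumes the message format matches the syslog line quoted above.

```shell
# Hypothetical helper: count "unregister_netdevice: waiting for ... to become free"
# lines read from stdin.
count_netdev_waits() {
  # grep -c prints the number of matching lines (0 if none); `|| true` keeps
  # the function from returning non-zero when nothing matches.
  grep -c 'unregister_netdevice: waiting for .* to become free' || true
}

# Example usage (needs permission to read the kernel log):
#   dmesg | count_netdev_waits
```

Running it before and after a daemon restart should show whether the count starts growing again.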

While trying to reproduce this issue, I ran into another problem (not sure if it is connected): docker start and docker ps report different states for one of the containers:

root@talon /home/talonone # docker start demo-telco-master
Error response from daemon: Container 8b8eac92b62c06297ec87a1f27ef7e3d26aabd8b571bc026f88edbf9538d1e2c is aleady active
Error: failed to start containers: demo-telco-master
root@talon /home/talonone # docker ps | grep demo-telco-master
root@talon /home/talonone # docker ps -a | grep demo-telco-master
8b8eac92b62c        docker.talon.one/demo-telco/master:latest             "/bin/sh -c 'ruby app"   17 hours ago        Exited (128) 6 minutes ago                                             demo-telco-master

I will file a separate issue for this.
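For anyone who wants to compare the two views side by side, here is a rough sketch. `container_views` is a hypothetical helper, not something from the Docker CLI; it only uses `docker ps --format` and `docker inspect --format`, and it shows the daemon's recorded state, which may still differ from what containerd believes.

```shell
# Hypothetical helper: print how `docker ps` and `docker inspect` each see a container.
container_views() {
  name="$1"
  # `docker ps` lists running containers only, so a missing name means "not running".
  if docker ps --format '{{.Names}}' | grep -qxF "$name"; then
    ps_view="running"
  else
    ps_view="not-running"
  fi
  # .State.Running is the daemon's recorded state for the container.
  inspect_view="$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null)"
  echo "ps=$ps_view inspect.Running=${inspect_view:-unknown}"
}

# Example usage:
#   container_views demo-telco-master
```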

In docker debug mode, I wasn't able to quickly reproduce the main issue (container did not start). I will submit daemon debug logs as soon as it reappears.
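Since the error is sporadic, a loop along these lines could be used to wait for it to reappear. It is a sketch only: `repeat_until_timeout` is a hypothetical helper, and the attempt count in the usage line is an arbitrary assumption.

```shell
# Hypothetical helper: run a command up to MAX times and stop at the first
# attempt whose output contains the containerd timeout message.
# Usage: repeat_until_timeout MAX command [args...]
repeat_until_timeout() {
  max="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    # Capture stdout and stderr together, since the daemon error goes to stderr.
    output="$("$@" 2>&1)"
    case "$output" in
      *"container did not start before the specified timeout"*)
        echo "reproduced on attempt $attempt"
        return 0
        ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "not reproduced in $max attempts"
  return 1
}

# Example usage:
#   repeat_until_timeout 20 docker restart talon-master-api
```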

I will gladly supply any needed debug info to help resolve the issue.

Additional information you deem important (e.g. issue happens only occasionally):

Dockerfile:

FROM alpine:latest

RUN apk --update upgrade && \
    apk add ca-certificates && \
    update-ca-certificates && \
    rm -rf /var/cache/apk/*

COPY talon /talon/talon
WORKDIR /talon
ENV PATH=$PATH:/talon 
CMD ["talon"]

EXPOSE 9000   

Reference to discussion on twitter: https://twitter.com/mntmn/status/723108094620786688

Labels: kind/bug, version/1.11