Description
I have a Docker 17.10 swarm cluster running on AWS with Auto Scaling.
I create new swarm worker nodes with a label set on the daemon:
cat /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service
Wants=network-online.target
Requires=docker.socket
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd --label=boost-prod=back-xlarge -H fd://
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target
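To confirm which labels the unit file actually passes to dockerd, the ExecStart line can be filtered. This is only a minimal sketch: the helper name daemon_labels is hypothetical, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# daemon_labels: read unit-file text on stdin and print the value of every
# --label=... flag, one per line. (The helper name is hypothetical.)
daemon_labels() {
  grep -o -- '--label=[^ ]*' | cut -d= -f2-
}

# Usage on a node (requires systemd):
#   systemctl cat docker.service | daemon_labels
```

With the unit file above, this should print the single label that also shows up under "Labels:" in docker info.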
After I start the daemon and run the join command, the daemon crashes with the following panic:
Nov 01 20:43:54 ip-172-19-27-114 systemd[1]: Starting Docker Application Container Engine...
Nov 01 20:43:54 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:54.305632645Z" level=info msg="libcontainerd: new containerd process, pid: 9165"
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.410981198Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.411161433Z" level=warning msg="Your kernel does not support swap memory limit"
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.411213383Z" level=warning msg="Your kernel does not support cgroup rt period"
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.411223063Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.411558648Z" level=info msg="Loading containers: start."
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.848256514Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.887437398Z" level=info msg="Loading containers: done."
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.912610480Z" level=info msg="Docker daemon" commit=f4ffd25 graphdriver(s)=overlay2 version=17.10.0-ce
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.912724474Z" level=info msg="Daemon has completed initialization"
Nov 01 20:43:55 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:43:55.923831611Z" level=info msg="API listen on /var/run/docker.sock"
Nov 01 20:43:55 ip-172-19-27-114 systemd[1]: Started Docker Application Container Engine.
Nov 01 20:44:04 ip-172-19-27-114 systemd[1]: Stopping Docker Application Container Engine...
Nov 01 20:44:04 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:44:04.630751303Z" level=info msg="Processing signal 'terminated'"
Nov 01 20:44:04 ip-172-19-27-114 dockerd[9157]: time="2017-11-01T20:44:04.666193863Z" level=info msg="stopping containerd after receiving terminated"
Nov 01 20:44:05 ip-172-19-27-114 systemd[1]: Stopped Docker Application Container Engine.
Nov 01 20:44:05 ip-172-19-27-114 systemd[1]: Stopped Docker Application Container Engine.
Nov 01 20:44:05 ip-172-19-27-114 systemd[1]: Stopped Docker Application Container Engine.
Nov 01 20:44:05 ip-172-19-27-114 systemd[1]: Starting Docker Application Container Engine...
Nov 01 20:44:05 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:05.818166570Z" level=info msg="libcontainerd: new containerd process, pid: 10897"
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.824053619Z" level=info msg="[graphdriver] using prior storage driver: overlay2"
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.828343876Z" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.828582473Z" level=warning msg="Your kernel does not support swap memory limit"
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.828636365Z" level=warning msg="Your kernel does not support cgroup rt period"
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.828656031Z" level=warning msg="Your kernel does not support cgroup rt runtime"
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.829117288Z" level=info msg="Loading containers: start."
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.902881792Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.938404858Z" level=info msg="Loading containers: done."
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.956849284Z" level=info msg="Docker daemon" commit=f4ffd25 graphdriver(s)=overlay2 version=17.10.0-ce
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.956925487Z" level=info msg="Daemon has completed initialization"
Nov 01 20:44:06 ip-172-19-27-114 dockerd[10889]: time="2017-11-01T20:44:06.964093684Z" level=info msg="API listen on /var/run/docker.sock"
Nov 01 20:44:06 ip-172-19-27-114 systemd[1]: Started Docker Application Container Engine.
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: panic: runtime error: index out of range
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: goroutine 216 [running]:
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: github.com/docker/docker/daemon/cluster/executor/container.(*executor).Configure(0xc42046d6c0, 0x7f0b11d00528, 0xc4207c8ea0, 0xc420391540, 0x12, 0xa)
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: /go/src/github.com/docker/docker/daemon/cluster/executor/container/executor.go:146 +0x1b3
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*Agent).handleSessionMessage(0xc420136540, 0x7f0b11d00528, 0xc4207c8ea0, 0xc420863920, 0xc42046336
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/agent.go:408 +0x168
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*Agent).run(0xc420136540, 0x7f0b11d00528, 0xc4207c8ea0)
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/agent.go:294 +0xe8c
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: created by github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*Agent).Start.func1
Nov 01 20:44:37 ip-172-19-27-114 dockerd[10889]: /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/agent.go:83 +0x88
Nov 01 20:44:37 ip-172-19-27-114 systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 01 20:44:37 ip-172-19-27-114 systemd[1]: docker.service: Unit entered failed state.
Nov 01 20:44:37 ip-172-19-27-114 systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 01 20:44:37 ip-172-19-27-114 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 01 20:44:37 ip-172-19-27-114 systemd[1]: Stopped Docker Application Container Engine.
Nov 01 20:44:37 ip-172-19-27-114 systemd[1]: Starting Docker Application Container Engine...
I don't know exactly how to reproduce it, because it is intermittent: sometimes a worker starts fine, and if I start the daemon again on a failed worker node after a while, it starts.
This is the info on the worker node:
root@ip-172-19-27-114:~# docker version
Client:
Version: 17.10.0-ce
API version: 1.33
Go version: go1.8.3
Git commit: f4ffd25
Built: Tue Oct 17 19:04:16 2017
OS/Arch: linux/amd64
Server:
Version: 17.10.0-ce
API version: 1.33 (minimum version 1.12)
Go version: go1.8.3
Git commit: f4ffd25
Built: Tue Oct 17 19:02:56 2017
OS/Arch: linux/amd64
Experimental: false
root@ip-172-19-27-114:~# docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 17.10.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: x781firlupkc71k9s966anpcr
Is Manager: false
Node Address: 172.19.27.114
Manager Addresses:
172.19.18.32:2377
172.19.27.231:2377
172.19.38.28:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-1022-aws
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: ip-172-19-27-114
ID: ATPM:UMCC:PWY6:EWF6:CV6F:U6NF:L4DF:KG6T:K72W:UOAY:3EUB:KJKO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
boost-prod=back-xlarge
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
This is the manager:
root@swarm-as-prod-01:~# docker info && docker version
Containers: 5
Running: 2
Paused: 0
Stopped: 3
Images: 4
Server Version: 17.10.0-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
NodeID: uym2mqwhl31phqzrtlwo94vqo
Is Manager: true
ClusterID: yjdpqiyumu8w93vajs9fedjwv
Managers: 3
Nodes: 54
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 3
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 172.19.18.32
Manager Addresses:
172.19.18.32:2377
172.19.27.231:2377
172.19.38.28:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-1022-aws
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: swarm-as-prod-01.naturalint.com
ID: KEL2:7C54:IY67:6CCK:OLQQ:Z2XE:6QRJ:OXAT:SJNS:7Y2K:QARP:PQ3K
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Client:
Version: 17.10.0-ce
API version: 1.33
Go version: go1.8.3
Git commit: f4ffd25
Built: Tue Oct 17 19:04:16 2017
OS/Arch: linux/amd64
Server:
Version: 17.10.0-ce
API version: 1.33 (minimum version 1.12)
Go version: go1.8.3
Git commit: f4ffd25
Built: Tue Oct 17 19:02:56 2017
OS/Arch: linux/amd64
Experimental: false
I don't see any logs on the swarm manager about this index out of range panic on the worker during the join.
I will gladly provide more information.
On a different worker node, the panic is the same index out of range:
Nov 02 01:26:07 ip-172-19-21-216 dockerd[12126]: time="2017-11-02T01:26:07.261826901Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Nov 02 01:26:07 ip-172-19-21-216 dockerd[12126]: time="2017-11-02T01:26:07.532300039Z" level=info msg="Loading containers: done."
Nov 02 01:26:07 ip-172-19-21-216 dockerd[12126]: time="2017-11-02T01:26:07.558990677Z" level=info msg="Docker daemon" commit=f4ffd25 graphdriver(s)=overlay2 version=17.10.0-ce
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: panic: runtime error: index out of range
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: goroutine 143 [running]:
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: github.com/docker/docker/daemon/cluster/executor/container.(*executor).Configure(0xc4202d5940, 0x7fa8df8dd398, 0xc4207d5050, 0xc4202763c0, 0x12, 0xa)
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: /go/src/github.com/docker/docker/daemon/cluster/executor/container/executor.go:146 +0x1b3
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*Agent).handleSessionMessage(0xc4200c6600, 0x7fa8df8dd398, 0xc4207d5050, 0xc420988fc0, 0xc4202ee820, 0x0, 0xc4209493b0)
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/agent.go:408 +0x168
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*Agent).run(0xc4200c6600, 0x7fa8df8dd398, 0xc4207d5050)
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/agent.go:294 +0xe8c
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: created by github.com/docker/docker/vendor/github.com/docker/swarmkit/agent.(*Agent).Start.func1
Nov 02 01:26:08 ip-172-19-21-216 dockerd[12126]: /go/src/github.com/docker/docker/vendor/github.com/docker/swarmkit/agent/agent.go:83 +0x88
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: docker.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: Failed to start Docker Application Container Engine.
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: docker.service: Unit entered failed state.
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: docker.service: Failed with result 'exit-code'.
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: Stopped Docker Application Container Engine.
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: docker.service: Start request repeated too quickly.
Nov 02 01:26:08 ip-172-19-21-216 systemd[1]: Failed to start Docker Application Container Engine.
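For anyone comparing traces across workers, the panic and its stack trace can be cut out of the journal with a small filter. This is a sketch under assumptions: panic_trace is a made-up helper name, and it assumes the trace ends at systemd's "Main process exited" line, as it does in the logs above.

```shell
# panic_trace: read journal text on stdin and print from each "panic:" line
# through the next "Main process exited" line, inclusive.
# (The helper name is hypothetical.)
panic_trace() {
  awk '/panic:/              { show = 1 }
       show                  { print }
       /Main process exited/ { show = 0 }'
}

# Usage on a worker:
#   journalctl -u docker.service --no-pager | panic_trace
```

Lines before the panic and the restart messages after the exit line are dropped, so the output from two nodes can be compared side by side.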