Description
Hi
3-node CoreOS Beta channel cluster. One node on CoreOS 522.5 (the problematic one), two on 522.4.
Last night, one of my nodes automatically upgraded its CoreOS version. Cool.
This morning I found that a few of the services that should run on this node are inactive/dead.
For the sake of this issue, I will focus on two of them, zookeeper@3.service and zookeeper-weave-sidekick@3.service:
https://gist.github.com/yaronr/62e70a897a5560a8cc63
fleetctl list-units shows:
weave.service 1cf0847f.../10.0.4.65 active running
weave.service 57c5b6a6.../10.0.5.237 active running
weave.service a3a566ba.../10.0.0.168 active running
zookeeper-weave-sidekick@1.service 1cf0847f.../10.0.4.65 active running
zookeeper-weave-sidekick@2.service a3a566ba.../10.0.0.168 active running
zookeeper-weave-sidekick@3.service 57c5b6a6.../10.0.5.237 inactive dead
zookeeper@1.service 1cf0847f.../10.0.4.65 active running
zookeeper@2.service a3a566ba.../10.0.0.168 active running
zookeeper@3.service 57c5b6a6.../10.0.5.237 inactive dead
registry.service is started directly by systemd (via cloud-init) rather than through fleet, and it is also up:
core@ip-10-0-5-237 ~ $ systemctl | grep registry
registry.service loaded active running Custom Docker Registry
I tried digging a bit deeper:
core@ip-10-0-5-237 ~ $ systemctl status zookeeper@3.service
● zookeeper@3.service - Zookeeper 3
Loaded: loaded (/run/fleet/units/zookeeper@3.service; linked-runtime)
Active: inactive (dead)
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal systemd[1]: Stopping Zookeeper 3...
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal docker[9047]: zoo3
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal systemd[1]: Stopped Zookeeper 3.
core@ip-10-0-5-237 ~ $ systemctl status zookeeper-weave-sidekick@3.service
● zookeeper-weave-sidekick@3.service - zookeeper-weave-sidekick-3 service
Loaded: loaded (/run/fleet/units/zookeeper-weave-sidekick@3.service; linked-runtime)
Active: inactive (dead)
Interestingly, fleetctl list-unit-files gives:
zookeeper@3.service 090d52d launched launched 57c5b6a6.../10.0.5.237
even though list-units shows it as inactive/dead.
Ok, so I try:
fleetctl start zookeeper@3.service
Nothing changes: systemctl status shows the same output, and there are no new log entries.
sudo systemctl restart zookeeper@3.service
does the trick: both the unit and the sidekick are started.
fleetctl now shows the unit as active/running.
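As a stopgap until the cause is known, the manual check above could be scripted. This is only a rough sketch, not something fleet provides: dead_units is a hypothetical helper name, and it assumes the fleetctl list-units columns are UNIT, MACHINE, ACTIVE, SUB in that order, as in the listing earlier in this report. The sample input is copied from that listing.

```shell
#!/bin/sh
# dead_units (hypothetical helper): read `fleetctl list-units`-style output on
# stdin and print the names of units whose state is "inactive dead".
# Assumed column order: UNIT MACHINE ACTIVE SUB.
# $1 (optional): only report units scheduled on the machine with this IP.
dead_units() {
    awk -v ip="${1:-}" '
        $3 == "inactive" && $4 == "dead" {
            # MACHINE looks like "57c5b6a6.../10.0.5.237"; keep the IP part.
            split($2, m, "/")
            if (ip == "" || m[2] == ip) print $1
        }'
}

# Sample lines taken from the listing in this report.
sample='zookeeper-weave-sidekick@2.service a3a566ba.../10.0.0.168 active running
zookeeper-weave-sidekick@3.service 57c5b6a6.../10.0.5.237 inactive dead
zookeeper@2.service a3a566ba.../10.0.0.168 active running
zookeeper@3.service 57c5b6a6.../10.0.5.237 inactive dead'

# List the dead units on the upgraded node.
printf '%s\n' "$sample" | dead_units 10.0.5.237
```

On the affected node, the real output of fleetctl list-units could be piped into dead_units in place of the sample, and each printed name passed to sudo systemctl restart, as was done by hand above. This only works around the symptom; it does not explain why fleetctl start had no effect.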
Question: Could this be related to the Requires dependency on a non-fleet unit? The required unit is running, but it is managed by systemd directly rather than by fleet.