Fleet fails to start units after restart #1090
Comments
Note: another couple of units (again, a unit plus its sidekick) failed the same way. They have no dependency on registry.service or on any other unit not controlled by fleet, so I guess there's one less variable in the equation.
OK, one additional piece of information:
[Unit]
Wants=etcd.service
After=mesos-master@%i.service
[Service]
[X-Fleet]
Hopefully this information helps you figure out what's going on here.
@bcwaldon thanks for your attention. I have another case:
Wants=etcd.service
BindsTo=wordpress.service
Restart=always
Getting:
Is it the same thing?
Yes, this is likely related, if the
Just an update: Thanks
Same here: CoreOS stable (607.0.0) Not even doing: sudo locksmithctl reboot.
This bug should be fixed in all channels. If you are still experiencing it, please share any fleet logs that demonstrate the issue (full logs rather than snippets; it all matters). The exact contents of the unit files would be useful, too. Please read through #1158 as well, as that may be the root cause.
@bcwaldon I think this issue should be re-opened. stop-destroy-start doesn't solve the problem.
● marathon-weave-sidekick@1.service
Apr 12 07:34:57 localhost systemd[1]: Cannot add dependency job for unit marathon-weave-sidekick@1.service, ignoring: Unit marathon-weave-sidekick@1.service failed to load: No such file or directory.
fleet v0.9.2 (available in Alpha) addresses the problem you describe above.
Hi
3-node CoreOS Beta channel cluster. One node on CoreOS 522.5 (the problematic one), two on 522.4.
Last night, one of my nodes decided to upgrade its CoreOS version. Cool.
This morning I find that a few of the services that should run on this node are inactive/dead.
For this issue's sake, I will use two of the services:
https://gist.github.com/yaronr/62e70a897a5560a8cc63
weave.service 1cf0847f.../10.0.4.65 active running
weave.service 57c5b6a6.../10.0.5.237 active running
weave.service a3a566ba.../10.0.0.168 active running
zookeeper-weave-sidekick@1.service 1cf0847f.../10.0.4.65 active running
zookeeper-weave-sidekick@2.service a3a566ba.../10.0.0.168 active running
zookeeper-weave-sidekick@3.service 57c5b6a6.../10.0.5.237 inactive dead
zookeeper@1.service 1cf0847f.../10.0.4.65 active running
zookeeper@2.service a3a566ba.../10.0.0.168 active running
zookeeper@3.service 57c5b6a6.../10.0.5.237 inactive dead
registry.service is actually started by systemd (via cloud-init) and not by fleet, but it is also up:
core@ip-10-0-5-237 ~ $ systemctl | grep registry
registry.service loaded active running Custom Docker Registry
I tried digging a bit deeper:
core@ip-10-0-5-237 ~ $ systemctl status zookeeper@3.service
● zookeeper@3.service - Zookeeper 3
Loaded: loaded (/run/fleet/units/zookeeper@3.service; linked-runtime)
Active: inactive (dead)
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal systemd[1]: Stopping Zookeeper 3...
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal docker[9047]: zoo3
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal systemd[1]: Stopped Zookeeper 3.
core@ip-10-0-5-237 ~ $ systemctl status zookeeper-weave-sidekick@3.service
● zookeeper-weave-sidekick@3.service - zookeeper-weave-sidekick-3 service
Loaded: loaded (/run/fleet/units/zookeeper-weave-sidekick@3.service; linked-runtime)
Active: inactive (dead)
Interestingly, fleetctl list-unit-files gives:
zookeeper@3.service 090d52d launched launched 57c5b6a6.../10.0.5.237
even though list-units shows it as inactive/dead.
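The mismatch described here (fleet reports the unit as launched while list-units shows it as inactive/dead) can be spotted mechanically. The following is a minimal sketch, assuming the four-column UNIT MACHINE ACTIVE SUB layout shown in the listing above; the function name `find_dead_units` is hypothetical and not part of fleetctl.

```shell
#!/bin/sh
# find_dead_units: hypothetical helper, not part of fleetctl.
# Reads `fleetctl list-units`-style rows on stdin (UNIT MACHINE ACTIVE SUB)
# and prints the names of units whose state is "inactive dead".
find_dead_units() {
  awk '$3 == "inactive" && $4 == "dead" { print $1 }'
}

# Demo with rows copied from the listing in this report.
find_dead_units <<'EOF'
zookeeper-weave-sidekick@2.service 	a3a566ba.../10.0.0.168 	active 	running
zookeeper-weave-sidekick@3.service 	57c5b6a6.../10.0.5.237 	inactive 	dead
zookeeper@2.service 	a3a566ba.../10.0.0.168 	active 	running
zookeeper@3.service 	57c5b6a6.../10.0.5.237 	inactive 	dead
EOF
```

On a cluster node the same function could presumably be fed live data with `fleetctl list-units | find_dead_units`; a header row would simply not match the filter.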
Ok, so I try:
fleetctl start zookeeper@3.service
Nothing changes; systemctl status is also the same (and there are no new logs).
sudo systemctl restart zookeeper@3.service
does the trick: both the unit and its sidekick are started.
fleetctl then shows it as 'running/active'.
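The manual workaround above (restarting the unit through systemd directly on the affected node) could be scripted for several units at once. This is only a sketch of the workaround from this report, not a fix for the underlying bug; the function name `print_restart_commands` is hypothetical, and it deliberately prints the commands instead of running them so they can be reviewed first.

```shell
#!/bin/sh
# print_restart_commands: hypothetical dry-run helper.
# Reads one unit name per line on stdin and prints the systemctl
# command that would restart it; nothing is executed.
print_restart_commands() {
  while IFS= read -r unit; do
    # Skip blank lines so empty input produces no commands.
    [ -n "$unit" ] || continue
    printf 'sudo systemctl restart %s\n' "$unit"
  done
}

# Demo with the unit from this report.
printf '%s\n' 'zookeeper@3.service' | print_restart_commands
```

The printed lines can be inspected and then run by hand on the node that owns the units.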
Question: could this be related to the Requires dependency on a non-fleet unit? (The required unit IS running, but it is a plain systemd unit and not a fleet one.)