Fleet fails to start units after restart · Issue #1090 · coreos/fleet · GitHub
This repository was archived by the owner on Jan 30, 2020. It is now read-only.
Fleet fails to start units after restart #1090
Closed
@yaronr


Hi

I have a 3-node CoreOS cluster on the Beta channel. One node (the problematic one) is on CoreOS 522.5; the other two are on 522.4.

Last night, one of my nodes upgraded its CoreOS version automatically. Cool.
This morning I found that a few of the services that should run on this node are inactive/dead.
For the sake of this issue, I will use two of the services as examples:
https://gist.github.com/yaronr/62e70a897a5560a8cc63

weave.service 1cf0847f.../10.0.4.65 active running
weave.service 57c5b6a6.../10.0.5.237 active running
weave.service a3a566ba.../10.0.0.168 active running
zookeeper-weave-sidekick@1.service 1cf0847f.../10.0.4.65 active running
zookeeper-weave-sidekick@2.service a3a566ba.../10.0.0.168 active running
zookeeper-weave-sidekick@3.service 57c5b6a6.../10.0.5.237 inactive dead
zookeeper@1.service 1cf0847f.../10.0.4.65 active running
zookeeper@2.service a3a566ba.../10.0.0.168 active running
zookeeper@3.service 57c5b6a6.../10.0.5.237 inactive dead
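
To pick the problem units out of a listing like the one above, a small filter can help. This is only a minimal sketch: the function name dead_units is hypothetical, and it assumes the four whitespace-separated columns shown in this issue (unit, machine, active state, sub state), read from stdin.

```shell
# Hypothetical helper: print the unit and machine for every row whose
# ACTIVE and SUB columns are "inactive" and "dead".
# Assumes whitespace-separated columns: UNIT MACHINE ACTIVE SUB.
dead_units() {
  awk '$3 == "inactive" && $4 == "dead" { print $1, $2 }'
}

# Usage on a machine with fleetctl (not run here):
#   fleetctl list-units | dead_units
```

A header row, if present, is skipped naturally because its columns do not match "inactive" and "dead".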

registry.service is started directly by systemd (via cloud-init) rather than via fleet, and it is up as well:
core@ip-10-0-5-237 ~ $ systemctl | grep registry
registry.service loaded active running Custom Docker Registry

I tried digging a bit deeper:
core@ip-10-0-5-237 ~ $ systemctl status zookeeper@3.service
zookeeper@3.service - Zookeeper 3
Loaded: loaded (/run/fleet/units/zookeeper@3.service; linked-runtime)
Active: inactive (dead)

Jan 13 05:20:44 ip-10-0-5-237.ec2.internal systemd[1]: Stopping Zookeeper 3...
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal docker[9047]: zoo3
Jan 13 05:20:44 ip-10-0-5-237.ec2.internal systemd[1]: Stopped Zookeeper 3.

core@ip-10-0-5-237 ~ $ systemctl status zookeeper-weave-sidekick@3.service
zookeeper-weave-sidekick@3.service - zookeeper-weave-sidekick-3 service
Loaded: loaded (/run/fleet/units/zookeeper-weave-sidekick@3.service; linked-runtime)
Active: inactive (dead)

Interestingly, fleetctl list-unit-files gives:
zookeeper@3.service 090d52d launched launched 57c5b6a6.../10.0.5.237
even though list-units shows it as inactive/dead.

OK, so I tried:
fleetctl start zookeeper@3.service

Nothing changes: systemctl status shows the same output, with no new log entries.

sudo systemctl restart zookeeper@3.service
does the trick: both the unit and its sidekick are started.

After that, fleetctl shows it as active/running.

Question: could this be related to the Requires dependency on a non-fleet unit? The required unit IS running, but it is managed by systemd directly rather than by fleet.
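
One way to look into the dependency question on the affected node is to list what the unit requires and check each dependency's state. The sketch below is hedged: the helper name required_units is hypothetical, and it assumes the dependency is declared with Requires= so that it appears in the output of systemctl show -p Requires. It only parses that output from stdin; it does not diagnose fleet itself.

```shell
# Hypothetical helper: read "Requires=a.service b.service" lines (the format
# printed by `systemctl show -p Requires UNIT`) on stdin and print one
# required unit per line.
required_units() {
  sed -n 's/^Requires=//p' | tr ' ' '\n' | sed '/^$/d'
}

# Usage on the affected node (not run here):
#   systemctl show -p Requires zookeeper@3.service | required_units |
#     while read -r dep; do
#       printf '%s: %s\n' "$dep" "$(systemctl is-active "$dep")"
#     done
```

If every required unit reports active while the fleet-managed unit stays inactive, that would suggest the dependency itself is satisfied and the start request is being lost elsewhere.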
