-
Notifications
You must be signed in to change notification settings - Fork 301
fleetd: keep a list of listeners across reconfiguring fleetd #1517
Changes from all commits
7295e80
686fd7a
7f9fcff
755ed0b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
// Copyright 2016 CoreOS, Inc. | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package functional | ||
|
||
import ( | ||
"fmt" | ||
"strings" | ||
"testing" | ||
|
||
"github.com/coreos/fleet/functional/platform" | ||
"github.com/coreos/fleet/functional/util" | ||
) | ||
|
||
// TestReconfigureServer checks whether fleetd managed to keep its listeners | ||
// across reconfiguration of fleetd after receiving SIGHUP. | ||
func TestReconfigureServer(t *testing.T) { | ||
cluster, err := platform.NewNspawnCluster("smoke") | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
defer cluster.Destroy() | ||
|
||
m0, err := cluster.CreateMember() | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
_, err = cluster.WaitForNMachines(m0, 1) | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
|
||
// NOTE: we need to sleep once here to get reliable results. | ||
// Without this sleep, the entire fleetd test always ends up succeeding | ||
// no matter whether SIGHUP came or not. | ||
_, _ = cluster.MemberCommand(m0, "sh", "-c", `'sleep 2'`) | ||
|
||
err = waitForFleetdSocket(cluster, m0) | ||
if err != nil { | ||
t.Fatalf("Failed to get a list of fleetd sockets: %v", err) | ||
} | ||
|
||
// send a SIGHUP to fleetd, and periodically checks if a message | ||
// "Reloading configuration" appears in fleet's journal, up to timeout (15) seconds. | ||
stdout, _ := cluster.MemberCommand(m0, "sudo", "systemctl", "kill", "-s", "SIGHUP", "fleet") | ||
if strings.TrimSpace(stdout) != "" { | ||
t.Fatalf("Sending SIGHUP to fleetd returned: %s", stdout) | ||
} | ||
|
||
err = waitForReloadConfig(cluster, m0) | ||
if err != nil { | ||
t.Fatalf("Failed to get log about reconfiguration: %v", err) | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we still need this? I would assume it's no longer necessary, now that we actually check for the socket connection still working?... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see that you use it here now to check for fleet being up, which makes sense I guess. With this in place, do you still need the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Unfortunately yes, I still need the sleep. I just did another round of tests, but it's still the same. |
||
|
||
// check if fleetd is still running correctly, by running fleetctl status | ||
_, _, err = cluster.Fleetctl(m0, "list-units") | ||
if err != nil { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dongsu: could you please change this to: Thanks you! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tixxdz this reveals that there's still a bug somewhere. I'll try to investigate soon. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tixxdz Fixed a bug. Checking with "fleetctl list-units" is now working. |
||
t.Fatalf("Unable to check list-units. Please check for fleetd socket. err:%v", err) | ||
} | ||
|
||
// Check for HTTP listener error looking into the fleetd journal | ||
stdout, _ = cluster.MemberCommand(m0, "journalctl _PID=$(pidof fleetd)") | ||
if strings.Contains(strings.TrimSpace(stdout), "Failed serving HTTP on listener:") { | ||
t.Fatalf("Fleetd log returned error on HTTP listeners: %s", stdout) | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The check seems fine, but what worries me is we are checking an error message that might disappear for other reasons! is there a better way to check for sockets in: systemctl status fleetd; grep pid, then inspect /proc/pid/* ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I was just lazy when I wrote this part. I'll try to change it like that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Trying to check sockets sounds complicated, and I'm not convinced it's really a good idea to rely on internals like that... TBH, I don't think either of these approaches is really the right way to test this. Rather, we should try to check whether actual socket connections keep working as expected. I don't have any specific setup in mind for this -- but surely it should be possible to do this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @antrik indeed, if it's easy to check if the socket is still working in this context then go for it ? my comment was a bit generic... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
+1, test for the desired behaviour by testing the desired behaviour :-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Done. |
||
// Check expected state after reconfiguring fleetd | ||
stdout, _ = cluster.MemberCommand(m0, "systemctl", "show", "--property=ActiveState", "fleet") | ||
if strings.TrimSpace(stdout) != "ActiveState=active" { | ||
t.Fatalf("Fleet unit not reported as active: %s", stdout) | ||
} | ||
stdout, _ = cluster.MemberCommand(m0, "systemctl", "show", "--property=Result", "fleet") | ||
if strings.TrimSpace(stdout) != "Result=success" { | ||
t.Fatalf("Result for fleet unit not reported as success: %s", stdout) | ||
} | ||
} | ||
|
||
// waitForReloadConfig returns if a message "Reloading configuration" exists | ||
// in the journal, periodically checking for the journal up to the timeout. | ||
func waitForReloadConfig(cluster platform.Cluster, m0 platform.Member) (err error) { | ||
_, err = util.WaitForState( | ||
func() bool { | ||
// NOTE: journalctl should run just simply like "journalctl -u fleet", | ||
// without being piped with grep. Doing | ||
// "journalctl -u fleet | grep \"Reloading configuration\"" is racy | ||
// in a subtle way, so that it sometimes fails only on semaphoreci. | ||
// - dpark 20160408 | ||
stdout, _ := cluster.MemberCommand(m0, "journalctl _PID=$(pidof fleetd)") | ||
journalfleet := strings.TrimSpace(stdout) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dongsupark does it make a difference here if you do "journalctl _PID=$(pidof fleetd)" ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see why (I don't see why using grep wouldn't be reliable in the first place, though... But doing the compare in Go is fine too; so I guess this is a non-issue.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are the gains of using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and yeh I forget to say that some developers are also just lazy so they run the tests again and again on the same VM grepping the old bogus journal logs... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tixxdz there are no "old bogus journal logs" -- each test spawns fresh containers with fresh logs. And I still think the (PIDs are in fact prone to race conditions in general -- though I guess that's not really an issue in this specific case.) |
||
if !strings.Contains(journalfleet, "Reloading configuration") { | ||
fmt.Errorf("Fleetd is not fully reconfigured, retrying... entire fleet journal:\n%v", journalfleet) | ||
return false | ||
} | ||
return true | ||
}, | ||
) | ||
if err != nil { | ||
return fmt.Errorf("Reloading configuration log not found: %v", err) | ||
} | ||
|
||
return nil | ||
} | ||
|
||
// waitForFleetdSocket returns if /var/run/fleet.sock exists, periodically | ||
// checking for states. | ||
func waitForFleetdSocket(cluster platform.Cluster, m0 platform.Member) (err error) { | ||
_, err = util.WaitForState( | ||
func() bool { | ||
stdout, _ := cluster.MemberCommand(m0, "test -S /var/run/fleet.sock && echo 1") | ||
if strings.TrimSpace(stdout) == "" { | ||
fmt.Errorf("Fleetd is not fully started, retrying...") | ||
return false | ||
} | ||
return true | ||
}, | ||
) | ||
if err != nil { | ||
return fmt.Errorf("Fleetd socket not found: %v", err) | ||
} | ||
|
||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, would it help after SIGHUP is sent to check the journal if "fleet.Reloading configuration." is really there ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AIUI the problem is actually that we need to make sure fleetd is fully initialised before even sending the SIGHUP. I agree though that it would be better to wait for this by checking the logs. (Unless there is some more elegant method for checking fleetd's exact status?...)
Note that a proper check like that will also make the preceding checks using systemctl unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tixxdz
Yes, I updated it like that.
But that change didn't allow me to remove th 8000 e sleep command before sending SIGHUP. Without the sleep, test always ends up succeeding, no matter what.. :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongsupark actually, I just realised that
WaitForNMachines()
will only succeed once fleet is fully up? So thesystemctl
checks should definitely be redundant.Also, I still don't understand why the sleep is necessary... But I guess that's not a showstopper.