-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Kill check processes after the timeout is reached #3567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fe27432
to
1c289bd
Compare
Kill the subprocess spawned by a script check once the timeout is reached. Previously Consul just marked the check critical and left the subprocess around. Fixes #3565.
1c289bd
to
01d2c3f
Compare
agent/check.go
Outdated
errCh <- fmt.Errorf("Timed out running check '%s'", c.Script) | ||
}() | ||
err = <-errCh | ||
err = <-waitCh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set err
to fmt.Errorf("Timed out running check '%s'", c.Script)
to make sure the check gets set to critical (otherwise if the wait returns nil
it can go passing, and then we just block on the waitCh
but don't set err
, so:
err = fmt.Errorf("Timed out running check '%s'", c.Script)
<-waitCh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - can you add a test that runs a sleep that exceeds the timeout as a sanity check to force it through this path?
db36343
to
40b10fe
Compare
|
||
func SetSysProcAttr(cmd *exec.Cmd) {} | ||
|
||
func KillCommandSubtree(cmd *exec.Cmd) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document this behavior difference on https://www.consul.io/docs/agent/checks.html.
if c.Timeout > 0 { | ||
time.Sleep(c.Timeout) | ||
|
||
timeout := 30 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have test for this - I thought saw it here a little while ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was an existing test, I just ended up not having to modify it at all
@slackpad @kyhavlov I'm a bit late to the party/review, but first, thanks for the quick fix, and then a couple questions:
|
Hi @nh2 we added a wait in the timeout path so the |
Ah, great.
@slackpad I can see this docs change https://github.com/hashicorp/consul/pull/3567/files#diff-736ca63f9e217827ea7d70758c40cedeR28 but it doesn't really explain to the user how their scripts will be killed. |
That's fair - it says "processes" but should be clearer about process groups and which signal is sent. @kyhavlov can you update the docs to make that more clear? |
Version 1.0.0 * tag 'v1.0.0': (455 commits) Release v1.0.0 Puts the tree in 1.0 final release mode. Fixes an XSS issue with unescaped node names. (hashicorp#3578) Adds a note about the Raft protocol not being the same as the Consul protocol. Adds a 1.0 section to the upgrade guide and cleans up the change log. Update sentinel.html.markdown.erb Update dns forwarding documentation (hashicorp#3574) Adds a brief wait and poll period to update check status after a timeout. (hashicorp#3573) Cleans up some drift between the OSS and Enterprise trees. Clarify the docs around script check timeout behavior Updates the change log. Kill check processes after the timeout is reached (hashicorp#3567) Updates the change log. retry locks on network errors (hashicorp#3553) Fix example code formatting in godoc config: remove redundant code config: fix check for segment.port <= 0 and add test Adds check to make sure port is given so we avoid a nil bind address. Removes obsolete segment stub. agent: add option to discard health output (hashicorp#3562) ...
Kill the subprocess spawned by a script check once the timeout is reached. Previously Consul just marked the check critical and left the subprocess around.
Fixes #3565.