-
Notifications
You must be signed in to change notification settings - Fork 215
Add signal-reboot #814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add signal-reboot #814
Conversation
In my test this works, using the same values.yml file:
Then journald shows:
and the system immediately does a clean reboot. However note that "SIGRTMIN+5" was sent but it was interpreted by host as SIGRTMIN+6. I believe that is related to the different output of Anyway this was running on a v1.23 k8s cluster with the old PSP admission controller disabled. The nodes are Almalinux 8.8. I'm afraid I could not reproduce the issue; it works fine for me. Looking at the capabilities of the kured process it also appears fine:
Also when I touched the sentinel file on the host, kured worked as expected and the node rebooted:
I confirmed with
|
Thanks for your tests and the positive feedback! Yes, the specific signal value may indeed be a problem. We would need to add a good guidance for the user in our docs, to determine which signal value is correct for their host OS. Most of the users won't know as this is expert knowlege 😆 |
I made it work on my cluster with
|
38dfbb2
to
ae027af
Compare
Ah of course, I'm not familiar with AppArmor at all but that is the SELinux equivalent on some OSes. |
This PR was automatically considered stale due to lack of activity. Please refresh it and/or join our slack channels to highlight it, before it automatically closes (in 7 days). |
Is anyone available to review this? |
@rptaylor thanks for your patience, added some review comments for @ckotzbauer, I think this generally looks like a great evolution, thanks this work! |
Sorry for the very long delay, this is ready for another review @jackfrancis @rptaylor |
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
Signed-off-by: Christian Kotzbauer <git@ckotzbauer.de>
2e2a527
to
544a287
Compare
@ckotzbauer very exciting! I can test it on a dev cluster, is there a new image available in ghcr.io/ckotzbauer/ ? |
Thanks for the hint: |
Okay refreshing my memory on what this is even about 4 months later .... ;p On a dev cluster I temporarily disabled PSP admission control for testing purposes.
Then on a node I touched /var/run/reboot-required. pod logs:
Eventually .... node rebooted:
Does that cover everything needed? Is it good to merge now? |
Actually wait ... the node powered off instead of rebooting, that's not good... |
I tried again and the node rebooted normally. Sometimes our cloud infrastructure is a bit flakey... Anyway as long as systemd on the node reports "Received SIGRTMIN+5" it should be good. |
// MethodSignal is used as "--reboot-method" value when rebooting with a SIGRTMIN+5 signal. | ||
MethodSignal = "signal" | ||
|
||
sigTrminPlus5 = 34 + 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be sigRtminPlus5 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I fix the typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the typo is still there? But otherwise this should be finally good to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't catch that. Let's fix the typo. If we're going to use camel case I'd vote for:
sigRTMin
Maybe:
const (
...
sigRTMin = 34
defaultSignal = sigRTMin + 5
...
)
And we should probably change the MethodSignal
comment so that it doesn't imply that SIGRTMIN+5 is static (it's just a default, but the user can use whatever number within the min-max range that their OS allows). Maybe:
// MethodSignal is used as "--reboot-method" value when rebooting according to a signal in the allowed range between SIGRTMIN and SIGRTMAX as defined on the OS platform running kured
Thanks for testing @rptaylor, this should cover all needed changes, so this should be ready for merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
great work, thank you!
Based on #813
close #416
close #722
This PR adds a
--reboot-method
flag with "command" (default) and "signal" option. The "command" option uses the--reboot-command
on the host withnsenter
as before. The new "signal" mode uses aSIGRTMIN+5
signal by default against PID 1 (systemd) to reboot the node. The signal can be changed via--reboot-signal
flag.With this, the kured pod runs without privileged mode.
This PR is published as docker-image (amd64 and arm64):
ghcr.io/ckotzbauer/kured:1.14.0-alpha.2
Usage (based on the latest helm-chart) - Helm values.yaml: