8000 daemon: Cleanup after a failed start by joamaki · Pull Request #32673 · cilium/cilium · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

daemon: Cleanup after a failed start #32673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

joamaki
Copy link
Contributor
@joamaki joamaki commented May 22, 2024

If the daemon start failed due to e.g. invalid configuration the cleanup was not performed as it was only done from the stop function that was not invoked since the daemon start had failed.

This in some situations lead to confusing crashes as hive did stop all the dependencies of the legacy daemon, for example this crash was reported:

   level=fatal msg="failed to start: daemon creation failed: failed to detect devices: ...
   ... bunch of "stopping" messages omitted ...
   panic: runtime error: invalid memory address or nil pointer dereference
   [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1dfa041]
   goroutine 460 [running]:
     github.com/cilium/cilium/pkg/node.(*LocalNodeStore).Observe(0x3f4cfb0? ...
     	<autogenerated>:1 +0x21

The crash was caused by the LocalNodeStore stopping and clearing the global variable in the node package (to make sure for example our tests don't leak data across them and to make sure we do cleanup correctly), while the DNSProxy was still running in the background processing DNS packets and using the globals in the node package.

To fix this, run the cleanup functions also when "newDaemon" or "startDaemon" fails with an error.

Fixes: f21f303 ("daemon: bubble up error from startDaemon instead of fataling")

cilium-agent: Fix crash due to skipped resource cleanup when agent is stopping due to failed start.

8000
If the daemon start failed due to e.g. invalid configuration the cleanup was not
performed as it was only done from the stop function that was not invoked since
the daemon start had failed.

This in some situations lead to confusing crashes as hive did stop all the dependencies
of the legacy daemon, for example this crash was reported:

   level=fatal msg="failed to start: daemon creation failed: failed to detect devices: ...
   ... bunch of "stopping" messages omitted ...
   panic: runtime error: invalid memory address or nil pointer dereference
   [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1dfa041]
   goroutine 460 [running]:
     github.com/cilium/cilium/pkg/node.(*LocalNodeStore).Observe(0x3f4cfb0? ...
     	<autogenerated>:1 +0x21

The crash was caused by the LocalNodeStore stopping and clearing the global variable
in the node package (to make sure for example our tests don't leak data across them
and to make sure we do cleanup correctly).

To fix this, run the cleanup functions also when "newDaemon" or "startDaemon" fails
with an error.

Fixes: f21f303 ("daemon: bubble up error from startDaemon instead of fataling")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki requested a review from a team as a code owner May 22, 2024 15:11
@joamaki joamaki requested a review from ldelossa May 22, 2024 15:11
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 22, 2024
@joamaki joamaki added release-note/bug This PR fixes an issue in a previous release of Cilium. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels May 22, 2024
@joamaki joamaki mentioned this pull request May 22, 2024
3 tasks
@joamaki joamaki added backport/author The backport will be carried out by the author of the PR. needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels May 22, 2024
@joamaki
Copy link
Contributor Author
joamaki commented May 22, 2024

/test

Copy link
Contributor
@ldelossa ldelossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good to me. 👍

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 23, 2024
@joamaki joamaki added this pull request to the merge queue May 24, 2024
Merged via the queue into cilium:main with commit edc9cba May 24, 2024
65 of 66 checks passed
@joamaki joamaki deleted the pr/joamaki/fix-cleanup-daemon-on-failed-start branch May 24, 2024 09:29
@giorio94
Copy link
Member

👋 @joamaki Backporter here. Double-checking whether this PR is still intended to backported to v1.15, or the label should be dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/author The backport will be carried out by the author of the PR. needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
Status: Needs backport from main
Development

Successfully merging this pull request may close these issues.

3 participants
0