kubernetes: let kubelet start when swap is on by bcressey · Pull Request #473 · bottlerocket-os/bottlerocket-core-kit · GitHub

kubernetes: let kubelet start when swap is on #473

Merged (1 commit) on Apr 16, 2025

bcressey
Contributor

Issue number:
Related: bottlerocket-os/bottlerocket#4075

Description of changes:
Set failSwapOn: false for all kubelet configs.

The goal of this change is to simplify experiments with swap enabled on the host as a possible remedy for the unreachable nodes discussed in the related issue. This requires letting kubelet actually start if swap is enabled. Allowing pods to use swap is a larger change that's out of scope here.
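
Before running such an experiment, it can help to confirm both preconditions: the kubelet config carries `failSwapOn: false`, and the host actually has swap active. The sketch below is not part of this PR. The helper names are hypothetical, it assumes a YAML kubelet config with the key on a line of its own, and the config path is supplied by the caller rather than assumed.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the PR.

# Succeeds if the given kubelet config file sets failSwapOn to false.
# Assumes a YAML file with the key at the start of its own line.
kubelet_tolerates_swap() {
    config_file=$1
    grep -Eq '^failSwapOn:[[:space:]]*false[[:space:]]*$' "$config_file"
}

# Succeeds if at least one swap area is listed in a /proc/swaps-style file,
# which has one header line followed by one line per active swap area.
swap_is_active() {
    swaps_file=${1:-/proc/swaps}
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "$swaps_file"
}

# Usage (the config path is a placeholder):
#   kubelet_tolerates_swap /path/to/kubelet/config && swap_is_active \
#       && echo "kubelet should start with swap on"
```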

Testing done:
For each Kubernetes variant from 1.25 to 1.32, I confirmed that kubelet started when swap was enabled on the node, in both cgroups v1 and cgroups v2 configurations.

With cgroups v1, I confirmed that memory.memsw.limit_in_bytes was set to the same value as memory.limit_in_bytes, which prevents the use of swap:

$ find /sys/fs/cgroup -mindepth 6 -name memory.memsw.limit_in_bytes -exec head -v {} \;
==> /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod958dcfcf_dfbf_4f6d_91a2_df92b5e0bf9a.slice/cri-containerd-986e63e82f67895ee8300badee54629b6b445d77a75b5d4bc2d2c221b2cf99d1.scope/memory.memsw.limit_in_bytes <==
268435456

$ head -v /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod958dcfcf_dfbf_4f6d_91a2_df92b5e0bf9a.slice/cri-containerd-986e63e82f67895ee8300badee54629b6b445d77a75b5d4bc2d2c221b2cf99d1.scope/memory.limit_in_bytes
==> /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod958dcfcf_dfbf_4f6d_91a2_df92b5e0bf9a.slice/cri-containerd-986e63e82f67895ee8300badee54629b6b445d77a75b5d4bc2d2c221b2cf99d1.scope/memory.limit_in_bytes <==
268435456
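
The manual comparison above can be repeated across many containers with a small helper. This is only a sketch: the function name is hypothetical, and instead of `-mindepth` it takes the directory to scan as an argument (for example the `kubepods.slice` directory under the memory controller). It prints every cgroup whose memory+swap limit differs from its memory limit, so empty output matches the result reported here.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the PR.
# Lists cgroup v1 directories where memory.memsw.limit_in_bytes differs from
# memory.limit_in_bytes, i.e. where the cgroup has room to use swap.
list_v1_swap_headroom() {
    root=$1
    find "$root" -name memory.memsw.limit_in_bytes | while IFS= read -r memsw_file; do
        dir=$(dirname "$memsw_file")
        memsw=$(cat "$memsw_file")
        mem=$(cat "$dir/memory.limit_in_bytes")
        if [ "$memsw" != "$mem" ]; then
            printf '%s memory=%s memory+swap=%s\n' "$dir" "$mem" "$memsw"
        fi
    done
}

# Usage:
#   list_v1_swap_headroom /sys/fs/cgroup/memory/kubepods.slice
```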

With cgroups v2, I confirmed that memory.swap.max was set to zero for workload containers.

$ find /sys/fs/cgroup -mindepth 5 -name memory.swap.max -exec head -v {} \;
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4fa0c88a_aaeb_4b63_b2b6_42e1459543c8.slice/cri-containerd-3b5c2ebdb64969a0c4a03e87ef29291a5c2b70c4cf04deb306209430b563b8bf.scope/memory.swap.max <==
0

$ head -v /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4fa0c88a_aaeb_4b63_b2b6_42e1459543c8.slice/cri-containerd-3b5c2ebdb64969a0c4a03e87ef29291a5c2b70c4cf04deb306209430b563b8bf.scope/memory.max
268435456
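
The cgroups v2 check can be automated in the same way. The sketch below uses a hypothetical function name and takes the directory to scan as an argument. It prints every cgroup whose `memory.swap.max` is anything other than `0` (the file holds either a byte count or the literal `max`), so empty output matches the result reported here.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the PR.
# Lists cgroup v2 directories whose memory.swap.max is not 0, i.e. cgroups
# that are allowed to use some amount of swap.
list_v2_swap_allowed() {
    root=$1
    find "$root" -name memory.swap.max | while IFS= read -r swap_file; do
        value=$(cat "$swap_file")
        if [ "$value" != "0" ]; then
            printf '%s memory.swap.max=%s\n' "$(dirname "$swap_file")" "$value"
        fi
    done
}

# Usage:
#   list_v2_swap_allowed /sys/fs/cgroup/kubepods.slice
```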

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
@bcressey bcressey requested review from KCSesh and koooosh April 15, 2025 23:13
@cartermckinnon
cartermckinnon commented Apr 15, 2025

Do you want to go ahead and open the NodeSwap feature gate on 1.29 and lower? That will give you reporting of the swap capacity: https://github.com/kubernetes/kubernetes/blob/44c230bf5c321056e8bc89300b37c497f464f113/pkg/kubelet/nodestatus/setters.go#L359

@dims dims left a comment

This has been running ok on a lot of upstream CI jobs for a while now.

@bcressey
Contributor Author

> Do you want to go ahead and open the NodeSwap feature gate on 1.29 and lower?

From what I gathered based on the release notes for 1.28 and 1.30:

  • For 1.28 and 1.29, the supported options were LimitedSwap (default) and UnlimitedSwap, so pods would always get some swap if the feature gate is enabled.
  • For 1.30 and beyond, where the feature gate is enabled, the supported options are NoSwap (default) and LimitedSwap, so there's no implicit swap usage.

Consequently I don't want to enable the feature gate for the older versions. In the short term, I'd like swap to be available as a tool to manage the kernel's reclaim behavior in near-OOM conditions. It'll be more difficult to evaluate its effectiveness if pods are also utilizing available swap space.

@cartermckinnon

Makes sense, I didn’t catch the change in default swapBehavior 👍

Contributor
@koooosh koooosh left a comment

What is the underlying reason for the cgroups v1 and cgroups v2 configs being tested differently?

@bcressey
Contributor Author

> What is the underlying reason for the cgroups v1 and cgroups v2 configs being tested differently?

The mechanism for disabling swap in containers is different, and I wanted to make sure the right thing happened for both.

Also, the k8s 1.32 post says:

> On Linux nodes, Kubernetes only supports running with swap enabled for hosts that use cgroup v2. On cgroup v1 systems, all Kubernetes workloads are not allowed to use swap memory.

The pairing of these two statements could imply that kubelet would fail to run if swap was enabled on a host using cgroup v1, and I wanted to confirm that wasn't the case. If it's merely "unsupported" that's fine since the goal is not to enable swap for pods.
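
Since the check differs by cgroup version, a script that runs on both kinds of host first has to tell them apart. One common approach, sketched below with a hypothetical function name, relies on the fact that the root of a cgroup v2 unified hierarchy contains a `cgroup.controllers` file. A hybrid setup, where the v2 hierarchy is mounted in a subdirectory, is reported as v1 by this sketch.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the PR.
# Prints "v2" if the given cgroup root is a unified (v2) hierarchy,
# otherwise "v1". The root defaults to /sys/fs/cgroup.
cgroup_version() {
    root=${1:-/sys/fs/cgroup}
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2
    else
        echo v1
    fi
}

# Usage:
#   case $(cgroup_version) in
#       v2) echo "check memory.swap.max" ;;
#       v1) echo "check memory.memsw.limit_in_bytes" ;;
#   esac
```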

@bcressey bcressey merged commit 9752dc9 into bottlerocket-os:develop Apr 16, 2025
2 checks passed
@bcressey bcressey deleted the kubelet-swap branch April 16, 2025 22:48