kubernetes: let kubelet start when swap is on #473

bcressey · 2025-04-15T23:13:12Z

Issue number:
Related: bottlerocket-os/bottlerocket#4075

Description of changes:
Set failSwapOn: false for all kubelet configs.

The goal of this change is to simplify experiments with swap enabled on the host as a possible remedy for the unreachable nodes discussed in the related issue. This requires letting kubelet actually start if swap is enabled. Allowing pods to use swap is a larger change that's out of scope here.

Testing done:
For each Kubernetes variant from 1.25 to 1.32, I confirmed that kubelet started when swap was enabled on the node, in both cgroups v1 and cgroups v2 configurations.
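The first half of that check (is swap actually on?) can be sketched roughly as follows. This only inspects /proc/swaps, whose first line is a column header. The function name swap_active is made up for this example, and the kubelet unit name in the trailing comment is an assumption, not something taken from the testing above.

```shell
# Hypothetical helper: succeeds when the given swaps file lists at least one
# active swap area. Defaults to /proc/swaps, whose first line is a header.
swap_active() {
    swaps_file="${1:-/proc/swaps}"
    # Count non-empty lines after the header; grep -c exits 1 on a zero count.
    entries="$(tail -n +2 "$swaps_file" | grep -c . || true)"
    [ "$entries" -gt 0 ]
}

# Example use on a node (assumes kubelet runs as a systemd unit named "kubelet"):
#   swap_active && systemctl is-active kubelet
```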

With cgroups v1, I confirmed that memory.memsw.limit_in_bytes was set to the same value as memory.limit_in_bytes, which prevents the use of swap:

$ find /sys/fs/cgroup -mindepth 6 -name memory.memsw.limit_in_bytes -exec head -v {} \;
==> /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod958dcfcf_dfbf_4f6d_91a2_df92b5e0bf9a.slice/cri-containerd-986e63e82f67895ee8300badee54629b6b445d77a75b5d4bc2d2c221b2cf99d1.scope/memory.memsw.limit_in_bytes <==
268435456

$ head -v /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod958dcfcf_dfbf_4f6d_91a2_df92b5e0bf9a.slice/cri-containerd-986e63e82f67895ee8300badee54629b6b445d77a75b5d4bc2d2c221b2cf99d1.scope/memory.limit_in_bytes
==> /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod958dcfcf_dfbf_4f6d_91a2_df92b5e0bf9a.slice/cri-containerd-986e63e82f67895ee8300badee54629b6b445d77a75b5d4bc2d2c221b2cf99d1.scope/memory.limit_in_bytes <==
268435456
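For anyone repeating the cgroups v1 comparison across many containers, a minimal sketch of the same check as a shell function might look like this. The name v1_swap_blocked and its exit codes are illustrative only. It assumes swap accounting is active so that memory.memsw.limit_in_bytes exists.

```shell
# Hypothetical helper for one cgroup v1 memory directory.
# Exit 0: the memory+swap limit equals the memory limit, so no swap headroom.
# Exit 1: the two limits differ. Exit 2: a limit file could not be read.
v1_swap_blocked() {
    cg_dir="$1"
    mem_limit="$(cat "$cg_dir/memory.limit_in_bytes" 2>/dev/null)" || return 2
    memsw_limit="$(cat "$cg_dir/memory.memsw.limit_in_bytes" 2>/dev/null)" || return 2
    [ "$mem_limit" = "$memsw_limit" ]
}

# Example use, looping over the same files as the find command above:
#   find /sys/fs/cgroup -mindepth 6 -name memory.memsw.limit_in_bytes |
#   while IFS= read -r f; do
#       v1_swap_blocked "$(dirname "$f")" || echo "check: $f"
#   done
```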

With cgroups v2, I confirmed that memory.swap.max was set to zero for workload containers.

$ find /sys/fs/cgroup -mindepth 5 -name memory.swap.max -exec head -v {} \;
==> /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4fa0c88a_aaeb_4b63_b2b6_42e1459543c8.slice/cri-containerd-3b5c2ebdb64969a0c4a03e87ef29291a5c2b70c4cf04deb306209430b563b8bf.scope/memory.swap.max <==
0

$ head -v /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod4fa0c88a_aaeb_4b63_b2b6_42e1459543c8.slice/cri-containerd-3b5c2ebdb64969a0c4a03e87ef29291a5c2b70c4cf04deb306209430b563b8bf.scope/memory.max
268435456
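The cgroups v2 check reduces to reading a single file per container, so a rough sketch is short. The function name v2_swap_blocked and its exit codes are hypothetical. A value of "max" in memory.swap.max would mean no swap limit, which this treats as a failure.

```shell
# Hypothetical helper for one cgroup v2 directory.
# Exit 0: memory.swap.max is 0, so the cgroup cannot use swap.
# Exit 1: any other value (for example "max"). Exit 2: file unreadable.
v2_swap_blocked() {
    cg_dir="$1"
    swap_max="$(cat "$cg_dir/memory.swap.max" 2>/dev/null)" || return 2
    [ "$swap_max" = "0" ]
}

# Example use, looping over the same files as the find command above:
#   find /sys/fs/cgroup -mindepth 5 -name memory.swap.max |
#   while IFS= read -r f; do
#       v2_swap_blocked "$(dirname "$f")" || echo "check: $f"
#   done
```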

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Signed-off-by: Ben Cressey <bcressey@amazon.com>

cartermckinnon · 2025-04-15T23:33:18Z

Do you want to go ahead and open the NodeSwap feature gate on 1.29 and lower? That will give you reporting of the swap capacity: https://github.com/kubernetes/kubernetes/blob/44c230bf5c321056e8bc89300b37c497f464f113/pkg/kubelet/nodestatus/setters.go#L359

dims

This has been running ok on a lot of upstream CI jobs for a while now.

bcressey · 2025-04-16T14:08:09Z

> Do you want to go ahead and open the NodeSwap feature gate on 1.29 and lower?

From what I gathered based on the release notes for 1.28 and 1.30:

For 1.28 and 1.29, the supported options were LimitedSwap (default) and UnlimitedSwap, so pods would always get some swap if the feature gate is enabled.
For 1.30 and beyond, where the feature gate is enabled, the supported options are NoSwap (default) and LimitedSwap, so there's no implicit swap usage.

Consequently I don't want to enable the feature gate for the older versions. In the short term, I'd like swap to be available as a tool to manage the kernel's reclaim behavior in near-OOM conditions. It'll be more difficult to evaluate its effectiveness if pods are also utilizing available swap space.

cartermckinnon · 2025-04-16T17:12:48Z

Makes sense, I didn’t catch the change in default swapBehavior 👍

koooosh

What is the underlying reason for the cgroups v1 and cgroups v2 configs being tested differently?

bcressey · 2025-04-16T22:47:11Z

> What is the underlying reason for the cgroups v1 and cgroups v2 configs being tested differently?

The mechanism for disabling swap in containers is different, and I wanted to make sure the right thing happened for both.

Also, the k8s 1.32 post says:

> On Linux nodes, Kubernetes only supports running with swap enabled for hosts that use cgroup v2. On cgroup v1 systems, all Kubernetes workloads are not allowed to use swap memory.

The pairing of these two statements could imply that kubelet would fail to run if swap was enabled on a host using cgroup v1, and I wanted to confirm that wasn't the case. If it's merely "unsupported" that's fine since the goal is not to enable swap for pods.
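Since the swap limit lives in a different file on each hierarchy, a test script needs to know which one the node uses. One possible heuristic, sketched here, relies on the cgroup.controllers file that exists at the root of a cgroup v2 unified hierarchy. The function name cgroup_version is made up for illustration, and hybrid setups are not handled.

```shell
# Hypothetical helper: prints "v2" when ROOT looks like a cgroup v2 unified
# hierarchy (it has a cgroup.controllers file), otherwise prints "v1".
cgroup_version() {
    cg_root="${1:-/sys/fs/cgroup}"
    if [ -f "$cg_root/cgroup.controllers" ]; then
        echo v2
    else
        echo v1
    fi
}

# Example use on a node:
#   cgroup_version
```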

kubernetes: let kubelet start when swap is on

a12292c

Signed-off-by: Ben Cressey <bcressey@amazon.com>

bcressey requested review from KCSesh and koooosh April 15, 2025 23:13

dims approved these changes Apr 16, 2025

cartermckinnon approved these changes Apr 16, 2025

ytsssun approved these changes Apr 16, 2025

larvacea approved these changes Apr 16, 2025

koooosh reviewed Apr 16, 2025

koooosh approved these changes Apr 16, 2025

bcressey merged commit 9752dc9 into bottlerocket-os:develop Apr 16, 2025
2 checks passed

bcressey deleted the kubelet-swap branch April 16, 2025 22:48

koooosh mentioned this pull request Apr 17, 2025

Add k8s-1.33 and ecr-credential-provider-1.33 pkgs with pre-release sources #476

Merged

z0rc mentioned this pull request Apr 18, 2025

Support Swap bottlerocket-os/bottlerocket#1911

Closed
