8000 v1.17 Backports 2025-04-22 by jschwinger233 · Pull Request #39075 · cilium/cilium · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

v1.17 Backports 2025-04-22 #39075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 24, 2025
Merged

Conversation

@jschwinger233 jschwinger233 added kind/backports This PR provides functionality previously merged into master. backport/1.17 This PR represents a backport for Cilium 1.17.x of a PR that was merged to main. labels Apr 22, 2025
@jschwinger233 jschwinger233 force-pushed the pr/v1.17-backport-2025-04-22-03-07 branch from 2317391 to 0f4a053 Compare April 22, 2025 09:33
squeed
squeed a 8000 pproved these changes Apr 22, 2025
@jschwinger233 jschwinger233 force-pushed the pr/v1.17-backport-2025-04-22-03-07 branch from 0f4a053 to f11254a Compare April 22, 2025 09:43
Copy link
Member
@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pippolo84
Copy link
Member

@gray Could you please apply this patch to adapt the ipset new test? Thanks!

@jschwinger233
Copy link
Member Author

@pippolo84 Thank you! Exactly what I need 💌

Copy link
Contributor
@devodev devodev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@jschwinger233 jschwinger233 force-pushed the pr/v1.17-backport-2025-04-22-03-07 branch 2 times, most recently from 24595fd to 041a927 Compare April 23, 2025 03:22
pippolo84 and others added 8 commits April 23, 2025 11:43
[ upstream commit 5b22071 ]
[ backporter's note: applied #38949 (comment) ]

The new test exercises the reconciler when a batched operation fails and
a retry is attempted. When retrying, the cilium/statedb reconciler calls
the sequential operations (e.g: `Update` instead of `UpdateBatch`) for
each item in the failed batch. Since the sequential operations are not
expected to be called in the current ipset implementation, the
reconciler panics:

=== RUN   TestOpsRetry
panic: Unexpectedly Update() called for reconciliation

goroutine 53 [running]:
github.com/cilium/cilium/pkg/datapath/iptables/ipset.(*ops).Update(0xc000929000?, {0x7f2a50ec86d8?, 0x7f2a97b32a78?}, {0x70?, 0xc000780008?}, 0xc0006508c0?)
	/home/pippolo/cilium/cilium/pkg/datapath/iptables/ipset/ops.go:77 +0x25
github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processSingle(0x37926c0, 0xc000929000, 0x2, 0x0)
	/home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:214 +0x170
github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processRetries(0x37926c0)
	/home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:190 +0x58
github.com/cilium/statedb/reconciler.(*reconciler[...]).incremental(0x3778660, {0x37469f0, 0xc000952ab0}, {0x371ddb0, 0xc0006443c0}, 0xc0006303a0)
	/home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:73 +0x2de
github.com/cilium/statedb/reconciler.(*reconciler[...]).reconcileLoop(0x3778660, {0x37469f0, 0xc000952ab0}, {0x374b780, 0xc000988000})
	/home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/reconciler.go:92 +0x568
github.com/cilium/hive/job.(*jobOneShot).start(0xc00093cde0, {0x37469f0, 0xc000952ab0}, 0x0?, {0x374b780, 0xc00093cae0}, {{{0x0, 0x0, 0x0}}, 0xc0003493e0, ...})
	/home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd
created by github.com/cilium/hive/job.(*group).Start.func1 in goroutine 50
	/home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:167 +0x172
FAIL	github.com/cilium/cilium/pkg/datapath/iptables/ipset	0.233s
FAIL

This will be fixed in a subsequent commit.

Related: 7b80f94 ("iptables/ipset: Use batch operations")

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: gray <greyschwinger@gmail.com>

s
[ upstream commit 0491518 ]

When a cilium/statedb reconciler relies on batched operations and a
failure happens, the subsequent retry calls the non-batched operations
(e.g: `Update` instead of `UpdateBatch`). Therefore, to avoid a panic in
case of a retry, we need to define the non-batched operations too for
the ipset reconciler.

Fixes: 7b80f94 ("iptables/ipset: Use batch operations")

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 32adef7 ]

Previously, if this job failed due to the host IP not being available,
we would return an error but never unlock the endpoint. This would leave
the system deadlocked and would be unrecoverable, even if a valid host
IP was later added.

Fixes #37791

Signed-off-by: Emily Shepherd <emily@redcoat.dev>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 464dacf ]

The `hubble-metrics` flag changed from being comma-separated to being
space-separated in #36371, this PR updates the metrics documentation
page to reflect that change.

Signed-off-by: Hadrien Patte <hadrien.patte@datadoghq.com>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 2e101d4 ]

The layer-7 proxies do not respect EnableDefaultDeny, since L7 policy
"overrides" L3, and EnableDefaultDeny is L3-only. Call this out in the
documentation.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit d292ef7 ]

The fix from a83ddf0
moved the conditional for setting the dynamic metrics config
path under the same if as for static metrics. The thing is,
we can't have both static and dynamic metrics enabled at
the same time otherwise hubble will return an error.

To fox, move the dynamic metrics config path flag out
of the if for static metrics.

Do the same for enabling openmetrics as this is scoped to the
server and not static metrics.

Signed-off-by: Alexandre Barone <abalexandrebarone@gmail.com>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 814bb37 ]

[ backporter's note: ditched "networkPolicyCorrelation" changes in
install/kubernetes/cilium/values.yaml.tmpl:
f0260c8#diff-aa0cf5fa6e9857c22d4fa74fb53d84cd302757b5ed8889c1111c32e0525230e2
]

The default configuration for hubble dynamic metrics references
the invalid "all" metric config and result in an error when dynamic
metrics is enabled with no other changes. Fix by setting
the config content to empty, mirroring the behaviour of static
metrics.

Additionally, the go templating for the configmap would not generate
a valid config file when the content is empty. Update the templating
to not render the metrics object when content is empty.

Signed-off-by: Alexandre Barone <abalexandrebarone@gmail.com>
Signed-off-by: gray <gray.liang@isovalent.com>
[ upstream commit 21a15af ]

[ backporter's note: fixed GOLANGCILINT_WANT_VERSION conflicts in Makefile.defs ]

Currently, the golangci-lint version detection is broken because the flag
`--format short` has been replaced with `--short` with v2.

This commit fixes this by using the new flag.

Context: golangci-lint version detection is used to decide whether `make lint`
should use the locally installed version or whether golangci-lint
should be executed with the expected version in a docker container.

Reported-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: gray <gray.liang@isovalent.com>
borkmann and others added 7 commits April 23, 2025 11:43
[ upstream commit 6d119aa ]

When BPFDistributedLRU is set, the kernel internally adjusts the map's
maximum entry size as follows:

	htab->map.max_entries = roundup(attr->max_entries,
					num_possible_cpus());
	if (htab->map.max_entries < attr->max_entries)
		htab->map.max_entries = rounddown(attr->max_entries,
						  num_possible_cpus());

The problem with this kernel logic is that the next time Cilium tries
to regenerate endpoints, it will find that the map properties mismatch
from Cilium's request, since the original max_entries from Cilium has
been rounded up internally. So we keep continuously rebuilding all CT
and NAT maps, leading to connectivity disruption. The only way to solve
it is to adjust the Cilium's max_entries request also as a rounded up
value, only then we keep matching the map properties.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit edc02e8 ]

The distributed lru setting makes sense with dynamic map sizing given
the kernel internally adjusts the map's max_entries as a rounded up
value to a multiple of num_possible_cpus().

With statically configured map sizing it can happen that the exact
sizing "breaks" in case of heterogenious nodes (w/ different # of CPUs)
in the cluster if the max CT/NAT map sizes were fixed to a specific
value.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit b2a8d51 ]

Add a test case with distributed LRU and dynamic map sizing to make sure
we get CPU-aligned map sizing.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 31e7623 ]

[ backporter's note: fixed conflicts in CODEOWNER ]

Move the implementation into its own util package for reuse, so we do not
have to open-code it each time.

Suggested-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: gray <gray.liang@isovalent.com>
[ upstream commit 58b3573 ]

Add a note that it only works on combination with dynamic map sizing.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit f7e076e ]

The ethertype needs to be stored in big-endian layout.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 6c884f1 ]

Replace inconsistent variable names like EIP_ALLOCATION_ID_2 with more
descriptive and consistent names like Cluster_1_EIP_2 in the EKS-to-EKS
Clustermesh Preparation documentation to improve clarity and maintain
consistent naming convention.

Signed-off-by: zzuckerfrei <seonmin1219@gmail.com>
Signed-off-by: gray <greyschwinger@gmail.com>
@jschwinger233 jschwinger233 force-pushed the pr/v1.17-backport-2025-04-22-03-07 branch from 041a927 to d61226b Compare April 23, 2025 03:43
@jschwinger233
Copy link
Member Auth 8000 or

/test

@jschwinger233 jschwinger233 marked this pull request as ready for review April 23, 2025 06:47
@jschwinger233 jschwinger233 requested a review from a team as a code owner April 23, 2025 06:47
Copy link
Member
@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 💯

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 24, 2025
@julianwiedmann julianwiedmann added this pull request to the merge queue Apr 24, 2025
Merged via the queue into v1.17 with commit 86170b4 Apr 24, 2025
355 of 356 checks passed
@julianwiedmann julianwiedmann deleted the pr/v1.17-backport-2025-04-22-03-07 branch April 24, 2025 04:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.17 This PR represents a backport for Cilium 1.17.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0