-
Notifications
You must be signed in to change notification settings - Fork 3.2k
v1.17 Backports 2025-04-22 #39075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.17 Backports 2025-04-22 #39075
Conversation
2317391
to
0f4a053
Compare
0f4a053
to
f11254a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@gray Could you please apply this patch to adapt the ipset new test? Thanks! |
@pippolo84 Thank you! Exactly what I need 💌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
24595fd
to
041a927
Compare
[ upstream commit 5b22071 ] [ backporter's note: applied #38949 (comment) ] The new test exercises the reconciler when a batched operation fails and a retry is attempted. When retrying, the cilium/statedb reconciler calls the sequential operations (e.g: `Update` instead of `UpdateBatch`) for each item in the failed batch. Since the sequential operations are not expected to be called in the current ipset implementation, the reconciler panics: === RUN TestOpsRetry panic: Unexpectedly Update() called for reconciliation goroutine 53 [running]: github.com/cilium/cilium/pkg/datapath/iptables/ipset.(*ops).Update(0xc000929000?, {0x7f2a50ec86d8?, 0x7f2a97b32a78?}, {0x70?, 0xc000780008?}, 0xc0006508c0?) /home/pippolo/cilium/cilium/pkg/datapath/iptables/ipset/ops.go:77 +0x25 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processSingle(0x37926c0, 0xc000929000, 0x2, 0x0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:214 +0x170 github.com/cilium/statedb/reconciler.(*incrementalRound[...]).processRetries(0x37926c0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:190 +0x58 github.com/cilium/statedb/reconciler.(*reconciler[...]).incremental(0x3778660, {0x37469f0, 0xc000952ab0}, {0x371ddb0, 0xc0006443c0}, 0xc0006303a0) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/incremental.go:73 +0x2de github.com/cilium/statedb/reconciler.(*reconciler[...]).reconcileLoop(0x3778660, {0x37469f0, 0xc000952ab0}, {0x374b780, 0xc000988000}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/statedb/reconciler/reconciler.go:92 +0x568 github.com/cilium/hive/job.(*jobOneShot).start(0xc00093cde0, {0x37469f0, 0xc000952ab0}, 0x0?, {0x374b780, 0xc00093cae0}, {{{0x0, 0x0, 0x0}}, 0xc0003493e0, ...}) /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/oneshot.go:138 +0x4fd created by github.com/cilium/hive/job.(*group).Start.func1 in goroutine 50 /home/pippolo/cilium/cilium/vendor/github.com/cilium/hive/job/job.go:167 +0x172 FAIL github.com/cilium/cilium/pkg/datapath/iptables/ipset 0.233s FAIL This will be fixed in a subsequent commit. Related: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com> s
[ upstream commit 0491518 ] When a cilium/statedb reconciler relies on batched operations and a failure happens, the subsequent retry calls the non-batched operations (e.g: `Update` instead of `UpdateBatch`). Therefore, to avoid a panic in case of a retry, we need to define the non-batched operations too for the ipset reconciler. Fixes: 7b80f94 ("iptables/ipset: Use batch operations") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 32adef7 ] Previously, if this job failed due to the host IP not being available, we would return an error but never unlock the endpoint. This would leave the system deadlocked and would be unrecoverable, even if a valid host IP was later added. Fixes #37791 Signed-off-by: Emily Shepherd <emily@redcoat.dev> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 464dacf ] The `hubble-metrics` flag changed from being comma-separated to being space-separated in #36371, this PR updates the metrics documentation page to reflect that change. Signed-off-by: Hadrien Patte <hadrien.patte@datadoghq.com> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 2e101d4 ] The layer-7 proxies do not respect EnableDefaultDeny, since L7 policy "overrides" L3, and EnableDefaultDeny is L3-only. Call this out in the documentation. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit d292ef7 ] The fix from a83ddf0 moved the conditional for setting the dynamic metrics config path under the same if as for static metrics. The thing is, we can't have both static and dynamic metrics enabled at the same time otherwise hubble will return an error. To fox, move the dynamic metrics config path flag out of the if for static metrics. Do the same for enabling openmetrics as this is scoped to the server and not static metrics. Signed-off-by: Alexandre Barone <abalexandrebarone@gmail.com> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 814bb37 ] [ backporter's note: ditched "networkPolicyCorrelation" changes in install/kubernetes/cilium/values.yaml.tmpl: f0260c8#diff-aa0cf5fa6e9857c22d4fa74fb53d84cd302757b5ed8889c1111c32e0525230e2 ] The default configuration for hubble dynamic metrics references the invalid "all" metric config and result in an error when dynamic metrics is enabled with no other changes. Fix by setting the config content to empty, mirroring the behaviour of static metrics. Additionally, the go templating for the configmap would not generate a valid config file when the content is empty. Update the templating to not render the metrics object when content is empty. Signed-off-by: Alexandre Barone <abalexandrebarone@gmail.com> Signed-off-by: gray <gray.liang@isovalent.com>
[ upstream commit 21a15af ] [ backporter's note: fixed GOLANGCILINT_WANT_VERSION conflicts in Makefile.defs ] Currently, the golangci-lint version detection is broken because the flag `--format short` has been replaced with `--short` with v2. This commit fixes this by using the new flag. Context: golangci-lint version detection is used to decide whether `make lint` should use the locally installed version or whether golangci-lint should be executed with the expected version in a docker container. Reported-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: gray <gray.liang@isovalent.com>
[ upstream commit 6d119aa ] When BPFDistributedLRU is set, the kernel internally adjusts the map's maximum entry size as follows: htab->map.max_entries = roundup(attr->max_entries, num_possible_cpus()); if (htab->map.max_entries < attr->max_entries) htab->map.max_entries = rounddown(attr->max_entries, num_possible_cpus()); The problem with this kernel logic is that the next time Cilium tries to regenerate endpoints, it will find that the map properties mismatch from Cilium's request, since the original max_entries from Cilium has been rounded up internally. So we keep continuously rebuilding all CT and NAT maps, leading to connectivity disruption. The only way to solve it is to adjust the Cilium's max_entries request also as a rounded up value, only then we keep matching the map properties. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit edc02e8 ] The distributed lru setting makes sense with dynamic map sizing given the kernel internally adjusts the map's max_entries as a rounded up value to a multiple of num_possible_cpus(). With statically configured map sizing it can happen that the exact sizing "breaks" in case of heterogenious nodes (w/ different # of CPUs) in the cluster if the max CT/NAT map sizes were fixed to a specific value. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit b2a8d51 ] Add a test case with distributed LRU and dynamic map sizing to make sure we get CPU-aligned map sizing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 31e7623 ] [ backporter's note: fixed conflicts in CODEOWNER ] Move the implementation into its own util package for reuse, so we do not have to open-code it each time. Suggested-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: gray <gray.liang@isovalent.com>
[ upstream commit 58b3573 ] Add a note that it only works on combination with dynamic map sizing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit f7e076e ] The ethertype needs to be stored in big-endian layout. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: gray <greyschwinger@gmail.com>
[ upstream commit 6c884f1 ] Replace inconsistent variable names like EIP_ALLOCATION_ID_2 with more descriptive and consistent names like Cluster_1_EIP_2 in the EKS-to-EKS Clustermesh Preparation documentation to improve clarity and maintain consistent naming convention. Signed-off-by: zzuckerfrei <seonmin1219@gmail.com> Signed-off-by: gray <greyschwinger@gmail.com>
041a927
to
d61226b
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! 💯
hubble-metrics
flag documentation #38960 (@HadrienPatte)Once this PR is merged, a GitHub action will update the labels of these PRs: