fix pod-to-pod MTU drop when both in+egress proxy and IPSec enabled #35173
833e7a2 to 3f1fffa
/test
3f1fffa to a7d6165
/test
a7d6165 to db8c141
db8c141 to 3b98e89
/ci-ipsec-e2e
/ci-ipsec-e2e
Re https://github.com/cilium/cilium/actions/runs/11240829491/job/31251421453:
I found that the leaked packets are all TCP resets:
Error: bpftrace output is not empty
[18:14:44:129639] fd00:10:244:1::f085:34820 -> fd00:10:244:3::82cc:8080 (proto: 6, TCP flags: ...R, encap: 1, ifindex: 10, netns: f0000098, override: 0)
(Love this tcp flag information, love Marco!)
So it's probably caused by an ipcache race when pods are removed while packets are still lingering around: there may be a small window in which the ipcache entry for a lingering packet is already gone. We saw a similar issue before, but I forgot how we worked around it...
Interesting, the failed jobs all use the vxlan tunnel.
Maybe that's how CI warned us? But the geneve tunnel was green... Can you check the v6 route table? What's the MTU there, and does it look good to you?
All failing configs use VXLAN. The tests with Geneve pass.
fd00:10:244:1::f085:34820 -> fd00:10:244:3::82cc:8080 (proto: 6, TCP flags: ...R, encap: 1, ifindex: 10, netns: f0000098, override: 0)
Table 200
Default table to use for IPSec routing rules (0xe00, 0xd00). MTU set to 1450: 1500 - 50 (tunnel overhead)
# IPv4
local 10.244.1.0/24 dev cilium_vxlan proto kernel scope host
10.244.0.0/24 dev cilium_host proto kernel mtu 1450
10.244.3.0/24 dev cilium_host proto kernel mtu 1450
# IPv6
local fd00:10:244:1::/64 dev cilium_vxlan proto kernel metric 1024 pref medium
fd00:10:244::/64 dev cilium_host proto kernel metric 1024 mtu 1450 pref medium
fd00:10:244:3::/64 dev cilium_host proto kernel metric 1024 mtu 1450 pref medium
Table 2004
Default table to use routing rules to the proxy.
# IPv4
local default dev lo proto kernel scope host
# IPv6
local default dev lo proto kernel metric 1024 pref medium
Default
MTU set to 1373: 1500 - 50 (tunnel) - 77 (IPSec)
# IPv4
default via 172.18.0.1 dev eth0
10.244.0.0/24 via 10.244.1.124 dev cilium_host proto kernel src 10.244.1.124 mtu 1373
10.244.1.0/24 via 10.244.1.124 dev cilium_host proto kernel src 10.244.1.124
10.244.1.124 dev cilium_host proto kernel scope link
10.244.3.0/24 via 10.244.1.124 dev cilium_host proto kernel src 10.244.1.124 mtu 1373
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.4
192.168.0.0/16 dev eth1 proto kernel scope link src 192.168.0.2
# IPv6
fc00:c111::/64 dev eth0 proto kernel metric 256 pref medium
fc00:c112::/64 dev eth1 proto kernel metric 256 pref medium
fd00:10:244::/64 dev cilium_host proto kernel src fd00:10:244:1::5c12 metric 1024 mtu 1373 pref medium
fd00:10:244:1::5c12 dev cilium_host proto kernel metric 256 pref medium
fd00:10:244:1::/64 dev cilium_host proto kernel src fd00:10:244:1::5c12 metric 1024 pref medium
fd00:10:244:3::/64 dev cilium_host proto kernel src fd00:10:244:1::5c12 metric 1024 mtu 1373 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium
fe80::/64 dev cilium_net proto kernel metric 256 pref medium
fe80::/64 dev cilium_vxlan proto kernel metric 256 pref medium
fe80::/64 dev lxc_health proto kernel metric 256 pref medium
fe80::/64 dev lxc61a1aee0bc58 proto kernel metric 256 pref medium
fe80::/64 dev lxc2a78580b2efb proto kernel metric 256 pref medium
fe80::/64 dev lxc408d98fc6df1 proto kernel metric 256 pref medium
default via fc00:c111::1 dev eth0 metric 1024 pref medium
I'm not seeing any particular differences in the routing table compared with Geneve.
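For comparing dumps like the one above between VXLAN and Geneve runs, it can help to pull the per-route MTU out mechanically. The following is only a minimal sketch: `routeMTU` is a hypothetical helper, not part of Cilium, and it assumes the plain-text `ip route show` format quoted in this thread, where an explicit MTU appears as the keyword `mtu` followed by a number.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// routeMTU is a hypothetical helper (not Cilium code). It returns the value
// following the "mtu" keyword in one line of `ip route show` output, and
// false when the route carries no explicit MTU.
func routeMTU(line string) (int, bool) {
	fields := strings.Fields(line)
	for i := 0; i+1 < len(fields); i++ {
		if fields[i] == "mtu" {
			n, err := strconv.Atoi(fields[i+1])
			if err != nil {
				return 0, false
			}
			return n, true
		}
	}
	return 0, false
}

func main() {
	// Sample lines copied from the dump in this thread.
	lines := []string{
		"10.244.3.0/24 dev cilium_host proto kernel mtu 1450",
		"10.244.3.0/24 via 10.244.1.124 dev cilium_host proto kernel src 10.244.1.124 mtu 1373",
		"fd00:10:244:3::/64 dev cilium_host proto kernel metric 1024 mtu 1373 pref medium",
		"10.244.1.0/24 via 10.244.1.124 dev cilium_host proto kernel src 10.244.1.124",
	}
	for _, l := range lines {
		if mtu, ok := routeMTU(l); ok {
			fmt.Printf("mtu=%d  %s\n", mtu, l)
		} else {
			fmt.Printf("mtu=-     %s\n", l)
		}
	}
}
```

Routes without an explicit `mtu` attribute (such as the local pod CIDR above) fall back to the device MTU, so the helper reports them separately rather than guessing a value.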
@smagnani96 word-smithing the release note a bit, I'd suggest
or slightly rephrased
Striking the test change, as it's not user-relevant (and this note also gets copied into the release note for stable releases when backporting)
This commit enables the pod-to-pod-with-l7-policy-encryption for IPSec in IPv6. In cilium#35173 we fixed the MTU issue and enabled this test for IPv4. Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
I think this is also worth backporting to v1.15. Adding the label, please shout if you disagree.
MTU+IPsec+ingress L7+egress L7 doesn't really scream major bugfix to me 😅
The MTU aspect isn't something that users would actively (mis-)configure though, no? It's just what's causing the problem. Imho we shouldn't straight-out break connectivity like this, in particular when users have no easy work-around (disable IPsec or L7 policies, pick your security poison?).
This commit enables the `pod-to-pod-with-l7-policy-encryption` cli connectivity test from v1.15, after the successful backports of cilium#35173 in:
* v1.15: cilium#35586
* v1.16: cilium#35543

While enabling the test, in this commit we split the version check logic (that is independent from the IP family used) from the check for running IPv6+IPsec (that should be prevented due to a current limitation of having a flaky plain-text packet in the test suite, tracked in cilium#35485).

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This commit enables the `pod-to-pod-with-l7-policy-encryption` cli connectivity test for v1.15 and v1.16, after the backports of cilium#35173 in:
* v1.15: cilium#35586
* v1.16: cilium#35543

While enabling the test, in this commit we split the version check logic (that is independent from the IP family used) from the check for running IPv6+IPsec (that should be prevented due to a current limitation of having a flaky plain-text packet in the test suite, tracked in cilium#35485).

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
This PR fixes pod-to-pod traffic being dropped because a too-high MTU value is used. The cause is that the from-proxy routes, in routing table 2005, are not configured with the reduced MTU. The pod-to-pod routes are adjusted for the IPSec overhead and the size of the authentication key, but the from-proxy routes are left unchanged.
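To make the mismatch concrete, here is a rough sketch of the arithmetic, using only the numbers from the route dump in the conversation (device MTU 1500, 50 bytes of tunnel overhead, 77 bytes of IPSec overhead in that CI run). The constant and function names are illustrative and are not Cilium identifiers. The IPSec overhead depends on the key configuration, so 77 should be read as this run's value, not a fixed constant.

```go
package main

import "fmt"

// Values taken from the route dump in this thread. The IPSec overhead
// depends on the authentication key size, so 77 is specific to that run.
const (
	deviceMTU      = 1500
	tunnelOverhead = 50
	ipsecOverhead  = 77
)

// tunnelRouteMTU is the MTU of a route that accounts for encapsulation only.
func tunnelRouteMTU() int { return deviceMTU - tunnelOverhead }

// encryptedRouteMTU is the MTU of a route that also accounts for IPSec.
func encryptedRouteMTU() int { return deviceMTU - tunnelOverhead - ipsecOverhead }

// fits reports whether a packet of packetLen bytes fits within mtu.
func fits(packetLen, mtu int) bool { return packetLen <= mtu }

func main() {
	unadjusted := tunnelRouteMTU()  // 1450
	adjusted := encryptedRouteMTU() // 1373

	// A packet sized to a route that ignores the IPSec overhead is too
	// large for the encrypted pod-to-pod path.
	fmt.Println(unadjusted, adjusted, fits(unadjusted, adjusted))
	// prints: 1450 1373 false
}
```

Any packet that leaves the proxy sized above 1373 bytes cannot traverse the encrypted path in this setup, which is consistent with the drops described above.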
This also enables the `pod-to-pod-with-l7-policy-encryption` test for IPSec in IPv4. Such test is therefore skipped if/for:
Fixes: #33168