v1.11 backports 2022-05-02 #19671
Conversation
[ upstream commit 4c2c244 ]

Use strings.Builder instead of fmt.Sprintf() and preallocate the size of the string so that Go doesn't need to over-allocate if the string ends up longer than what the buffer growth algorithm predicts.

Results:

```
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > old.txt
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > new.txt
$ benchcmp old.txt new.txt
benchmark                        old ns/op     new ns/op     delta
BenchmarkFQDNSelectorString-8    690           180           -73.97%

benchmark                        old allocs     new allocs     delta
BenchmarkFQDNSelectorString-8    9              4              -55.56%

benchmark                        old bytes     new bytes     delta
BenchmarkFQDNSelectorString-8    288           208           -27.78%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
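The technique looks roughly like the following sketch. The selector fields and output format are simplified assumptions here, not the exact upstream code:

```go
package policyapi

import "strings"

// fqdnSelector is a simplified stand-in for the real api.FQDNSelector.
type fqdnSelector struct {
	MatchName    string
	MatchPattern string
}

// String builds the representation with a preallocated strings.Builder
// instead of fmt.Sprintf, avoiding intermediate allocations.
func (s fqdnSelector) String() string {
	const prefix = "MatchName: "
	const middle = ", MatchPattern: "
	var sb strings.Builder
	// Grow once to the exact final length so the builder never reallocates.
	sb.Grow(len(prefix) + len(s.MatchName) + len(middle) + len(s.MatchPattern))
	sb.WriteString(prefix)
	sb.WriteString(s.MatchName)
	sb.WriteString(middle)
	sb.WriteString(s.MatchPattern)
	return sb.String()
}
```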
[ upstream commit 1db91ca ]

The documentation for setting up NodePort XDP acceleration on AWS mentions that the MTU for the ena interface must be lowered so that XDP can work. This is indeed necessary, but the value provided as the maximum possible MTU is outdated and no longer works.

After installing the latest kernel through the RPM package kernel-ng (as prescribed in the documentation), the EKS nodes currently end up with Linux 5.10:

```
$ uname -r
5.10.106-102.504.amzn2.x86_64
```

If we keep following the docs and lower the MTU to 3818, the Cilium pods fail to get ready, and their logs report that the XDP program cannot be set due to the MTU. The dmesg output on the nodes confirms this:

```
[ 3617.059219] ena 0000:00:05.0 eth0: Failed to set xdp program, the current MTU (3818) is larger than the maximum allowed MTU (3498) while xdp is on
```

The value 3818 comes from the legacy definition of ENA_XDP_MAX_MTU in drivers/net/ethernet/amazon/ena/ena_netdev.h, which used to be defined as such:

```
#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
                         VLAN_HLEN - XDP_PACKET_HEADROOM)
```

where ETH_HLEN is 14, ETH_FCS_LEN and VLAN_HLEN are both 4, and XDP_PACKET_HEADROOM is 256. But after Linux commit 08fc1cfd2d25 ("ena: Add XDP frame size to amazon NIC driver"), from Linux 5.8, the definition changed to:

```
#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
                         VLAN_HLEN - XDP_PACKET_HEADROOM - \
                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
```

As a result, the maximum MTU for kernels 5.8+ is 3498 bytes. This is indeed the maximum value that I could use when setting up XDP on an EKS cluster. Let's update the documentation accordingly.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
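The arithmetic can be checked directly. Each constant below is either stated in the commit message or derived from its figures (ENA_PAGE_SIZE = 4096 and the 320-byte aligned skb_shared_info size both follow from the 3818 and 3498 values); this is a verification sketch, not driver code:

```go
package main

import "fmt"

func main() {
	const (
		enaPageSize       = 4096 // ENA_PAGE_SIZE; follows from 3818 + 14 + 4 + 4 + 256
		ethHlen           = 14   // ETH_HLEN
		ethFcsLen         = 4    // ETH_FCS_LEN
		vlanHlen          = 4    // VLAN_HLEN
		xdpPacketHeadroom = 256  // XDP_PACKET_HEADROOM
		// SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); implied by the
		// 3818 - 3498 delta in the driver's error message.
		sharedInfoAligned = 320
	)

	legacyMax := enaPageSize - ethHlen - ethFcsLen - vlanHlen - xdpPacketHeadroom
	currentMax := legacyMax - sharedInfoAligned
	fmt.Println(legacyMax, currentMax) // prints: 3818 3498
}
```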
[ upstream commit 351c5d8 ]

SortedList() and FormatForKVStore() can be very hot code paths in environments with constant policy churn, especially with CIDR policies, where there can be a large number of CIDR labels. This commit adds benchmarks for later commits to use as a baseline.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
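A baseline benchmark of this shape might look like the sketch below. It assumes the Cilium repository context; the label values are illustrative, not the actual test fixture:

```go
package labels_test

import (
	"testing"

	"github.com/cilium/cilium/pkg/labels"
)

// BenchmarkLabels_SortedList exercises the hot path hit during policy
// churn; b.ReportAllocs() records allocations per operation alongside
// the timing that -benchmem reports.
func BenchmarkLabels_SortedList(b *testing.B) {
	lbls := labels.NewLabelsFromModel([]string{
		"k8s:app=frontend",
		"cidr:10.0.0.0/8",
		"cidr:10.1.0.0/16",
	})
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = lbls.SortedList()
	}
}
```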
[ upstream commit 0790e07 ]

FormatForKVStore() previously returned a string for no reason, as every caller converted the return value to a byte slice. Returning a byte slice instead allows us to eliminate string concatenation entirely and use the bytes.Buffer directly.

Building on the above, given that SortedList() returns a byte slice and calls FormatForKVStore() for its output, we can optimize it with the same technique to eliminate string concatenation.

Here are the benchmark comparisons:

```
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > old.txt
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > new.txt
$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                            old ns/op     new ns/op     delta
BenchmarkLabels_SortedList-8         2612          1120          -57.12%
BenchmarkLabel_FormatForKVStore-8    262           54.5          -79.18%

benchmark                            old allocs     new allocs     delta
BenchmarkLabels_SortedList-8         35             13             -62.86%
BenchmarkLabel_FormatForKVStore-8    4              1              -75.00%

benchmark                            old bytes     new bytes     delta
BenchmarkLabels_SortedList-8         1112          664           -40.29%
BenchmarkLabel_FormatForKVStore-8    96            48            -50.00%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
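The shape of the change is sketched below. The label fields and the exact serialization format are simplified assumptions, not the real pkg/labels code:

```go
package labels

import "bytes"

// label is a simplified stand-in for the real labels.Label type.
type label struct {
	Source, Key, Value string
}

// formatForKVStoreInto writes the label's kvstore representation
// directly into buf, removing the string return value (and the
// byte-slice conversion it forced on every caller).
func (l label) formatForKVStoreInto(buf *bytes.Buffer) {
	buf.WriteString(l.Source)
	buf.WriteByte(':')
	buf.WriteString(l.Key)
	buf.WriteByte('=')
	buf.WriteString(l.Value)
	buf.WriteByte(';')
}

// sortedListFrom serializes an already-sorted label slice through one
// shared buffer, so the whole list needs a single allocation.
func sortedListFrom(sorted []label) []byte {
	var buf bytes.Buffer
	for _, l := range sorted {
		l.formatForKVStoreInto(&buf)
	}
	return buf.Bytes()
}
```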
[ upstream commit ad4a900 ]

The XDP-based NodePort acceleration currently can't be used in combination with EgressGW. The underlying reason is that bpf_xdp has no support for tunnel encapsulation, so it can't redirect the EgressGW's return traffic. The EgressGW processing is done in nodeport_lb4() and rev_nodeport_lb4() from lib/nodeport.h (tech debt).

Improve this situation by skipping the tunnel-forward in rev_nodeport_lb4() when called from XDP context. Thus return traffic for the EgressGW gets reverse-SNATed inside XDP, and is then passed up to bpf_host with XFER_PKT_NO_SVC. It bypasses the NodePort processing and enters the kernel stack, where it gets forwarded to the tunnel netdev due to an IP route in the host netns (i.e., "pod CIDR on node X via cilium_vxlan"). This lets users run with NodePort acceleration and EgressGW enabled at the same time.

Note that the approach of letting the kernel forward the traffic into the tunnel can fail if iptables' default policy for the FORWARD chain is DROP. Long-term we want to handle this cleanly with tunneling support inside bpf_xdp [1].

[1]: cilium#17770

The commit message was suggested by Julian Wiedmann.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 769ae73 ]

According to some examples found on Google, the path pointed to by html_extra_path should contain the direct path to the robots.txt file.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 6f9fedd ]

As the operator uses a different constructor for the GC of the allocator, this constructor was not being initialized with the min and max values of the identities that it should GC. This commit adds those options, making it possible for the operator to GC those identities.

Fixes: 0c0f965 ("operator: only GC identity keys of its own cluster")

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
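Conceptually, the fix threads the cluster's identity range into the GC constructor via functional options. The names below are illustrative, not Cilium's exact allocator API:

```go
package allocator

// idRange bounds the identities a GC pass may consider, so the
// operator only collects keys from its own cluster's range.
type idRange struct {
	min, max uint64
}

// gcOption follows the functional-options pattern used to configure
// the allocator's garbage collector.
type gcOption func(*idRange)

func withMin(id uint64) gcOption { return func(r *idRange) { r.min = id } }
func withMax(id uint64) gcOption { return func(r *idRange) { r.max = id } }

// newGC builds a collector; without options it would consider every
// identity, which is the bug the commit fixes.
func newGC(opts ...gcOption) *idRange {
	r := &idRange{min: 0, max: ^uint64(0)}
	for _, o := range opts {
		o(r)
	}
	return r
}

// inScope reports whether an identity falls inside [min, max].
func (r *idRange) inScope(id uint64) bool {
	return id >= r.min && id <= r.max
}
```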
Force-pushed from fa8ba6b to 86b3deb.
Force-pushed from 86b3deb to df87b12.
[ upstream commit 6154322 ]

Commit cilium#16861 introduced a normalization of error handling in the daemon/cmd package. In doing so, we swallowed useful error logs. This commit adds the error logs back and adds a few additional fmt.Errorf wrappers where logging alone is not adequate.

Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
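The pattern is to log at the failure site while still wrapping the error for the caller. A minimal sketch with hypothetical function names:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// initDatapath stands in for a daemon/cmd initialization step.
func initDatapath() error {
	return errors.New("device eth0 not found")
}

func start() error {
	if err := initDatapath(); err != nil {
		// Log at the point of failure so the context isn't swallowed...
		log.Printf("datapath initialization failed: %v", err)
		// ...and wrap with %w so callers can still match the cause
		// with errors.Is / errors.As.
		return fmt.Errorf("while initializing datapath: %w", err)
	}
	return nil
}

func main() {
	if err := start(); err != nil {
		log.Fatal(err)
	}
}
```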
[ upstream commit 5fa7ae2 ]

Advanced users can configure the LRU size for the cache holding the compiled regex expressions of FQDN match{Pattern,Name}. This is useful for users experiencing high memory usage spikes with many FQDN policies that repeat the same matchPattern or matchName across many different policies.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
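The caching idea looks roughly like this sketch. The hashicorp/golang-lru dependency and the function names are assumptions, not Cilium's exact implementation:

```go
package fqdn

import (
	"regexp"

	lru "github.com/hashicorp/golang-lru"
)

// regexCache memoizes compiled matchName/matchPattern expressions so
// identical patterns across many policies share one compiled regex.
type regexCache struct {
	cache *lru.Cache
}

// newRegexCache builds the cache; size is the user-configurable knob.
func newRegexCache(size int) (*regexCache, error) {
	c, err := lru.New(size)
	if err != nil {
		return nil, err
	}
	return &regexCache{cache: c}, nil
}

// compile returns a cached regex when available, compiling and caching
// it otherwise. Least-recently-used entries are evicted at capacity.
func (r *regexCache) compile(pattern string) (*regexp.Regexp, error) {
	if re, ok := r.cache.Get(pattern); ok {
		return re.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	r.cache.Add(pattern, re)
	return re, nil
}
```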
Force-pushed from df87b12 to dce32b6.
/test-backport-1.11

Job 'Cilium-PR-Runtime-net-next' failed. If it is a flake and a GitHub issue doesn't already exist to track it, comment.

gke, external-workloads and multicluster failed with the same error.
My commits are looking good, thanks!
My changes LGTM, thanks.
/ci-gke-1.11
/ci-external-workloads-v1.11
/ci-multicluster-1.11
/test-1.23-4.9
/test-1.21-5.4
/test-1.20-4.9
/ci-external-workloads-v1.11
/test-runtime
👍
[ upstream commit 38c0036 ]

When comparing the efficiency of increasing the LRU size from 128 to 1024 with ~22k CNPs, we see the following results:

```
# LRU size 128.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > old.txt

# LRU size 1024.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > new.txt

$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                         old ns/op      new ns/op      delta
Benchmark_perEPAllow_setPortRulesForID_large-8    3954101340     3010934555     -23.85%

benchmark                                         old allocs     new allocs     delta
Benchmark_perEPAllow_setPortRulesForID_large-8    26480632       24167742       -8.73%

benchmark                                         old bytes      new bytes      delta
Benchmark_perEPAllow_setPortRulesForID_large-8    2899811832     1824062992     -37.10%
```

Here's the raw test run with LRU size at 128:

```
Before (N=1)
Alloc = 31 MiB   HeapInuse = 45 MiB   Sys = 1260 MiB   NumGC = 15
After (N=1)
Alloc = 445 MiB  HeapInuse = 459 MiB  Sys = 1260 MiB   NumGC = 40
```

Here's the raw test run with LRU size at 1024:

```
Before (N=1)
Alloc = 31 MiB   HeapInuse = 48 MiB   Sys = 1177 MiB   NumGC = 17
After (N=1)
Alloc = 78 MiB   HeapInuse = 93 MiB   Sys = 1177 MiB   NumGC = 53
```

We can see that it saves ~300 MB. Furthermore, if we compare the memory profiles from the benchmark runs via

```
go tool pprof -http :8080 -diff_base memprofile.out memprofile.1024.out
```

we see an ~800 MB reduction in the regex compilation.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit ce9583d ]

Following the previous commit's benchmark results, let's update the default LRU size to 1024, given that it only results in a few tens of MBs of increase when the cache nears full.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
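Continuing the earlier cache sketch, the change itself is small; the constant name below is illustrative, not Cilium's actual identifier:

```go
package fqdn

// defaultRegexCompileLRUSize reflects the new default discussed above:
// 1024 entries costs only a few tens of MiB when the cache nears full,
// while avoiding repeated recompilation of identical match patterns.
const defaultRegexCompileLRUSize = 1024

// newDefaultRegexCache wires the default into the cache sketched in
// the earlier example.
func newDefaultRegexCache() (*regexCache, error) {
	return newRegexCache(defaultRegexCompileLRUSize)
}
```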
Force-pushed from dce32b6 to f56ddc6.
/test-runtime
All tests have passed, except the runtime test. Missed an import (the change is in the latest push). /cc @christarazi
Runtime test has passed with the small change.
Resolved conflicts (please check)
Once this PR is merged, you can update the PR labels via: