v1.11 backports 2022-05-02 #19671
Conversation
[ upstream commit 4c2c244 ]

Use strings.Builder instead of fmt.Sprintf() and preallocate the size of the string so that Go doesn't need to over-allocate if the string ends up longer than what the buffer growth algorithm predicts.

Results:

```
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > old.txt
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > new.txt
$ benchcmp old.txt new.txt
benchmark                        old ns/op     new ns/op     delta
BenchmarkFQDNSelectorString-8    690           180           -73.97%

benchmark                        old allocs     new allocs     delta
BenchmarkFQDNSelectorString-8    9              4              -55.56%

benchmark                        old bytes     new bytes     delta
BenchmarkFQDNSelectorString-8    288           208           -27.78%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
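The technique looks roughly like the following sketch. The selector fields and output format are simplified assumptions here, not the exact upstream code:

```go
package policyapi

import "strings"

// fqdnSelector is a simplified stand-in for the real api.FQDNSelector.
type fqdnSelector struct {
	MatchName    string
	MatchPattern string
}

// String builds the representation with a preallocated strings.Builder
// instead of fmt.Sprintf, avoiding intermediate allocations.
func (s fqdnSelector) String() string {
	const prefix = "MatchName: "
	const middle = ", MatchPattern: "
	var sb strings.Builder
	// Grow once to the exact final length so the builder never reallocates.
	sb.Grow(len(prefix) + len(s.MatchName) + len(middle) + len(s.MatchPattern))
	sb.WriteString(prefix)
	sb.WriteString(s.MatchName)
	sb.WriteString(middle)
	sb.WriteString(s.MatchPattern)
	return sb.String()
}
```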
[ upstream commit 1db91ca ]

The documentation for setting up NodePort XDP acceleration on AWS mentions that the MTU for the ena interface must be lowered so that XDP can work. This is indeed necessary, but the value provided as the maximum possible MTU is outdated and no longer works.

After installing the latest kernel through the RPM package kernel-ng (as prescribed in the documentation), the EKS nodes currently end up with Linux 5.10:

```
$ uname -r
5.10.106-102.504.amzn2.x86_64
```

If we keep following the docs and lower the MTU to 3818, the Cilium pods fail to get ready, and their logs report that the XDP program cannot be set due to the MTU. The dmesg output on the nodes confirms this:

```
[ 3617.059219] ena 0000:00:05.0 eth0: Failed to set xdp program, the current MTU (3818) is larger than the maximum allowed MTU (3498) while xdp is on
```

The value 3818 comes from the legacy definition of ENA_XDP_MAX_MTU in drivers/net/ethernet/amazon/ena/ena_netdev.h, which used to be defined as such:

```
#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
                         VLAN_HLEN - XDP_PACKET_HEADROOM)
```

where ETH_HLEN is 14, ETH_FCS_LEN and VLAN_HLEN are both 4, and XDP_PACKET_HEADROOM is 256. But after Linux commit 08fc1cfd2d25 ("ena: Add XDP frame size to amazon NIC driver"), from Linux 5.8, the definition changed to:

```
#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
                         VLAN_HLEN - XDP_PACKET_HEADROOM - \
                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
```

As a result, the maximum MTU for kernels 5.8+ is 3498 bytes. This is indeed the maximum value that I could use when setting up XDP on an EKS cluster. Let's update the documentation accordingly.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
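The arithmetic can be checked directly. Each constant below is either stated in the commit message or derived from its figures (ENA_PAGE_SIZE = 4096 and the 320-byte aligned skb_shared_info size both follow from the 3818 and 3498 values); this is a verification sketch, not driver code:

```go
package main

import "fmt"

func main() {
	const (
		enaPageSize       = 4096 // ENA_PAGE_SIZE; follows from 3818 + 14 + 4 + 4 + 256
		ethHlen           = 14   // ETH_HLEN
		ethFcsLen         = 4    // ETH_FCS_LEN
		vlanHlen          = 4    // VLAN_HLEN
		xdpPacketHeadroom = 256  // XDP_PACKET_HEADROOM
		// SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); implied by the
		// 3818 - 3498 delta in the driver's error message.
		sharedInfoAligned = 320
	)

	legacyMax := enaPageSize - ethHlen - ethFcsLen - vlanHlen - xdpPacketHeadroom
	currentMax := legacyMax - sharedInfoAligned
	fmt.Println(legacyMax, currentMax) // prints: 3818 3498
}
```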
[ upstream commit 351c5d8 ]

SortedList() and FormatForKVStore() can be very hot code paths in environments with constant policy churn, especially with CIDR policies, where there can be a large number of CIDR labels. This commit adds benchmarks for later commits to use as a baseline.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
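A baseline benchmark of this shape might look like the sketch below. It assumes the Cilium repository context; the label values are illustrative, not the actual test fixture:

```go
package labels_test

import (
	"testing"

	"github.com/cilium/cilium/pkg/labels"
)

// BenchmarkLabels_SortedList exercises the hot path hit during policy
// churn; b.ReportAllocs() records allocations per operation alongside
// the timing that -benchmem reports.
func BenchmarkLabels_SortedList(b *testing.B) {
	lbls := labels.NewLabelsFromModel([]string{
		"k8s:app=frontend",
		"cidr:10.0.0.0/8",
		"cidr:10.1.0.0/16",
	})
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = lbls.SortedList()
	}
}
```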
[ upstream commit 0790e07 ]

FormatForKVStore() previously returned a string for no reason, as every caller converted the return value to a byte slice. Returning a byte slice instead allows us to eliminate string concatenation entirely and use the bytes.Buffer directly.

Building on the above, given that SortedList() returns a byte slice and calls FormatForKVStore() for its output, we can optimize it with the same technique to eliminate string concatenation.

Here are the benchmark comparisons:

```
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > old.txt
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > new.txt
$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                            old ns/op     new ns/op     delta
BenchmarkLabels_SortedList-8         2612          1120          -57.12%
BenchmarkLabel_FormatForKVStore-8    262           54.5          -79.18%

benchmark                            old allocs     new allocs     delta
BenchmarkLabels_SortedList-8         35             13             -62.86%
BenchmarkLabel_FormatForKVStore-8    4              1              -75.00%

benchmark                            old bytes     new bytes     delta
BenchmarkLabels_SortedList-8         1112          664           -40.29%
BenchmarkLabel_FormatForKVStore-8    96            48            -50.00%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
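The shape of the change is sketched below. The label fields and the exact serialization format are simplified assumptions, not the real pkg/labels code:

```go
package labels

import "bytes"

// label is a simplified stand-in for the real labels.Label type.
type label struct {
	Source, Key, Value string
}

// formatForKVStoreInto writes the label's kvstore representation
// directly into buf, removing the string return value (and the
// byte-slice conversion it forced on every caller).
func (l label) formatForKVStoreInto(buf *bytes.Buffer) {
	buf.WriteString(l.Source)
	buf.WriteByte(':')
	buf.WriteString(l.Key)
	buf.WriteByte('=')
	buf.WriteString(l.Value)
	buf.WriteByte(';')
}

// sortedListFrom serializes an already-sorted label slice through one
// shared buffer, so the whole list needs a single allocation.
func sortedListFrom(sorted []label) []byte {
	var buf bytes.Buffer
	for _, l := range sorted {
		l.formatForKVStoreInto(&buf)
	}
	return buf.Bytes()
}
```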
[ upstream commit ad4a900 ]

The XDP-based NodePort acceleration currently can't be used in combination with EgressGW. The underlying reason is that bpf_xdp has no support for tunnel encapsulation, so it can't redirect the EgressGW's return traffic. The EgressGW processing is done in nodeport_lb4() and rev_nodeport_lb4() from lib/nodeport.h (tech debt).

Improve this situation by skipping the tunnel-forward in rev_nodeport_lb4() when called from XDP context. Thus return traffic for the EgressGW gets reverse-SNATed inside XDP, and is then passed up to bpf_host with XFER_PKT_NO_SVC. It bypasses the NodePort processing and enters the kernel stack, where it gets forwarded to the tunnel netdev due to an IP route in the host netns (i.e., "pod CIDR on node X via cilium_vxlan"). This lets users run with NodePort acceleration and EgressGW enabled at the same time.

Note that the approach of letting the kernel forward the traffic into the tunnel can fail if iptables' default policy for the FORWARD chain is DROP. Long-term we want to handle this cleanly with tunneling support inside bpf_xdp [1].

[1]: cilium#17770

The commit message was suggested by Julian Wiedmann.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 769ae73 ]

According to some examples found on Google, the path pointed to by html_extra_path should contain the direct path to the robots.txt file.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 6f9fedd ]

As the operator uses a different constructor for the GC of the allocator, this constructor was not being initialized with the min and max values of the identities that it should GC. This commit adds those options, making it possible for the operator to GC those identities.

Fixes: 0c0f965 ("operator: only GC identity keys of its own cluster")

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
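Conceptually, the fix threads the cluster's identity range into the GC constructor via functional options. The names below are illustrative, not Cilium's exact allocator API:

```go
package allocator

// idRange bounds the identities a GC pass may consider, so the
// operator only collects keys from its own cluster's range.
type idRange struct {
	min, max uint64
}

// gcOption follows the functional-options pattern used to configure
// the allocator's garbage collector.
type gcOption func(*idRange)

func withMin(id uint64) gcOption { return func(r *idRange) { r.min = id } }
func withMax(id uint64) gcOption { return func(r *idRange) { r.max = id } }

// newGC builds a collector; without options it would consider every
// identity, which is the bug the commit fixes.
func newGC(opts ...gcOption) *idRange {
	r := &idRange{min: 0, max: ^uint64(0)}
	for _, o := range opts {
		o(r)
	}
	return r
}

// inScope reports whether an identity falls inside [min, max].
func (r *idRange) inScope(id uint64) bool {
	return id >= r.min && id <= r.max
}
```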
Force-pushed from fa8ba6b to 86b3deb.
Force-pushed from 86b3deb to df87b12.
[ upstream commit 6154322 ]

Commit cilium#16861 introduced a normalization of error handling in the daemon/cmd package. In doing so, we swallowed useful error logs. This commit adds the error logs back and adds a few additional fmt.Errorf wrappers where logging alone is not adequate.

Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
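The pattern is to log at the failure site while still wrapping the error for the caller. A minimal sketch with hypothetical function names:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// initDatapath stands in for a daemon/cmd initialization step.
func initDatapath() error {
	return errors.New("device eth0 not found")
}

func start() error {
	if err := initDatapath(); err != nil {
		// Log at the point of failure so the context isn't swallowed...
		log.Printf("datapath initialization failed: %v", err)
		// ...and wrap with %w so callers can still match the cause
		// with errors.Is / errors.As.
		return fmt.Errorf("while initializing datapath: %w", err)
	}
	return nil
}

func main() {
	if err := start(); err != nil {
		log.Fatal(err)
	}
}
```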
[ upstream commit 5fa7ae2 ]

Advanced users can configure the LRU size for the cache holding the compiled regex expressions of FQDN match{Pattern,Name}. This is useful for users experiencing high memory usage spikes with many FQDN policies that repeat the same matchPattern or matchName across many different policies.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
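The caching idea looks roughly like this sketch. The hashicorp/golang-lru dependency and the function names are assumptions, not Cilium's exact implementation:

```go
package fqdn

import (
	"regexp"

	lru "github.com/hashicorp/golang-lru"
)

// regexCache memoizes compiled matchName/matchPattern expressions so
// identical patterns across many policies share one compiled regex.
type regexCache struct {
	cache *lru.Cache
}

// newRegexCache builds the cache; size is the user-configurable knob.
func newRegexCache(size int) (*regexCache, error) {
	c, err := lru.New(size)
	if err != nil {
		return nil, err
	}
	return &regexCache{cache: c}, nil
}

// compile returns a cached regex when available, compiling and caching
// it otherwise. Least-recently-used entries are evicted at capacity.
func (r *regexCache) compile(pattern string) (*regexp.Regexp, error) {
	if re, ok := r.cache.Get(pattern); ok {
		return re.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	r.cache.Add(pattern, re)
	return re, nil
}
```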
Force-pushed from df87b12 to dce32b6.
/test-backport-1.11

Job 'Cilium-PR-Runtime-net-next' failed. If it is a flake and a GitHub issue doesn't already exist to track it, comment.

gke, external-workloads and multicluster failed with the same error.
My commits are looking good, thanks!
My changes LGTM, thanks.
/ci-gke-1.11
/ci-external-workloads-v1.11
/ci-multicluster-1.11
/test-1.23-4.9
/test-1.21-5.4
/test-1.20-4.9
/ci-external-workloads-v1.11
/test-runtime
👍
[ upstream commit 38c0036 ]

When comparing the efficiency of increasing the LRU size from 128 to 1024 with ~22k CNPs, we see the following results:

```
# LRU size 128.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > old.txt

# LRU size 1024.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > new.txt

$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                         old ns/op      new ns/op      delta
Benchmark_perEPAllow_setPortRulesForID_large-8    3954101340     3010934555     -23.85%

benchmark                                         old allocs     new allocs     delta
Benchmark_perEPAllow_setPortRulesForID_large-8    26480632       24167742       -8.73%

benchmark                                         old bytes      new bytes      delta
Benchmark_perEPAllow_setPortRulesForID_large-8    2899811832     1824062992     -37.10%
```

Here's the raw test run with LRU size at 128:

```
Before (N=1)
Alloc = 31 MiB   HeapInuse = 45 MiB   Sys = 1260 MiB   NumGC = 15
After (N=1)
Alloc = 445 MiB  HeapInuse = 459 MiB  Sys = 1260 MiB   NumGC = 40
```

Here's the raw test run with LRU size at 1024:

```
Before (N=1)
Alloc = 31 MiB   HeapInuse = 48 MiB   Sys = 1177 MiB   NumGC = 17
After (N=1)
Alloc = 78 MiB   HeapInuse = 93 MiB   Sys = 1177 MiB   NumGC = 53
```

We can see that it saves ~300 MB. Furthermore, if we compare the memory profiles from the benchmark runs via

```
go tool pprof -http :8080 -diff_base memprofile.out memprofile.1024.out
```

we see an ~800 MB reduction in the regex compilation.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit ce9583d ]

Following the previous commit's benchmark results, let's update the default LRU size to 1024, given that it only results in a few tens of MBs of increase when the cache nears full.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
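Continuing the earlier cache sketch, the change itself is small; the constant name below is illustrative, not Cilium's actual identifier:

```go
package fqdn

// defaultRegexCompileLRUSize reflects the new default discussed above:
// 1024 entries costs only a few tens of MiB when the cache nears full,
// while avoiding repeated recompilation of identical match patterns.
const defaultRegexCompileLRUSize = 1024

// newDefaultRegexCache wires the default into the cache sketched in
// the earlier example.
func newDefaultRegexCache() (*regexCache, error) {
	return newRegexCache(defaultRegexCompileLRUSize)
}
```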
Force-pushed from dce32b6 to f56ddc6.
/test-runtime
All tests have passed, except the runtime test. Missed an import (the change is in the latest push). /cc @christarazi
Runtime test has passed with the small change.
Resolved conflicts (please check)
Once this PR is merged, you can update the PR labels via: