10000 v1.11 backports 2022-05-02 by aditighag · Pull Request #19671 · cilium/cilium · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

v1.11 backports 2022-05-02 #19671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 3, 2022

Conversation

aditighag
Copy link
Member
@aditighag aditighag commented May 2, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 19570 19593 19610 19423 19587 19638 19383 19649 19188; do contrib/backporting/set-labels.py $pr done 1.11; done

christarazi and others added 8 commits May 2, 2022 21:10
[ upstream commit 4c2c244 ]

Use strings.Builder instead of fmt.Sprintf() and preallocate the size of
the string so that Go doesn't need to over-allocate if the string ends
up longer than what the buffer growth algorithm predicts.

Results:

```
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > old.txt
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > new.txt
$ benchcmp old.txt new.txt
benchmark                         old ns/op     new ns/op     delta
BenchmarkFQDNSelectorString-8     690           180           -73.97%

benchmark                         old allocs     new allocs     delta
BenchmarkFQDNSelectorString-8     9              4              -55.56%

benchmark                         old bytes     new bytes     delta
BenchmarkFQDNSelectorString-8     288           208           -27.78%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 1db91ca ]

The documentation for setting up Nodeport XDP acceleration on AWS
mentions that the MTU for the ena interface must be lower down so that
XDP can work. It is indeed necessary; but the value which is provided as
the maximal possible MTU is outdated, and not working.

After installing the latest kernel through the RPM package kernel-ng (as
prescribed in the documentation), the EKS nodes currently end up with
Linux 5.10:

    $ uname -r
    5.10.106-102.504.amzn2.x86_64

If we keep on following the docs and lower the MTU to 3818, the Cilium
pods fail to get ready, and tell in their logs that the XDP program
cannot be set due to the MTU. This is also confirmed from the dmesg of
the nodes:

    [ 3617.059219] ena 0000:00:05.0 eth0: Failed to set xdp program,
        the current MTU (3818) is larger than the maximum allowed MTU (3498) while xdp is on

The value 3818 comes from the legacy definition of ENA_XDP_MAX_MTU, in
drivers/net/ethernet/amazon/ena/ena_netdev.h, which used to be defined
as such:

    #define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
                             VLAN_HLEN - XDP_PACKET_HEADROOM)

Where ETH_LEN is 14, ETH_FCS_LEN and VLAN_HLEN are both 4, and
XDP_PACKET_HEADROOM is 256.

But after Linux commit 08fc1cfd2d25 ("ena: Add XDP frame size to amazon
NIC driver"), from Linux 5.8, the definition changed to:

    #define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN -	\
                             VLAN_HLEN - XDP_PACKET_HEADROOM -		\
                             SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

As a result, the maximum value for the MTU for kernels 5.8+ is 3498
bytes. This is indeed the maximum value that I could use when setting up
XDP on an EKS cluster. Let's update the documentation accordingly.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit af8151d ]

The right format for this field should contain the protocol and a
trailing "/" to work properly.

Fixes: b3b0502 ("docs: fix version warning URL to point to docs.cilium.io")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 351c5d8 ]

SortedList() and FormatForKVStore() can be very hot code in environments
where there's constant policy churn, especially CIDR policies where
there can be a large number of CIDR labels.

This commit adds benchmarks for later commits to use as a baseline.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 0790e07 ]

FormatForKVStore() previously returned a string for no reason as every
caller converted the return value to a byte slice. This allows us to
eliminate string concatenation entirely and use the bytes.Buffer
directly.

Building on the above, given that SortedList() returns a byte slice and
calls FormatForKVStore() for its output, we can optimize it with the
same technique to eliminate string concatenation.

Here are the benchmark comparisons:

```
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > old.txt
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > new.txt
$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                             old ns/op     new ns/op     delta
BenchmarkLabels_SortedList-8          2612          1120          -57.12%
BenchmarkLabel_FormatForKVStore-8     262           54.5          -79.18%

benchmark                             old allocs     new allocs     delta
BenchmarkLabels_SortedList-8          35             13             -62.86%
BenchmarkLabel_FormatForKVStore-8     4              1              -75.00%

benchmark                             old bytes     new bytes     delta
BenchmarkLabels_SortedList-8          1112          664           -40.29%
BenchmarkLabel_FormatForKVStore-8     96            48            -50.00%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit ad4a900 ]

The XDP-based NodePort acceleration currently can't be used in
combination with EgressGW. The underlying reason is that bpf_xdp has no
support for tunnel encapsulation, so it can't redirect the EgressGW's
return traffic. The EgressGW processing is done in nodeport_lb4() and
rev_nodeport_lb4() from lib/nodeport.h (tech debt).

Improve this situation by skipping the tunnel-forward in
rev_nodeport_lb4() when called from XDP context. Thus return traffic for
the EgressGW gets reverse-SNATed inside XDP, and is then passed up to
bpf_host with XFER_PKT_NO_SVC. It bypasses the NodePort processing, and
enters the kernel stack where it gets forwarded to the tunnel netdev due
to an IP route in the host netns stack (i.e., "pod cidr on node X via
cilium_vxlan").

This lets users run with the NodePort acceleration and EgressGW enabled
at the same time. Note that the approach of letting the kernel forward
the traffic into the tunnel can fail (if the iptables' default policy
for the FORWARD chain is DROP).

Long-term we want to cleanly handle this with tunneling support inside
bpf_xdp [1].

[1]: cilium#17770

The commit message was suggested by Julian Wiedmann.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 769ae73 ]

Accordingly to some examples found on google the path pointed by
html_extra_path should contain the direct path for the robots.txt file.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 6f9fedd ]

As the operator uses a different constructor for the GC of the
allocator, this constructor was not being initialized with the min and
max values of the identities that it should be GC. This commit adds
those options making it possible for the Operator to GC those
identities.

Fixes: 0c0f965 ("operator: only GC identity keys of its own cluster")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
@aditighag aditighag requested a review from a team as a code owner May 2, 2022 21:22
@aditighag aditighag requested a review from a team May 2, 2022 21:22
@aditighag aditighag requested a review from a team as a code owner May 2, 2022 21:22
@aditighag aditighag added backport/1.11 kind/backports This PR provides functionality previously merged into master. labels May 2, 2022
@aditighag aditighag force-pushed the pr/v1.11-backport-2022-05-02 branch from fa8ba6b to 86b3deb Compare May 2, 2022 21:46
@aditighag aditighag marked this pull request as draft May 2, 2022 22:07
@aditighag aditighag force-pushed the pr/v1.11-backport-2022-05-02 branch from 86b3deb to df87b12 Compare May 2, 2022 22:12
ldelossa and others added 2 commits May 2, 2022 15:22
[ upstream commit 6154322 ]

commit cilium#16861 introduced a normalization of error handling into the
daemon/cmd package.

by doing so we swallowed useful error logs.

this commit adds the error logs back and adds a few additional
fmt.Errorf wrappers where logging is not adequate

Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 5fa7ae2 ]

Advanced users can configure the LRU size for the cache holding the
compiled regex expressions of FQDN match{Pattern,Name}. This is useful
if users are experiencing high memory usage spikes with many FQDN
policies that have repeated matchPattern or matchName across many
different policies.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
@aditighag aditighag force-pushed the pr/v1.11-backport-2022-05-02 branch from df87b12 to dce32b6 Compare May 2, 2022 22:23
@aditighag aditighag marked this pull request as ready for review May 2, 2022 22:30
@aditighag
Copy link
Member Author
aditighag commented May 2, 2022

/test-backport-1.11

Job 'Cilium-PR-Runtime-net-next' failed:

Click to show.

Test Name

RuntimePrivilegedUnitTests Run Tests

Failure Output

FAIL: Failed to run privileged unit tests

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-Runtime-net-next so I can create one.

@aditighag
Copy link
Member Author

gke, external-workloads and multicluster failed with the same error -

WARNING: Could not open the configuration file: [/home/runner/.config/gcloud/configurations/config_default].
ERROR: (gcloud.container.clusters.delete) You do not currently have an active account selected.
Please run:

  $ gcloud auth login

to obtain new credentials.

If you have already logged in with a different account:

    $ gcloud config set account ACCOUNT

to select an already authenticated account to use.

Copy link
Member
@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commits are looking good, thanks!

Copy link
Member
@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes LGTM, thanks.

Copy link
Member
@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#19593 -- docs: Update max MTU value for Nodeport XDP on AWS (@qmonnet)

is good, thank you!

@aditighag
Copy link
Member Author

/ci-gke-1.11

@aditighag
Copy link
Member Author

/ci-external-workloads-v1.11

@aditighag
Copy link
Member Author

/ci-multicluster-1.11

@aditighag
Copy link
Member Author

/test-1.23-4.9

@aditighag
Copy link
Member Author

/test-1.21-5.4

@aditighag
Copy link
Member Author

/test-1.20-4.9

@aditighag
Copy link
Member Author

/ci-external-workloads-v1.11

@aditighag
Copy link
Member Author

/test-runtime

Copy link
Member
@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

[ upstream commit 38c0036 ]

When comparing efficieny of increasing the LRU size from 128 to 1024
with ~22k CNPs, we see the following results:

```
\# LRU size 128.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > old.txt

\# LRU size 1024.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > new.txt

$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                          old ns/op      new ns/op      delta
Benchmark_perEPAllow_setPortRulesForID_large-8     3954101340     3010934555     -23.85%

benchmark                                          old allocs     new allocs     delta
Benchmark_perEPAllow_setPortRulesForID_large-8     26480632       24167742       -8.73%

benchmark                                          old bytes      new bytes      delta
Benchmark_perEPAllow_setPortRulesForID_large-8     2899811832     1824062992     -37.10%
```

Here's the raw test run with LRU size at 128:

```
Before (N=1)
Alloc = 31 MiB  HeapInuse = 45 MiB      Sys = 1260 MiB  NumGC = 15
After (N=1)
Alloc = 445 MiB HeapInuse = 459 MiB     Sys = 1260 MiB  NumGC = 40
```

Here's the raw test run with LRU size at 1024:

```
Before (N=1)
Alloc = 31 MiB  HeapInuse = 48 MiB      Sys = 1177 MiB  NumGC = 17
After (N=1)
Alloc = 78 MiB  HeapInuse = 93 MiB      Sys = 1177 MiB  NumGC = 53
```

We can see that it's saving ~300MB.

Furthermore, if we compare the memprofiles from the benchmark run via

```
go tool pprof -http :8080 -diff_base memprofile.out memprofile.1024.out
```

we see an ~800MB reduction in the regex compilation.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit ce9583d ]

Following the previous commit's benchmark result, let's update the LRU
default size to be 1024, given that it only results in a few 10's of MBs
increase when the cache nears full.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
@aditighag aditighag force-pushed the pr/v1.11-backport-2022-05-02 branch from dce32b6 to f56ddc6 Compare May 3, 2022 21:37
@aditighag
Copy link
Member Author

/test-runtime

@aditighag
Copy link
Member Author

All tests have passed, except the runtime. Missed an import (change in the latest push) /cc @christarazi -

$ git diff 
diff --git a/pkg/fqdn/dnsproxy/proxy_test.go b/pkg/fqdn/dnsproxy/proxy_test.go
index 1aecc02d2b..866f1d3fe0 100644
--- a/pkg/fqdn/dnsproxy/proxy_test.go
+++ b/pkg/fqdn/dnsproxy/proxy_test.go
@@ -42,6 +42,7 @@ import (
        "github.com/golang/groupcache/lru"
        "github.com/miekg/dns"
        . "gopkg.in/check.v1"
+       "sigs.k8s.io/yaml"
 )
 
 // Hook up gocheck into the "go test" runner.

https://jenkins.cilium.io/job/Cilium-PR-Runtime-net-next/2109/testReport/junit/(root)/Suite-runtime/RuntimePrivilegedUnitTests_Run_Tests/

Screen Shot 2022-05-03 at 2 20 40 PM

Runtime test has passed with the small change.

@aditighag aditighag merged commit 9a42d48 into cilium:v1.11 May 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0