bugtool: Collect XFRM error counters twice by pchaigno · Pull Request #28790 · cilium/cilium · GitHub

bugtool: Collect XFRM error counters twice #28790

Merged: dylandreimerink merged 1 commit into cilium:main from collect-xfrm-stats-twice on Oct 26, 2023

pchaigno
Member

This pull request changes the bugtool report to collect the XFRM error counters (i.e., /proc/net/xfrm_stat) twice instead of only once: at the beginning and at the end of the bugtool collection. That leaves around 5-6 seconds between the two collections, so we can see whether any counter is currently increasing.
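The collection order described above can be sketched as follows. This is an illustrative Python sketch, not bugtool's actual Go implementation; the names `collect_twice` and `other_collection` are hypothetical, and the sketch assumes a Linux host where /proc/net/xfrm_stat exists.

```python
from pathlib import Path
from typing import Callable, Tuple

# Kernel file holding the XFRM error counters (Linux only).
XFRM_STAT = Path("/proc/net/xfrm_stat")


def collect_twice(
    other_collection: Callable[[], None],
    path: Path = XFRM_STAT,
) -> Tuple[str, str]:
    """Snapshot the counters before and after the rest of the collection.

    `other_collection` is a hypothetical stand-in for everything else the
    report gathers; its run time is what separates the two snapshots.
    """
    first = path.read_text()
    other_collection()
    second = path.read_text()
    return first, second
```

Taking the second snapshot only after the rest of the collection has finished means no extra sleep is needed; the gap between snapshots is simply however long the other collection steps take.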

$ diff cilium-bugtool-cilium-7d54p-20231025-115151/cmd/cat*--proc-net-xfrm_stat.md
5c5
< XfrmInStateProtoError   	4
---
> XfrmInStateProtoError   	6

In this example, we can easily see that the XfrmInStateProtoError counter is increasing, which suggests a key rotation issue is currently ongoing (cf. IPsec troubleshooting docs).
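For reports with many counters, the same comparison can be scripted instead of read off a diff. The following is a minimal sketch, assuming each line of the file is a counter name followed by whitespace and an integer value, as in the diff output above; `parse_xfrm_stat` and `increasing_counters` are hypothetical helper names and not part of bugtool.

```python
from typing import Dict


def parse_xfrm_stat(text: str) -> Dict[str, int]:
    """Parse the 'name <whitespace> value' lines of an xfrm_stat snapshot."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or unexpected lines instead of failing on them.
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        counters[fields[0]] = int(fields[1])
    return counters


def increasing_counters(before: str, after: str) -> Dict[str, int]:
    """Return the increase of every counter that grew between two snapshots."""
    old = parse_xfrm_stat(before)
    new = parse_xfrm_stat(after)
    return {
        name: value - old.get(name, 0)
        for name, value in new.items()
        if value > old.get(name, 0)
    }


if __name__ == "__main__":
    first = "XfrmInError             \t0\nXfrmInStateProtoError   \t4\n"
    second = "XfrmInError             \t0\nXfrmInStateProtoError   \t6\n"
    print(increasing_counters(first, second))  # {'XfrmInStateProtoError': 2}
```

Counters that stay flat are left out of the result, so an empty dictionary means nothing increased during the window between the two snapshots.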

I tried other approaches that collect over a longer timespan, which would let us see slower increases. They all ended up being more complex or messier (we'd need to collect at the beginning and end of the sysdump instead). In the end, considering this is already a fallback plan for when customers don't collect Prometheus metrics, I think the current, simpler approach is good enough.

Fixes: #16538.

@pchaigno pchaigno added area/bugtool Impacts gathering of data for debugging purposes. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.12 labels Oct 25, 2023
@pchaigno pchaigno force-pushed the collect-xfrm-stats-twice branch from 96e76f3 to dfac417 Compare October 25, 2023 14:40
@pchaigno pchaigno marked this pull request as ready for review October 25, 2023 14:45
@pchaigno pchaigno requested a review from a team as a code owner October 25, 2023 14:45
@pchaigno pchaigno requested a review from tklauser October 25, 2023 14:45
Member
@tklauser tklauser left a comment

LGTM, one question inline to help my own understanding.

@pchaigno pchaigno force-pushed the collect-xfrm-stats-twice branch from dfac417 to e02b0b1 Compare October 25, 2023 17:33
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
@pchaigno pchaigno force-pushed the collect-xfrm-stats-twice branch from e02b0b1 to 2c1a62f Compare October 25, 2023 17:35
@pchaigno
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 26, 2023
@dylandreimerink dylandreimerink merged commit c1803ba into cilium:main Oct 26, 2023
@pchaigno pchaigno deleted the collect-xfrm-stats-twice branch October 26, 2023 10:08
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
9 tasks
@pippolo84 pippolo84 added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 labels Oct 30, 2023
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
6 tasks
@pippolo84 pippolo84 mentioned this pull request Oct 31, 2023
4 tasks
@pippolo84 pippolo84 added backport-pending/1.12 backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed needs-backport/1.12 labels Oct 31, 2023
@jibi jibi added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Nov 7, 2023
Labels
area/bugtool Impacts gathering of data for debugging purposes. area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

Collect /proc/net/xfrm_stat twice in bugtool
5 participants