-
Notifications
You must be signed in to change notification settings - Fork 714
Report a subset of connections from/to the same endpoint #3709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a while to try to understand all of the code - I left a couple of comments, mainly about clarifying how the subsets are being chosen, but the idea seems good.
Before I approve, I'd be cool if you added some unit tests for the touchy parts like findModulus
and performEbpfTrack
. Alternatively having a nice integration test and/or explanation how to reproduce and observe the improvement would also be great.
So far I've only looked at the code but I'm not really sure whether I'd be able to tell if something was functionally broken by running Scope, so having some test coverage there would make me feel more confident this is safe to merge.
probe/endpoint/connection_tracker.go
Outdated
* connections from local processes to an off-box address+port | ||
- we will know the pids of the local processes but not the remote | ||
* connections from local processes to a local process | ||
- these connections will each be reported twice by ebpf, as incoming and as outgoing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
extraFromNode, extraToNode = extraToNode, extraFromNode | ||
// Pick a subset of the connections to send, such that if two probes | ||
// on different machines go through the same process there is a good | ||
// chance of overlap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain a little bit why this is the case, i.e. how modulus plays a role here? Or perhaps even a more basic question, what does a good subset here means and why couldn't we just pass first N connections and hide the rest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a process is continually opening and closing connections, then the originating port number will change over time, e.g. it could be 30002, 30004, 30006, ... The connections will be open for a short time, then go into the "recently closed" set, then disappear.
When there are a big number, if we pick the first N then there's some chance someone else has observed 30062, 30064, 30066, ... and so the first N don't match. If we pick modulus 3 or 9 or 27..., then a subset of ports from all across the numeric range will be sent, and there is more chance of a match.
func findModulus(ports mapPortToPids) (modulus uint16, count int) { | ||
modulus = 1 | ||
count = len(ports) | ||
// Check they all come from/to the same pid (or zero): if differing we need another strategy to thin them down |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if differing we need another strategy
From what I can see, this PR doesn't offer an alternative strategy yet - correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
probe/endpoint/connection_tracker.go
Outdated
if sent == count { | ||
skipped += (len(portToPids) - i - 1) | ||
} | ||
t.addConnection(rpt, hostNodeID, tuple, uint(pids.fromPid), uint(pids.toPid), triple.networkNamespace, skipped+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my understanding is that all the sent connections will include the preceding skipped connections, except the last sent connection which will include both the preceding and the following skipped connections - maybe it's worth putting that in a comment.
Also, related to my previous comment, I don't really understand why we pick the connections subset the way we do and why they're grouped the way they are here :)
ret := ebpfKey{fourTuple: tuple} | ||
if net.IP(tuple.fromAddr[:]).IsLoopback() || net.IP(tuple.toAddr[:]).IsLoopback() { | ||
ret.networkNamespace = namespace | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if neither is loopback? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we don't "scope" the addresses by namespace.
This matches what the code did before - it's an assumption that only loopback addresses need to be scoped to a container.
We may scope the address by hostname further on in the reporting process.
71be7f1
to
353b6e8
Compare
The previous code tracked only by four-tuple, which meant that two connections with same address/port combinations in different namespace would clash and one would get dropped. Also previously the tuple was duplicated between the map key and value, so we remove it from the value. We only add the namespace in the case that the local address is loopback, which matches how the rest of Scope treats addresses.
The app will only show one line, regardless of how many connections we have, so reduce the number to save bandwidth and rendering time. We filter by choosing a modulus, e.g. send every connection that is a multiple of 3, or 9, and so on. We avoid multiples of 2 because port numbers are often a multiple of 2 or 4 for bit-encoding reasons.
a6039e0
to
4a95903
Compare
Utility functions to create fake sets of connections for testing, and then exercising the subset filtering code to check that quantities come out as expected.
4a95903
to
de939cc
Compare
I've added a unit test and an integration test. |
Fixes #3584
The app will only show one line, regardless of how many connections we have, so reduce the number we send to save bandwidth and rendering time. Include a note of how many connections were elided in the report, so the app can still show the same number in the detail window.
I've only done this for the ebpf collector for now.
We collect the details by "triple" of (source address, destination address, destination port), recording what we know about the source or destination pid, then where there are more than 5 connections going to the same endpoint, all for the same process, we thin them down using a modulus.
Inspired by the discussion at #3452 (review)
Note this is a (minor) breaking change in the protocol: older apps will show an incorrect connection count.