Report a subset of connections from/to the same endpoint #3709

bboreham · 2019-10-12T14:15:48Z

The app will only show one line, regardless of how many connections we have, so reduce the number we send to save bandwidth and rendering time. Include a note of how many connections were elided in the report, so the app can still show the same number in the detail window.

I've only done this for the ebpf collector for now.

We collect the details by "triple" of (source address, destination address, destination port), recording what we know about the source or destination pid, then where there are more than 5 connections going to the same endpoint, all for the same process, we thin them down using a modulus.

Inspired by the discussion at #3452 (review)

Note this is a (minor) breaking change in the protocol: older apps will show an incorrect connection count.

fbarl

Took me a while to try to understand all of the code - I left a couple of comments, mainly about clarifying how the subsets are being chosen, but the idea seems good.

Before I approve, I'd be cool if you added some unit tests for the touchy parts like findModulus and performEbpfTrack. Alternatively having a nice integration test and/or explanation how to reproduce and observe the improvement would also be great.

So far I've only looked at the code but I'm not really sure whether I'd be able to tell if something was functionally broken by running Scope, so having some test coverage there would make me feel more confident this is safe to merge.

render/detailed/connections.go

fbarl · 2019-10-24T10:06:02Z

probe/endpoint/connection_tracker.go

+		    * connections from local processes to an off-box address+port
+		      - we will know the pids of the local processes but not the remote
+		    * connections from local processes to a local process
+	          - these connections will each be reported twice by ebpf, as incoming and as outgoing.


Indentation

fbarl · 2019-10-24T10:38:05Z

probe/endpoint/connection_tracker.go

-		extraFromNode, extraToNode = extraToNode, extraFromNode
+// Pick a subset of the connections to send, such that if two probes
+// on different machines go through the same process there is a good
+// chance of overlap.


Could you please explain a little bit why this is the case, i.e. how modulus plays a role here? Or perhaps even a more basic question, what does a good subset here means and why couldn't we just pass first N connections and hide the rest?

If a process is continually opening and closing connections, then the originating port number will change over time, e.g. it could be 30002, 30004, 30006, ... The connections will be open for a short time, then go into the "recently closed" set, then disappear.

When there are a big number, if we pick the first N then there's some chance someone else has observed 30062, 30064, 30066, ... and so the first N don't match. If we pick modulus 3 or 9 or 27..., then a subset of ports from all across the numeric range will be sent, and there is more chance of a match.

fbarl · 2019-10-24T10:39:52Z

probe/endpoint/connection_tracker.go

+func findModulus(ports mapPortToPids) (modulus uint16, count int) {
+	modulus = 1
+	count = len(ports)
+	// Check they all come from/to the same pid (or zero): if differing we need another strategy to thin them down


if differing we need another strategy

From what I can see, this PR doesn't offer an alternative strategy yet - correct?

fbarl · 2019-10-24T10:54:21Z

probe/endpoint/connection_tracker.go

+				if sent == count {
+					skipped += (len(portToPids) - i - 1)
+				}
+				t.addConnection(rpt, hostNodeID, tuple, uint(pids.fromPid), uint(pids.toPid), triple.networkNamespace, skipped+1)


So my understanding is that all the sent connections will include the preceding skipped connections, except the last sent connection which will include both the preceding and the following skipped connections - maybe it's worth putting that in a comment.

Also, related to my previous comment, I don't really understand why we pick the connections subset the way we do and why they're grouped the way they are here :)

fbarl · 2019-10-24T10:55:20Z

probe/endpoint/ebpf.go

+	ret := ebpfKey{fourTuple: tuple}
+	if net.IP(tuple.fromAddr[:]).IsLoopback() || net.IP(tuple.toAddr[:]).IsLoopback() {
+		ret.networkNamespace = namespace
+	}


What happens if neither is loopback? 🤔

Then we don't "scope" the addresses by namespace.
This matches what the code did before - it's an assumption that only loopback addresses need to be scoped to a container.
We may scope the address by hostname further on in the reporting process.

The previous code tracked only by four-tuple, which meant that two connections with same address/port combinations in different namespace would clash and one would get dropped. Also previously the tuple was duplicated between the map key and value, so we remove it from the value. We only add the namespace in the case that the local address is loopback, which matches how the rest of Scope treats addresses.

The app will only show one line, regardless of how many connections we have, so reduce the number to save bandwidth and rendering time. We filter by choosing a modulus, e.g. send every connection that is a multiple of 3, or 9, and so on. We avoid multiples of 2 because port numbers are often a multiple of 2 or 4 for bit-encoding reasons.

Utility functions to create fake sets of connections for testing, and then exercising the subset filtering code to check that quantities come out as expected.

bboreham · 2020-01-13T15:23:52Z

I've added a unit test and an integration test.
Should be good to go now.

bboreham requested review from fbarl, qiell and satyamz as code owners October 12, 2019 14:15

bboreham mentioned this pull request Oct 13, 2019

Include network connection metrics in topology view #3123

Open

fbarl reviewed Oct 24, 2019

View reviewed changes

bboreham force-pushed the report-endpoint-subset branch from 71be7f1 to 353b6e8 Compare October 24, 2019 17:32

bboreham added 7 commits January 13, 2020 08:53

refactor(probe): move host/pid encoding into addConnection() function

a6da810

improvement: only record connections that we have a PID for

2767e2b

enhancement(probe): debug message for initial connection

d57a4df

comment: add explanatory comment on handleFdInstall()

9758c81

8000 ui: include skipped connection count in rendering

4c9cf13

bboreham force-pushed the report-endpoint-subset branch from a6039e0 to 4a95903 Compare January 13, 2020 08:56

bboreham added 2 commits January 13, 2020 08:59

tests: connection subset testing

b7b245e

Utility functions to create fake sets of connections for testing, and then exercising the subset filtering code to check that quantities come out as expected.

tests: add an integration test to check connection thinning

de939cc

bboreham force-pushed the report-endpoint-subset branch from 4a95903 to de939cc Compare January 13, 2020 09:00

Refactor: improve readability based on review feedback

7dc7215

bboreham merged commit 574c76a into master Jan 27, 2020

bboreham deleted the report-endpoint-subset branch January 27, 2020 22:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Report a subset of connections from/to the same endpoint #3709

Report a subset of connections from/to the same endpoint #3709

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Report a subset of connections from/to the same endpoint #3709

Report a subset of connections from/to the same endpoint #3709

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!