8000 Report a subset of connections from/to the same endpoint by bboreham · Pull Request #3709 · weaveworks/scope · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Report a subset of connections from/to the same endpoint #3709

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 27, 2020

Conversation

bboreham
Copy link
Collaborator
@bboreham bboreham commented Oct 12, 2019

Fixes #3584

The app will only show one line, regardless of how many connections we have, so reduce the number we send to save bandwidth and rendering time. Include a note of how many connections were elided in the report, so the app can still show the same number in the detail window.

I've only done this for the ebpf collector for now.

We collect the details by "triple" of (source address, destination address, destination port), recording what we know about the source or destination pid, then where there are more than 5 connections going to the same endpoint, all for the same process, we thin them down using a modulus.

Inspired by the discussion at #3452 (review)

Note this is a (minor) breaking change in the protocol: older apps will show an incorrect connection count.

Copy link
Contributor
@fbarl fbarl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to try to understand all of the code - I left a couple of comments, mainly about clarifying how the subsets are being chosen, but the idea seems good.

Before I approve, I'd be cool if you added some unit tests for the touchy parts like findModulus and performEbpfTrack. Alternatively having a nice integration test and/or explanation how to reproduce and observe the improvement would also be great.

So far I've only looked at the code but I'm not really sure whether I'd be able to tell if something was functionally broken by running Scope, so having some test coverage there would make me feel more confident this is safe to merge.

* connections from local processes to an off-box address+port
- we will know the pids of the local processes but not the remote
* connections from local processes to a local process
- these connections will each be reported twice by ebpf, as incoming and as outgoing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

extraFromNode, extraToNode = extraToNode, extraFromNode
// Pick a subset of the connections to send, such that if two probes
// on different machines go through the same process there is a good
// chance of overlap.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain a little bit why this is the case, i.e. how modulus plays a role here? Or perhaps even a more basic question, what does a good subset here means and why couldn't we just pass first N connections and hide the rest?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a process is continually opening and closing connections, then the originating port number will change over time, e.g. it could be 30002, 30004, 30006, ... The connections will be open for a short time, then go into the "recently closed" set, then disappear.

When there are a big number, if we pick the first N then there's some chance someone else has observed 30062, 30064, 30066, ... and so the first N don't match. If we pick modulus 3 or 9 or 27..., then a subset of ports from all across the numeric range will be sent, and there is more chance of a match.

func findModulus(ports mapPortToPids) (modulus uint16, count int) {
modulus = 1
count = len(ports)
// Check they all come from/to the same pid (or zero): if differing we need another strategy to thin them down
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if differing we need another strategy

From what I can see, this PR doesn't offer an alternative strategy yet - correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

if sent == count {
skipped += (len(portToPids) - i - 1)
}
t.addConnection(rpt, hostNodeID, tuple, uint(pids.fromPid), uint(pids.toPid), triple.networkNamespace, skipped+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my understanding is that all the sent connections will include the preceding skipped connections, except the last sent connection which will include both the preceding and the following skipped connections - maybe it's worth putting that in a comment.

Also, related to my previous comment, I don't really understand why we pick the connections subset the way we do and why they're grouped the way they are here :)

ret := ebpfKey{fourTuple: tuple}
if net.IP(tuple.fromAddr[:]).IsLoopback() || net.IP(tuple.toAddr[:]).IsLoopback() {
ret.networkNamespace = namespace
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if neither is loopback? 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we don't "scope" the addresses by namespace.
This matches what the code did before - it's an assumption that only loopback addresses need to be scoped to a container.
We may scope the address by hostname further on in the reporting process.

@bboreham bboreham force-pushed the report-endpoint-subset branch from 71be7f1 to 353b6e8 Compare October 24, 2019 17:32
The previous code tracked only by four-tuple, which meant that two
connections with same address/port combinations in different namespace
would clash and one would get dropped.

Also previously the tuple was duplicated between the map key and
value, so we remove it from the value.

We only add the namespace in the case that the local address is
loopback, which matches how the rest of Scope treats addresses.
The app will only show one line, regardless of how many connections we
have, so reduce the number to save bandwidth and rendering time.

We filter by choosing a modulus, e.g. send every connection that is a
multiple of 3, or 9, and so on. We avoid multiples of 2 because port
numbers are often a multiple of 2 or 4 for bit-encoding reasons.
@bboreham bboreham force-pushed the report-endpoint-subset branch from a6039e0 to 4a95903 Compare January 13, 2020 08:56
Utility functions to create fake sets of connections for testing, and
then exercising the subset filtering code to check that quantities
come out as expected.
@bboreham bboreham force-pushed the report-endpoint-subset branch from 4a95903 to de939cc Compare January 13, 2020 09:00
@bboreham
Copy link
Collaborator Author

I've added a unit test and an integration test.
Should be good to go now.

@bboreham bboreham merged commit 574c76a into master Jan 27, 2020
@bboreham bboreham deleted the report-endpoint-subset branch January 27, 2020 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deal better with nodes that have thousands of sockets
2 participants
0