-
Notifications
You must be signed in to change notification settings - Fork 449
[FEATURE] New network code with tests #2200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEATURE] New network code with tests #2200
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
clang-12 generates a binary that triggers verifier issues in older kernels... clang-14 OTOH generates a binary capable of running in all kernels (including older ones when binary is static). Checking...
|
All CO-RE and non CO-RE tests passed, now I'm adding the network tests to CO-RE and non CO-RE (to make sure all events are good in all supported environments)... might need some fixes here and there (in the runners also) but it allows reviews to happen. |
@rafaeldtinoco I've cancelled the NONCORE e2e flows since the network part seemed to be hanging and other PRs were waiting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work Rafael.
Left some thoughts and notes I had while going through, would like to address them before +1ing on my end.
@@ -0,0 +1,199 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is mostly a duplicate of the original kerneltest.sh
right? Could you elaborate why you chose to duplicate it instead of reusing the former? I think it would be beneficial if we could use the former as a reusable basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well there are differences in the two, the best we could do is to make a third file and use it as a library (with functions and common code) for both. Can I add this as a TODO instead of doing it now ? I promise I'll do it (but I wanted to check test results during the subsequent (of this one) PRs and adjust things right if needed first.
Thanks for the careful review, will address the concerns and push code again. |
Alright, some tests were failing for multiple reasons (runners configurations and some minor timeouts in between interfaces being up and down). I believe it is all set now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice work @rafaeldtinoco !
i like the approach and your protocol handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all - great work @rafaeldtinoco! This will help us a lot in the future to develop network features in Tracee.
I only reviewed the BPF part since I didn't have the time to go over all the PR (hope to do it soon), but I thought it will be better not to wait for the full review before sending my current comments. Most of the comments are either nit or requests for clarifications, but others might require some extra work.
Since most of the BPF code here is new (and there's a lot of it), I wonder if this can be a good opportunity to start splitting the BPF code to different files. Do you think you can put the network BPF code in a separate file, or should we wait with this for a future PR? (totally fine with me, just asking, since the code is new, and mostly self contained).
Yes, I have opened #2222 for it. |
I still need to finish the capability NET_ADMIN needed for kernels <= 5.8 AND decide if cleaning the maps for sure in sched_proc_exit() is worth (for performance reasons) OR simply having the LRU maps (as I've done now) is good enough to prevent MAP exhaustion in case of map entry leaks (like we already had in the past, and also solved with LRU maps). |
@geyslan, remember to review this. |
I discovered an issue with part of this code and I'm fixing (FYI). |
In order to have this merged, I need #2378 merged first and this one upgrading types to that commit. |
Rebased after latest changes. |
{Type: "const char*", Name: "src"}, | ||
{Type: "const char*", Name: "dst"}, | ||
{Type: "u16", Name: "src_port"}, | ||
{Type: "u16", Name: "dst_port"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: makes more sense to have the order be src_addr, src_port, dst_addr, dst_port, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All other network events that don't have a port (L3) have 1st argument as src, 2nd argument as dst. I added a 3rd and 4th argument so they wouldn't be different in the first 2 arguments. If you still think it is better I can make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on DO Kubernetes, and GKE today. Looking good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, LGTM!
This is a huge and welcome change.
One last request before you merge. Since tests are failing on kernel 4.19, can you please manually check that your code works on this kernel? Also please manually run e2e tests so we won't need to fix anything after the merge if tests fail.
We will have to investigate further for the code to work in 4.19. Unfortunately it seems that Checking if I can mitigate it... |
I was able to index the skb by "hash" instead of "tstamp" but then: bpf_sk_fullsock() isn't available < 5.1. So I can return 0 if the type does not exist, but networking code won't work without it, since: I might be able to use __skb_buf |
- preparation for the new networking code. Related: #2152
There are multiple ways to follow ingress/egress for a task. One way is to try to track all flows within network interfaces and keep a map of addresses tuples and translations. OR, sk_storage and socket cookies might help in understanding which sock/sk_buff context the bpf program is dealing with but, at the end, the need is always to tie a flow to a task (specially when hooking ingress skb bpf programs, when the current task is a kernel thread most of the times). Unfortunately that gets even more complicated in older kernels: the cgroup skb programs have almost no bpf helpers to use, and most of common code causes verifier to fail. With that in mind, this approach uses a technique of kprobing the function responsible for calling the cgroup/skb programs. All the work, that should be done by the cgroup/skb programs in the common case, in this case is done by this kprobe/kretprobe hook logic (right before and right after the cgroup/skb program runs). By doing that, all the data that cgroup/skb programs need to use is already placed in a map. Obviously this has some cons: this kprobe->cgroup/skb->kretprobe execution flow does not have preemption disabled, so the map used in between the 3 hooks need to use something that is available to all 3 of them. At the end, the logic is simple: every time a socket is created an inode is also created. The task owning the socket is indexed by the socket inode so everytime this socket is used we know which task it belongs to (specially during ingress hook). Related: #2152
1. Add support (event) for the following network protocols: - IPV4 - IPv6 - TCP - UDP - ICMP - ICMPv6 - DNS 2. Create a fake net_packet_base event so all needed probes and capabilities, for network events, are set in a single place. 3. Network cgroup probes should only be attached when events are selected. With the following added events: - NetPacketIPBase - NetPacketIPv4 - NetPacketIPv6 - NetPacketTCPBase - NetPacketTCP - NetPacketUDPBase - NetPacketUDP - NetPacketICMPBase - NetPacketICMP - NetPacketICMPv6Base - NetPacketICMP - NetPacketDNSBase - NetPacketDNS NOTE: All the "Base" events are raw network events (with a single argument being the packet payload) and internal only. They are used for deriving subsequent events depending on the base event. Example: A "NetPacketUDPBase" event has a "payload" argument, which is the "IP+UDP" header in "bytes". It will be derived into a "NetPacketUDP" event, with appropriate type (that can be used by signatures and so on). Related: #2152
For now these signatures are only meant to test the network protocol events, but, in near future, they might be able to test any existing event without the need for a real signature, for that particular event, to exist. For now one may execute the tests by doing: $ sudo ./dist/tracee-ebpf \ --output format:json \ --output option:parse-arguments \ --output option:detect-syscall \ --trace event=$(./dist/tracee-rules \ --rules-dir ./dist/e2e-net-rules \ --list-events) \ --trace comm=ping,nc | \ ./dist/tracee-rules \ --input-tracee file:stdin \ --input-tracee format:json \ --rules-dir ./dist/e2e-net-rules And observe network protocol event detections from the executions of scripts from "tests/e2e-net-rules/scripts/". Do not forget to run "setup.sh" for creating needed testing namespaces.
Instead of relying in ingress/egress flows containing a process context in order to obtain new sockets that are already connected, simply probe: - security_socket_sendmsg - security_socket_recvmsg This has also the advantage of always updating the inodemap (the socket <=> task context map) with new task context if it has recomputed its scope during should_trace()). Other socket operations functions might be added in the future, in order to create new entries in the inodemap (see "socket_file_ops" struct), if needed. Related: #2152
Achievements:
Still missing (some are for other PRs):
Near future (as needed):
Reorganize existing network events using the sockets + cgroup approach (instead of relying on network tuples).
Derived DNS events (queries/replies) filtered from DNS protocol events
Protocol based events (signal anytime a protocol event happens):
Socket based events (signal anytime a socket fd is accessed):
Initial Checklist
Description (git log)
Type of change
How Has This Been Tested?
Tests being included in this PR:
Reproduce the test by running:
Compile
Run tracee
Run a test
Final Checklist:
Pick "Bug Fix" or "Feature", delete the other and mark appropriate checks.
Git Log Checklist:
My commits logs have:
Extra Info
Performance does not look bad when tracing IPv4, TCP, UDP and DNS altogether: