[FEATURE] New network code with tests #2200

rafaeldtinoco · 2022-09-28T03:08:23Z

Achievements:

Network Protocol Events with complete headers support:
- IPv4, IPv6, TCP, UDP, ICMPv4, ICMPv6, DNS (all types)
Easily extend L3, L4 and L7 protocols as needed
Socket based network events (no interface binding)
New technique for missing eBPF helpers in cgroup BPF context
E2E tests based in golang signatures using created types
Code is ready for blocking network events (per protocol)
- Need to define interface to provide/consume the maps

Still missing (some are for other PRs):

A PR for the network event types (temporary in this PR)
non CO-RE code support
execute network event tests automatically on each PR
DNS events tests
events documentation (markdown files)
tracee documentation (regarding the new network events)
Pcap capturing portion of network protocol events

Near future (as needed):

Reorganize existing network events using the sockets + cgroup approach (instead of relying on network tuples).
Derived DNS events (queries/replies) filtered from DNS protocol events
- needs a definition on how these would be consumed
Protocol based events (signal anytime a protocol event happens):
- bind
- connect
- accept
- listen
- setsockopt
- {send,recv}msg
- ... all from the "proto_ops"
Socket based events (signal anytime a socket fd is accessed):
- {read,write}iter
- mmap
- sendpage
- splice_{read,write}
- ... all from the "socket inode" "file_ops"

Initial Checklist

There is an issue describing the need for this PR.
Git log contains summary of the change.
Git log contains motivation and context of the change.
If part of an EPIC, PR git log contains EPIC number.
If part of an EPIC, PR was added to EPIC description.

Description (git log)

aa9ec05 - e2e-rules: start golang signatures for e2e tests
8804b70 - types: add network protocol types
3b5b5a5 - network: add protocol events and types
9635ed2 - tracee.bpf.c: beginning of new cgroup based network code
c4e471d - vmlinux.h: adjust needed types and missing definitions
75afcb0 - probes: cgroupProbe type implementing the Probe interface
314a8c8 - TEMPORARY: tracee types replacement

Type of change

Bug fix (non-breaking change fixing an issue, preferable).
Quick fix (minor non-breaking change requiring no issue, use with care)
Code refactor (code improvement and/or code removal)
New feature (non-breaking change, adding functionality).
Breaking change (cause existing functionality not to work as expected).

How Has This Been Tested?

Tests being included in this PR:

$ git log -1 -p | diffstat -l
Makefile
tests/e2e-rules/e2e-icmp.go
tests/e2e-rules/e2e-icmpv6.go
tests/e2e-rules/e2e-ipv4.go
tests/e2e-rules/e2e-ipv6.go
tests/e2e-rules/e2e-tcp.go
tests/e2e-rules/e2e-udp.go
tests/e2e-rules/export.go
tests/e2e-rules/helpers.go
tests/e2e-rules/scripts/icmp.sh
tests/e2e-rules/scripts/icmpv6.sh
tests/e2e-rules/scripts/ipv4.sh
tests/e2e-rules/scripts/ipv6.sh
tests/e2e-rules/scripts/setup.sh
tests/e2e-rules/scripts/tcp.sh
tests/e2e-rules/scripts/udp.sh

Reproduce the test by running:

Compile

$ make clean
$ make all
$ make e2e-net-rules

Run tracee

sudo ./dist/tracee-ebpf \
  --output format:json \
  --output option:parse-arguments \
  --output option:detect-syscall \
  --trace event=$(./dist/tracee-rules \
    --rules-dir ./dist/e2e-net-rules \
    --list-events) --trace comm=ping,nc | \
  ./dist/tracee-rules \
  --input-tracee file:stdin \
  --input-tracee format:json \
  --rules-dir ./dist/e2e-net-rules

Run a test

sudo ./tests/e2e-net-test.sh

Final Checklist:

Pick "Bug Fix" or "Feature", delete the other and mark appropriate checks.

I have made corresponding changes to the documentation.
My code follows the style guidelines (C and Go) of this project.
I have performed a self-review of my own code.
I have commented all functions/methods created, explaining what they do.
I have commented my code, particularly in hard-to-understand areas.
My changes generate no new warnings.
I have added tests that prove my fix, or feature, is effective.
New and existing unit tests pass locally with my changes.
Any dependent changes have been merged and published before.

Git Log Checklist:

My commits logs have:

Subject starts with "subsystem|file: description".
Do not end the subject line with a period.
Limit the subject line to 50 characters.
Separate subject from body with a blank line.
Use the imperative mood in the subject line.
Wrap the body at 72 characters.
Use the body to explain what and why instead of how.

Extra Info

Performance does not look bad when tracing IPv4, TCP, UDP and DNS altogether:

rafaeldtinoco · 2022-09-28T17:14:46Z

clang-12 generates a binary that triggers verifier issues in older kernels... clang-14 OTOH generates a binary capable of running in all kernels (including older ones when binary is static). Checking...

Edit: Solved (clang-12 missed null variable initialization)

rafaeldtinoco · 2022-09-29T04:27:04Z

All CO-RE and non CO-RE tests passed, now I'm adding the network tests to CO-RE and non CO-RE (to make sure all events are good in all supported environments)... might need some fixes here and there (in the runners also) but it allows reviews to happen.

NDStrahilevitz · 2022-09-29T10:53:09Z

@rafaeldtinoco I've cancelled the NONCORE e2e flows since the network part seemed to be hanging and other PRs were waiting.

NDStrahilevitz

Awesome work Rafael.
Left some thoughts and notes I had while going through, would like to address them before +1ing on my end.

types/trace/trace.go

pkg/ebpf/c/tracee.bpf.c

pkg/ebpf/tracee.go

pkg/events/derive/net_packet_dns.go

cmd/tracee-rules/input.go

pkg/events/events.go

tests/e2e-net-rules/helpers.go

NDStrahilevitz · 2022-10-03T15:41:32Z

tests/e2e-net-test.sh

@@ -0,0 +1,199 @@
+#!/bin/bash


I assume this is mostly a duplicate of the original kerneltest.sh right? Could you elaborate why you chose to duplicate it instead of reusing the former? I think it would be beneficial if we could use the former as a reusable basis.

Well there are differences in the two, the best we could do is to make a third file and use it as a library (with functions and common code) for both. Can I add this as a TODO instead of doing it now ? I promise I'll do it (but I wanted to check test results during the subsequent (of this one) PRs and adjust things right if needed first.

rafaeldtinoco · 2022-10-03T18:58:18Z

Thanks for the careful review, will address the concerns and push code again.

rafaeldtinoco · 2022-10-04T03:26:59Z

I have not put the new events in the "network events range" yet:

On purpose. I'll do it once I have the simple DNS query/response events, so I can remove the previous events and things are more clear (that the new network events replaced the old ones).

rafaeldtinoco · 2022-10-04T17:56:14Z

Alright, some tests were failing for multiple reasons (runners configurations and some minor timeouts in between interfaces being up and down). I believe it is all set now.

roikol

very nice work @rafaeldtinoco !

i like the approach and your protocol handling

go.mod

pkg/ebpf/c/tracee.bpf.c

pkg/events/derive/net_packet_dns.go

pkg/ebpf/c/tracee.bpf.c

yanivagman

First of all - great work @rafaeldtinoco! This will help us a lot in the future to develop network features in Tracee.

I only reviewed the BPF part since I didn't have the time to go over all the PR (hope to do it soon), but I thought it will be better not to wait for the full review before sending my current comments. Most of the comments are either nit or requests for clarifications, but others might require some extra work.

Since most of the BPF code here is new (and there's a lot of it), I wonder if this can be a good opportunity to start splitting the BPF code to different files. Do you think you can put the network BPF code in a separate file, or should we wait with this for a future PR? (totally fine with me, just asking, since the code is new, and mostly self contained).

pkg/ebpf/c/tracee.bpf.c

cmd/tracee-ebpf/capabilities.go

pkg/ebpf/c/tracee.bpf.c

pkg/ebpf/c/vmlinux.h

pkg/ebpf/probes/probes.go

rafaeldtinoco · 2022-10-11T02:05:07Z

Since most of the BPF code here is new (and there's a lot of it), I wonder if this can be a good opportunity to start splitting the BPF code to different files. Do you think you can put the network BPF code in a separate file, or should we wait with this for a future PR? (totally fine with me, just asking, since the code is new, and mostly self contained).

Yes, I have opened #2222 for it.

rafaeldtinoco · 2022-10-11T02:07:31Z

I still need to finish the capability NET_ADMIN needed for kernels <= 5.8 AND decide if cleaning the maps for sure in sched_proc_exit() is worth (for performance reasons) OR simply having the LRU maps (as I've done now) is good enough to prevent MAP exhaustion in case of map entry leaks (like we already had in the past, and also solved with LRU maps).

rafaeldtinoco · 2022-10-18T03:47:31Z

This PR is held until #2243 is merged.

and then until

#2233 is merged.

geyslan · 2022-10-18T19:52:53Z

@geyslan, remember to review this.

tests/e2e-net-test.sh

pkg/ebpf/c/missing_definitions.h

pkg/ebpf/c/vmlinux.h

cmd/tracee-ebpf/capabilities_test.go

pkg/ebpf/c/tracee.bpf.c

rafaeldtinoco · 2022-11-14T21:22:46Z

I discovered an issue with part of this code and I'm fixing (FYI).

rafaeldtinoco · 2022-11-24T17:13:13Z

In order to have this merged, I need #2378 merged first and this one upgrading types to that commit.

rafaeldtinoco · 2022-11-29T16:57:37Z

Discovered a bug by profiling the code:

The derive functions were always enabled due to wrong emit check:

With this change, the derive function only runs when needed.

rafaeldtinoco · 2022-11-30T14:21:18Z

Rebased after latest changes.

go.sum

pkg/ebpf/probes/cgroup.go

pkg/ebpf/probes/probes.go

pkg/ebpf/c/missing_definitions.h

yanivagman · 2022-11-30T14:04:26Z

pkg/events/events.go

+				{Type: "const char*", Name: "src"},
+				{Type: "const char*", Name: "dst"},
+				{Type: "u16", Name: "src_port"},
+				{Type: "u16", Name: "dst_port"},


nit: makes more sense to have the order be src_addr, src_port, dst_addr, dst_port, WDYT?

All other network events that don't have a port (L3) have 1st argument as src, 2nd argument as dst. I added a 3rd and 4th argument so they wouldn't be different in the first 2 arguments. If you still think it is better I can make the change.

signatures/helpers/helpers.go

josedonizetti

Tested on DO Kubernetes, and GKE today. Looking good!

yanivagman

OK, LGTM!
This is a huge and welcome change.
One last request before you merge. Since tests are failing on kernel 4.19, can you please manually check that your code works on this kernel? Also please manually run e2e tests so we won't need to fix anything after the merge if tests fail.

rafaeldtinoco · 2022-12-01T13:09:04Z

Since tests are failing on kernel 4.19, can you please manually check that your code works on this kernel?

We will have to investigate further for the code to work in 4.19. Unfortunately it seems that __sk_buff does not have a tstamp field, which I'm using to index the skb in between the cgroup/skb program (pre and post cgroup program, the technique for older kernels).

Checking if I can mitigate it...

rafaeldtinoco · 2022-12-01T14:10:17Z

Since tests are failing on kernel 4.19, can you please manually check that your code works on this kernel?

We will have to investigate further for the code to work in 4.19. Unfortunately it seems that __sk_buff does not have a tstamp field, which I'm using to index the skb in between the cgroup/skb program (pre and post cgroup program, the technique for older kernels).

Checking if I can mitigate it...

I was able to index the skb by "hash" instead of "tstamp" but then:

bpf_sk_fullsock() isn't available < 5.1. So I can return 0 if the type does not exist, but networking code won't work without it, since:

I might be able to use __skb_buf data and data_end pointers, but will require more work and tests.

rafaeldtinoco · 2022-12-01T14:57:48Z

I believe we have bigger problems in 4.19 for non CORE:

I'm able to compile CORE, but 4.19 won't work due to missing BTF. I'm putting ifdefs for not including the new network hooks for <= 5.0 so it compiles in 4.19 both CORE and non CORE.

Related: #2152

- preparation for the new networking code. Related: #2152

There are multiple ways to follow ingress/egress for a task. One way is to try to track all flows within network interfaces and keep a map of addresses tuples and translations. OR, sk_storage and socket cookies might help in understanding which sock/sk_buff context the bpf program is dealing with but, at the end, the need is always to tie a flow to a task (specially when hooking ingress skb bpf programs, when the current task is a kernel thread most of the times). Unfortunately that gets even more complicated in older kernels: the cgroup skb programs have almost no bpf helpers to use, and most of common code causes verifier to fail. With that in mind, this approach uses a technique of kprobing the function responsible for calling the cgroup/skb programs. All the work, that should be done by the cgroup/skb programs in the common case, in this case is done by this kprobe/kretprobe hook logic (right before and right after the cgroup/skb program runs). By doing that, all the data that cgroup/skb programs need to use is already placed in a map. Obviously this has some cons: this kprobe->cgroup/skb->kretprobe execution flow does not have preemption disabled, so the map used in between the 3 hooks need to use something that is available to all 3 of them. At the end, the logic is simple: every time a socket is created an inode is also created. The task owning the socket is indexed by the socket inode so everytime this socket is used we know which task it belongs to (specially during ingress hook). Related: #2152

1. Add support (event) for the following network protocols: - IPV4 - IPv6 - TCP - UDP - ICMP - ICMPv6 - DNS 2. Create a fake net_packet_base event so all needed probes and capabilities, for network events, are set in a single place. 3. Network cgroup probes should only be attached when events are selected. With the following added events: - NetPacketIPBase - NetPacketIPv4 - NetPacketIPv6 - NetPacketTCPBase - NetPacketTCP - NetPacketUDPBase - NetPacketUDP - NetPacketICMPBase - NetPacketICMP - NetPacketICMPv6Base - NetPacketICMP - NetPacketDNSBase - NetPacketDNS NOTE: All the "Base" events are raw network events (with a single argument being the packet payload) and internal only. They are used for deriving subsequent events depending on the base event. Example: A "NetPacketUDPBase" event has a "payload" argument, which is the "IP+UDP" header in "bytes". It will be derived into a "NetPacketUDP" event, with appropriate type (that can be used by signatures and so on). Related: #2152

For now these signatures are only meant to test the network protocol events, but, in near future, they might be able to test any existing event without the need for a real signature, for that particular event, to exist. For now one may execute the tests by doing: $ sudo ./dist/tracee-ebpf \ --output format:json \ --output option:parse-arguments \ --output option:detect-syscall \ --trace event=$(./dist/tracee-rules \ --rules-dir ./dist/e2e-net-rules \ --list-events) \ --trace comm=ping,nc | \ ./dist/tracee-rules \ --input-tracee file:stdin \ --input-tracee format:json \ --rules-dir ./dist/e2e-net-rules And observe network protocol event detections from the executions of scripts from "tests/e2e-net-rules/scripts/". Do not forget to run "setup.sh" for creating needed testing namespaces.

Instead of relying in ingress/egress flows containing a process context in order to obtain new sockets that are already connected, simply probe: - security_socket_sendmsg - security_socket_recvmsg This has also the advantage of always updating the inodemap (the socket <=> task context map) with new task context if it has recomputed its scope during should_trace()). Other socket operations functions might be added in the future, in order to create new entries in the inodemap (see "socket_file_ops" struct), if needed. Related: #2152

rafaeldtinoco self-assigned this Sep 28, 2022

rafaeldtinoco added the area/network label Sep 28, 2022

rafaeldtinoco added this to the v0.9.0-rc1 milestone Sep 28, 2022

rafaeldtinoco linked an issue Sep 28, 2022 that may be closed by this pull request

Redesign network events and capture features #2152

Closed

rafaeldtinoco requested review from yanivagman, roikol and NDStrahilevitz September 28, 2022 03:30

This comment was marked as outdated.

Sign in to view

NDStrahilevitz reviewed Oct 3, 2022

View reviewed changes

roikol reviewed Oct 9, 2022

View reviewed changes

yanivagman reviewed Oct 9, 2022

View reviewed changes

rafaeldtinoco mentioned this pull request Oct 10, 2022

Recalculate scope if needed in cgroup network programs #2221

Closed

rafaeldtinoco mentioned this pull request Oct 13, 2022

[REFACTOR] Cgroup Interface (cgroupv1 and cgroupv2 initialization) #2233

Merged

5 tasks

rafaeldtinoco changed the title ~~New network code with tests~~ [FEATURE] New network code with tests Oct 18, 2022

rafaeldtinoco mentioned this pull request Oct 18, 2022

[FEATURE][FIX] New capabilities with rings (replacing old capabilities feature) #2243

Merged

24 tasks

geyslan reviewed Oct 20, 2022

View reviewed changes

tests/e2e-net-test.sh Outdated Show resolved Hide resolved

geyslan reviewed Oct 20, 2022

View reviewed changes

pkg/ebpf/c/missing_definitions.h Show resolved Hide resolved

geyslan reviewed Oct 21, 2022

View reviewed changes

pkg/ebpf/c/vmlinux.h Outdated Show resolved Hide resolved

geyslan reviewed Oct 21, 2022

View reviewed changes

cmd/tracee-ebpf/capabilities_test.go Outdated Show resolved Hide resolved

geyslan reviewed Oct 21, 2022

View reviewed changes

pkg/ebpf/c/tracee.bpf.c Outdated Show resolved Hide resolved

yanivagman modified the milestones: v0.9.0-rc1, v.0.10.0 Oct 26, 2022

josedonizetti self-requested a review November 9, 2022 14:41

rafaeldtinoco requested a review from yanivagman November 23, 2022 12:56

rafaeldtinoco mentioned this pull request Nov 24, 2022

types: add network protocol events types #2378

Merged

5 tasks

rafaeldtinoco added the PR/needs-review label Nov 25, 2022

yanivagman reviewed Nov 30, 2022

View reviewed changes

josedonizetti reviewed Nov 30, 2022

View reviewed changes

yanivagman approved these changes Dec 1, 2022

View reviewed changes

rafaeldtinoco added 12 commits December 1, 2022 16:35

types: upgrade types to network types

824ffe3

probes: add support for cgroup probes

8d7519e

Related: #2152

vmlinux.h: adjust needed types and missing definitions

514fe85

- preparation for the new networking code. Related: #2152

e2e-net-rules: helpers to signature to be reused

65ed7f3

pkg/cgroup: allow to get Cgroup by version

3036203

probes: use cgroup interface for cgroup probe attachment

8caa531

tests: fix e2e-net-test for exit cleanup

a3f0b9d

network: new network code fix for vanilla v4.19

6687a32

rafaeldtinoco merged commit b67fc02 into aquasecurity:main Dec 1, 2022

rafaeldtinoco deleted the new-network-code-with-tests branch December 1, 2022 18:47

[FEATURE] New network code with tests #2200

[FEATURE] New network code with tests #2200

Uh oh!

Conversation

Uh oh!

Initial Checklist

Description (git log)

Type of change

How Has This Been Tested?

Final Checklist:

Git Log Checklist:

Extra Info

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment