Introducing decoder pkg #1405

mtcherni95 · 2022-01-26T15:56:07Z

Hi, after running some benchmarks on Tracee, @NDStrahilevitz and I saw that the processEvents logic spends a lot of its compute time in the binary.Read() function. Please see the flame graph:

After exploring the internals of binary.Read(), we saw that it uses the reflect package, which hurts program efficiency.
Moreover, the binary package description says: "[...] This package favors simplicity over efficiency. Clients that require high-performance serialization, especially for large data structures, should look at more advanced solutions [...]".

The intent of this PR is to favor efficiency over simplicity.

A decoder package is introduced.

Similarly to binary, the decoder reads a sequence of bytes sent from the Tracee eBPF program to user space and translates them into Go structs defined in the protocol package.

Here is a screenshot from a benchmark test that compares the efficiency of the new decoder package with binary.Read operations.

The decoder package's translation scope is much smaller than that of binary.Read: binary.Read accepts an interface, while the decoder translates only to specific data structures. However, the decoder is much (much!) more efficient.

Solves: #1407

tracee-ebpf/tracee/argprinters.go

yanivagman

That's awesome @mtcherni95!
It seems like this will be a big performance improvement.

pkg/external/external.go

tracee-ebpf/tracee/internal/bufferdecoder/protocol.go

tracee-ebpf/tracee/process_tree.go

tracee-ebpf/tracee/write_capture.go

yanivagman · 2022-02-01T09:04:57Z

tracee-ebpf/tracee/internal/bufferdecoder/protocol_test.go

+func TestContextSize(t *testing.T) {
+	var v Context
+	assert.Equal(t, int(v.GetSizeBytes()), 104)
+}
+func TestChunkMetaSize(t *testing.T) {
+	var v ChunkMeta
+
+	assert.Equal(t, int(v.GetSizeBytes()), 45)
+}
+
+func TestVfsWriteMetaSize(t *testing.T) {
+	var v VfsWriteMeta
+	assert.Equal(t, int(v.GetSizeBytes()), 20)
+}
+
+func TestKernelModuleMetaSize(t *testing.T) {
+	var v KernelModuleMeta
+	assert.Equal(t, int(v.GetSizeBytes()), 24)
+}
+
+func TestMprotectWriteMetaSize(t *testing.T) {
+	var v MprotectWriteMeta
+	assert.Equal(t, int(v.GetSizeBytes()), 8)
+}


The number of bytes of each struct is hardcoded here, as well as in the GetSizeBytes() function, so do these tests really test something?

@yanivagman please see this #1405 (comment)
If you agree, then in the tests I will use unsafe.Sizeof to get the exact size of the struct and assert that it equals the hardcoded value.

yanivagman · 2022-02-01T10:15:57Z

tracee-ebpf/tracee/internal/bufferdecoder/decoder.go

+	ctx.Retval = int64(binary.LittleEndian.Uint64(decoder.buffer[offset+88 : offset+96]))
+	ctx.StackID = binary.LittleEndian.Uint32(decoder.buffer[offset+96 : offset+100])
+	ctx.Argnum = uint8(binary.LittleEndian.Uint16(decoder.buffer[offset+100 : offset+102]))
+	decoder.cursor += ctx.GetSizeBytes()


Do we really need all of those GetSizeBytes() functions? To me it seems that they give us no real benefit, as the exact byte offsets are hardcoded here anyway. In addition, we will have to keep those functions correct whenever we update one of the structs, which increases the chance of future programming mistakes.
If the problem they solve is padding (for example, Context is of size 104 while we read only up to byte 102 here), we can just add a padding field to the relevant structs (for example, two more bytes of padding at the end of Context).

Yep, I agree with @yanivagman. I just left a comment about the hardcoded values, asking whether we could get the arg types' sizes during initialization (using reflect type size or something like that) and keep them constant... even if we need to adjust padding on the affected structs...

@yanivagman, I'm fine with removing the GetSizeBytes() functions. I wouldn't add extra data to the structs just to expose their size; I would simply hardcode it.
@rafaeldtinoco I prefer not to use reflection as it negatively affects efficiency (binary.Read uses it, and that's the whole issue with it).

We could still have GetSizeBytes() return a hardcoded value, and in the tests (https://github.com/aquasecurity/tracee/pull/1405/files#diff-242d9e577032010728b6349d850cb4bcb8c517923dd77d50ad684732b43c23efR31) use unsafe.Sizeof to calculate the struct size (we just need to take into account the padding added by the compiler, when the struct is created, to make it a multiple of 8 bytes). This way the test is meaningful, as it will fail if someone changes the structs without updating the decoder, and GetSizeBytes() has its purpose.
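A minimal sketch of that check, using a hypothetical exampleMeta struct rather than one of the real protocol types. It assumes a 64-bit platform and a struct with no interior padding, so that rounding the hardcoded wire size up to a multiple of 8 matches unsafe.Sizeof. The alignUp8 and checkSize helpers are made up for the example; a real test would use the assertion style already present in protocol_test.go.

```go
package main

import (
	"fmt"
	"unsafe"
)

// exampleMeta is a hypothetical struct, not one of Tracee's protocol types.
// Its fields occupy 8 + 4 + 4 + 4 = 20 bytes on the wire.
type exampleMeta struct {
	Inode uint64
	DevID uint32
	Mode  uint32
	Pid   uint32
}

// GetSizeBytes returns the hardcoded wire size, so nothing is computed on
// the hot path.
func (exampleMeta) GetSizeBytes() uint32 { return 20 }

// alignUp8 rounds n up to the next multiple of 8, mirroring the trailing
// padding the compiler adds to a struct with 8-byte alignment.
func alignUp8(n uint32) uint32 { return (n + 7) &^ 7 }

// checkSize returns an error when the hardcoded size no longer matches the
// struct definition, for example after a field was added.
func checkSize() error {
	var v exampleMeta
	want := uint32(unsafe.Sizeof(v))
	if got := alignUp8(v.GetSizeBytes()); got != want {
		return fmt.Errorf("hardcoded size %d (aligned %d) != unsafe.Sizeof %d",
			v.GetSizeBytes(), got, want)
	}
	return nil
}

func main() {
	if err := checkSize(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

If a field is added to exampleMeta without updating GetSizeBytes(), checkSize starts returning an error, which is the failure mode the test is meant to catch.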

@rafaeldtinoco I prefer not to use reflection as it negatively affects efficiency (binary.Read uses it, and that's the whole issue with it).

Yep, it was just a way to suggest something other than hardcoding here, possibly calculated as a const at startup. It does not necessarily need to be reflect.
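One way to get a non-hardcoded constant without reflect is unsafe.Sizeof, which the Go compiler evaluates at compile time for fixed-size types. The record struct and recordSize name below are hypothetical. Note that this gives the in-memory size including any compiler padding, which may differ from the number of bytes sent on the wire.

```go
package main

import (
	"fmt"
	"unsafe"
)

// record is a hypothetical struct used only for this example.
type record struct {
	Ts  uint64
	Pid uint32
	Tid uint32
}

// recordSize is a compile-time constant: no reflection and no runtime cost.
const recordSize = int(unsafe.Sizeof(record{}))

func main() {
	// Being a constant, recordSize can be used as an array length.
	var buf [recordSize]byte
	fmt.Println(recordSize, len(buf))
}
```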

=== RUN TestVfsWriteMetaSize
protocol_test.go:31:
Error Trace: protocol_test.go:31
Error: Not equal:
expected: 28
actual : 20
Test: TestVfsWriteMetaSize
--- FAIL: TestVfsWriteMetaSize (0.00s)
=== RUN TestKernelModuleMetaSize
--- PASS: TestKernelModuleMetaSize (0.00s)
=== RUN TestMprotectWriteMetaSize
--- PASS: TestMprotectWriteMetaSize (0.00s)
FAIL
FAIL github.com/aquasecurity/tracee/tracee-ebpf/tracee/internal/bufferdecoder 0.005s

This is what I get after adding a variable to the arg type. I'm fine with that, since the PR checks will catch the issue and allow the fix to happen (and I can understand why hardcoding the size gives better performance).

tracee-ebpf/tracee/internal/bufferdecoder/decoder.go

rafaeldtinoco

@mtcherni95 Overall I think this is a very nice change, and I like that it was done with minimal changes to the parsing logic, etc. I left some comments on top of Yaniv's review, which I think mostly address the same concerns I had (plus some other nit fixes).

tracee-ebpf/tracee/internal/bufferdecoder/protocol.go

tracee-ebpf/tracee/internal/bufferdecoder/decoder.go

rafaeldtinoco · 2022-02-02T04:02:58Z

tracee-ebpf/tracee/internal/bufferdecoder/decoder.go

+	ctx.Retval = int64(binary.LittleEndian.Uint64(decoder.buffer[offset+88 : offset+96]))
+	ctx.StackID = binary.LittleEndian.Uint32(decoder.buffer[offset+96 : offset+100])
+	ctx.Argnum = uint8(binary.LittleEndian.Uint16(decoder.buffer[offset+100 : offset+102]))
+	decoder.cursor += ctx.GetSizeBytes()


Yep, I agree with @yanivagman. I just left a comment about the hardcoded values, asking whether we could get the arg types' sizes during initialization (using reflect type size or something like that) and keep them constant... even if we need to adjust padding on the affected structs...

rafaeldtinoco · 2022-02-02T19:25:25Z

tracee-ebpf/tracee/internal/bufferdecoder/decoder_test.go

+	assert.Equal(t, nil, err)
+	// checking decoding succeeded correctly
+	assert.Equal(t, ctxExpected, ctxObtained)
+	// checking decoder cursor on buffer moved approiately


nit: spelling of approiately (multiple spelling issues because of copy/paste)

rafaeldtinoco · 2022-02-02T20:00:23Z

tracee-ebpf/tracee/internal/bufferdecoder/decoder.go

+	ctx.Retval = int64(binary.LittleEndian.Uint64(decoder.buffer[offset+88 : offset+96]))
+	ctx.StackID = binary.LittleEndian.Uint32(decoder.buffer[offset+96 : offset+100])
+	ctx.Argnum = uint8(binary.LittleEndian.Uint16(decoder.buffer[offset+100 : offset+102]))
+	decoder.cursor += ctx.GetSizeBytes()


=== RUN TestVfsWriteMetaSize
protocol_test.go:31:
Error Trace: protocol_test.go:31
Error: Not equal:
expected: 28
actual : 20
Test: TestVfsWriteMetaSize
--- FAIL: TestVfsWriteMetaSize (0.00s)
=== RUN TestKernelModuleMetaSize
--- PASS: TestKernelModuleMetaSize (0.00s)
=== RUN TestMprotectWriteMetaSize
--- PASS: TestMprotectWriteMetaSize (0.00s)
FAIL
FAIL github.com/aquasecurity/tracee/tracee-ebpf/tracee/internal/bufferdecoder 0.005s

This is what I get after adding a variable to the arg type. I'm fine with that, since the PR checks will catch the issue and allow the fix to happen (and I can understand why hardcoding the size gives better performance).

yanivagman

LGTM!

mtcherni95 requested a review from yanivagman January 26, 2022 15:59

mtcherni95 force-pushed the ebpf-decoder branch from e3f1515 to e39a68d Compare January 26, 2022 16:00

mtcherni95 requested a review from rafaeldtinoco January 26, 2022 16:05

mtcherni95 force-pushed the ebpf-decoder branch 3 times, most recently from 09504db to c502272 Compare January 27, 2022 11:05

yanivagman linked an issue Jan 27, 2022 that may be closed by this pull request

binary.Read efficiency overhead #1407

Closed

mtcherni95 force-pushed the ebpf-decoder branch from ff912c9 to 1375f79 Compare January 30, 2022 14:17

mtcherni95 added the area/performance label Jan 30, 2022

mtcherni95 added this to the v0.7.0 milestone Jan 30, 2022

mtcherni95 commented Jan 30, 2022

tracee-ebpf/tracee/argprinters.go

yanivagman reviewed Feb 1, 2022

rafaeldtinoco reviewed Feb 2, 2022

mtcherni95 force-pushed the ebpf-decoder branch 2 times, most recently from fb08aaf to 0ea2b74 Compare February 2, 2022 15:30

mtcherni95 self-assigned this Feb 2, 2022

mtcherni95 force-pushed the ebpf-decoder branch 2 times, most recently from c7907bf to 22b50b3 Compare February 2, 2022 18:06

rafaeldtinoco reviewed Feb 2, 2022

mtcherni95 force-pushed the ebpf-decoder branch 4 times, most recently from 59c2d95 to d239924 Compare February 3, 2022 09:43

yanivagman approved these changes Feb 3, 2022

Introducing decoder pkg

ab5807e

mtcherni95 force-pushed the ebpf-decoder branch from d239924 to ab5807e Compare February 3, 2022 12:29

mtcherni95 merged commit ee5b176 into aquasecurity:main Feb 3, 2022
