-
Notifications
You must be signed in to change notification settings - Fork 449
[FEAT] log ebpf errors #2352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] log ebpf errors #2352
Conversation
b170072
to
998bb75
Compare
998bb75
to
baa6fe9
Compare
baa6fe9
to
49035d9
Compare
This comment was marked as resolved.
This comment was marked as resolved.
The |
5b0cccf
to
39538a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
8f6bafb
to
129460c
Compare
I had to reduce |
One thing that we should be aware of when adding a new bpf error is the following: We should consider adding a mechanism that will turn off the submission and logging of such errors when they happen too frequently (e.g. more than 10 errors per second), and re enable it some time later |
Maybe we should have log levels set (possibly with separate perf buffers too?), and then we set a log level from userspace and in ebpf we have some method |
25afc9d
to
9b7c8d9
Compare
76db381
to
d917403
Compare
237b85d
to
7910130
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Finally going through this now (and will merge it right after). |
if (likely(counter != NULL)) { | ||
ts_prev = counter->ts; // store previous ts | ||
|
||
counter->count += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're changing a pointee value on the map without an atomic operation. I believe this is racy among different CPUs (as the "errors_count" map isn't a per CPU map). So, you might need either to copy the counter to the stack, increment and update the key with the new stack values (the key update is atomic) OR use atomic[64][fetch]add() for the pointer (not sure if that works for a map pointer, need to check).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe, due to the nature of this counter, there is no problem if we loose a few increments (due to 2 cpus copying the values to the stack at the same time and both updating the key values with their stack values). With that, it would be just a question of memcping the pointer to the stack, incrementing and updating the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not doing that might get us into a situation where count is in diff cachelines, and it gets badly corrupted, not that would impact too much, but, still...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Geyslan made a good point that part of the key to this map is the cpu id.
In that case, it is not possible that two different CPUs will update the same counter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it is a good point (haven't payed attention to it). Nice work @geyslan !!
Introduce EBPF error control mechanism - submission via perf buffer. EBPF errors are aggregated before being submitted and are later processed and logged by the processBPFErrors() go routine. This starts the use of bpf error mechanism through some type checking like BPF_ERR_MAP_UPDATE_ELEM, BPF_ERR_GET_CURRENT_COMM and BPF_ERR_INIT_CONTEXT.
Due ChanErrors removal aquasecurity#2403, warn about it.
7910130
to
cfd8611
Compare
Description (git log)
commit cfd8611
commit ea4cac9
commit ff4c077
Fixes: #2174
Type of change
How Has This Been Tested?
Toggled a checking (falsed an error) in the ebpf side to be able to send the error and get it in user space.