Description
The Problem
We observed cases where events arriving from tracee-ebpf are not sorted by their timestamps: events that occurred earlier arrive on the channel after events that occurred later.
This is problematic because consumers that rely on stateful events can't trust the receive order and have to sort the events themselves, which is not an easy task.
Suspected Reason
Several sources report that perf buffers can deliver events from the kernel out of order (example).
This follows from the way perf buffers work: each CPU has its own buffer, and all the buffers are polled through the epoll interface. Epoll helps handle multiple I/O sources, and the ready buffers end up being read one after another in round-robin order.
This means that if the write speed is greater than the read speed, events start to pile up in the per-CPU buffers. If one CPU writes events faster than another, the Nth pending event in one buffer was not produced at the same time as the Nth pending event in the other, and may only line up in time with the other's (N+e)th event.
As a result, when the events are read from the buffers in round-robin order, they are not sorted by timestamp.
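The effect described above can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: `Event`, `backlog`, `drainRoundRobin` and `countOutOfOrder` are hypothetical names invented for illustration, the timestamps are made up, and a fixed `batch` per turn stands in for whatever the real reader drains on each wake-up. It does not touch the actual perf buffer or epoll APIs.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a tracee event.
type Event struct {
	CPU       int
	Timestamp int // logical time at which the kernel produced the event
}

func (e Event) String() string {
	return fmt.Sprintf("cpu%d@%d", e.CPU, e.Timestamp)
}

// backlog returns made-up per-CPU buffers that have piled up because the
// reader is slower than the writers. CPU 0 writes faster than CPU 1.
func backlog() [][]Event {
	return [][]Event{
		{{0, 10}, {0, 20}, {0, 30}, {0, 40}, {0, 50}, {0, 60}},
		{{1, 15}, {1, 45}},
	}
}

// drainRoundRobin visits the buffers in turn and takes up to batch events
// from each one, until every buffer is empty.
func drainRoundRobin(buffers [][]Event, batch int) []Event {
	pending := make([][]Event, len(buffers))
	copy(pending, buffers) // keep the caller's slice headers untouched
	var out []Event
	for {
		progressed := false
		for cpu := range pending {
			n := batch
			if n > len(pending[cpu]) {
				n = len(pending[cpu])
			}
			if n > 0 {
				progressed = true
			}
			out = append(out, pending[cpu][:n]...)
			pending[cpu] = pending[cpu][n:]
		}
		if !progressed {
			return out
		}
	}
}

// countOutOfOrder counts adjacent pairs whose timestamps go backwards.
func countOutOfOrder(events []Event) int {
	n := 0
	for i := 1; i < len(events); i++ {
		if events[i].Timestamp < events[i-1].Timestamp {
			n++
		}
	}
	return n
}

func main() {
	out := drainRoundRobin(backlog(), 1)
	fmt.Println(out)
	fmt.Println("out-of-order pairs:", countOutOfOrder(out))
}
```

With this backlog the events are received with timestamps 10, 15, 20, 45, 30, 40, 50, 60: the second event of the slower CPU (45) is read before the third event of the faster CPU (30), even though each individual buffer is internally ordered.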
Available Solutions
- Datadog has a sorting object that sorts arriving events and forwards only those whose timestamps are older than some delay. This lets them guarantee the order of the events to some extent. They sort the events with a heap (a slice treated as a heap).
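
The general idea of a delay-based, heap-backed reorder buffer can be sketched with Go's standard `container/heap` package. This is not Datadog's code: `Event`, `Reorderer`, `NewReorderer`, `Add` and `Release` are hypothetical names, and the delay value and the use of a caller-supplied `now` are assumptions made to keep the example small.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Event is a hypothetical stand-in for a tracee event.
type Event struct {
	Timestamp uint64 // e.g. nanoseconds reported by the kernel
}

func (e Event) String() string {
	return fmt.Sprintf("event@%d", e.Timestamp)
}

// eventHeap is a slice treated as a min-heap ordered by Timestamp.
type eventHeap []Event

func (h eventHeap) Len() int            { return len(h) }
func (h eventHeap) Less(i, j int) bool  { return h[i].Timestamp < h[j].Timestamp }
func (h eventHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *eventHeap) Push(x interface{}) { *h = append(*h, x.(Event)) }
func (h *eventHeap) Pop() interface{} {
	old := *h
	n := len(old)
	e := old[n-1]
	*h = old[:n-1]
	return e
}

// Reorderer holds events back for delay time units before releasing them.
type Reorderer struct {
	delay   uint64
	pending eventHeap
}

func NewReorderer(delay uint64) *Reorderer {
	return &Reorderer{delay: delay}
}

// Add stores an event in the heap in O(log n).
func (r *Reorderer) Add(e Event) {
	heap.Push(&r.pending, e)
}

// Release returns, in ascending timestamp order, every pending event that
// is at least delay older than now. Newer events stay buffered.
func (r *Reorderer) Release(now uint64) []Event {
	var out []Event
	for r.pending.Len() > 0 && r.pending[0].Timestamp+r.delay <= now {
		out = append(out, heap.Pop(&r.pending).(Event))
	}
	return out
}

func main() {
	r := NewReorderer(10)
	for _, ts := range []uint64{10, 15, 20, 45, 30, 40} {
		r.Add(Event{Timestamp: ts})
	}
	fmt.Println(r.Release(50))  // events with timestamp <= 40, sorted
	fmt.Println(r.Release(100)) // the remaining event
}
```

The ordering guarantee is only as strong as the delay: an event that arrives later than `delay` after its own timestamp can still be released out of order, which is why this approach guarantees the order only "to some extent".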