Description
CPU utilization is an important metric in network benchmarks once we reach line rate and can no longer compare throughput, which is often the case with TCP stream tests. This was the situation when comparing performance before and after the Meltdown/Spectre fixes.
`rushit` currently measures CPU usage with the `getrusage()` API, like `iperf3` does. We collect samples in two ways (a sketch of both follows the list):
1. For the whole process - once at the start and once at the end of the test run. These samples are printed in the output for the test run:

   ```
   utime_start=0.019308
   utime_end=0.568190
   stime_start=0.000964
   stime_end=7.634470
   ```
2. For each network thread, at regular intervals throughout the test run (interval set via the `-I` option). This gets reported together with the sample dump (`-A` option):

   ```
   $ cat samples.csv | awk -F, '{ print $10, $11 }' | column -t
   utime     stime
   0.081938  0.697616
   0.158586  1.397109
   0.235467  2.103314
   0.305231  2.856134
   0.379570  3.569675
   0.447578  4.324109
   0.526333  5.069617
   0.586349  5.832823
   0.655824  6.564788
   0.717896  7.287646
   ```
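To illustrate, here is a minimal, self-contained sketch of how both kinds of samples can be collected with `getrusage()`. `RUSAGE_THREAD` is a Linux-specific extension; this is not rushit's actual code, just an approximation of the mechanism:

```c
/* Sketch of CPU time sampling via getrusage(); not rushit's code. */
#define _GNU_SOURCE /* for RUSAGE_THREAD */
#include <stdio.h>
#include <sys/resource.h>

static double tv_to_sec(struct timeval tv)
{
	return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
	struct rusage ru;

	/* RUSAGE_SELF covers the whole process (samples in item 1 above). */
	if (getrusage(RUSAGE_SELF, &ru) == 0)
		printf("process: utime=%f stime=%f\n",
		       tv_to_sec(ru.ru_utime), tv_to_sec(ru.ru_stime));

	/* RUSAGE_THREAD (Linux-specific) covers only the calling thread,
	 * as needed for the per-network-thread samples in item 2 above. */
	if (getrusage(RUSAGE_THREAD, &ru) == 0)
		printf("thread:  utime=%f stime=%f\n",
		       tv_to_sec(ru.ru_utime), tv_to_sec(ru.ru_stime));

	return 0;
}
```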
Other tools measure it differently. For instance, `netperf` samples `/proc/stat`, while `fio` runs an idle thread and measures how much work it can get done. Cgroups also offer a means of tracking used CPU time.
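For comparison, a rough sketch of the `/proc/stat` approach: read the aggregate `cpu` line twice and compute busy/total over the interval. The field layout follows proc(5); this does not mirror netperf's actual implementation:

```c
/* Sketch of system-wide CPU utilization from /proc/stat. */
#include <stdio.h>
#include <unistd.h>

static int read_cpu_times(unsigned long long *busy, unsigned long long *total)
{
	unsigned long long user, nice, system, idle, iowait, irq, softirq;
	FILE *f = fopen("/proc/stat", "r");

	if (!f)
		return -1;
	/* First line aggregates all CPUs: cpu user nice system idle ... */
	if (fscanf(f, "cpu %llu %llu %llu %llu %llu %llu %llu",
		   &user, &nice, &system, &idle, &iowait, &irq,
		   &softirq) != 7) {
		fclose(f);
		return -1;
	}
	fclose(f);

	*busy = user + nice + system + irq + softirq;
	*total = *busy + idle + iowait;
	return 0;
}

int main(void)
{
	unsigned long long busy0, total0, busy1, total1;

	if (read_cpu_times(&busy0, &total0))
		return 1;
	sleep(1);
	if (read_cpu_times(&busy1, &total1))
		return 1;

	printf("cpu utilization: %.2f%%\n",
	       100.0 * (busy1 - busy0) / (double)(total1 - total0));
	return 0;
}
```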
We should evaluate whether the currently implemented CPU utilization accounting is useful for the user, and, if needed, change it or provide alternative methods.
Whichever method we choose, we should also inform the user when the measurement cannot be relied on, e.g. when power-saving or CPU frequency scaling mechanisms are enabled.
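One hypothetical way to detect this on Linux is to inspect the cpufreq scaling governor via sysfs. The path, the "performance" heuristic, and the warning below are assumptions for illustration, not an existing rushit feature:

```c
/* Hypothetical check: warn if CPU frequency scaling may skew results. */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char governor[64] = "";
	FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
			"r");

	if (!f) {
		/* No cpufreq interface exposed; nothing to warn about. */
		return 0;
	}
	if (fgets(governor, sizeof(governor), f))
		governor[strcspn(governor, "\n")] = '\0';
	fclose(f);

	if (strcmp(governor, "performance") != 0)
		fprintf(stderr,
			"warning: scaling governor is '%s'; CPU utilization "
			"numbers may be unreliable\n", governor);
	return 0;
}
```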
Requested by @jbenc.