cpu plugin: add linux-specific "guest" states. #2439

xavierog · 2017-09-21T21:29:30Z

Hello,

The "cpu" plugin currently lacks two Linux-specific CPU states:

guest (since Linux 2.6.24) Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
guest_nice (since Linux 2.6.33) Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel).

(excerpts from procfs(5))

This patch brings support for guest and guest_nice CPU states into collectd. It lacks testing on non-Linux systems though. Feedback on whatever can be improved would be greatly appreciated.

rpv-tomsk · 2017-09-22T04:32:44Z

Hi!

As far I remember my investigations on this, guest and guest_nice time also accounted in user and nice values. In current state of cpu plugin, reported values in sum are nearby to 100%.
So, the user and nice should be reduced appropriately.

An example.

# uname -a
Linux xxx 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
# uptime
 04:16:18 up 1783 days, 21:08,  1 user,  load average: 0.30, 0.20, 0.14
# cat /proc/stat
cpu  23444831569 0 12909031514 89822723313 119722820 396457 1327218429 0 20301531579
cpu0 2994249468 0 1966516348 9971887995 53197509 396457 593614204 0 2310174506
cpu1 3671834014 0 1633443143 10562045363 11194735 0 128315878 0 3368986106
cpu2 3421247569 0 1575312753 10848305097 8955271 0 120350269 0 3126493824
cpu3 3038677912 0 1498977328 11314302612 6545621 0 112337605 0 2746774831
cpu4 3502103600 0 2156777287 10359483180 26244217 0 127265540 0 2949443788
cpu5 2588314656 0 1481738425 11796379451 5276000 0 97330483 0 2240763230
cpu6 2256636701 0 1359755391 12280178936 4545391 0 80678984 0 1915469013
cpu7 1971767647 0 1236510835 12690140674 3764072 0 67325463 0 1643426278

That machine is a server running several KVM VM, no other load inside the OS itself.

Let's look for a first cpu counters.

cpu0 
2994249468   user
0            nice
1966516348   system
9971887995   idle
53197509     iowait
396457       irq
593614204    softirq
0            steal
2310174506   guest

While guest not reported, first 8 values in sum will give the value, comparable with server uptime.

The sum of first 8 values: 155 798 619 81
Uptime in seconds*100: 154 137 600 00

An error between these values is about 1% for ~5 years of uptime.

So, that sum can be correctly used as 100% of CPU when percentage is calculated / on stacked charts.

This also confirms what guest value also accounted in user value.
That must be taken into account.

But if we start to reduce user and nice, that will be backward-incompatible change for people who do not ready for a such. May be that should be tunable?

rpv-tomsk · 2017-09-22T04:34:40Z

Mine 1.5 years-old version of this patch is below. That is not used by me, for unknown reasons.

Sorry, I can't remember all the details about.

--- a/src/cpu.c
+++ b/src/cpu.c
@@ -97,8 +97,10 @@
 #define COLLECTD_CPU_STATE_SOFTIRQ 6
 #define COLLECTD_CPU_STATE_STEAL 7
 #define COLLECTD_CPU_STATE_IDLE 8
-#define COLLECTD_CPU_STATE_ACTIVE 9 /* sum of (!idle) */
-#define COLLECTD_CPU_STATE_MAX 10 /* #states */
+#define COLLECTD_CPU_STATE_GUEST 9
+#define COLLECTD_CPU_STATE_GUEST_NICE 10
+#define COLLECTD_CPU_STATE_ACTIVE 11 /* sum of (!idle) */
+#define COLLECTD_CPU_STATE_MAX 12 /* #states */

 #if HAVE_STATGRAB_H
 # include <statgrab.h>
@@ -124,6 +126,8 @@ static const char *cpu_state_names[] = {
        "softirq",
        "steal",
        "idle",
+       "guest",
+       "guest_nice",
        "active"
 };

@@ -610,7 +614,7 @@ static int cpu_read (void)
        FILE *fh;
        char buf[1024];

-       char *fields[9];
+       char *fields[11];
        int numfields;

        if ((fh = fopen ("/proc/stat", "r")) == NULL)
@@ -628,26 +632,62 @@ static int cpu_read (void)
                if ((buf[3] < '0') || (buf[3] > '9'))
                        continue;

-               numfields = strsplit (buf, fields, 9);
+               numfields = strsplit (buf, fields, 11);
                if (numfields < 5)
                        continue;

                cpu = atoi (fields[0] + 3);

-               cpu_stage (cpu, COLLECTD_CPU_STATE_USER,   (derive_t) atoll(fields[1]), now);
-               cpu_stage (cpu, COLLECTD_CPU_STATE_NICE,   (derive_t) atoll(fields[2]), now);
+               if (numfields < 10)
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_USER,   (derive_t) atoll(fields[1]), now);
+
+               if (numfields < 11)
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_NICE,   (derive_t) atoll(fields[2]), now);
+
                cpu_stage (cpu, COLLECTD_CPU_STATE_SYSTEM, (derive_t) atoll(fields[3]), now);
                cpu_stage (cpu, COLLECTD_CPU_STATE_IDLE,   (derive_t) atoll(fields[4]), now);

-               if (numfields >= 8)
-               {
-                       cpu_stage (cpu, COLLECTD_CPU_STATE_WAIT,      (derive_t) atoll(fields[5]), now);
-                       cpu_stage (cpu, COLLECTD_CPU_STATE_INTERRUPT, (derive_t) atoll(fields[6]), now);
-                       cpu_stage (cpu, COLLECTD_CPU_STATE_SOFTIRQ,   (derive_t) atoll(fields[7]), now);
+               if (numfields < 8)
+                       continue;
+
+               cpu_stage (cpu, COLLECTD_CPU_STATE_WAIT,      (derive_t) atoll(fields[5]), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_INTERRUPT, (derive_t) atoll(fields[6]), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_SOFTIRQ,   (derive_t) atoll(fields[7]), now);
+
+               if (numfields < 9)
+                       continue;

-                       if (numfields >= 9)
-                               cpu_stage (cpu, COLLECTD_CPU_STATE_STEAL, (derive_t) atoll(fields[8]), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_STEAL, (derive_t) atoll(fields[8]), now);
+
+               if (numfields < 10)
+                       continue;
+
+               /* the magic */
+               int status;
+               cpu_state_t *prev_user;
+               cpu_state_t *prev_guest;
+
+               status = cpu_states_alloc (cpu);
+               if (status != 0)
+                       continue;    //cpu_states_alloc() failed
+
+               prev_user  = get_cpu_state (cpu, COLLECTD_CPU_STATE_USER);
+               prev_guest = get_cpu_state (cpu, COLLECTD_CPU_STATE_GUEST);
+
+               if ( (atoll(fields[1]) - atoll(fields[9])) < (prev_user->conv.last_value.derive - prev_guest->conv.last_value.derive) ) {
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_USER, 0, now);
+               }
+               else {
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_USER,  (derive_t)(atoll(fields[1]) - atoll(fields[9])), now);
                }
+
+               cpu_stage (cpu, COLLECTD_CPU_STATE_GUEST, (derive_t) atoll(fields[9]), now);
+
+               if (numfields < 11)
+                       continue;
+
+               cpu_stage (cpu, COLLECTD_CPU_STATE_NICE,       (derive_t)(atoll(fields[2]) - atoll(fields[10])), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_GUEST_NICE, (derive_t) atoll(fields[10]), now);
        }
        fclose (fh);
 /* }}} #endif defined(KERNEL_LINUX) */

xavierog · 2017-09-23T17:23:10Z

As suggested, I have added options so this change does not feel like a regression to anyone:

guest and guest_nice will be reported only if reportGuestState (which defaults to false) is set to true
if so, subtractGuestState will by default subtract guest and guest_nice from user and nice. Those who are interested in guest and guest_nice but do no not intend to stack them on top of the other CPU metrics can set it to false.

rpv-tomsk · 2017-09-23T17:26:16Z

src/cpu.c

  int numfields;
+  long long user_value, nice_value, value;


Hi!

In current codestyle variable should be declared when it used at first time, if that is possible.

Thanks for your work on this!

Ok, adjusted :)

octo

Hi @xavierog, thank you very much for your patch!

In addition to the inline comments, if you make a case for keeping one or both config options, please also add them to src/collectd.conf.in.

Best regards,
—octo

octo · 2017-09-23T18:39:38Z

src/collectd.conf.pod

+When set to B<true>, reports the "guest" and "guest_nice" CPU states.
+Defaults to B<false>.
+
+=item B<SubtractGuestState> B<false>|B<true>


Having two options to control this behavior feels a bit like premature optimization. In other words, are users actually going to use those options? (This is an honest question, not a rhetorical one; I know nothing about the new guest counters ;)

I'd prefer to implement the behavior that ReportGuestState true; SubtractGuestState true has and implement config options only if (and when) users ask for it.

I have implemented these options to address the following concern:

But if we start to reduce user and nice, that will be backward-incompatible change for people who do not ready for a such. May be that should be tunable?

I believe having these options keeps you safe from any angry user, as they enable end users to revert to whatever old behaviour they were used to. So I would argue that, yes, indeed, some people will use them. Specifically, these counters are notoriously useless within VMs: people with 10 000+ VMs at $RandomCloudCompany may not want to store 20 000+ extra useless metrics. I may even use ReportGuestState myself, depending on the host I deploy collectd to :)

However, we surely can change the default values. The actual question is: what is the policy of the collectd project regarding new metrics within existing plugins?

Conservative? ReportGuestState false; SubtractGuestState true

Progressive? ReportGuestState true; SubtractGuestState true

Prone to delegate computations to query and graphing tools? ReportGuestState true; SubtractGuestState false

The choice is yours :)

Conservative: New versions need to be backwards compatible (with the exception of new major versions). Backwards compatibility is defined by the user observable behavior, i.e. the previous config file needs to still work and all metrics that were reported previously need to still be reported.

I'd consider this a bit of an edge case, sine the previous metrics are still present. But it sounds like having users make a conscious decision to enable these metrics is reasonable.

You've made a good case for ReportGuestState – who would use SubtractGuestState?

who would use SubtractGuestState?

End users who only want the raw values straight out from the kernel to graph them the way they want in one or several graphs. Typically, SubtractGuestState true is useful for people who want to stack all CPU metrics in a single graph and be done with it. But some prefer to mix stacked and non-stacked values on a single graph. Some prefer a separate graph to reflect the host vs guests CPU consumption.

I tend to focus on graph aficionados but alerting is to be considered too: some alerts and thresholds may assume that guest is included in user, others may assume that user reflects only the host-specific activity.

That being said, since you aim for a conservative policy, what will most likely happen is that end users will notice the lack of guest and guest_nice (like I did), have a look at the documentation, be glad to find they can enable them easily and hopefully learn at the same time that subtlety about them being part of user and nice (I didn't know).

SGTM, let's go with both options.

octo · 2017-09-23T18:41:44Z

src/cpu.c

Please close the >= 9 block here and out-dent the >= 10 and >= 11 cases. I.e.:

if (numfields >= 9) { /* … */ } if (numfields >= 10) { /* … */ } if (numfields >= 11) { /* … */ }

Should I get the (numfields >= 9) block out too? It is part of the (numfields >= 8) block, and that's why I chose to nest (numfields >= 10) and (numfields >= 11).

Yes, that'd be great :)

Done in 092b2d8. If needed, I'll rebase/squash everything after the discussions.

octo

LGTM; thanks @xavierog!

I'll wait for the CI system to catch up before merging.

xavierog · 2017-09-23T19:59:02Z

As requested, I have updated src/collectd.conf.in to reflect all options accepted by the cpu plugin.

xavierog · 2017-09-23T20:18:13Z

Thank you very much :)

octo · 2017-09-23T20:18:41Z

Thank you!

rpv-tomsk · 2017-09-24T05:07:36Z

src/cpu.c

+        /* Guest is included in User; optionally subtract Guest from User: */
+        if (subtract_guest) {
+          user_value -= value;
+          if (user_value < 0) user_value = 0;


Hi!

Do you think (or, possible, know) is that condition is possible at practice?

Kernel code:

root@hs01:~/srv01/linux-3.2.78# cat /root/srv01/linux-3.2.78/kernel/sched.c |grep -A 50 account_guest_time static void account_guest_time(struct task_struct *p, cputime_t cputime, cputime_t cputime_scaled) { cputime64_t tmp; struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat; tmp = cputime_to_cputime64(cputime); /* Add guest time to process. */ p->utime = cputime_add(p->utime, cputime); p->utimescaled = cputime_add(p->utimescaled, cputime_scaled); account_group_user_time(p, cputime); p->gtime = cputime_add(p->gtime, cputime); /* Add guest time to cpustat. */ if (TASK_NICE(p) > 0) { cpustat->nice = cputime64_add(cpustat->nice, tmp); cpustat->guest_nice = cputime64_add(cpustat->guest_nice, tmp); } else { cpustat->user = cputime64_add(cpustat->user, tmp); cpustat->guest = cputime64_add(cpustat->guest, tmp); } }

So, that will be possible only on counter overflow, which should be practically impossible.
All is ok.

cpu plugin: add linux-specific "guest" states.

3dda0fa

xavierog added 2 commits September 23, 2017 19:06

cpu plugin: subtract guest states from user/nice.

f00e980

cpu plugin: update documentation.

a44d70c

rpv-tomsk reviewed Sep 23, 2017

View reviewed changes

cpu plugin: adjust coding style (variable declaration).

6b95249

octo reviewed Sep 23, 2017

View reviewed changes

cpu plugin: adjust numfields conditions in cpu_read().

092b2d8

octo approved these changes Sep 23, 2017

View reviewed changes

cpu plugin: add new options to the default confi 628C guration file.

ac7a73e

octo merged commit ac7a73e into collectd:master Sep 23, 2017

rpv-tomsk reviewed Sep 24, 2017

View reviewed changes

octo added this to the 5.8 milestone Oct 11, 2017

maryamtahhan mentioned this pull request Oct 12, 2017

Next collectd release #2420

Closed

19 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

cpu plugin: add linux-specific "guest" states. #2439

cpu plugin: add linux-specific "guest" states. #2439

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cpu plugin: add linux-specific "guest" states. #2439

cpu plugin: add linux-specific "guest" states. #2439

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!