8000 cpu plugin: add linux-specific "guest" states. by xavierog · Pull Request #2439 · collectd/collectd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

cpu plugin: add linux-specific "guest" states. #2439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 23, 2017
Merged

cpu plugin: add linux-specific "guest" states. #2439

merged 6 commits into from
Sep 23, 2017

Conversation

xavierog
Copy link
Contributor

Hello,

The "cpu" plugin currently lacks two Linux-specific CPU states:

  • guest (since Linux 2.6.24) Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
  • guest_nice (since Linux 2.6.33) Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel).

(excerpts from procfs(5))

This patch brings support for guest and guest_nice CPU states into collectd. It lacks testing on non-Linux systems though. Feedback on whatever can be improved would be greatly appreciated.

@rpv-tomsk
Copy link
Contributor
rpv-tomsk commented Sep 22, 2017

Hi!

As far I remember my investigations on this, guest and guest_nice time also accounted in user and nice values. In current state of cpu plugin, reported values in sum are nearby to 100%.
So, the user and nice should be reduced appropriately.

An example.

# uname -a
Linux xxx 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
# uptime
 04:16:18 up 1783 days, 21:08,  1 user,  load average: 0.30, 0.20, 0.14
# cat /proc/stat
cpu  23444831569 0 12909031514 89822723313 119722820 396457 1327218429 0 20301531579
cpu0 2994249468 0 1966516348 9971887995 53197509 396457 593614204 0 2310174506
cpu1 3671834014 0 1633443143 10562045363 11194735 0 128315878 0 3368986106
cpu2 3421247569 0 1575312753 10848305097 8955271 0 120350269 0 3126493824
cpu3 3038677912 0 1498977328 11314302612 6545621 0 112337605 0 2746774831
cpu4 3502103600 0 2156777287 10359483180 26244217 0 127265540 0 2949443788
cpu5 2588314656 0 1481738425 11796379451 5276000 0 97330483 0 2240763230
cpu6 2256636701 0 1359755391 12280178936 4545391 0 80678984 0 1915469013
cpu7 1971767647 0 1236510835 12690140674 3764072 0 67325463 0 1643426278

That machine is a server running several KVM VM, no other load inside the OS itself.

Let's look for a first cpu counters.

cpu0 
2994249468   user
0            nice
1966516348   system
9971887995   idle
53197509     iowait
396457       irq
593614204    softirq
0            steal
2310174506   guest

While guest not reported, first 8 values in sum will give the value, comparable with server uptime.

The sum of first 8 values: 155 798 619 81
Uptime in seconds*100: 154 137 600 00

An error between these values is about 1% for ~5 years of uptime.

So, that sum can be correctly used as 100% of CPU when percentage is calculated / on stacked charts.

This also confirms what guest value also accounted in user value.
That must be taken into account.

But if we start to reduce user and nice, that will be backward-incompatible change for people who do not ready for a such. May be that should be tunable?

@rpv-tomsk
Copy link
Contributor
rpv-tomsk commented Sep 22, 2017

Mine 1.5 years-old version of this patch is below. That is not used by me, for unknown reasons.

Sorry, I can't remember all the details about.

--- a/src/cpu.c
+++ b/src/cpu.c
@@ -97,8 +97,10 @@
 #define COLLECTD_CPU_STATE_SOFTIRQ 6
 #define COLLECTD_CPU_STATE_STEAL 7
 #define COLLECTD_CPU_STATE_IDLE 8
-#define COLLECTD_CPU_STATE_ACTIVE 9 /* sum of (!idle) */
-#define COLLECTD_CPU_STATE_MAX 10 /* #states */
+#define COLLECTD_CPU_STATE_GUEST 9
+#define COLLECTD_CPU_STATE_GUEST_NICE 10
+#define COLLECTD_CPU_STATE_ACTIVE 11 /* sum of (!idle) */
+#define COLLECTD_CPU_STATE_MAX 12 /* #states */

 #if HAVE_STATGRAB_H
 # include <statgrab.h>
@@ -124,6 +126,8 @@ static const char *cpu_state_names[] = {
        "softirq",
        "steal",
        "idle",
+       "guest",
+       "guest_nice",
        "active"
 };

@@ -610,7 +614,7 @@ static int cpu_read (void)
        FILE *fh;
        char buf[1024];

-       char *fields[9];
+       char *fields[11];
        int numfields;

        if ((fh = fopen ("/proc/stat", "r")) == NULL)
@@ -628,26 +632,62 @@ static int cpu_read (void)
                if ((buf[3] < '0') || (buf[3] > '9'))
                        continue;

-               numfields = strsplit (buf, fields, 9);
+               numfields = strsplit (buf, fields, 11);
                if (numfields < 5)
                        continue;

                cpu = atoi (fields[0] + 3);

-               cpu_stage (cpu, COLLECTD_CPU_STATE_USER,   (derive_t) atoll(fields[1]), now);
-               cpu_stage (cpu, COLLECTD_CPU_STATE_NICE,   (derive_t) atoll(fields[2]), now);
+               if (numfields < 10)
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_USER,   (derive_t) atoll(fields[1]), now);
+
+               if (numfields < 11)
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_NICE,   (derive_t) atoll(fields[2]), now);
+
                cpu_stage (cpu, COLLECTD_CPU_STATE_SYSTEM, (derive_t) atoll(fields[3]), now);
                cpu_stage (cpu, COLLECTD_CPU_STATE_IDLE,   (derive_t) atoll(fields[4]), now);

-               if (numfields >= 8)
-               {
-                       cpu_stage (cpu, COLLECTD_CPU_STATE_WAIT,      (derive_t) atoll(fields[5]), now);
-                       cpu_stage (cpu, COLLECTD_CPU_STATE_INTERRUPT, (derive_t) atoll(fields[6]), now);
-                       cpu_stage (cpu, COLLECTD_CPU_STATE_SOFTIRQ,   (derive_t) atoll(fields[7]), now);
+               if (numfields < 8)
+                       continue;
+
+               cpu_stage (cpu, COLLECTD_CPU_STATE_WAIT,      (derive_t) atoll(fields[5]), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_INTERRUPT, (derive_t) atoll(fields[6]), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_SOFTIRQ,   (derive_t) atoll(fields[7]), now);
+
+               if (numfields < 9)
+                       continue;

-                       if (numfields >= 9)
-                               cpu_stage (cpu, COLLECTD_CPU_STATE_STEAL, (derive_t) atoll(fields[8]), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_STEAL, (derive_t) atoll(fields[8]), now);
+
+               if (numfields < 10)
+                       continue;
+
+               /* the magic */
+               int status;
+               cpu_state_t *prev_user;
+               cpu_state_t *prev_guest;
+
+               status = cpu_states_alloc (cpu);
+               if (status != 0)
+                       continue;    //cpu_states_alloc() failed
+
+               prev_user  = get_cpu_state (cpu, COLLECTD_CPU_STATE_USER);
+               prev_guest = get_cpu_state (cpu, COLLECTD_CPU_STATE_GUEST);
+
+               if ( (atoll(fields[1]) - atoll(fields[9])) < (prev_user->conv.last_value.derive - prev_guest->conv.last_value.derive) ) {
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_USER, 0, now);
+               }
+               else {
+                       cpu_stage (cpu, COLLECTD_CPU_STATE_USER,  (derive_t)(atoll(fields[1]) - atoll(fields[9])), now);
                }
+
+               cpu_stage (cpu, COLLECTD_CPU_STATE_GUEST, (derive_t) atoll(fields[9]), now);
+
+               if (numfields < 11)
+                       continue;
+
+               cpu_stage (cpu, COLLECTD_CPU_STATE_NICE,       (derive_t)(atoll(fields[2]) - atoll(fields[10])), now);
+               cpu_stage (cpu, COLLECTD_CPU_STATE_GUEST_NICE, (derive_t) atoll(fields[10]), now);
        }
        fclose (fh);
 /* }}} #endif defined(KERNEL_LINUX) */

@xavierog
Copy link
Contributor Author
xavierog commented Sep 23, 2017

As suggested, I have added options so this change does not feel like a regression to anyone:

  • guest and guest_nice will be reported only if reportGuestState (which defaults to false) is set to true
  • if so, subtractGuestState will by default subtract guest and guest_nice from user and nice. Those who are interested in guest and guest_nice but do no not intend to stack them on top of the other CPU metrics can set it to false.

src/cpu.c Outdated
int numfields;
long long user_value, nice_value, value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

In current codestyle variable should be declared when it used at first time, if that is possible.

Thanks for your work on this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, adjusted :)

Copy link
Member
@octo octo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @xavierog, thank you very much for your patch!

In addition to the inline comments, if you make a case for keeping one or both config options, please also add them to src/collectd.conf.in.

Best regards,
—octo

When set to B<true>, reports the "guest" and "guest_nice" CPU states.
Defaults to B<false>.

=item B<SubtractGuestState> B<false>|B<true>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two options to control this behavior feels a bit like premature optimization. In other words, are users actually going to use those options? (This is an honest question, not a rhetorical one; I know nothing about the new guest counters ;)

I'd prefer to implement the behavior that ReportGuestState true; SubtractGuestState true has and implement config options only if (and when) users ask for it.

Copy link
Contributor Author
@xavierog xavierog Sep 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented these options to address the following concern:

But if we start to reduce user and nice, that will be backward-incompatible change for people who do not ready for a such. May be that should be tunable?

I believe having these options keeps you safe from any angry user, as they enable end users to revert to whatever old behaviour they were used to. So I would argue that, yes, indeed, some people will use them. Specifically, these counters are notoriously useless within VMs: people with 10 000+ VMs at $RandomCloudCompany may not want to store 20 000+ extra useless metrics. I may even use ReportGuestState myself, depending on the host I deploy collectd to :)

However, we surely can change the default values. The actual question is: what is the policy of the collectd project regarding new metrics within existing plugins?

  • Conservative? ReportGuestState false; SubtractGuestState true
  • Progressive? ReportGuestState true; SubtractGuestState true
  • Prone to delegate computations to query and graphing tools? ReportGuestState true; SubtractGuestState false

The choice is yours :)

Copy link
Member
< 8000 div data-view-component="true" class="js-comment-hide-on-error flash flash-warn flash-full">

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conservative: New versions need to be backwards compatible (with the exception of new major versions). Backwards compatibility is defined by the user observable behavior, i.e. the previous config file needs to still work and all metrics that were reported previously need to still be reported.

I'd consider this a bit of an edge case, sine the previous metrics are still present. But it sounds like having users make a conscious decision to enable these metrics is reasonable.

You've made a good case for ReportGuestState – who would use SubtractGuestState?

Copy link
Contributor Author
@xavierog xavierog Sep 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who would use SubtractGuestState?

End users who only want the raw values straight out from the kernel to graph them the way they want in one or several graphs. Typically, SubtractGuestState true is useful for people who want to stack all CPU metrics in a single graph and be done with it. But some prefer to mix stacked and non-stacked values on a single graph. Some prefer a separate graph to reflect the host vs guests CPU consumption.

I tend to focus on graph aficionados but alerting is to be considered too: some alerts and thresholds may assume that guest is included in user, others may assume that user reflects only the host-specific activity.

That being said, since you aim for a conservative policy, what will most likely happen is that end users will notice the lack of guest and guest_nice (like I did), have a look at the documentation, be glad to find they can enable them easily and hopefully learn at the same time that subtlety about them being part of user and nice (I didn't know).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, let's go with both options.

src/cpu.c Outdated
cpu_stage(cpu, COLLECTD_CPU_STATE_STEAL, (derive_t)atoll(fields[8]), now);
if (numfields >= 10) { /* Guest (since Linux 2.6.24) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please close the >= 9 block here and out-dent the >= 10 and >= 11 cases. I.e.:

if (numfields >= 9) {
  /* … */
}
if (numfields >= 10) {
  /* … */
}
if (numfields >= 11) {
  /* … */
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I get the (numfields >= 9) block out too? It is part of the (numfields >= 8) block, and that's why I chose to nest (numfields >= 10) and (numfields >= 11).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that'd be great :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 092b2d8. If needed, I'll rebase/squash everything after the discussions.

Copy link
Member
@octo octo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; thanks @xavierog!

I'll wait for the CI system to catch up before merging.

@xavierog
Copy link
Contributor Author

As requested, I have updated src/collectd.conf.in to reflect all options accepted by the cpu plugin.

@octo octo merged commit ac7a73e into collectd:master Sep 23, 2017
@xavierog
Copy link
Contributor Author

Thank you very much :)

@octo
Copy link
Member
octo commented Sep 23, 2017

Thank you!

/* Guest is included in User; optionally subtract Guest from User: */
if (subtract_guest) {
user_value -= value;
if (user_value < 0) user_value = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

Do you think (or, possible, know) is that condition is possible at practice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel code:

root@hs01:~/srv01/linux-3.2.78# cat /root/srv01/linux-3.2.78/kernel/sched.c |grep -A 50 account_guest_time
static void account_guest_time(struct task_struct *p, cputime_t cputime,
                               cputime_t cputime_scaled)
{
        cputime64_t tmp;
        struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;

        tmp = cputime_to_cputime64(cputime);

        /* Add guest time to process. */
        p->utime = cputime_add(p->utime, cputime);
        p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
        account_group_user_time(p, cputime);
        p->gtime = cputime_add(p->gtime, cputime);

        /* Add guest time to cpustat. */
        if (TASK_NICE(p) > 0) {
                cpustat->nice = cputime64_add(cpustat->nice, tmp);
                cpustat->guest_nice = cputime64_add(cpustat->guest_nice, tmp);
        } else {
                cpustat->user = cputime64_add(cpustat->user, tmp);
                cpustat->guest = cputime64_add(cpustat->guest, tmp);
        }
}

So, that will be possible only on counter overflow, which should be practically impossible.
All is ok.

@octo octo added this to the 5.8 milestone Oct 11, 2017
@maryamtahhan maryamtahhan mentioned this pull request Oct 12, 2017
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0