8000 cpu plugin: add linux-specific "guest" states. by xavierog · Pull Request #2439 · collectd/collectd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

cpu plugin: add linux-specific "guest" states. #2439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 23, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions src/collectd.conf.in
Original file line number Diff line number Diff line change
Expand Up @@ -364,6 +364,9 @@
# ReportByCpu true
# ReportByState true
# ValuesPercentage false
# ReportNumCpu false
# ReportGuestState false
# SubtractGuestState true
#</Plugin>
#
#<Plugin csv>
Expand Down
13 changes: 13 additions & 0 deletions src/collectd.conf.pod
Original file line number Diff line number Diff line change
Expand Up @@ -1474,6 +1474,19 @@ in the un-aggregated (per-CPU, per-state) mode as well.
When set to B<true>, reports the number of available CPUs.
Defaults to B<false>.

=item B<ReportGuestState> B<false>|B<true>

When set to B<true>, reports the "guest" and "guest_nice" CPU states.
Defaults to B<false>.

=item B<SubtractGuestState> B<false>|B<true>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having two options to control this behavior feels a bit like premature optimization. In other words, are users actually going to use those options? (This is an honest question, not a rhetorical one; I know nothing about the new guest counters ;)

I'd prefer to implement the behavior that ReportGuestState true; SubtractGuestState true has and implement config options only if (and when) users ask for it.

Copy link
Contributor Author
@xavierog xavierog Sep 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented these options to address the following concern:

But if we start to reduce user and nice, that will be backward-incompatible change for people who do not ready for a such. May be that should be tunable?

I believe having these options keeps you safe from any angry user, as they enable end users to revert to whatever old behaviour they were used to. So I would argue that, yes, indeed, some people will use them. Specifically, these counters are notoriously useless within VMs: people with 10 000+ VMs at $RandomCloudCompany may not want to store 20 000+ extra useless metrics. I may even use ReportGuestState myself, depending on the host I deploy collectd to :)

However, we surely can change the default values. The actual question is: what is the policy of the collectd project regarding new metrics within existing plugins?

  • Conservative? ReportGuestState false; SubtractGuestState true
  • Progressive? ReportGuestState true; SubtractGuestState true
  • Prone to delegate computations to query and graphing tools? ReportGuestState true; SubtractGuestState false

The choice is yours :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conservative: New versions need to be backwards compatible (with the exception of new major versions). Backwards compatibility is defined by the user observable behavior, i.e. the previous config file needs to still work and all metrics that were reported previously need to still be reported.

I'd consider this a bit of an edge case, sine the previous metrics are still present. But it sounds like having users make a conscious decision to enable these metrics is reasonable.

You've made a good case for ReportGuestState – who would use SubtractGuestState?

Copy link
Contributor Author
@xavierog xavierog Sep 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who would use SubtractGuestState?

End users who only want the raw values straight out from the kernel to graph them the way they want in one or several graphs. Typically, SubtractGuestState true is useful for people who want to stack all CPU metrics in a single graph and be done with it. But some prefer to mix stacked and non-stacked values on a single graph. Some prefer a separate graph to reflect the host vs guests CPU consumption.

I tend to focus on graph aficionados but alerting is to be considered too: some alerts and thresholds may assume that gues 10000 t is included in user, others may assume that user reflects only the host-specific activity.

That being said, since you aim for a conservative policy, what will most likely happen is that end users will notice the lack of guest and guest_nice (like I did), have a look at the documentation, be glad to find they can enable them easily and hopefully learn at the same time that subtlety about them being part of user and nice (I didn't know).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, let's go with both options.


This option is only considered when B<ReportGuestState> is set to B<true>.
"guest" and "guest_nice" are included in respectively "user" and "nice".
If set to B<true>, "guest" will be subtracted from "user" and "guest_nice"
will be subtracted from "nice".
Defaults to B<true>.

=back

=head2 Plugin C<cpufreq>
Expand Down
68 changes: 54 additions & 14 deletions src/cpu.c
8000
Original file line number Diff line number Diff line change
Expand Up @@ -98,9 +98,11 @@
#define COLLECTD_CPU_STATE_INTERRUPT 5
#define COLLECTD_CPU_STATE_SOFTIRQ 6
#define COLLECTD_CPU_STATE_STEAL 7
#define COLLECTD_CPU_STATE_IDLE 8
#define COLLECTD_CPU_STATE_ACTIVE 9 /* sum of (!idle) */
#define COLLECTD_CPU_STATE_MAX 10 /* #states */
#define COLLECTD_CPU_STATE_GUEST 8
#define COLLECTD_CPU_STATE_GUEST_NICE 9
#define COLLECTD_CPU_STATE_IDLE 10
#define COLLECTD_CPU_STATE_ACTIVE 11 /* sum of (!idle) */
#define COLLECTD_CPU_STATE_MAX 12 /* #states */

#if HAVE_STATGRAB_H
#include <statgrab.h>
Expand All @@ -119,7 +121,7 @@

static const char *cpu_state_names[] = {"user", "system", "wait", "nice",
"swap", "interrupt", "softirq", "steal",
"idle", "active"};
"guest", "guest_nice", "idle", "active"};

#ifdef PROCESSOR_CPU_LOAD_INFO
static mach_port_t port_host;
Expand Down Expand Up @@ -193,9 +195,12 @@ static _Bool report_by_cpu = 1;
static _Bool report_by_state = 1;
static _Bool report_percent = 0;
static _Bool report_num_cpu = 0;
static _Bool report_guest = 0;
static _Bool subtract_guest = 1;

static const char *config_keys[] = {"ReportByCpu", "ReportByState",
"ReportNumCpu", "ValuesPercentage"};
"ReportNumCpu", "ValuesPercentage",
"ReportGuestState", "SubtractGuestState"};
static int config_keys_num = STATIC_ARRAY_SIZE(config_keys);

static int cpu_config(char const *key, char const *value) /* {{{ */
Expand All @@ -208,6 +213,10 @@ static int cpu_config(char const *key, char const *value) /* {{{ */
report_by_state = IS_TRUE(value) ? 1 : 0;
else if (strcasecmp(key, "ReportNumCpu") == 0)
report_num_cpu = IS_TRUE(value) ? 1 : 0;
else if (strcasecmp(key, "ReportGuestState") == 0)
report_guest = IS_TRUE(value) ? 1 : 0;
else if (strcasecmp(key, "SubtractGuestState") == 0)
subtract_guest = IS_TRUE(value) ? 1 : 0;
else
return -1;

Expand Down Expand Up @@ -524,7 +533,7 @@ static void cpu_commit_without_aggregation(void) /* {{{ */
static void cpu_commit(void) /* {{{ */
{
gauge_t global_rates[COLLECTD_CPU_STATE_MAX] = {
NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN /* Batman! */
NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN, NAN /* Batman! */
};

if (report_num_cpu)
Expand All @@ -545,7 +554,8 @@ static void cpu_commit(void) /* {{{ */
for (size_t cpu_num = 0; cpu_num < global_cpu_num; cpu_num++) {
cpu_state_t *this_cpu_states = get_cpu_state(cpu_num, 0);
gauge_t local_rates[COLLECTD_CPU_STATE_MAX] = {NAN, NAN, NAN, NAN, NAN,
NAN, NAN, NAN, NAN, NAN};
NAN, NAN, NAN, NAN, NAN,
NAN, NAN };

for (size_t state = 0; state < COLLECTD_CPU_STATE_MAX; state++)
if (this_cpu_states[state].has_value)
Expand Down Expand Up @@ -632,7 +642,7 @@ static int cpu_read(void) {
FILE *fh;
char buf[1024];

char *fields[9];
char *fields[11];
int numfields;

if ((fh = fopen("/proc/stat", "r")) == NULL) {
Expand All @@ -648,14 +658,15 @@ static int cpu_read(void) {
if ((buf[3] < '0') || (buf[3] > '9'))
continue;

numfields = strsplit(buf, fields, 9);
numfields = strsplit(buf, fields, STATIC_ARRAY_SIZE(fields));
if (numfields < 5)
continue;

cpu = atoi(fields[0] + 3);

cpu_stage(cpu, COLLECTD_CPU_STATE_USER, (derive_t)atoll(fields[1]), now);
cpu_stage(cpu, COLLECTD_CPU_STATE_NICE, (derive_t)atoll(fields[2]), now);
/* Do not stage User and Nice immediately: we may need to alter them later: */
long long user_value = atoll(fields[1]);
long long nice_value = atoll(fields[2]);
cpu_stage(cpu, COLLECTD_CPU_STATE_SYSTEM, (derive_t)atoll(fields[3]), now);
cpu_stage(cpu, COLLECTD_CPU_STATE_IDLE, (derive_t)atoll(fields[4]), now);

Expand All @@ -665,11 +676,40 @@ static int cpu_read(void) {
now);
cpu_stage(cpu, COLLECTD_CPU_STATE_SOFTIRQ, (derive_t)atoll(fields[7]),
now);
}

if (numfields >= 9)
cpu_stage(cpu, COLLECTD_CPU_STATE_STEAL, (derive_t)atoll(fields[8]),
now);
if (numfields >= 9) { /* Steal (since Linux 2.6.11) */
cpu_stage(cpu, COLLECTD_CPU_STATE_STEAL, (derive_t)atoll(fields[8]), now);
}

if (numfields >= 10) { /* Guest (since Linux 2.6.24) */
if (report_guest) {
long long value = atoll(fields[9]);
cpu_stage(cpu, COLLECTD_CPU_STATE_GUEST, (derive_t)value, now);
/* Guest is included in User; optionally subtract Guest from User: */
if (subtract_guest) {
user_value -= value;
if (user_value < 0) user_value = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

Do you think (or, possible, know) is that condition is possible at practice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel code:

root@hs01:~/srv01/linux-3.2.78# cat /root/srv01/linux-3.2.78/kernel/sched.c |grep -A 50 account_guest_time
static void account_guest_time(struct task_struct *p, cputime_t cputime,
                               cputime_t cputime_scaled)
{
        cputime64_t tmp;
        struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;

        tmp = cputime_to_cputime64(cputime);

        /* Add guest time to process. */
        p->utime = cputime_add(p->utime, cputime);
        p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
        account_group_user_time(p, cputime);
        p->gtime = cputime_add(p->gtime, cputime);

        /* Add guest time to cpustat. */
        if (TASK_NICE(p) > 0) {
                cpustat->nice = cputime64_add(cpustat->nice, tmp);
                cpustat->guest_nice = cputime64_add(cpustat->guest_nice, tmp);
        } else {
                cpustat->user = cputime64_add(cpustat->user, tmp);
                cpustat->guest = cputime64_add(cpustat->guest, tmp);
        }
}

So, that will be possible only on counter overflow, which should be practically impossible.
All is ok.

}
}
}

if (numfields >= 11) { /* Guest_nice (since Linux 2.6.33) */
if (report_guest) {
long long value = atoll(fields[10]);
cpu_stage(cpu, COLLECTD_CPU_STATE_GUEST_NICE, (derive_t)value, now);
/* Guest_nice is included in Nice; optionally subtract Guest_nice from
Nice: */
if (subtract_guest) {
nice_value -= value;
if (nice_value < 0) nice_value = 0;
}
}
}

/* Eventually stage User and Nice: */
cpu_stage(cpu, COLLECTD_CPU_STATE_USER, (derive_t)user_value, now);
cpu_stage(cpu, COLLECTD_CPU_STATE_NICE, (derive_t)nice_value, now);
}
fclose(fh);
/* }}} #endif defined(KERNEL_LINUX) */
Expand Down
0