GaugeInc resets to NaN rather than 0 · Issue #4336 · collectd/collectd · GitHub
GaugeInc resets to NaN rather than 0 #4336
Open
@slack2450

Description

  • Version of collectd: 5.12.0
  • Operating system / distribution: Amazon Linux 2023
  • Kernel version (if applicable): 6.1.119-129.201.amzn2023.x86_64

Expected behavior

I am trying to monitor /var/log/messages using the tail plugin for a few things including OOM kills.
Here is an abbreviated version of the config:

<Plugin tail>
  <File "/var/log/messages">
    Instance "dmesg_sensor"
    <Match>
      Regex "oom-kill:"
      DSType "GaugeInc"
      Type "gauge"
      Instance "oom_kill"
    </Match>
  </File>
</Plugin>

GaugeInc is expected to reset to 0 between intervals. This was previously reported in:
#2448

Actual behavior

When I run the command collectdctl -s /var/run/collectd-socket getval hostname/tail-dmesg_sensor/gauge-oom_kill
I receive value=nan, whereas the expected behaviour is value=0.000000e+00

I confirmed the rule is functioning by running echo "oom-kill:" > /dev/kmsg. After that, the same collectdctl command returns value=1.000000e+00 as expected, and then goes back to value=nan once the gauge is reset.

The fix is simple, and I think this case was overlooked in the previous fix. I applied the following git diff on top of collectd 5.12.0 and confirmed that it resolves the issue.

diff --git a/src/utils_tail_match.c b/src/utils_tail_match.c
index 25714c16..597a1d46 100644
--- a/src/utils_tail_match.c
+++ b/src/utils_tail_match.c
@@ -76,7 +76,7 @@ static int simple_submit_match(cu_match_t *match, void *user_data) {

   if ((match_value->ds_type & UTILS_MATCH_DS_TYPE_GAUGE) &&
       (match_value->values_num == 0))
-    values[0].gauge = NAN;
+    values[0].gauge = (match_value->ds_type & UTILS_MATCH_CF_GAUGE_INC) ? 0 : NAN;
   else
     values[0] = match_value->value;
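
To make the intended behaviour easier to check in isolation, here is a minimal standalone sketch of the same decision. It is not collectd code: the function name `gauge_value_to_submit`, its parameters, and the two flag values are assumptions made up for this illustration (the real flags live in collectd's utils_match.h).

```c
#include <math.h>

/* Hypothetical flag values for illustration only; the real definitions
 * are in collectd's utils_match.h and are not copied here. */
#define DS_TYPE_GAUGE 0x1000
#define CF_GAUGE_INC 0x10

/* Returns the value to submit for one interval.
 * matches_seen: number of regex matches in the interval.
 * accumulated:  value aggregated from those matches. */
double gauge_value_to_submit(int ds_type, unsigned int matches_seen,
                             double accumulated) {
  if ((ds_type & DS_TYPE_GAUGE) && (matches_seen == 0)) {
    /* No matches: a GaugeInc counter reports 0, other gauges report NaN. */
    return (ds_type & CF_GAUGE_INC) ? 0.0 : NAN;
  }
  return accumulated;
}
```

With this logic, an interval without any "oom-kill:" line yields 0 for GaugeInc, while other gauge types (for example an average) still yield NaN because there is nothing to aggregate.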

As a side question, could you please advise how to work around this? We're using CloudWatch, which doesn't accept NaN, so we are left without metrics. Would it be possible to write a temporary plugin that converts NaNs to 0s for these specific metrics? I would appreciate any recommendations for working around this issue!

Ideally we wouldn't have to distribute our own build of collectd or a patched tail plugin while waiting for an official release. What might the timeline for a release be? I'm hesitant because the last release was 4 years ago.

It doesn't look like we're the only ones having this issue:
awslabs/collectd-cloudwatch#78
https://sage.amazon.dev/posts/1675491

Thank you!
