gpu_sysman plugin update to v6.0 API + new labels + "raw" metrics + output options #1

eero-t · 2022-01-11T13:08:25Z

No description provided.

byako

LGTM

uniemimu · 2022-01-14T13:23:24Z

src/gpu_sysman.c

+                       const char *type, const char *subdev, double value) {
+  metric_family_t fam = {
+      .type = METRIC_TYPE_COUNTER,
+      .name = (char *)name,


I hope these don't get written to, as in case if the incoming is truly const, writes result in undefined behavior

Compilers puts all literal strings to read-only section by default, so in actuality they are const strings. Assigning them to const char*, and propagating them e.g. to ras_submit() as such, is the correct thing to do, regardless of the enabled compiler warnings [1].

The annoying thing here is that metric_family_t.name in collectd v6 is non-const string. This is because collectd offers also functions for allocation and freeing metric families, and for this purpose the .name member cannot be const.

My code does not use / need the metric family alloc / free functions, only the metrics processing ones, so explicitly casting the literal name + help strings to non-const is safe. Besides, the same cast happens implicitly [1] in the many places where literal strings are assigned to metric_family_t.name directly.

[1] GCC does not do const checks for literal strings even with "-Wall -Wextra", one needs to explicitly use "-Write-strings" before GCC emits warnings about their assignments. "-Write-strings" GCC option would cause a large number of warnings about "metric_family_t.name" usage elsewhere than in this function, and in all the other collectd plugins though.

Allocating all the name & help strings just to be able to enable "-Wwrite-strings" warning is not worth the overhead though.

src/gpu_sysman.c

uniemimu

apart from the several division operations, and them const removals with casts, I think this looks fine. If you 8000 ensure that the divs can't be zero, and that them removed consts don't get written to, I'll say LGTM.

uniemimu · 2022-01-14T15:56:29Z

LGTM

eero-t · 2022-01-14T18:39:48Z

Rebased the test code debug output change commit to one adding it initially, rebased clang-format white-space changes to appropriate commits + added comment commit explaining the const casting.

eero-t · 2022-01-24T11:04:17Z

@uniemimu and @byako, could you review the 2 new commits, especially the last "dev_file" label one (intended to help with k8s custom rules)?

uniemimu

lgtm

byako

LGTM

eero-t · 2022-01-24T13:44:05Z

To document my chat with Alexey: I'm leaving handling of when ADD_DEV_FILE define is enabled until upstream comments on whether they: 1) want that Kubernetes Intel GPU plugin specific feature to be disabled in upstream, 2) a configure option for enabling it, or 3) configure checks for the required POSIX functions before "dev_file" label feature gets enabled.

eero-t · 2022-01-28T08:44:18Z

The reason for last two force pushes is that I'm using the reviewed branch for upstreaming (collectd#3968), and I rebased that to:

Move separate Prometheus PR (needed for testing the new gpu_sysman plugin in k8s cluster) out from a branch beneath it, to a new new dev branch above it
Apply contrib/format.sh changes that collectd CI told to run. I've been using local (LLVM 13) clang-format, but changes it does differ slightly from what remote clang-format (called by format.sh) does

Unfortunately contrib/format.sh changes did not satisfy whichever clang-format version / settings collectd CI itself runs => expect further pushes once upstream tells what actually should be changed in regards to code formatting.

…c(...) Except for gpu_subarray_alloc(), all allocs are done with calloc(). This way correctness of all of them is easy to check just by grepping for calloc (especially now that clang-format does not wrap those lines any more), and reviewing gpu_subarray_alloc().

Prometheus & OpenMetrics require metric names to be suffixed by the metric unit, and ratios (0-1) to be used instead of percentages (0-100).

…names There's now also support for multiple metrics per family although they are not used yet. "sstrncpy" is not needed any more.

Following labels are used: - sub_dev: subdevice ID (unsigned integer) - location: e.g. "gpu" / "memory" - type: e.g. "request" / "actual" - direction: "read" / "write" Additionally: * Two location label values were fixed * GPU engine indeces are now per engine type (instead of single index being used for all types) * All metric family and label names have been changed to use underscores instead of dashes to separate words, as required by Prometheus i.e. collectd does not need to convert them any more: https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

NOTE: providing NULL as label value to delete it is NOT supported. Test code will assert on labels with NULL values.

Also rename GPU struct "name" member to more explicit "pci_bdf". This allowed simplifying the code slightly. Sysman API supports nowadays also other devices than GPUs, so prefix is removed to to simplify code and to be more future-proof: https://spec.oneapi.io/level-zero/latest/core/api.html#_CPPv416ze_device_type_t (Plugin will still query only GPU devices from Sysman though.)

- do not add "pci_pdf" to metric name for matching - fix for adding metric labels to family copies of them

* Fix memory "type" label overwrite * Replace "free" memory metric with "memory_usage_ratio" one, and rename "memory_bytes" to "memory_used_bytes" metric * Split metric value aggregate function name to a separate "function" label * Have metric family declares always in same place in code * Avoid both setting metric labels, and reporting empty metrics, when higher internal sampling rate is used or there are L0 errors

* Add "memory_usage_ratio" checks * Update validation for metrics that can be sampled at higher rate i.e. have now the new aggregate function label * With empty metrics avoided, dispatch mock-up can assert on them * With extra L0 calls being skipped when not needed, number of calls can differ between query rounds: - refactor multi-sampling test to handle count changes - change error handing checks to be done in single-sampled mode * Debug output is needed to debug triggered multisample asserts, so do that when assert would have been triggered, then abort

And document why const-qual cast is safe, and why GCC does not warn about other assignments to .name & .help members.

More powerful GPUs can have a large number of engines of given type, but user may be interested only on the higher level engine groups utilization. "DisableEngineSingle" option allows skipping individual engine metrics.

This can be used to speciify whether output metrics values will be raw, derived or both. This commit add support just for the configuration option itself, adding / changing metrics to use it happens in next commit.

This adds new counter type metrics for: * memory bandwidth * frequency throttle time * engine execution time (activity) * energy usage Because collecd internally handles counters as integers, all units cannot be ones recommended by Prometheus, but microseconds and microjoules reported by Sysman.

Zero time intervals or max bandwidth would cause div-by-zero issues and (very rare) time wrap around would cause bogus metric value. Skip all of them.

Empty label equals to a missing one, and Prometheus queries can check for non-existence of a label, so let's just skip empty / unneeded ones. Main difference to earlier is that LevelZero error categories that provide non-zero values only for uncorrectable type (according to spec), are now without a type label. Correctable i.e. zero metrics for those categories were skipped already earlier.

And contrib/format.sh include re-order. "dev_file" support is behind a define (enabled by default) because it needs functions that are only part of POSIX, not C99. Intel Kubernetes GPU plugin uses primary GPU node device file names (card0, card1...) as its GPU identifiers. This new label helps in mapping Kubernetes custom metrics to them.

eero-t · 2022-06-01T08:24:29Z

This was rebased on latest upstream collectd-6.0 branch, so that "gpu_sysman-6.0" branch based on this branch, gets CI fixes from upstream => upstream's own bugs do not fail checks for upstream "gpu_sysman-6.0" branch pull request.

And document with what GCC warning options the code is tested / passes.

While for plugin that change does not really help (as target buffer is always larger than source), for test code it is useful. And it shuts up less capabable static checking tools than GCC. As test code cannot use existing collectd functionality for this (test code needs modified versions of some collectd functions, and all collectd code does not pass GCC warnings I use), sstrncpy() is copied to test code. For test code there's also a fix to size given for snprintf(), and removal of redundant string termination for modified plugin_log() copy (vsnprintf() already terminates string).

scalloc() wraps calloc() with exit on alloc failure, similarly to what smalloc() does for malloc().

If asserts were disabled, allocation failures would result in collectd memory errors => replace alloc+assert in the plugin with collectd smalloc/scalloc wrappers that exits after logging allocation error. Downsides are that this does not invoke debugger (which could be in a different control group with plenty of memory), nor tell where / what allocation failed, like enabled assert would, so test code variants of the wrappers still do asserts.

eero-t · 2022-06-08T10:23:38Z

Closing this "gpu_sysman-6.0" branch review PR, upstream finally reviewed & merged sysman plugin.

ChangeLog: collectd: Use one write thread per write plugin The previous write thread design used a single queue with a single read head from which one of the write threads would de-queue an element and would then sequentially call each registered write callback. This meant that all write plugins would have to cooperate in order to not drop values. If for example all write threads are stalled by the same write plugin's callback function not returning, the queue will start to fill up until elements start to be dropped, even though there are other plugins that could still make progress. In addition to that, all write callbacks have to be designed to be reentrant right now, which increases complexity. This new design uses a single linked-list write queue with one read head per output plugin. Each output plugin is serviced in a dedicated write thread. Elements are freed based on a reference count, which is shown in the ASCII-Art below: +- Thread #1 Head +- Thread #2 Head +- Tail v v v +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ | 0|->| 1|->| 1|->| 1|->| 1|->| 2|->| 2|->| 2|->| 2|->| 2|->X +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ +--+ ^ +- to be free()d The changes introduced by this commit have some side-effects: - The WriteThreads config option no longer exists, as a strict 1:1 ratio of write plugins and write threads is used. - The data flow has changed. The previous data flow was: (From one of the ReadThreads) plugin_dispatch_{values,multivalue}() plugin_dispatch_metric_family() enqueue_metric_family() write_queue_enqueue() -----{Queue}----+ | (In one of the WriteThreads threads) | plugin_write_thread() | ^- plugin_write_dequeue() <-+ plugin_dispatch_metric_internal() ^- fc_process_chain(pre_cache_chain) fc_process_chain(fc_process_chain) fc_bit_write_invoke() plugin_write(NULL) / plugin_write(plugin_name) plugin callback() The data flow now is: (From one of the ReadThreads) plugin_dispatch_{values,multivalue}() plugin_dispatch_metric_family() plugin_dispatch_metric_internal() ^- fc_process_chain(pre_cache_chain) fc_process_chain(post_cache_chain) fc_bit_write_invoke() plugin_write(NULL) / plugin_write(plugin_name) write_queue_enqueue() -----{Queue}----+ | (In one of the WriteThreads threads) | plugin_write_thread() <-+ plugin callback() One result of this change is, that the behaviour of plugin_write has changed from running the plugin callback immediately and in the same thread, to always enqueueing the value and de-queing in the dedicated thread. - The behaviour of the WriteQueueLimitHigh and WriteQueueLimitLow options has changed. The Queue will be be capped to a length of LimitHigh by dropping random queue elements between the queue end and LimitLow. Setting LimitLow to a reasonably large value ensures that fast write plugins do not loose values, even in the vicinity of a slow plugin. The diagram below shows the random element selected for removal (###) in Step 1 and the queue with the element removed in Step 2. Step 1: +- Thread #1 Head | +- Thread #2 Head +- Tail v | | v v +--+| +--+ #### +--+| +--+ +--+ +--+ +--+ +--+ | 1|->| 1|-># 1#->| 1|->| 2|->| 2|->| 2|->| 2|->| 2|->X +--+| +--+ #### +--+| +--+ +--+ +--+ +--+ +--+ | | | LimitHigh | LimitLow Step 2: | +- Thread #1 Head +- Thread #2 Head +- Tail | v | v v | +--+ +--+ +--+| +--+ +--+ +--+ +--+ +--+ | | 1|->| 1|->| 1|->| 2|->| 2|->| 2|->| 2|->| 2|->X | +--+ +--+ +--+| +--+ +--+ +--+ +--+ +--+ | | | LimitHigh | LimitLow Signed-off-by: Leonard Göhrs <l.goehrs@pengutronix.de>

byako approved these changes Jan 14, 2022

View reviewed changes

uniemimu reviewed Jan 14, 2022

View reviewed changes

src/gpu_sysman.c Show resolved Hide resolved

uniemimu reviewed Jan 14, 2022

View reviewed changes

src/gpu_sysman.c Outdated Show resolved Hide resolved

uniemimu reviewed Jan 14, 2022

View reviewed changes

src/gpu_sysman.c Show resolved Hide resolved

uniemimu reviewed Jan 14, 2022

View reviewed changes

uniemimu approved these changes Jan 14, 2022

View reviewed changes

eero-t force-pushed the gpu_sysman-6.0 branch from aff2f93 to c0e01a4 Compare January 14, 2022 18:37

eero-t force-pushed the gpu_sysman-6.0 branch from c0e01a4 to 9e92a15 Compare January 17, 2022 16:32

uniemimu reviewed Jan 24, 2022

View reviewed changes

byako reviewed Jan 24, 2022

View reviewed changes

eero-t force-pushed the gpu_sysman-6.0-review branch from a8fdb11 to 13c5598 Compare January 27, 2022 16:48

eero-t force-pushed the gpu_sysman-6.0 branch 2 times, most recently from eeb2ecb to 1a2ecb2 Compare January 27, 2022 17:40

eero-t mentioned this pull request Feb 3, 2022

8000 [collectd 6] Add 'gpu_sysman' plugin for (Intel) GPU metrics collectd/collectd#3968

Merged

eero-t force-pushed the gpu_sysman-6.0-review branch from 13c5598 to fcdfc1d Compare February 3, 2022 19:01

eero-t force-pushed the gpu_sysman-6.0 branch from 9e48877 to f049b82 Compare February 3, 2022 19:04

eero-t force-pushed the gpu_sysman-6.0-review branch from fcdfc1d to c51c8c8 Compare May 30, 2022 14:20

eero-t added 7 commits May 30, 2022 17:21

gpu_sysman: minimal v6 API support + add units to metric names

16038a9

Prometheus & OpenMetrics require metric names to be suffixed by the metric unit, and ratios (0-1) to be used instead of percentages (0-100).

gpu_sysman: update test code for minimal v6 API support + new metric …

77a132f

…names There's now also support for multiple metrics per family although they are not used yet. "sstrncpy" is not needed any more.

gpu_sysman: update test code to handle metrics split with labels

1d67d25

NOTE: providing NULL as label value to delete it is NOT supported. Test code will assert on labels with NULL values.

gpu_sysman: fix test code for "pci_bdf" added to metrics family

6883d36

- do not add "pci_pdf" to metric name for matching - fix for adding metric labels to family copies of them

eero-t added 10 commits May 30, 2022 17:21

gpu_sysman: add help information for all metric families

4f0e293

And document why const-qual cast is safe, and why GCC does not warn about other assignments to .name & .help members.

gpu_sysman: option for specifying metrics output type

8beb8e9

This can be used to speciify whether output metrics values will be raw, derived or both. This commit add support just for the configuration option itself, adding / changing metrics to use it happens in next commit.

gpu_sysman: skip metrics with div-by-zero or time wrap around issues

da2041b

Zero time intervals or max bandwidth would cause div-by-zero issues and (very rare) time wrap around would cause bogus metric value. Skip all of them.

gpu_sysman: fix test code -Wpedantic + -Wcast-qual warnings

fb56381

eero-t force-pushed the gpu_sysman-6.0 branch from f049b82 to 8f9843f Compare May 30, 2022 14:21

eero-t added 6 commits June 7, 2022 15:14

Move test defines from Sysman plugin to its test code

a79cb63

And document with what GCC warning options the code is tested / passes.

Pass clang-format check for gpu_sysman_test.c comments

7bb89a8

Add scalloc() wrapper similar to smalloc() to common utils

4ca4a40

scalloc() wraps calloc() with exit on alloc failure, similarly to what smalloc() does for malloc().

Pass clang-format check for gpu_sysman_test.c

910b8f2

eero-t closed this Jun 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

gpu_sysman plugin update to v6.0 API + new labels + "raw" metrics + output options #1

gpu_sysman plugin update to v6.0 API + new labels + "raw" metrics + output options #1

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gpu_sysman plugin update to v6.0 API + new labels + "raw" metrics + output options #1

gpu_sysman plugin update to v6.0 API + new labels + "raw" metrics + output options #1

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!