gpu_sysman plugin update to v6.0 API + new labels + "raw" metrics + output options by eero-t · Pull Request #1 · eero-t/collectd · GitHub

gpu_sysman plugin update to v6.0 API + new labels + "raw" metrics + output options #1


Closed
wants to merge 23 commits into from
23 commits
60ecc88
gpu_sysman: use sizeof(*var) rather than sizeof(vartype) in var=callo…
eero-t Oct 21, 2021
16038a9
gpu_sysman: minimal v6 API support + add units to metric names
eero-t Jan 11, 2022
77a132f
gpu_sysman: update test code for minimal v6 API support + new metric …
eero-t Jan 11, 2022
8e83f42
gpu_sysman: split metric properties from their names to separate labels
eero-t Dec 21, 2021
1d67d25
gpu_sysman: update test code to handle metrics split with labels
eero-t Dec 22, 2021
772a657
gpu_sysman: remove "GPU-" prefix from name and add it "pci_pdf" label
eero-t Oct 7, 2021
6883d36
gpu_sysman: fix test code for "pci_bdf" added to metrics family
eero-t Dec 23, 2021
8b086a6
gpu_sysman: improvements to reported metrics
eero-t Dec 27, 2021
e572446
gpu_sysman: update tests for sysman plugin changes
eero-t Dec 27, 2021
4f0e293
gpu_sysman: add help information for all metric families
eero-t Jan 4, 2022
2284782
gpu_sysman: option to disable utilization metrics for single engines
eero-t Jan 4, 2022
8beb8e9
gpu_sysman: option for specifying metrics output type
eero-t Jan 4, 2022
fdea472
gpu_sysman: optional raw metrics output for already supported metrics
eero-t Jan 4, 2022
da2041b
gpu_sysman: skip metrics with div-by-zero or time wrap around issues
eero-t Jan 14, 2022
fb56381
gpu_sysman: fix test code -Wpedantic + -Wcast-qual warnings
eero-t Jan 14, 2022
7b8eb5d
gpu_sysman: add 'sub_dev' and 'type' labels only when needed
eero-t Jan 19, 2022
8f9843f
Add "dev_file" label support
eero-t Jan 21, 2022
a79cb63
Move test defines from Sysman plugin to its test code
eero-t Jun 7, 2022
91ad234
Change strcpy() in Sysman plugin to sstrncpy()
eero-t Jun 7, 2022
7bb89a8
Pass clang-format check for gpu_sysman_test.c comments
eero-t Jun 7, 2022
4ca4a40
Add scalloc() wrapper similar to smalloc() to common utils
eero-t Jun 7, 2022
12e1839
Replace Sysman plugin alloc+assert calls with smalloc/scalloc
eero-t Jun 7, 2022
910b8f2
Pass clang-format check for gpu_sysman_test.c
eero-t Jun 7, 2022
3 changes: 3 additions & 0 deletions src/collectd.conf.in
@@ -795,13 +795,16 @@

#<Plugin gpu_sysman>
# Samples 1
# LogGpuInfo false
# MetricsOutput both
# DisableMemory false
# DisableMemoryBandwidth false
# DisableFrequency false
# DisableThrottleTime false
# DisableTemperature false
# DisablePower false
# DisableEngine false
# DisableEngineSingle false
# DisableErrors false
# DisableSeparateErrors false
#</Plugin>
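
A plugin could map the new `MetricsOutput` value to internal flags roughly as follows. This is a minimal sketch only: the accepted values "raw", "derived" and "both" come from the `collectd.conf.pod` change in this PR, while the enum names, the `parse_metrics_output()` helper and the case-insensitive matching are assumptions made for illustration, not the plugin's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp() (POSIX) */

/* Hypothetical bit flags; the plugin's real names are not shown in this diff. */
enum {
  OUTPUT_RAW = 1 << 0,
  OUTPUT_DERIVED = 1 << 1,
  OUTPUT_BOTH = OUTPUT_RAW | OUTPUT_DERIVED
};

/* Map a MetricsOutput config value to output flags.
 * Returns false for an unknown value and leaves *flags untouched.
 * Case-insensitive matching is an assumption of this sketch. */
static bool parse_metrics_output(const char *value, unsigned *flags) {
  assert(value != NULL && flags != NULL);

  static const struct {
    const char *name;
    unsigned flags;
  } known[] = {
      {"raw", OUTPUT_RAW},
      {"derived", OUTPUT_DERIVED},
      {"both", OUTPUT_BOTH},
  };

  for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
    if (strcasecmp(value, known[i].name) == 0) {
      *flags = known[i].flags;
      return true;
    }
  }
  return false;
}
```

Treating "both" as the union of the two other flags lets the reporting code test `flags & OUTPUT_RAW` and `flags & OUTPUT_DERIVED` independently.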
15 changes: 15 additions & 0 deletions src/collectd.conf.pod
@@ -3726,6 +3726,16 @@ values is disabled, it is better to set Samples to 1 (default).
If enabled, plugin logs at start some information about all the GPUs
detected through Sysman API.

=item B<MetricsOutput>

Either "raw", "derived" or "both".

Specifies whether metrics should be reported as raw values provided
by Sysman (e.g. HW energy usage counter value in Joules) which is
preferred for use in Prometheus, as more human-readable and easier
to debug derived values (e.g. power usage gauge value in Watts), or
whether to increase number of produced metrics by reporting both.

=item B<DisableMemory>

Disable memory usage metrics collection.
@@ -3754,6 +3764,11 @@ Disable temperature metrics collection.

Disable engine utilization metrics collection.

=item B<DisableEngineSingle>

Disable utilization metrics collection for single engines i.e. provide
utilization information only for engine groups.

=item B<DisableErrors>

Disable RAS (Reliability, Availability, and Serviceability) error