Add slurm plugin #3037
Conversation
Hi! Thanks for your work on Collectd!
Can you please clarify, why
Hi, thanks! Without
and
Therefore, and after finding the option here, I thought that this would be acceptable. After setting
Thank you for the detailed explanation!
Hi, you asked about 5.9 / merge. Let's discuss some implementation details in the comments...
src/slurm.c
Outdated
}

static void slurm_submit_stats(stats_info_response_msg_t *stats_resp) {
  slurm_submit("stats", "slurm_stats", "parts_packed",
The main point that catches my attention is the slurm_stats type. The idea behind collectd types is that each type describes some area, but here we see all data sent under one slurm_stats type. I think that is wrong and needs more work (and more types).
A good example is the jobs_* metrics, which you put into the type instance. From my point of view on collectd types, these should have their own type, possibly slurm_jobs if no better type is found, and submitted/started/completed/canceled/failed should be type instances. This type should be DERIVE (see cat types.db | grep _requests); I think these jobs_* values grow continuously and we are mostly interested in the change rate. Install the rrdtool plugin, set up collection3, and you will see the benefits ;-) With slurm_jobs implemented the described way, you can get one chart that shows all these metrics grouped, and, as I understand it, their sum gives 100% of jobs, so you can see the distribution.
Another note: I think you should not set the plugin instance value at all; a constant value in that field makes no sense, I think. UPD: OK, I see you send the partition name in the plugin instance... That makes this remark void, as there is no better solution right now. (I am really thinking about adding a hierarchy field in addition to plugin and type.)
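To make the proposal above concrete, here is a minimal sketch (not the plugin's actual code) of what a dedicated DERIVE type and a dispatch helper might look like. The slurm_jobs type line and the slurm_submit_derive() helper are illustrative assumptions; the dispatch goes through the standard collectd plugin API (value_list_t, plugin_dispatch_values()), and the exact headers depend on the collectd version.

```c
/* Hypothetical types.db entry, as suggested above:
 *   slurm_jobs  value:DERIVE:0:U
 */
#include "collectd.h"
#include "common.h" /* sstrncpy(); "utils/common/common.h" in newer trees */
#include "plugin.h"

/* Sketch: submit one job counter as the slurm_jobs DERIVE type, with
 * submitted/started/completed/canceled/failed as the type instance. */
static void slurm_submit_derive(const char *plugin_instance, const char *type,
                                const char *type_instance, derive_t value) {
  value_t v = {.derive = value};
  value_list_t vl = VALUE_LIST_INIT;

  vl.values = &v;
  vl.values_len = 1;
  sstrncpy(vl.plugin, "slurm", sizeof(vl.plugin));
  sstrncpy(vl.plugin_instance, plugin_instance, sizeof(vl.plugin_instance));
  sstrncpy(vl.type, type, sizeof(vl.type));
  sstrncpy(vl.type_instance, type_instance, sizeof(vl.type_instance));
  plugin_dispatch_values(&vl);
}

/* Example use (field name from stats_info_response_msg_t):
 *   slurm_submit_derive("", "slurm_jobs", "submitted",
 *                       (derive_t)stats_resp->jobs_submitted);
 */
```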
Hi, thanks a lot for the feedback!
I think you are definitely correct that some of the metrics need a different RRD type. These are all global metrics that are not bound to a specific partition, and as you pointed out some make sense as DERIVE, and some as COUNTER, like the *_sum metrics, which are monotonically increasing.
However, the granularity at which new types should be added or metrics grouped together is not clear to me (see below). I intended to group metrics together in a way that makes sense, yet does not pollute types.db with lots of new types.
I am not using collection3, but currently the data get funneled through various databases/frameworks and are displayed with Grafana at the end of the "pipeline", where they are already grouped by their metric name prefix (that is, by their type instance prefix).
While it would certainly be possible to create a slurm_job type as DERIVE, for the other metrics it is a bit less clear: not all bf_* and schedule_* metrics are of the same type, and therefore it would not be possible to create a slurm_schedule or slurm_bf type to group them together. Similarly, many other types would need to be created if the goal is to separate/plot them based on the type; in some cases, like parts_packed, server_thread_count, and agent_queue_size, each would need a new type (one type per single metric), unless they are all grouped together somehow.
So I guess my question is: what granularity should we use to group them vs creating new types?
IMHO it would make sense to use slurm_stats as plugin instance to group all these global statistics together, and then use the generic types (counter for COUNTER type metrics, derive for DERIVE type metrics, gauge for GAUGE type metrics), and keep the metric names (type instance) as they are.
This would achieve: 1) each having their correct type 2) they are still grouped together as "slurm_stats" (based on the plugin instance), leaving finer granularity grouping up to the plotting software, and 3) not polluting types.db with lots of new types.
Does this make sense from a collectd/rrd point of view? Would you prefer an implementation of this change?
Thanks!
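For illustration only, here is a rough sketch of what the grouping proposed above might look like at a call site, reusing the hypothetical slurm_submit_derive() helper sketched earlier plus an analogous slurm_submit_gauge(); the generic derive and gauge types already exist in types.db, and the field names come from stats_info_response_msg_t.

```c
/* Everything grouped under plugin_instance "slurm_stats", generic types,
 * metric name carried in the type_instance (a sketch of the proposal,
 * not the layout that was eventually merged). */
slurm_submit_derive("slurm_stats", "derive", "jobs_submitted",
                    (derive_t)stats_resp->jobs_submitted);
slurm_submit_derive("slurm_stats", "derive", "jobs_started",
                    (derive_t)stats_resp->jobs_started);
slurm_submit_gauge("slurm_stats", "gauge", "server_thread_count",
                   (gauge_t)stats_resp->server_thread_count);
```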
I think you should add new types when you believe they are needed.
If in doubt, let's discuss.
counter for COUNTER type metrics, derive for DERIVE type metrics, gauge for GAUGE type metrics
I think that is not a good solution, because one would have to do the grouping discussed above anyway.
Not all bf_* metrics and schedule_* metrics are of the same type, and therefore it would not be possible to create a slurm_schedule or slurm_bf type to group them together.
That is not needed; it is OK to split them into subgroups.
See, for example:
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatStruct
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStats
https://github.com/collectd/collectd/blob/master/src/virt.c#L1707
The method provides "memory stats". Some time ago that data was sent as one type with type instances, but after careful review, part of the data is now sent as two other types that existed before and are used by other plugins ("widely used types"):
the swap_io type (in + out) and the ps_pagefaults type (major + minor),
and the 'last_update' value is not sent at all.
leaving finer granularity grouping up to the plotting software,
Since you already use your plugin, you already do some grouping.
I also think the grouping you add will not prevent doing another grouping in the plotting software, and having that grouping could be useful. So please provide it :)
and 3) not polluting types.db with lots of new types.
I don't think that is a real problem. My point of view is to freely add new types if required.
Node flags are no longer decoupled from node states (as was the case until now), since it would not be possible to tell which node flag corresponds to which node state. Instead, the node state name now includes the node state flags (such as draining, non_responding, etc.).
I just updated the code with reworked types following our discussion. There are now 6 types in total:
I also rebased against the latest commits in master. Please let me know what you think about these improvements.
src/types.db
Outdated
@@ -231,6 +231,12 @@ serial_octets rx:DERIVE:0:U, tx:DERIVE:0:U
signal_noise value:GAUGE:U:0
signal_power value:GAUGE:U:0
signal_quality value:GAUGE:0:U
slurm_job_state value:GAUGE:0:U
slurm_stats_load server_thread_count:GAUGE:0:U, agent_thread_count:GAUGE:0:U, agent_queue_size:GAUGE:0:U, dbd_agent_queue_size:GAUGE:0:U
Ouch, a misunderstanding occurred.
Types with multiple values are not preferred; each value should go into its own type with a type_instance, as before.
Most types should have only one data source, named value.
I think adding new types with multiple data sources should be very rare.
Is there documentation about these statistics values?
Please understand that threads_count and queue_size are different metrics (different kinds of load), and they should go to different types; they should not be mixed together.
I will try to fill in a table of proposed types.
src/types.db
Outdated
slurm_node_state value:GAUGE:0:U
slurm_stats_backfill backfilled_jobs:DERIVE:0:U, backfilled_pack_jobs:DERIVE:0:U, cycle_counter:COUNTER:0:U, cycle_last:GAUGE:0:U, cycle_max:COUNTER:0:U, cycle_sum:COUNTER:0:U, cycle_last_depth:GAUGE:0:U, cycle_last_depth_try:GAUGE:0:U, depth_sum:COUNTER:0:U, depth_try_sum:COUNTER:0:U, queue_len:GAUGE:0:U, queue_len_sum:COUNTER:0:U
slurm_stats_cycles max:COUNTER:0:U, last:GAUGE:0:U, sum:COUNTER:0:U, counter:COUNTER:0:U, depth:DERIVE:0:U, queue_len:GAUGE:0:U
slurm_stats_jobs submitted:DERIVE:0:U, started:DERIVE:0:U, completed:DERIVE:0:U, canceled:DERIVE:0:U, failed:DERIVE:0:U
I think a slurm_stats_jobs type + submitted/started/completed/canceled/failed type_instance values would be good.
src/types.db
Outdated
slurm_stats_load server_thread_count:GAUGE:0:U, agent_thread_count:GAUGE:0:U, agent_queue_size:GAUGE:0:U, dbd_agent_queue_size:GAUGE:0:U
slurm_node_state value:GAUGE:0:U
slurm_stats_backfill backfilled_jobs:DERIVE:0:U, backfilled_pack_jobs:DERIVE:0:U, cycle_counter:COUNTER:0:U, cycle_last:GAUGE:0:U, cycle_max:COUNTER:0:U, cycle_sum:COUNTER:0:U, cycle_last_depth:GAUGE:0:U, cycle_last_depth_try:GAUGE:0:U, depth_sum:COUNTER:0:U, depth_try_sum:COUNTER:0:U, queue_len:GAUGE:0:U, queue_len_sum:COUNTER:0:U
slurm_stats_cycles max:COUNTER:0:U, last:GAUGE:0:U, sum:COUNTER:0:U, counter:COUNTER:0:U, depth:DERIVE:0:U, queue_len:GAUGE:0:U
depth and queue_len are not directly related to the cycle count, I think.
So those should not be mixed together either.
I think you don't really need the max, last, and count metrics.
Did you use them?
If you look into https://github.com/SchedMD/slurm/blob/master/src/sdiag/sdiag.c:
if (buf->schedule_cycle_counter > 0) {
printf("\tMean cycle: %u\n",
buf->schedule_cycle_sum / buf->schedule_cycle_counter);
printf("\tMean depth cycle: %u\n",
buf->schedule_cycle_depth / buf->schedule_cycle_counter);
}
So you can use this piece of code as an example and submit these metrics:
- average cycle duration (if I understand things correctly)
- average cycle depth (I don't know what the depth is)
If you look at another example:
if ((buf->req_time - buf->req_time_start) > 60) {
printf("\tCycles per minute: %u\n",
(uint32_t) (buf->schedule_cycle_counter /
((buf->req_time - buf->req_time_start) / 60)));
}
This shows that we need to use DERIVE to submit buf->schedule_cycle_counter; such a metric will give us a 'cycles per second' rate. Notice that we don't need to use req_time and req_time_start; collectd saves and uses the time internally.
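A hedged sketch of what this could look like in the plugin, reusing the hypothetical helpers from earlier; the plugin_instance and type_instance names are placeholders, and buf is the stats_info_response_msg_t quoted above.

```c
/* Submit the raw cycle counter as DERIVE so collectd reports a per-second
 * rate, and the mean cycle duration as a GAUGE, mirroring the sdiag.c
 * computation above. */
slurm_submit_derive("slurm_sched_stats", "derive", "schedule_cycle_counter",
                    (derive_t)buf->schedule_cycle_counter);
if (buf->schedule_cycle_counter > 0) {
  slurm_submit_gauge("slurm_sched_stats", "gauge", "schedule_mean_cycle",
                     (gauge_t)buf->schedule_cycle_sum /
                         buf->schedule_cycle_counter);
}
```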
src/types.db
Outdated
slurm_job_state value:GAUGE:0:U
slurm_stats_load server_thread_count:GAUGE:0:U, agent_thread_count:GAUGE:0:U, agent_queue_size:GAUGE:0:U, dbd_agent_queue_size:GAUGE:0:U
slurm_node_state value:GAUGE:0:U
slurm_stats_backfill backfilled_jobs:DERIVE:0:U, backfilled_pack_jobs:DERIVE:0:U, cycle_counter:COUNTER:0:U, cycle_last:GAUGE:0:U, cycle_max:COUNTER:0:U, cycle_sum:COUNTER:0:U, cycle_last_depth:GAUGE:0:U, cycle_last_depth_try:GAUGE:0:U, depth_sum:COUNTER:0:U, depth_try_sum:COUNTER:0:U, queue_len:GAUGE:0:U, queue_len_sum:COUNTER:0:U
This also mixes everything into one, just in a different form.
The goal was not to replace the type_instance with a value index.
Did you really find all these metrics useful?
I found no documentation about these values, so there is no reason to report all of them.
From my point of view we are doing engineering, so we should justify everything.
If there is no ready documentation, please fill in a table with all the values you get from the slurm_get_statistics() call and add an explanation for each value; that will justify how the subsequent type and type_instance columns are filled in.
That is the kind of thing that should be done before writing the code.
Unfortunately the SLURM API documentation is very lacking; the best you can find is the sdiag manpage and the sdiag.c source. sdiag is the official SLURM diagnostics tool, and it reports mostly the same metrics that I was aiming to include in this plugin.
I will add a reference to these in the collectd manpage and in the slurm.c module as a comment, as this would make the meaning of each metric easier to look up.
All these metrics are useful and needed, this is why the official slurm diagnosis tool reports all of those that I have included. (In my particular case some of these metrics may be constant and do not vary much or at all, but this is not a reason for excluding them, as they may be more useful for other slurm clusters.)
In theory metrics that end in _sum or _max could be derived from _last* kind of metrics, but in practice I found that these usually diverge, so keeping them seems sensible. Also, it may make it easier to have a global mean just like sdiag does by computing sum / count.
I will propose a new type listing and we can discuss. Would it be OK to re-use other types, e.g. threads for the server_thread_count, or queue_length for the agent_queue_size (this would result in type: queue_length, type_instance: agent_queue_size), etc.? Or is it preferred to duplicate the metric type with something like slurmctld_thread_count, slurm_agent_queue_size, etc.?
Would it be ok to re-use other types, e.g. threads for the server_thread_count, queue_length for the agent_queue_size, etc?
That would be ok.
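For example, reusing the pre-existing threads and queue_length types from types.db could look roughly like this (a sketch only; slurm_submit_gauge() is the hypothetical helper from earlier and the plugin_instance is a placeholder):

```c
slurm_submit_gauge("slurm_sched_stats", "threads", "server_thread_count",
                   (gauge_t)stats_resp->server_thread_count);
slurm_submit_gauge("slurm_sched_stats", "queue_length", "agent_queue_size",
                   (gauge_t)stats_resp->agent_queue_size);
```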
In theory metrics that end in _sum or _max could be derived from _last* kind of metrics, but in practice I found that these usually diverge, so keeping them seems sensible.
I think max makes no sense, as it will never decrease until a restart.
So it is better to look at a specified interval of _last, if that is needed.
Also, I think _sum should possibly go to some DERIVE type, but this is not a rule, just an opinion.
Thank you a lot for your fast and helpful responses.
Also, I think _sum should possibly go to some DERIVE type, but this is not a rule, just an opinion.

Yes, I think this makes sense. Possibly COUNTER, although it may get reset (a condition different from overflow).
I have been experimenting with DERIVE for slurm_stats_jobs and I noticed something unexpected: while GAUGE will display the following values for one of the metrics:
1559136017.745,41.000000
1559136077.575,42.000000
1559136137.578,42.000000
1559136197.575,43.000000
changing the same metric to DERIVE or COUNTER will provide the following values:
1559136416.574,0.000000
1559136476.575,2345593565836.732910
Is this expected? (This is output from the CSV plugin, and I see the same values on my Grafana dashboards.) It seems like an increment of 1 for this metric causes the derivative to jump from 0 to a very high value, while I was expecting a value of 1. I may be misunderstanding something here, but I wanted to clarify before switching everything to DERIVE types.
It seems like an increment of 1 for this metric is causing the derivative to jump from 0 to a very high value, while I was expecting a value of 1.
If I understand correctly: when you changed GAUGE to DERIVE, you probably had to restart all services, as there is internal state.
but wanted to clarify before switching everything to DERIVE types.
If a value grows monotonically, it is a candidate for DERIVE.
Possibly COUNTER, although it may get reset (a condition different from overflow).
I think your plugin should not use COUNTER; that is a rarely used and specialized type.
Use GAUGE and DERIVE.
src/slurm.c
Outdated
slurm_free_job_info_msg(job_buffer_ptr);
slurm_free_node_info_msg(node_buffer_ptr);
slurm_free_partition_info_msg(part_buffer_ptr);
ERROR("alloc_partition_states");
Suggestion: add PLUGIN_NAME to all ERRORs like:
ERROR(PLUGIN_NAME ": alloc_partition_states error.");
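A minimal sketch of that suggestion; the macro definition and error path shown are illustrative, not the PR's actual code (adjacent string literals are concatenated at compile time, so every log line gets the plugin prefix):

```c
#define PLUGIN_NAME "slurm"

/* hypothetical error path */
if (part_buffer_ptr == NULL) {
  ERROR(PLUGIN_NAME ": alloc_partition_states failed.");
  return -1;
}
```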
Hi,
Note that some types like
If you find these new types to be OK, I will submit a commit which implements the use of these new types very soon (hopefully making it before the code freeze).
Looks good to me. @mrunge would you also mind having a look?
Just to be clear:
Are these OK, or should I try to go for more generic names, dropping the
The reason I'm asking is that collectdctl listval shows very verbose/redundant naming: (extract)
I already have a code implementation of this, but since I remain a bit confused about how to name things and when/how to add new types, I wanted to make sure we get this right :)
I just pushed the code implementation of what I described above. Let me know if you want to do any renaming (or have any other concerns).
This is fine with me. @rpv-tomsk would you also mind having a look?
LGTM
"backfilled_jobs", stats_resp->bf_backfilled_jobs); | ||
slurm_submit_derive("slurm_backfill_stats", "slurm_backfilled_jobs", | ||
"backfilled_pack_jobs", | ||
stats_resp->bf_backfilled_pack_jobs); |
FYI this is broken on Slurm 20.02. They renamed this field.
Indeed, I'll open a new PR to fix this.
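A possible shape for that fix, sketched under the assumption that the field was renamed to bf_backfilled_het_jobs in Slurm 20.02 and that SLURM_VERSION_NUMBER / SLURM_VERSION_NUM are available from the Slurm headers (please verify against the actual headers before relying on this):

```c
#if SLURM_VERSION_NUMBER >= SLURM_VERSION_NUM(20, 2, 0)
  /* assumed new field name in Slurm >= 20.02 */
  slurm_submit_derive("slurm_backfill_stats", "slurm_backfilled_jobs",
                      "backfilled_pack_jobs",
                      stats_resp->bf_backfilled_het_jobs);
#else
  slurm_submit_derive("slurm_backfill_stats", "slurm_backfilled_jobs",
                      "backfilled_pack_jobs",
                      stats_resp->bf_backfilled_pack_jobs);
#endif
```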
This PR adds a new plugin called slurm that collects node, job and internal health statistics for the SLURM workload manager (see https://slurm.schedmd.com/).
In addition to the plugin implementation in src/slurm.c, I added the new corresponding types to types.db, the relevant bits for autoconf & automake, README docs, collectd.conf manpage bits, collectd.conf.in changes, and the corresponding AUTHORS file update. Please let me know if I missed anything.
Note that for the auto-generated collectd.conf, I believe it just adds the LoadPlugin slurm line, although slurm needs the Globals true option to work properly; I'm not sure whether it's possible to include the macro magic in the collectd.conf.in file to have that option on by default (a minimal manual configuration is sketched below).
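A minimal sketch of what the manual configuration could look like; the block form of LoadPlugin allows setting per-plugin options such as Globals:

```
<LoadPlugin slurm>
  Globals true
</LoadPlugin>
```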
Also note that even though I updated src/collectd.conf.pod, make does not update src/collectd.conf.5 for me; I'm not sure whether these manpage source files are still used, but adding the docs to the .pod file seemed reasonable. I manually checked with pod2man that it parsed and rendered correctly.

ChangeLog: The new SLURM plugin gathers metrics from the SLURM workload manager.