8000 Add slurm plugin by pllopis · Pull Request #3037 · collectd/collectd · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add slurm plugin #3037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 3, 2020
Merged

Add slurm plugin #3037

merged 17 commits into from
Mar 3, 2020

Conversation

pllopis
Copy link
Contributor
@pllopis pllopis commented Jan 11, 2019

This PR adds a new plugin called slurm that collects node, job and internal health statistics for the SLURM workload manager (see https://slurm.schedmd.com/).

In addition to the plugin implementation in src/slurm.c, I added new corresponding types to types.db, the relevant bits for autoconf & automake, README docs, collectd.conf manpage bits, collectd.conf.in changes, and the corresponding AUTHORS file contributions update.
Please let me know if I missed anything.

Note that for the auto-generated collectd.conf, I believe it just adds the LoadPlugin slurm line, although slurm needs the Globals true option to work properly; not sure if it's possible to include the macro magic in the collectd.conf.in file to have that option on by default.

Also note that even though I updated src/collectd.conf.pod, make does not update src/collectd.conf.5 for me, not sure if these manpage source files are still used or not, but adding the docs to the .pod file seemed reasonable. I manually checked using pod2man that it parsed and rendered correctly.

ChangeLog: The new SLURM plugin gathers metrics from the SLURM workload manager.

@rpv-tomsk
Copy link
Contributor

Hi!

Thanks for your work on Collectd!

... although slurm needs the Globals true option to work properly;

Can you please clarify, why Globals true is needed for your plugin?

@pllopis
Copy link
Contributor Author
pllopis commented Jan 30, 2019

Hi, thanks!

Without Globals true, the plugin will complain about undefined symbols (in the slurm library code). It is a known issue that the slurm code requires RTLD_GLOBAL. To quote the developers:

Slurm is heavily dependent upon symbols loaded by dlopen() being globally available.

and

Making slurm without RTLD_GLOBAL would involve a very large effort.

Therefore, and after finding the option here, I thought that this would be acceptable. After setting Globals true no undefined symbol dlopen errors occur.

@rpv-tomsk
Copy link
Contributor

Thank you for the detailed explanation!

@octo octo added the New plugin label Feb 1, 2019
@collectd-bot collectd-bot added this to the Features milestone Feb 1, 2019
@rpv-tomsk rpv-tomsk modified the milestones: Features, 5.9 May 20, 2019
@rpv-tomsk
Copy link
Contributor

Hi,

you asked about 5.9 / merge. Let's discuss some implementation details in comments...

src/slurm.c Outdated
}

static void slurm_submit_stats(stats_info_response_msg_t *stats_resp) {
slurm_submit("stats", "slurm_stats", "parts_packed",
Copy link
Contributor
@rpv-tomsk rpv-tomsk May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main point that catches me is slurm_stats type.

Collectd types has idea what type describes some area, but here we see all data sent in one slurm_stats type.
I think, that is wrong and needs more work (and more types).

Good example is jobs_* metric, which you put into type instance.
From my POV on Collectd types, that should be own type, possible slurm_jobs if no better type found.
And submitted/started/completed/canceled/failed should be type instances.
This type should be DERIVE (see cat types.db|grep _requests), I think these jobs_* grows continuosly and we mostly interested in change rate.

Install rrdtool plugin + setup collection3 and you will see what are benefits ;-)
With slurm_jobs implemented described way, you can get one chart which shows you all these metrics grouped, and, as I understand, their sum will give us 100% of jobs, so you can see distribution.

Another note - I think you should not set plugin instance value at all. Constant value in that field has no sence, I think. UPD: Ok, I found you send partition name in plugin instance... This makes this remark void, as there is no better solution now. (I really think about adding hierarhy field in addition to plugin and type).

Copy link
Contributor Author
@pllopis pllopis May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks a lot for the feedback!

I think you are definitely correct about some of the metrics needing a different RRD type. These are all global types that are not bound to a specific partition, and as you pointed out some make sense to add as DERIVE, also some as COUNTER like *_sum metrics which are monotonically increasing.

However the granularity in which new types are added or metrics grouped together is not clear to me (see below). I intended to group metrics together in a way that makes sense, yet does not pollute types.db with lots of new types.

I am not using collection3, but currently the data get funneled through various dbs/frameworks and are displayed with Grafana at the end of the "pipeline", where they are already grouped by their metric name prefix (that is, by their type instance prefix).

While it would be certainly possible to create a type slurm_job as DERIVE, for the other metrics it is a bit less clear: Not all bf_* metrics and schedule_* metrics are of the same type, and therefore it would not be possible to create a slurm_schedule or slurm_bf type to group them together. Similarly, many other types would need to be created if the goal is to separate/plot them based on the type, in some cases like parts_packed, server_thread_count, agent_queue_size, these would each need a new type (one type per single metric). Unless, they are all grouped together somehow.

So I guess my question is: what granularity should we use to group them vs creating new types?

IMHO it would make sense to use slurm_stats as plugin instance to group all these global statistics together, and then use the generic types (counter for COUNTER type metrics, derive for DERIVE type metrics, gauge for GAUGE type metrics), and keep the metric names (type instance) as they are.
This would achieve: 1) each having their correct type 2) they are still grouped together as "slurm_stats" (based on the plugin instance), leaving finer granularity grouping up to the plotting software, and 3) not polluting types.db with lots of new types.

Does this make sense from a collectd/rrd point of view? Would you prefer an implementation of this change?

Thanks!

Copy link
Contributor
@rpv-tomsk rpv-tomsk May 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should add new types when you think it is needed.
If in doubt, lets discuss.

counter for COUNTER type metrics, derive for DERIVE type metrics, gauge for GAUGE type metrics

I think that is not good solution, because one should do grouping, discussed above, anyway.

Not all bf_* metrics and schedule_* metrics are of the same type, and therefore it would not be possible to create a slurm_schedule or slurm_bf type to group them together.

That is not needed, that is OK to split them into subgroups.

->> https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatStruct
https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStats
https://github.com/collectd/collectd/blob/master/src/virt.c#L1707

Method provides "memory stats". Sometime ago that data was sent as one type with type instances. But after careful review, part of data now sent as two other types, existing before and used by other plugins ("widely used types"):
swap_io type (in+ out) ; ps_pagefaults type (major + minor)
and 'last_update' value is not sent at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leaving finer granularity grouping up to the plotting software,

As you already use your plugin, you already do some grouping.
I also think your added grouping will not avoid possibility to do another grouping in plotting software. But presence of that grouping could be useful. So, provide it, please )

and 3) not polluting types.db with lots of new types.
I dont think that is real problem. My POV is to freely add new types if required.

@pllopis
Copy link
Contributor Author
pllopis commented May 29, 2019

I just updated the code with reworked types following our discussion.

There are now 6 types in total:

  • slurm_job_state (per-partition job states, same as before)
  • slurm_node_state (per-partition node states, same as before)
  • slurm_stats_load (slurm load that comprises several metrics: server and agent thread counts and queue sizes)
  • slurm_stats_backfill (backfill related metrics)
  • slurm_stats_cycles (scheduler cycles related metrics)
  • slurm_stats_jobs (global stats about job requests as derived metrics)

I also rebased against the latest commits in master. Please let me know what you think about these improvements.

src/types.db Outdated
@@ -231,6 +231,12 @@ serial_octets rx:DERIVE:0:U, tx:DERIVE:0:U
signal_noise value:GAUGE:U:0
signal_power value:GAUGE:U:0
signal_quality value:GAUGE:0:U
slurm_job_state value:GAUGE:0:U
slurm_stats_load server_thread_count:GAUGE:0:U, agent_thread_count:GAUGE:0:U, agent_queue_size:GAUGE:0:U, dbd_agent_queue_size:GAUGE:0:U
Copy link
Contributor
@rpv-tomsk rpv-tomsk May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, misunderstanding occured.

Types with multiple values are not preferred. all values should go to its own types with type_instance as before.

Most of types should have only one data_source named value.
I think, adding of new types with multiple data_sources will be very rare.


Is there is documentation about these statistics values?

Please understand, what threads_count and queue_size is different metrics (that is different kind of load), and they should go to different types, they should not be mixed together.

I will try to fill a table of proposed types.

src/types.db Outdated
slurm_node_state value:GAUGE:0:U
slurm_stats_backfill backfilled_jobs:DERIVE:0:U, backfilled_pack_jobs:DERIVE:0:U, cycle_counter:COUNTER:0:U, cycle_last:GAUGE:0:U, cycle_max:COUNTER:0:U, cycle_sum:COUNTER:0:U, cycle_last_depth:GAUGE:0:U, cycle_last_depth_try:GAUGE:0:U, depth_sum:COUNTER:0:U, depth_try_sum:COUNTER:0:U, queue_len:GAUGE:0:U, queue_len_sum:COUNTER:0:U
slurm_stats_cycles max:COUNTER:0:U, last:GAUGE:0:U, sum:COUNTER:0:U, counter:COUNTER:0:U, depth:DERIVE:0:U, queue_len:GAUGE:0:U
slurm_stats_jobs submitted:DERIVE:0:U, started:DERIVE:0:U, completed:DERIVE:0:U, canceled:DERIVE:0:U, failed:DERIVE:0:U
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, slurm_stats_jobs type + submitted/started/completed/canceled/failed _type_instance would be good.

src/types.db Outdated
slurm_stats_load server_thread_count:GAUGE:0:U, agent_thread_count:GAUGE:0:U, agent_queue_size:GAUGE:0:U, dbd_agent_queue_size:GAUGE:0:U
slurm_node_state value:GAUGE:0:U
slurm_stats_backfill backfilled_jobs:DERIVE:0:U, backfilled_pack_jobs:DERIVE:0:U, cycle_counter:COUNTER:0:U, cycle_last:GAUGE:0:U, cycle_max:COUNTER:0:U, cycle_sum:COUNTER:0:U, cycle_last_depth:GAUGE:0:U, cycle_last_depth_try:GAUGE:0:U, depth_sum:COUNTER:0:U, depth_try_sum:COUNTER:0:U, queue_len:GAUGE:0:U, queue_len_sum:COUNTER:0:U
slurm_stats_cycles max:COUNTER:0:U, last:GAUGE:0:U, sum:COUNTER:0:U, counter:COUNTER:0:U, depth:DERIVE:0:U, queue_len:GAUGE:0:U
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depth and queue len is not directly related to cycles count, I think.
So, that also should not be mixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, you don't really need max, last, count metrics.
Did you use them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look into https://github.com/SchedMD/slurm/blob/master/src/sdiag/sdiag.c:

	if (buf->schedule_cycle_counter > 0) {
		printf("\tMean cycle:   %u\n",
		       buf->schedule_cycle_sum / buf->schedule_cycle_counter);
		printf("\tMean depth cycle:  %u\n",
		       buf->schedule_cycle_depth / buf->schedule_cycle_counter);
	}

So, you can use this piece of code as example, and submit metrics

  1. average cycle duration (if I understand things correctly)
  2. average cycle depth (I don't know what the depth is)

If you look to another example:

	if ((buf->req_time - buf->req_time_start) > 60) {
		printf("\tCycles per minute: %u\n",
		       (uint32_t) (buf->schedule_cycle_counter /
		       ((buf->req_time - buf->req_time_start) / 60)));
	}

This shows us we need use DERIVE to submit buf->schedule_cycle_counter. Such metric will show us 'cycles per second' metric. Notice we don't need to use req_time and req_time start, Collectd will save and use time internally.

src/types.db Outdated
slurm_job_state value:GAUGE:0:U
slurm_stats_load server_thread_count:GAUGE:0:U, agent_thread_count:GAUGE:0:U, agent_queue_size:GAUGE:0:U, dbd_agent_queue_size:GAUGE:0:U
slurm_node_state value:GAUGE:0:U
slurm_stats_backfill backfilled_jobs:DERIVE:0:U, backfilled_pack_jobs:DERIVE:0:U, cycle_counter:COUNTER:0:U, cycle_last:GAUGE:0:U, cycle_max:COUNTER:0:U, cycle_sum:COUNTER:0:U, cycle_last_depth:GAUGE:0:U, cycle_last_depth_try:GAUGE:0:U, depth_sum:COUNTER:0:U, depth_try_sum:COUNTER:0:U, queue_len:GAUGE:0:U, queue_len_sum:COUNTER:0:U
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also mixes all-into-one, but in different form.
The goal was not in replacing type_instance into value index.

Do you really found all these metrics useful?
I found no documentation about these values, so there is no reasons to report all them.

From my POV we do engineering, so we should argue all things.
If there is no such ready documentation, please fill a table with all values you have from slurm_get_statistics() call,
add explanation about each that value and that will argue filling subsequent columns by type and type_instance values.

That is the thing should be done before writing the code.

Copy link
Contributor Author
@pllopis pllopis May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the slurm documentation of the api is very lacking, but the best you can find is the sdiag manpage and sdiag.c source. This is the official slurm diagnosis tool which reports mostly the same metrics that I was aiming to include in this plugin.

I will add a reference to these in the collectd manpage and in the slurm.c module as a comment, as this would make the meaning of each metric easier to look up.

All these metrics are useful and needed, this is why the official slurm diagnosis tool reports all of those that I have included. (In my particular case some of these metrics may be constant and do not vary much or at all, but this is not a reason for excluding them, as they may be more useful for other slurm clusters.)

In theory metrics that end in _sum or _max could be derived from _last* kind of metrics, but in practice I found that these usually diverge, so keeping them seems sensible. Also, it may make it easier to have a global mean just like sdiag does by computing sum / count.

I will propose a new type listing and we can discuss. Would it be ok to re-use other types, e.g. threads for the server_thread_count, queue_length for the agent_queue_size (this would result in type: queue_length, type_instance: agent_queue_size), etc? Or is it preferred to duplicate the metric type with something like slurmctld_thread_count, slurm_agent_queue_size, etc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok to re-use other types, e.g. threads for the server_thread_count, queue_length for the agent_queue_size, etc?

That would be ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory metrics that end in _sum or _max could be derived from _last* kind of metrics, but in practice I found that these usually diverge, so keeping them seems sensible.

I think max has no sence, as it will never decrease, until restart.
So, it is better to look for specified interval of _last, if that is needed.

Also, I think possibly _sum should go to some DERIVE type, but this not a rule, but an opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you a lot for your fast and helpful responses.

Also, I think possibly _sum should go to some DERIVE type, but this not a rule, but an opinion.

Yes, I think this makes sense. Possibly COUNTER, although it may get reset (a condition different from overflow).

I have been experimenting with DERIVE for slurm_stats_jobs and I notice something unexpected: While gauge values will display the following values for one of the metrics:

1559136017.745,41.000000
1559136077.575,42.000000
1559136137.578,42.000000
1559136197.575,43.000000

changing the same metric to DERIVE or COUNTER will provide the following values:

1559136416.574,0.000000
1559136476.575,2345593565836.732910

Is this expected? (This is output from the CSV plugin, and I see the same values on my grafana dashboards) It seems like an increment of 1 for this metric is causing the derivative to jump from 0 to a very high value, while I was expecting a value of 1. I may be misunderstanding something here, but wanted to clarify before switching everything to DERIVE types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like an increment of 1 for this metric is causing the derivative to jump from 0 to a very high value, while I was expecting a value of 1.

If I understand correctly, when you possibly has to change GAUGE to DERIVE you had to restart all services, as there are internal states.

< 10000 span data-view-component="true"> Copy link
Contributor
@rpv-tomsk rpv-tomsk May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but wanted to clarify before switching everything to DERIVE types.

If value monotonously growing, that is candidate to DERIVE.

Possibly COUNTER, although it may get reset (a condition different from overflow).

I think, your plugin should not use COUNTER, that is rarely used and specific type.

Use GAUGE and DERIVE.

@mrunge mrunge modified the milestones: 5.9, 5.10 Jul 10, 2019
src/slurm.c Outdated
slurm_free_job_info_msg(job_buffer_ptr);
slurm_free_node_info_msg(node_buffer_ptr);
slurm_free_partition_info_msg(part_buffer_ptr);
ERROR("alloc_partition_states");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: add PLUGIN_NAME to all ERRORs like:
ERROR(PLUGIN_NAME ": alloc_partition_states error.");

@dago dago modified the milestones: 5.10.0, 5.11.0 Feb 17, 2020
@pllopis
Copy link
Contributor Author
pllopis commented Feb 25, 2020

Hi,
I am adding three changes:

  1. Already commited the suggested improvement to use PLUGIN_NAME in the ERRORs.

  2. Already commited the updated fix to use the new common.h location

  3. Regarding the slurm types, I propose the following changes summarized in the table below.
    I have started from the types that I last commited (left column), and propose (middle column) to either remove the type, or to change it.
    As discussed, there are no more use of multi-value types, or of COUNTER types.

Note that some types like slurm_cycle_last and slurm_cycle_duration may seem redundant since they're both measuring cycle duration. However, slurm_cycle_duration is a derived type using the difference since the last collectd reading, while slurm_cycle_last is just the last cycle. They can end up being different plots, since in slurm cycles do not happen necessarily at regular intervals; therefore if the last cycle was a spike for any reason, the derived value could end up being a "softened" version that loses the original signal. Therefore I'm specifying and submitting both in order to sample the "last cycle" as well. Same thing for slurm_last_cycle_depth and slurm_cycle_depth.

Type as of last review New RRD Type Comment
slurm_job_state value:GAUGE:0:U Use type_instance for pending/running/suspended/complete/..
server_thread_count Remove. Use threads, server_thread_count as type_instance
agent_thread_count Remove. Use threads, agent_thread_count as type_instance
agent_queue_size Remove. Use queue_length, agent_queue_size as type_instance
dbd_agent_queue_size Remove. Use queue_length, dbd_agent_queue_size as type_instance
slurm_node_state value:GAUGE:0:U Use type_instace for the actual node state
slurm_backfilled_jobs value:DERIVE:0:U
backfilled_pack_jobs Remove. Use slurm_backfilled_jobs as type, backfilled_pack_jobs as type_instance
cycle_counter slurm_cycles value:DERIVE:0:U Derive number of cycles since last reading. Use min=0 for overflows.
cycle_last slurm_cycle_last value:GAUGE:0:U Duration of last cycle.
cycle_max Remove. Not used.
cycle_sum slurm_cycle_duration value:DERIVE:0:U Derive duration of cycles since last reading. Use min=0 for overflows.
cycle_last_depth slurm_last_cycle_depth value:GAUGE:0:U
cycle_last_depth_try Remove. Use slurm_last_cycle_depth_try as type_instance
depth_sum slurm_cycle_depth value:DERIVE:0:U
depth_try_sum Remove. Use slurm_cycle_depth with different type_instance like depth_try_sum.
bf_queue_len Remove. Use queue_length, change type_instance
bf_queue_len_sum Remove. Not used.
cycles_max Remove. Not used.
cycles_last Remove. Use slurm_cycle_last with diff. type_instance
cycles_sum Remove. Use slurm_cycle_duration with diff. type_instance
cycles_counter Remove. Use slurm_cycles with diff. type_instance
cycles_depth Remove. Use slurm_cycle_depth with diff. type_instance
cycles_queue_length Remove. Use queue_length with diff type_instance
slurm_stats_jobs value:GAUGE:0:U use type_instance for submitted/started/completed/canceled/failed

If you find these new types to be OK, I will submit a commit which implements the use of these new types very soon, (hopefully to make it before the code freeze).

@dago
Copy link
Contributor
dago commented Feb 25, 2020

Looks good to me. @mrunge would you also mind having a look?

@pllopis
Copy link
Contributor Author
pllopis commented Feb 26, 2020

Just to be clear:

slurm_job_state value:GAUGE:0:U
slurm_node_state value:GAUGE:0:U
slurm_backfilled_jobs value:DERIVE:0:U
slurm_cycles value:DERIVE:0:U
slurm_cycle_last value:GAUGE:0:U
slurm_cycle_duration value:DERIVE:0:U
slurm_last_cycle_depth value:GAUGE:0:U
slurm_cycle_depth value:DERIVE:0:U
slurm_job_stats value:DERIVE:0:U
slurm_queue_length value:DERIVE:0:U

Are these OK, should I try to go for more generic names, dropping the slurm_ as much as possible to make them reusable, or should I add new types prefixed by the plugin name (like, for instance, the mysql plugin is doing)? Is there an agreement of what the naming should be?

The reason I'm asking is because collectdctl listval shows very verbose/redundant naming:

(extract)

slurm02.cern.ch/slurm-slurm_backfill_stats/queue_length-backfill_last_queue_length
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_backfilled_jobs-backfilled_jobs
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_backfilled_jobs-backfilled_pack_jobs
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_cycle_depth-backfill_cycle_depth
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_cycle_depth-backfill_cycle_depth_try
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_cycle_duration-backfill_cycle_duration
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_cycle_last-last_backfill_cycle
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_cycles-backfill_cycles
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_last_cycle_depth-backfill_last_cycle_depth
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_last_cycle_depth-backfill_last_cycle_depth_try
slurm02.cern.ch/slurm-slurm_backfill_stats/slurm_queue_length-backfill_queue_length
slurm02.cern.ch/slurm-slurm_jobs_stats/slurm_job_stats-canceled
slurm02.cern.ch/slurm-slurm_jobs_stats/slurm_job_stats-completed
slurm02.cern.ch/slurm-slurm_jobs_stats/slurm_job_stats-failed
slurm02.cern.ch/slurm-slurm_jobs_stats/slurm_job_stats-started
slurm02.cern.ch/slurm-slurm_jobs_stats/slurm_job_stats-submitted

I already have a code implementation of this, but since I remain a bit confused about how to name things and when/how to add new types, I wanted to make sure we get this right :)

@pllopis
Copy link
Contributor Author
pllopis commented Feb 26, 2020

I just pushed the code implementation of what I described above. Let me know if you want to do any renaming (or have any other concerns).

@dago
Copy link
Contributor
dago commented Feb 27, 2020

This is fine with me. @rpv-tomsk would you also mind having a look?

@dago dago requested a review from sunkuranganath February 29, 2020 12:57
@kwiatrox kwiatrox self-requested a review March 3, 2020 11:25
Copy link
Member
@kwiatrox kwiatrox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dago dago merged commit 185c754 into collectd:master Mar 3, 2020
"backfilled_jobs", stats_resp->bf_backfilled_jobs);
slurm_submit_derive("slurm_backfill_stats", "slurm_backfilled_jobs",
"backfilled_pack_jobs",
stats_resp->bf_backfilled_pack_jobs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this is broken on Slurm 20.02. They renamed this field.

< BE96 clipboard-copy aria-label="Copy link" for="discussion_r469111572-permalink" role="menuitem" data-view-component="true" class="dropdown-item btn-link"> Copy link
Contributor Author
@pllopis pllopis Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I'll open a new PR to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0