Fix GPU options #1

nicolaschan · 2021-04-27T22:23:28Z

paciorek · 2021-04-27T22:29:43Z

Having users specify number of gpus and the rest be automated seems great -- much more user-friendly.

Are we ok with eliminating the possibility that they could request a more complicated gres specification in return for the simplicity/friendliness? My initial reaction is yes.

paciorek · 2021-04-27T23:07:52Z

@nicolaschan your revised help text for the gres field says a user can specify number and type of GPU. I don't think that this will work because you then do gres_value.to_i(), which would presumably fail if given something like k80:1. I might be missing something...
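
As a small illustration of the concern above: Ruby's String#to_i does not raise on non-numeric input. It parses any leading digits and returns 0 when there are none, so a value like k80:1 would silently become 0 rather than fail. The helper below, parse_gpu_count, is a hypothetical name and not code from this PR; it is only a sketch of one way to reject anything other than a plain positive integer.

```ruby
# Hypothetical helper, not taken from the PR: accept only a plain
# positive integer for the GPU count and reject typed specs like "k80:1".
def parse_gpu_count(gres_value)
  text = gres_value.to_s.strip
  unless text.match?(/\A[1-9]\d*\z/)
    raise ArgumentError, "expected a positive integer GPU count, got #{text.inspect}"
  end
  text.to_i
end

# String#to_i never raises; it returns 0 when the string
# does not start with digits.
puts "k80:1".to_i          # prints 0
puts "2".to_i              # prints 2
puts parse_gpu_count("2")  # prints 2
```

Whether to raise or to surface a form validation message instead is a design choice left open here.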

nicolaschan · 2021-04-27T23:30:33Z

Good catch, thanks Chris!

paciorek · 2021-04-28T00:25:08Z

I think we need equivalent changes for the MATLAB and RStudio apps.

kmuriki · 2021-05-04T16:46:20Z

@nicolaschan This is a very good improvement, but I have a few concerns. (1) We do not want to remove the CPU cores option altogether because we need it for the HTC partitions. So you want to put the toggle_cpu_cores routine back in place and add logic there to check whether it's a GPU partition; if so, instead of displaying the cpu_cores field, use the *2 multiplier. But then again, (2) we are assuming users will need only gpus * 2 CPU cores. What if a user wants to use 2 GPUs and 8 CPU cores? I'm not sure how common that case is. Maybe we can just improve the help text to say you have to ask for *2 or more CPU cores and leave it at that, instead of applying automatic multipliers on the backend? Comments? Thoughts?

nicolaschan · 2021-05-06T17:08:44Z

If someone requests 1 GPU but all of the CPU cores, won't that stop anyone else from using the other available GPUs? If this is the case, then perhaps this should not be allowed (that is, you need to request exactly 2*GPUs).

kmuriki · 2021-05-06T17:21:15Z

Yeah, if a user needs that weird combination of 1 GPU and all CPUs, so be it. Why should we block it? Users are charged appropriately. Ideally we should ask for the number of GPUs first in the form. Then, based on the number they enter, if the number-of-CPUs field is empty, we should recommend 2 * gpus there, add a help note that it has to be 2 * gpus, and still allow them to modify the number-of-CPUs field. Does that make sense?
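
The behaviour proposed here could be sketched roughly as follows. The function name resolve_cpu_cores and the constant CPUS_PER_GPU are hypothetical and not part of the PR. The sketch also assumes that 2 * gpus is treated as a minimum (the "*2 or above" reading from earlier in the thread) rather than as an exact requirement.

```ruby
# Hypothetical sketch of the proposal above; names are illustrative only.
CPUS_PER_GPU = 2 # ratio discussed in this thread

# Returns the CPU core count to request for a GPU job.
# An empty or missing CPU field falls back to the recommended 2 * gpus.
def resolve_cpu_cores(num_gpus, requested_cpus = nil)
  recommended = num_gpus * CPUS_PER_GPU
  text = requested_cpus.to_s.strip
  return recommended if text.empty?

  cpus = Integer(text, 10) # raises ArgumentError on non-numeric input
  if cpus < recommended
    # Assumption: 2 * gpus is a lower bound, not an exact value.
    raise ArgumentError, "need at least #{recommended} CPU cores for #{num_gpus} GPU(s)"
  end
  cpus
end

puts resolve_cpu_cores(2)      # field left empty, falls back to 2 * gpus
puts resolve_cpu_cores(2, "8") # user override above the minimum is kept
```

In the actual form the recommendation would more likely be filled in on the client side so the user can see and edit it; the sketch only captures the rule itself.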

nicolaschan · 2021-05-06T18:12:45Z

Ah, ok. savio3_2080ti has 32 CPU cores but only 8 GPUs. So if you want the whole node you'll need to request more CPUs than required. I've added the CPU selection option back to the form for GPU partitions.

paciorek · 2021-05-10T18:50:51Z

@nicolaschan I see that the OOD config is such that --ntasks-per-node is set. But in our standard job example for a non-OOD GPU job, we suggest setting --cpus-per-task with --ntasks=1.

Is there a reason to do it differently for OOD? I guess the effect is the same, so I suppose it doesn't really matter, but it could be that based on the non-OOD usage pattern, a user might expect SLURM_CPUS_PER_TASK to be set.

nicolaschan · 2021-05-11T06:37:54Z

You can't run multiple node jobs with --ntasks=1. See the warning below:

[nicolaschan@ln000 ~]$ srun -A ac_scsguest -p savio --ntasks=1 --nodes=2 -t 0:10:00 --pty bash -i
srun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1
srun: job 8704371 queued and waiting for resources
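
To make the two option styles being compared concrete, here is a rough sketch that builds the Slurm arguments each way. The function names are hypothetical, the mapping of the form's CPU count onto --ntasks-per-node is an assumption about the OOD config, and --gres=gpu:N is assumed as the GPU request format; none of this is copied from the app.

```ruby
# Hypothetical sketch; not the actual OOD submit configuration.

# OOD-style request: one task per requested core on each node,
# which still works when more than one node is requested.
def ood_style_args(nodes:, cpus_per_node:, gpus_per_node:)
  [
    "--nodes=#{nodes}",
    "--ntasks-per-node=#{cpus_per_node}",
    "--gres=gpu:#{gpus_per_node}"
  ]
end

# Style from the non-OOD GPU job example: a single task with several
# cores. With --ntasks=1 the job cannot span multiple nodes.
def single_task_args(cpus:, gpus:)
  [
    "--nodes=1",
    "--ntasks=1",
    "--cpus-per-task=#{cpus}",
    "--gres=gpu:#{gpus}"
  ]
end

puts ood_style_args(nodes: 2, cpus_per_node: 4, gpus_per_node: 2).join(" ")
puts single_task_args(cpus: 4, gpus: 2).join(" ")
```

Only the second style sets --cpus-per-task, which is why a user following the non-OOD example might expect SLURM_CPUS_PER_TASK to be defined.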

paciorek · 2021-05-11T14:35:25Z

Good point. I don't know how often users would have multi-node GPU jobs via OOD, but I guess it is something we want to accommodate. Our GPU scheduler example is a one-node example.

tin6150 · 2021-09-24T23:20:58Z

I guess at this point we are not going to have time in the near future to debug this multi-GPU task. Maybe keep OOD simple for now, as we don't have the bandwidth for this? Converting this to a draft request for future consideration.

paciorek · 2021-09-28T20:16:38Z

Agreed - it's not clear what we want to do in terms of any automation of setting cpus relative to gpus.

Fix GPU options

eaf3c0e

Remove type of GPUs in help text

bd6a13c

nicolaschan added 2 commits May 6, 2021 10:32

Reinstate ntasks-per-node for htc/knl partitions

b0848ba

Add CPU option to GPU partitions

8cf58e2
