Fix GPU options #1
Having users specify the number of GPUs and having the rest automated seems great; it's much more user-friendly. Are we OK with eliminating the possibility of requesting a more complicated gres specification in exchange for the simplicity? My initial reaction is yes.
@nicolaschan your revised help text for the gres field says a user can specify the number and type of GPU. I don't think that this will work because you then do
Good catch, thanks Chris!
I think we need equivalent changes for the MATLAB and RStudio apps.
@nicolaschan This is a very good improvement, but I have a few concerns. (1) We do not want to remove the CPU cores option altogether, because we need it for the HTC partitions. So please put the toggle_cpu_cores routine back in place, with logic that checks whether it's a GPU partition and, if so, applies the *2 multiplier instead of displaying the cpu_cores field. But then again, (2) we are assuming users will only ever need 2 CPU cores per GPU. What if a user wants 2 GPUs and 8 CPU cores? I'm not sure how common that case is. Maybe we can just improve the help text to say you have to request at least twice as many CPU cores as GPUs, and leave it there instead of applying automatic multipliers on the backend? Comments? Thoughts?
If someone requests 1 GPU but all of the CPU cores, won't that stop anyone else from using the other available GPUs? If so, perhaps this should not be allowed (that is, you would need to request exactly 2 CPU cores per GPU).
Yeah. If a user needs that unusual combination of 1 GPU and all the CPUs, so be it. Why should we block it? Users are charged appropriately. Ideally the form should ask for the number of GPUs first; then, if the CPU-count field is empty, prefill it with a recommendation of 2 × the number of GPUs, add a help note saying it has to be 2 × the number of GPUs, and still let the user modify the CPU-count field. Does that make sense?
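The form behavior proposed above can be sketched as follows. This is a minimal Python sketch of the logic only, assuming the 2-cores-per-GPU ratio from the thread; the real app would implement it in its form code, and the names `CPUS_PER_GPU`, `recommended_cpus`, `prefill_cpu_field`, and `help_note` are hypothetical.

```python
from typing import Optional

# Assumed default ratio from the discussion: 2 CPU cores per GPU.
CPUS_PER_GPU = 2


def recommended_cpus(num_gpus: int) -> int:
    """Return the suggested CPU-core count for a GPU request."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return CPUS_PER_GPU * num_gpus


def prefill_cpu_field(num_gpus: int, current_value: Optional[int]) -> int:
    """Prefill the CPU field only when the user left it empty.

    A value the user already typed is kept unchanged, so unusual
    combinations (e.g. 1 GPU with every CPU core) stay possible.
    """
    if current_value is None:
        return recommended_cpus(num_gpus)
    return current_value


def help_note(num_gpus: int) -> str:
    """Help text shown next to the CPU field."""
    return (f"For {num_gpus} GPU(s), request "
            f"{recommended_cpus(num_gpus)} CPU cores (2 per GPU).")


print(prefill_cpu_field(3, None))  # 6
print(prefill_cpu_field(1, 32))    # 32
print(help_note(2))
```

Keeping the user's value untouched is what separates this from the automatic backend multiplier: the ratio is only a suggestion.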
Ah, OK. savio3_2080ti has 32 CPU cores but only 8 GPUs, so if you want the whole node you'll need to request more CPUs than the 2-per-GPU ratio gives (8 × 2 = 16). I've added the CPU selection option back to the form for GPU partitions.
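Both the whole-node case and the earlier concern about 1 GPU plus all CPUs come down to counting what a request leaves for other jobs. A minimal sketch, assuming the savio3_2080ti figures quoted above and that every other job follows the 2-cores-per-GPU rule; `NodeType` and `gpus_left_for_others` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    cpu_cores: int
    gpus: int


# Figures quoted in the thread for savio3_2080ti.
SAVIO3_2080TI = NodeType("savio3_2080ti", cpu_cores=32, gpus=8)


def gpus_left_for_others(node: NodeType, req_gpus: int, req_cpus: int,
                         cpus_per_gpu: int = 2) -> int:
    """GPUs other jobs can still use on this node, assuming each other
    job needs cpus_per_gpu cores per GPU (hypothetical rule)."""
    if not (1 <= req_gpus <= node.gpus and 1 <= req_cpus <= node.cpu_cores):
        raise ValueError("request does not fit on this node")
    free_gpus = node.gpus - req_gpus
    free_cpus = node.cpu_cores - req_cpus
    # Each remaining GPU is only usable if enough cores are left for it.
    return min(free_gpus, free_cpus // cpus_per_gpu)


print(gpus_left_for_others(SAVIO3_2080TI, req_gpus=1, req_cpus=32))  # 0
print(gpus_left_for_others(SAVIO3_2080TI, req_gpus=1, req_cpus=2))   # 7
# Extra cores beyond the 2-per-GPU ratio needed to take the whole node:
print(SAVIO3_2080TI.cpu_cores - 2 * SAVIO3_2080TI.gpus)              # 16
```

Under these assumptions, 1 GPU with all 32 cores leaves the other 7 GPUs idle, and a whole-node request needs 16 cores more than the ratio alone would request.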
@nicolaschan I see that the OOD config sets --ntasks-per-node. But our standard example for a non-OOD GPU job suggests setting --cpus-per-task with --ntasks=1. Is there a reason to do it differently for OOD? I guess the effect is the same, so it probably doesn't matter much, but based on the non-OOD usage pattern, a user might expect SLURM_CPUS_PER_TASK to be set.
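The two request styles can be compared side by side. A minimal sketch, assuming the OOD template passes the CPU count as --ntasks-per-node and the GPU count as --gres=gpu:N (the actual template is not shown in this thread); the function names are hypothetical. Slurm exports SLURM_CPUS_PER_TASK only when --cpus-per-task is specified, which is the difference raised above.

```python
from typing import List


def args_ntasks_per_node(gpus: int, cpus: int) -> List[str]:
    """Style attributed to the OOD config: one task per requested core.

    Assumption: the CPU count is passed as --ntasks-per-node.
    """
    return [f"--gres=gpu:{gpus}", f"--ntasks-per-node={cpus}"]


def args_cpus_per_task(gpus: int, cpus: int) -> List[str]:
    """Style of the non-OOD GPU example: one task with several cores."""
    return [f"--gres=gpu:{gpus}", "--ntasks=1", f"--cpus-per-task={cpus}"]


def sets_cpus_per_task_env(args: List[str]) -> bool:
    """SLURM_CPUS_PER_TASK is only exported when --cpus-per-task is given."""
    return any(a.startswith("--cpus-per-task") for a in args)


for build in (args_ntasks_per_node, args_cpus_per_task):
    args = build(gpus=2, cpus=4)
    print(" ".join(args), "| SLURM_CPUS_PER_TASK set:",
          sets_cpus_per_task_env(args))
```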
You can't run multiple node jobs with:
[nicolaschan@ln000 ~]$ srun -A ac_scsguest -p savio --ntasks=1 --nodes=2 -t 0:10:00 --pty bash -i
srun: Warning: can't run 1 processes on 2 nodes, setting nnodes to 1
srun: job 8704371 queued and waiting for resources
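A form could catch this case before submission. The sketch below reproduces the adjustment srun reported in the transcript (fewer tasks than nodes, so the node count drops to the task count) and shows one possible multi-node layout that keeps --cpus-per-task by running one task per node. The helper names and the layout are hypothetical, not something the thread settled on; --gres is counted per node in Slurm.

```python
from typing import List


def nodes_srun_will_use(ntasks: int, nodes: int) -> int:
    """Node count after the adjustment shown in the srun warning:
    with fewer tasks than nodes, the node count drops to ntasks."""
    if ntasks < 1 or nodes < 1:
        raise ValueError("ntasks and nodes must be positive")
    return min(ntasks, nodes)


def multinode_gpu_args(nodes: int, gpus_per_node: int,
                       cpus_per_node: int) -> List[str]:
    """Hypothetical multi-node request that keeps --cpus-per-task:
    one task per node, so no nodes are dropped."""
    return [
        f"--nodes={nodes}",
        "--ntasks-per-node=1",
        f"--cpus-per-task={cpus_per_node}",
        f"--gres=gpu:{gpus_per_node}",  # per-node GPU count
    ]


print(nodes_srun_will_use(ntasks=1, nodes=2))  # 1, matching the warning
print(" ".join(multinode_gpu_args(2, 4, 8)))
```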
Good point. I don't know how often users would have multi-node GPU jobs via OOD, but I guess it is something we want to accommodate. Our GPU scheduler example is a one-node example. |
I don't think we'll have time in the near future to debug this multi-GPU task. Maybe we should keep OOD simple for now, since we don't have the bandwidth for this? Converting this to a draft pull request for future consideration.
Agreed; it's not clear what automation, if any, we want for setting CPUs relative to GPUs.
--ntasks-per-core
--cpus-per-task to 2 times the number of GPUs
2 instead of gpu:2