
Running on GPU nodes in the new PPI

Recipe

  1. Connect to MET using VPN

  2. Before you can reach the actual compute/GPU nodes, you need to log in to one of the login nodes:

$ ssh -AX <MET username>@ppi-r8login-a1.int.met.no

or

$ ssh -AX <MET username>@ppi-r8login-b1.int.met.no

  3. After that, start an interactive job on a GPU node (here GPU node 1 in room A with 10 GB of memory allocated) and set up the environment:

$ qlogin -q gpu-r8.q -l h=gpu-01.ppi.met.no,h_rss=10G,mem_free=10G
$ source /modules/rhel8/conda/install/etc/profile.d/conda.sh
$ conda activate gpuocean
$ module use /modules/MET/rhel8/user-modules
$ module load cuda
$ cd <root dir for notebooks or scripts>

$ jupyter notebook --no-browser --ip=$(hostname -f)

(Then connect to the URL printed by Jupyter. If you are in room A, you need to add ".int.met.no" after the hostname.)

or, for batch/script execution:

$ python some_script.py

Note that some of the commands above can be collected in your .bashrc; a minimal sketch is given after this list.

  4. Remember to exit the notebook server and log out of the compute/GPU node (to kill the interactive job and free the resources) when you are finished.
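
As a minimal sketch of the .bashrc helper mentioned in the note above: the function name gpuocean_env is hypothetical, while the paths and module names are the ones used in step 3.

# Hypothetical helper for ~/.bashrc, collecting the environment setup from step 3
gpuocean_env () {
    source /modules/rhel8/conda/install/etc/profile.d/conda.sh
    conda activate gpuocean
    module use /modules/MET/rhel8/user-modules
    module load cuda
}

After re-sourcing .bashrc, running gpuocean_env inside the qlogin session performs the conda and CUDA setup in one step.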

References

Information about "new" PPI (running RHEL 8) and the queue system (Sun Grid Engine) can be found in this section: https://dokit.met.no/brukerdok/post-processing_infrastructure?s[]=ppi#new_ppi_implementation_rhel8

See hostnames and commands for interactive logins at https://dokit.met.no/brukerdok/post-processing_infrastructure?s[]=ppi#gpu_nodes_in_rhel8

For batch jobs, use the qsub command. An example job script can be found at https://dokit.met.no/brukerdok/post-processing_infrastructure?s[]=ppi#basic_example_submission_script (try with just the absolutely necessary queue/job variables first, like memory and queue); a rough sketch is shown below.
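
As a rough illustration only (not taken from the MET documentation), a minimal SGE submission script could reuse the queue and memory flags from the interactive qlogin example above; the job name and script name are hypothetical placeholders.

#!/bin/bash
# Minimal illustrative submission script
# Request the GPU queue and memory, as in the interactive qlogin example above
#$ -q gpu-r8.q
#$ -l h_rss=10G,mem_free=10G
# Hypothetical job name; run from the submission directory
#$ -N gpuocean_job
#$ -cwd

source /modules/rhel8/conda/install/etc/profile.d/conda.sh
conda activate gpuocean
module use /modules/MET/rhel8/user-modules
module load cuda

python some_script.py

Submit the script with qsub and monitor it with qstat; start from this minimal set of variables and add more only if needed.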
