# Submitting Jobs on Caltech HPC

###### tags: `Caltech HPC` `computing`

The Caltech HPC compute cluster uses the [SLURM](https://en.wikipedia.org/wiki/Slurm_Workload_Manager) queue scheduler to manage jobs.

## `Caltech HPC` Configuration

Full information can be found [here](https://www.hpc.caltech.edu/documentation). `Caltech HPC` consists of multiple login nodes (where you start when you `ssh` to `Caltech HPC`) and a large number of compute nodes. You should avoid running substantial calculations (i.e. anything that uses more than ~1 minute of CPU time, or uses multiple cores) on the login nodes - the compute nodes are intended for serious computation. Compute nodes are heterogeneous - they have different numbers of cores and amounts of memory, as described [here](https://www.hpc.caltech.edu/resources).

### Storage

Your home directory, `/home/$USER`, has a quota of 50 GB, so it's recommended that you use your home space only for source files and other valuable data. Additionally, there is group space available at `/central/groups/carnegie_poc/${USER}`. There is also scratch space at `/central/scratch`. It is recommended that you create a directory named with your user name on the scratch disk:
```
mkdir -p /central/scratch/$USER
```
and store temporary data there. Note that files untouched for 14 days are automatically removed from `/central/scratch` - more information can be found [here](https://www.hpc.caltech.edu/documentation/storage).

## Submitting a Job

To submit a job to `Caltech HPC` you should create a "submit script", which is simply a `bash` script with some header information to specify what resources you require. An example is as follows:
```
#!/bin/bash
#SBATCH --time=1:00:00                           # walltime
#SBATCH --ntasks=1                               # number of tasks (i.e. number of Galacticus.exe that will run)
#SBATCH --cpus-per-task=16                       # number of CPUs to assign to each task
#SBATCH --nodes=1                                # number of nodes
#SBATCH --mem-per-cpu=2G                         # memory per CPU core
#SBATCH -J "myJobName"                           # job name
#SBATCH --mail-user=abenson@carnegiescience.edu  # email address
#SBATCH --error=myLogFile.log                    # send output to a log file
#SBATCH --output=myLogFile.log
# Notify at the beginning, end of job and on failure.
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
# Change directory to the location from which this job was submitted.
cd $SLURM_SUBMIT_DIR
# Disable core-dumps (not useful unless you know what you're doing with them).
ulimit -c 0
export GFORTRAN_ERROR_DUMPCORE=NO
# Ensure there are no CPU time limits imposed.
ulimit -t unlimited
# Tell OpenMP to use all available CPUs on this node.
export OMP_NUM_THREADS=16
# Run Galacticus.
./Galacticus.exe myJobParameters.xml
```
The most important header lines are:
```
#SBATCH --ntasks=1          # number of tasks
#SBATCH --cpus-per-task=16  # number of CPUs to assign to each task
#SBATCH --nodes=1           # number of nodes
```
which specify what resources we want for this job. In this case we request 1 compute node, and 16 CPUs on it for a single task (i.e. one copy of `Galacticus.exe` running with 16 OpenMP threads).

If you have `Galacticus` compiled for MPI parallelism you can run it across multiple nodes. An example, using 4 nodes, would look like this:
```
#!/bin/bash
#SBATCH --time=1:00:00                           # walltime
#SBATCH --ntasks=64                              # number of tasks (i.e. number of Galacticus.exe that will run)
#SBATCH --cpus-per-task=1                        # number of CPUs to assign to each task
#SBATCH --nodes=4                                # number of nodes
#SBATCH --mem-per-cpu=2G                         # memory per CPU core
#SBATCH -J "myJobName"                           # job name
#SBATCH --mail-user=abenson@carnegiescience.edu  # email address
# Notify at the beginning, end of job and on failure.
#SBATCH --mail-type=BEGIN
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
# Change directory to the location from which this job was submitted.
cd $SLURM_SUBMIT_DIR
# Disable core-dumps (not useful unless you know what you're doing with them).
ulimit -c 0
export GFORTRAN_ERROR_DUMPCORE=NO
# Ensure there are no CPU time limits imposed.
ulimit -t unlimited
# Switch off OpenMP parallelism - each MPI process uses a single thread.
export OMP_NUM_THREADS=1
# Run Galacticus.
mpirun --n 64 --bind-to none --map-by node --mca pml ob1 --mca btl ^openib ./Galacticus.exe myJobParameters.xml
```
where we switch off OpenMP parallelism by setting `OMP_NUM_THREADS=1` and launch 64 MPI processes.

To submit your job to `Caltech HPC` use:
```
$ sbatch mySubmitScript.sh
```
This will place the job into the queue, and it will automatically start running as soon as resources are available.

You can monitor the status of your jobs using `squeue`:
```
$ squeue -u $USER
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
19782601       any    myJob  abenson  R 2-00:00:11      1 hpc-80-33
```
This shows your job number and name, the time it has been running for, and its state (the `ST` column). States are:
* `COMPLETED` - `CD`: The job has completed successfully.
* `COMPLETING` - `CG`: The job is finishing but some processes are still active.
* `FAILED` - `F`: The job terminated with a non-zero exit code and failed to execute.
* `PENDING` - `PD`: The job is waiting for resource allocation. It will eventually run.
* `PREEMPTED` - `PR`: The job was terminated because of preemption by another job.
* `RUNNING` - `R`: The job is currently allocated to a node and is running.
* `SUSPENDED` - `S`: A running job has been stopped, with its cores released to other jobs.
* `STOPPED` - `ST`: A running job has been stopped, with its cores retained.

#### Note for very large numbers of MPI processes

If you run very large numbers of MPI processes (256 or more seems to be the trigger point) you may get errors of the form:
```
ORTE has lost communication with a remote daemon.
```
If this happens, try adding:
```
--mca routed binomial
```
to the `mpirun` command.

### Interactive Sessions

You can request an interactive session on a compute node (i.e. pull up a command line interface on a compute node so that you can work on it directly) using:
```
srun --pty -n 1 --wait=0 --time=1:00:00 /bin/bash
```
This requests a single task (core), logs you in to a compute node, and places you in the same directory you were in on the login node. When you're finished, just `exit` and you'll be back on the login node (and your interactive session job will terminate).

### Tasks, Nodes, CPUs

In the above, we used the following `SBATCH` options, which control how resources are allocated to your job:
* `--nodes`
* `--ntasks`
* `--cpus-per-task`

How you use these will depend on whether you're running Galacticus using OpenMP parallelism (the default), MPI parallelism (which you activate by compiling with the `GALACTICUS_BUILD_OPTION=MPI` option), or a hybrid of both.

### OpenMP parallelism

OpenMP parallelism doesn't allow you to run over multiple nodes, so we always set `--nodes=1` in this case. Furthermore, OpenMP parallelism only ever runs a single copy of `Galacticus.exe`, so we always set `--ntasks=1`. OpenMP parallelism _does_ allow that single `Galacticus.exe` to use multiple CPUs, so set `--cpus-per-task=N` where `N` is whatever number of CPUs you want Galacticus to use - and include a corresponding:
```
export OMP_NUM_THREADS=N
```
in your submit script so that Galacticus knows how many CPUs it has available to it.
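For example (a minimal sketch - the value of 8 CPUs is purely illustrative, and the use of `SLURM_CPUS_PER_TASK` is optional), the resource-related parts of an OpenMP-only submit script would be:
```
#SBATCH --nodes=1          # OpenMP parallelism cannot span nodes
#SBATCH --ntasks=1         # a single copy of Galacticus.exe
#SBATCH --cpus-per-task=8  # N - the number of CPUs Galacticus should use

# Match the OpenMP thread count to the number of CPUs requested above.
# SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is specified, which
# avoids repeating the number by hand; export OMP_NUM_THREADS=8 works equally well.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./Galacticus.exe myJobParameters.xml
```
This is just the first example script from above, reduced to the lines that matter for OpenMP parallelism.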
### MPI parallelism

MPI parallelism allows Galacticus to run across multiple nodes. There will be multiple `Galacticus.exe` processes running in this case. Suppose we want to run Galacticus using 4 nodes, and to make use of 16 CPUs on each node (for a total of 64 CPUs). We would set the options:
```
--nodes=4
--ntasks=64
--cpus-per-task=1
```
where we've selected 4 nodes, 64 tasks (i.e. 64 copies of `Galacticus.exe` running in total - these will be distributed over the 4 nodes), and assigned a single CPU to each `Galacticus.exe`. Then also include:
```
export OMP_NUM_THREADS=1
```
in your submit script (this limits OpenMP parallelism to a single thread - i.e. no parallelism), and launch Galacticus using:
```
mpirun --n 64 --bind-to none --map-by node ./Galacticus.exe myJobParameters.xml
```
The `--map-by node` ensures that the 64 `Galacticus.exe` processes get distributed across our 4 nodes.

### Hybrid OpenMP/MPI parallelism

You can use MPI and OpenMP parallelism simultaneously. To do this, first decide how many nodes you want to use - call this `Nnode`. Then decide how many CPUs you want to use on each node - call this `Ncpu`. Next decide how many MPI processes you want to run _on each node_ (this must be an integer factor of `Ncpu`) - call this `Nmpi`. Then, to use all available CPUs, we need each `Galacticus.exe` to use `Nopenmp=Ncpu/Nmpi` CPUs. Having determined all of these, use the `SBATCH` options:
```
--nodes=Nnode
--ntasks=Nnode*Nmpi
--cpus-per-task=Nopenmp
```
and launch Galacticus using:
```
export OMP_NUM_THREADS=Nopenmp
mpirun --n Nnode*Nmpi --bind-to none --map-by node ./Galacticus.exe myJobParameters.xml
```
This is where the `--bind-to none` is important. Without it, MPI restricts all OpenMP parallel threads to run on the same CPU - which defeats the purpose of using OpenMP. With this option, OpenMP threads have access to all available CPUs.
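For example, a sketch of a hybrid job (all numbers purely illustrative) using `Nnode=2` nodes with `Ncpu=16` CPUs each and `Nmpi=4` MPI processes per node - so `Nopenmp=16/4=4` OpenMP threads per process - would contain:
```
#SBATCH --nodes=2          # Nnode
#SBATCH --ntasks=8         # Nnode*Nmpi = 2*4
#SBATCH --cpus-per-task=4  # Nopenmp = Ncpu/Nmpi = 16/4

# Each MPI process runs Nopenmp OpenMP threads.
export OMP_NUM_THREADS=4

# Launch Nnode*Nmpi = 8 MPI processes, distributed across the 2 nodes, with no
# binding so that each process's OpenMP threads can use all 4 of its CPUs.
mpirun --n 8 --bind-to none --map-by node ./Galacticus.exe myJobParameters.xml
```
Scale `--ntasks` and the `mpirun --n` argument together as `Nnode*Nmpi`, and keep `OMP_NUM_THREADS` equal to `--cpus-per-task`.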