# Lecture 1.1
This document: https://hackmd.io/@coderefinery/TTT4HPC-Lecture-1-1
Schedule:
- Intro (5 min)
- Slurm basics (10 min)
- Number of cores (15 min)
- Memory (15 min)
- Buffer/Q&A (5 min)

After lunch: exercises
**Learning outcomes**
- Can check what resources their HPC job uses and request appropriate resources
- Understands the meaning of different resource types
- Can anticipate load on the file system and store files in an appropriate format
## Slurm basics
**What is Slurm?**
- provides a framework for starting, executing, and monitoring jobs on the compute nodes
- schedules the jobs on the clusters
- allocates the required resources (compute cores or nodes, memory)
- free, open source, lightweight -> popular
- available at most supercomputing centres
- https://slurm.schedmd.com
**How to submit a job to the Slurm manager?**
`sbatch <job script>`
Example: `sbatch job_script.sh`. The job script contains the details of the simulation to be performed and the resources to be allocated for it.
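A minimal sketch of such a job script (the project name is reused from the examples below; adjust it and the resources for your own cluster and workload):
```bash
#!/bin/bash
#SBATCH -A naiss2024-22-49   # project to charge (example value)
#SBATCH -t 00:10:00          # max wall time: 10 minutes
#SBATCH -n 1                 # one task (core)

echo "Hello from $(hostname)"
```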
### SBATCH flags
`-A <project>`
The project (account) to which the job's usage is charged.
`-p <partition>`
The partition (queue) to submit the job to.
`-t <max walltime>`
Maximum wall time for the job. The general format is `-t days-hours:minutes:seconds`.
- while testing new software or inputs, use a short time limit, e.g. 10 min - 1 h
- if you have an idea of how long a program will take to run, overbook by 25-50%
- if you have no idea how long a program will take to run, book a long time, e.g. 2-00:00:00
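Written as `#SBATCH` lines, the two extremes above look like:
```bash
#SBATCH -t 00:10:00     # 10 minutes, e.g. while testing
#SBATCH -t 2-00:00:00   # 2 days, when the runtime is hard to estimate
```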
`-n <number of tasks>`
Number of tasks, typically the number of cores.
`-N <number of nodes>`
Number of nodes allocated to the job.
`-J <job name>`
A descriptive name for the job, shown e.g. by `squeue`.
`--output=slurm-%j.out`
`--error=slurm-%j.err`
Files for standard output and standard error; `%j` is replaced by the job ID.
`--mail-type=BEGIN,END,FAIL`
Notifies the user by email when a certain event occurs: the job starts, ends, or fails. The recipient can be set with `--mail-user=<address>`.
**Sample MPI job script**
```bash
#!/bin/bash
#SBATCH -J dhcpNd              # job name
#SBATCH -A naiss2024-22-49     # project
#SBATCH -t 00-07:00:00         # max wall time: 7 h
#SBATCH -p node                # partition
#SBATCH -N 4                   # number of nodes

module load RSPt/2023-10-04
export RSPT_SCRATCH=$SNIC_TMP  # job-specific scratch directory provided by the cluster
srun -n 80 rspt                # launch 80 MPI tasks
```
The example above is for an RSPt simulation running on 80 cores spread over 4 nodes for a maximum of 7h.
**Sample OpenMP job script**
```bash
#!/bin/bash
#SBATCH -A naiss2024-22-49     # project
#SBATCH --exclusive            # reserve the node exclusively
#SBATCH -t 01:00:00            # max wall time: 1 h
#SBATCH -p node                # partition
#SBATCH --ntasks-per-node=1    # one task ...
#SBATCH --cpus-per-task=20     # ... with 20 cores

module load uppasd
export OMP_NUM_THREADS=20      # match the number of requested cores
sd > out.log                   # run UppASD, redirecting output to a file
```
The example above runs an UppASD simulation on 20 OpenMP threads on a single node for a maximum of 1 h.
**OpenMP or MPI code?**
- read the software documentation
- search the source code for **OMP** and/or **MPI**, as shown below
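For example, a quick search over a source tree (the `src/` path is hypothetical):
```bash
grep -ril "omp" src/   # files containing OpenMP pragmas or modules
grep -ril "mpi" src/   # files containing MPI calls or headers
```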
**Slurm commands**
`sbatch`
Submits a job script (see above).
`squeue`
Lists jobs in the queue.
`squeue --me` lists the running and pending jobs of the current user
`squeue -u <username>`
`squeue -A <project>`
`squeue -u <username> --state=running`
`squeue -u <username> --state=pending`
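To follow the queue continuously, `squeue` can be combined with `watch` (the 30 s interval is an arbitrary choice):
```bash
watch -n 30 squeue --me   # refresh the listing every 30 s
```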
`scancel`
- `scancel <jobid>` cancels the job with <jobid>
- `scancel -u <username>` cancels all the jobs of user <username> (your own jobs)
- `scancel --state=PENDING --user=<username>` cancels pending jobs
- `scancel --state=RUNNING --user=<username>` cancels running jobs
- `scancel --name=<jobname>` cancels jobs with a given <jobname>
- `-i` asks for confirmation before each cancellation
`sinfo`
Shows information about partitions and the state of their nodes.
Example: `sinfo` or `sinfo -p <partition>`
`scontrol`
Example: `scontrol show job <jobid>` lists all the Slurm parameters of a job: number of cores and nodes, partition, submit directory, ...
`scontrol` can be used to modify the job details after the job has been submitted, though not all SBATCH parameters may be modified by regular users.
Example: `scontrol update JobID=jobid TimeLimit=0-01:00:00` decreases the wall time to 1h.
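Two more examples of modifying a submitted job (the job ID is hypothetical):
```bash
scontrol hold 123456      # prevent a pending job from starting
scontrol release 123456   # allow it to be scheduled again
```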
`salloc`
- allocate resources for an interactive session
- handy for debugging a code or a script or for using programs with a graphical user interface
- useful together with the `--begin=<time>` flag
Example: `salloc -A naiss2024-22-49 -n 20 -t 03:00:00 --begin=2024-04-17T09:00:00` asks for 20 cores for 3 h in an interactive session that starts at the earliest on April 17 at 9:00.
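Once the allocation starts, commands can be launched on the allocated resources with `srun`, e.g. (a minimal sketch, reusing the project above):
```bash
salloc -A naiss2024-22-49 -n 4 -t 01:00:00
# when the prompt returns, the allocation is active:
srun -n 4 hostname   # runs on the 4 allocated cores
```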
**Parameters in the job script or the command line?**
Flags passed on the `sbatch` command line override the corresponding `#SBATCH` lines in the job script, which is handy for one-off changes without editing the script:
`sbatch -p devel -t 00:15:00 jobscript.sh`