# SLURM at HPCFS

## Partition `rome`

20 compute nodes with Rocky Linux 8.5 have the following characteristics:

- 2 x AMD EPYC 7402 24-Core Processor in multithreaded (SMT) configuration, totaling 96 logical CPUs (48 physical cores) per node. Note that MPI jobs should use `--ntasks-per-core=1`.
- max `--mem=125G` RAM can be used per node
- max `--time=72:0:0` limit can be used per job. Longer jobs can use `--signal=USR1` or similar to start a graceful shutdown and restart, as shown in the sketch below.
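For jobs that need more than the 72-hour limit, SLURM can deliver a warning signal shortly before the limit so that the application can write a checkpoint and the job can be resubmitted. The following is only a minimal sketch of this pattern, assuming a hypothetical application `./myprog` that checkpoints itself when it receives `SIGUSR1`; adjust the resources, signal timing, and resubmission step to your own workflow.

~~~bash
#!/bin/bash
#SBATCH --job-name=long-run
#SBATCH --partition=rome
#SBATCH --nodes=1
#SBATCH --ntasks=48
#SBATCH --ntasks-per-core=1
#SBATCH --mem=125G
#SBATCH --time=72:0:0
# send SIGUSR1 to the batch shell (B:) 10 minutes before the time limit
#SBATCH --signal=B:USR1@600

requeue_handler() {
    echo "Time limit approaching: asking the application to checkpoint"
    # forward the signal to the tasks of the running step (step 0)
    scancel --signal=USR1 "${SLURM_JOB_ID}.0"
    wait            # let the step finish writing its checkpoint
    sbatch "$0"     # resubmit this script to continue from the checkpoint
}
trap requeue_handler USR1

# run the (hypothetical) application in the background so the trap can fire
srun ./myprog &
wait
~~~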
## Partition `haswell`

20 compute nodes with Rocky Linux 8.5:

- 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz in multithreaded configuration, totaling 48 logical CPUs (24 physical cores) per node
- max `--mem=60G` can be used per node

## Interactive X11 jobs

Software rendering can be used on compute nodes.

#### MATLAB example

Limit the job to 4 hours and allocate a whole node (96 tasks) with 50 GB of memory:

~~~ bash
[leon@viz ~]$ salloc --nodes=1 --ntasks=96 \
    --partition=rome --mem=50G --time=4:0:0
salloc: Granted job allocation 57759
salloc: Waiting for resource configuration
salloc: Nodes cn41 are ready for job
[leon@viz ~]$ ssh -X cn41
Warning: Permanently added 'cn41,10.0.2.141' (ECDSA) to the list of known hosts.
Last login: Wed Oct 20 11:21:27 2021 from 10.0.2.99
[leon@cn41 ~]$ module load MATLAB
[leon@cn41 ~]$ matlab
MATLAB is selecting SOFTWARE OPENGL rendering.
[leon@cn41 ~]$ exit
logout
Connection to cn41 closed.
[leon@viz ~]$ exit
salloc: Relinquishing job allocation 57759
[leon@viz ~]$
~~~

MATLAB is usually run single-threaded rather than in parallel, so the following is the recommended way to start MATLAB for up to 4 hours:

~~~bash
[leon@viz ~]$ ml MATLAB
[leon@viz ~]$ srun --nodes=1 --ntasks=2 --ntasks-per-core=2 \
    --mem=8G --partition=rome --time=4:0:0 --x11 --pty matlab
~~~

#### Jupyter notebook example

~~~ bash
ml Python
python3 -m venv jupyter
jupyter/bin/pip install jupyter
env --unset=LD_PRELOAD --unset=SESSION_MANAGER \
    srun --partition=haswell --mem=0 --time=2:00:00 --pty \
    jupyter/bin/jupyter-notebook --no-browser --ip=0.0.0.0
~~~

Under the NoMachine desktop, use a browser to open the link reported by the Jupyter server on the compute node, such as:

~~~
Jupyter Server 2.15.0 is running at:
http://cn72.hpc:8888/tree?token=1fbc9e418c3c40341c9e48f41c77631ace9c91e89cd08e58
~~~

:::info
:zap: Note that `env --unset=LD_PRELOAD` reduces meaningless warnings in the logfiles when submitting from display nodes. It prevents forwarding the VirtualGL interposer libraries used for virtual hardware graphics rendering, so that you do not see

    ERROR: ld.so: object '/usr/NX/scripts/vgl/librrfaker.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

messages when redirected to a compute node. The VGL libraries are also installed on the compute nodes, so the warnings do not appear even when `LD_PRELOAD` is not unset at submission.
:::

### Interactive shell

In the following example we additionally set the bash shell timeout to 600 seconds (10 minutes), which automatically logs out from the compute node if no command is typed for that long. The maximum time for the interactive job is set to 4 hours.

~~~bash
[leon@viz ~]$ env --unset=LD_PRELOAD TMOUT=600 \
    srun --mem=100G --time=4:0:0 -p rome --x11 --pty bash -i
[leon@cn41 ~]$ timed out waiting for input: auto-logout
~~~

## Useful SLURM job information commands

List detailed information for a job (useful for troubleshooting):

    scontrol show jobid -dd <jobid>

List status info for a currently running job:

    sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps

Once your job has completed, you can get additional information that was not available during the run, such as run time and memory used. To get statistics on a completed job by job ID:

    sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed

## Running an R script with Rmpi under SLURM

Rmpi is demonstrated with the following R example:

~~~ R
library(Rmpi)

size <- Rmpi::mpi.comm.size(0)
rank <- Rmpi::mpi.comm.rank(0)
host <- Rmpi::mpi.get.processor.name()

if (rank == 0) {
    print('I am the master')
} else {
    print(paste("I am", rank, "of", size, "running on", host))
}
~~~

and the following sbatch script:

~~~ bash
#!/bin/bash
#SBATCH --export=ALL,LD_PRELOAD=
#SBATCH --job-name MyR
#SBATCH --partition=haswell --mem=24GB --time=02:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24

module load R
srun Rscript rmpi-test.R
~~~
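To run this example, save the R code as `rmpi-test.R` (the file name used in the `srun Rscript` line) and submit the batch script with `sbatch`. The snippet below is a brief usage sketch; the batch script name `rmpi-test.sh` is an assumption, and the output file follows SLURM's default `slurm-<jobid>.out` naming.

~~~bash
sbatch rmpi-test.sh      # submit the job script (assumed file name)
squeue -u $USER          # check that the job is pending or running
# after completion, inspect the output written by the R tasks
cat slurm-<jobid>.out
~~~

###### tags: `HPCFS` `SLURM`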