rome
20 compute nodes with Rocky Linux 8.5 have the following characteristics:
--ntasks-per-core=1 should be used.
--mem=125G RAM can be used per node.
--time=72:0:0 limit can be used per job. Longer jobs can use --signal=USR1 or similar to start a graceful shutdown and restart.

haswell
20 compute nodes with Rocky Linux 8.5 have the following characteristics:
--mem=60G RAM can be used per node.

Software rendering can be used on compute nodes.
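The --signal=USR1 option mentioned for the rome partition works together with a trap in the batch script. A minimal sketch of the mechanism (the checkpoint step is a placeholder, not an actual HPCFS command, and the script signals itself here instead of waiting for Slurm):

```shell
#!/bin/bash
# In a real job, ask Slurm to deliver SIGUSR1 to the batch shell
# 60 seconds before the --time limit with:
#   #SBATCH --signal=B:USR1@60

graceful_shutdown() {
    # Placeholder for real checkpoint/restart logic.
    echo "caught USR1: writing checkpoint and exiting"
    exit 0
}
trap graceful_shutdown USR1

# Here we deliver the signal to ourselves to show the handler firing;
# in production this line is replaced by the actual workload.
kill -USR1 $$
sleep 2
```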
Limit the job to 4 hours and allocate one node with 96 tasks and 50 GB of memory:
[leon@viz ~]$ salloc --nodes=1 --ntasks=96 \
--partition=rome --mem=50G --time=4:0:0
salloc: Granted job allocation 57759
salloc: Waiting for resource configuration
salloc: Nodes cn41 are ready for job
[leon@viz ~]$ ssh -X cn41
Warning: Permanently added 'cn41,10.0.2.141' (ECDSA) to the list of known hosts.
Last login: Wed Oct 20 11:21:27 2021 from 10.0.2.99
[leon@cn41 ~]$ module load MATLAB
[leon@cn41 ~]$ matlab
MATLAB is selecting SOFTWARE OPENGL rendering.
[leon@cn41 ~]$ exit
logout
Connection to cn41 closed.
[leon@viz ~]$ exit
salloc: Relinquishing job allocation 57759
[leon@viz ~]$
MATLAB usually runs single-threaded rather than in parallel, so the following is the recommended way to start MATLAB for at most 4 hours:
[leon@viz ~]$ ml MATLAB
[leon@viz ~]$ srun --nodes=1 --ntasks=2 \
--ntasks-per-core=2 --mem=8G --partition=rome --time=4:0:0 --x11 --pty matlab
env --unset=LD_PRELOAD
reduces meaningless warnings in the log files when submitting from display nodes. It prevents forwarding the VirtualGL faker libraries used for virtual hardware graphics rendering, so that you do not see
ERROR: ld.so: object '/usr/NX/scripts/vgl/librrfaker.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
messages when the session is redirected to a compute node. The VGL libraries are installed on compute nodes to suppress these warnings in case LD_PRELOAD is not unset at submission.
In the following example we additionally set the bash shell timeout to 600 seconds (10 minutes), which will automatically log out from the compute node if no command is typed for that time. The maximum time for the interactive job is set to 4 hours.
[leon@viz ~]$ env --unset=LD_PRELOAD TMOUT=600 \
srun --mem=100G --time=4:0:0 -p rome --x11 --pty bash -i
[leon@cn41 ~]$ timed out waiting for input: auto-logout
List detailed information for a job (useful for troubleshooting):
scontrol show jobid -dd <jobid>
List status info for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.
To get statistics on completed jobs by jobID:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed
Rmpi is demonstrated with the following R example:
library(Rmpi)
size <- Rmpi::mpi.comm.size(0)
rank <- Rmpi::mpi.comm.rank(0)
host <- Rmpi::mpi.get.processor.name()
if (rank == 0) {
  print("I am the master")
} else {
  print(paste("I am", rank, "of", size, "running on", host))
}
Rmpi::mpi.quit()
and the corresponding sbatch script:
#!/bin/bash
#SBATCH --export=ALL,LD_PRELOAD=
#SBATCH --job-name MyR
#SBATCH --partition=haswell --mem=24GB --time=02:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
module load R
srun Rscript rmpi-test.R