rome
20 compute nodes with Rocky Linux 8.5 have the following characteristics:
--ntasks-per-core=1 should be used.
--mem=125G of RAM can be used per node.
A time limit of up to --time=72:0:0 (72 hours) can be used per job. Longer jobs can use --signal=USR1 or a similar signal to start a graceful shutdown and restart.
haswell
20 compute nodes with Rocky Linux 8.5.
--mem=60G of RAM can be used per node.
Software rendering can be used on compute nodes.
Limit the job to 4 hours and allocate
MATLAB is usually run single-threaded rather than in parallel, so the following is the recommended way to start MATLAB for at most 4 hours:
Under the NoMachine desktop, use a web browser to open the link reported by the Jupyter server running on the compute node, for example:
:zap: Note that env --unset=LD_PRELOAD reduces meaningless warnings in the log files when submitting from display nodes. It prevents the VirtualGL interposer libraries, which provide hardware-accelerated graphics rendering on the virtual desktop, from being forwarded. As a result, you do not see
ERROR: ld.so: object '/usr/NX/scripts/vgl/librrfaker.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
messages when the job is redirected to a compute node. The VGL libraries are also installed on the compute nodes, so these warnings do not appear even if LD_PRELOAD is not unset at submission.
In the following example we additionally set the bash shell timeout to 600 seconds (10 minutes). This automatically logs you out of the compute node if no command is typed for that long. The maximum time for the interactive job is set to 4 hours.
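Below is a minimal sketch of such an interactive session. It assumes Slurm's default environment propagation (srun forwards the submitting shell's environment, so TMOUT=600 reaches the bash started on the compute node) and that your ~/.bashrc does not reset TMOUT. No partition or memory options are included. The function name interactive_session and the DRY_RUN switch are hypothetical; they exist only so the command can be previewed without a cluster:

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a 4-hour interactive shell on a compute node.
# - env --unset=LD_PRELOAD keeps the VirtualGL faker library of the
#   NoMachine display node from being forwarded to the job.
# - TMOUT=600 makes the remote interactive bash exit after 10 idle minutes
#   (assumes srun's default environment export and no TMOUT override in ~/.bashrc).
interactive_session() {
  local cmd=(env --unset=LD_PRELOAD TMOUT=600
             srun --time=4:00:00 --pty bash -i)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Preview mode: print the command instead of running it.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Run interactive_session from a terminal on the display node, or DRY_RUN=1 interactive_session to print the command first.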
List detailed information for a job (useful for troubleshooting):
scontrol show jobid -dd <jobid>
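When only one attribute is needed, you can filter the key=value output of scontrol in the shell. The sketch below assumes the requested value contains no spaces. That holds for fields such as JobState or ExitCode, but not necessarily for Command or Comment. The name job_field is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key=value field from
# `scontrol show job` output read on stdin. Splits fields on spaces, so it
# assumes the value contains no spaces; keys that repeat (e.g. per-node
# lines from -dd) are printed once per occurrence.
job_field() {
  local key=$1
  tr ' ' '\n' | sed -n "s/^${key}=//p"
}
```

For example: scontrol show jobid -dd <jobid> | job_field JobState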
List status info for a currently running job:
sstat --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j <jobid> --allsteps
Once your job has completed, you can get additional information that was not available during the run. This includes run time, memory used, etc.
To get statistics on completed jobs by jobID:
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed
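To choose a --mem request for the next run, you can reduce the MaxRSS column to a single peak value. This sketch assumes sacct output produced with --parsable2 and --noheader, with MaxRSS as the second field. It also assumes MaxRSS values carry a K, M, G or T suffix; a bare number is treated as bytes. The name peak_rss_mib is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "JobID|MaxRSS|Elapsed" lines on stdin
# (sacct --parsable2 --noheader) and print the largest MaxRSS in MiB, rounded.
peak_rss_mib() {
  awk -F'|' '
    $2 != "" {
      unit = substr($2, length($2), 1)   # trailing unit letter, if any
      mib = $2 + 0                       # numeric prefix, e.g. "2048K" -> 2048
      if (unit == "K")      mib /= 1024
      else if (unit == "G") mib *= 1024
      else if (unit == "T") mib *= 1024 * 1024
      else if (unit != "M") mib /= 1024 * 1024   # no suffix: bytes
      if (mib > peak) peak = mib
    }
    END { printf "%.0f\n", peak }
  '
}
```

For example: sacct -j <jobid> --format=JobID,MaxRSS,Elapsed --parsable2 --noheader | peak_rss_mib. Rows with an empty MaxRSS, typically the job-level row, are skipped.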
Rmpi is demonstrated with the following R example and sbatch script: