HPC examples

Some examples of running jobs on HPCFS.

Using the gpu02 login node

Set the host in your NoMachine connection to:

gpu02.hpc.fs.uni-lj.si

and connect.
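
If you only need a terminal session and SSH access to the login node is enabled for your account (an assumption, not something stated here), you can also connect from any shell with:

ssh your-username@gpu02.hpc.fs.uni-lj.si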

Cloning the hpc-examples repository

Open a Konsole shell and create a working directory work-dir:

mkdir work-dir
cd work-dir

and clone the repository:

git clone https://bitbucket.org/lecad-peg/hpc-examples.git

cd hpc-examples
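
If you already have a clone from an earlier session, you can update it instead of cloning again (from inside the hpc-examples directory):

git pull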

Loading program modules

You can search for available modules, e.g. for Python, with:

module avail python

and then load the desired version, e.g.:

module load Python/3.8.6-GCCcore-10.2.0

All loaded modules can be purged with:

module purge
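
At any point you can check which modules are currently loaded with:

module list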

Running examples

Running many sequential jobs in parallel using job arrays (example from HPC-UiT Services User Documentation)

Go to the directory with the example and list its contents:

cd examples/many_similar_sequential
dir

It contains two files: a Python script test.py and a shell script run.sh. The Python script:

#!/usr/bin/env python

import time

print('start at ' + time.strftime('%H:%M:%S'))

print('sleep for 10 seconds ...')
time.sleep(10)

print('stop at ' + time.strftime('%H:%M:%S'))

prints a start time stamp, sleeps for 10 seconds, and then prints a stop time stamp. Executing it produces, e.g.:

[bogdanl@gpu02 many_similar_sequential]$ python test.py
start at 21:19:47
sleep for 10 seconds ...
stop at 21:19:57

To run this Python script 16 times at the same time on the compute nodes of the cluster, the following shell script run.sh can be used:

#!/bin/bash -l

#####################
# job-array example #
#####################

#SBATCH --job-name=example

# 16 jobs will run in this array at the same time
#SBATCH --array=1-16

# run for five minutes
#              d-hh:mm:ss
#SBATCH --time=0-00:05:00

# 500MB memory per core
# this is a hard limit
#SBATCH --mem-per-cpu=500MB

# you may not place bash commands before the last SBATCH directive

# define and create a unique scratch directory
SCRATCH_DIRECTORY=/home/${USER}/job-array-example/${SLURM_JOBID}
mkdir -p ${SCRATCH_DIRECTORY}
cd ${SCRATCH_DIRECTORY}

cp ${SLURM_SUBMIT_DIR}/test.py ${SCRATCH_DIRECTORY}

# each job will see a different ${SLURM_ARRAY_TASK_ID}
echo "now processing task id:: " ${SLURM_ARRAY_TASK_ID}
python test.py > output_${SLURM_ARRAY_TASK_ID}.txt

# after the job is done we copy our output back to $SLURM_SUBMIT_DIR
cp output_${SLURM_ARRAY_TASK_ID}.txt ${SLURM_SUBMIT_DIR}

# we step out of the scratch directory and remove it
cd ${SLURM_SUBMIT_DIR}
rm -rf ${SCRATCH_DIRECTORY}

# happy end
exit 0 
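
In a real workload each array task usually processes a different input. A minimal sketch of how the last part of run.sh could be adapted (hypothetical, not part of the repository; it assumes input files named input_1.dat, input_2.dat, ... and a test.py that accepts a file name as its argument):

# hypothetical: each array task picks its own input file
INPUT=input_${SLURM_ARRAY_TASK_ID}.dat
python test.py ${INPUT} > output_${SLURM_ARRAY_TASK_ID}.txt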

Submit the script with:

[bogdanl@gpu02 many_similar_sequential]$ sbatch run.sh
Submitted batch job 60048
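
While the array is running, you can check the state of its tasks with:

squeue -u $USER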

After a short while, you can list the 16 output files in your submit directory with:

[bogdanl@gpu02 many_similar_sequential]$ ls -l output*.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_10.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_11.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_12.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_13.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_14.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_15.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_16.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_1.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_2.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_3.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_4.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_5.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_6.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_7.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_8.txt
-rw-r--r-- 1 bogdanl lecad 60 Jan 24 21:24 output_9.txt

Notice that all 16 executions of the script produced their output files at the same time. You can check the time stamps in a few of the output files to confirm that all executions started and stopped at the same time:

[bogdanl@gpu02 many_similar_sequential]$ cat output_9.txt
start at 21:24:21
sleep for 10 seconds ...
stop at 21:24:31
[bogdanl@gpu02 many_similar_sequential]$ cat output_1.txt
start at 21:24:21
sleep for 10 seconds ...
stop at 21:24:31
[bogdanl@gpu02 many_similar_sequential]$ cat output_10.txt
start at 21:24:21
sleep for 10 seconds ...
stop at 21:24:31
[bogdanl@gpu02 many_similar_sequential]$ cat output_4.txt
start at 21:24:21
sleep for 10 seconds ...
stop at 21:24:31
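
To compare the time stamps of all 16 runs at once, you could, for example, run:

grep 'start at' output_*.txt
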
tags: HPCFS