
User defined signal handler on Rackham (trap)

Here is an example of how to define and use a user-defined signal handler in a batch job on Rackham.

--signal=[[R][B]:]<sig_num>[@<sig_time>]
When a job is within sig_time seconds of its end time, send it the signal sig_num. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified. sig_num may either be a signal number or name (e.g. "10" or "USR1"). sig_time must have an integer value between 0 and 65535. By default, no signal is sent before the job's end time. If a sig_num is specified without any sig_time, the default time will be 60 seconds. Use the "B:" option to signal only the batch shell, none of the other processes will be signaled. By default all job steps will be signaled, but not the batch shell itself. Use the "R:" option to allow this job to overlap with a reservation with MaxStartDelay set. To have the signal sent at preemption time see the preempt_send_user_signal SlurmctldParameter. (Source: sbatch manual page.)
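
The timing rule above can be checked with a little arithmetic. This is a rough sketch using the values from the job script in this note (a 4-minute limit and `@120`). It assumes the job runs all the way to its time limit; the variable names are illustrative only.

```shell
# Values taken from the job script in this note.
time_limit=240   # -t 00-00:04:00, in seconds
sig_time=120     # --signal=B:USR1@120
slack=60         # Slurm may send the signal up to 60 s earlier than specified

# Nominal delivery time, counted from job start.
latest=$(( time_limit - sig_time ))
# Earliest delivery time allowed by the 60 s event resolution.
earliest=$(( latest - slack ))

echo "USR1 expected between ${earliest}s and ${latest}s after job start"
```

With these numbers the handler should fire somewhere between one and two minutes into the job, so clean_up has at least two minutes before the job is killed.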

#!/bin/bash
#SBATCH --signal=B:USR1@120
#SBATCH -J test
#SBATCH -A PROJECT
#SBATCH -t 00-00:04:00
#SBATCH -p core
#SBATCH -n 1

echo "01.Initializing stage..."

# define function to be called when signal is trapped
function clean_up()
{
   date | tee -a "$SNIC_TMP/log.txt" # leave some trace with time stamp
   cp "$SNIC_TMP/log.txt" . # copy files from the local disk before the job is killed
}

# Define user defined signal handler for USR1
trap 'clean_up; echo "USR1 was trapped" ' USR1


echo "02.Ready to run..."
env > log.txt  # some debugging to see what is the environment on the compute node
date  | tee -a  log.txt # leave some trace with time stamp
rsync -ah log.txt "$SNIC_TMP/"  # bring some files to the local disk

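# Sleep for 10m so that the job reaches its 4m time limit and the trap is triggered (for testing).
# The command runs in the background because bash runs a trap only after the
# current foreground command finishes; "wait" returns as soon as the signal arrives.
sleep 10m &
wait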

# Release the user defined signal handler for USR1
trap - USR1
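
The trap pattern can also be tried locally, without submitting a job. The following is a minimal sketch, not part of the Rackham setup: a background subshell stands in for Slurm and sends USR1 to the script after one second, and the names handler_ran, work_pid, and wait_status are illustrative only.

```shell
#!/bin/sh
# Local dry run of the trap pattern; no Slurm needed.

handler_ran=0
wait_status=0

clean_up() {
    handler_ran=1
    echo "USR1 was trapped"
}
trap clean_up USR1

# Stand-in for Slurm (assumption): signal this shell after 1 second.
( sleep 1; kill -USR1 $$ ) &

# Long-running work in the background, as in the job script.
sleep 30 &
work_pid=$!

# wait returns early, with a status above 128, when the trapped signal arrives;
# the handler runs right after wait returns.
wait "$work_pid" || wait_status=$?

# Stop the leftover sleep and release the handler.
kill "$work_pid" 2>/dev/null || true
trap - USR1

echo "handler_ran=$handler_ran wait_status=$wait_status"
```

If `sleep 30` were run in the foreground instead, the handler would be delayed until the sleep finished. In a real job that would be too late, which is why the job script uses `sleep 10m &` followed by `wait`.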

More examples:

Contacts:


tags: UPPMAX, SNIC