Sea ice deformation from ship radar

Ticket

Data:

  • ship radar measurements
  • recorded at 1 min intervals
  • Dec 2019 - May 2020 -> 6 months -> ~180 days -> ~260,000 measurements (1440 per day)
  • stored in 2-week periods -> ~13 datasets
  • in total ~1.2 TB (-> ~7 GB per day)
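
Quick sanity check of the numbers above (a sketch; the per-day size is derived from the stated total):

days=180
per_day=1440                                          # one measurement per minute
echo "measurements in total: $(( days * per_day ))"   # 259200, i.e. ~260,000
echo "GB per day: $(( 1200 / days ))"                 # 1.2 TB over 180 days -> ~6-7 GB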

Info:

  • each day can be processed independently of other days; one day comprises 1440 measurements
  • within one day, each processing step depends on the previous steps of the same day, so a day must be processed sequentially
  • one tcsh script runs multiple binaries for one measurement
  • tested: one hour of runtime processes 2.5 hours' worth of data, i.e. 150 measurements -> ~10 hours processing time for 1 day of data
  • the process always takes two timepoints/images/measurements and compares them -> results stored in 1 temporary text file per timepoint (see the loop sketch after this list)
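
Since the per-day work is serial and pairwise, one array task boils down to a loop like the following. This is only a sketch: process_pair.tcsh and the img_*.dat naming are hypothetical placeholders for the actual tcsh script and files, and lexicographic filename order is assumed to match time order.

#!/bin/bash
day_dir=$1                          # directory holding this day's 1440 measurement files
files=( "$day_dir"/img_*.dat )      # glob sorts by name -> assumed sorted by time

for (( i = 1; i < ${#files[@]}; i++ )); do
    # compare the previous timepoint with the current one;
    # each comparison writes one temporary text file per timepoint
    ./process_pair.tcsh "${files[i-1]}" "${files[i]}" "tmp_${i}.txt"
done
# the temporary files are then combined into the two final per-day outputs
# (trajectories + timepoints); that merge step is omitted here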

Questions:

  • I/O load: how often is data read and written while processing one measurement / one day's worth of measurements? -> data is read in once, one temporary output file per measurement, 2 final output files per day's worth of measurements (one with trajectories, one with timepoints)
  • Memory requirements: how much memory is needed for processing one measurement / one day's worth of measurements? -> still open; easiest to measure from a test run (see the sketch after this list)
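
The memory question is easiest to answer empirically: run a single day as a test job and read the peak memory afterwards with standard Slurm accounting (the job ID is a placeholder; seff is available on CSC systems):

sacct -j <jobid> --format=JobID,MaxRSS,Elapsed   # peak resident memory per step
seff <jobid>                                     # human-readable efficiency summary

Then set --mem-per-cpu somewhat above the observed MaxRSS.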

Suggestions:

  • store compressed data in the project's scratch directory; compressed archives are faster to transfer to local scratch (NVMe) for processing
  • alternatively, keep the data in scratch or in the Allas object storage, from where it can be copied directly to local scratch for processing (faster I/O); see the staging sketch below
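
A staging sketch for this (paths and archive names are placeholders; assumes one compressed tar archive per day, and $LOCAL_SCRATCH pointing to the NVMe reserved with --gres=nvme):

# copy one day's archive from the project's scratch to node-local NVMe and unpack
cp /scratch/project_xxx/day_${SLURM_ARRAY_TASK_ID}.tar.gz "$LOCAL_SCRATCH"/
cd "$LOCAL_SCRATCH"
tar xzf day_${SLURM_ARRAY_TASK_ID}.tar.gz

If the data lives in Allas instead, the cp line could be replaced by a download with the allas-cli-utils tools (e.g. a-get), assuming those are the chosen client.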

"Thinking out loud":

#!/bin/bash -l
#SBATCH --job-name=array_job
#SBATCH --output=array_job_out_%A_%a.txt
#SBATCH --error=array_job_err_%A_%a.txt
#SBATCH --account=project_xxx
#SBATCH --partition=small # should suffice; max runtime per job: 3 days, max number of tasks: 40, has I/O nodes, max memory: 382 GiB, max local storage: 3600 GiB
#SBATCH --time=10:00:00 # time one array task needs to finish (~10 h per day of data)
#SBATCH --ntasks=1 # one CPU per array task
#SBATCH --mem-per-cpu=xx # memory needed per task (to be measured, see above)
#SBATCH --array=0-179 # one task per day of data
#SBATCH --gres=nvme:xx # in GB; total size of this day's input/output files
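# NB: with --gres=nvme the reserved local disk is reachable through the
# $LOCAL_SCRATCH environment variable (CSC convention; assumed below)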

# load needed software module
module load xx

# copy the data for this task from the project's scratch to local scratch
cp xx yy

# run the analysis command with input and output filenames (if possible); SLURM_ARRAY_TASK_ID goes from 0 to 179
my_prog data_${SLURM_ARRAY_TASK_ID}.inp data_${SLURM_ARRAY_TASK_ID}.out

# copy results from local scratch to the project's scratch
cp yy xx
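
Submission is then a single sbatch call; the %-suffix (standard Slurm syntax) would cap how many of the 180 array tasks run simultaneously, keeping the scratch I/O load in check:

# submit all 180 per-day tasks (script name is a placeholder)
sbatch sea_ice_array.sh
# optional throttle: at most 20 tasks at a time, via the directive
#   #SBATCH --array=0-179%20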