---
tags: user-meetings
title: sea-ice deformation
---

# Sea ice deformation from ship radar

[Ticket](https://rt.csc.fi/rt/Ticket/Display.html?id=574009)

## Data:

* ship radar measurements
* recorded at 1 min time intervals
* Dec 2019 - May 2020 -> 6 months -> ~180 days -> ~4320 hours -> ~260,000 measurements (1440 per day)
* stored in 2-week periods -> ~12 datasets
* in total ~1.2 TB (-> ~7 GB per day?)

## Info:

* each day can be processed independently of other days; one day includes 1440 measurements
* within one day, processes depend on the previous processes of the same day
* one tcsh script runs multiple binaries for one measurement
* tested: one hour of runtime processes 2.5 hours' worth of data, i.e. ~150 measurements -> ~10 hours processing time for 1 day of data
* the process always takes two timepoints/images/measurements and compares them -> results stored in 1 temporary text file per timepoint

## Questions:

* I/O load: how often is data read and written during processing of one measurement / one day's worth of measurements? -> read in once, one temporary output file per measurement, 2 final output files per day's worth of measurements (one with trajectories, one with timepoints)
* Memory requirements: how much memory is needed for processing one measurement / one day's worth of measurements? -> ?

## Suggestions:

* store compressed data in the project's scratch directory; faster to transfer to local scratch (NVMe) for processing (a staging sketch is included at the end of this note)
* data storage in scratch or in object storage [Allas](https://docs.csc.fi/data/Allas/accessing_allas/), from where data can be copied directly to local scratch for processing (faster I/O)

## "Thinking out loud":

* Options: [GNU-parallel](https://docs.csc.fi/support/tutorials/many/#an-example-case-80000-independent-runs) or [HyperQueue](https://docs.csc.fi/apps/hyperqueue/) or array job (a GNU parallel sketch is included at the end of this note).
* [Array job](https://docs.csc.fi/computing/running/array-jobs/) seems suitable:
* 180 independent jobs; sbatch script (see also the notes at the end of this note):

```
#!/bin/bash -l
#SBATCH --job-name=array_job
#SBATCH --output=array_job_out_%A_%a.txt
#SBATCH --error=array_job_err_%A_%a.txt
#SBATCH --account=project_xxx
# small partition should suffice: max runtime per job 3 days, max number of tasks 40, has I/O nodes, max memory 382 GiB, max local storage 3600 GiB
#SBATCH --partition=small
#SBATCH --time=10:00:00     # time it takes for one job (one day of data) to finish
#SBATCH --ntasks=1          # one CPU per job
#SBATCH --mem-per-cpu=xx    # memory needed per job
#SBATCH --array=0-179
#SBATCH --gres=nvme:xx      # in GiB; total size of all input/output files (?)

# load needed software module
module load xx

# copy data for the job from scratch to local scratch
cp xx yy

# run the analysis command with input and output filenames (if possible); SLURM_ARRAY_TASK_ID goes from 0 to 179
my_prog data_${SLURM_ARRAY_TASK_ID}.inp data_${SLURM_ARRAY_TASK_ID}.out

# copy results from local scratch to the project's scratch
cp yy xx
```

* [Local storage](https://docs.csc.fi/computing/running/creating-job-scripts-puhti/#local-storage)
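
## Sketches:

* A possible shape for the copy/run steps of one array task: stage one day of data from the project's scratch to the node-local NVMe disk, work there, and copy only the two final output files back. This is only a sketch, not the user's actual workflow: the archive layout, directory and file names below are made-up examples, and `$LOCAL_SCRATCH` is only available when `--gres=nvme:...` is requested in the batch script.

```
# Hypothetical staging for one array task (day index = SLURM_ARRAY_TASK_ID).
# Assumes one compressed archive per day in the project's scratch,
# e.g. /scratch/project_xxx/radar/day_000.tar.gz ... day_179.tar.gz (example names).

DAY=$(printf "day_%03d" "${SLURM_ARRAY_TASK_ID}")

# unpack one day of input data directly onto the fast local disk
tar xzf /scratch/project_xxx/radar/${DAY}.tar.gz -C "$LOCAL_SCRATCH"

# work inside local scratch so the ~1440 temporary text files stay on NVMe
cd "$LOCAL_SCRATCH/${DAY}"
my_prog   # placeholder for the actual tcsh script / binaries

# copy only the two final output files (example names) back to the project's scratch
mkdir -p /scratch/project_xxx/results/${DAY}
cp trajectories.txt timepoints.txt /scratch/project_xxx/results/${DAY}/
```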
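* The array job itself is submitted with `sbatch` as usual. If 180 simultaneous tasks would put too much load on the shared scratch disks during the copy steps, Slurm's `%` syntax limits how many array tasks run at the same time, for example:

```
#SBATCH --array=0-179%20   # run at most 20 of the 180 tasks concurrently
```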
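* Since the ~180 days are independent and one day takes roughly 10 hours on a single core, the GNU parallel option could instead process one 2-week dataset (14 days) per batch job, with all 14 days running concurrently on one node. A minimal sketch, assuming a hypothetical wrapper `process_day.sh` that stages, processes and copies back one day; the `parallel` module name is an assumption and should be checked with `module spider parallel`:

```
#!/bin/bash -l
#SBATCH --job-name=parallel_days
#SBATCH --account=project_xxx
#SBATCH --partition=small
#SBATCH --time=12:00:00        # ~10 h per day plus some margin; the 14 days run concurrently
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=14     # one core per day of this 2-week dataset
#SBATCH --mem-per-cpu=xx       # memory needed per day, as in the array job
#SBATCH --gres=nvme:xx         # local NVMe sized for 14 days of input/output

module load parallel           # GNU parallel (module name is an assumption)

# process days 0-13 of this dataset; {} is replaced by the day index
seq 0 13 | parallel -j "$SLURM_CPUS_PER_TASK" ./process_day.sh {}
```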