---
tags: user-meetings
title: sea-ice deformation
---

# Sea ice deformation from ship radar

[Ticket](https://rt.csc.fi/rt/Ticket/Display.html?id=574009)

## Data:

* ship radar measurements
* recorded at 1 min time intervals
* Dec 2019 - May 2020 -> 6 months -> ~180 days -> ~4320 hours -> ~260,000 measurements (1440 per day)
* stored in 2-week periods -> ~12 datasets
* in total ~1.2 TB (-> ~7 GB per day?)

## Info:

* each day can be processed independently of other days; one day includes 1440 measurements
* within one day, processes depend on the previous processes of the same day
* one tcsh script runs multiple binaries for one measurement
* tested: one hour of runtime processes 2.5 hours' worth of data, i.e. ~150 measurements -> ~10 hours processing time for 1 day of data
* the process always takes two timepoints/images/measurements and compares them -> results stored in 1 temporary text file per timepoint

## Questions:

* I/O load: how often is data read and written during processing of one measurement / one day's worth of measurements? -> read in once, one temporary output file per measurement, 2 final output files per day's worth of measurements (one with trajectories, one with timepoints)
* Memory requirements: how much memory is needed for processing one measurement / one day's worth of measurements? -> ?

## Suggestions:

* store compressed data in the project's scratch directory; faster to transfer to local scratch (NVMe) for processing (a staging sketch is included at the end of this note)
* data storage in scratch or in object storage [Allas](https://docs.csc.fi/data/Allas/accessing_allas/), from where data can be copied directly to local scratch for processing (faster I/O)

## "Thinking out loud":

* Options: [GNU-parallel](https://docs.csc.fi/support/tutorials/many/#an-example-case-80000-independent-runs) or [HyperQueue](https://docs.csc.fi/apps/hyperqueue/) or array job (a GNU parallel sketch is included at the end of this note).
* [Array job](https://docs.csc.fi/computing/running/array-jobs/) seems suitable:
* 180 independent jobs; sbatch script (see also the notes at the end of this note):

```
#!/bin/bash -l
#SBATCH --job-name=array_job
#SBATCH --output=array_job_out_%A_%a.txt
#SBATCH --error=array_job_err_%A_%a.txt
#SBATCH --account=project_xxx
# small partition should suffice: max runtime per job 3 days, max number of tasks 40, has I/O nodes, max memory 382 GiB, max local storage 3600 GiB
#SBATCH --partition=small
#SBATCH --time=10:00:00     # time it takes for one job (one day of data) to finish
#SBATCH --ntasks=1          # one CPU per job
#SBATCH --mem-per-cpu=xx    # memory needed per job
#SBATCH --array=0-179
#SBATCH --gres=nvme:xx      # in GiB; total size of all input/output files (?)

# load needed software module
module load xx

# copy data for the job from scratch to local scratch
cp xx yy

# run the analysis command with input and output filenames (if possible); SLURM_ARRAY_TASK_ID goes from 0 to 179
my_prog data_${SLURM_ARRAY_TASK_ID}.inp data_${SLURM_ARRAY_TASK_ID}.out

# copy results from local scratch to the project's scratch
cp yy xx
```

* [Local storage](https://docs.csc.fi/computing/running/creating-job-scripts-puhti/#local-storage)
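
## Sketches:

* A possible shape for the copy/run steps of one array task: stage one day of data from the project's scratch to the node-local NVMe disk, work there, and copy only the two final output files back. This is only a sketch, not the user's actual workflow: the archive layout, directory and file names below are made-up examples, and `$LOCAL_SCRATCH` is only available when `--gres=nvme:...` is requested in the batch script.

```
# Hypothetical staging for one array task (day index = SLURM_ARRAY_TASK_ID).
# Assumes one compressed archive per day in the project's scratch,
# e.g. /scratch/project_xxx/radar/day_000.tar.gz ... day_179.tar.gz (example names).

DAY=$(printf "day_%03d" "${SLURM_ARRAY_TASK_ID}")

# unpack one day of input data directly onto the fast local disk
tar xzf /scratch/project_xxx/radar/${DAY}.tar.gz -C "$LOCAL_SCRATCH"

# work inside local scratch so the ~1440 temporary text files stay on NVMe
cd "$LOCAL_SCRATCH/${DAY}"
my_prog   # placeholder for the actual tcsh script / binaries

# copy only the two final output files (example names) back to the project's scratch
mkdir -p /scratch/project_xxx/results/${DAY}
cp trajectories.txt timepoints.txt /scratch/project_xxx/results/${DAY}/
```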
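* The array job itself is submitted with `sbatch` as usual. If 180 simultaneous tasks would put too much load on the shared scratch disks during the copy steps, Slurm's `%` syntax limits how many array tasks run at the same time, for example:

```
#SBATCH --array=0-179%20   # run at most 20 of the 180 tasks concurrently
```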
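* Since the ~180 days are independent and one day takes roughly 10 hours on a single core, the GNU parallel option could instead process one 2-week dataset (14 days) per batch job, with all 14 days running concurrently on one node. A minimal sketch, assuming a hypothetical wrapper `process_day.sh` that stages, processes and copies back one day; the `parallel` module name is an assumption and should be checked with `module spider parallel`:

```
#!/bin/bash -l
#SBATCH --job-name=parallel_days
#SBATCH --account=project_xxx
#SBATCH --partition=small
#SBATCH --time=12:00:00        # ~10 h per day plus some margin; the 14 days run concurrently
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=14     # one core per day of this 2-week dataset
#SBATCH --mem-per-cpu=xx       # memory needed per day, as in the array job
#SBATCH --gres=nvme:xx         # local NVMe sized for 14 days of input/output

module load parallel           # GNU parallel (module name is an assumption)

# process days 0-13 of this dataset; {} is replaced by the day index
seq 0 13 | parallel -j "$SLURM_CPUS_PER_TASK" ./process_day.sh {}
```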