---
tags: user-meetings
title: sea-ice deformation
---
# Sea ice deformation from ship radar
[Ticket](https://rt.csc.fi/rt/Ticket/Display.html?id=574009)
## Data:
* ship radar measurements
* recorded in 1 min time intervals
* Dec 19 - May 20 -> 6 months -> ~180 days (~4320 hours) -> ~260,000 measurements (1440 per day)
* stored in 2-week periods -> ~12 datasets
* in total ~1.2 TB (-> ~7 GB per day?); see the quick check below
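A quick sanity check of the figures above, assuming 1-min measurement intervals over ~180 days and ~1.2 TB in total:

```
# back-of-envelope check of the data volume figures (bash integer arithmetic)
days=180
per_day=1440                                       # measurements per day at 1-min intervals
echo "total measurements: $(( days * per_day ))"   # 259200, i.e. ~260,000
echo "GB per day:         $(( 1200 / days ))"      # 1.2 TB over 180 days ≈ 6-7 GB/day
```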
## Info:
* each day can be processed independently of other days; one day includes 1440 measurements
* within one day, processing steps depend on previous steps of the same day (see the per-day sketch below)
* one tcsh script runs multiple binaries for one measurement
* tested: one hour of runtime processes about 2.5 hours' worth of data, i.e. 210 measurements -> ~10 hours of processing time for one day of data
* the process always takes two timepoints/images/measurements and compares them -> results are stored in one temporary text file per timepoint
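As an illustration of the dependency structure above (sequential within a day, independent across days), a per-day driver could look roughly like the sketch below; `process_pair.tcsh` and the file naming are placeholders, not the actual scripts:

```
#!/bin/bash
# Sketch: process one day's measurements sequentially; each step within a day
# depends on the previous step, so there is no parallelism inside a day.
# process_pair.tcsh stands in for the actual tcsh wrapper that runs the binaries.
day_dir=$1                          # directory holding the day's 1440 measurements
files=( "$day_dir"/*.img )          # assumed naming; adjust to the real data
for (( i=1; i<${#files[@]}; i++ )); do
    prev=${files[$((i-1))]}
    curr=${files[$i]}
    # compare two consecutive timepoints -> one temporary text file per timepoint
    ./process_pair.tcsh "$prev" "$curr" > "tmp_$(basename "${curr%.img}").txt"
done
# afterwards the temporary files are combined into the two final per-day outputs
# (trajectories and timepoints)
```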
## Questions:
* I/O load: how often is data read and written during processing of one measurement / one day's worth of measurements? -> data is read in once, one temporary output file per measurement, two final output files per day's worth of measurements (one with trajectories, one with timepoints)
* Memory requirements: how much memory is needed for processing one measurement / one day's worth of measurements? -> ? (see the measurement sketch below)
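The memory question could be answered with a short test run, e.g. by wrapping the processing of one measurement in `/usr/bin/time -v`, or by checking a finished test batch job with Slurm's accounting tools (`run_one_measurement.tcsh` and `<jobid>` are placeholders):

```
# peak memory of one test run: look for "Maximum resident set size" in the output
/usr/bin/time -v ./run_one_measurement.tcsh test_input > /dev/null

# or, after a finished test batch job, query Slurm accounting
seff <jobid>                                    # summary incl. memory utilisation
sacct -j <jobid> --format=JobID,MaxRSS,Elapsed  # per-step peak memory and runtime
```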
## Suggestions:
* store compressed data in the project's scratch directory; faster to transfer to local scratch (NVMe) for processing
* data storage in scratch or in object storage [Allas](https://docs.csc.fi/data/Allas/accessing_allas/), from where the data can be copied directly to local scratch for processing (faster I/O); see the staging sketch below
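A sketch of the staging idea above (bucket, archive and directory names are made-up placeholders): copy one day's compressed data from the project's scratch or from Allas to the node-local NVMe disk (`$LOCAL_SCRATCH`, available when `--gres=nvme:<size>` is requested) and unpack it there before processing:

```
# inside a batch job that requested --gres=nvme:<size>

# option A: compressed day archive already in the project's scratch directory
tar xzf /scratch/project_xxx/radar_day_042.tar.gz -C "$LOCAL_SCRATCH"

# option B: fetch the same archive directly from Allas
# (after "module load allas" and "allas-conf" to set up the connection)
a-get radar-bucket/radar_day_042.tar.gz
tar xzf radar_day_042.tar.gz -C "$LOCAL_SCRATCH"
```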
## "Thinking out loud":
* Options: [GNU-parallel](https://docs.csc.fi/support/tutorials/many/#an-example-case-80000-independent-runs) or [HyperQueue](https://docs.csc.fi/apps/hyperqueue/) or an array job.
* [Array job](https://docs.csc.fi/computing/running/array-jobs/) seems suitable:
* 180 independent jobs (one per day); sbatch script:
```
#!/bin/bash -l
#SBATCH --job-name=array_job
#SBATCH --output=array_job_out_%A_%a.txt
#SBATCH --error=array_job_err_%A_%a.txt
#SBATCH --account=project_xxx
#SBATCH --partition=small     # should suffice: max runtime per job 3 days, max number of tasks 40, has I/O nodes, max memory 382 GiB, max local storage 3600 GiB
#SBATCH --time=10:00:00       # time it takes for one job (one day of data) to finish
#SBATCH --ntasks=1            # one CPU per array task
#SBATCH --mem-per-cpu=xx      # memory needed per job
#SBATCH --array=0-179
#SBATCH --gres=nvme:xx        # in GB, total size of all input/output files (?)

# load the needed software module
module load xx

# copy data for this array task from the project's scratch to local scratch
cp xx yy

# run the analysis command with input and output file names (if possible);
# SLURM_ARRAY_TASK_ID goes from 0 to 179
my_prog data_${SLURM_ARRAY_TASK_ID}.inp data_${SLURM_ARRAY_TASK_ID}.out

# copy results from local scratch back to the project's scratch
cp yy xx
```
* [Local storage](https://docs.csc.fi/computing/running/creating-job-scripts-puhti/#local-storage)
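To connect the local-storage link to the script above: inside each array task, the `cp xx yy` placeholders would roughly correspond to staging through `$LOCAL_SCRATCH`. A sketch under the assumption that the input is packed as one archive per day and that a per-day driver script exists (`process_day.sh`, the archive names and the result file names are all placeholders):

```
# sketch of the job body for one array task; day number = SLURM_ARRAY_TASK_ID
day=${SLURM_ARRAY_TASK_ID}

# stage the day's compressed input onto the node-local NVMe disk
cp /scratch/project_xxx/input/day_${day}.tar.gz "$LOCAL_SCRATCH"
cd "$LOCAL_SCRATCH"
tar xzf day_${day}.tar.gz

# run the per-day processing (sequential within the day, ~10 h expected)
/scratch/project_xxx/process_day.sh day_${day}/

# copy only the two final result files (trajectories, timepoints) back to project scratch
cp day_${day}_trajectories.txt day_${day}_timepoints.txt /scratch/project_xxx/results/
```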