2022-02-01
===

###### tags: `meeting` `storage` `biohub` `data`

:::info
- **Date:** 2022-02-01
- **Agenda**
    - data workflow, requirements
- **Attendees**
    - Nathan Clack
    - Cameron Foltz
    - Ivan Ivanov
- **Follow up**
    - [ ] keep in touch!
    - [ ] Nathan should visit
:::

# Data requirements with Ivan and Cameron

## Bandwidth and latency

- 2x10 Gbps from cameras
    - to acquisition PC (8x NVMe RAID)
    - to on prem (40 Gbps)
    - conversion to chunked (zarr)
- during setup of microscope
    - recon every shot
    - single image plane (4ch)
    - full volume: 30 to 60 s
        - 350 images, 5 Mpx at 16 bit, 4-5 polarizations
- during production
    - recon every 10 minutes
    - 24 hr development, snapshots every 10 minutes
    - 16 positions with fluorescence

## Data lifecycle

- Micro-Manager / Pycro-Manager
    - determines TIFF-based workflow
- convert to zarr, with compression ideally
    - delete original TIFFs
- multiple rounds of analysis
    - artifacts live as zarr float32
- metadata
    - important to have parameters from original acquisition
    - Micro-Manager metadata is in the TIFF
    - wavelengths etc. are in a separate JSON
        - computer's not in the loop for the filter placement
        - ends up being input as a parameter into the reconstruction snapshot tool, so that goes into the JSON
    - try to use OME-TIFF where possible
    - other techniques have different metadata standards
        - there's a bigger **data hub project** that's forming at the Biohub around unifying across domains and sharing
- reconstruction parameters
    - custom yaml
    - plugin related
    - zarr attributes

## Choices around formats

Why zarr

- python
- chunked
- OME standard
- lazy loading/distributed compute support

Chunking

- (sharding here, not even concerned with partitions)
- every image, or 8 z slices
- determined by unit of compute
- balance between file size/read speed
    - large files easy to transfer
    - but hard to view an individual thing

Compression

- reduces storage cost, especially important
- ratios: 1.7x
- numcodecs, zstd
- lossy no good

Standards

- accessibility
- open source
- standards
    - standard dimension order (tzcyx)
- compatible tooling
    - nice to have a variety, reduce dependencies, better community
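
Back-of-envelope check on the numbers under "Bandwidth and latency". This is only a sketch, not something discussed in the meeting. It assumes 5 Mpx means 5e6 pixels, 16 bit means 2 bytes per pixel, the 350 images are per polarization, links run at full line rate, and the 1.7x compression ratio applies to the whole raw volume. The function names `volume_bytes` and `transfer_seconds` are made up for illustration.

```python
def volume_bytes(n_images, megapixels, bits_per_pixel, n_polarizations):
    """Raw size in bytes of one full volume (assumes 1 Mpx = 1e6 pixels)."""
    pixels = n_images * megapixels * 1_000_000 * n_polarizations
    return pixels * bits_per_pixel // 8


def transfer_seconds(n_bytes, link_gbps):
    """Time to move n_bytes at full line rate (assumes 1 Gbps = 1e9 bit/s)."""
    return n_bytes * 8 / (link_gbps * 1e9)


CAMERA_GBPS = 2 * 10      # 2x10 Gbps from cameras
ON_PREM_GBPS = 40         # link to on-prem storage
COMPRESSION_RATIO = 1.7   # ratio quoted in the notes

for n_pol in (4, 5):      # notes say 4-5 polarizations
    raw = volume_bytes(350, 5, 16, n_pol)
    print(
        f"{n_pol} polarizations: {raw / 1e9:.1f} GB raw, "
        f"{raw / COMPRESSION_RATIO / 1e9:.1f} GB compressed, "
        f"{transfer_seconds(raw, CAMERA_GBPS):.1f} s at 20 Gbps, "
        f"{transfer_seconds(raw, ON_PREM_GBPS):.1f} s at 40 Gbps"
    )
```

Under these assumptions a 4-polarization volume comes to 14 GB raw and takes about 5.6 s to move at 20 Gbps; if the 350 images already include all polarizations, divide by the polarization count.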