2022-02-01
===
###### tags: `meeting` `storage` `biohub` `data`
:::info
- **Date:** 2022-02-01
- **Agenda**
- data workflow, requirements
- **Attendees**
- Nathan Clack
- Cameron Foltz
- Ivan Ivanov
- **Follow up**
- [ ] keep in touch!
- [ ] Nathan should visit
:::
# Data requirements with Ivan and Cameron
## Bandwidth and latency
- 2x10Gbps from cameras
- to acquisition PC (8x NVMe RAID)
- to on prem (40 Gbps)
- conversion to chunked (zarr)
- during setup of microscope
- recon every shot
- single image plane (4ch)
- full volume: 30 to 60s.
- 350 images, 5 Mpx at 16 bit, 4-5 polarizations
- during production
- recon every 10 minutes
- 24 hr development, snapshots every 10 minutes
- 16 positions w/ fluorescence
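
A rough sizing sketch for the numbers above. It assumes the 350 images are acquired once per polarization state (the notes don't say whether 350 already includes the polarizations), and it ignores protocol overhead on the links. All names are illustrative.

```python
# Back-of-the-envelope sizing for one full volume.
# ASSUMPTION (not stated in the notes): 350 images per polarization state.
IMAGES_PER_VOLUME = 350
PIXELS_PER_IMAGE = 5_000_000   # 5 Mpx
BYTES_PER_PIXEL = 2            # 16 bit
CAMERA_LINK_GBPS = 2 * 10      # 2x10 Gbps from cameras
ON_PREM_LINK_GBPS = 40         # link to on-prem storage


def volume_bytes(polarizations: int) -> int:
    """Raw size of one volume in bytes."""
    return IMAGES_PER_VOLUME * PIXELS_PER_IMAGE * BYTES_PER_PIXEL * polarizations


def transfer_seconds(n_bytes: int, link_gbps: float) -> float:
    """Ideal transfer time over a link, ignoring protocol overhead."""
    return n_bytes * 8 / (link_gbps * 1e9)


for pol in (4, 5):
    size = volume_bytes(pol)
    print(
        f"{pol} polarizations: {size / 1e9:.1f} GB, "
        f"{transfer_seconds(size, CAMERA_LINK_GBPS):.1f} s from cameras, "
        f"{transfer_seconds(size, ON_PREM_LINK_GBPS):.1f} s to on prem"
    )
```

Under these assumptions a volume is 14 to 17.5 GB and takes roughly 5.6 to 7 s to move at the full camera rate, which sits well inside the 30 to 60 s full-volume recon target.
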
## Data lifecycle
- Micro-Manager / Pycro-Manager
- determines TIFF-based workflow
- convert to zarr, with compression ideally
- delete original TIFFs
- multiple rounds of analysis
- artifacts live as zarr float32
- metadata
- important to have parameters from original acquisition
- Micro-Manager (µManager) metadata is in the TIFF
- wavelengths etc. are in a separate JSON
- computer's not in the loop for the filter placement
- ends up being input as a parameter into the reconstruction snapshot tool, so that goes into the JSON.
- try to use OME-TIFF where possible
- other techniques have different metadata standards
- there's a bigger **data hub project** that's forming at the Biohub around unifying across domains and sharing.
- reconstruction parameters
- custom yaml
- plugin related
- zarr attributes
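
One way the three metadata sources could end up together in zarr attributes, as a minimal sketch. The function name, the grouping keys, and every field value below are hypothetical; the only fact relied on is that zarr attributes are stored as JSON, so the merged dict has to be JSON-serializable.

```python
import json


def build_zarr_attrs(acquisition: dict, optics: dict, reconstruction: dict) -> dict:
    """Merge the metadata sources into one JSON-serializable dict (hypothetical layout)."""
    attrs = {
        "acquisition": acquisition,        # e.g. parsed from the TIFF metadata
        "optics": optics,                  # e.g. wavelengths from the separate JSON
        "reconstruction": reconstruction,  # e.g. parsed from the custom YAML
    }
    # zarr attributes are stored as JSON, so fail early on anything not serializable.
    json.dumps(attrs)
    return attrs


# All values here are made up for illustration.
attrs = build_zarr_attrs(
    acquisition={"exposure_ms": 10.0, "z_step_um": 0.25},
    optics={"wavelength_nm": 532},
    reconstruction={"n_polarizations": 4},
)
print(json.dumps(attrs, indent=2))
```

Keeping the sources under separate keys means the original acquisition parameters survive after the TIFFs are deleted, without being mixed up with values that were typed in by hand.
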
## Choices around formats
Why zarr
- python
- chunked
- OME standard
- lazy loading/distributed compute support
Chunking
- (sharding here, not even concerned with partitions)
- every image, or 8 z slices
- determined by unit of compute
- balance between file size/read speed
- large files easy to transfer
- but hard to view an individual thing
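
To make the file size vs. read speed trade-off concrete, here is a small sketch comparing the two chunkings mentioned above. The volume shape is an assumption: tzcyx order, 4 channels, and a 2048 x 2448 sensor standing in for "5 Mpx".

```python
import math

# ASSUMED volume shape in tzcyx order; 2048 x 2448 is a stand-in for a ~5 Mpx sensor.
SHAPE = (1, 350, 4, 2048, 2448)  # t, z, c, y, x
ITEMSIZE = 2                     # bytes per pixel (uint16)


def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, uncompressed bytes per full chunk)."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


for label, chunks in [
    ("one image per chunk", (1, 1, 1, 2048, 2448)),
    ("8 z slices per chunk", (1, 8, 1, 2048, 2448)),
]:
    n, size = chunk_stats(SHAPE, chunks, ITEMSIZE)
    print(f"{label}: {n} chunks of {size / 1e6:.1f} MB")
```

With these assumed dimensions, per-image chunks give 1400 chunks of about 10 MB, while 8-slice chunks give 176 chunks of about 80 MB. Reading a single plane from the larger chunks means decompressing all 8 slices.
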
Compression
- reduces storage cost, esp important
- ratios: 1.7X
- numcodecs, zstd
- lossy no good
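
A rough estimate of what the 1.7X ratio means for storage. The run layout is an assumption, not something the notes state: it treats every 10-minute snapshot at each of the 16 positions as one full 4-polarization volume of 350 images.

```python
COMPRESSION_RATIO = 1.7  # lossless ratio quoted in the notes

# ASSUMED raw size of one volume: 350 images x 5 Mpx x 2 bytes x 4 polarizations.
RAW_VOLUME_BYTES = 350 * 5_000_000 * 2 * 4


def compressed_bytes(raw_bytes: float, ratio: float = COMPRESSION_RATIO) -> float:
    """Estimated size on disk after lossless compression."""
    return raw_bytes / ratio


# ASSUMED production run: 24 hr, a snapshot every 10 minutes, 16 positions.
n_timepoints = 24 * 60 // 10
n_positions = 16
n_volumes = n_timepoints * n_positions

raw_total = n_volumes * RAW_VOLUME_BYTES
print(f"{n_volumes} volumes")
print(f"raw: {raw_total / 1e12:.1f} TB")
print(f"compressed: {compressed_bytes(raw_total) / 1e12:.1f} TB")
```

Under these assumptions a 24 hr run is about 32.3 TB raw and about 19.0 TB after compression. The real figure depends on how many planes are actually acquired per snapshot.
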
Standards
- accessibility
- open source
- standards
- standard dimension order (tzcyx)
- compatible tooling
- nice to have a variety, reduce dependencies, better community