---
tags: NGFF, community-call
---
Please paste this into the Zoom chat as new people join:
:::warning
Welcome to the community call. Please be aware that this session may be recorded. Live notes for the session are available in https://hackmd.io/GZ1euZUSRZeqPTJj9WJEtg Where possible, help to structure the notes for later publication rather than commenting in Zoom's chat. Thanks!
:::
# NGFF Community Call 2021-09-02
**See:**
[Previous meeting notes](https://hackmd.io/Ndb5IHRmQn2PCCNBLkG-fQ),
[Connection information](https://forum.image.sc/t/connection-information-for-next-gen-call-on-feb-23rd/48907), and
[Recordings](https://downloads.openmicroscopy.org/presentations/2021/community-call-2021-09-02/).
## Using this document
This document is a place where you can help drive what needs discussing.
Add your thoughts, needs, etc. or even new sections if need be.
If there's an idea already in place that you like, give it a :thumbsup:
If you are unclear about this document, **just add a question here** and someone will tidy it up or get in touch:
* no problems yet? Excellent! :surfer:
## Brief agenda
### Introductions `20m`
- `min(60s, 1200s/attendees)` per person
### Zarr and OME-Zarr status (Josh) `<5m`
- [NGFF v0.3] with axes (more later)
- [ngff preprint] back out (HDF fun)
- [zarr_impl] is moving along :+1:
- Zarr EOSS funding & community manager (plus contractors)
### Any outstanding community items `~30m`
- Statuses from other efforts
- Questions about statuses, goals, etc.
- _etc. **It's good to hear from you!**_
### Spec development: v0.4 and beyond (Constantin) `20m`
- add support for transforms, see also [Trafo discussion]
- [Trafo proposal] by John and Stephan
- how do we specify `axes` involved in trafo?
- one transform per scale dataset
- work on `axes` specification, see also [Axes discussion]
- add `units` as list
- or alternative proposal for units in [Trafo proposal], 3 lists might be equivalent (not as elegant, but compatible with 0.3)
- Discussions
- do we allow more than 5d?
- arbitrary `axes` names? (related to previous point)
- if possible would rather not tackle these points in the very next version
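
A minimal sketch of the "`units` as list" idea from the agenda, assuming a parallel list with one entry per axis. The `units` key, the unit strings and the `build_multiscale` helper are all hypothetical; only the plain list of `axes` names follows v0.3.

```python
import json

# v0.3-style axes: a plain list of axis names.
axes = ["t", "c", "z", "y", "x"]

# Hypothetical parallel list: one unit per axis, None where no unit applies.
units = ["second", None, "micrometer", "micrometer", "micrometer"]


def build_multiscale(axes, units):
    """Return a multiscales-like dict; the "units" key is an assumption, not part of v0.3."""
    if len(axes) != len(units):
        raise ValueError("units must have one entry per axis")
    return {"axes": list(axes), "units": list(units)}


metadata = build_multiscale(axes, units)
print(json.dumps(metadata, indent=2))
```

A reader that only understands v0.3 could ignore the extra key and still find `axes` where it expects it, which is the compatibility argument made for the list-based variant.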
### Sharding (Norman): [Sharding Slides] `20m`
### Tools (Constantin): `20m`
- Constantin: **others are welcome** to present, talk about more tools or demo if the time allows
- Tools: FIJI (currently via BDV / [MoBIE]), [napari plugin] (via [ome-zarr-py]), [vizarr], [bioformats2raw]
- Add list of tools supporting ome zarr to ngff spec?!
- Josh: already in [ome-zarr-impl], could use version info, etc.
### Next & future steps `~5min`
- Meetings on https://j.mp/imagesc-island
<hr/>
## "User registration" Session 1
| Name | Institute | Twitter Handle | GitHub Handle |
|------------ |---------------------- |---------------- |--------------- |
| Copy | and | paste | me |
| Josh Moore | University of Dundee | notjustmoore | joshmoore |
| Norman Rzepka | scalable minds | normanrz | normanrz |
| Constantin Pape | EMBL Heidelberg | @cppape | constantinpape |
| Juan Nunez-Iglesias | Monash University | @jnuneziglesias | jni |
| Matthew Hartley | EMBL-EBI | | mrmh2 |
| Volker Hilsenstein | EMBL Heidelberg | | VolkerH |
| Kimberly Meechan | EMBL Heidelberg | @Sci_Wanderlust | K-Meech |
| Sébastien Besson | University of Dundee | | sbesson |
| Guillaume Gay | Aix Marseille Université | morpholg | glyg |
| Gonzalo Merino | PIC, Barcelona | @pic_es | |
| Rohola Hosseini | Leiden University | | |
| Jean-Marie Burel | University of Dundee | | jburel |
| Mark Kittisopikul | Janelia Research Campus/ HHMI | @markkitti | @mkitti |
| Jean-Karim Heriche | EMBL | | jkh1 |
| Ken Ho | Francis Crick Institute | DrKenHo | DrKenHo-crick |
| Koji Kyoda | RIKEN Center for Biosystems Dynamics Research | kkyoda | kkyoda |
| Christian Tischer | EMBL Heidelberg | tischitischer | tischi |
| Aastha Mathur | Euro-BioImaging | | |
| Tobias Pietzsch | | pietzscht | tpietzsch |
## Session 1 Live Notes
### Introductions
- Some **key words**: spatial transcriptomics / metabolomics; public services; sharding at petabyte scale; standardized formats (:tada:); list of needs is too long; light sheet; cloud; ending the initial "what format?" conversations; saving direct from microscope; reading vs writing performance; common & scalable is critical; conserving & enriching datasets; putting an end to the nonsense (proliferation of file formats); big data; pyramids & collections.
- AE: example of ome-ngff for spatial metabolomics; some extension for transformation spec; would be good to check if this is compatible with [Trafo proposal]
- KK/KH: BD5(HDF5) hdf5 + xml data format, now developing BDZ(arr) linking(?) with ome-ngff
- note: BD stands for Biological Dynamics - i.e. ROIs, polygons.
### Zarr & OME-Zarr quick status report
- Comments:
- JK: netcdf-c in R would be an option
- KH: interested in MATLAB (best way is to start an issue in [zarr-python] to start discussions on this) https://github.com/zarr-developers/community/issues/16
- there is a netcdf-matlab that is worth looking at
- CP: xtensor-zarr has tools for wrapping (have code for R & Julia)
Q?: where are all the C, C++ implementations? links somewhere?
https://github.com/zarr-developers/zarr_implementations (C: netcdf, C++: xtensor-zarr, z5)
- GG:
- ROIs/Meshes: moving back from a funded topic to a community one.
- KH: Q?: Shouldn't ROIs and meshes be separate. I think they are quite different, no?
- GG: Yes, here is the discussion about meshes [meshes issue](https://github.com/ome/ngff/issues/33)
- SMLM: GG, IG,
### Transformations
- JMB: difference between unit name or unit symbol (see issues in OME-XML)
- CP: they've diverged from v0.3. First step is how to bring them together.
- JNI: SS made the point that units is of the target space (as axes)
- v0.3 is agnostic (identity). Will need to be clear in the document. (Axes labels and units are properties of the target space)
- JK: is time space regularly defined or arbitrary? Currently not arbitrary. Add coordinates?
- CP: Could be done via transformation on the time dimension
- Josh: xarray style coordinate arrays could also be used, idea would be that xarray can read the metadata and then one can e.g. query time point @ 4mins. We would need to add specific metadata.
- Tischi: maybe xarray could be one of the transformation types? (JM: interesting)
- JM: who would like to be alpha/beta tester?
- NR: need an implementation in Webknossos first and then can look at transforms
- CP: will talk about tools also a bit later (unifying BDV & MoBIE, etc.)
- AE: ready to test any upcoming spec (currently using most minimal Github proposal for affine matrix in private name space)
- SB: at multiscale zgroup or single resolution zarray level? zarray (datasets)
- SB: is correlative in scope? (aligning two images)
- CP: transforms are the way to do it, but not clear on how to specify the set of images (solve that orthogonally; single image first)
- MK: include shuffling, etc. i.e. compression? No. That's underlying Zarr.
- CP: this is about transforming the coordinate space
- AE: An image may have transformations relative to different coordinate systems. E.g. before registrations are finished, a single global coordinate system is not yet known. If not in scope / too complex for spec, we can keep track of temporary transformations externally.
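
A minimal sketch of what "transforming the coordinate space" could mean for a single image, assuming one scale and one translation value per axis. The function names and the numbers are hypothetical, and this is not the syntax of the [Trafo proposal]. It also shows how a regularly sampled time axis could be handled by a transformation on the time dimension, as CP suggested.

```python
def to_physical(index, scale, translation):
    """Map array indices to physical coordinates, axis by axis."""
    return [i * s + t for i, s, t in zip(index, scale, translation)]


def to_index(physical, scale, translation):
    """Inverse mapping: nearest array index for a physical coordinate."""
    return [round((p - t) / s) for p, s, t in zip(physical, scale, translation)]


# Hypothetical example values for axes ["t", "y", "x"]:
# 30 seconds per frame, 0.5 micrometer per pixel, image origin at (10, 10) micrometer.
scale = [30.0, 0.5, 0.5]
translation = [0.0, 10.0, 10.0]

# "Time point @ 4 mins" is 240 seconds, i.e. frame 8 under these values.
print(to_index([240.0, 10.0, 10.0], scale, translation))  # [8, 0, 0]
print(to_physical([8, 0, 0], scale, translation))         # [240.0, 10.0, 10.0]
```

Irregular time points would not fit this form, which is where the xarray-style coordinate arrays mentioned by Josh come in.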
### Sharding
* 32x32x32 chunks (of size 32x32x32) in one file per channel.
* stored sequentially in z(morton)-order, compressed individually.
- easier loading of local collections
* uncompressed writes to chunks; compressed to shards.
* Discussion
- CT: tried in object storage? Don't have in WK but in neuroglancer (from google/GCS)
- KH: to read individual chunks you need to decompress whole shard? No, the header has an index.
- SB: optimizing shard size? (recreating HDF?) BDV shards on XYZ-image (unit of work). 1GB benchmarked?
- NR: dependent on the application. works well for WK for writing files on given cluster
- Bigger chunks would be ok as well, depending. Can be tuned. Power-of-2 restriction.
- VH: Zarr issue? JM: Yes, but good to have some/all of you involved in Zarr as well.
- MK: seems useful. Trying to write a position on Windows and having problems with lots of files.
- Looks familiar to the blosc2 proposal. (chunks & blocks)
- 32kb per chunk was chosen because of L1-cache size? No, internet speeds at the time. (in Nature Methods, 2017). Might switch to 64^3 now.
- CP: 64 is good for raw data; for compressible data, 96 or 128 (last year)
- MK: for uint16? Never did that. Always stayed with 32 for higher-bit data.
- MH: all data is chunked/sharded the same? Basically always the same.
- for a few applications (parallel write), smaller shards were needed.
- TP: good idea, very useful. Main issue is writing. Good thing about N5/Zarr is very flexible. Missing blocks, write in any order. Consider a workflow from uncompressed to compressed shards?
- NR: that's the typical workflow, yes. (Scheduling management)
- TP: nice to have it transparently.
- TP: cool if sharding was completely independent of the dataset, re-sharding. (without re-compression??)
- NR: z-ordering helps with that. It's just concatenating + index rewriting.
- MH: somewhere between critical and absolutely essential (for EBI S3), even if not immediate.
* Additionally
- MK: Windows performance, chunking and many files? Are we benchmarking this?
- Josh: no, not yet benchmarking on Windows. Help there welcome.
- Comparison of shards/chunks to [Blosc chunks/blocks](https://www.blosc.org/posts/caterva-slicing-perf/)?
- Relevant zarr issues: https://github.com/zarr-developers/zarr-specs/issues/59#issuecomment-882790211, https://github.com/zarr-developers/zarr-python/issues/713
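
A minimal sketch of the shard layout described above: chunks compressed individually, stored in Morton (z) order, with an index so a single chunk can be read without decompressing the whole shard. The bit order of the Morton code, the use of `zlib`, the in-memory index and all function names are assumptions for illustration, not the webKnossos or Zarr layout.

```python
import zlib


def morton3(x, y, z, bits=5):
    """Interleave the bits of (x, y, z) into one Morton (z-order) code.

    bits=5 covers 32 chunks per axis; putting x in the lowest bit is an assumption.
    """
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (3 * b)
        code |= ((y >> b) & 1) << (3 * b + 1)
        code |= ((z >> b) & 1) << (3 * b + 2)
    return code


def write_shard(chunks, bits=5):
    """chunks: {(x, y, z): raw bytes}. Returns (index, payload).

    index maps a Morton code to (offset, length) of the compressed chunk.
    """
    index = {}
    payload = bytearray()
    for coord in sorted(chunks, key=lambda c: morton3(*c, bits=bits)):
        blob = zlib.compress(chunks[coord])  # each chunk compressed on its own
        index[morton3(*coord, bits=bits)] = (len(payload), len(blob))
        payload += blob
    return index, bytes(payload)


def read_chunk(index, payload, coord, bits=5):
    """Decompress one chunk using only its slice of the shard."""
    offset, length = index[morton3(*coord, bits=bits)]
    return zlib.decompress(payload[offset:offset + length])


# Tiny demo: a 2x2x2 block of chunks, each filled with a recognisable pattern.
chunks = {(x, y, z): bytes([x, y, z]) * 100
          for x in range(2) for y in range(2) for z in range(2)}
index, payload = write_shard(chunks)
print(read_chunk(index, payload, (1, 0, 1)) == chunks[(1, 0, 1)])  # True
```

Because each entry is just an offset and a length, re-sharding without re-compression comes down to concatenating the compressed blobs and rewriting the index, as NR noted.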
### Tools
* CP: good to have a list. Keep it updated. Strive to have good support:
* Fiji, Napari, Web, ...
* May need some work to consolidate it.
* e.g. multiple efforts in Fiji: MoBIE, Saalfeld lab tools, ...
* plus get it into the normal Fiji distribution
* Also good to advertise the napari plugin (napari-ome-zarr)
* Perhaps also regularly show tools at these calls
* JNI: writing from python with dask arrays? Should work.
* Tischi: How far are we from Fiji: File > Save As > OME.ZARR and File > Open > OME.ZARR ?
* I think Kimberly did something for saving...?!
* [instructions](https://omero-guides.readthedocs.io/en/latest/fiji/docs/view_mobie_zarr.html)
* That's only reading and only into BDV, right?
* Have writer (need to update imagesc thread). On time of Saalfeld's writer.
* VH: AE implemented, with dask-downsampling. Sharing on github soon. Pyramids, numpy & dask.
* MK: incompatibilities, https://github.com/zarr-developers/numcodecs/issues/175
* JM: worth capturing in zarr_implementations. Also point specification issues e.g. in the Zarr spec
* JM: progression of involvement -- issue, PR, fix, spec update (All useful!)
* MK: multiple tools is a problem. multiple paths for single language (Java -> C) needs testing.
* JMB: great to have lots of readers & writers (especially on the list)
* we're trying to test centrally when there are upstream PRs.
* GG: notion of validating
* CT: something core OME? SB: OMEZarrReader, but a few specs behind. Need an update site.
* CP: good to not replicate in the BDV space? Next action? image.sc thread?
* TP: good idea. missing in fiji is the internal representation for multiscales & collections of images.
* CT: you get the thumbnails, "pick one". Ok for now.
* MK: nothing yet on the Julia side (ome-zarr-jl). CP: don't know anyone
* MK: new Julia Microscopy group: https://github.com/JuliaMicroscopy
### Misc
* imagesc-island
- No objections to using it.
- Probably before the end of the year.
- Gather Town client.
- Perhaps focus on collections.
* Date consideration for early December:
- Dec 1 through Dec 10 ASCB/EMBO CellBio Meeting https://www.ascb.org/cellbio2021/deadlines/
<hr/>
## "User registration" Session 2
| Name | Institute | Twitter Handle | GitHub Handle |
|------------ |---------------------- |---------------- |--------------- |
| Copy | and | paste | me |
| Josh Moore | University of Dundee | notjustmoore | joshmoore |
| John Bogovic | HHMI Janelia | BogovicJohn | bogovicj |
| Jordao Bragantini | CZ Biohub | jobragantini | jookuma |
| Davis Bennett | HHMI Janelia | davisvbennett | d-v-b |
| Melissa Linkert | Glencoe Software | | melissalinkert |
| Constantin Pape | EMBL Heidelberg | @cppape | constantinpape |
| Niko Ehrenfeuchter | Biozentrum, Uni Basel | | ehrenfeu |
| Jackson Maxfield Brown | Allen Institute for Cell Science | @jmaxfieldbrown | JacksonMaxfield |
| David Gault | OME | | dgaulgaulgaulgaul |
| Trevor Manz | Harvard Medical School | @trevmanz | manzt |
| Mark Kittisopikul | HHMI Janelia | @markkitti | mkitti |
| Kevin Kozlowski | Glencoe Software | | kkoz |
| Eric Perlman | | perlman | perlman |
| Andras Lasso | PerkLab, Queen's University| lassoan | lassoan |
| Nick Schaub | NCATS/NIH | | nicholas-schaub |
| Dave Mellert | The Jackson Laboratory | DaveMellert | mellertd |
| Matthew McCormick | Kitware| thewtex | thewtex |
| Lee Kamentsky | MIT | | |
## Session 2 Live Notes
### Introduction
* **Various keywords**: funded positions, writing from microscope, 3D medical imaging (Nifti), publication on the web, fast web preview, The One Format Dream, big datasets on the cloud (without copy & paste), cross compatibility between different language ecosystems, registration and stitching - coordinate systems (opening just one section), BIDS microscopy, access control/permissions, xarray-compatibility, IPFS compatibility, multi-language, "necessary evil", "making our lives easier", [bfio](https://pypi.org/project/bfio/), RDM
* JB: on-the-fly converter for different meta-data flavors
### Status / Community
* JBogo - on different implementations, the test suite that Josh mentioned is key. along with examples that Will has posted. Some kind of language-agnostic tests would be cool in order to know how trustworthy / what features a particular implementation supports
* [the examples](https://github.com/ome/ngff/issues/51)
* NH: putting our weight behind an implementation? Probably more of a litmus test (TCK)
* JM: standard **NetCDF blurb**
* DVB: bioimaging starting to face geoscience issues, we should definitely make use of that.
- NH: isn't that where zarr came from (basically)
* NH: move in EM space to unify metadata model?
- cf. OME/BINA
- DVB: would love to stop firing from the hip and work from a metadata
- Josh: vEM call roughly every 2 months (hackathon in December)
- metadata / formats hasn't (as far as i know) been covered much in the meetings I (bogovic) have been to, should try to get it on the agenda
### Transforms
* CP: v0.3 review, axes metadata, etc.
* [Issue discussion transforms](https://github.com/ome/ngff/issues/28)
* [Bogovic and Saalfeld's transform spec proposal](https://github.com/saalfeldlab/n5-ij/wiki/Transformation-spec-proposal-for-NGFF)
* NH: LSM treating compression differently along T.
* DVB: want to have different behavior for spatial dimensions as well
- want to keep semantics out of storage spec
- perhaps a community convention
- storage spec should have no idea
* CP: want to increase usability
* DVB: problem of the viewers
* AL: there are many viewers, important to agree
* DL: With a completely general/arbitrary axis description, we go back toward plain zarr and readers won’t know what to do
* DVB: powerful if viewers treat the data as tensors
* AL: difference between axes labels & types
* JM & DVB: ok to have convention, but viewers shouldn't break if the conventions aren't present
* DVB: is type needed if unit is possible?
* JB: wavelength might clash for a channel dimension
* DVB: channels don't have an ordering.
* NS/NH: in mass spec it has meaning
* (chat) JB: Andras, does 3Dslicer treat channels differently from space dimensions? it must, right?
- Yes, 3D slicer handles different dimensions very differently.
- exactly, so if this standard was missing that information, it would be harmful, right?
* NS: allowing communities (tomography) to build up axes types would be powerful (**extensibility**)
* NH: consider adding an angular axis? (EM, medical)
- CP: Would be a valuable proposal.
* AL: units? CP: part of this discussion
* DVB: **proposal** that a validator should not need to check two fields against each other.
- JM: a limit may be when it comes to performance
* AL: NRRD, NIfTI have axis definitions
* NH: types like linearSpatialType, temporal, spectral, ...
* ...and a really cool browser
* DVB: in xyztc, C is very special. unit not really nanometer (not a regular grid)
* neuroglancer has "categorical" dimensions. Don't belong to a space.
* JB: good for that in general (specifies domain), but wouldn't try to work that into the next version.
- Could have spherical, toroidal domains
* JM: perhaps not scalable if *every* tool that uses a spec needs to be updated / have timely feedback come back
* NM: likes the idea of concrete types with labels (x,y,z) but flexible enough for different / more general kinds of types, with the responsibility of the renderer to deal with them correctly
* Summary (Making it happen)
* CP: tackle in two parts
- axes type & unit
- then transformations
* CT: clarified some of the semantics (napari, imaris, 3d Slicer, etc. should open without asking the user too much)
* MM: types are also good for the transformations
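
A minimal sketch of how per-axis `type` and `unit` could be checked without comparing two fields against each other (DVB's proposal) and without rejecting axis types a tool does not know (the extensibility point from NS). The dict layout, the key names and the example type and unit strings are hypothetical, not an agreed spec.

```python
def check_axis(axis):
    """Return a list of problems for one axis entry; an empty list means acceptable."""
    problems = []
    name = axis.get("name")
    if not isinstance(name, str) or not name:
        problems.append("axis needs a non-empty string 'name'")
    # 'type' and 'unit' are optional and each is checked on its own:
    # no rule of the form "unit X is only valid for type Y".
    for key in ("type", "unit"):
        if key in axis and not isinstance(axis[key], str):
            problems.append(f"'{key}' must be a string when present")
    return problems


def check_axes(axes):
    """Check a list of axis entries; unknown types are accepted as-is."""
    problems = []
    names = [a.get("name") for a in axes]
    if len(set(names)) != len(names):
        problems.append("axis names must be unique")
    for a in axes:
        problems.extend(check_axis(a))
    return problems


# Hypothetical example, including a community-defined "angle" type.
axes = [
    {"name": "t", "type": "time", "unit": "second"},
    {"name": "c", "type": "channel"},
    {"name": "theta", "type": "angle", "unit": "degree"},
    {"name": "y", "type": "space", "unit": "micrometer"},
    {"name": "x", "type": "space", "unit": "micrometer"},
]
print(check_axes(axes))  # []
```

A viewer that does not recognise `"angle"` can still open the array and fall back to generic handling, in line with the point that viewers shouldn't break when conventions are absent.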
### Tools
* CP: BDV/Fiji/Bio-Formats, Napari, Vizarr
- Discussing coordinating on the Fiji front to reduce redundant developments
* JM: See (and add to!) https://ngff.openmicroscopy.org/latest/#implementations
* NS: rudimentary implementation in https://github.com/PolusAI/bfio
- funding for C++ with fastloader from NIST
- fast loading of OME-Zarr
* AL: 3D Slicer is interested, but use ITK or something else?
- MM: working on support in ITK + some python packages or C++ route
* DT: working on https://github.com/AllenInstitute/volume-viewer/blob/feature/load-zarr/src/VolumeLoader.ts#L131
* NS: interested in using the **OME-XML metadata** (cf. bioformats2raw)
- JM: it's coming. Anyone interested should feel free to get involved.
- NS: agree that JSON would be nice, but would be great to have stability
* JB - question for AL/MC : when ITK can open ome-zarr, will downstream tools (besides 3d slicer) be able to use them as well?
* MK: Julia implementations
- https://discourse.julialang.org/t/a-julia-compatible-alternative-to-zarr/11842/19
- https://github.com/meggart/ZarrNative.jl
- https://github.com/meggart/Zarr.jl
- See also https://github.com/seung-lab/BigArrays.jl/issues/52
- Josh: these include OME-Zarr?!
- MK: Just Zarr afaik
- comment on performance from JB (chat): "in terms of benchmarking load times, I gave a talk specifically on this topic on how AICSImageIO achieves fast-er times for "non-Zarr" file formats. C++ readers will always be faster but there are things you can do before you "rewrite in C++": https://youtu.be/LNa_gGpSnvc That said, C++ / Rust impls are always faster and I highly encourage them :)"
- MK: I would also be interested in benchmarking writing times on different filesystems / operating systems (Windows in particular)
### Sharding
* EP: like this approach. Problems moving datasets away. (8 files for many many terabytes)
* Lots of files for acquisition; but then can optimize.
* Like it being more of a Zarr issue.
* DVB: :+1: would love to have it but use it judiciously, towards the end of a dataset's life (ready for consumption, when you are ok to take on the complexity)
* CP: Matthew from EBI was also quite in favor of saving inodes
* EP: also solves the missing file problem.
* MK: how do shards map to files?
* Josh: fsspec/grib (time permitting)
- OME-TIFF as zarr: https://observablehq.com/@manzt/ome-tiff-as-filesystemreference
- https://github.com/intake/fsspec-reference-maker
- MK: API is the important thing (sharded, unsharded, etc.) should do the access for me.
* MK: **compression schemes**
- "you MUST implement these compressors"
- NH: subset based on performance (speed or compression). could specify minima.
- DVB: think that datatypes and compression shouldn't
- JM: would put that into validation
- CP: good to abstract into zarr
- NH: without the spec, offsetting to implementors
- JM: but to **Zarr** implementors (let's re-use)
- TM: would be nice to have a "can i use codec/feature matrix?" for a glance
- DB: compression can get exotic quickly
- TM: all imagecodecs are exported to numcodecs (have a unique key)
### Misc
* JB: readers for ITK. How long until elastix, etc. can use them? (Painful to convert to Nifti and then convert back)
- MM: depends on the avenue. JB: can use simple-elastix
- MM: very soon or currently in Python. In Java, relatively soon. Have some funding. So, next few months (not the next ITK release; separate repository package developed independently), Q4 or Q1 2022
- ITK formats in Python conversion: https://github.com/spatial-image/spatial-image-ngff
- ITK formats in C++ conversion: https://github.com/InsightSoftwareConsortium/ITKIOZ5 (this repository is getting a reawakening, maybe renamed if we use a different C++ library for the Zarr support)
- JB: updates?
- AL: if new IO is in ITK, then get those in slicer roughly after a month or so (metadata may be difficult; requires extra integration work)
* Next time
- Group generally less keen on gather town
- JM: perhaps we test it beforehand? NH: Yes! :imp:
* MK: Difficulty on Windows
* Josh: talking to one Vendor. (**CSHARP**)
* Do we need to start having "data generator" calls?
* Benchmarking!!
* Windows with local SSDs (NTFS) perhaps with RAID
* Also Enterprise filesystems and then NFS ...
* JM: where do we run this? GitHub Actions? (gigabyte scale)
## Links
Feel free to add links here at the bottom of the document to make referencing things above cleaner. Alphabetical by alias will make it easier to detect conflicts.
[Axes discussion]: https://github.com/ome/ngff/issues/35
[bioformats2raw]: https://github.com/glencoesoftware/bioformats2raw
[MoBIE]: https://github.com/mobie/mobie-viewer-fiji
[napari plugin]: https://github.com/ome/napari-ome-zarr
[ngff preprint]: https://forum.image.sc/t/ome-ngff-biorxiv-preprint/51486
[NGFF v0.3]: https://github.com/orgs/ome/projects/17?add_cards_query=is%3Aopen
[ome-zarr-impl]: https://ngff.openmicroscopy.org/latest/#implementations
[ome-zarr-py]: https://github.com/ome/ome-zarr-py
[Sharding slides]: https://docs.google.com/presentation/d/1sPfhYRBZGLA6RI8dAjwg8Iuaolz0Xs3O404z5l1-Rx8/edit
[Trafo discussion]: https://github.com/ome/ngff/issues/28
[Trafo proposal]: https://github.com/saalfeldlab/n5-ij/wiki/Transformation-spec-proposal-for-NGFF
[vizarr]: https://github.com/hms-dbmi/vizarr
[zarr_impl]: https://github.com/zarr-developers/zarr_implementations
[zarr-python]: https://github.com/zarr-developers/zarr-python