Please paste this into the Zoom chat as new people join:

Welcome to the community call. Please be aware that this session may be recorded. Live notes for the session are available in https://hackmd.io/GZ1euZUSRZeqPTJj9WJEtg Where possible, help to structure the notes for later publication rather than commenting in Zoom's chat. Thanks!

NGFF Community Call 2021-09-02

See: Previous meeting notes, Connection information, and Recordings.

Using this document

This document is a place where you can help drive what needs discussing. Add your thoughts, needs, etc. or even new sections if need be. If there's an idea already in place that you like, give it a +1.

If you are unclear about this document, just add a question here and someone will tidy it up or get in touch:

no problems yet? Excellent!

Brief agenda

Introductions `20m`

min(60s, 1200s/attendees) per person

Zarr and OME-Zarr status (Josh) `<5m`

NGFF v0.3 with axes (more later)
ngff preprint back out (HDF fun)
zarr_impl is moving along
Zarr EOSS funding & community manager (plus contractors)

Any outstanding community items `~30m`

Statuses from other efforts
Questions about statuses, goals, etc.
etc. It's good to hear from you!

Spec development: v0.4 and beyond (Constantin) `20m`

add support for transforms, see also Trafo discussion
- Trafo proposal by John and Stephan
- how do we specify axes involved in trafo?
- one transform per scale dataset
work on axes specification, see also Axes discussion
- add units as list
- or alternative proposal for units in Trafo proposal, 3 lists might be equivalent (not as elegant, but compatible with 0.3)
Discussions
- do we allow more than 5d?
- arbitrary axes names? (related to prev. point)
- if possible would rather not tackle these points in the very next version
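As a rough illustration of why the "3 lists" option might be equivalent to a richer per-axis description, the sketch below converts between the two shapes. It assumes the three lists are axis names, types and units; the key names `types` and `units`, the unit strings and both helper functions are hypothetical and not taken from any published spec. Only the `axes` list of names follows v0.3.

```python
from typing import Any, Dict, List, Optional


def lists_to_axes(
    names: List[str], types: List[str], units: List[Optional[str]]
) -> List[Dict[str, Any]]:
    """Zip three parallel lists into one dict per axis (hypothetical layout)."""
    if not (len(names) == len(types) == len(units)):
        raise ValueError("names, types and units must have the same length")
    return [{"name": n, "type": t, "unit": u} for n, t, u in zip(names, types, units)]


def axes_to_lists(axes: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Split per-axis dicts back into three parallel lists."""
    return {
        "axes": [a["name"] for a in axes],
        "types": [a["type"] for a in axes],
        "units": [a["unit"] for a in axes],
    }


# v0.3 already stores "axes" as a list of names; "types" and "units" are
# assumed additions for this sketch.
three_lists = {
    "axes": ["t", "c", "z", "y", "x"],
    "types": ["time", "channel", "space", "space", "space"],
    "units": ["second", None, "micrometer", "micrometer", "micrometer"],
}

per_axis = lists_to_axes(three_lists["axes"], three_lists["types"], three_lists["units"])
round_trip = axes_to_lists(per_axis)
```

Because the conversion round-trips without loss, a reader that only understands the v0.3 `axes` list could ignore the extra lists, which is the compatibility argument made above.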

Sharding (Norman): Sharding Slides `20m`

Tools (Constantin): `20m`

Constantin: others are welcome to present, talk about more tools or demo if the time allows
Tools: FIJI (currently via BDV / MoBIE), napari plugin (via ome-zarr-py), vizarr, bioformats2raw
Add list of tools supporting ome zarr to ngff spec?!
- Josh: already in ome-zarr-impl, could use version info, etc.

Next & future steps `~5min`

Meetings on https://j.mp/imagesc-island

"User registration" Session 1

Name	Institute	Twitter Handle	GitHub Handle
Copy	and	paste	me
Josh Moore	University of Dundee	notjustmoore	joshmoore
Norman Rzepka	scalable minds	normanrz	normanrz
Constantin Pape	EMBL Heidelberg	@cppape	constantinpape
Juan Nunez-Iglesias	Monash University	@jnuneziglesias	jni
Matthew Hartley	EMBL-EBI		mrmh2
Volker Hilsenstein	EMBL Heidelberg		VolkerH
Kimberly Meechan	EMBL Heidelberg	@Sci_Wanderlust	K-Meech
Sébastien Besson	University of Dundee		sbesson
Guillaume Gay	Aix Marseille Université	morpholg	glyg
Gonzalo Merino	PIC, Barcelona	@pic_es
Rohola Hosseini	Leiden University
Jean-Marie Burel	University of Dundee		jburel
Mark Kittisopikul	Janelia Research Campus/ HHMI	@markkitti	@mkitti
Jean-Karim Heriche	EMBL		jkh1
Ken Ho	Francis Crick Institute	DrKenHo	DrKenHo-crick
Koji Kyoda	RIKEN Center for Biosystems Dynamics Research	kkyoda	kkyoda
Christian Tischer	EMBL Heidelberg	tischitischer	tischi
Aastha Mathur	Euro-BioImaging
Tobias Pietzsch		pietzscht	tpietzsch

Session 1 Live Notes

Introductions

Some key words: spatial transcriptomics / metabolomics; public services; sharding at petabyte scale; standardized formats; list of needs is too long; light sheet; cloud; ending the initial "what format?" conversations; saving direct from microscope; reading vs writing performance; common & scalable is critical; conserving & enriching datasets; putting an end to the nonsense (proliferation of file formats); big data; pyramids & collections.
AE: example of ome-ngff for spatial metabolomics; some extension for transformation spec; would be good to check if this is compatible with Trafo proposal
KK/KH: BD5(HDF5) hdf5 + xml data format, now developing BDZ(arr) linking(?) with ome-ngff
- note: BD stands for Biological Dynamics - i.e. ROIs, polygons.

Zarr & OME-Zarr quick status report

Comments:
- JK: netcdf-c in R would be an option
- KH: interested in MATLAB (best way is to start an issue in zarr-python to start discussions on this) https://github.com/zarr-developers/community/issues/16
- there is a netcdf-matlab that is worth looking at
- CP: xtensor-zarr has tools for wrapping (have code for R & Julia) Q?: where are all the C, C++ implementations? links somewhere? https://github.com/zarr-developers/zarr_implementations (C: netcdf, C++: xtensor-zarr, z5)
GG:
- ROIs/Meshes: moving back from a funded topic to a community one.
  - KH: Q?: Shouldn't ROIs and meshes be separate. I think they are quite different, no?
  - GG: Yes, here is the discussion about meshes: meshes issue
- SMLM: GG, IG,

Transformations

JMB: difference between unit name or unit symbol (see issues in OME-XML)
CP: they've diverged from v0.3. First step is how to bring them together.
JNI: SS made the point that units is of the target space (as axes)
- v0.3 is agnostic (identity). Will need to be clear in the document. (Axes labels and units are properties of the target space)
JK: is time space regularly defined or arbitrary? Currently not arbitrary. Add coordinates?
- CP: Could be done via transformation on the time dimension
- Josh: xarray style coordinate arrays could also be used, idea would be that xarray can read the metadata and then one can e.g. query time point @ 4mins. We would need to add specific metadata.
- Tischi: maybe xarray could be one of the transformation types? (JM: interesting)
JM: who would like to be alpha/beta tester?
- NR: need an implementation in Webknossos first and then can look at transforms
- CP: will talk about tools also a bit later (unifying BDV & MoBIE, etc.)
- AE: ready to test any upcoming spec (currently using most minimal Github proposal for affine matrix in private name space)
SB: at multiscale zgroup or single resolution zarray level? zarray (datasets)
SB: is correlative in scope? (aligning two images)
- CP: transforms are the way to do it, but not clear on how to specify the set of images (solve that orthogonally; single image first)
MK: include shuffling, etc. i.e. compression? No. That's underlying Zarr.
- CP: this is about transforming the coordinate space
AE: An image may have transformations relative to different coordinate systems. E.g. before registrations are finished, a single global coordinate system is not yet known. If not in scope / too complex for spec, we can keep track of temporary transformations externally.
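The two options raised for the time axis can be sketched side by side: a regular grid expressed as a per-axis scale and translation, and an explicit coordinate array (in the spirit of xarray coordinates) for irregular sampling. This is only a sketch under assumptions; the function names, axis order and all numbers are made up for illustration and do not come from the Trafo proposal or from xarray.

```python
from bisect import bisect_left
from typing import List, Sequence


def index_to_physical(
    index: Sequence[float], scale: Sequence[float], translation: Sequence[float]
) -> List[float]:
    """Per-axis transform: physical = index * scale + translation."""
    return [i * s + t for i, s, t in zip(index, scale, translation)]


def nearest_index(coords: Sequence[float], value: float) -> int:
    """Index of the coordinate closest to value; coords must be sorted ascending."""
    if not coords:
        raise ValueError("coords must not be empty")
    pos = bisect_left(coords, value)
    if pos == 0:
        return 0
    if pos == len(coords):
        return len(coords) - 1
    before, after = coords[pos - 1], coords[pos]
    return pos if (after - value) < (value - before) else pos - 1


# Regular sampling on assumed (t, y, x) axes: 30 s per frame, 0.5 um pixels.
scale = [30.0, 0.5, 0.5]
translation = [0.0, 10.0, 20.0]
physical = index_to_physical([8, 100, 200], scale, translation)

# Irregular sampling: one explicit time coordinate (in seconds) per frame.
time_coords_s = [0.0, 45.0, 110.0, 230.0, 260.0, 400.0]
frame_at_4min = nearest_index(time_coords_s, 4 * 60.0)
```

With a regular grid, frame 8 lands exactly at 240 s; with the irregular coordinates, the query for 4 minutes resolves to the frame recorded at 230 s. Supporting the second case is what would need the additional metadata mentioned above.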

Sharding

32x32x32 chunks (of size 32x32x32) in one file per channel.
stored sequentially in z(morton)-order, compressed individually.
- easier loading of local collections
uncompressed writes to chunks; compressed to shards.
Discussion
- CT: tried in object storage? Don't have in WK but in neuroglancer (from google/GCS)
- KH: to read individual chunks you need to decompress whole shard? No, the header has an index.
- SB: optimizing shard size? (recreating HDF?) BDV shards on XYZ-image (unit of work). 1GB benchmarked?
  - NR: dependent on the application. works well for WK for writing files on given cluster
  - Bigger chunks would be ok as well, depending. Can be tuned. Power-of-2 restriction.
- VH: Zarr issue? JM: Yes, but good to have some/all of you involved in Zarr as well.
- MK: seems useful. Trying to write a position on Windows and having problems with lots of files.
  - Looks similar to the blosc2 proposal. (chunks & blocks)
  - 32kb per chunk was chosen because of L1-cache size? No, internet speeds at the time. (in Nature Methods, 2017). Might switch to 64^3 now.
  - CP: 64 is good for raw data; for compressible data, 96 or 128 (last year)
- MK: for uint16? Never did that. Always stayed with 32 for higher-bit data.
- MH: all data is chunked/sharded the same? Basically always the same.
  - for a few applications (parallel write) then needed smaller shards.
- TP: good idea, very useful. Main issue is writing. Good thing about N5/Zarr is very flexible. Missing blocks, write in any order. Consider a workflow from uncompressed to compressed shards?
  - NR: that's the typical workflow, yes. (Scheduling management)
  - TP: nice to have it transparently.
  - TP: cool if sharding was completely independent of the dataset, re-sharding. (without re-compression??)
    - NR: z-ordering helps with that. It's just concatenating + index rewriting.
- MH: somewhere between critical and absolutely essential (for EBI S3), even if not immediate.
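For readers unfamiliar with z(morton)-order, here is a minimal sketch of how the 32x32x32 chunk positions inside one shard could be mapped to a single sequential index by interleaving bits. Which axis takes the lowest bit is an assumption made for this example; the notes do not state the order used in practice.

```python
from typing import Tuple


def morton_encode_3d(x: int, y: int, z: int, bits: int = 5) -> int:
    """Interleave the low `bits` bits of x, y and z into one Z-order index."""
    code = 0
    for bit in range(bits):
        code |= ((x >> bit) & 1) << (3 * bit)      # x takes bit 0 of each triple
        code |= ((y >> bit) & 1) << (3 * bit + 1)  # y takes bit 1
        code |= ((z >> bit) & 1) << (3 * bit + 2)  # z takes bit 2
    return code


def morton_decode_3d(code: int, bits: int = 5) -> Tuple[int, int, int]:
    """Inverse of morton_encode_3d."""
    x = y = z = 0
    for bit in range(bits):
        x |= ((code >> (3 * bit)) & 1) << bit
        y |= ((code >> (3 * bit + 1)) & 1) << bit
        z |= ((code >> (3 * bit + 2)) & 1) << bit
    return x, y, z


# 5 bits per axis cover the 32 chunk positions along each axis of a shard.
CHUNKS_PER_AXIS = 32
all_codes = sorted(
    morton_encode_3d(x, y, z)
    for x in range(CHUNKS_PER_AXIS)
    for y in range(CHUNKS_PER_AXIS)
    for z in range(CHUNKS_PER_AXIS)
)

# The first 8 codes form the 2x2x2 block at the origin, so spatial
# neighbours tend to sit next to each other in the file.
first_block = {morton_decode_3d(code) for code in range(8)}
```

Since every aligned power-of-two cube occupies one contiguous run of indices, merging or splitting shards along such cubes can be pictured as concatenation plus an index rewrite, which matches NR's remark above.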
Additionally
- MK: Windows performance, chunking and many files? Are we benchmarking this?
  - Josh: no, not yet benchmarking on Windows. Help there welcome.
- Comparison of shards/chunks to Blosc chunks/blocks?
- Relevant zarr issues: https://github.com/zarr-developers/zarr-specs/issues/59#issuecomment-882790211, https://github.com/zarr-developers/zarr-python/issues/713
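The question above about reading a single chunk can be illustrated with a toy shard: each chunk is compressed on its own and a header index records where it lives, so one chunk can be decompressed without touching the rest. The byte layout below (a count followed by offset/length pairs) and the use of zlib are assumptions made purely for this sketch, not the format discussed on the call or in the linked Zarr issues.

```python
import struct
import zlib
from typing import List, Optional

INDEX_ENTRY = struct.Struct("<QQ")  # (offset, length) as little-endian uint64
COUNT = struct.Struct("<I")         # number of chunk slots in the shard


def build_shard(chunks: List[Optional[bytes]]) -> bytes:
    """Compress each chunk individually and prepend an (offset, length) index."""
    header_size = COUNT.size + INDEX_ENTRY.size * len(chunks)
    index = []
    body = bytearray()
    for raw in chunks:
        if raw is None:
            index.append((0, 0))  # length 0 marks a missing chunk
            continue
        compressed = zlib.compress(raw)
        index.append((header_size + len(body), len(compressed)))
        body += compressed
    header = COUNT.pack(len(chunks)) + b"".join(
        INDEX_ENTRY.pack(offset, length) for offset, length in index
    )
    return header + bytes(body)


def read_chunk(shard: bytes, chunk_number: int) -> Optional[bytes]:
    """Decompress one chunk using only the header index and its byte range."""
    (count,) = COUNT.unpack_from(shard, 0)
    if not 0 <= chunk_number < count:
        raise IndexError(chunk_number)
    offset, length = INDEX_ENTRY.unpack_from(
        shard, COUNT.size + INDEX_ENTRY.size * chunk_number
    )
    if length == 0:
        return None
    return zlib.decompress(shard[offset:offset + length])


# Four chunk slots, one of them never written.
chunks = [bytes([i]) * 64 for i in range(4)]
chunks[2] = None
shard = build_shard(chunks)
third = read_chunk(shard, 3)
missing = read_chunk(shard, 2)
```

On object storage the same idea would translate into one small read for the index followed by one range request per chunk.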

Tools

CP: good to have a list. Keep it updated. Strive to have good support:
- Fiji, Napari, Web, …
- May need some work to consolidate it.
- e.g. multiple efforts in Fiji: MoBIE, Saalfeld lab tools, …
- plus get it into the normal Fiji distribution
- Also good to advertise the napari plugin (napari-ome-zarr)
- Perhaps also regularly show tools at these calls
JNI: writing from python with dask arrays? Should work.
Tischi: How far are we from Fiji: File > Save As > OME.ZARR and File > Open > OME.ZARR ?
- I think Kimberly did something for saving…?!
- instructions
  - That's only reading and only into BDV, right?
- Have writer (need to update imagesc thread). On top of Saalfeld's writer.
VH: AE implemented, with dask-downsampling. Sharing on github soon. Pyramids, numpy & dask.
MK: incompatibilities, https://github.com/zarr-developers/numcodecs/issues/175
- JM: worth capturing in zarr_implementations. Also point specification issues e.g. in the Zarr spec
- JM: progression of involvement – issue, PR, fix, spec update (All useful!)
- MK: multiple tools is a problem. multiple paths for single language (Java -> C) needs testing.
JMB: great to have lots of readers & writers (especially on the list)
- we're trying to test centrally when there are upstream PRs.
GG: notion of validating
CT: something core OME? SB: OMEZarrReader, but a few specs behind. Need an update site.
- CP: good to not replicate in the BDV space? Next action? image.sc thread?
- TP: good idea. missing in fiji is the internal representation for multiscales & collections of images.
- CT: you get the thumbnails, "pick one". Ok for now.
MK: nothing yet on the Julia side (ome-zarr-jl). CP: don't know anyone
- MK: new Julia Microscopy group: https://github.com/JuliaMicroscopy

Misc

imagesc-island
- No objections to using it.
- Probably before the end of the year.
- Gather Town client.
- Perhaps focus on collections.
Date consideration for early December:
- Dec 1 through Dec 10 ASCB/EMBO CellBio Meeting https://www.ascb.org/cellbio2021/deadlines/

"User registration" Session 2

Name	Institute	Twitter Handle	GitHub Handle
Copy	and	paste	me
Josh Moore	University of Dundee	notjustmoore	joshmoore
John Bogovic	HHMI Janelia	BogovicJohn	bogovicj
Jordao Bragantini	CZ Biohub	jobragantini	jookuma
Davis Bennett	HHMI Janelia	davisvbennett	d-v-b
Melissa Linkert	Glencoe Software		melissalinkert
Constantin Pape	EMBL Heidelberg	@cppape	constantinpape
Niko Ehrenfeuchter	Biozentrum, Uni Basel		ehrenfeu
Jackson Maxfield Brown	Allen Institute for Cell Science	@jmaxfieldbrown	JacksonMaxfield
David Gault	OME		dgaulgaulgaulgaul
Trevor Manz	Harvard Medical School	@trevmanz	manzt
Mark Kittisopikul	HHMI Janelia	@markkitti	mkitti
Kevin Kozlowski	Glencoe Software		kkoz
Eric Perlman		perlman	perlman
Andras Lasso	PerkLab, Queen's University	lassoan	lassoan
Nick Schaub	NCATS/NIH		nicholas-schaub
Dave Mellert	The Jackson Laboratory	DaveMellert	mellertd
Matthew McCormick	Kitware	thewtex	thewtex
Lee Kamentsky	MIT

Session 2 Live Notes

Introduction

Various keywords: funded positions, writing from microscope, 3D medical imaging (Nifti), publication on the web, fast web preview, The One Format Dream, big datasets on the cloud (without copy & paste), cross compatibility between different language ecosystems, registration and stitching - coordinate systems (opening just one section), BIDS microscopy, access control/permissions, xarray-compatibility, IPFS compatibility, multi-language, "necessary evil", "making our lives easier", bfio, RDM
JB: on-the-fly converter for different meta-data flavors

Status / Community

JBogo - on different implementations, the test suite that Josh mentioned is key. along with examples that Will has posted. Some kind of language-agnostic tests would be cool in order to know how trustworthy / what features a particular implementation supports
- the examples
NH: putting our weight behind an implementation? Probably more of a litmus test (TCK)
JM: standard NetCDF blurb
DVB: bioimaging starting to face geoscience issues, we should definitely make use of that.
- NH: isn't that where zarr came from (basically)
NH: move in EM space to unify metadata model?
- cf. OME/BINA
- DVB: would love to stop firing from the hip and work from a metadata
- Josh: vEM call roughly every 2 months (hackathon in December)
  - metadata / formats hasn't (as far as i know) been covered much in the meetings I (bogovic) have been to, should try to get it on the agenda

Transforms

CP: v0.3 review, axes metadata, etc.
Issue discussion transforms
Bogovic and Saalfeld's transform spec proposal
NH: LSM treating compression differently along T.
DVB: want to have different behavior for spatial dimensions as well
- want to keep semantics out of storage spec
- perhaps a community convention
- storage spec should have no idea
CP: want to increase usability
DVB: problem of the viewers
AL: there are many viewers, important to agree
DL: With a completely general/arbitrary axis description, we go back toward plain zarr and readers won’t know what to do
DVB: powerful if viewers treat the data as tensors
AL: difference between axes labels & types
JM & DVB: ok to have convention, but viewers shouldn't break if the conventions aren't present
DVB: is type needed if unit is possible?
JB: wavelength might clash for a channel dimension
DVB: channels don't have an ordering.
NS/NH: in mass spec it has meaning
(chat) JB: Andras, does 3Dslicer treat channels differently from space dimensions? it must, right?
- Yes, 3D slicer handles different dimensions very differently.
- exactly, so if this standard was missing that information, it would be harmful, right?
NS: allowing communities (tomography) to build up axes types would be powerful (extensibility)
NH: consider adding an angular axes? (EM, medical)
- CP: Would be a valuable proposal.
AL: units? CP: part of this discussion
DVB: proposal that a validator should not need to check two fields against each other.
- JM: a limit may be when it comes to performance
AL: nrrd, nifti have axis definitions
NH: types like linearSpatialType, temporal, spectral, …
- …and a really cool browser
DVB: in xyztc, C is very special. unit not really nanometer (not a regular grid)
- neuroglancer has "categorical" dimensions. Don't belong to a space.
- JB: good for that in general (specifies domain), but wouldn't try to work that into the next version.
  - Could have spherical, toroidal domains
JM: perhaps not scalable if every tool that uses a spec needs to be updated / have timely feedback come back
NM: likes the idea of concrete types with labels (x,y,z) but flexible enough for different / more general kinds of types, with the responsibility of the renderer to deal with them correctly
Summary (Making it happen)
- CP: tackle in two parts
  - axes type & unit
  - then transformations
- CT: clarified some of the semantics (napari, imaris, 3d Slicer, etc. should open without asking the user too much)
- MM: types are also good for the transformations
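The point that viewers should not break when the conventions are absent could look roughly like the reader below, which treats axis type and unit as optional and falls back to defaults. The field names `name`, `type` and `unit`, the default table and the `"unknown"` fallback are all hypothetical choices for this sketch.

```python
from typing import Any, Dict, List, Union

# Assumed defaults for the five v0.3 axis names; anything else is "unknown".
DEFAULT_TYPES = {"x": "space", "y": "space", "z": "space", "t": "time", "c": "channel"}


def normalize_axes(axes: List[Union[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Accept bare names or dicts and fill in missing type/unit without failing."""
    normalized = []
    for axis in axes:
        if isinstance(axis, str):
            axis = {"name": axis}
        name = axis["name"]  # the name is the only required field in this sketch
        normalized.append(
            {
                "name": name,
                "type": axis.get("type", DEFAULT_TYPES.get(name, "unknown")),
                "unit": axis.get("unit"),
            }
        )
    return normalized


def spatial_axes(axes: List[Union[str, Dict[str, Any]]]) -> List[str]:
    """Names of the axes a viewer would treat as spatial."""
    return [a["name"] for a in normalize_axes(axes) if a["type"] == "space"]


# A mix of v0.3-style names, typed axes, a community-defined type and an
# axis the reader has never heard of.
mixed = [
    "t",
    {"name": "c", "type": "channel"},
    {"name": "angle", "type": "angular", "unit": "degree"},
    {"name": "y", "unit": "micrometer"},
    "x",
    "foo",
]
normalized = normalize_axes(mixed)
spatial = spatial_axes(mixed)
```

Here an unknown type such as `angular` passes through untouched, so a community could extend the list of types while older viewers simply skip what they do not recognise.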

Tools

CP: BDV/Fiji/Bio-Formats, Napari, Vizarr
- Discussing coordinating on the Fiji front to reduce redundant developments
JM: See (and add to!) https://ngff.openmicroscopy.org/latest/#implementations
NS: rudimentary implementation in https://github.com/PolusAI/bfio
- funding for C++ with fastloader from NIST
- fast loading of OME-Zarr
AL: for 3dslicer are interested, but use ITK or something else?
- MM: working on support in ITK + some python packages or C++ route
DT: working on https://github.com/AllenInstitute/volume-viewer/blob/feature/load-zarr/src/VolumeLoader.ts#L131
NS: interested in using the OME-XML metadata (cf. bioformats2raw)
- JM: it's coming. Anyone interested should feel free to get involved.
- NS: agree that JSON would be nice, but would be great to have stability
JB - question for AL/MC : when ITK can open ome-zarr, will downstream tools (besides 3d slicer) be able to use them as well?
MK: Julia implementations
- https://discourse.julialang.org/t/a-julia-compatible-alternative-to-zarr/11842/19
- https://github.com/meggart/ZarrNative.jl
- https://github.com/meggart/Zarr.jl
- See also https://github.com/seung-lab/BigArrays.jl/issues/52
- Josh: these include OME-Zarr?!
  - MK: Just Zarr afaik

comment on performance from JB (chat): "in terms of benchmarking load times, I gave a talk specifically on this topic on how AICSImageIO achieves fast-er times for "non-Zarr" file formats. C++ readers will always be faster but there are things you can do before you "rewrite in C++": https://youtu.be/LNa_gGpSnvc that said, C++ / Rust impls are always faster and I highly encourage them :)"
- MK: I would also be interested in benchmarking writing times on different filesystems / operating systems (Windows in particular)

Sharding

EP: like this approach. Problems moving datasets away. (8 files for many many terabytes)
- Lots of files for acquisition; but then can optimize.
- Like it being more of a Zarr issue.
- DVB: would love to have it but use it judiciously, towards the end of a dataset (ready for consumption when you are ok to take on the complexity)
CP: Matthew from EBI was also quite in favor of saving inodes
EP: also solves the missing file problem.
MK: how do shards map to files?
Josh: fsspec/grib (time permitting)
- OME-TIFF as zarr: https://observablehq.com/@manzt/ome-tiff-as-filesystemreference
- https://github.com/intake/fsspec-reference-maker
- MK: API is the important thing (sharded, unsharded, etc.) should do the access for me.
MK: compression schemes
- "you MUST implement these compressors"
- NH: subset based on performance (speed or compression). could specify minima.
- DVB: think that datatypes and compression shouldn't
- JM: would put that into validation
- CP: good to abstract into zarr
- NH: without the spec, offsetting to implementors
- JM: but to *Zarr implementors (let's re-use)
- TM: would be nice to have a "can i use codec/feature matrix?" for a glance
- DB: compression can get exotic quickly
- TM: all imagecodecs are exported to numcodecs (have a unique key)
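The fsspec reference idea mentioned above, together with MK's point that the API should do the access, can be sketched with a plain mapping from chunk key to `[path, offset, length]`. This is loosely modelled on that style of reference and is not the fsspec API; the function names, keys and file name are invented for the example.

```python
import os
import tempfile
from typing import Dict, List


def write_packed(path: str, chunks: Dict[str, bytes]) -> Dict[str, List]:
    """Concatenate chunks into one file and record where each one landed."""
    refs: Dict[str, List] = {}
    with open(path, "wb") as handle:
        for key, data in chunks.items():
            refs[key] = [path, handle.tell(), len(data)]
            handle.write(data)
    return refs


def read_key(refs: Dict[str, List], key: str) -> bytes:
    """Fetch one chunk by key; the caller never sees how it is stored."""
    path, offset, length = refs[key]
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


chunks = {"0.0": b"first chunk", "0.1": b"second", "1.0": b"third chunk!"}

with tempfile.TemporaryDirectory() as tmp:
    packed_path = os.path.join(tmp, "packed.bin")
    refs = write_packed(packed_path, chunks)
    recovered = {key: read_key(refs, key) for key in chunks}
    packed_size = os.path.getsize(packed_path)
```

The caller asks for `"0.1"` in the same way whether the bytes sit in their own file or inside a packed one, and only the reference table changes.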

Misc

JB: readers for ITK. How long until elastix, etc. can use them? (Painful to convert to Nifti and then convert back)
- MM: depends on the avenue. JB: can use simple-elastix
- MM: very soon or currently in Python. In Java, relatively soon. Have some funding. So, next few months (not the next ITK release. Separate repository package developed independently) Q4 or Q1 2022
  - ITK formats in Python conversion: https://github.com/spatial-image/spatial-image-ngff
  - ITK formats in C++ conversion: https://github.com/InsightSoftwareConsortium/ITKIOZ5 (this repository is getting a reawakening, maybe renamed if we use a different C++ library for the Zarr support)
- JB: updates?
- AL: if new IO is in ITK, then get those in slicer roughly after a month or so (metadata may be difficult; requires extra integration work)
Next time
- Group generally less keen on gather town
- JM: perhaps we test it beforehand? NH: Yes!
MK: Difficulty on Windows
- Josh: talking to one Vendor. (CSHARP)
- Do we need to start having "data generator" calls?
- Benchmarking!!
- Windows with local SSDs (NTFS) perhaps with RAID
- Also Enterprise filesystems and then NFS …
- JM: where do we run this? GitHub Actions? (gigabyte scale)
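As a starting point for the benchmarking discussed above, a very small script could compare writing many chunk-sized files against appending the same bytes to one file. Everything here is an assumption for illustration: the chunk count, the 32^3-byte chunk size and the function names. It also does not force data to disk, so it mostly measures file-creation overhead and the OS cache rather than raw device speed.

```python
import os
import tempfile
import time
from typing import Dict


def time_many_files(directory: str, n_chunks: int, chunk: bytes) -> float:
    """Write one file per chunk and return the elapsed seconds."""
    start = time.perf_counter()
    for i in range(n_chunks):
        with open(os.path.join(directory, f"chunk_{i}"), "wb") as handle:
            handle.write(chunk)
    return time.perf_counter() - start


def time_single_file(path: str, n_chunks: int, chunk: bytes) -> float:
    """Append all chunks to a single file and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "wb") as handle:
        for _ in range(n_chunks):
            handle.write(chunk)
    return time.perf_counter() - start


def run_benchmark(n_chunks: int = 200, chunk_size: int = 32 ** 3) -> Dict[str, float]:
    """Run both variants in a temporary directory and collect the results."""
    chunk = os.urandom(chunk_size)
    with tempfile.TemporaryDirectory() as tmp:
        many_dir = os.path.join(tmp, "many")
        os.mkdir(many_dir)
        many_seconds = time_many_files(many_dir, n_chunks, chunk)
        single_path = os.path.join(tmp, "single.bin")
        single_seconds = time_single_file(single_path, n_chunks, chunk)
        n_files = len(os.listdir(many_dir))
        single_bytes = os.path.getsize(single_path)
    return {
        "many_files_s": many_seconds,
        "single_file_s": single_seconds,
        "n_files": n_files,
        "single_file_bytes": single_bytes,
    }


result = run_benchmark()
print(result)
```

Running the same script on NTFS with local SSDs, on enterprise filesystems and on NFS would give a first, coarse comparison; a serious benchmark would need larger data, repeated runs and explicit flushing.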

Links

Feel free to add links here at the bottom of the document to make referencing things above cleaner. Alphabetical by alias will make it easier to detect conflicts.

