Please paste this into the Zoom chat as new people join:

Welcome to the community call. Please be aware that this session may be recorded. Live notes for the session are available in https://hackmd.io/GZ1euZUSRZeqPTJj9WJEtg Where possible, help to structure the notes for later publication rather than commenting in Zoom's chat. Thanks!

NGFF Community Call 2021-09-02

See: Previous meeting notes, Connection information, and Recordings.

Using this document

This document is a place where you can help drive what needs discussing. Add your thoughts, needs, etc. or even new sections if need be. If there's an idea already in place that you like, give it a thumbs-up.

If you are unclear about this document, just add a question here and someone will tidy it up or get in touch:

  • no problems yet? Excellent!

Brief agenda

Introductions 20m

  • min(60s, 1200s/attendees) per person

Zarr and OME-Zarr status (Josh) <5m

  • NGFF v0.3 with axes (more later)
  • ngff preprint back out (HDF fun)
  • zarr_impl is moving along
  • Zarr EOSS funding & community manager (plus contractors)

Any outstanding community items ~30m

  • Statuses from other efforts
  • Questions about statuses, goals, etc.
  • etc. It's good to hear from you!

Spec development: v0.4 and beyond (Constantin) 20m

  • add support for transforms, see also Trafo discussion
    • Trafo proposal by John and Stephan
    • how do we specify axes involved in trafo?
    • one transform per scale dataset
  • work on axes specification, see also Axes discussion
    • add units as list
    • or alternative proposal for units in Trafo proposal, 3 lists might be equivalent (not as elegant, but compatible with 0.3)
  • Discussions
    • do we allow more than 5d?
    • arbitrary axes names? (related to prev. point)
    • if possible would rather not tackle these points in the very next version
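
As a rough illustration of the items above (axes with a units list, one transform per scale dataset), here is a minimal Python sketch. The key names (`axes`, `units`, `transform`, `scale`) and the overall layout are assumptions made for this example; they are not the agreed v0.4 metadata.

```python
import json

def build_multiscales(axes, units, base_scale, n_levels, factor=2, spatial=("z", "y", "x")):
    """Build hypothetical multiscale metadata with one scale transform per dataset."""
    if len(axes) != len(units) or len(axes) != len(base_scale):
        raise ValueError("axes, units and base_scale must have the same length")
    datasets = []
    for level in range(n_levels):
        # Only spatial axes are downsampled; other axes keep their base spacing.
        scale = [
            s * factor ** level if name in spatial else s
            for name, s in zip(axes, base_scale)
        ]
        datasets.append({"path": str(level), "transform": {"type": "scale", "scale": scale}})
    # Units are stored as a list parallel to the axes list (the "units as list" option).
    return {"axes": list(axes), "units": list(units), "datasets": datasets}

meta = build_multiscales(
    axes=["t", "c", "z", "y", "x"],
    units=["second", None, "micrometer", "micrometer", "micrometer"],
    base_scale=[1.0, 1.0, 0.5, 0.1, 0.1],
    n_levels=3,
)
print(json.dumps(meta, indent=2))
```

Keeping the units in a list parallel to the axes list stays close to a v0.3-style axes list, at the cost of having to keep the lists in sync.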

Sharding (Norman): Sharding Slides 20m

Tools (Constantin): 20m

Next & future steps ~5min


"User registration" Session 1

Name Institute Twitter Handle GitHub Handle
Copy and paste me
Josh Moore University of Dundee notjustmoore joshmoore
Norman Rzepka scalable minds normanrz normanrz
Constantin Pape EMBL Heidelberg @cppape constantinpape
Juan Nunez-Iglesias Monash University @jnuneziglesias jni
Matthew Hartley EMBL-EBI mrmh2
Volker Hilsenstein EMBL Heidelberg VolkerH
Kimberly Meechan EMBL Heidelberg @Sci_Wanderlust K-Meech
Sébastien Besson University of Dundee sbesson
Guillaume Gay Aix Marseille Université morpholg glyg
Gonzalo Merino PIC, Barcelona @pic_es
Rohola Hosseini Leiden University
Jean-Marie Burel University of Dundee jburel
Mark Kittisopikul Janelia Research Campus/ HHMI @markkitti @mkitti
Jean-Karim Heriche EMBL jkh1
Ken Ho Francis Crick Institute DrKenHo DrKenHo-crick
Koji Kyoda RIKEN Center for Biosystems Dynamics Research kkyoda kkyoda
Christian Tischer EMBL Heidelberg tischitischer tischi
Aastha Mathur Euro-BioImaging
Tobias Pietzsch pietzscht tpietzsch

Session 1 Live Notes

Introductions

  • Some key words: spatial transcriptomics / metabolomics; public services; sharding at petabyte scale; standardized formats; list of needs is too long; light sheet; cloud; ending the initial "what format?" conversations; saving direct from microscope; reading vs writing performance; common & scalable is critical; conserving & enriching datasets; putting an end to the nonsense (proliferation of file formats); big data; pyramids & collections.
  • AE: example of ome-ngff for spatial metabolomics; some extension for transformation spec; would be good to check if this is compatible with Trafo proposal
  • KK/KH: BD5(HDF5) hdf5 + xml data format, now developing BDZ(arr) linking(?) with ome-ngff
    • note: BD stands for Biological Dynamics - i.e. ROIs, polygons.

Zarr & OME-Zarr quick status report

  • Comments:

  • GG:

    • ROIs/Meshes: moving back from a funded topic to a community one.
      • KH: Q: Shouldn't ROIs and meshes be separate? I think they are quite different, no?
      • GG: Yes, here is the discussion about meshes meshes issue
    • SMLM: GG, IG,

Transformations

  • JMB: difference between unit name or unit symbol (see issues in OME-XML)
  • CP: they've diverged from v0.3. First step is how to bring them together.
  • JNI: SS made the point that units is of the target space (as axes)
    • v0.3 is agnostic (identity). Will need to be clear in the document. (Axes labels and units are properties of the target space)
  • JK: is time space regularly defined or arbitrary? Currently not arbitrary. Add coordinates?
    • CP: Could be done via transformation on the time dimension
    • Josh: xarray style coordinate arrays could also be used, idea would be that xarray can read the metadata and then one can e.g. query time point @ 4mins. We would need to add specific metadata.
    • Tischi: maybe xarray could be one of the transformation types? (JM: interesting)
  • JM: who would like to be alpha/beta tester?
    • NR: need an implementation in Webknossos first and then can look at transforms
    • CP: will talk about tools also a bit later (unifying BDV & MoBIE, etc.)
    • AE: ready to test any upcoming spec (currently using most minimal Github proposal for affine matrix in private name space)
  • SB: at multiscale zgroup or single resolution zarray level? zarray (datasets)
  • SB: is correlative in scope? (aligning two images)
    • CP: transforms are the way to do it, but not clear on how to specify the set of images (solve that orthogonally; single image first)
  • MK: include shuffling, etc. i.e. compression? No. That's underlying Zarr.
    • CP: this is about transforming the coordinate space
  • AE: An image may have transformations relative to different coordinate systems. E.g. before registrations are finished, a single global coordinate system is not yet known. If not in scope / too complex for spec, we can keep track of temporary transformations externally.
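
The two options raised above for the time axis (a transformation on the time dimension vs. xarray-style coordinate arrays) can be contrasted in a short sketch. This uses only the Python standard library rather than xarray itself; the function names and the sample timestamps are made up for illustration.

```python
from bisect import bisect_left

def index_from_scale(t_seconds, scale, offset=0.0):
    """Regular sampling: invert t = offset + scale * index and round to the nearest index."""
    return round((t_seconds - offset) / scale)

def index_from_coords(t_seconds, coords):
    """Irregular sampling: index of the nearest entry in a sorted coordinate array."""
    pos = bisect_left(coords, t_seconds)
    if pos == 0:
        return 0
    if pos == len(coords):
        return len(coords) - 1
    before, after = coords[pos - 1], coords[pos]
    return pos if after - t_seconds < t_seconds - before else pos - 1

# Hypothetical acquisition times in seconds (irregular spacing).
time_coords = [0.0, 30.0, 65.0, 130.0, 200.0, 250.0, 500.0]

# "Time point @ 4 mins" is t = 240 s in both cases.
print(index_from_scale(240.0, scale=60.0))    # regular 60 s spacing
print(index_from_coords(240.0, time_coords))  # explicit coordinate array
```

The scale/offset form needs no extra array but only covers regular sampling; the coordinate array handles arbitrary time points and would need the additional metadata mentioned above.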

Sharding

  • 32x32x32 chunks (of size 32x32x32) in one file per channel.
  • stored sequentially in z(morton)-order, compressed individually.
    • easier loading of local collections
  • uncompressed writes to chunks; compressed to shards.
  • Discussion
    • CT: tried in object storage? Don't have in WK but in neuroglancer (from google/GCS)
    • KH: to read individual chunks you need to decompress whole shard? No, the header has an index.
    • SB: optimizing shard size? (recreating HDF?) BDV shards on XYZ-image (unit of work). 1GB benchmarked?
      • NR: dependent on the application. works well for WK for writing files on given cluster
      • Bigger chunks would be ok as well, depending. Can be tuned. Power-of-2 restriction.
    • VH: Zarr issue? JM: Yes, but good to have some/all of you involved in Zarr as well.
    • MK: seems useful. Trying to write a position on Windows and having problems with lots of files.
      • Looks familiar to the blosc2 proposal. (chunks & blocks)
      • 32kb per chunk was chosen because of L1-cache size? No, internet speeds at the time. (in Nature Methods, 2017). Might switch to 64^3 now.
      • CP: 64 is good for raw data; for compressible data, 96 or 128 (last year)
    • MK: for uint16? Never did that. Always stayed with 32 for higher-bit data.
    • MH: all data is chunked/sharded the same? Basically always the same.
      • for a few applications (parallel write) then needed smaller shards.
    • TP: good idea, very useful. Main issue is writing. Good thing about N5/Zarr is very flexible. Missing blocks, write in any order. Consider a workflow from uncompressed to compressed shards?
      • NR: that's the typical workflow, yes. (Scheduling management)
      • TP: nice to have it transparently.
      • TP: cool if sharding was completely independent of the dataset, re-sharding. (without re-compression??)
        • NR: z-ordering helps with that. It's just concatenating + index rewriting.
    • MH: somewhere between critical and absolutely essential (for EBI S3), even if not immediate.
  • Additionally
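
To make the sharding discussion above more concrete, here is a minimal sketch of the general idea: chunks compressed individually, stored in z(Morton)-order, and located through an index so that a single chunk can be read without decompressing the whole shard. The in-memory layout, the use of `zlib`, and all function names are assumptions for illustration; this is neither the Webknossos file format nor a Zarr specification.

```python
import zlib

def morton3(x, y, z, bits=5):
    """Interleave the bits of (x, y, z) into a z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def write_shard(chunks, bits=5):
    """Pack individually compressed chunks in Morton order.

    chunks maps (x, y, z) chunk coordinates to raw bytes.
    Returns (index, payload), where index maps Morton code -> (offset, length).
    """
    index, parts, offset = {}, [], 0
    for coord in sorted(chunks, key=lambda c: morton3(*c, bits=bits)):
        blob = zlib.compress(chunks[coord])
        index[morton3(*coord, bits=bits)] = (offset, len(blob))
        parts.append(blob)
        offset += len(blob)
    return index, b"".join(parts)

def read_chunk(index, payload, coord, bits=5):
    """Decompress one chunk using only its index entry, not the whole shard."""
    entry = index.get(morton3(*coord, bits=bits))
    if entry is None:
        return None  # missing chunk
    offset, length = entry
    return zlib.decompress(payload[offset:offset + length])

def concat_shards(shards):
    """Merge shards without recompression: concatenate payloads and shift offsets.

    Assumes the shards hold disjoint, dataset-global Morton codes.
    """
    index, parts, base = {}, [], 0
    for shard_index, payload in shards:
        for code, (offset, length) in shard_index.items():
            index[code] = (base + offset, length)
        parts.append(payload)
        base += len(payload)
    return index, b"".join(parts)

# Two neighbouring 2x2x2 blocks of chunks, each written as its own small shard.
block_a = {(x, y, z): bytes([x, y, z]) * 100 for x in range(2) for y in range(2) for z in range(2)}
block_b = {(x, y, z): bytes([x, y, z]) * 100 for x in range(2, 4) for y in range(2) for z in range(2)}
shard_a = write_shard(block_a)
shard_b = write_shard(block_b)
merged_index, merged_payload = concat_shards([shard_a, shard_b])
print(read_chunk(merged_index, merged_payload, (2, 0, 1)) == block_b[(2, 0, 1)])
```

With `bits=5` the code covers 32 chunks per side, matching the 32x32x32 chunks per file mentioned above. `concat_shards` mirrors the remark that re-sharding can be "just concatenating + index rewriting" when chunks are already in z-order.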

Tools

  • CP: good to have a list. Keep it updated. Strive to have good support:
    • Fiji, Napari, Web,
    • May need some work to consolidate it.
    • e.g. multiple efforts in Fiji: MoBIE, Saalfeld lab tools,
    • plus get it into the normal Fiji distribution
    • Also good to advertise the napari plugin (napari-ome-zarr)
    • Perhaps also regularly show tools at these calls
  • JNI: writing from python with dask arrays? Should work.
  • Tischi: How far are we from Fiji: File > Save As > OME.ZARR and File > Open > OME.ZARR ?
    • I think Kimberly did something for saving?!
    • instructions
      • That's only reading and only into BDV, right?
    • Have writer (need to update image.sc thread). On top of Saalfeld's writer.
  • VH: AE implemented, with dask-downsampling. Sharing on github soon. Pyramids, numpy & dask.
  • MK: incompatibilities, https://github.com/zarr-developers/numcodecs/issues/175
    • JM: worth capturing in zarr_implementations. Also point specification issues e.g. in the Zarr spec
    • JM: progression of involvement issue, PR, fix, spec update (All useful!)
    • MK: multiple tools is a problem. multiple paths for single language (Java -> C) needs testing.
  • JMB: great to have lots of readers & writers (especially on the list)
    • we're trying to test centrally when there are upstream PRs.
  • GG: notion of validating
  • CT: something core OME? SB: OMEZarrReader, but a few specs behind. Need an update site.
    • CP: good to not replicate in the BDV space? Next action? image.sc thread?
    • TP: good idea. missing in fiji is the internal representation for multiscales & collections of images.
    • CT: you get the thumbnails, "pick one". Ok for now.
  • MK: nothing yet on the Julia side (ome-zarr-jl). CP: don't know anyone
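
For the pyramid writing mentioned by VH above, the core downsampling step can be sketched in plain Python. This is not AE's implementation (which uses numpy and dask and had not been shared at the time of the call); it is a toy 2D version using nested lists and 2x2 mean downsampling, with made-up function names.

```python
def downsample2x(image):
    """Halve a 2D image (list of rows) by averaging non-overlapping 2x2 blocks."""
    # Drop an odd trailing row/column so every block is complete.
    h, w = len(image) // 2 * 2, len(image[0]) // 2 * 2
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]

def build_pyramid(image, min_size=1):
    """Return [full resolution, 1/2, 1/4, ...] until a side would drop below min_size."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample2x(levels[-1]))
    return levels

image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, min_size=2)
print([(len(level), len(level[0])) for level in pyramid])
```

Each level would then be written as its own dataset (e.g. `0`, `1`, `2`) of a multiscale image.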

Misc

  • imagesc-island
    • No objections to using it.
    • Probably before the end of the year.
    • Gather Town client.
    • Perhaps focus on collections.
  • Date consideration for early December:

"User registration" Session 2

Name Institute Twitter Handle GitHub Handle
Copy and paste me
Josh Moore University of Dundee notjustmoore joshmoore
John Bogovic HHMI Janelia BogovicJohn bogovicj
Jordao Bragantini CZ Biohub jobragantini jookuma
Davis Bennett HHMI Janelia davisvbennett d-v-b
Melissa Linkert Glencoe Software melissalinkert
Constantin Pape EMBL Heidelberg @cppape constantinpape
Niko Ehrenfeuchter Biozentrum, Uni Basel ehrenfeu
Jackson Maxfield Brown Allen Institute for Cell Science @jmaxfieldbrown JacksonMaxfield
David Gault OME dgaulgaulgaulgaul
Trevor Manz Harvard Medical School @trevmanz manzt
Mark Kittisopikul HHMI Janelia @markkitti mkitti
Kevin Kozlowski Glencoe Software kkoz
Eric Perlman perlman perlman
Andras Lasso PerkLab, Queen's University lassoan lassoan
Nick Schaub NCATS/NIH nicholas-schaub
Dave Mellert The Jackson Laboratory DaveMellert mellertd
Matthew McCormick Kitware thewtex thewtex
Lee Kamentsky MIT

Session 2 Live Notes

Introduction

  • Various keywords: funded positions, writing from microscope, 3D medical imaging (Nifti), publication on the web, fast web preview, The One Format Dream, big datasets on the cloud (without copy & paste), cross compatibility between different language ecosystems, registration and stitching - coordinate systems (opening just one section), BIDS microscopy, access control/permissions, xarray-compatibility, IPFS compatibility, multi-language, "necessary evil", "making our lives easier", bfio, RDM
  • JB: on-the-fly converter for different meta-data flavors

Status / Community

  • JBogo - on different implementations, the test suite that Josh mentioned is key. along with examples that Will has posted. Some kind of language-agnostic tests would be cool in order to know how trustworthy / what features a particular implementation supports
  • NH: putting our weight behind an implementation? Probably more of a litmus test (TCK)
  • JM: standard NetCDF blurb
  • DVB: bioimaging starting to face geoscience issues, we should definitely make use of that.
    • NH: isn't that where zarr came from (basically)
  • NH: move in EM space to unify metadata model?
    • cf. OME/BINA
    • DVB: would love to stop firing from the hip and work from a metadata
    • Josh: vEM call roughly every 2 months (hackathon in December)
      • metadata / formats hasn't (as far as i know) been covered much in the meetings I (bogovic) have been to, should try to get it on the agenda

Transforms

  • CP: v0.3 review, axes metadata, etc.
  • Issue discussion transforms
  • Bogovic and Saalfeld's transform spec proposal
  • NH: LSM treating compression differently along T.
  • DVB: want to have different behavior for spatial dimensions as well
    • want to keep semantics out of storage spec
    • perhaps a community convention
    • storage spec should have no idea
  • CP: want to increase usability
  • DVB: problem of the viewers
  • AL: there are many viewers, important to agree
  • DL: With a completely general/arbitrary axis description, we go back toward plain zarr and readers won’t know what to do
  • DVB: powerful if viewers treat the data as tensors
  • AL: difference between axes labels & types
  • JM & DVB: ok to have convention, but viewers shouldn't break if the conventions aren't present
  • DVB: is type needed if unit is possible?
  • JB: wavelength might clash for a channel dimension
  • DVB: channels don't have an ordering.
  • NS/NH: in mass spec it has meaning
  • (chat) JB: Andras, does 3Dslicer treat channels differently from space dimensions? it must, right?
    • Yes, 3D slicer handles different dimensions very differently.
    • exactly, so if this standard was missing that information, it would be harmful, right?
  • NS: allowing communities (tomography) to build up axes types would be powerful (extensibility)
  • NH: consider adding an angular axes? (EM, medical)
    • CP: Would be a valuable proposal.
  • AL: units? CP: part of this discussion
  • DVB: proposal that a validator should not need to check two fields against each other.
    • JM: a limit may be when it comes to performance
  • AL: NRRD and NIfTI have axis definitions
  • NH: types like linearSpatialType, temporal, spectral,
    • and a really cool browser
  • DVB: in xyztc, C is very special. unit not really nanometer (not a regular grid)
    • neuroglancer has "categorical" dimensions. Don't belong to a space.
    • JB: good for that in general (specifies domain), but wouldn't try to work that into the next version.
      • Could have spherical, toroidal domains
  • JM: perhaps not scalable if every tool that uses a spec needs to be updated / have timely feedback come back
  • NM: likes the idea of concrete types with labels (x,y,z) but flexible enough for different / more general kinds of types, with the responsibility of the renderer to deal with them correctly
  • Summary (Making it happen)
    • CP: tackle in two parts
      • axes type & unit
      • then transformations
    • CT: clarified some of the semantics (napari, imaris, 3d Slicer, etc. should open without asking the user too much)
    • MM: types are also good for the transformations
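
One way to read the discussion above (axes with a label, a type and a unit; conventions are fine, but viewers should not break when they are absent) is the following sketch of a tolerant reader. The field names `name`, `type` and `unit`, the type vocabulary, and the fallback table are all assumptions for this example, not part of any agreed spec.

```python
# Hypothetical fallback used only when an axis carries no explicit type.
DEFAULT_TYPES = {"x": "space", "y": "space", "z": "space", "t": "time", "c": "channel"}

def normalize_axes(axes):
    """Accept plain axis names or dicts, and fill in missing type/unit fields."""
    normalized = []
    for axis in axes:
        if isinstance(axis, str):
            axis = {"name": axis}  # plain string list, as in a v0.3-style axes entry
        name = axis["name"]
        normalized.append({
            "name": name,
            # Unknown names fall back to "unknown" instead of raising an error.
            "type": axis.get("type", DEFAULT_TYPES.get(name, "unknown")),
            "unit": axis.get("unit"),
        })
    return normalized

def spatial_axes(axes):
    """Names of the axes a viewer would treat as spatial dimensions."""
    return [a["name"] for a in normalize_axes(axes) if a["type"] == "space"]

print(spatial_axes(["t", "c", "z", "y", "x"]))
print(normalize_axes([{"name": "theta", "type": "angle", "unit": "degree"}, "y", "x"]))
```

A community-specific type such as the angular axis suggested above passes through unchanged, and a viewer that does not know it can still display the remaining axes.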

Tools

  • comment on performance from JB (chat): "in terms of benchmarking load times, I gave a talk specifically on this topic on how AICSImageIO achieves fast-er times for "non-Zarr" file formats. C++ readers will always be faster but there are things you can do before you "rewrite in C++": https://youtu.be/LNa_gGpSnvc that said, C++ / Rust impls are always faster and I highly encourage them :)"
    • MK: I would also be interested in benchmarking writing times on different filesystems / operating systems (Windows in particular)

Sharding

  • EP: like this approach. Problems moving datasets away. (8 files for many many terabytes)
    • Lots of files for acquisition; but then can optimize.
    • Like it being more of a Zarr issue.
    • DVB: would love to have it but use it judiciously, towards the end of a dataset (ready for consumption, when you are ok to take on the complexity)
  • CP: Matthew from EBI was also quite in favor of saving inodes
  • EP: also solves the missing file problem.
  • MK: how do shards map to files?
  • Josh: fsspec/grib (time permitting)
  • MK: compression schemes
    • "you MUST implement these compressors"
    • NH: subset based on performance (speed or compression). could specify minima.
    • DVB: think that datatypes and compression shouldn't
    • JM: would put that into validation
    • CP: good to abstract into zarr
    • NH: without the spec, offsetting to implementors
    • JM: but to *Zarr implementors (let's re-use)
    • TM: would be nice to have a "can i use codec/feature matrix?" for a glance
    • DB: compression can get exotic quickly
    • TM: all imagecodecs are exported to numcodecs (have a unique key)

Misc

  • JB: readers for ITK. How long until elastix, etc. can use them? (Painful to convert to Nifti and then convert back)
    • MM: depends on the avenue. JB: can use simple-elastix
    • MM: very soon or currently in Python. In Java, relatively soon. Have some funding. So, next few months (not the next ITK release; separate repository, package developed independently): Q4 or Q1 2022
    • JB: updates?
    • AL: if new IO is in ITK, then get those in slicer roughly after a month or so (metadata may be difficult; requires extra integration work)
  • Next time
    • Group generally less keen on gather town
    • JM: perhaps we test it beforehand? NH: Yes!
  • MK: Difficulty on Windows
    • Josh: talking to one Vendor. (CSHARP)
    • Do we need to start having "data generator" calls?
    • Benchmarking!!
    • Windows with local SSDs (NTFS) perhaps with RAID
    • Also Enterprise filesystems and then NFS
    • JM: where do we run this? GitHub Actions? (gigabyte scale)
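
As a starting point for the benchmarking idea above, here is a small sketch that compares writing many small chunk files with writing the same bytes into a single file. The file names, default sizes and function names are arbitrary choices for this example. It does not call `fsync`, so it mostly measures file-creation overhead and the OS cache rather than raw disk throughput; it would need to be run on each target filesystem (NTFS, NFS, etc.) to be informative.

```python
import os
import tempfile
import time

def time_many_files(directory, n_chunks, chunk_bytes):
    """Write n_chunks separate files of chunk_bytes each and return the elapsed seconds."""
    payload = os.urandom(chunk_bytes)
    start = time.perf_counter()
    for i in range(n_chunks):
        with open(os.path.join(directory, f"chunk_{i}"), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start

def time_single_file(directory, n_chunks, chunk_bytes):
    """Write the same number of bytes into one file and return the elapsed seconds."""
    payload = os.urandom(chunk_bytes)
    start = time.perf_counter()
    with open(os.path.join(directory, "shard"), "wb") as f:
        for _ in range(n_chunks):
            f.write(payload)
    return time.perf_counter() - start

def run_benchmark(n_chunks=256, chunk_bytes=32 * 1024):
    """Run both variants in fresh temporary directories."""
    with tempfile.TemporaryDirectory() as many_dir, tempfile.TemporaryDirectory() as single_dir:
        return {
            "many_files_s": time_many_files(many_dir, n_chunks, chunk_bytes),
            "single_file_s": time_single_file(single_dir, n_chunks, chunk_bytes),
        }

if __name__ == "__main__":
    print(run_benchmark())
```

At this size the script is small enough for a CI runner; gigabyte-scale runs would mainly need larger `n_chunks` and a target directory on the filesystem under test instead of the default temporary directory.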

Feel free to add links here at the bottom of the document to make referencing things above cleaner. Alphabetical by alias will make it easier to detect conflicts.
