
"Rich OME-Zarr" (ROZ) Brainstorming

Note: "ROZ" isn't intended as a product name, just as shorthand for use throughout the conversation.

Driver

IDR will run out of NFS storage within the next several months. We need to accept OME-Zarr submissions which are at least as complete as OME-TIFF ones so that we can make use of S3 storage.

The current state of support is that one can take a subdirectory from the bioformats2raw output, rename it to include .zarr, and import it into (pre-release) OMEROs using a --depth argument, but the OME-XML metadata is not included. In the case of non-HCS filesets, the relationship between the images is also not included. (Relatedly, mixed HCS / non-HCS datasets are not supported in bioformats2raw.)
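
As an illustration of the workaround described above, the following is a minimal sketch (not the actual IDR tooling) that copies one series subdirectory out of a bioformats2raw output and gives it a `.zarr` suffix. The function name `stage_series_as_zarr` and the `<output>_<series>.zarr` naming are assumptions made for the example. The OMERO import step itself is not shown, since the exact command is not spelled out in these notes.

```python
import shutil
from pathlib import Path


def stage_series_as_zarr(b2r_output: Path, series: str, dest_dir: Path) -> Path:
    """Copy one series subdirectory of a bioformats2raw output to `<name>.zarr`.

    Hypothetical helper: the target naming scheme is an assumption.
    """
    src = b2r_output / series
    if not src.is_dir():
        raise FileNotFoundError(f"no series directory: {src}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{b2r_output.stem}_{series}.zarr"
    # Only the series group is copied. Anything stored next to it at the
    # top level (such as an OME-XML file) is left behind, which is the
    # metadata gap described in the paragraph above.
    shutil.copytree(src, target)
    return target
```

Because each series is staged on its own, nothing in the result records that the staged images came from the same fileset.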

The primary goals of the call are:

  • make sure everyone is clear on the options in PR 104: OME Metadata Support
    (Please read beforehand)
  • choose an initial set of options to start with
  • decide which features are required and in what order they can/must be developed
  • outline testing & deployment scenarios (when are we using what by when)
  • work towards a maintainable roadmap for full ROZ support

Open Questions & Discussion

  • Anything unclear? (Feel free to list questions here before the meeting)
    • Will:
      • I assume that OME-NGFF conversion needs to happen before import to OMERO (we're not considering OMERO 4 but with NGFF instead of /Pixels)? Josh: Yes, conversion external to OMERO. Seb: or potentially after too. The important thing is that the conversion is not expected to be managed by the import process (like the legacy OMERO pyramid generation).
      • If we're talking about a "Transition" format, do we know what we're transitioning to? If we do, does that influence the next steps, or is the priority to solve IDR driver ASAP? Josh: primary transition would be XML > JSON, but yes, the balance is creating baggage we need to support against getting data in ASAP.
      • How much metadata support do we need for IDR? E.g. if we don't support MicrobeamManipulation, does that matter (if there are no IDR images that use it?). I estimate that about half of the studies in IDR have no acquisition metadata, and that support of Objective, Detector, exposure times and excitation/emission wavelengths would be enough to cover > 95% of studies. Josh: for OME-XML that's probably not an issue since all of that infrastructure exists. If we want to discuss non-XML solutions, then that is perhaps something to consider.
      • It seems we have started defining a new format (OME-NGFF) but are we also creating a new data model, or are we trying to stick with the existing OME model? Do we want to support both in OME-NGFF? Josh: I have an idea on making them perhaps orthogonal, e.g. do we try to keep NGFF at the TIFF level and add metadata "inside", but it needs discussion.
      • "Zarr output of bioformats2raw [should be] readable by all readers". What does that mean? E.g. should Vizarr be able to read METADATA.ome.xml and display MicrobeamManipulation etc.? Josh: read OME-XML, yes, since it would be part of the standard. What they do with the details is less of an issue.
      • If OMERO can read METADATA.ome.xml (for IDR), does that mean it is a standard (and everything else has to read it too?). Seems that's asking a lot of the OME-NGFF community in order to support IDR's needs? Josh: yes, the goal is to have a metadata specification for OME-NGFF but it's not just for IDR.
  • NGFF104: OME-XML or JSON? Propose: bioformats2raw.layout: Josh: primarily
  • NGFF104: File or metadata (cf. xarray)? Propose: file
  • NGFF104: Timeline? Propose: before the summer
  • How would others introduce their metadata? (4DN, DICOM, )
    • Propose: Separate file for now.
    • Extensible mechanism is prototyped in ome-zarr-metadata
  • Would we release this as OME-Zarr v1.0? OME-Zarr "transitional"? etc.
    • Do we put other spec changes on hold? (cf. Bogovic)
  • Release mechanism? Propose frequent OMERO.server releases
    • One per milestone?

Roadmap proposal 1

See related drawing


M1: Rich metadata

The first set of tasks focuses on simply making OME-Zarr as rich as OME-TIFF (ROZ & ROT). This mostly entails releases of:

  • OMERO.server
  • OMERO.insight
  • Bio-Formats
  • ome-zarr-py
  • ngff
  • and to a lesser degree, bioformats2raw

M1.1: add bioformats2raw.layout spec

  • [???] ngff PR which specifies the current semantics

M1.2: add metadata spec

// Potential JSON
{
    "@type": "some-container",
    "metadataSources": [
        {
            "@type": "ome-xml",
            "file": {
                "path": "/OME/METADATA.ome.xml"
            },
            // Not required in some cases (like HCS)
            "seriesMapping": [
                "s0", "s1", "s2"
            ]
        }
    ]
}
  • [???] write spec to deprecate bioformats2raw.layout
  • [Melissa] update bioformats2raw
  • [Will] update Python implementations, etc.
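
To make the "Potential JSON" above more concrete, here is a minimal sketch of how a Python reader might consume such a block. It is only an illustration of the proposal, not an agreed specification. The function names are hypothetical, the example is written without the `//` comments so that it is valid JSON, and the order-based interpretation of `seriesMapping` (the n-th entry names the Zarr group for the n-th OME-XML image) is an assumption.

```python
import json

# Comment-free copy of the "Potential JSON" block from the notes.
EXAMPLE = json.loads("""
{
    "@type": "some-container",
    "metadataSources": [
        {
            "@type": "ome-xml",
            "file": {"path": "/OME/METADATA.ome.xml"},
            "seriesMapping": ["s0", "s1", "s2"]
        }
    ]
}
""")


def ome_xml_sources(container):
    """Return (path, series_mapping) for every "ome-xml" metadata source."""
    found = []
    for source in container.get("metadataSources", []):
        if source.get("@type") != "ome-xml":
            continue  # other metadata types would be handled elsewhere
        path = source["file"]["path"]
        # seriesMapping is optional in some cases (like HCS); None = absent.
        found.append((path, source.get("seriesMapping")))
    return found


def series_index_to_group(series_mapping):
    """ASSUMPTION: position n in seriesMapping names the group of image n."""
    return dict(enumerate(series_mapping or []))


if __name__ == "__main__":
    for path, mapping in ome_xml_sources(EXAMPLE):
        print(path, series_index_to_group(mapping))
```

Keeping `metadataSources` as a list is what would let other communities (4DN, DICOM) add their own entries alongside the OME-XML one.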

M1.3: enable usage in OMERO

  • [David] update ZarrReader to support bioformats2raw.layout and the new spec
  • [Seb] enable replacing filesets in place (like pyramids)
  • [???] build ZarrReader-enabled OMEROs
  • [???] deploy in IDR
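
A reader that supports bioformats2raw.layout first has to recognise it. The sketch below shows one way this detection might look in Python, using only the standard library. It assumes that the `bioformats2raw.layout` key is stored in the root group's `.zattrs` file (the Zarr v2 attributes file) and that the OME-XML lives at `OME/METADATA.ome.xml`, as in the path used in the M1.2 example. The helper names are hypothetical and this is not the ZarrReader implementation.

```python
import json
from pathlib import Path

LAYOUT_KEY = "bioformats2raw.layout"


def detect_layout(root: Path):
    """Return the layout value from the root attributes, or None if absent.

    ASSUMPTION: the key is stored in the root group's `.zattrs` file.
    """
    attrs_file = root / ".zattrs"
    if not attrs_file.is_file():
        return None
    attrs = json.loads(attrs_file.read_text(encoding="utf-8"))
    return attrs.get(LAYOUT_KEY)


def has_ome_xml(root: Path) -> bool:
    """Check for the OME-XML file at the assumed OME/METADATA.ome.xml path."""
    return (root / "OME" / "METADATA.ome.xml").is_file()
```

A reader could fall back to treating the directory as a plain OME-Zarr image when `detect_layout` returns `None`.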

M1.4: QA

  • [Petr] define comprehensive tests in spreadsheet!
  • [Petr] choose input OME-TIFFs & PFFs
  • [Petr] test & benchmark
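
One possible building block for the metadata-completeness tests is a coarse comparison of two OME-XML documents, for example the one from an OME-TIFF and the one stored alongside the converted OME-Zarr. The following is a rough sketch of that idea, not the planned test suite. It only counts element names, so it can flag elements that were dropped in conversion but will not notice changed attribute values. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def element_counts(ome_xml: str) -> Counter:
    """Count how often each element name occurs in an OME-XML string."""
    root = ET.fromstring(ome_xml)
    return Counter(local_name(el.tag) for el in root.iter())


def missing_elements(reference_xml: str, candidate_xml: str) -> dict:
    """Elements that occur more often in the reference than in the candidate."""
    ref = element_counts(reference_xml)
    cand = element_counts(candidate_xml)
    # Counter subtraction keeps only positive differences.
    return dict(ref - cand)
```

An empty result means that no element name is less frequent in the candidate than in the reference. It does not prove that the two documents are equivalent.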

M2: Remote Zarrs & Beyond

The second milestone is more concerned with dealing with the storage issues of IDR (which are beyond our control). This includes enabling remote & more performant access, i.e. things that are not strictly necessary to import an OME-Zarr into OMERO. However, it is likely necessary to complete (parts of) M2 in order to avoid downtime of IDR.

M2.1: Archive PFFs to slow S3

  • [???] generate Zarrs for PFFs
  • [???] offload PFFs onto S3
  • [???] enable s3fs access (??)

M2.2: Split low- and hi-res Zarrs

  • [???] add implementation (and possibly spec) for remote scales

M2.3: chunk API

  • [???] enable remote access to chunk API
  • [???] make use of new chunk API

M3: ZarrIDR

Various tasks that might be moved to earlier if deemed necessary.

  • [???] Replace .screens with OME-Zarrs
  • [???] Fix --depth requirement

Notes

Attending: Josh, Seb, Will, Petr, David, Melissa, Erin, Khaled, Jason, Mina, Chris

  • Josh: issue is that current conversion loses some metadata.
    • Some changes required to the format to make it usable
    • Benefits of using XML: libraries exist, schemas exist.
    • Allows people to start working on the drawing immediately.
    • Bigger question of whether we are deprecating it. Maybe not too onerous since we need to support OME-TIFF
  • Defining new format and new model?
    • Thinking of NGFF equivalent to TIFF
    • M1.2 is defining the equivalent to the OME-TIFF rule defining where the metadata lives. Might not be as limited as the single TIFF entrypoint, e.g. it allows dealing with multiple metadata entrypoints (similar to Micro-Manager TIFF vs OME-TIFF)
  • Will: place for new rendering settings?
    • Josh: don't have it in OME-XML. Similar to collections
    • Will: also go beyond 5D?
    • Josh: need to define intermediate milestones
    • Chris: most questions fall into the Transitional phase. Make current structure work as well as OME-TIFF. Definitely want to add rendering settings, collections. But huge gap with dealing with metadata to allow applications to function correctly.
    • Bridging the gap: use existing infrastructure as far as possible; anything else will require coding from scratch
  • Will: teach OMERO to consume bioformats2raw. Just for IDR?
    • Seb: Also any OMERO
    • Josh: also Vitessce waiting for our specification
    • Chris: if we don't do it now, others will implement it for us
    • Seb: IDR is a driver that helps us to release but the use cases are wider
  • Josh: roadmap drawing
    • M1
      • first task is to implement the legacy spec, then work on new metadata block
      • ZarrReader reading bf2raw metadata. Using raw2ometiff implementation as a starting point
      • M1.3: equivalent of OME-TIFF import into OMERO
      • metadata completeness, i.e. compare OME-TIFF/OME-NGFF. No focus on import performance
      • M1.deployment. Assuming iteration on OMEZarrReader, need OMERO.server release. Monthly patch releases
      • Seb: question of reimport as opposed to import (i.e. conversion of PFF into OME-NGFF). Josh: should be testable without NGFF, i.e. using data converted to OME-TIFF
    • M2
      • Chunk API. Could be exposed via blitz API to the clients. Could be an entirely separate epic
      • Extract largest resolution and move into separate storage. Might be a Zarr concern i.e. potentially outside the scope of current work.
      • Chris: top-level metadata becoming URIs rather than paths? Josh: requires server changes
  • Jason: timelines. Is M1 now vs M2 important but separable?
    • Close to getting Zarrs into OMERO
    • Hard bits: specifying it, proving to ourselves it is correct
    • Feels like baggage but people are assuming OME will solve this problem for us
    • Josh: assuming other people (4DN, DICOM) can plug in their own metadata
  • Timelines
    • Release OMERO.server asap
    • Then move towards a demo of importing a ROZ into OMERO
    • Chris: some GS use cases of people interested in importing
    • Discussions daily? Jason: some provision to report on Tuesday
  • Actions
    • Chris: can do bioformats2raw and apply whatever specification is decided
    • ZarrReader PR to detect the layout
    • Metadata discussions