---
tags: NGFF, Agenda, Brainstorming
---
# "Rich OME-Zarr" (ROZ) Brainstorming
<small><i>
*Note: "ROZ" isn't intended as a product name, just as a shorthand for use throughout the conversation.*
</i></small>
> **Driver** :point_right: IDR will run out of NFS storage within the next several months. We need to accept OME-Zarr submissions which are at least as complete as OME-TIFF ones so that we can make use of S3 storage.
Currently, one can take a subdirectory from the bioformats2raw output, rename it to include `.zarr`, and import it into (pre-release) OMEROs using a `--depth` argument, but the OME-XML metadata is not included. In the case of non-HCS filesets, the relationship between the images is also not included. (Relatedly, mixed HCS / non-HCS datasets are not supported in bioformats2raw.)
The primary goals of the call are:
- make sure everyone is clear on the options in [PR 104: OME Metadata Support](https://github.com/ome/ngff/issues/104)<br/>(**Please read beforehand**)
- choose an initial set of options to start with
- decide which features are required and in what order they can/must be developed
- outline testing & deployment scenarios (when are we using what by when)
- work towards a maintainable roadmap for full ROZ support
# Open Questions & Discussion
- Anything unclear? (Feel free to list questions here before the meeting)
- ...
- Will:
    - I assume that OME-NGFF conversion needs to happen *before* import to OMERO (we're not considering OMERO 4 but with NGFF instead of /Pixels)? Josh: Yes, conversion external to OMERO. Seb: or potentially *after* too. The important thing is that the conversion is not expected to be managed by the import process (like the legacy OMERO pyramid generation).
    - If we're talking about a "Transition" format, do we know what we're transitioning to? If we do, does that influence the next steps, or is the priority to solve the IDR driver ASAP? Josh: primary transition would be XML --> JSON, but yes, the balance is creating baggage we need to support against getting data in ASAP.
    - How much metadata support do we need for IDR? E.g. if we don't support MicrobeamManipulation, does that matter (if there are no IDR images that use it)? I estimate that about half of the studies in IDR have no acquisition metadata, and that support of Objective, Detector, exposure times and excitation/emission wavelengths would be enough to cover > 95% of studies. Josh: for OME-XML that's probably not an issue since all of that infrastructure exists. If we want to discuss non-XML solutions, then that is perhaps something to consider.
    - It seems we have started defining a new format (OME-NGFF) but are we also creating a new data model, or are we trying to stick with the existing OME model? Do we want to support both in OME-NGFF? Josh: I have an idea on making them perhaps orthogonal, e.g. do we try to keep NGFF at the TIFF level and add metadata "inside", but this needs discussion.
    - "Zarr output of bioformats2raw [should be] readable by all readers". What does that mean? E.g. should Vizarr be able to read METADATA.ome.xml and display MicrobeamManipulation etc.? Josh: read OME-XML, yes, since it would be part of the standard. What they _do_ with the details is less of an issue.
    - If OMERO can read METADATA.ome.xml (for IDR), does that mean it is a standard (and everything else has to read it too)? It seems that's asking a lot of the OME-NGFF community in order to support IDR's needs. Josh: yes, the goal is to have a metadata specification for OME-NGFF but it's not just for IDR.
- [NGFF104](https://github.com/ome/ngff/issues/104): OME-XML or JSON? Propose: `bioformats2raw.layout`: Josh: primarily
- [NGFF104](https://github.com/ome/ngff/issues/104): File or metadata (cf. xarray)? Propose: file
- [NGFF104](https://github.com/ome/ngff/issues/104): Timeline? Propose: before the summer
- How would others introduce their metadata? (4DN, DICOM, ...)
    - Propose: Separate file for now.
    - Extensible mechanism is prototyped in [ome-zarr-metadata](https://github.com/ome/ome-zarr-metadata)
- Would we release this as OME-Zarr v1.0? OME-Zarr "transitional"? etc.
- Do we put other spec changes on hold? (cf. Bogovic)
- Release mechanism? Propose frequent OMERO.server releases
    - One per milestone?
----
# Roadmap proposal 1
See [related drawing](/j3PiOVT7Qwag2i9Zd30asQ?view) :point_left:
## M1: Rich metadata
The first set of tasks focuses on making OME-Zarr as rich as OME-TIFF (ROZ & ROT). This mostly entails releases of:
- OMERO.server
- OMERO.insight
- Bio-Formats
- ome-zarr-py
- ngff
- and to a lesser degree, bioformats2raw
### M1.1: add `bioformats2raw.layout` spec
- [???] ngff PR which specifies the current semantics
### M1.2: add metadata spec
```json
// Potential JSON
{
  "@type": "some-container",
  "metadataSources": [
    {
      "@type": "ome-xml",
      "file": {
        "path": "/OME/METADATA.ome.xml"
      },
      // Not required in some cases (like HCS)
      "seriesMapping": [
        "s0", "s1", "s2"
      ]
    }
  ]
}
```
- [???] write spec to deprecate `bioformats2raw.layout`
- [Melissa] update bioformats2raw
- [Will] update Python implementations, etc.
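
To make the M1.2 draft more concrete, here is a minimal Python sketch of how a reader might consume the "Potential JSON" block above. It is illustrative only: the field names (`metadataSources`, `file.path`, `seriesMapping`) come from the draft and are not part of any released spec, the helper `resolve_sources` is hypothetical, and treating the leading `/` as "relative to the Zarr root" is an assumption.

```python
import json
from pathlib import PurePosixPath

# Hypothetical instance of the draft block; not a released specification.
PROPOSED = """
{
  "@type": "some-container",
  "metadataSources": [
    {
      "@type": "ome-xml",
      "file": {"path": "/OME/METADATA.ome.xml"},
      "seriesMapping": ["s0", "s1", "s2"]
    }
  ]
}
"""


def resolve_sources(block, root):
    """Return (xml_path, series_paths) for each OME-XML metadata source."""
    root = PurePosixPath(root)
    resolved = []
    for source in block.get("metadataSources", []):
        if source.get("@type") != "ome-xml":
            continue  # other source types are ignored in this sketch
        # Assumption: a leading "/" means "relative to the Zarr root".
        xml_path = root / source["file"]["path"].lstrip("/")
        # seriesMapping is optional in the draft (e.g. for HCS).
        series = [str(root / name) for name in source.get("seriesMapping", [])]
        resolved.append((str(xml_path), series))
    return resolved


if __name__ == "__main__":
    for xml_path, series in resolve_sources(json.loads(PROPOSED), "image.zarr"):
        print(xml_path, series)
```

Skipping unknown `@type` values rather than failing is one possible way to leave room for other metadata sources, but that choice is open for discussion.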
### M1.3: enable usage in OMERO
- [David] update ZarrReader to support `bioformats2raw.layout` and the new spec
- [Seb] enable replacing filesets in place (like pyramids)
- [???] build ZarrReader-enabled OMEROs
- [???] deploy in IDR
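
As a rough illustration of the layout detection step, the following Python sketch checks a Zarr root for the `bioformats2raw.layout` attribute. It is not the ZarrReader (Java) implementation. It assumes the attribute lives in the root group's `.zattrs` file (Zarr v2 convention); the function name `detect_layout` and the value `3` used in the demo are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path

LAYOUT_KEY = "bioformats2raw.layout"


def detect_layout(zarr_root):
    """Return the layout value from the root .zattrs, or None if absent."""
    attrs_file = Path(zarr_root) / ".zattrs"
    if not attrs_file.is_file():
        return None
    with attrs_file.open() as handle:
        attrs = json.load(handle)
    return attrs.get(LAYOUT_KEY)


if __name__ == "__main__":
    # Build a throwaway directory that mimics a converted fileset.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "demo.zarr"
        root.mkdir()
        (root / ".zattrs").write_text(json.dumps({LAYOUT_KEY: 3}))
        print(detect_layout(root))
```

A reader could use a `None` result to fall back to treating the directory as a plain OME-Zarr image.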
### M1.4: QA
- [Petr] define comprehensive tests in spreadsheet! :heart:
- [Petr] choose input OME-TIFFs & PFFs
- [Petr] test & benchmark
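
One possible starting point for the metadata completeness tests is to compare the OME-XML from an OME-TIFF against the OME-XML stored alongside the converted Zarr. The sketch below counts elements by tag name and reports what the candidate is missing. The function names are hypothetical, and the two XML snippets are simplified stand-ins (not schema-valid OME-XML) used only to exercise the comparison.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Simplified stand-ins for real OME-XML documents (not schema-valid).
REFERENCE = """
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Instrument ID="Instrument:0"><Objective ID="Objective:0:0"/></Instrument>
  <Image ID="Image:0"><Pixels ID="Pixels:0"/></Image>
</OME>
"""
CANDIDATE = """
<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0"><Pixels ID="Pixels:0"/></Image>
</OME>
"""


def element_counts(ome_xml):
    """Count elements by local tag name, ignoring the XML namespace."""
    root = ET.fromstring(ome_xml)
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


def missing_elements(reference_xml, candidate_xml):
    """Elements that occur more often in the reference than in the candidate."""
    # Counter subtraction keeps only positive differences.
    return dict(element_counts(reference_xml) - element_counts(candidate_xml))


if __name__ == "__main__":
    print(missing_elements(REFERENCE, CANDIDATE))
```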
----
## M2: Remote Zarrs & Beyond
The second milestone is more concerned with dealing with the storage issues of IDR (which are beyond our control). This includes enabling remote & more performant access, i.e. things that are not strictly necessary to import an OME-Zarr into OMERO. However, it is likely necessary to complete (parts of) M2 in order to avoid downtime of IDR.
### M2.1: Archive PFFs to slow S3
- [???] generate Zarrs for PFFs
- [???] offload PFFs onto S3
- [???] enable s3fs access (??)
### M2.2: Split low- and hi-res Zarrs
- [???] add implementation (and possibly spec) for remote scales
### M2.3: chunk API
- [???] enable remote access to chunk API
- [???] make use of new chunk API
----
## M3: ZarrIDR
Various tasks that might be moved earlier if deemed necessary.
- [???] Replace .screens with OME-Zarrs
- [???] Fix `--depth` requirement
# Notes
Attending: Josh, Seb, Will, Petr, David, Melissa, Erin, Khaled, Jason, Mina, Chris
* Josh: issue is that current conversion loses some metadata.
* Some changes required to the format to make it usable
* Benefits of using XML: libraries exist, schemas exist.
* Allows people to start working on the drawing immediately.
* Bigger question of whether we are deprecating it. Maybe not too onerous since we need to support OME-TIFF
* Defining new format _and_ new model?
* Thinking of NGFF equivalent to TIFF
* M1.2 is defining the equivalent to the OME-TIFF rule defining where the metadata lives. Might not be as limited as the single TIFF entrypoint, e.g. it allows dealing with multiple metadata entrypoints (similar to Micro-Manager TIFF vs OME-TIFF)
* Will: place for new rendering settings?
* Josh: don't have it in OME-XML. Similar to collections
* Will: also go beyond 5D?
* Josh: need to define intermediate milestones
* Chris: most questions fall into the Transitional phase. Make current structure work as well as OME-TIFF. Definitely want to add rendering settings, collections. But there is a huge gap in dealing with metadata to allow applications to function correctly.
* Bridging the gap. Use existing infrastructure as far as possible; anything else will require coding from scratch
* Will: teach OMERO to consume bioformats2raw. Just for IDR?
* Seb: Also any OMERO
* Josh: also Vitessce waiting for our specification
* Chris: if we don't do it now, others will implement it for us
* Seb: IDR is a driver that helps us to release but the use cases are wider
* Josh: roadmap drawing
* M1
* first task is to implement the legacy spec, then work on new metadata block
* ZarrReader reading bf2raw metadata. Using raw2ometiff implementation as starting point
* M1.3: equivalent of OME-TIFF import into OMERO
* metadata completeness i.e. compare OME-TIFF/OME-NGFF. Not focus on import performance
* M1.deployment. Assuming iteration on OMEZarrReader, need OMERO.server release. Monthly patch releases
* Seb: question of reimport as opposed to import (i.e. conversion of PFF into OME-NGFF). Josh: should be testable without NGFF i.e. using data converted to OME-TIFF
* M2
* Chunk API. Could be exposed via blitz API to the clients. Could be entire separate epic
* Extract largest resolution and move into separate storage. Might be a Zarr concern i.e. potentially outside the scope of current work.
* Chris: top-level metadata becoming URIs rather than paths? Josh: requires server changes
* Jason: timelines. Is M1 now vs M2 important but separable?
* Close to getting Zarrs into OMERO
* Hard bits: specifying it, proving to ourselves it is correct
* Feels like baggage _but_ people assuming OME will solve this problem for us
* Josh: assuming other people (4DN, DICOM) can plug in their own metadata
* Timelines
* Release OMERO.server asap
* Then move towards a demo of importing a ROZ into OMERO
* Chris: some GS use cases of people interested in importing
* Discussions daily? Jason: some provision to report on Tuesday
* Actions
* Chris: can do bioformats2raw and apply whatever specification is decided
* ZarrReader PR to detect the layout
* Metadata discussions