"Rich OME-Zarr" (ROZ) Brainstorming

tags: NGFF, Agenda, Brainstorming

*Note: "ROZ" isn't intended as a product name, just as a shortcut for use throughout the conversation.*

> **Driver** :point_right: IDR will run out of NFS storage within the next several months. We need to accept OME-Zarr submissions which are at least as complete as OME-TIFF ones so that we can make use of S3 storage.

The current state of support is that one can take a subdirectory from the bioformats2raw output, rename it to include `.zarr` and import it into (pre-release) OMEROs using a `--depth` argument, but the OME-XML metadata is not included. In the case of non-HCS filesets, the relationship between the images is also not included. (Relatedly, mixed HCS / non-HCS datasets are not supported in bioformats2raw.)

The primary goals of the call are:

- make sure everyone is clear on the options in [PR 104: OME Metadata Support](https://github.com/ome/ngff/issues/104) (**Please read beforehand**)
- choose an initial set of options to start with
- decide which features are required and in what order they can/must be developed
- outline testing & deployment scenarios (when are we using what by when)
- work towards a maintainable roadmap for full ROZ support

# Open Questions & Discussion

- Anything unclear? (Feel free to list questions here before the meeting)
  - ...
  - Will:
    - I assume that OME-NGFF conversion needs to happen *before* import to OMERO (we're not considering OMERO 4 but with NGFF instead of /Pixels)?
      Josh: Yes, conversion external to OMERO.
      Seb: or potentially *after* too. The important thing is that the conversion is not expected to be managed by the import process (like the legacy OMERO pyramid generation).
    - If we're talking about a "Transition" format, do we know what we're transitioning to? If we do, does that influence the next steps, or is the priority to solve the IDR driver ASAP?
      Josh: primary transition would be XML --> JSON, but yes, the balance is creating baggage we need to support against getting data in ASAP.
    - How much metadata support do we need for IDR? E.g. if we don't support MicrobeamManipulation, does that matter (if there are no IDR images that use it)? I estimate that about half of the studies in IDR have no acquisition metadata, and that support of Objective, Detector, exposure times and excitation/emission wavelengths would be enough to cover > 95% of studies.
      Josh: for OME-XML that's probably not an issue since all of that infrastructure exists. If we want to discuss non-XML solutions, then that is perhaps something to consider.
    - It seems we have started defining a new format (OME-NGFF) but are we also creating a new data model, or are we trying to stick with the existing OME model? Do we want to support both in OME-NGFF?
      Josh: I have an idea on making them perhaps orthogonal, e.g. do we try to keep NGFF at the TIFF level and add metadata "inside", but this needs discussion.
    - "Zarr output of bioformats2raw [should be] readable by all readers". What does that mean? E.g. should Vizarr be able to read METADATA.ome.xml and display MicrobeamManipulation etc.?
      Josh: read OME-XML, yes, since it would be part of the standard. What they _do_ with the details is less of an issue.
    - If OMERO can read METADATA.ome.xml (for IDR), does that mean it is a standard (and everything else has to read it too)? It seems that's asking a lot of the OME-NGFF community in order to support IDR's needs.
      Josh: yes, the goal is to have a metadata specification for OME-NGFF but it's not just for IDR.
- [NGFF104](https://github.com/ome/ngff/issues/104): OME-XML or JSON? Propose: `bioformats2raw.layout`: Josh: primarily
- [NGFF104](https://github.com/ome/ngff/issues/104): File or metadata (cf. xarray)? Propose: file
- [NGFF104](https://github.com/ome/ngff/issues/104): Timeline? Propose: before the summer
- How would others introduce their metadata?
  (4DN, DICOM, ...)
  - Propose: Separate file for now.
  - Extensible mechanism is prototyped in [ome-zarr-metadata](https://github.com/ome/ome-zarr-metadata)
- Would we release this as OME-Zarr v1.0? OME-Zarr "transitional"? etc.
- Do we put other spec changes on hold? (cf. Bogovic)
- Release mechanism? Propose frequent OMERO.server releases
  - One per milestone?

----

# Roadmap proposal 1

See [related drawing](/j3PiOVT7Qwag2i9Zd30asQ?view) :point_left:

## M1: Rich metadata

The first set of tasks focuses on simply making OME-Zarr as rich as OME-TIFF (ROZ & ROT). This mostly entails releases of:

- OMERO.server
- OMERO.insight
- Bio-Formats
- ome-zarr-py
- ngff
- and to a lesser degree, bioformats2raw

### M1.1: add `bioformats2raw.layout` spec

- [???] ngff PR which specifies the current semantics

### M1.2: add metadata spec

```json
// Potential JSON
{
  "@type": "some-container",
  "metadataSources": [
    {
      "@type": "ome-xml",
      "file": {
        "path": "/OME/METADATA.ome.xml"
      },
      // Not required in some cases (like HCS)
      "seriesMapping": [
        "s0",
        "s1",
        "s2"
      ]
    }
  ]
}
```

- [???] write spec to deprecate `bioformats2raw.layout`
- [Melissa] update bioformats2raw
- [Will] update Python implementations, etc.

### M1.3: enable usage in OMERO

- [David] update ZarrReader to support `bioformats2raw.layout` and the new spec
- [Seb] enable replacing filesets in place (like pyramids)
- [???] build ZarrReader-enabled OMEROs
- [???] deploy in IDR

### M1.4: QA

- [Petr] define comprehensive tests in spreadsheet! :heart:
- [Petr] choose input OME-TIFFs & PFFs
- [Petr] test & benchmark

----

## M2: Remote Zarrs & Beyond

The second milestone is more concerned with dealing with the storage issues of IDR (which are beyond our control). This includes enabling remote & more performant access, i.e. things that are not strictly necessary to import an OME-Zarr into OMERO. However, it is likely necessary to complete (parts of) M2 in order to avoid downtime of IDR.

### M2.1: Archive PFFs to slow S3

- [???]
generate Zarrs for PFFs
- [???] offload PFFs onto S3
- [???] enable s3fs access (??)

### M2.2: Split low- and hi-res Zarrs

- [???] add implementation (and possibly spec) for remote scales

### M2.3: chunk API

- [???] enable remote access to chunk API
- [???] make use of new chunk API

----

## M3: ZarrIDR

Various tasks that might be moved earlier if deemed necessary.

- [???] Replace .screens with OME-Zarrs
- [???] Fix `--depth` requirement

# Notes

Attending: Josh, Seb, Will, Petr, David, Melissa, Erin, Khaled, Jason, Mina, Chris

* Josh: issue is that current conversion loses some metadata.
  * Some changes required to the format to make it usable
  * Benefits of using XML: libraries exist, schemas exist.
  * Allows people to start working on the drawing immediately.
  * Bigger question of whether we are deprecating it. Maybe not too onerous since we need to support OME-TIFF
* Defining new format _and_ new model?
  * Thinking of NGFF as equivalent to TIFF
  * M1.2 is defining the equivalent of the OME-TIFF rule defining where the metadata lives. Might not be as limited as the single TIFF entrypoint, e.g. it allows dealing with multiple metadata entrypoints (similar to Micro-Manager TIFF vs OME-TIFF)
* Will: place for new rendering settings?
  * Josh: don't have it in OME-XML. Similar to collections
* Will: also go beyond 5D?
  * Josh: need to define intermediate milestones
* Chris: most questions fall into the Transitional phase. Make current structure work as well as OME-TIFF. Definitely want to add rendering settings, collections. But there is a huge gap in dealing with metadata to allow applications to function correctly.
  * Bridging the gap: use existing infrastructure where possible; otherwise it will require coding from scratch
* Will: teach OMERO to consume bioformats2raw. Just for IDR?
  * Seb: Also any OMERO
  * Josh: also Vitessce waiting for our specification
  * Chris: if we don't do it now, others will implement it for us
  * Seb: IDR is a driver that helps us to release but the use cases are wider
* Josh: roadmap drawing
  * M1
    * first task is to implement the legacy spec, then work on new metadata block
    * ZarrReader reading bf2raw metadata. Using raw2ometiff implementation as a starting point
    * M1.3: equivalent of OME-TIFF import into OMERO
    * metadata completeness, i.e. compare OME-TIFF/OME-NGFF. Not a focus on import performance
    * M1.deployment. Assuming iteration on OMEZarrReader, need OMERO.server release. Monthly patch releases
    * Seb: question of reimport as opposed to import (i.e. conversion of PFF into OME-NGFF). Josh: should be testable without NGFF, i.e. using data converted to OME-TIFF
  * M2
    * Chunk API. Could be exposed via blitz API to the clients. Could be an entirely separate epic
    * Extract largest resolution and move into separate storage. Might be a Zarr concern, i.e. potentially outside the scope of current work.
    * Chris: top-level metadata becoming URIs rather than paths? Josh: requires server changes
* Jason: timelines. Is M1 now vs M2 important but separable?
  * Close to getting Zarrs into OMERO
  * Hard bits: specifying it, proving to ourselves that it is correct
  * Feels like baggage _but_ people are assuming OME will solve this problem for us
  * Josh: assuming other people (4DN, DICOM) can plug in their own metadata
* Timelines
  * Release OMERO.server asap
  * Then move towards a demo of importing a ROZ into OMERO
  * Chris: some GS use cases of people interested in importing
  * Discussions daily? Jason: some provision to report on Tuesday
* Actions
  * Chris: can do bioformats2raw and apply whatever specification is decided
  * ZarrReader PR to detect the layout
  * Metadata discussions
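
As a starting point for the "detect the layout" action, here is a minimal Python sketch of what detection could look like. It is not the ZarrReader implementation. It assumes a Zarr v2 directory on a local filesystem with a top-level `.zattrs` file, uses the `bioformats2raw.layout` attribute name and the `OME/METADATA.ome.xml` path mentioned in these notes, and the function names `detect_layout` and `find_ome_xml` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Attribute name and metadata path as they appear in these notes.
LAYOUT_KEY = "bioformats2raw.layout"
OME_XML_PATH = "OME/METADATA.ome.xml"


def detect_layout(root: Path) -> Optional[int]:
    """Return the layout value from the top-level .zattrs, or None.

    Assumption: Zarr v2 on a local filesystem, so group attributes
    live in a JSON file called ".zattrs" directly under `root`.
    """
    zattrs = root / ".zattrs"
    if not zattrs.is_file():
        return None
    attrs = json.loads(zattrs.read_text())
    return attrs.get(LAYOUT_KEY)


def find_ome_xml(root: Path) -> Optional[Path]:
    """Return the path to the OME-XML file if it exists, else None."""
    candidate = root / OME_XML_PATH
    return candidate if candidate.is_file() else None
```

A reader could call `detect_layout` first and only look for the OME-XML file when a layout value is present, falling back to plain multiscales handling otherwise.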