---
tags: NGFF, community-call
---

# OME-NGFF community call: 2023-03-15

**See:** [Previous call](https://forum.image.sc/t/ome-ngff-community-call-transforms-and-tables/71792), [image.sc thread](https://forum.image.sc/t/community-call-metadata-in-ome-ngff/77570/10), **Recordings (TBD)**

**Code of conduct**

The OME community is open to everybody and built upon mutual respect. Please take the time to review the [code of conduct](https://github.com/ome/.github/blob/master/CODE_OF_CONDUCT.md).

Please paste this into the Zoom chat as new people join:

:::warning
Welcome to the community call. Live notes for the session are available in https://hackmd.io/BqnK9Wm4QpGYAhYOoaFBQQ

Where possible, help to structure the notes for later publication rather than commenting in Zoom's chat. Thanks!
:::

## Re-usable Agenda (not for notes)

1. Recap
    - Welcome & general NGFF/Zarr status (Josh 10m)
    - Update on tables and transforms specs (v0.5) (Kevin/John 10m)
2. Brief introduction to metadata (Wouter-Michiel ~ 5m)
    - Phases of metadata generation
    - Any additional phases of metadata generation?
3. Metadata in ome-ngff (Wouter-Michiel ~ 15m)
    - Examples of metadata standards
    - Choices regarding extent of standardization
    - Possible design choices
    - Tooling
4. Pitch metadata needs (~ 20m)
    - Tooling
5. Open discussion (~ 50m)
    - Opinions on standardization
    - Opinions on design choices
    - Implementation in OME-NGFF
6. Summary and providing outlook (~ 10m)
    - Organizing offline work
    - Next meeting

## OME-NGFF community call (18:00 CET; 15 Mar 2023)

### "User registration" Session 2

| Name | Institute | Twitter Handle | GitHub Handle | Mastodon Handle |
|------------ |---------------------- |---------------- |--------------- | --------------- |
| Copy | and | paste | me | |
| Josh Moore | German BioImaging | notjustmoore | joshmoore | @joshmoore@fediscience.org |
| Wouter-Michiel Vierdag | EMBL | WVierdag | melonora | |
| Damir Sudar | Quantitative Imaging Systems | | dsudar | @damirs@fosstodon.org |
| Koji Kyoda | RIKEN BDR | | | |
| Aybuke Kupcu Yoldas | EMBL-EBI | | AybukeKY | |
| John Bogovic | HHMI Janelia | BogovicJohn | bogovicj | |
| Norman Rzepka | scalable minds | normanrz | normanrz | @normanrz@mastodon.social |
| Ken Ho | Crick | DrKenHo | DrKenHo-crick | @kenho@fediscience.org |
| Dave Mellert | The Jackson Laboratory | | mellertd | |
| Thomas Pengo | U of Minnesota | stk_tp | tp81 | |
| Kiya Govek | The Jackson Laboratory | | govekk | |
| Andy Sweet | CZI | | andy-sweet | |
| Will Moore | OME | | will-moore | |
| Davis Bennett | HHMI / Janelia | | d-v-b | |

### Notes

- Round of introductions
- Josh M: link to info on zarr v3
    - and implementations. This community will be responsible for the java impl
    - JMS and NR have more information
- WMK: presentation of metadata
- CSDC on linked data:
    - QUAREP on board (not there now, but moving toward)
    - QUAREP is about "image acquisition", REMBI describes that and other "topics"
    - prefers one standard to different standards "picking and choosing"
    - manufacturers should contribute
- DB on different standards:
    - that several exist suggests there is disagreement
    - doesn't see coming up with A standard as "our" (ngff's) job
    - what "we" can do is describe how to put (whatever metadata) in an OME-NGFF
    - should not be our job to make new metadata
- Z: puts metadata from an ome-tiff into a container (agrees w/ Davis)
- AKY: given, this is my metadata schema and how to validate it
- MM: proposes intermediate between standardizing and completely open
    - find things that are common and define those things
    - and ability to extend metadata as needed
- WMV: meta-metadata
    - if you provide "non-standard" metadata, what format should it have?
    - enables people to store data before it is officially added
- VS: above detailed discussion feels remote from direct users
    - would like more meetings rather than 1-2 per year
- JM: please continue those questions
    - this call: are there blockers / worries about how to do the storage?
    - first get on the same page, then we
- JM: summary of consensus
    - what is minimal? who defines minimal?
    - ngff should enable users to express what they want
    - can currently attach ome-xml to an ngff dataset
    - proposal 1: do more of that. Attach whatever meta-spec you want (keeps things simpler in ngff)
        - then the necessary job is to describe how to do that
    - consensus for link-ML in the morning session
    - there remains space for describing how to write down the "graph of metadata models"
    - a practical step
        - generate a collection of consensus models that we recommend (as MM suggested)
- CSDC: doesn't think ngff metadata is the place where we create "the standard"
    - rather have a map of different topics; communities of interest can build consensus when possible (ngff doesn't do that work directly)
    - agrees that it isn't ngff's job to create a standard
    - creating a minimal set is not trivial and requires consensus
    - most users don't care about schemas and would like guidance re: what to record about their experiments
- NH: would guess that people in communities could come to consensus about the overlapping parts of ontologies
    - ngff could choose among "some" models
    - ngff could attempt to encourage consensus building
    - schema validation tools provided by communities
    - readers and writers know what standard(s) they work with and
- DB: agrees with NH
    - goes further
    - onus is on the metadata standard writers
    - all we need to say in ngff is "don't use reserved keywords"
    - up to communities to "go crazy"
- NH: push back a little on "go nuts"
    - DB: "go nuts" means "feel unconstrained"
    - :D JB
- KH: manufacturers already have a
    - doesn't want ngff to be the lowest common denominator
    - so there is value in deciding what the minimum is
- AKY: we get all kinds of stuff in the BioImage Archive
    - without some useful metadata, images are practically useless
    - binding tools for the community are needed
    - general schemas are great for CS people, but the above constraints are important for data users and consumers
- NR: the archives are in a good position to decide a min metadata standard
    - ngff is not the place where we mandate a metadata standard
    - but mandating standards about tech specifications (e.g. linkML, schemas, whatever)
- NH: need tools (UIs) to make schemas / specs accessible for users
- CSDC: format should agree on how we store metadata, not what metadata we store
    - useful to have some recommended communities / standards ("if you use a microscope, look at X,Y")
- JM: three groups
    - (1) just want this to be solved (no tech. opinions)
    - (2) people who want to experiment with the link-ml proposal
    - (3) not interested in (2): then what is the list of technical platforms you want to use for validation?
- NH: is this meeting about ngff declaring metadata, or using other metadata formats?
- JM: would like to go beyond xml
    - desire to move toward linkML
    - this meeting as "last chance" for folks to suggest alternatives
    - and search for those interested in experimenting w/ linkML
    - but will it have a logo??
- MM: for folks building tools to interpret metadata
    - what is equivalent semantic meaning between standards
    - for tools that need to analyze quantitatively with those
- AS: question about transitional metadata
    - specifically omero metadata
    - there is useful information there now
    - what is the plan for it?
    - would link-ml be helpful during the transition?
- JM: [issue 78](https://github.com/ome/ngff/issues/78) is the closest someone's come
    - more hands and eyes would be helpful
    - not sure if link-ml would make it better
    - for things more than "this is an image": does it belong next to the image, or linked to elsewhere and fetched / loaded by the client?
        - e.g. rendering settings
- DB: visualization data should be stored somewhere, but not next to the data
    - isn't nice when e.g. showing multiple volumes with conflicting rendering settings
- WMV: generally agrees, but depends on the context
- DB: store it, but store it somewhere else
    - another example
        - initial position of the viewer
- NR: view settings
    - reference to images / volumes with view + render settings
    - there has been a collection proposal; that could be a place to store rendering settings
- TP: have a 1 -> many relationship of image to viewer
- WMB: agree, for large images need regions of interest
    - 1 -> many relationship fits well with that
- WMB: try to prototype with link-ml
    - storage in custom files
- NR: metadata at different levels
    - some needs attaching to an image, some to a group of images, ... inheritance.
- JM: synchronization and writing of attributes
    - optimize metadata for some purpose (archiving / readability)
- DB: consolidated metadata analogous to sharding of chunks
- NR: zarr side doesn't care about semantics of metadata
    - ngff's job is to explain how group-level metadata relates to its sub-groups / arrays. (Perhaps on a per-standard basis ⚠️)
    - details tricky, but defining inheritance could be useful generally
- WMB: how to get vendors on board?
- DB: has this historically worked?
- CSDC: QUAREP has been a successful effort
    - taken hardware specs (revision of ome model with additions)
    - talk to camera manufacturers, objective manufacturers, etc.
    - it has worked; the challenge is sustainability (readers/writers/tools)
- NR: one image can have metadata fields linked to different schemas
    - important for composition
    - vendor X schema for scope, ontology Y schema to describe sample ...
- VS: thinking this meeting would be about the spec
    - specific issues and PRs on GitHub
    - would be interested in that meeting
- JM: options
    - (1) can plan a next community meeting
    - (2) on image.sc or GitHub can do a spontaneous meeting
    - (3) probably need all of the above
    - (4) need to get to the point where there are regular meetings
- VS: should have meetings about extra "hot" issues and PRs
    - could help organize
- WMB: +1 to the above
- NR: would like an overview of the initiatives
    - what exists and what state is it in:
        - metadata
        - tables
        - transformations
- JM: https://github.com/ome/ngff/pull/177

### zoom chat notes

- DT: We want some info right there locally in the zarr directory, and for other data it is totally fine for it to be a URL to something potentially remote. There is the archival aspect and the “I just need to open the image and make sense of it”. We also have computed images where there is no microscope but some processing provenance and maybe parent images etc.
- DM: one other thing to think about: at JAX we think a lot about Systems of Record and who should own the metadata about a thing. E.g., I wouldn't ever pack sample metadata into an image file if the sample is captured in another database
- NH: I would also advocate for moving away from xml to a more modern web format, i.e. json/yaml

----

<details><summary><H1>MORNING SESSION</H1></summary>

## OME-NGFF community call (11:00 CET; 15 Mar 2023)

### "User registration" Session 1

| Name | Institute | Twitter Handle | GitHub Handle | Mastodon Handle |
|------------ |---------------------- |---------------- |--------------- | --------------- |
| Copy | and | paste | me | ... |
| Adam Taylor | [Sage Bionetworks](https://sagebionetworks.org/) | [adamjtaylor](https://twitter.com/adamjtaylor) | [adamjtaylor](https://github.com/adamjtaylor) | |
| Alex Henderson | University of Manchester | [AlexHenderson00](https://twitter.com/AlexHenderson00) | [AlexHenderson](https://github.com/AlexHenderson) | [@alexhenderson@fosstodon.org](https://fosstodon.org/@alexhenderson) |
| Andreas Eisenbarth | EMBL Heidelberg | ... | aeisenbarth | ... |
| Aybuke Kupcu Yoldas | EMBL-EBI | | AybukeKY | |
| Benjamin Rombaut | VIB/UGent | @berombau | berombau | ... |
| Bishoy Wadie | EMBL Heidelberg | ... | Bisho2122 | ... |
| David Gault | OME | ... | dgault | ... |
| Eric Perlman | | | perlman | @perlman@urbanists.social |
| Guillaume Maucort | Bordeaux Imaging Center-FBI | FluoGui | | |
| Jean-Karim Heriche | EMBL | | jkh1 | |
| Jean-Marie Burel | OME (Dundee) | ... | jburel | ... |
| Joel Lüthi | Friedrich Miescher Institute | joel_luethi | jluethi | @joel_luethi@mstdn.social |
| Josh Moore | German BioImaging | notjustmoore | joshmoore | @joshmoore@fediscience.org |
| Ken Ho | Crick | DrKenHo | DrKenHo-crick | @kenho@fediscience.org |
| Kevin Yamauchi | ETH Zurich | ky396 | kevinyamauchi | ... |
| Martin Schorb | EMBL Heidelberg | ... | martinschorb | ... |
| Matthew Hartley | EMBL-EBI | BioImageA | mrmh2 | |
| Norman Rzepka | scalable minds | normanrz | normanrz | @normanrz@mastodon.social |
| Susanne Kunis | University Osnabrueck, CellNanOs, NFDI4BIOIMAGE | | sukunis | |
| Christian Schmidt | DKFZ Heidelberg, NFDI4BIOIMAGE | SchmChristian | SchmChris | |
| Tatiana Woller | VIB/KU Leuven | ... | ... | ... |
| Wouter-Michiel Vierdag | EMBL | WVierdag | melonora | |

----

### Notes

- Round of introductions
    - Some keywords: parameters, re-use, history of processing, FAIR, QC, couple of "OMERO"s, data management systems, BIA, _...feel free to add more..._
- upcoming zarr meetings: https://zarr.dev/zeps/meetings/
- v0.5 for NGFF (tables and transforms specs)
    - specs are nearing final call for comments. We should merge soon so that implementations can start and we can test/revise.
    - there are some outstanding comments on the PRs which indicate we will likely have to revisit the spec after we have tested "in the wild". In general, the NGFF specs are still at a stage where revisions after the initial merge will be necessary.
    - specs are already being used by several communities
    - we will have to see how this goes and potentially revise/iterate
- question (Sebastien Besson): when will NGFF transition to zarr v3?
    - Josh M.: timeline unclear
        - however, it can be accelerated if community members are able to do the implementation work. It seems unlikely that v3 will make big changes to NGFF metadata
    - Norman R.: agree timeline unclear, but zarr v3 unlikely to have huge implications for NGFF metadata. Can help with transitioning NGFF.
- Metadata presentation by Wouter-Michiel V.
    - goal for today: get feedback from community about metadata requirements
    - categories of metadata:
        - pre-acquisition
        - acquisition
        - analysis and viewing
    - additional categories
        - MH: meta-metadata. Things added post-publication
        - TW (chat): image analysis, https://arxiv.org/ftp/arxiv/papers/2302/2302.07005.pdf
    - linkml models
        - https://www.ghga.de/resources/metadata-model
        - https://github.com/ncihtan/data-models/
- comments:
    - JNI: lots of required fields in specs mentioned. I think we should aim to have the smallest subset of required fields possible to reduce burden. How are you thinking about deciding on required metadata?
        - W-MV: perhaps we have a parent minimum spec in NGFF and then other specs build on this
        - AT: +1 for finding the minimum subset
        - AT: also think about the interoperability with metadata living elsewhere
        - JNI summarising his own point: I think OME-NGFF should say: "**if** metadata field X is present, it should look like so." It should not mandate metadata beyond (as Alex H put it) information about how to interpret the binary payload as an array of numbers.
    - AT: using LinkML internally for a number of our data models at Sage (but not HTAN, which is currently in JSON-LD) and expect to expand our use of it. (https://github.com/ncihtan/data-models/)
        - Josh: see also https://www.ghga.de/resources/metadata-model
    - RH: Do we want to keep linked metadata with the file or somewhere else?
    - RH: ontology is more linkable
    - J-MB: copying sample prep protocol into image metadata would be a large burden on the data generator, so it would be nice if protocols could be linked to other resources (e.g., protocols.io)
    - J-KH: need to move to more constrained metadata
    - AH: RDFShapes? Rather than having "one standard", keep it re-usable / flexible / comparable
        - AH: RDF Data Shapes (SHACL/ShEx) allow for validation against anything you want, for example standard XYZ, or a given use case
        - AH: Someone mentioned RO-Crate [https://www.researchobject.org/ro-crate/](https://www.researchobject.org/ro-crate/), the only currently available implementation of a FAIR Digital Object (FDO), according to the FDO Forum.
    - AKY: we have REMBI to guide minimum metadata. We must have at least enough information to know what the image is
    - RH: sample > channel metadata
    - WMV: metadata loaded on the first GET of an image (latency, performance)
    - KH: worried about the practicality of separating things
    - AE: fine-grained storage can lead to data duplication, whereas when joining datasets with top-level metadata, you have to assign it to the respective data items (duplication)
    - WMV: general topic of modularization (see Zoom chat)
    - Alex: vote for all in one place (top-level) with unique identifiers, and pointers to a channel slice.
        - Question of where your entry point is. Are you accessing a channel? Or are you accessing the file to tell you about the channels?
    - JKH: don't want to have to go through all the subdivisions to find something. Want an index where I can search for relevant metadata, e.g. which channel contains GFP
    - JKH: conversation keeps going between what metadata and how to represent it. Split those conversations. Perhaps each includes a definition of what they follow.
    - JL: +1 for
    - AH: if you have all metadata colocated in the file, then you can extract/harvest it easily for use in data catalogues, without having to track through the files looking at each channel
    - JK: "chunked metadata"?
    - JM: need to be chooseable, computable, etc.
    - JL: storing _some_ representation all at the top.
    - AT: DICOM profiles (see Zoom chat)
    - AH: time frame issue. In imaging mass spec, infrared, Raman etc., no community standard is available. Those communities often use MATLAB, so right now **HDF5** is the only option. Would be good to have the same basic metadata, independent of the binary payload format. See Annotated Data as an example [https://anndata.readthedocs.io/en/latest/fileformat-prose.html](https://anndata.readthedocs.io/en/latest/fileformat-prose.html)
- what's the version 1? (and the vision)
    - WMV: outlook
        - linkml: we seem to agree
    - MH: REMBI
        - what is REMBI?
        - set of guidelines
        - specific (pretty minimal) set of REMBI-compatible models (MITI is likely one)
        - from an archive: need to be able to take data and check that it matches the model (independent of _how_ it's stored)
        - agreement on shared ontologies (across models) would also really help
    - JKH: next steps?
        - use OME metadata model in some form (e.g., linked data)
        - WMV: e.g. create a model that's common across different existing models
            - "universal" maintaining of the models together
        - JKH: good to look at what everyone is doing
- Getting started
    - Who: WMV/Sourab (MITI), MH/AKY (REMBI), AT (HTAN/MITI - Meshing with minimal & generalist standards), JM (OME/IDR), ??? (SSBD), RH (NL), JM/SK (DE)
    - AT: voices who are missing from the room at the moment
        - Vendors: JM ELMI meeting will have a community room (AT: Rarecyte & Nanostring as well)
        - Repositories (e.g., NCI DC nodes): upcoming supplement

</details>