NGFF Community Call 2020-10-29

---
tags: NGFF, community-call
---

See: [Connection information](https://forum.image.sc/t/connection-information-for-next-gen-call-on-oct-29th/44210) and [recording](https://downloads.openmicroscopy.org/presentations/2020/community-call-2020-10-29/)

Please paste this into the Zoom chat as new people join:

:::warning
Welcome to the community call. Please be aware that this session may be recorded. Live notes for the session are available in https://hackmd.io/_sftykiGR9mSyUan3l1WmA?edit Where possible, help to structure the notes for later publication rather than commenting in Zoom's chat. Thanks!
:::

# NGFF Community Call 2020-10-29

This document is a place where you can help drive what needs discussing on the 29th. Add your thoughts, needs, etc. or even new sections if need be. If there's an idea already in place that you like, give it a :thumbsup:

If you are unclear about this document, **just add a question here** and someone will tidy it up or get in touch:

* no problems yet? Excellent!
:surfer:

## "User registration"

| Name | Institute | Twitter Handle | GitHub Handle |
|------------ |---------------------- |---------------- |--------------- |
| Copy | and | paste | me |
| Josh Moore | University of Dundee | notjustmoore | joshmoore |
| Juan Nunez-Iglesias | Monash University | jnuneziglesias | jni |
| Jean-Karim Heriche | EMBL | | jkh1 |
| Eric A Perlman | self (working with JAX) | perlman | perlman |
| Guillaume Gay | Aix-Marseille University | morpholg | glyg |
| Anatole Chessel | Ecole polytechnique | AnatoleChessel | ac744 |
| Will Moore | University of Dundee | will_j_moore | will-moore |
| Mark Kittisopikul | Janelia / HHMI | markkitti | mkitti |
| Jean-Marie Burel | University of Dundee | | jburel |
| Simon Li | University of Dundee | | manics |
| Sebastien Besson | University of Dundee | | sbesson |
| Dominik Lindner | University of Dundee | | dominikl |
| Petr Walczysko | University of Dundee | | pwalczysko |
| Bill Katz | Janelia / HHMI | wtkatz | DocSavage |
| Caterina Strambio De Castillia | UMass Medical School | StrambioLab | strambc |
| Susanne Kunis | University of Osnabrueck | | sukunis |
| Robert Haase | MPI CBG | haesleinhuepf | haesleinhuepf |
| David Miguel Susano Pinto | University of Oxford | CarandraugNet | carandraug |
| Nicolas Chiaruttini | EPFL | nKiaru | NicoKiaru |
| Christian Tischer | EMBL | tischitischer | tischi |
| Raphael Maree | University of Liege | @cytomine_uliege | cytomine_uliege |
| Laurent Guerard | University of Basel | lguerard42 | lguerard |
| Jason Swedlow | University of Dundee | @jrswedlow | jrswedlow |
| Stephan Wagner-Conrad | Carl Zeiss Microscopy GmbH | @StephanWagnerC1 | swg08 |
| Dave Mellert | The Jackson Laboratory | @DaveMellert | mellertd |
| Erin Diel | Glencoe Software | @dielwithit | erindiel |
| Egor Panfilov | University of Oulu | soupaulte | soupault |
| Melissa Linkert | Glencoe Software | | melissalinkert |
| Davis Bennett | HHMI/Janelia | @d-v-b | d-v-b |
| Nicholas Sofroniew | Chan Zuckerberg Initiative | @sofroniewn | @sofroniewn |
| John Bogovic | Janelia | @BogovicJohn | @bogovicj |
| Ulrike Boehm | Janelia Research Campus | @ulike_boehm | UlrikeBoehm |
| Damir Sudar | Quantitative Imaging Systems & OHSU | NA | @dsudar |
| Blair Rossetti | Janelia Research Campus | | @brossetti |
| Eric Wait | Janelia Research Campus | | @ericwait |
| Ola Tarkowska | Sanger Institute | @olatarkowska | @olatarkowska |
| Trevor Manz | Harvard Medical School | @trevmanz | @manzt |

## Lightning talks

| Name | Video | Length | Size | Materials |
|---|---|---|---|---|
| Josh Moore | [NGFF Timeline Recap][1] | 6:56 | 12M | [hackmd](https://hackmd.io/4hKVSVVAQW-I9O3qLyh6lw?view) |
| Will Moore | [OME-Zarr Plate Clients][2] | 5:43 | 11M | |
| Chris Allan | [OMERO Plus, PathViewer and NGFF][3] | 4:16 | 30M | |
| Trevor Manz | [NGFF & Viv][4] ([youtube][5]) | 5:20 | 43M | |

### Repository Listing

List any repositories or other resources that you think someone on the call might be interested in finding out more about. Please include a one-line summary.

- https://github.com/ome/ome-zarr-py
- https://github.com/ome/omero-cli-zarr : OMERO command-line plugin to export images from OMERO as zarr files
- https://github.com/ome/omero-ms-zarr : An OMERO.server microservice that serves OME.zarr images and metadata.
Also contains the OME.zarr specification
- https://github.com/hms-dbmi/vizarr
- https://github.com/xtensor-stack/xtensor-zarr
- N5
  - https://github.com/saalfeldlab/n5
  - https://github.com/saalfeldlab/n5-zarr
  - https://github.com/saalfeldlab/n5-aws-s3
  - https://github.com/saalfeldlab/n5-google-cloud
  - https://github.com/saalfeldlab/n5-viewer
  - https://github.com/saalfeldlab/n5-ij
- Test data from Will's video: [idr0002], [idr0002_no_T], [idr0033] (see links at bottom)
- Test data from Trevor's video: [idr0062] (Additionally [idr0053] at 922k x 381k)
- S3 endpoint (not for viewing): https://s3.embassy.ebi.ac.uk/idr/share/community-call-2020-10-29
- Examples of analysis notebooks using idr data/zarr/dask
  - https://github.com/ome/omero-guide-cellprofiler
  - https://github.com/ome/omero-guide-python
  - https://github.com/ome/omero-guide-fiji (imagej-python)
  - https://github.com/ome/omero-guide-ilastik
- http://dvid.io : Distributed Versioned Image-oriented Dataservice
- https://github.com/mobie/mobie : MoBIE is a framework for sharing and exploring large multi-modal image datasets
- https://downloads.openmicroscopy.org/presentations/2020/Dundee/Workshops/NGFF/ : OME2020 community meeting workshop on NGFF. Zarr example: https://github.com/ome/omero-guide-python/blob/master/notebooks/zarr-public-s3-multiscale.ipynb [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ome/omero-guide-python/master?filepath=notebooks%2Fzarr-public-s3-multiscale.ipynb)
- https://github.com/google/tensorstore/ : C++ library with Python wrappers for reading & writing "neuroglancer precomputed" format, N5 and Zarr.
- https://github.com/google/neuroglancer/ : Web-based viewer for n-Dimensional data, including support for n5 & Zarr. Demo links in the README.
- Zarr implementations
  - https://github.com/xtensor-stack/xtensor-zarr
  - **Add yours here**

## Preliminary questions

### What are you working on?
- JKH: Cloud deployment of image processing/analysis workflows
- EAP: Cloud/object store workflows for image registration & visualization
- BK: Image-oriented data engine / service that supports branched versioning, currently tailored for Connectomics
- DMSP: microscope image acquisition software (Python's [microscope](https://python-microscope.org/) and [cockpit](https://github.com/MicronOxford/cockpit/))
- DJM: Hybrid HPC/Cloud image processing and analysis.
- JAB: n5, metadata, converters
- **Add something here**

### What are you needing help with?

- **@jni**: we'd love a way to indicate scale and possibly position in ome-zarr; we would have immediate use for this.
- **@eap**: strategies for accessing image data associated with polygon ROIs at multiple scales
- **@glyg**: Interface with neuro data (NWB), strategy for STORM data
- **@ac**: Storage for 3d ROIs + associated data
- **@tischi**: Read and write ome.zarr from Java
  - Work to add OME.zarr to Bio-Formats is on-going https://github.com/ome/bioformats/pull/3554
- **@tischi**: Add tabular data into ome.zarr
- DJM: One format to rule them all - we want to follow community standards for how we store image data in the cloud
- **Add something here**

### What are you looking to help with?

- Supporting all language implementations
  - Python:
  - Java: Mark Kittisopikul
  - Javascript:
  - Julia: Mark Kittisopikul
  - C++: Damir Sudar (need it), Blair Rossetti, Mark Kittisopikul
  - ...
- 100% object store workflows (_including_ format conversion)
  - @eap
- **Add something here**

----

## Session 2

See also the notes from https://hackmd.io/_sftykiGR9mSyUan3l1WmA?view#Session-1 (Please don't edit or delete them.)

### Introductions [<45m]

- Josh: general hello (recording?
hackmd)
- Everyone: introduce yourself in 1 minute
- General questions about the meeting

### Topics [60m]

*Please add your name under all topics you are interested in, and add any specific issues you'd like to discuss:*

- Questions about the videos (add your questions)
  - Interest:
- Copying data (slow to copy 1 million small files)
  - Also: APIs versus file formats
  - Interest: Mark Kittisopikul, John Bogovic, @d-v-b, @ericwait
  - Mark:
  - d-v-b: overhead on each file. nested format. without atomic writes, can have larger files (neuroglancer). easier to copy. need to learn new tricks.
  - Eric: Trevor has done this with a number of formats.
  - Bill: chunk-per-file (sharded vs. unsharded) plus mutable versus immutable impact in terms of shareability. leads to API versus file formats. strictly file format limits options.
  - Blair: keeping users in mind. not just a coding solution.
  - Jamie: (mass spec background) formats designed on dumping to disk ASAP. valid role for YAFF. zarr is good for sharing.
  - John: using education to avoid copying. and there's a regime where this (chunked) data shouldn't be used.
  - Ilan: for HuBMAP can't just "download" zarr in the browser.
  - Josh: if the specs are clear, we can know whether we have 'retrieved' all the data from a proprietary file format (nothing gets lost in translation?) -> "isomorphism"
  - Caterina: how do we get there? (With vendors who keep wanting to create their own file formats)
  - Ulrike: vendors should be invited (incl zarr & n5)
  - d-v-b: do the vendors want to do it
  - Josh ...
  - Bill: seems to be mostly about dense data. like how tileDB differentiates between the format and the sparse data. Difficult to standardize on a file format since it doesn't support all the use cases. What are the minimal use cases that we need to support and what are the candidate formats? Currently adding branched versioning to Nico.
  - Ola: use cases are critical. optimization needs to happen at the process level. different solutions. Bill: neuroglancer has a compression for segmentation. Denormalization of the data for rapid access. cf. GPU & direct storage
  - Nico: on political/vendor side, imaging facilities need to have a common workflow for all the vendors. In code for tender you can specify Bio-Formats must be supported.
  - JRS: interest from vendors. Zeiss, Nikon, Leica, Olympus. "This will be a long process". There won't be just one format. Funders are asking how a new file format will solve interoperability?! Need to work on how we're presenting this to institutions, funders, etc. Too much to support "random formats under an API"
  - Bill: Initial OpenML blog post about ideal format for ML datasets. Response by founder of tiledb as ideal format for ML. Focus on data engines (library) Josh: would :heart: to get into a deeper discussion, but I fear that there will *always* need to be a thing that decides between multiple stores/engines/formats/...
  - Damir: common thing in the metadata layer then we get to a common API/format. (And there shouldn't be multiple versions of those)
  - Multiple APIs
    - One for dense images (multiscale, multidimensional)
    - Log API would be different.
      - I'd call this "metadata", or maybe out-of-scope (logging of what?) John B
  - What's the minimal set of APIs that we need for science?
    - Ola: data lifecycle?
    - Bill: 3D annotations, points are sparse with properties (labels) scale of 100M. Different data type for dense.
  - Davis: don't think that industry hasn't needed to deal with sharing & investigating large volumes
  - Mark: many of the APIs are bound to specific programming languages, need a service model (like OMERO). In talking to manufacturers, there's a willingness to support users if they can be provided an interface. Modular ecosystem.
  - Trevor: NB: back to zarr as an API with a small microservice translating chunks to a URL. There are possibilities. i.e. mirroring a native format.
- Pyramids / multiscale
  - Interest:
    - John Bogovic
    - @d-v-b
    - Damir Sudar
    - Nicolas Chiaruttini
  - Davis: excited about the spec. Thinking about getting Jeremy to support Saalfeld's.
  - Eric: definitely aware. Requires the channel dimensions to be in the same chunk. Bad for RGB.
    - Only Jeremy can fix neuroglancer to support
  - John: also excited. Hope we agree.
  - Josh:
    - https://forum.image.sc/t/multiscale-arrays-v0-1/37930
    - https://github.com/zarr-developers/zarr-specs/issues/50
    - Can someone kick off the conversation again? Get it done in 2020?
  - Nico: reduction with averaging, (Eric: max and majority) **metadata**
    - Useful for 3D representation
  - Jamie: general practice to mirror existing pyramids? Is that recorded?
  - Documenting downsampling ... upsampling with ML
  - Davis: worry about arbitrary functions that can be used. multiscale should be able to stand on their own.
- Metadata
  - Interest:
    - Nicolas Chiaruttini (Image transformation - affine and spline / warped)
      - Don't want to resave on transforms (BigWarp). incl time & space
      - Standard spline representation?
      - Davis: space of possible image transformation, what fraction will support them? Medieval map with oceans of dragons (for anything beyond linear) 99% case?
    - John Bogovic
      - No one should be responsible, but the format should be flexible enough that anyone can include their own metadata. Someone will leave to write **YAFF**
      - Ulrike: need proper documentation and a way to find your way around.
    - Damir Sudar
      - pre-defined managed metadata fields that are well-defined, but the possibility of extension. "affine-transform-store" or some weirdo spline representation.
      - Jamie: binary flag that says "this is raw data" and everything else is interpolated. (**metadata**). What is what's been done and what is a lab notebook to capture everything that has been done?
      - Josh: short **JSON-LD** explanation, need to discuss it more.
    - Caterina: 4D Nucleome/BINA/Quarep-LiMi / Microscope hardware, image acquisition settings and Quality Control metadata --> how to represent it?
      - https://forum.image.sc/t/metadata-for-ome-next-generation-file-format-ngff/44373
    - Bill: URL as a service...
- Compression
  - Interest: Mark Kittisopikul
  - Bill: if there are things that can be customized
- "Cloud"
  - Interest:
    - John Bogovic
    - @d-v-b
  - Topic:
    - @d-v-b: Chunk fusion for reducing file counts
- ROI representations
  - Interest:

### Next steps (for those who are interested) [15m]

- Josh:
- Ola: Lots of knowledge shared. Willing to contribute to specification? MUST, SHOULD, COULD, WOULD
  - I like this because it gives an indication what a *tool* / library MUST have - important when someone inevitably writes, say, a Julia NGFF library, though this gets complicated - John B
- Josh: a starting point will be the github repository (or similar) will
- Ola: discussing prioritization of what is in the API
- Publishing notes, recordings, etc.?
- Time
- Topic
- Videos
- Communication (image.sc group?)

### Post-discussion

- Ulrike: great to have the repository paper. Thanks Jason. Janelia image database is still unsure how much data they want to put into the repository. Only papers? Everything? People that produce the data should be part of these discussions. JRS: early days... (**end recording**)
- Eric: nice to see ... have hope ... that we can start to re-use more leading to the format being useful.
- Bill: good to get the context. Used to dealing with curated data in a database way. Have been thinking about all the pieces of data (e.g. neuroglancer pre-computed). Jeremy is trying to add transactions to tensorstore.
  - Josh: eventual consistency *a la* iRODS.
- Bill: as a community what is the best base layer? Across microscopy (immutable with tight storage requirements) to curated/transaction-oriented work.
  - Josh: base layer of sparse?
  - Bill: would argue dense+sparse. Covers enough of the science.
  - Maybe logs.
Josh: would also add the search index.
  - (They're making money from the authorization)
  - Josh: happy to delegate that from OMERO
- Mark: price point isn't there for storage yet, but people are talking about it
- Bill: like the embedded data engine (like tiledb or other key-value stores). Different characteristics from the cloud. Good to have a buffer between reading the data. Trevor: agreed, have been trying to make things look like a zarr. But have transform layers since all clients can decompress everything. So yes, the buffer.
- Bill: also separate mutable from immutable

## Session 1

### Introductions [<45m]

- Josh: general hello (recording? hackmd)
  - no objection to recording raised by anyone
  - Eric: potentially include 5 min of off-recording
  - Using hackmd as the place to add topics, questions
- Everyone: introduce yourself in 1 minute
  - Christian: what is "NGFF"?
- General questions about the meeting

### Topics [60m]

*Please add your name under all topics you are interested in, and add any specific issues you'd like to discuss:*

- Questions about the videos (add your questions)
- ROI representations
  - Interest: @ac, @bk, djm, @eap, jkh1, @lguerard, @tischi
  - Josh: status of label masks
  - Emil: working on implementations and conversions from WKT/WKB
  - Juan: metadata? In scope.
  - Eric: polygons. Two worlds: export or OMERO.
  - Eric: separating labels from pixel data? Josh: remote links are coming (relative!)
  - Anatole: supporting other formats between WKT/WKB? supporting data that lives on the mesh. Emil: only vertices, thinking about how to link those formats with the features. Josh/Seb: examples? candidates? Neuronal traces use SWC
    - https://github.com/napari/napari/issues/693
  - Christian: does metadata incl. name, features, classes, values?
    - @tischi: maybe require/recommend that "the table", in addition to the label_index, also has anchor_point coordinates of the image segment (this allows to quickly locate the region in the image).
  - Simpler primitives?
circles, rectangles,... J-K: anchor points.
  - David: people don't need circles. Mostly comes from a UI drawing expectation
  - Anatole: polygons don't include continuous objects e.g. splines
  - Summary
    - no objection on proposed label spec
    - need to start thinking about other types and link to metadata
- "Cloud"
  - Also: "Reading image data: file vs object store, granularity"
    - @bk: (actually file formats vs embedded data engine/API)
  - Interest: @bk @haesleinhuepf djm jkh1 @eap @tischi @carandraug (and writing)
  - J-K: from EOSC-Life perspective, data would be publicly accessible with access to compute centers. Moving away from local workstations model
  - Simon: key requirements from the data format perspective?
    - Issue of tooling
    - Josh: several terabytes of data converted to the format (links below)
    - J-M: example https://github.com/ome/omero-guide-python/blob/master/notebooks/zarr-public-s3-segmentation-parallel.ipynb [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ome/omero-guide-python/master?filepath=notebooks%2Fzarr-public-s3-segmentation-parallel.ipynb)
    - What exists today should meet most of the requirements and can be tested. Main issue is that this is not the final format yet
    - Zarr V3 draft spec: https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html
  - Guillaume: workflow for data publication. Moving data from institutional server to federated server
- Other data types
  - Interest: @bk @glyg @carandraug
  - @glyg How are non-image data (e.g. STORM (x, y, z, I)) stored?
    - JK: sufficient to store a table of data
    - Seb: re-use vertices?
    - x y z frame + features + precision location
  - Will: migrating from 5D to nd, sooner rather than later
  - JK: microscopy/physics vs. sample metadata
  - HCS example (walking through folder)
  - Juan: trying to make metadata as optional as possible.
  - Will: e.g.
needing to list S3
  - Seb: subset of "generic layout"
- Compression
  - swg08 (lossless)
    - have been looking at advanced algorithms lz4 and zstd to understand good rates (ratio between compression ratio and compression/decompression speed)
    - https://zarr.readthedocs.io/en/v2.5.0/tutorial.html?highlight=lz4#compressors
    - https://www.biorxiv.org/content/10.1101/164624v1
- Juan: scale/origin
  - **Items to be opened as issues on an ome-zarr repository**
- Copying data
  - @tischi It is slow to copy 1 million small files, is there anything we can do about this?
    - Simon: This depends on chunk-size. Smaller chunks are better for interactive visualisation or analysis where you're interested in a subset of an image since it reduces the amount of data you have to fetch, but it's also valid to have e.g. 1 chunk per image
    - Yes, I know, but one may want to have it all (which may not be possible): smooth interactive visualisation, efficient computing and efficient copying
    - OK! We've previously discussed uploading multiple chunk sizes for the same image.
    - interesting idea, more data though, of course.
    - Yes, we may need it anyway to optimise for visualisation, e.g. separate chunkings for x-y, y-z, x-z.
    - Using HTTP range requests could be another option: use large Zarr chunks but client requests part of the chunk
  - @glyg (covered in the cloud section)
- Metadata
  - @carandraug (where does it go on zarr and libraries to create them)
    - @joshmoore: do you mean e.g. ome-zarr-py?
    - @carandraug: documentation suggests that is only to read zarr files
    - not really support, but PR to extend ome-zarr-py instead of writing your own code, and ome-zarr-py has some code to handle metadata

### Next steps (for those who are interested) [15m]

- Publishing notes, recordings, etc.?
  - No objection raised during the meeting
- Time
  - monthly meetings?
  - J-K: depends on involvement. For people generally interested, might be too frequent.
Useful for people getting hands dirty
  - Juan: break at 1 hour
- Topic
  - Josh: from OME side, expecting HCS spec + some Java work by end of this month in preparation of I2K
- Videos
  - Jason: clickable links along with the videos
- Communication
  - Review of the invitation process (survivor bias?)
  - Groups/lists necessary? Slack? Etc.
    - J-K: the me too's are a bit annoying
    - JRS: EuBI/National bioimaging people but different crew
    - Discourse group: Yes
  - JRS: plea for videos to broadcast (even if just 1-2 slides)
  - Juan: reminders at: a month ago, a week ago, and a day ago, and today, and nothing else

----

[idr0002]: https://mystifying-lalande-e12142.netlify.app/?source=https://s3.embassy.ebi.ac.uk/idr/share/community-call-2020-10-29/idr0002-heriche-condensation/plate1_1_013/422.zarr
[idr0002_no_T]: https://mystifying-lalande-e12142.netlify.app/?source=https://s3.embassy.ebi.ac.uk/idr/share/community-call-2020-10-29/idr0002-heriche-condensation/plate1_1_013/422_no_T.zarr
[idr0033]: https://mystifying-lalande-e12142.netlify.app/?source=https://s3.embassy.ebi.ac.uk/idr/share/community-call-2020-10-29/idr0033-rohban-pathways/41744_illum_corrected/5966.zarr/
[idr0053]: https://hms-dbmi.github.io/vizarr?source=https%3A%2F%2Fs3.embassy.ebi.ac.uk%2Fidr%2Fzarr%2Fv0.1%2F4495402.zarr
[idr0062]: https://hms-dbmi.github.io/vizarr?source=https%3A%2F%2Fs3.embassy.ebi.ac.uk%2Fidr%2Fzarr%2Fv0.1%2F6001240.zarr
[1]: https://downloads.openmicroscopy.org/presentations/2020/community-call-2020-10-29/ngff-timeline-recap.mp4
[2]: https://downloads.openmicroscopy.org/presentations/2020/community-call-2020-10-29/ome-zarr-plate-clients.mp4
[3]: https://downloads.openmicroscopy.org/presentations/2020/community-call-2020-10-29/OMERO%20Plus,%20PathViewer%20and%20NGFF%202020-10-27.mp4
[4]: https://downloads.openmicroscopy.org/presentations/2020/community-call-2020-10-29/ome-talk-ngff-viv.mp4
[5]: https://www.youtube.com/watch?v=OeRyMVtzSag
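
----

The chunk-size trade-off raised under "Copying data" in Session 1 (many small files versus fewer large chunks, with HTTP range requests as a possible middle ground) can be made concrete with a little arithmetic. The sketch below is illustrative only: the image shape, chunk sizes and the helper names `chunk_count` and `row_range_header` are assumptions made for this example and do not come from any NGFF or Zarr library. The range-request helper additionally assumes an uncompressed chunk stored in C (row-major) order; a compressed chunk could not be sliced by byte offset like this.

```python
import math


def chunk_count(shape, chunks):
    """Number of chunks (one file or object each) needed to store an array."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def row_range_header(chunk_shape, itemsize, row_start, row_stop):
    """HTTP Range header for rows [row_start, row_stop) of one 2D chunk.

    Assumes the chunk is uncompressed and stored in C (row-major) order,
    so consecutive rows are contiguous in the stored object.
    """
    height, width = chunk_shape
    if not 0 <= row_start < row_stop <= height:
        raise ValueError("row range outside chunk")
    first = row_start * width * itemsize
    last = row_stop * width * itemsize - 1  # HTTP range ends are inclusive
    return {"Range": f"bytes={first}-{last}"}


# Hypothetical image: 3 channels, 100 z-planes, 20000 x 20000 pixels.
shape = (1, 3, 100, 20000, 20000)  # (t, c, z, y, x)
small = chunk_count(shape, (1, 1, 1, 256, 256))
large = chunk_count(shape, (1, 1, 1, 2048, 2048))
print(f"256 x 256 chunks:   {small:,} files")
print(f"2048 x 2048 chunks: {large:,} files")

# Fetch only rows 256..511 of a 2048 x 2048 uint16 (2-byte) chunk.
print(row_range_header((2048, 2048), 2, 256, 512))
```

With these made-up numbers the small chunking produces roughly 1.9 million objects and the large chunking 30,000, which is the kind of gap behind the "slow to copy 1 million small files" concern.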

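The Session 2 multiscale discussion mentions reduction by averaging, max and majority, and the wish to record the method in metadata. A minimal sketch of why the choice matters is given below, using plain Python lists rather than any array library. The function names and the `downsampling` metadata key are hypothetical and are not part of any multiscale specification; they only illustrate the idea.

```python
from collections import Counter


def downsample_2x(image, reduce):
    """Reduce a 2D list-of-lists by 2 along each axis using `reduce` per 2x2 block."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(0, height - height % 2, 2):
        row = []
        for x in range(0, width - width % 2, 2):
            block = [image[y][x], image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1]]
            row.append(reduce(block))
        out.append(row)
    return out


def block_mean(block):
    return sum(block) / len(block)


def block_majority(block):
    """Most frequent value in the block; suited to label images."""
    return Counter(block).most_common(1)[0][0]


intensity = [[1, 3, 5, 7],
             [1, 3, 5, 7],
             [0, 0, 8, 8],
             [4, 4, 8, 0]]
labels = [[1, 1, 2, 2],
          [1, 0, 2, 2],
          [0, 0, 3, 2],
          [0, 3, 3, 3]]

mean_level = downsample_2x(intensity, block_mean)
max_level = downsample_2x(intensity, max)
label_level = downsample_2x(labels, block_majority)

# Hypothetical metadata entry recording how the level was produced.
label_metadata = {"downsampling": {"method": "majority", "factor": 2}}

print(mean_level, max_level, label_level, label_metadata)
```

Averaging a label image would produce values that correspond to no object at all, which is one reason to document the reduction alongside the pyramid.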