IDR-NGFF 2020-W44

---
tags: NGFF
---

# IDR-NGFF 2020-W44

## 2020-10-30

### Feedback from the calls?

- JMB: people want to move forward. Need to reach out to the Java community.
- Seb: C/C++ also missing (following Java)
- JMB: showing performance differences there will be difficult (no bindings). Metrics in Java/Python are easier.
- JRS: WIP, but lots of constituencies. Support & attention. Need to make our long-term intentions clear.
- Seb: roadmap, incl. the most active players
- JRS: writ large, how will I use it, how do I incorporate this into my institution? e.g. Nico (EPFL) -- how do *others* write grants based on this?
- Petr: where is OMERO in this picture? Josh: moving to OMERO 6 (support for reading files in the cloud)
    - Must someone use OMERO?
    - Josh: not in the case of the prototype, i.e. one can create, upload, and visualize a dataset
    - Probably comes into play when having many datasets. Will need some database for querying!
- JRS: numerically, OME = "OME-TIFF" (mostly dealt with through Bio-Formats). But our focus is largely on OMERO ... since the other is "done". But cf. the programming languages.
- J-m: marketing slide? About what happens to your original data.
- Seb: build on the DMPs (data management plans) that everyone is writing (i.e. text to replace Bio-Formats and IDR as the solution)

### Clients

- J-m: could imagine a figure of the IDR data in S3 (Seb: OME.figure. Josh: IDR.figure!)
- Seb: spec decision re: new changes like plate name. Need a process. Keep open until the end of the month, or separate milestones?
    - Decision: keep open (at the level of omero-ms-zarr/spec) + no bump of the IDR layout

### Datasets

-

### Formats

-

### Infrastructure

-

## 2020-10-29

### Clients

- J-M:
    - CP notebook using the new plate layout is ready (idr0002). Data retrieved from s3.embassy
    - Metadata read from IDR
    - TODO: tag repo when PR is merged
    - Next: working on idr0033
- Will:
    - handling empty wells
    - definitive list of datasets?
    - Josh: see https://hackmd.io/_sftykiGR9mSyUan3l1WmA

### Community call

* Will: videos
* Will: where are the pages?
    - original page: https://forum.image.sc/t/upcoming-calls-on-next-gen-bioimaging-data-tools-starting-oct-29/43489
    - private post: https://forum.image.sc/t/connection-information-for-next-gen-call-on-oct-29th/44210/20
    - agenda/notes: https://hackmd.io/_sftykiGR9mSyUan3l1WmA
* Process
    * If someone asks, add them to the private post, which links to the Markdown etc.
    * Mention that videos are available to watch
    * Notifications post-mortem
        - JRS: don't think it works because it's too sensitive to personal settings
* Josh: breakouts?
    - social breakout only
    - 3-5 people
    - needs a leader/topic/etc. to get something done

### Datasets & Formats

- Seb: omero-cli-zarr releases up to 0.0.5 (spec more or less timestamped)
- Available datasets:
    - idr0033
    - idr0002 (whole timeseries + time 0 only)
    - idr0004
- Seb: https://github.com/ome/omero-ms-zarr/pull/75
    - captures the current implementation of the hierarchy + metadata
    - next priorities (November?):
        - multiple acquisitions (incl. sparseness)
        - well vs column vs row spec
        - metadata distribution/redundancy
        - spatial context (multiple fields of view)
        - more suggestions from the community, e.g. label names
- Simon: encourage people to open PRs against the spec

### AOB

- Simon: nada on infrastructure
- Will: testing napari 0.4.0
- Dom/David: good

## 2020-10-28

### Clients

- Josh: see demo conversation
- Will: sent video
- Will: looking at omero-cli-zarr download
    - Simon: using download code vs S3 client?
    - Josh: if something is not listable, downloader code will be useful
    - Simon: what if half of your chunks are empty? Getting lots of 404/403s
    - Driven by Australian use case --> (use awscli)
    - https://github.com/ome/omero-ms-zarr/issues/74#issuecomment-717212598

### Datasets

- Ordered datasets:
    - idr0033 (complete)
    - idr0002 (complete)
- Seb: bisected issues associated with the idr0033 conversion.
    - ScreenReader file leak: will affect any IDR studies using .screen files.
    - OOM due to the rendering metadata addition: will affect the conversion of any large (>100) number of images.
    - Will opened a PR closing resources. Review looks good; looking for a tag of omero-cli-zarr.
    - J-m: it needs to be clear to people using (Python) code. Have something in Java code.
    - Seb: throttle number of servants. Simon: set a low limit on merge-ci?
    - J-m: also need a bold warning, "YOU MUST CLOSE THIS"
    - All the above :+1:
- Targets
    - Today: production datasets export for Thursday?
    - Next: ScreenReader (to not bring IDR down)

### Formats

- Seb: working on omero-ms-zarr specification PR for tomorrow
- J-M: plate formats on minio-dev.
    - Adjusting CellProfiler notebook. Need plate ID.
    - Josh: currently paths are opaque and can't be discovered
    - Will: have multiple versions of each. Simon: even multiple S3 servers
    - Seb: will need a registry
    - Josh: propose s3.embassy.ebi.ac.uk/idr/v0.1/idr/share/20201029
    - J-M: adjusting https://github.com/ome/omero-guide-cellprofiler/blob/master/notebooks/idr0002_zarr.ipynb
- Simon: trying to update conda-bioformats2raw; see issues. It's messy.
    - https://github.com/ome/conda-bioformats2raw/issues/3
    - https://github.com/glencoesoftware/bioformats2raw/issues/62#issuecomment-717808003
- David: making progress, don't need anything. Hopefully a first draft this week.

### Infrastructure

- Simon: fighting with molecule/travis on the ome-zarr-dev1 PRs

### Misc

- J-m: focusing on getting the notebook working. Video? (Simon: a link to the notebook is more powerful)
- Josh: https://github.com/orgs/ome/projects/13
- Petr: training/NGFF -- horrific misunderstanding?
    - Josh: same structure as I2K. Tischi involved in NGFF workflow, proposed to redo that for GBI

## 2020-10-27

### Infrastructure

- Simon: VM (ome-zarr-dev1) is present with Docker. Mounted files as v4 (hopefully that's ok)
    - ome-zarr-dev1.openmicroscopy.org
    - single place to do all conversions (rather than needing the devspaces)
    - Seb: separate Docker partition (since they can grow quickly)? No. Dev only.

### Datasets

- Seb: omero-cli-zarr bringing down IDR. idr0033 is leaking. Prioritizing that.
    - Josh: move to idr-testing? (Seb: readwrite would also work)
    - Simon: try not to paste idr-next/idr-testing into public comments
- Will: not currently trying to convert.
- Seb: to retest workflow against idr-testing (Will: see "NGFF Workflow"), trying new VM

### Formats

- Josh: one possible use/context for ZarrReader is I2K at the end of November

### Clients

- Will: comment from napari guys re: performance issues possibly from dask.concatenate
    - dask.map_blocks is supposedly better. Trying to use that. Not grok'd yet.
    - Then working on video.
- Latest omero-cli-zarr spec isn't documented
    - Seb: can open a PR against omero-ms-zarr. Slightly split world since we have HCS and non-HCS data. Can prioritize that tomorrow.
    - Current image spec: https://github.com/ome/omero-ms-zarr/blob/master/spec.md
- Seb: release omero-cli-zarr?
    - Will: have one PR open to write metadata first, but otherwise would be good to have it released for Thursday. (Seb: also a versioning PR from Simon. Release as is, and then list PRs tomorrow.)

## 2020-10-26

### Paper

- Jason: thinking about a 300-600 word letter, a "Commentary" in Nature Methods on the NGFF work, starting with the conversions.
    - Less political, just addressing the technical reasons for why we're doing it: enabling object stores, data that's not otherwise manageable, etc.
    - Often useful to show that something is a quantifiable improvement (faster, cheaper, etc.)
        - Accessing KLB/lightsheet off of EBI S3
        - Worth a discussion about what the comparison would be.
    - Text to start appearing.
- Simon: links to YouTube videos in the commentary? JRS: in principle yes, but academic elitism says things should be DOI'd.
- Josh: doing well to defend against possible arguments.
- JMB: depends how far we want to go. Showing all possibilities and what they enable requires several metrics. Plates in napari, segmentation on KLB in ..., etc. Too much for a commentary? JRS: perhaps less is more, but can include in supplementary.

### Clients

- Will: demo on 29th (3 days)
    - What are we going to show?
    - What needs doing?
    - Little way off yet. What do we want to show?
    - Currently we have one idr0002 plate up (and a truncated version), and idr0033 needs to be regenerated.
- Josh: could see doing a 3 spec review. Will to do a viewer video? (If we can skip the initial loading time)
- J-m: also points to the practical aspects of the bare minimum to load.
- Will: do we need to make do with the spec or update the metadata?
    - Josh: I assume that we need more metadata
- Simon: need to figure out where the slowdown is coming from
    - Will: don't know what's moving down the wire
- napari:
    - See https://github.com/ome/ome-zarr-py
    - `napari https://minio-dev.openmicroscopy.org/idr/idr0002-heriche-condensation/plate1_1_013/422.zarr`
        - loads slowly
        - nearly 2 mins from hitting Enter till the plate loads
    - Same for `napari https://minio-dev.openmicroscopy.org/idr/idr0002-heriche-condensation/plate1_1_013/422_no_T/422.zarr` (no time-lapse)
    - Need a different strategy?
    - What's the file chunk size?
        - For the 'No_T' set: (1, 2, 1, 64, 84)
        - With T: (329, 2, 1, 64, 84)

```
$ du -hsc 422_no_T/422.zarr/0/A/*
3.9M 422_no_T/422.zarr/0/A/1
3.9M 422_no_T/422.zarr/0/A/10
3.7M 422_no_T/422.zarr/0/A/11
3.8M 422_no_T/422.zarr/0/A/12
3.9M 422_no_T/422.zarr/0/A/2
3.8M 422_no_T/422.zarr/0/A/3
3.8M 422_no_T/422.zarr/0/A/4
3.8M 422_no_T/422.zarr/0/A/5
4.0M 422_no_T/422.zarr/0/A/6
4.2M 422_no_T/422.zarr/0/A/7
3.9M 422_no_T/422.zarr/0/A/8
4.2M 422_no_T/422.zarr/0/A/9
47M total
```

- https://forum.image.sc/t/connection-information-for-next-gen-call-on-oct-29th/44210/20
    - Basically, how to set up a breakout that you would potentially want to have.

### Datasets

- idr0033 needs pyramids. (Dom)
- Will: rsync'ing onto idr0-slot3 took many hours
    - Simon: long-term, need to think about the number of files

### Formats

- Everyone else is good (time for IDR)

### Infrastructure

- Tabled
    - mounting minio's objectstore on idr1-slot2
- Setting up `ome-zarr-dev1.openmicroscopy.org`, problem with the Docker installation at the moment
    - Short hostname?

----

## Template

### Clients

-

### Datasets

-

### Formats

-

### Infrastructure

-