---
tags: zarr, b-open, meeting, notes
---

# Zarr/B-Open Bi-weeklies

Zoom link: https://www.google.com/url?q=https://openmicroscopy-org.zoom.us/j/85829696160?pwd%3DVXZuVzZjclJxUDRyQTB4bXFVaWdpdz09%26from%3Daddon

## 2022-08-03

Attending: JM, MA, AB

- any projects?
  - MA: maybe sparse data. None now but eventually.
  - MA: https://github.com/pydata/sparse/issues/222
  - https://github.com/zarr-developers/zarr-python/issues/424
  - why not?
    - consensus & maintenance burden
  - https://github.com/pydata/xarray/issues/3213
  - AB: open some issues to summarize what's been done
    - define needs
    - review in September and then move to something concrete
  - JM: maybe close the old issues
- netcdf?
  - jbms user was writing up proposal
  - getting that
  - NB: https://github.com/zarr-developers/zarr-specs/pull/149
- lot of focus on ZEP0001 right now
  - then we can start looking at future ZEPs:
    - netcdf
    - sparse

## 2022-07-04

Attending: JM, MA, AB

- JM: opencollective ok? MA: Think so. AB: Don't know.
  - AA (slack): first payment is fine
- MA:
  - one PR in xarray done: https://github.com/pydata/xarray/pull/6636
  - Tom working on a few PRs in datatree. Quiet again. Gave him feedback.
    - JM: Ryan Abernathey said it would be merged into mainline
    - MA: surprised. wouldn't have thought before Christmas.
    - MA: would mean a few things need cleaning up
      - get rid of a few copy-n-pastes
  - MA: PR open in datatree. (contextmanagers)
    - https://github.com/xarray-contrib/datatree/pull/114
  - MA: opened PR in Zarr
    - https://github.com/zarr-developers/zarr-python/pull/1066
- JA: xarray
  - github.com/jbms may open a PR to support array dimensions in Zarr
- MA: https://github.com/Unidata/netcdf-c/releases/tag/v4.9.0 supports various nczarr things
- looking for bigger drivers. any ongoing activities strictly with Zarr? Not at the moment.
  - possibilities / high-level discussion needed with AA
  - anything on the OGC front?
  - kerchunk? Sure, but what? Really in non-Python languages? (People busy)
  - sparse arrays?
    That could be interesting.
  - https://ipfs.io/ ? distributed filesystems
  - helper library for multi-scale images? e.g. geozarr & ome-zarr
  - publicity for datatree? get people to start using it.

## 2022-05-19

Attending: MA, JM, AB, TN

- JM: got in touch with Stephan & Ryan
  - meeting moving forward. JM to send around time suggestions
  - block on B-Open or optional?
    - MA: will ask AA
    - late is often better (1800 etc. is fine)
- AB: looked at notebook
  - package does something similar to the prototype (slightly more complicated)
  - looking at datatree. anything to work on?
- TN: 2 refactors remaining
  - storing tuples to storing dictionaries (done)
  - storing entire datasets to just individual variables (so node acts like dataset)
    - vaguely started. have local changes that can be pushed, but doesn't work yet.
    - doesn't change the API too much.
    - touches on a lot of other issues
  - could tag people for review, etc.
    - MA: not using it on a daily basis (i.e. please ping)
  - plans for using? AB: likely xarray-sentinel
    - see: https://github.com/xarray-contrib/datatree/issues/80 (API)
    - see: https://github.com/xarray-contrib/datatree/issues/77 (BUG)
  - TN: lot of it boils down to "do you want to operate on the node or tree"
- MA: quiet on nczarr front (on v3 as well -- not before the summer)
- TN: helpful to look at backends with trees (do files get closed)
  - Joe thinks he's found a bug in encoding, all the way up to xarray
  - zipfile? TN: yeah, but likely solved.
    - see: https://github.com/xarray-contrib/datatree/issues/89
    - see: https://github.com/xarray-contrib/datatree/pull/95 (still failing)
  - issue with the NC4 interface
- JM: datatree adoption?
  - some, growing
  - needs docs
    - AB: now or later?
    - some now, but need to wait on some things
    - there are some plans/ideas for the docs in place
    - one getting started page and function docs are basically it
- MA: reports should be ready. 150 hrs.

Actions:

- B-Open should have a look at and provide feedback on:
  - https://github.com/xarray-contrib/datatree/issues/80
- B-Open can start working on:
  - https://github.com/xarray-contrib/datatree/issues/89
  - https://github.com/xarray-contrib/datatree/pull/95
  - https://github.com/xarray-contrib/datatree/issues/61

## 2022-05-05

Attending: AA, MA, JM, TN

- for AB / from JM:
  - https://github.com/spatial-image/spatial-image-multiscale/blob/0d1458813f10663cc9f0366b132b9c8677ca992b/examples/ConvertTiffFile.ipynb
- xarray: no response
- MA
  - putting together report for payment
  - nczarr stuff is in place
  - zarr v3 in xarray has gone quiet
  - potentially will join the meetings
- TN
  - already updated the tree implementation (before AB)
  - spent a _lot_ of time working on DT over the past few weeks (fits & starts)
  - issued a release with the changes. internals refactored. no more anytree.
    - may break code that depends on it. changes the node access (more like unix paths)
    - see WHATSNEW
    - 1 of 2 big refactors (other is to stop storing as Dataset objects)
  - MA to tell AB to try it out
    - TN: pin CI
    - AA: or the breakage is good.
  - TN: `.ds` returns an actual dataset. but could return a frozen view
    - if you have a node, it's ambiguous if you want an op on just the data or the tree
    - `.ds` API helps to separate that.
  - ...josh spaced out a bit...
  - TN: another group always wants to refactor out their class for DT
- AA: had sentinel product in netcdf/zarr backend and can save one after the other
  - datasets as different groups
  - navigable dictionary
- JM: is the plan to eventually return DT from open_dataset?
  - TN: less breaking to have open_datatree
- AA: multiscale
  - https://github.com/ome/ngff/pull/114
  - TN: dataclasses is one suggestion for us to investigate

## 2022-04-20

Attending: Josh Moore, Alessandro Amici, Thomas Nicholas, Mattia Almansi

- nczarr
  - MA: PR opened on xarray for nczarr. Overestimated. Not creating extra files. in .zattrs.
  - AA: they want to do something different (global variables)
  - https://github.com/pydata/xarray/issues/6374
  - https://github.com/pydata/xarray/pull/6420
  - only difference
    - xarray only supports reading of nczarr
    - nczarr also writes xarray attributes
  - AA: not yet at one standard
    - TN: Ryan, Stephan, Joe as candidates
  - AA: GDAL implementation is different yet again.
    - https://gdal.org/drivers/raster/zarr.html#srs-encoding
  - https://github.com/pydata/xarray/issues/6448
    - Related to GeoZarr / GDAL
- DataTree
  - TN: nothing new
  - Someone from Arviz wanted to replace an internal class with datatree
    - Already like a one-level datatree
    - https://github.com/arviz-devs/arviz/issues/2015
  - JM: https://github.com/spatial-image/spatial-image/pull/8
  - AA: anything need help on datatree?
    - TN: internally it still relies on a library for a tree structure (AnyTree)
      - need to change the internal structures
      - change from structure of node stores tuple of children
      - to something that is more dictionary like
      - unnamed node stores unnamed nodes under keys
    - AA: using fact that node knows its own name
      - variables have & know a name (can be out of sync)
    - TN: subtle: variables have optional names, must have a name if in a Dataset
    - https://github.com/xarray-contrib/datatree/issues/3 --> Aureliana
- Misc
  - V3: https://github.com/pydata/xarray/pull/6475
    - Shouldn't have to worry about formats or about datatree
  - Looking at ome-datatree, Aureliana's open_variable could live in datatree
    - pseudo-public
    - https://github.com/aurghs/ome-datatree/
    - can then build a tree from various variables
    - @@Tom to open an issue
  - Sending report for end of April.

## 2022-03-23

Attending: Mattia, Josh, Alessandro, Aureliana

- FYI: 80 hours to date. (10% of total)
- Tom getting invites? Josh to reach out.
- nczarr/zarr (matt)
  - supports both conventions (in master)
    - see: https://github.com/Unidata/netcdf-c/issues/2252
  - MA: xarray only works with netcdf4 encoding
    - opened issue.
      no one from xarray has responded yet
    - https://github.com/pydata/xarray/issues/6374
    - Ryan: "difficult thing to do"
  - no good doc options for a python user
  - current backends: zarr & netcdf4
  - potential solutions:
    - document netcdf4
    - add nczarr support to zarr
    - add a nczarr backend
  - additional issues
    - define precedence. what happens if both exist.
      - MA: in nczarr you can likely specify it (`#mode=nczarr,xarray`)
    - can you write both from xarray?
  - Mattia: most users use `open_zarr` so they aren't aware of backends.
  - tldr
    - work on PR on standard zarr backend.
    - budgeting about a week of work.
    - goal is to have something explicit (precedence, etc.)
    - hopefully it will encourage someone on the xarray side to engage with us
- datatree (aureliana)
  - Discussion with Matt McCormick: multiscale registration but some issues with dask (using `map_overlap`). Also some interest in `ARRAY_DIMENSIONS`.
  - GeoZarr relationship? No one known.
    - https://github.com/christophenoel/geozarr-spec/blob/main/geozarr-spec.md
    - AA: close to our remote-sensing soul
  - waiting on python code from Matt McCormick

## 2022-03-09

Attending: Josh, Mattia Almansi (@malmans2), Aureliana, Alessandro

- Josh: haven't managed to get in touch with Matt about the use case
  - hackathon early April would be a good time to try out xarray (on brain data)
- Aureliana: it would be good to define tasks
  - last time discussed small prototype for reading ome-zarr in datatree
    - https://github.com/aurghs/ome-datatree
    - needed to disable a netcdf check in xarray
- AA: decision on a few things like where the metadata lives
  - plays a role in what a "netcdf-compliant" zarr will look like
  - which features are useful? etc.
- JM: could start a spec issue showing the prototype
- AA: other deliverables
  - https://twitter.com/rabernat/status/1380205129509523457
- action Josh:
  - update https://github.com/zarr-developers/zarr-specs/issues/125 et al.
- action B-Open:
  - Evaluate with the xarray community whether to implement the NetCDF conventions for dimensions in xarray.
  - ask Tom what we can do for the first release
  - report of effort spent and a plan

## 2022-02-23

- Tom's issue on DataTree (AA)
  - what happens on propagation of operations?
- priority of support with netcdf for ngff
  - datagroup or...
  - specific backend?
  - or (Hoyer) open by variable
    - file format & tree stay the same but API allows creating.
    - open_dataset --> open_variable
  - cf. https://github.com/spatial-image/spatial-image-multiscale/issues/8
  - seealso: https://github.com/spatial-image/spatial-image-ngff/blob/main/spatial_image_ngff.py
  - seealso: https://github.com/spatial-image/spatial-image/blob/main/spatial_image.py
- AA: NGFF spec becomes more like a convention on top of NetCDF
- AA: difference to NC
  - no specific name for the scale
  - to match xarray model, within a group, arrays should have names which identify what's inside (i.e. the variable name) --> `$scale/image`
  - see https://github.com/alexamici/xarray-datagroup `{x2,x4}/data_{1,2}`
- TN: https://twitter.com/carbonplanorg/status/1442559369539817478
- Next steps
  - with agreement on spec, we can move forward with implementation
  - could work on backend... JM: worried about focusing
  - AA: something to actually try them now.
  - what does it take to get datatree done?
    - discussion is crystallizing
    - repo is in a pretty good place
    - no documentation, started recently
    - internal things need fixing
      - abstract tree dependency
    - does what it says, but not ready to be relied on
    - maybe api changes
- data:
  - https://github.com/spatial-image/spatial-image-ngff/blob/main/test_spatial_image_ngff.py
  - https://www.openmicroscopy.org/2021/12/16/ome-ngff.html
  - https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/idr0079A/9836998.zarr (not too big)

## 2022-02-09

Agenda:

- Josh: FYI/ https://github.com/zarr-developers/zarr-specs/issues/73
  - Ryan gets the point: https://github.com/ome/ngff/issues/48#issuecomment-1031482303
- where to add issue to track activities (if not on a zarr repo we can create an ad-hoc repo under bopen)
- ome-ngff and cfconventions
  - possible specific backend to read an ome-zarr file in an xarray tree (or group)
- data-groups and data-tree

Notes:

- Assumptions in [backend](https://github.com/pydata/xarray/blob/main/xarray/backends/zarr.py)
  - `_ARRAY_DIMENSIONS`
  - naming of coordinates associated with dimension
  - with a new backend you can open them now
- Options for the ome spec:
  - using different name for coords (aa: allow not to open with a dataset, but it is less clean)
  - put every array in a separate group (more complex and currently the hierarchical data structure is not ready, so it can't be used immediately)
- https://github.com/thewtex as a potential user / https://github.com/thewtex/spatial-image-multiscale (cf.
  [dataclasses](https://github.com/astropenguin/xarray-dataclasses))
- [DataTree proposal](https://docs.google.com/document/d/19jVW5lL2jwhS0dgj9XqPBrcvIa13cpWrnDsnVLqZkfc/edit)
- [alexamici DataGroup API experiment (xarray hierarchical data structure)](https://github.com/alexamici/xarray-datagroup)
- Misc
  - TN: discussions with Stephan Hoyer
  - JM: desire to find the _right_ way of solving these issues
  - AA: interesting use of groups in cfconventions
- Deliverable 1 reminders
  - Contribute to the hierarchical datasets implementation (“datatree”)
  - Investigate the possibility of sharing metadata between scales
  - Demonstrate usage in downstream libraries (e.g. napari, aicsimageio - see set_level)
  - Submit any conventions that are developed to the central registry
- Additionally
  - Entrypoint with a new backend xtensor-zarr-legacy
  - open_datatree (loops and calls open)
    - Joe is the only user
    - perhaps play there, with a new backend.
  - demo code (blog post)
  - use cases! (very important)
  - xarray _as the_ image.
  - relative paths...
  - isomorphism operations (num. of groups)
    - [map_over_subtree (even as decorator)](https://github.com/TomNicholas/datatree/blob/d9c85b3470bd9698ba74fcdb606216de5949368d/datatree/mapping.py#L106)
    - doesn't work for diagonal relationships, only horizontal

## 2022-01-26

Agenda:

- Where to take notes, hackmd.io?
- Sanket Verma
  - https://github.com/msankeys963 (New community manager)
  - https://github.com/zarr-developers/governance/issues/14
- Use cases for multiscale other than visualisation
  - Any suggestions on applications to look at?
  - We know Geospatial overviews = mostly visualisation
  - Joshua: Deep learning
  - Alessandro: It's more a query API rather than operations
  - https://github.com/napari/napari
    - e.g.
      https://github.com/ome/napari-ome-zarr/
    - sample data: https://www.openmicroscopy.org/2021/12/16/ome-ngff.html
    - spec: https://ngff.openmicroscopy.org/latest/
    - https://www.napari-hub.org/?search=zarr&sort=relevance&page=1
- Multiscale extension: proposal for a more generic approach of relation between datasets
  - CF Conventions define the relations between coordinates in section 8.3 [“Lossy Compression by Coordinate Subsampling” of CF-1.9](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#compression-by-coordinate-subsampling)

```
a.ome.zarr
|_ an_image (group) -- "_ARRAY_DIMENSIONS"
|    .zattrs
|    |_ {"multiscales": [{"datasets": [{"path": "name"}]}]}
|    |_ 0 (hi-res)
|    |_ 1 (next-res)
|_ a_label (group)
     |_ 0 (hi-res)
     |_ 1 (next-res)

xarray -> cfconventions -> netcdf -> HDF5 and Zarr
```

- Josh:
  - chatting with aicsimageio (multi-named coordinates).
  - chatting with Tom.
  - pointing OME-NGFF community at cfconventions.
- B-Open (@g5Y1e6NLQ96ULB-CM9ybUw):
  - try out napari & aicsimageio
  - try looking inside aicsimageio at the data structure they use to keep track of the multiscale relation
  - Talk on aicsimageio: https://www.youtube.com/watch?v=LNa_gGpSnvc
  - look into Tom's datatree: https://github.com/TomNicholas/datatree
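
For reference, the `multiscales` layout from the 2022-01-26 diagram can be walked with a few lines of Python. This is only a minimal sketch, not the OME-NGFF reference implementation: it assumes the group attributes have already been loaded into a plain dict shaped like the `.zattrs` in the diagram. The helper name `resolution_paths` and the paths `"0"` and `"1"` are hypothetical, chosen to mirror the hi-res / next-res arrays in the notes.

```python
import json

# Hypothetical `.zattrs` content for the `an_image` group, shaped like the
# diagram: a list of multiscale entries, each listing its dataset paths.
ZATTRS = json.loads(
    '{"multiscales": [{"datasets": [{"path": "0"}, {"path": "1"}]}]}'
)


def resolution_paths(attrs, index=0):
    """Return the dataset paths of one multiscale entry, in stored order."""
    entries = attrs.get("multiscales", [])
    if not entries:
        # A group without multiscale metadata has no resolution levels.
        return []
    return [dataset["path"] for dataset in entries[index]["datasets"]]


if __name__ == "__main__":
    # List each resolution level relative to the (assumed) group name.
    for level, path in enumerate(resolution_paths(ZATTRS)):
        print(f"level {level}: an_image/{path}")
```

Keeping the lookup independent of any storage library means the same function could be fed attributes read via zarr, xarray, or a raw JSON file, which is one way to prototype the "relation between datasets" idea before committing to a backend.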