---
tags: zarr, b-open, meeting, notes
---
# Zarr/B-Open Bi-weeklies
Zoom link: https://openmicroscopy-org.zoom.us/j/85829696160?pwd=VXZuVzZjclJxUDRyQTB4bXFVaWdpdz09
## 2022-08-03
Attending: JM, MA, AB
- any projects?
- MA: maybe sparse data. Nothing now, but eventually.
- MA: https://github.com/pydata/sparse/issues/222
- https://github.com/zarr-developers/zarr-python/issues/424
- why not?
- consensus & maintenance burden
- https://github.com/pydata/xarray/issues/3213
- AB: open some issues to summarize what's been done
- define needs
- review in September and then move to something concrete
- JM: maybe close the old issues
- netcdf?
- GitHub user jbms was writing up a proposal
- getting that
- NB: https://github.com/zarr-developers/zarr-specs/pull/149
- a lot of focus on ZEP0001 right now
- then we can start looking at future ZEPs:
- netcdf
- sparse
## 2022-07-04
Attending: JM, MA, AB
- JM: opencollective ok? MA: Think so. AB: Don't know.
- AA (slack): first payment is fine
- MA:
- one PR in xarray done: https://github.com/pydata/xarray/pull/6636
- Tom working on a few PRs in datatree. Quiet again. Gave him feedback.
- JM: Ryan Abernathey said it would be merged into mainline
- MA: surprised; wouldn't have expected it before Christmas.
- MA: would mean a few things need cleaning up
- get rid of a few copy-n-pastes
- MA: PR open in datatree (context managers)
- https://github.com/xarray-contrib/datatree/pull/114
- MA: opened PR in Zarr
- https://github.com/zarr-developers/zarr-python/pull/1066
- JA: xarray
- jbms (https://github.com/jbms) may open a PR to support array dimensions in Zarr
- MA: https://github.com/Unidata/netcdf-c/releases/tag/v4.9.0 supports various nczarr things
- looking for bigger drivers. any ongoing activities strictly with Zarr? Not at the moment.
- possibilities / high-level discussion needed with AA
- anything on the OGC front?
- kerchunk? Sure, but what? Really in non-Python languages? (People busy)
- sparse arrays? That could be interesting.
- IPFS (https://ipfs.io/)? distributed filesystems
- helper library for multi-scale images? e.g. geozarr & ome-zarr
- publicity for datatree? get people to start using it.
## 2022-05-19
Attending: MA, JM, AB, TN
- JM: got in touch with Stephan & Ryan
- meeting moving forward. JM to send around time suggestions
- block on B-Open or optional?
- MA: will ask AA
- late is often better (1800 etc. is fine)
- AB: looked at notebook
- package does something similar to the prototype (slightly more complicated)
- looking at datatree. anything to work on?
- TN: 2 refactors remaining
- storing tuples to storing dictionaries (done)
- storing entire datasets to just individual variables (so node acts like dataset)
- vaguely started. have local changes that can be pushed, but doesn't work yet.
- doesn't change the API too much.
- touches on a lot of other issues
- could tag people for review, etc.
- MA: not using it on a daily basis (i.e. please ping)
- plans for using? AB: likely xarray-sentinel
- see: https://github.com/xarray-contrib/datatree/issues/80 (API)
- see: https://github.com/xarray-contrib/datatree/issues/77 (BUG)
- TN: lot of it boils down to "do you want to operate on the node or tree"
- MA: quiet on nczarr front (on v3 as well -- not before the summer)
- TN: helpful to look at backends with trees (do files get closed)
- Joe thinks he's found a bug in encoding, all the way up to xarray
- zipfile? TN: yeah, but likely solved.
- see: https://github.com/xarray-contrib/datatree/issues/89
- see: https://github.com/xarray-contrib/datatree/pull/95 (still failing)
- issue with the NC4 interface
- JM: datatree adoption?
- some, growing
- needs docs
- AB: now or later?
- some now, but need to wait on some things
- there are some plans/ideas for the docs in place
- one getting started page and function docs are basically it
- MA: reports should be ready. 150 hrs.
Actions:
- B-Open should have a look at and provide feedback on:
- https://github.com/xarray-contrib/datatree/issues/80
- B-Open can start working on:
- https://github.com/xarray-contrib/datatree/issues/89
- https://github.com/xarray-contrib/datatree/pull/95
- https://github.com/xarray-contrib/datatree/issues/61
## 2022-05-05
Attending: AA, MA, JM, TN
- for AB / from JM:
- https://github.com/spatial-image/spatial-image-multiscale/blob/0d1458813f10663cc9f0366b132b9c8677ca992b/examples/ConvertTiffFile.ipynb
- xarray: no response
- MA
- putting together report for payment
- nczarr stuff is in place
- zarr v3 in xarray has gone quiet
- potentially will join the meetings
- TN
- already updated the tree implementation (before AB)
- spent a _lot_ of time working on DT over the past few weeks (fits & starts)
- issued a release with the changes. internals refactored. no more anytree.
- may break code that depends on it. changes the node access (more like unix paths)
- see WHATSNEW
- 1 of 2 big refactors (other is to stop storing as Dataset objects)
- MA to tell AB to try it out
- TN: pin CI
- AA: or the breakage is good.
- TN: `.ds` returns an actual dataset. but could return a frozen view
- if you have a node, it's ambiguous if you want an op on just the data or the tree
- .ds API helps to separate that.
- ...josh spaced out a bit...
- TN: another group always wants to refactor out their class for DT
- AA: had sentinel product in netcdf/zarr backend and can save one after the other
- datasets as different groups
- navigable dictionary
- JM: is the plan to eventually return DT from open_dataset?
- TN: less breaking to have open_datatree
- AA: multiscale
- https://github.com/ome/ngff/pull/114
- TN: dataclasses is one suggestion for us to investigate
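The node-access change (more like unix paths) and the idea that `.ds` could return a frozen view can be illustrated with a small pure-Python sketch. The `Node` class, its `ds` property and its path rules are hypothetical and are not datatree's API; `types.MappingProxyType` stands in for a read-only dataset view.

```python
from types import MappingProxyType


class Node:
    """Hypothetical tree node (not datatree's API): own variables plus named children."""

    def __init__(self, variables=None, parent=None):
        self._variables = dict(variables or {})
        self.children = {}
        self.parent = parent

    def add_child(self, name, variables=None):
        child = Node(variables, parent=self)
        self.children[name] = child
        return child

    @property
    def ds(self):
        # Read-only view of *this node's* variables only; children are not
        # included, so operations on .ds cannot touch the rest of the tree.
        return MappingProxyType(self._variables)

    @property
    def root(self):
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def __getitem__(self, path):
        # Unix-like lookup: absolute paths start at the root, relative ones here.
        node = self.root if path.startswith("/") else self
        for part in path.split("/"):
            if part in ("", "."):
                continue
            if part == "..":
                if node.parent is None:
                    raise KeyError(path)  # no parent above the root
                node = node.parent
            else:
                node = node.children[part]  # KeyError if the child is missing
        return node


root = Node({"title": "scene"})
scale0 = root.add_child("0", {"image": [1, 2, 3, 4]})
scale1 = root.add_child("1", {"image": [1, 3]})
print(scale1["../0"] is scale0)  # True
```

Keeping `.ds` limited to the node's own variables is one way to make the node-versus-tree ambiguity explicit: anything done on `.ds` acts on the data only, anything done on the node may act on the tree.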
## 2022-04-20
Attending: Josh Moore, Alessandro Amici, Thomas Nicholas, Mattia Almansi
- nczarr
- MA: PR opened on xarray for nczarr. Overestimated. Not creating extra files; the information is in .zattrs.
- AA: they want to do something different (global variables)
- https://github.com/pydata/xarray/issues/6374
- https://github.com/pydata/xarray/pull/6420
- only difference
- xarray only supports reading of nczarr
- nczarr also writes xarray attributes
- AA: not yet at one standard
- TN: Ryan, Stephan, Joe as candidates
- AA: GDAL implementation is different yet again.
- https://gdal.org/drivers/raster/zarr.html#srs-encoding
- https://github.com/pydata/xarray/issues/6448
- Related to GeoZarr / GDAL
- DataTree
- TN: nothing new
- Someone from Arviz wanted to replace an internal class with datatree
- Already like a one-level datatree
- https://github.com/arviz-devs/arviz/issues/2015
- JM: https://github.com/spatial-image/spatial-image/pull/8
- AA: anything that needs help on datatree?
- TN: internally it still relies on a library for a tree structure (AnyTree)
- need to change the internal structures
- change from a structure where each node stores a tuple of children
- to something that is more dictionary like
- unnamed node stores unnamed nodes under keys
- AA: using fact that node knows its own name
- variables have & know a name (can be out of sync)
- TN: subtle: variables have optional names, but must have a name if in a Dataset
- https://github.com/xarray-contrib/datatree/issues/3 --> Aureliana
- Misc
- V3: https://github.com/pydata/xarray/pull/6475
- Shouldn't have to worry about formats or about datatree
- Looking at ome-datatree, Aureliana's open_variable could live in datatree
- pseudo-public
- https://github.com/aurghs/ome-datatree/
- can then build a tree from various variables
- @@Tom to open an issue
- Sending report for end of April.
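The move from a tuple of children to something dictionary-like, together with the concern that a stored name can drift out of sync with its key, can be sketched as follows. `TreeNode` is a hypothetical class, not datatree's implementation: here a node does not store its own name at all and derives it from the key under which its parent holds it.

```python
class TreeNode:
    """Hypothetical node storing children in a dict keyed by name.

    The node keeps no name of its own; `name` is looked up in the parent's
    dict, so it cannot get out of sync with the key.
    """

    def __init__(self):
        self.parent = None
        self._children = {}

    @property
    def name(self):
        if self.parent is None:
            return None  # the root is unnamed
        for key, child in self.parent._children.items():
            if child is self:
                return key
        raise RuntimeError("node is not registered with its parent")

    def __getitem__(self, key):
        return self._children[key]

    def __setitem__(self, key, child):
        if child.parent is not None:
            # Detach from the previous parent so each node has one location.
            del child.parent._children[child.name]
        child.parent = self
        self._children[key] = child

    def rename(self, old, new):
        if new in self._children:
            raise KeyError(f"child {new!r} already exists")
        self._children[new] = self._children.pop(old)
```

With a tuple of children that each carry a `name` attribute, a rename has to update the attribute and any lookup structure by hand; with the key as the single source of truth, `rename` only touches the parent's dict.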
## 2022-03-23
Attending: Mattia, Josh, Alessandro, Aureliana
- FYI: 80 hours to date. (10% of total)
- Tom getting invites? Josh to reach out.
- nczarr/zarr (matt)
- supports both conventions (in master)
- see: https://github.com/Unidata/netcdf-c/issues/2252
- MA: xarray only works with netcdf4 encoding
- opened issue. no one from xarray has responded yet
- https://github.com/pydata/xarray/issues/6374
- Ryan: "difficult thing to do"
- no good doc options for a python user
- current backends: zarr & netcdf4
- potential solutions:
- document netcdf4
- add nczarr support to zarr
- add a nczarr backend
- additional issues
- define precedence. what happens if both exist.
- MA: in nczarr you can likely specify it (`#mode=nczarr,xarray`)
- can you write both from xarray?
- Mattia: most users use `open_zarr` so they aren't aware of backends.
- tldr
- work on PR on standard zarr backend.
- budgeting about a week of work.
- goal is to have something explicit (precedence, etc.)
- hopefully it will encourage someone on the xarray side to engage with us
- datatree (Aureliana)
- Discussion with Matt McCormick: multiscale registration but some issues with dask (using `map_overlap`). Also some interest in `ARRAY_DIMENSIONS`.
- GeoZarr relationship? No one known.
- https://github.com/christophenoel/geozarr-spec/blob/main/geozarr-spec.md
- AA: close to our remote-sensing soul
- waiting on python code from Matt McCormick
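One way to make the precedence explicit when an array carries both conventions is a small resolver like the one below. This is a hypothetical sketch: it assumes the dimension names have already been extracted from the `_ARRAY_DIMENSIONS` attribute and from the NCZarr metadata (how NCZarr stores them is not shown), and `resolve_dims` with its `prefer`/`strict` parameters is illustrative only.

```python
from typing import Optional, Sequence, Tuple


def resolve_dims(
    xarray_dims: Optional[Sequence[str]],
    nczarr_dims: Optional[Sequence[str]],
    prefer: str = "xarray",
    strict: bool = False,
) -> Tuple[str, ...]:
    """Hypothetical: pick dimension names when both conventions may be present.

    xarray_dims: names from `_ARRAY_DIMENSIONS`, or None if absent.
    nczarr_dims: names already extracted from NCZarr metadata, or None.
    """
    if prefer not in ("xarray", "nczarr"):
        raise ValueError(f"unknown preference: {prefer!r}")
    if xarray_dims is None and nczarr_dims is None:
        raise ValueError("no dimension information in either convention")
    if xarray_dims is None:
        return tuple(nczarr_dims)
    if nczarr_dims is None:
        return tuple(xarray_dims)
    # Both exist: in strict mode a disagreement is an error, not a silent pick.
    if strict and tuple(xarray_dims) != tuple(nczarr_dims):
        raise ValueError(f"conflicting dimension names: {xarray_dims} vs {nczarr_dims}")
    return tuple(xarray_dims if prefer == "xarray" else nczarr_dims)
```

Failing loudly in `strict` mode, rather than quietly choosing one source, is one way to meet the goal of being explicit about what happens when both exist.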
## 2022-03-09
Attending: Josh, Mattia Almansi (@malmans2), Aureliana, Alessandro
- Josh: haven't managed to get in touch with Matt about the use case
- hackathon early April would be a good time to try out xarray (on brain data)
- Aureliana: it would be good to define tasks
- last time discussed small prototype for reading ome-zarr in datatree
- https://github.com/aurghs/ome-datatree
- needed to disable a netcdf check in xarray
- AA: decision on a few things like where the metadata lives
- plays a role in what a "netcdf-compliant" zarr will look like
- which features are useful? etc.
- JM: could start a spec issue showing the prototype
- AA: other deliverables -
- https://twitter.com/rabernat/status/1380205129509523457
- action Josh:
- update https://github.com/zarr-developers/zarr-specs/issues/125 et al.
- action B-Open:
- Evaluate with the xarray community implementing the netCDF conventions for dimensions in xarray.
- ask Tom what can we do for the first release
- report of effort spent and a plan
## 2022-02-23
- Tom's issue on DataTree (AA)
- what happens on propagation of operations?
- priority of support with netcdf for ngff
- datagroup or...
- specific backend?
- or (Hoyer) open by variable
- file format & tree stay the same but API allows creating.
- open_dataset --> open_variable
- cf. https://github.com/spatial-image/spatial-image-multiscale/issues/8
- see also: https://github.com/spatial-image/spatial-image-ngff/blob/main/spatial_image_ngff.py
- see also: https://github.com/spatial-image/spatial-image/blob/main/spatial_image.py
- AA: NGFF spec becomes more like a convention on top of NetCDF
- AA: difference to NC - no specific name for the scale
- to match xarray model, within a group, arrays should have names which identified what's inside (i.e. the variable name) --> `$scale/image`
- see https://github.com/alexamici/xarray-datagroup `{x2,x4}/data_{1,2}`
- TN: https://twitter.com/carbonplanorg/status/1442559369539817478
- Next steps
- with agreement on spec, we can move forward with implementation
- could work on backend... JM: worried about focusing
- AA: something to actually try them now.
- what does it take to get datatree done?
- discussion is crystallizing
- repo is in a pretty good place
- no documentation, started recently
- internal things need fixing
- abstract tree dependency
- does what it says, but not ready to be relied on
- maybe api changes
- data:
- https://github.com/spatial-image/spatial-image-ngff/blob/main/test_spatial_image_ngff.py
- https://www.openmicroscopy.org/2021/12/16/ome-ngff.html
- https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.3/idr0079A/9836998.zarr (not too big)
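The `$scale/image` naming discussed above, where each scale level becomes a group holding a named variable (compare `{x2,x4}/data_{1,2}` in xarray-datagroup), can be sketched in plain Python. Both functions are hypothetical, the arrays are stand-in values, and no xarray or datatree call is implied.

```python
from collections import defaultdict
from typing import Any, Dict


def ome_scales_to_variables(scales: Dict[str, Any], var_name: str = "image") -> Dict[str, Any]:
    """Hypothetical: give each OME scale path ('0', '1', ...) a named variable."""
    return {f"{path}/{var_name}": array for path, array in scales.items()}


def group_variables(flat: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split 'group/variable' keys into {group: {variable: value}}.

    Keys without a '/' land in the root group, represented here by ''.
    """
    groups: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for path, value in flat.items():
        group, _, var = path.rpartition("/")
        groups[group][var] = value
    return dict(groups)
```

For example, `group_variables({"x2/data_1": a, "x2/data_2": b, "x4/data_1": c})` yields one group per scale, each holding named variables, which is the shape a dataset-per-group model expects.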
## 2022-02-09
Agenda:
- Josh: FYI/ https://github.com/zarr-developers/zarr-specs/issues/73
- Ryan gets the point: https://github.com/ome/ngff/issues/48#issuecomment-1031482303
- where to add issue to track activities (if not on a zarr repo we can create an ad-hoc repo under bopen)
- ome-ngff and cfconventions
- possible specific backend to read an ome-zarr file in an xarray tree (or group)
- data-groups and data-tree
Notes:
- Assumptions in [backend](https://github.com/pydata/xarray/blob/main/xarray/backends/zarr.py)
- `_ARRAY_DIMENSIONS`
- naming of coordinates associated with dimension
- with a new backend you can open them now
- Option for the ome spec:
- using different name for coords (aa: allow not to open with a dataset, but it is less clean)
- put every array in a separate group (more complex, and currently the hierarchical data structure is not ready, so it can't be used immediately)
- https://github.com/thewtex as a potential user / https://github.com/thewtex/spatial-image-multiscale (cf. [dataclasses](https://github.com/astropenguin/xarray-dataclasses))
- [DataTree proposal](https://docs.google.com/document/d/19jVW5lL2jwhS0dgj9XqPBrcvIa13cpWrnDsnVLqZkfc/edit)
- [alexamici DataGroup API experiment (xarray hierarchical data structure)](https://github.com/alexamici/xarray-datagroup)
- Misc
- TN: discussions with Stephan Hoyer
- JM: desire to find the _right_ way of solving these issues
- AA: interesting use of groups in cfconventions
- Deliverable 1 reminders
- Contribute to the hierarchical datasets implementation (“datatree”)
- Investigate the possibility of sharing metadata between scales
- Demonstrate usage in downstream libraries (e.g. napari, aicsimageio - see set_level)
- Submit any conventions that are developed to the central registry
- Additionally
- Entrypoint with a new backend xtensor-zarr-legacy
- open_datatree (loops and calls open) - Joe is the only user
- perhaps play there, with a new backend.
- demo code (blog post)
- use cases! (very important)
- xarray _as the_ image.
- relative paths...
- isomorphism operations (num. of groups)
- [map_over_subtree (even as decorator)](https://github.com/TomNicholas/datatree/blob/d9c85b3470bd9698ba74fcdb606216de5949368d/datatree/mapping.py#L106)
- doesn't work for diagonal relationships, only horizontal
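The `map_over_subtree` idea and its horizontal-only limitation can be sketched with trees represented as flat `{path: data}` dictionaries. This is a hypothetical simplification, not datatree's implementation; in particular, "isomorphic" here means "same set of node paths", which may be stricter than datatree's own check.

```python
import functools
from typing import Any, Callable, Dict

Tree = Dict[str, Any]  # hypothetical: flat mapping of node path -> node data


def check_isomorphic(*trees: Tree) -> None:
    """Raise unless all trees have exactly the same node paths."""
    first = set(trees[0])
    for other in trees[1:]:
        if set(other) != first:
            raise ValueError("trees do not have the same node paths")


def map_over_subtree(func: Callable[..., Any]) -> Callable[..., Tree]:
    """Decorator: apply `func` node by node across isomorphic trees."""

    @functools.wraps(func)
    def wrapper(*trees: Tree, **kwargs: Any) -> Tree:
        check_isomorphic(*trees)
        # Each call sees only the nodes at one path in every tree
        # ("horizontal"); a node is never combined with a node at a
        # different path ("diagonal"), e.g. scale 0 with scale 1.
        return {path: func(*(t[path] for t in trees), **kwargs) for path in trees[0]}

    return wrapper


@map_over_subtree
def add(a, b):
    return a + b
```

Combining different levels of the same tree (say, comparing each scale with the next one) falls outside this pattern, which matches the limitation noted above.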
## 2022-01-26
Agenda:
- Where to take notes, hackmd.io?
- Sanket Verma
- https://github.com/msankeys963 (New community manager)
- https://github.com/zarr-developers/governance/issues/14
- Use cases for multiscale other than visualisation
- Any suggestions on applications to look at?
- We know Geospatial overviews = mostly visualisation
- Joshua: Deep learning
- Alessandro: It's more a query API than operations
- https://github.com/napari/napari
- e.g. https://github.com/ome/napari-ome-zarr/
- sample data: https://www.openmicroscopy.org/2021/12/16/ome-ngff.html
- spec: https://ngff.openmicroscopy.org/latest/
- https://www.napari-hub.org/?search=zarr&sort=relevance&page=1
- Multiscale extension: proposal for a more generic approach of relation between datasets
- CF Conventions define the relations between coordinates in section 8.3 [“Lossy Compression by Coordinate Subsampling” of CF-1.9](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#compression-by-coordinate-subsampling)
```
a.ome.zarr
|_ an_image (group) -- "_ARRAY_DIMENSIONS"
| .zattrs
|_ {"multiscales": [{"datasets": [{"path": "name"}]}]}
|_ 0 (hi-res)
|_ 1 (next-res)
|_ a_label (group)
|_ 0 (hi-res)
|_ 1 (next-res)
xarray
-> cfconventions
-> netcdf
-> HDF5 and Zarr
```
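The metadata in the diagram above can be generated as plain JSON. This is a simplified sketch: `layout_attrs` is a hypothetical helper, it only emits the `datasets`/`path` part of `multiscales` shown in the diagram (the OME-NGFF spec expects further fields such as `axes` and `version`), and it places `_ARRAY_DIMENSIONS`, the attribute xarray's zarr backend uses for dimension names, on each resolution level.

```python
import json
from typing import Dict, List


def layout_attrs(levels: List[str], dims: List[str]) -> Dict[str, dict]:
    """Hypothetical: .zattrs content per relative path for one image group.

    Simplified: only `datasets`/`path` of the OME-NGFF `multiscales` entry.
    """
    attrs = {".zattrs": {"multiscales": [{"datasets": [{"path": p} for p in levels]}]}}
    for p in levels:
        # Each resolution level is an array; xarray reads its dimension
        # names from the `_ARRAY_DIMENSIONS` attribute.
        attrs[f"{p}/.zattrs"] = {"_ARRAY_DIMENSIONS": list(dims)}
    return attrs


for path, content in layout_attrs(["0", "1"], ["y", "x"]).items():
    print(path, json.dumps(content))
```

Keeping both kinds of metadata side by side is what lets one group be read as an OME-NGFF multiscale image and, level by level, as xarray variables.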
- Josh:
- chatting with aicsimageio (multi-named coordinates).
- chatting Tom.
- pointing OME-NGFF community at cfconventions.
- B-Open (@g5Y1e6NLQ96ULB-CM9ybUw):
- try out napari & aicsimageio
- try looking inside aicsimageio to the data structure they use to keep track of the multiscale relation
- Talk on aicsimageio: https://www.youtube.com/watch?v=LNa_gGpSnvc
- look into Tom's datatree: https://github.com/TomNicholas/datatree