Zarr Bi-weekly Community Calls

Check out the website for previous meeting notes and other information: https://zarr.dev/community-calls/

Joining instructions: https://zoom.us/j/300670033 (password: 558943)
GitHub repo: https://github.com/zarr-developers/community-calls
Previous notes: https://j.mp/zarr-community-1

2025-06-25

Attending: Ward Fisher (WF), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB)

Meeting Minutes:

DB: https://github.com/zarr-developers/zarr-python/pull/2874 — got merged!
- Next steps:
  - Fix the codecs situation — unify them
  - Stop depending on numcodecs
  - Variable size chunks
- Other things:
  - Indexing issues
  - NVIDIA folks are interested in how GPU support can be added to the Zarr-Python codebase
- SV: New data type addition is also included in DB's PR
- DB: Also been working on the new data types stuff
WF: Unidata is back in operation. We are also collaborating with DKRZ (German Climate Computing Center) on some proposed ncZarr work they would like to see (consolidated metadata, amongst other things).
- There have been questions from DKRZ devs re: functionality in various Zarr Python packages that deviate from the specification (v2, specifically).
DB: Having code samples in the Zarr-Python repository, using PEP 723
DB: Found a funny bug in Zarr-Python 2
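
For the PEP 723 item above, a minimal sketch of what inline script metadata looks like and how a runner could locate it. The sample script, its dependency pins, and the helper name `extract_script_metadata` are illustrative assumptions, not taken from the Zarr-Python repository, and the regular expression is a simplified version of the one the PEP defines.

```python
import re
from typing import Optional

# A hypothetical code sample as it might be stored in a repository.
# The "# /// script" ... "# ///" comment block is the PEP 723 inline metadata.
SAMPLE = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "zarr>=3",
#   "numpy",
# ]
# ///
import zarr

print(zarr.__version__)
'''

# Simplified matcher for the "script" block (the PEP specifies a stricter regex).
BLOCK = re.compile(r"^# /// script\n((?:#(?: .*)?\n)+?)# ///$", re.MULTILINE)


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text embedded in the script block, or None if absent."""
    match = BLOCK.search(source)
    if match is None:
        return None
    lines = match.group(1).splitlines()
    # Strip the leading "# " (or bare "#") from every metadata line.
    return "\n".join(line[2:] if line.startswith("# ") else line[1:] for line in lines)


toml_text = extract_script_metadata(SAMPLE)
print(toml_text)
```

A PEP 723-aware runner reads this block to build an environment before executing the script, so each sample can declare its own dependencies without a separate requirements file.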

2025-06-11

Attending: Josh Moore (JM), Gábor Kovács (GK)

GK: Any obvious issues to get involved in?
- JM: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/zarr.2Eload.20deletes.20data/with/522616083
- Other suggestions welcome.
People still interested in the meeting? New time? Monthly?

2025-05-28

Attending: Josh Moore (JM)

No show.

2025-04-16

Attending: Josh Moore (JMo), Eric Perlman (EP), Justus Magin (JMa), Gábor Kovács (GK)

TL;DR:

Updates:

Meeting Minutes:

JMa: "signed URLs"
- EP: looking at them for raw data
- e.g. HTTP parameters NOT the AWS thing
- JMo: bug? JMa: Zarr is oversimplifying the use of paths
- Planetary computer pulls an access token, appends to any URL (Zarr or Geo-TIFF or …)
- EP: would try hacking it in in a little fsspec wrapper
- JMa: use-case – trying to access something S3-like that needs parameters
- JMo: does obstore "do the right thing"?
- JMa: Kyle has something that will work for planetary computer, but that's just one endpoint
- EP: with shards can almost use proper signed URLs
  - JMo: will likely require Virtualizarr
  - also: https://earthmover.io/blog/announcing-flux
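
The "signed URLs" discussion above is about access tokens passed as HTTP query parameters. A minimal sketch of the problem and of the kind of small wrapper EP mentions: if a store joins keys onto a root URL as plain strings, the token must be re-attached to the query string of every request. The host, token, and the function name `append_token` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_token(url: str, token_query: str) -> str:
    """Return `url` with the key/value pairs of `token_query` added to its query string."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += parse_qsl(token_query.lstrip("?"), keep_blank_values=True)
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical store root and token; real tokens come from the data provider.
root = "https://example.invalid/container/data.zarr"
token = "sv=2021-06-08&sig=abc123"

for key in ("zarr.json", "temperature/c/0/0"):
    # The key is joined onto the path first, then the token goes into the query.
    print(append_token(f"{root}/{key}", token))
```

The design point is that the token belongs to the query component, not the path, so it has to be applied after path joining rather than being part of the root string.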
JMa: sparse arrays
- looked at binsparse
  - which decomposes into one dimensional arrays
  - another level of nesting
- JMo: good format but need library support like .chunks. need to be aware of the metadata.
- JMa: encoding the sparseness per chunk?
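
To illustrate "decomposes into one dimensional arrays" and the idea of encoding sparseness per chunk, here is a minimal sketch of a coordinate (COO) decomposition of one dense chunk. The key names `row`, `col`, and `values` are illustrative and are not claimed to match the binsparse specification.

```python
def to_coo(dense, fill_value=0):
    """Decompose a dense 2-D chunk into three 1-D lists (row, col, values)."""
    row, col, values = [], [], []
    for i, line in enumerate(dense):
        for j, v in enumerate(line):
            if v != fill_value:
                row.append(i)
                col.append(j)
                values.append(v)
    return {"shape": (len(dense), len(dense[0])), "row": row, "col": col, "values": values}


def to_dense(coo, fill_value=0):
    """Rebuild the dense chunk from its 1-D components."""
    n_rows, n_cols = coo["shape"]
    dense = [[fill_value] * n_cols for _ in range(n_rows)]
    for i, j, v in zip(coo["row"], coo["col"], coo["values"]):
        dense[i][j] = v
    return dense


chunk = [
    [0, 0, 7, 0],
    [0, 0, 0, 0],
    [3, 0, 0, 5],
]
coo = to_coo(chunk)
print(coo)
```

Each of the three lists could be stored as its own one-dimensional array, which is where the extra level of nesting and the need for metadata-aware library support come from.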

2025-04-02

Attending: Davis Bennett (DB), Sanket Verma (SV), Eric Perlman (EP), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)

TL;DR:

Updates:

Version policy update: https://github.com/zarr-developers/zarr-python/pull/2910
- Blog post PR: https://github.com/zarr-developers/blog/pull/67
Obstore-based store implementation PR has landed: https://github.com/zarr-developers/zarr-python/pull/1661
Zarr-Python V2 Support release 2.18.5 took place last week! Thanks, David!

Meeting Minutes:

DB: Working to add support for V3 and Tensorstore in Pydantic Zarr
- Also to add group support in Pydantic for Tensorstore
- Appreciate the results by reading and writing in Tensorstore, i.e. returns an object
DB: elaborates on the version policy change
- DB: EffVer - mostly a function of the effort put in by users
- JMS:
DB: https://github.com/zarr-developers/numcodecs/issues/686 - formalise old and new styles of JSON serialisation
- DB: Numcodecs doesn't interoperate well with Zarr-Python; there's also code in Cython which only a handful of folks can maintain
MS: There have been great developments in the Zarr ecosystem, but things have been moving so fast that I worry it will start to proliferate. It's difficult to keep track of all the things.
- GDAL: https://lists.osgeo.org/pipermail/gdal-dev/2025-April/060414.html
- List of EOPF product samples publicly available from the EOPF s3 public bucket: https://eopf-public.s3.sbg.perf.cloud.ovh.net/product.html
- https://cpm.pages.eopf.copernicus.eu/eopf-cpm/main/PSFD/4-storage-formats.html
EP: Most of the Jackson Lab data is now in sharded V3 format
- EP helped with the conversion and Eric Ratamero led the effort

2025-03-19

Attending: Davis Bennett (DB), Abhiram Reddy (AR), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

AR had GSoC questions
https://github.com/zarr-developers/zarr-python/pull/2910 - at least a social media announcement would be good, blog post ++ (the more the merrier)
DB: Dtype addition to Zarr-Python (https://github.com/zarr-developers/zarr-python/pull/2874)
- JMS: Would the data type be mapped to 1-1?
- DB: Currently support NumPy and CuPy datatypes
- DB: More tests needed to handle endianness, need to change the API too
JMS: Norman registered new codecs—how are they gonna work in Zarr-Python 3?
- DB: We haven't made a final decision on that

2025-02-19

Attending: Josh Moore (JM), Sanket Verma (SV), Michael Sumner (MS), Davis Bennett (DB), Eric Perlman (EP)

TL;DR:

Updates:

Zarr participating in GSoC this year, ideas list: https://github.com/numfocus/gsoc/blob/master/2025/ideas-list.md
ZEP9 Draft published — https://zarr.dev/zeps/draft/ZEP0009.html.
- Follow up PRs in zarr-specs:
  - https://github.com/zarr-developers/zarr-specs/pull/330
  - https://github.com/zarr-developers/zarr-specs/pull/331
- Setting up https://zarr.dev/extensions
EP: Jackson lab will be converting the datasets to sharded Zarr V3 arrays
SV: https://2025.pycon.de/talks/ABWHSD/ — speaking at PyCon DE 2025!

Open agenda (add here 👇🏻):

MS using Pizzarr for their work
- GDAL and Pizzarr work for virtual references as well
- Has datasets in HDF5 and NetCDF - both have their peculiarities
JM: gives an overview of ZEP9
EP: Jackson lab data conversion
- JM: Any benefits in performance?
- EP:
- EP: Sticking with OMERO 5D arrays
- MS: Public link for the data
- EP: https://images.jax.org/webclient/userdata/?experimenter=-1 (can be added https://zarr.dev/datasets)
- JM: could also add https://ome.github.io/ome2024-ngff-challenge/
- JM: Zarrs (Rust implementation) would be useful
- EP: Solely using OMERO but could pass the URL to Neuroglancer to view it
EP: Cloudflare is potentially working with OS projects and giving them reasonable tier prices
EP: Raw 10TB nbytes - how do you convert it to sharded V3 array?
- DB: Using np.memmap might be useful
- JM: Can also use Kerchunk
- JM: Can use Tensorstore to convert the data as well
- DB: Could potentially use Zarr-Python - but it is 10x slower
- JM: https://github.com/LDeakin/zarr_benchmarks
- JM: Satra ran into memory issues with Tensorstore: https://github.com/ome/ome2024-ngff-challenge/issues/83
- JM: Best way to shard large arrays - a good GSoC project
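
DB's memory-mapping suggestion for the raw 10TB conversion can be sketched as follows: map the file, then walk it one shard-sized slab at a time so that only one slab is ever resident. This sketch uses the standard-library `mmap` module as a stand-in for `np.memmap` so it runs without NumPy; the function name, the 2-D row-major layout, and the tiny demo file are assumptions, and the step that writes each slab to a sharded array is left out.

```python
import mmap
import os
import tempfile
from array import array


def iter_row_slabs(path, n_rows, n_cols, itemsize, rows_per_slab):
    """Yield (start_row, raw_bytes) for consecutive row slabs of a C-ordered 2-D array."""
    row_bytes = n_cols * itemsize
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for start in range(0, n_rows, rows_per_slab):
            stop = min(start + rows_per_slab, n_rows)
            # Slicing the map copies only this slab into memory.
            yield start, mm[start * row_bytes : stop * row_bytes]


# Tiny stand-in for the raw file: a 4 x 3 array of unsigned 16-bit integers.
values = array("H", range(12))
with tempfile.NamedTemporaryFile(delete=False, suffix=".raw") as tmp:
    tmp.write(values.tobytes())

slabs = list(
    iter_row_slabs(tmp.name, n_rows=4, n_cols=3, itemsize=values.itemsize, rows_per_slab=3)
)
os.remove(tmp.name)

for start, raw in slabs:
    print(start, len(raw))
```

Choosing `rows_per_slab` as a multiple of the shard height means each slab maps onto whole shards, so every shard is written exactly once.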
JM: https://github.com/asdf-format/asdf-standard
- https://github.com/asdf-format/asdf-zarr
EP: Zarr being used in bio space - Folks at the Allen are looking to submit a proposal at SciPy 2025
- JM: Francesc Alted did a nice presentation at SciPy 2023: https://youtu.be/0GX5nDqUUZE?si=WvE6asx5zjtrBcHI

2025-02-05

Notes TBA

2025-01-22

Attending: Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Gábor Kovács (GB)

TL;DR:

Updates:

Zarr-Python 3 released on January 9th, 2025!
- Blog post: https://zarr.dev/blog/zarr-python-3-release/

Open agenda (add here 👇🏻):

N5
- JM: There'll be new release to add support for N5 in Zarr-Python 3
- EP: They can leverage sharding and other useful features
Zarr-Python 3
- DB: Gave a presentation on Zarr-Python 3 at Allen Institute
- DB: Realised some issues in Zarr-Python 2 when listing groups
- JMS: Because ZP 2 listing processes were taking place in parallel
- DB: In the Zarr V2 spec there's nothing that says that groups and arrays are different
- JMS: Looked at the spec as well as the implementation when working with my implementation
JMS: Added support for ZEP8 in Tensorstore
- PR: https://github.com/google/neuroglancer/pull/696
- JM: Too many files, JMS: Includes linting changes and test files
- JMS: The path resolution for Zarr V3 in Neuroglancer: https://host/path/to/n5/group/|n5:path/to/array/ would look for array and not go up in the group directory
- JM: The searching is mostly top down in OME land but we should work towards more completeness
- JMS: Also, planning to add Icechunk support to Neuroglancer
Discussion on URL for Neuroglancer
- Deciding the right characters to use
- Tricky to decide the right URL
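
The example URL quoted above chains two stages with a `|` separator. A minimal sketch of splitting such a pipeline into (scheme, path) stages follows; the function name is hypothetical and this is not Neuroglancer's actual parser. It also shows why the choice of separator character matters: any URL that legitimately contains `|` would be split incorrectly.

```python
def split_pipeline(url):
    """Split a '|'-separated URL pipeline into (scheme, rest) stages."""
    stages = []
    for part in url.split("|"):
        scheme, sep, rest = part.partition(":")
        if not sep:
            raise ValueError(f"stage {part!r} has no scheme")
        stages.append((scheme, rest))
    return stages


url = "https://host/path/to/n5/group/|n5:path/to/array/"
for scheme, rest in split_pipeline(url):
    print(scheme, rest)
```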

2025-01-09

Attending: Josh Moore, Eric Perlman, Sanket Verma, Joe Hamman, Davis Bennett, Gábor Kovács, Dennis Heimbigner, Thomas Nicholas, Jeremy Maitin-Shepard

TL;DR:

Updates:

Happy New Year!
Zarr-Python v3.0.0-rc.1 and rc.2 out now! Full release coming up tomorrow!
Zarr has a wikipedia page now! — https://en.wikipedia.org/wiki/Zarr_(data_format)
A group of Zarr-Python devs are at AMS next week including Joe and Ryan
CFP for SciPy 2025 are open: https://www.scipy2025.scipy.org/

Open agenda (add here 👇🏻):

EP: the month wait was good to get other projects like napari up-to-speed
DB: reached out to people using n5 in python. They weren't pinning to zarr-python<3. Sent an email. No response. EP anyone? No. Using Zarr.
JH: Virtualizarr ready for 3.0.0? Failing test (xarray?) but Matt is looking at it.
- TN: Kerchunk doesn't support zarr-python 3.x (API usage)
- without kerchunk: fits & netcdf won't work.
- lose access to anything in the future (in-progress HDF4)
- JH: requires rethinking of MultiZarrToZarr logic
- TN: Doesn't directly interact with zarr-python v3. But want to (to use the v2 to v3 compat objects)
- JH: Would be good to unblock the ZEP process and get the ZSC behind the changes - it's confusing to see the ZSTD codec in Zarr-Python 3.0 and not in the spec
- JM: I'll get the ZSC to respond on the long-standing issues
- DH: https://github.com/Unidata/netcdf-c/pull/3068 (Ward will get to reviewing it)
- DB: Sample V3 sharded data: https://github.com/d-v-b/zarr-workbench/tree/main/v3-sharding-compat/data/zarr-3
- JMS: Planning to add Icechunk support to Tensorstore

2024-12-11

Attending: Eric Perlman (EP), Sanket Verma (SV), Gábor Kovács (GB), Ward Fisher (WF), Davis Bennett (DB), Camille Teicheira (CT), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

Zarr is on BlueSky — follow us https://bsky.app/profile/zarr.dev
Norman Rzepka has joined Zarr Steering Council — https://zarr.dev/blog/steering-council-update-2024/! Welcome Norman!
A group of Zarr-Python devs are at AGU this week including Joe and Ryan
Zarr-Python V3 release before holidays!
- DB has a bunch of PRs coming in soon!
- Planning to expose sharding in a user-friendly way
WF: Had meetings with Florian Ziemann - putting up a PR for V2 consolidated metadata

Open agenda (add here 👇🏻):

Intros w/ favourite places for holidays
- Sanket — into Himalayas
- Ward - near Colorado
- Davis — Italy
- Eric - coming to India soon!
- Gábor — Canada for Skiing
- Camille - tech lead at https://www.sofarocean.com/ - has a lot of weather NetCDF data
DB: Sharding chunk sizes: can we allow imperfect partitioning of the shard shape?
- JMS: Would be possible to support, but with the current config you have a regular grid for shards and chunks, also resizing would be difficult
- JMS: Could also be based on user preference
- DB: The sharding spec doesn't specifically say anything about the shape — so how and where should we define it?
- JMS: The non-regular/partial chunks would not compose across shards
- DB: I see! The proposal is off the table then!
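
The regular-grid constraint JMS describes can be made concrete with a small check: along every axis the chunk size must divide the shard size exactly, otherwise partial chunks appear at shard edges and no longer line up across shards. The function name is hypothetical; this is a sketch of the constraint, not code from any Zarr implementation.

```python
from math import prod


def chunks_per_shard(shard_shape, chunk_shape):
    """Return the number of chunks along each axis of a shard, or raise if the grid is irregular."""
    if len(shard_shape) != len(chunk_shape):
        raise ValueError("shard and chunk shapes must have the same rank")
    counts = []
    for axis, (shard, chunk) in enumerate(zip(shard_shape, chunk_shape)):
        if shard % chunk != 0:
            raise ValueError(
                f"axis {axis}: chunk size {chunk} does not evenly divide shard size {shard}"
            )
        counts.append(shard // chunk)
    return tuple(counts)


grid = chunks_per_shard((1024, 1024), (256, 128))
print(grid, prod(grid))  # (4, 8) 32
```

With an imperfect partition such as a shard size of 1000 and a chunk size of 256, the call raises, which is the case the proposal would have had to define.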

2024-11-27

Attending: Sanket Verma (SV), Eric Perlman (EP), Davis Bennett (DB), Josh Moore (JM), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

Zarr-Python V3 release in first week of December
New numcodecs release (includes fixes and improvements)
- Check here: https://github.com/zarr-developers/numcodecs/releases/tag/v0.14.1
Zarrs-Python:
- Check: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/Announcing.20zarrs-python!

Open agenda (add here 👇🏻):

DB: OME-NGFF hackathon update:
- They worked on a Python library which will render DB's library obsolete - good news for DB as he doesn't need to maintain it!
- https://github.com/BioImageTools/ome-zarr-models-py
- EP: John Bogoviç made good progress on Zarr Java (in the NGFF land)
  - FIJI being able to open Zarr V3
DB: zarr.open() and zarr.create() are confusing - instead we could have zarr.create_array() or zarr.create_group() to make things clear
- DB: Norman has a PR and he's also experimenting to have a Zarr sharded create routine
- JM: Would be cool if zarr.open() could figure out if it's an array or a group
Zarrs-Python
- JM: The nomenclature could've been better! A bit confusing.
- DB: Major scope is to improve the IO
JM: https://bsky.app/profile/zarr.dev
- Follow us!
https://github.com/zarr-developers/zarr-specs/pull/311
- JMS: The only part which requires spec is the URL
- JM: I care about the internal directory structure
- DB: The folks I spoke to at the OME hackathon want the drag-and-drop feature
- JM: In ZP land

2024-11-13

Attending: Davis Bennett (DB), Eric Perlman (EP), Josh Moore (JM), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

meetings (Josh)
- conversation
  - zarr-python: going strong (weekly)
  - ZEP: (not great for Dennis)
    - one off as necessary
  - community: combine into ZEP.
    - or vice versa
  - office hours: likely to end
  - run a doodle
  - People
    - Dennis: not before 10am MST
    - Jeremy: ZEP meetings are less critical at the moment
- Decisions/TODOs
  - Only drop ZEP for the moment. Re-evaluate community frequency & time later. (cf. napari timezones)
  - Josh: remove ZEP calendar entry
  - Josh: update on Zulip (anywhere else?)
Davis: was "zarr.json" a mistake?
- Josh: good question. benefits were:
  - only one GET (or HEAD) rather than needing a frequent 404
  - non-hidden files with proper file-ending
- Davis: true. just have a pattern now where want to iterate and bottom out on arrays
- Jeremy: often need to load the json anyway
  - or storage transformers that are needed
- Dennis: preference is to have the directories marked (e.g., in the name)
  - price is mostly paid with large numbers of groups/arrays
  - cf. consolidated metadata – locating all objects before reading them
  - maybe be able to recognize that by name
  - Davis: give it its own document?
  - Josh: like .zmetadata; downside is it requires an extra GET but that's ok.
  - Davis: using _nc for all attributes now. destroyed lazy evaluation of attributes. seemed to be the way that people were going. have to have a good use-case for a separate file (see S3 and performance issues)
  - Josh: could additionally gzip the extra file. benchmarking?
  - Dennis: how big can the total metadata get? (in characters)
  - Davis: maybe tabular data.
  - Dennis: in NetCDF, lot of use of groups (1000s) as namespaces.
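
The "iterate and bottom out on arrays" pattern Davis mentions can be sketched for a Zarr v3 directory layout: every node has a single `zarr.json` whose `node_type` field is either `"group"` or `"array"`, so one read per node is enough to decide whether to descend. The helper names and the demo hierarchy are assumptions, and the `zarr.json` documents written here are trimmed to the two fields the walk needs.

```python
import json
import tempfile
from pathlib import Path


def find_arrays(root):
    """Return the relative paths of all arrays below `root` (Zarr v3 layout)."""
    root = Path(root)
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        meta_file = node / "zarr.json"
        if not meta_file.is_file():
            continue  # not a Zarr v3 node, e.g. a chunk directory
        node_type = json.loads(meta_file.read_text())["node_type"]
        if node_type == "array":
            found.append(node.relative_to(root).as_posix())  # bottom out here
        else:
            stack.extend(child for child in node.iterdir() if child.is_dir())
    return sorted(found)


def write_node(path, node_type):
    """Write a trimmed zarr.json; real array metadata carries more fields."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "zarr.json").write_text(json.dumps({"zarr_format": 3, "node_type": node_type}))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "example.zarr"
    write_node(root, "group")
    write_node(root / "a", "array")
    write_node(root / "g", "group")
    write_node(root / "g" / "b", "array")
    arrays = find_arrays(root)

print(arrays)
```

The cost Dennis points out is visible here: the walk still needs one read per group, which is what a consolidated metadata document would avoid.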
Davis: Store API
- getters take memory type (GPU, CPU, …)
- Josh: good to track (or disallow?) copies
- Jeremy: most Stores are CPU, so actively copying for GPU.
- Davis: Separate stores as in v2 (regular and chunk)
- Davis: Store is simple key/value. Agnostic to Zarr formats.
- Is the Store API overloaded?
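
A minimal sketch of the Store shape discussed above: a plain key/value interface that knows nothing about Zarr formats, with a getter that takes the desired buffer type. The class, its method names, and the `buffer_factory` parameter are hypothetical and are not the Zarr-Python Store API; `bytearray` stands in for a non-default memory type such as a GPU buffer.

```python
class MemoryStore:
    """Key/value store that knows nothing about Zarr formats."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = bytes(value)

    def get(self, key, buffer_factory=bytes, byte_range=None):
        """Return the value (or a slice of it) wrapped in the requested buffer type."""
        value = self._data.get(key)
        if value is None:
            return None
        if byte_range is not None:
            start, stop = byte_range
            value = value[start:stop]
        # Wrapping here is where a copy to another memory type would happen.
        return buffer_factory(value)

    def list_prefix(self, prefix):
        return sorted(k for k in self._data if k.startswith(prefix))


store = MemoryStore()
store.set("zarr.json", b'{"node_type": "group"}')
store.set("a/c/0", b"\x00\x01\x02\x03")

cpu = store.get("a/c/0")
other = store.get("a/c/0", buffer_factory=bytearray, byte_range=(1, 3))
print(cpu, other, store.list_prefix("a/"))
```

Keeping the conversion inside `get` makes the copy explicit in one place, which relates to Josh's point about tracking or disallowing copies.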
Davis: On extra files, an extension where sqlite for every group and array. Good for tabular.
- Jeremy: sqlite doesn't work for cloud storage.
- What stops people from doing it today?
  - Prototype
  - Is this icechunk? That's more a Store API.
  - Jeremy: WASM/sqlite forwards to HTTP requests to S3 with read ahead.
  - Josh: duckdb?
  - Davis: see BigStitcher's use of arrays.
  - Jeremy: space of zarr-like stuff. cloud-based databases (queries & writing). Parquet might not support this for the indexing.
  - Davis: GeoParquet has a spatial index
  - https://github.com/opengeospatial/geoparquet/pull/191
Theodoros: interested in adopting Zarr
- Problem is that we're dealing with really sparse datasets (mass spec imaging).
- Davis: working on Zarr Python v3. A barrier to sparse support is that when we fetch a chunk we turn it into a contiguous array. Requires a redesign.
- Efficient encoding of a single sample and "plugged into" zarr.
- TV: can also have a time axis. distribution time of ions (for the same mass). (even though not as many pixels)
- JMS: a codec could encode it as sparse (but in-memory is dense). Matter of hours.
  - other step would be full sparse support. ton of people have asked for this. but has to be woven throughout.
- Davis: tell us what doesn't work for you. "we want to use Zarr but …"
- Josh: https://github.com/GraphBLAS/binsparse-specification
- Jeremy: easy to store chunk in a sparse format. implementing all the APIs, e.g. in tensorstore would ask explicitly, "give it to me as a sparse array"

2024-10-30

Attending: Davis Bennett (DB), Sanket Verma (SV), Gábor Kovács (GB), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

DB: Finding bugs in Zarr-Python and removing them - expanding the scope of tests
JMS: Back from the parental leave — the baby is doing great!
- Been working on bugs for tensorstore
SV: GeoZarr spec meetings have been updated on the community calendar

Open agenda (add here 👇🏻):

Frequency of the meetings
- DB: No strong feelings
- JMS: Less activity, so it makes sense
- GB: Fine by me
- DB: Unreliable attendance - how to mark meetings as successful and unsuccessful
- SV: Will open a wide discussion for the community to get everyone's thoughts
DB: Zarr V2 arrays using sharded arrays?
- JMS: Not simple enough to do that because of overlapping arrays
- DB: Zarr V2 codecs can utilise sharding codec
- JMS: The JSON metadata is different for sharding
- JMS: Who's the user base?
- DB: Someone who's using Zarr V2 and wants to use sharding
- DB: People might be scared of switching to a new format
DB: Planning to add consolidated metadata and new data types in Zarr-Python, write a doc and publish it for reference
JMS: The idea of community and core codecs is not super impressive!
- DB: Would be good to avoid namespacing issues
- JMS: What would happen if there are multiple JPEG codecs in community and we want to move them to core but there's already one in the core?
- DB: Good question, need to come up with a process for this
- JMS: Adding a vendor name could work — value in having a vendor name
Discussions on upcoming possible extensions

2024-10-16

Attending: Sanket Verma (SV), Joe Hamman (JH), Eric Perlman (EP), Ilana Rood (IP), Davis Bennett (DB), Michael Sumner (MS), Daniel Adriaansen (DA)

TL;DR:

Updates:

The default branch has been changed back to main to prepare for V3 main release - https://github.com/zarr-developers/zarr-python/pull/2335
Numcodecs 0.13.1 release soon - https://github.com/zarr-developers/numcodecs/pull/592
VirtualiZarr has a dedicated ZulipChat channel now - https://ossci.zulipchat.com/#narrow/stream/461625-VirtualiZarr
- Check VirtualiZarr repo: https://github.com/zarr-developers/VirtualiZarr
New OS project release by Earthmover - https://earthmover.io/blog/icechunk
- Transactional storage engine for ND array data on cloud object storage
Zarr-Python V3 updates
Any other updates?

Open agenda (add here 👇🏻):

Intro w/ favourite food
- Sanket - Dumplings
- Joe - Burrito
- Eric - Donuts
- Davis - Ethiopian, Mexican and Indian dishes
- Michael - RSE at Australian Antarctic Division - Burgers
- Ilana - Works at Earthmover
- Daniel - NCAR
JH: starts screen sharing
- JH: Presents on Earthmover, Arraylake, Icechunk…
- JH: presentation ends — time for questions
- DB: Question on performance plots - what's the difference b/w Zarr V3 and Zarr V3 + Dask?
  - JH: Fetching is done by a different library - we're handling the concurrency better on the IO side
  - DB: What lessons could we take from this plot that can be applied to Zarr-Python?
  - JH: Python binding to rust crate needs to be looked at
  - JH: Doing decompression and IO in an interleaved fashion
- SV: Does Icechunk work with Zarr V2?
  - JH: Only with V3 for some parts - but we can change that
- EP: Able to leverage Zarr sharding in some way for Icechunk would be great
  - JH: We had an opportunity to do something totally different with sharding as it is now in ZP V3, i.e. a codec
  - JH: Implies sharding in a different manner
- DB: How coupled are you with the current Zarr V3 API?
  - JH: Highly coupled
  - JH: LDeakin has started filing issues
  - JH: Can envision a high-level and a low-level store - that's what we build in the rust store
  - JH: We should ask store to do more, but we should be specific about it
- MS: Really interested in Rust implementation - Does Rust part take over the encodings?
  - JH: No. We haven't implemented all of ZP yet
  - JH: Codecs, stores and all other stuff is currently handled by ZP - someone interested could come along and build bindings around
  - SV: Bioconductor folks are interested in Zarr-Python V3 — maybe they can be of interest to you
- Thanks, Joe for giving the presentation! The slides and video recording will be posted online soon! Check our social media for updates!
DB: Zarr-Python V3 defines sharding as a codec, not as an explicit V3 feature, so basically you can have sharded Zarr V2 arrays!
- JH: Try to get sharded V2 data to work, and let us know!

2024-10-02

Attending: Dennis Heimbigner (DH), Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP)

TL;DR:

Updates:

Added Tom and Deepak to Zarr-Python core devs — https://x.com/zarr_dev/status/1838965230438625684
We had a documentation sprint for Zarr-Python V3
- The doc sprint officially ended on 10/1 evening. The participants have sent PRs to document the zarr.array and zarr.storage modules. Here are the open PRs:
- https://github.com/zarr-developers/zarr-python/pull/2276
- https://github.com/zarr-developers/zarr-python/pull/2279
- https://github.com/zarr-developers/zarr-python/pull/2281
Zarr-Python V3 team making good progress - alpha release every week - V3 main release soon!
- Making stuff consistent with V2 - looking at Xarray's and Dask's tests and they pass
OME Challenge
- EP: Was able to convert a big JAX dataset into V3
- JM: Ran into issues and was able to convert them into Zarr-Python V3 issues
- More discussion down below
Any other updates?

Open agenda (add here 👇🏻):

OME Challenge
- EP: JAX ran into issues for remote access and it's good to point them out and later rectify that
- EP: Directory list should not be present by default - as it's computation heavy and could hurt your pockets
- EP: Checking if an object exists before a write could cost us $2k!
- EP: JZarr is currently being written
- DH: Any decisions about deleting objects?
- EP: When you check for existing objects, you have the ability to rewrite them
- DH: That means you can delete it!
- DH: NetCDF implements a recursive delete operation
- EP:
- DB: If you're inside the array then your metadata would statically define the array and you can easily rewrite it and essentially delete them
- DH: Having consolidated metadata helps in rewriting operations
- DB: Defining schema and knowing the entire hierarchy has been helpful
- DH: We have this in NetCDF
- DB: https://github.com/janelia-cellmap/pydantic-zarr/
Endorsing SPEC8 — Securing the release process: https://scientific-python.org/specs/spec-0008/
- JM: Will need to read this
- DB: Seems good enough and harmless
Endorsing SPEC6 - Keys to the Castle: https://scientific-python.org/specs/spec-0006/
- JM: Sharing password has been a challenge
DB: https://github.com/zarr-developers/zarr-specs/pull/312
- JM: Need to merge on https://github.com/zarr-developers/governance/pull/44
- JM and DB: Discussion on how to move forward and defining the scope of the groups involved in the specification changes
- DH: Diversity defined on the structure of the internal architecture and not programming language implementation

2024-09-18

Attending: Sanket Verma (SV), Gábor Kovács (GB), Davis Bennett (DB), Eric Perlman (EP)

TL;DR:

Updates:

Zarr-Python V3 doc sprint — 30th Sep. and 1 Oct.
- Identify missing docs and start creating issues
- Link existing issues - https://github.com/zarr-developers/zarr-python/issues?q=is%3Aopen+is%3Aissue+label%3AV3+doc
- Async working via Zoom meetings
MATLAB (Mathworks) interested in Zarr - want to add support - looking for a C++ implementation
Worked on a zine comic a couple of weeks ago - https://github.com/numfocus/project-summit-2024-unconference/issues/27
Updates from Zarr-Python V3 effort / OME challenge
- DB: Getting through issues from V2 and V3 compatibility
  - Tom and Deepak taking care of Dask issues
  - Alpha releases every week - https://pypi.org/project/zarr/3.0.0a4/#history
  - Defining data types in Zarr V3 - you're gonna see an error if the dtype is not defined
  - Main release by the end of year

Open agenda (add here 👇🏻):

DB: https://github.com/zarr-developers/zarr-python/issues/2170
- The way of defining sharding codec is not intuitive and can be improved
- https://github.com/zarr-developers/zarr-python/pull/2169 - proposed solution
  - DB: Will update this PR and make it ready for review
DB: All stores should have cache: https://github.com/zarr-developers/zarr-python/issues/1500
- EP: Some stores like S3 would benefit from this
- EP: Compression and decompression on cache is expensive
- DB: We can default it to 0 and turn it on as needed
- DB: FSSpec has a default cache enabled - we can look into it
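
A minimal sketch of the "cache on every store, defaulting to 0" idea: a wrapper that keeps the most recently used values and is a pure pass-through when its size is zero. The class name and the `max_items` parameter are hypothetical; a plain dict stands in for the underlying store, and the cached values are the raw stored bytes, so decompression cost is not addressed here.

```python
from collections import OrderedDict


class CachedStore:
    """Read-through LRU cache around a mapping-like store; max_items=0 disables caching."""

    def __init__(self, store, max_items=0):
        self._store = store
        self._max_items = max_items
        self._cache = OrderedDict()
        self.hits = 0

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._cache[key]
        value = self._store[key]
        if self._max_items > 0:
            self._cache[key] = value
            if len(self._cache) > self._max_items:
                self._cache.popitem(last=False)  # evict least recently used
        return value


backing = {"a/c/0": b"chunk-0", "a/c/1": b"chunk-1"}

off = CachedStore(backing)               # default: caching disabled
off.get("a/c/0"); off.get("a/c/0")

on = CachedStore(backing, max_items=1)   # caching turned on explicitly
on.get("a/c/0"); on.get("a/c/0")         # second read is a hit
on.get("a/c/1"); on.get("a/c/0")         # "a/c/0" was evicted, so this is a miss
print(off.hits, on.hits)
```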
EP: Will try to join the Zarr-Python core devs meetings on Friday
- SV: Early morning for west coast
- EP: Can make it!
- SV: Early morning stuff: presented on Zarr V3 at EuroBioc: https://eurobioc2024.bioconductor.org/abstracts/paper-bioc4/

2024-09-04

Attending: Josh Moore (JM), Dennis Heimbigner (DH), Eric Perlman (EP)

TL;DR:

Updates:

Meeting Minutes:

Consolidated v2
- DH: annex for v2, "officially"/loosely recognized (would be a great favor)
- JM: and if we put it in v3 to say, "this is the former version"?
- DH: add a forward pointer
- EP: how many edge cases?
Deprecation (DH)
- JM: No plan to deprecate v2 format (format vs. library)
- DH: presumably people will use the new library, that will be the "test" of the consolidated metadata.
Bugs between implementations (JM)
- DH: list of those bugs? JM: no, but good idea.
- DH: available data?
- JM: yes! see https://github.com/ome/ome2024-ngff-challenge?tab=readme-ov-file#challenge-overview
- EP: billions of objects isn't fun.
Consolidated v3
- JM: pushed recently at zarr-python meeting for a spec (and with more design)
- DH: as soon as it's in the format, then it's not just caching
  - metadata caching prevents multiple reads
- DH: caching -> "big set of objects, keeping subset in memory"
  - JM: can be re-created? "index"?
  - DH: regardless, have to specify construction any block of JSON
    - could say a subtree looks like some other pre-defined block
- JM: parameterized MetadataLoader (or "MetadataDriver")
  - DH: that's what I was going to implement anyway
  - DH: like StorageDrivers (not caching) – "VirtualAccess"
  - DH: but same wave length
  - JM: would like to offload some JSON (speed vs size)
  - DH: should that API do more than read/write the JSON?
    - should it interpret it?
    - "give me key X out of this dictionary"
    - JM: like mongodb or jq queries
    - DH: walk binary without needing to convert down to JSON
  - EP: this gets back to N5 as an API rather than a format
    - logical versus storage
  - DH: netcdf had to support multiple storages as a wrapper (HDF4, HDF5, DAP)
    - essential to have some virtual object/class
    - hammer applied to everything ("common API")
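
The "give me key X out of this dictionary" idea can be sketched as a small loader that answers metadata queries from one consolidated document instead of one read per node. The class name `ConsolidatedMetadataLoader` and its `get` method are hypothetical. The document below follows the layout of a Zarr v2 `.zmetadata` file (a `metadata` mapping keyed by the path of each metadata document), with the array metadata trimmed for brevity.

```python
import json


class ConsolidatedMetadataLoader:
    """Answer metadata lookups from a single consolidated JSON document."""

    def __init__(self, consolidated_json):
        self._metadata = json.loads(consolidated_json)["metadata"]

    def get(self, path, key=None):
        """Return the whole document at `path`, or just one key out of it."""
        document = self._metadata[path]
        return document if key is None else document[key]


zmetadata = json.dumps(
    {
        "zarr_consolidated_format": 1,
        "metadata": {
            ".zgroup": {"zarr_format": 2},
            "temperature/.zarray": {
                "zarr_format": 2,
                "shape": [100, 100],
                "chunks": [10, 10],
                "dtype": "<f4",
            },
            "temperature/.zattrs": {"units": "K"},
        },
    }
)

loader = ConsolidatedMetadataLoader(zmetadata)
print(loader.get("temperature/.zarray", "chunks"))
print(loader.get("temperature/.zattrs"))
```

A different backend (binary, database, remote query) could sit behind the same `get` call, which is the parameterized "driver" shape JM and DH converge on.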
EP: https://github.com/zarr-developers/zarr_implementations
- why didn't that find the codec issues?
- JM: no v3!
- EP: hackathon?
- EP: need mozilla support for HTML things
- DH: agreed. hugely important
- JM: As a github action?

2024-08-21

Attending: Eric Perlman (EP) and Sanket Verma (SV)

TL;DR:

Updates:

Zarr V3 Survey: https://github.com/zarr-developers/zarr-python/discussions/2093
Zarr-Python developers meeting update - removed 15:00 PT meeting and changed 7:00 PT to weekly from bi-weekly occurrence
Welcome David Stansby as core dev of Zarr-Python! 🎉
- https://github.com/zarr-developers/zarr-python/pull/2071
Bunch of PRs got in Zarr-Python - changes around fixing dependencies, maintenance, and testing, see here

Open agenda (add here 👇🏻):

EP: NGFF challenge to convert V2 data to V3 - EP had a chat with Josh Moore
- Repo: https://github.com/ome/ome2024-ngff-challenge
- EP: Jackson lab will be utilising the docker for converting V2 to V3 data created by EP
SV and EP: Discussions on Tensorstore, Compressions & Codecs, Microscope vendor software and woes of moving places!

2024-08-07

Attending: Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Thomas Nicholas (TN)

TL;DR:

Updates:

Benchmarking tool for determining best spec for writing using Tensorstore: https://github.com/royerlab/czpeedy
Zarr-Python updates
- DB: There have been some movements in ZP
  - Discussion around the new API: https://github.com/zarr-developers/zarr-python/discussions/2052
  - Chunks, shards, and other terminology - need to decide what to use
  - Getting more active core-devs for ZP will help in having lively discussion
TN: Applying for money to work on VirtualiZarr / Zarr upstream
- TN: Development Seed is applying for the NASA grant - Julia Signell would work on it
- DB: Non-zero origin for Zarr arrays would help
- Related issue: https://github.com/zarr-developers/zarr-specs/issues/122

Open agenda (add here 👇🏻):

EP: Discussion about cycling, library and picking up books from library and reading them in the park!
EP/DB: Tangent on write directly to sharding from microscopes…
TN: Various ways of storing the large metadata for a huge Zarr array
- TN: Storing the large metadata in form of a hidden Zarr array - sort of a common theme among the various ZEPs being proposed
- TN: Seems important because it has come up every time there's a discussion about scalability
- DB: Store the aggregated information in the header of the chunk
- SV: How does BSON scale as compared to JSON?
- TN: We would still need to have a pointer to the BSON in JSON
- DB: How do we introduce it to the Zarr V3 Spec?
- TN: Maybe a convention
- TN: Zarr is close to being a superformat!
- DB: We could also increment the spec to a major version to include the change
TN: Discussions on whether it's possible for Zarr to be a superformat!
- TN: Some values in the geoscience datasets that are closely related and if compressed will be of huge value - but Zarr can't do that
DB: A fundamental Zarr array could be a set of small Zarr arrays
- TN: VirtualiZarr basically does that
- TN: starts screen sharing
- DB: The current indexing in Zarr-Python is not ideal and having Zarr arrays made of small arrays sounds much cleaner
- TN: Hopefully I'd be able to work on this after VirtualiZarr
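
DB's idea of an array made of smaller arrays, and the lazy concatenation TN describes, both reduce to mapping a global index onto (which part, local index) without copying any data. A minimal one-dimensional sketch follows; the class name `LazyConcat` is hypothetical and plain Python sequences stand in for Zarr arrays.

```python
from bisect import bisect_right
from itertools import accumulate


class LazyConcat:
    """1-D view over several sequences, concatenated without copying."""

    def __init__(self, parts):
        self._parts = list(parts)
        # Cumulative end offsets, e.g. lengths [3, 2, 4] -> [3, 5, 9].
        self._ends = list(accumulate(len(p) for p in self._parts))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def locate(self, index):
        """Map a global index to (part number, index within that part)."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        part = bisect_right(self._ends, index)
        start = self._ends[part - 1] if part else 0
        return part, index - start

    def __getitem__(self, index):
        part, local = self.locate(index)
        return self._parts[part][local]


view = LazyConcat([[10, 11, 12], [20, 21], [30, 31, 32, 33]])
print(len(view), view.locate(3), view[3], view[8])
```

Only the part that owns the requested index is touched, which is the property that lets the indexing machinery stay separate from how each part is stored.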
Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages

2024-07-24

Attending: Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Fernando Cervantes (FC), Eric Perlman (EP), Ward Fisher (WF), Thomas Nicholas (TN)

TL;DR:

Updates:

SciPy 2024 was great! 🎉
DB: Zarr-Python updates
- Sharding codec is pickleable
- A decision needs to be made about the array API
- How should the sharding codec look to the user?
  - DB: Easy to find if your array is sharded
  - JM: Partial reading this in Zarr V2
    - tifffile sets a bunch of flags - wonder if those features are friendly for Zarr
    - DB: All the arrays should have sharding configuration
    - JM: Working with Tensorstore, the order of codecs didn't matter –> read_chunks / write_chunks
      - DB: some weirdness when it comes to different backends when uncompressed
New release - Numcodecs 0.13.0 - https://numcodecs.readthedocs.io/en/stable/release.html#release-0-13-0 - Thanks, Ryan!
- New codec added - Pcodec
- JM: Conda is unhappy

Open agenda (add here 👇🏻):

Intros
- SV: Yosemite National Park
- JM: National Seashore in Florida - Gulf of Mexico
- FC: Jackson Lab working in ML - Acadia National Park
- EP: Zion National Park
- WF: Yellowstone National Park
- DB: Yellowstone National Park
TN: Want to open issues on bunch of ideas
- 1. Zarr reader to read chunk manifests and byte offsets - currently Xarray handles this
  - Can use Zarr to open NetCDF directly
- 2. VirtualiZarr has lazy concatenation of arrays - Xarray has lazy indexing operations for arrays
  - Long standing issue in Xarray to separate the lazy indexing machinery from Xarray - https://github.com/pydata/xarray/issues/5081
- DB: Could be handled and should be a priority now
- TN:
- JM: Agree with Davis with indexing - not sure if the abstraction layer for concatenation is correct!
- JM: Talked to 2 Napari maintainers - on a problem of chunking
- TN: A lot of people want to solve the indexing problem but neither Zarr nor Xarray exposes that
- JM: Finding more people with similar interests would help us provide more engineering power
- DB: Create a PR by copy-pasting code from Xarray!? - This could unlock a lot of use cases
- TN: VirtualiZarr does actually do that - but at the level of chunks rather than indices
- DB: Slicing and concatenation are duals - if you have both, it's complete
- DB:
- JM: Query optimisation can be tweaked as we move forward
- TN: When you do concat and slice you have identified a directed graph - you can optimise that plan - you can also hand off that plan to some reader
- JM: What does user do with the plan? Do they do something with it?
- TN: Array API folks have deliberately made arrays lazy
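A minimal sketch of the "concat and slice form a plan you can optimise" point above, in one dimension. The node types `Source`, `Slice`, and `Concat` and the `optimise` function are hypothetical names for illustration, not an existing Zarr, Xarray, or VirtualiZarr API. It assumes every slice is in bounds (0 <= start <= stop <= length of its child).

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Source:
    name: str    # hypothetical identifier of a stored array
    length: int

@dataclass(frozen=True)
class Slice:
    child: "Plan"
    start: int
    stop: int

@dataclass(frozen=True)
class Concat:
    children: Tuple["Plan", ...]

Plan = Union[Source, Slice, Concat]

def length(plan):
    """Number of elements a plan node produces."""
    if isinstance(plan, Source):
        return plan.length
    if isinstance(plan, Slice):
        return plan.stop - plan.start
    return sum(length(child) for child in plan.children)

def optimise(plan):
    """Push slices below concatenations so untouched inputs drop out of the plan."""
    if isinstance(plan, Source):
        return plan
    if isinstance(plan, Concat):
        return Concat(tuple(optimise(child) for child in plan.children))
    child = optimise(plan.child)
    if isinstance(child, Slice):
        # A slice of a slice collapses into one slice of the inner child.
        return optimise(Slice(child.child, child.start + plan.start, child.start + plan.stop))
    if not isinstance(child, Concat):
        return Slice(child, plan.start, plan.stop)
    kept, offset = [], 0
    for part in child.children:
        n = length(part)
        lo, hi = max(plan.start - offset, 0), min(plan.stop - offset, n)
        if lo < hi:  # this input overlaps the requested range
            kept.append(part if (lo, hi) == (0, n) else optimise(Slice(part, lo, hi)))
        offset += n
    return kept[0] if len(kept) == 1 else Concat(tuple(kept))

a, b, c = Source("a", 10), Source("b", 10), Source("c", 10)
plan = Slice(Concat((a, b, c)), 12, 18)
optimised = optimise(plan)
```

In this example the optimised plan only mentions `b`, so a reader handed the plan would never need to touch `a` or `c`.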
GPU CI for Zarr-Python - https://github.com/zarr-developers/zarr-python/issues/2041
- GitHub and Cirun sound good and easy to set up
- Who pays? - Earthmover is ready to pay the cost for initial months and then switch to NF
- NF has money reserved for projects in the infrastructure committee for similar costs
- JM: Good to have it!
- SV: Need to get it sooner rather than later
Zarr paper - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Appetite.20for.20a.20Zarr.20paper.3F
- JM: My poster was cited multiple times in the last few weeks
- JM: JOSS is a potential venue - IETF is more work
- TN: Submitting to a computing journal - W3C, IEEE, etc.
- TN: Xarray: https://openresearchsoftware.metajnl.com/articles/10.5334/jors.148
- JM: NetCDF: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg00087.html
TABLED
- Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages

2024-07-10

Attending: Josh Moore (JM), Davis Bennett (DB), Fernando Cervantes (FC)

Updates:

SciPy!
Josh: testing zarr v3
- issue for each problem? Davis: sure
- Davis: to be fixed:
  - no validation of fill value
  - multiple bugs with sharding: 1d
- Josh: missing "attributes"
- Josh: but neuroglancer working?
  - Davis: not for all static file servers. need PR.
  - Davis: various forks. Josh: plugins? Davis: tough
  - or: neuroglancer as a component that can be embedded
  - Janelia NG is a React component.
  - "Visualization is tough."
Motion for food
Seconded.

2024-06-26

Attending: Brianna Pagān (BP), Thomas Nicholas (TN), Dennis Heimbigner (DH), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB)

TL;DR:

Updates:

Zarr-Python 3.0.0a0 out
- https://pypi.org/project/zarr/3.0.0a0/
Good momentum and lots of things happening with ZP-V3 - aiming for mid July release
SV represented Zarr at CZI Open Science 2024 meeting - various groups looking forward to V3 - https://x.com/MSanKeys963/status/1801073720288522466
- R users at Bioconductor are looking to develop bindings for ZP-V3
New blog post: https://zarr.dev/blog/nasa-power-and-zarr/
ARCO-ERA5 got updated this week - ~6PB of Zarr data available - check: https://x.com/shoyer/status/1805732055394959819
https://dynamical.org/ - making weather data easy and accessible to work with
- Check: https://dynamical.org/about/
- Video tutorial: https://youtu.be/uR6-UVO_3k8?si=cp0jOxrtKL_I6LfV

Open agenda (add here 👇🏻):

BP: Will be talking about how Zarr is utilised at NASA!
- starts screen sharing and presenting
- BP: I work at Goddard GES DISC - deputy manager at one of the centres - I manage a team of developers and engineers - not representing all the data centres
- BP: A lot of people are coming into Zarr from the SMD (Science Mission Directorate)
- BP: Earth Science Division - EOSDIS and Distributed Active Archive Centres (DAACs) - DAACs focus on data distribution and management
- BP: All the centres are coming up with suggestions on best practices and the best format - we discuss with them what they can, and should, use
- BP: Moving to cloud-optimized formats - DAACs have a ton of archival data in various formats
- BP: Projected growth for the entire Zarr store across all EOSDIS by 2030: 60PB -> 600PB!
- BP: GES DISC holds 7 PB of data - we have 3000 different collections of datasets - really diverse!
- BP: Giovanni - interactive web-based program that has 20+ services associated with it - taking the existing data and grooming the metadata so it's accessible and useful across a broader range
- BP: Over at NASA, we do a lot of Zarr stuff…
  - Zarr V2 spec is approved data format convention for use in NASA Earth Science Data Systems (ESDS)
  - Giovanni in cloud - duplicates Zarr (variable based)
    - Open issue: continuously updating Zarr stores - Exploring lakeFS for managing dynamic data
  - ZEP0005
  - Brianna is leading the GeoZarr work
  - VEDA - a number of Zarr/STAC-related things going on in VEDA
- TN: Does Giovanni read Zarr directly? If so, which reader does it use? (Can Giovanni use VirtualiZarr?)
  - BP: Giovanni promotes variable-first search - most of Giovanni has OPeNDAP attached to it - built with overhead with the GES DISC pipeline - in hindsight - Yes!
  - TN: From the slides - Xarray can take care of some of the stuff that Giovanni does
- TN: Very curious about the exact difference between the LakeFS idea and EarthMover’s ArrayLake
  - BP: LakeFS is OS ArrayLake - no vendor lock-in
- SV: What does Giovanni actually do when you say, ‘it grooms metadata’?
  - BP: Standardizes the grid - flip the grid - naming mechanism - smoothing the metadata so that it works across various services
  - BP: Other metadata grooming: for example, we have a lot of time dimension issues. That's because of scattered best practices for how to store time metadata
  - TN: Can we do the flipping with Zarr/VirtualiZarr?
  - DB: If you flip at the store level - you'd need to find out how deep you'd need to go
  - BP: Will try to make time standard across the datasets
  - BP: https://github.com/briannapagan/quirky-data-checker
- BP: from the Zoom chat
  - Zarr Storage Specification V2 is an approved data format convention for use in NASA Earth Science Data Systems (ESDS). https://www.earthdata.nasa.gov/esdis/esco/standards-and-practices
  - Giovanni in the Cloud, duplicate archive, zarr, variable-based: https://cmr.earthdata.nasa.gov/search/variables.umm_json?instance-format=zarr&provider=GES_DISC&pretty=True
  - Open issue: continuously updating zarr stores. Exploring lakeFS for managing dynamic data
  - ZEP 0005: Zarr accumulation extension for optimizing data analysis
  - Looking into a GIS service for zarr stores
  - POWER https://power.larc.nasa.gov/data-access-viewer/
  - https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
  - https://discourse.pangeo.io/t/metadata-duplication-on-stac-zarr-collections/3193/7
EP: Converting OME datasets to V3 in the upcoming months - the quirky tool can be useful
- DB: V3 chunking encoding matches with V3 encoding - you just need to re-write the JSON document
- DB: Playing with sharding - tensorstore is fast - need to figure out the nomenclature
- EP: The bio and geo worlds are on parallel tracks and working in silos
- EP: https://forum.image.sc/t/ome2024-ngff-challenge/97363
  - DB: The challenge doesn't seem interesting to me! - converting JSON documents - instead we should be focusing on converting existing data to sharded stores - a much more interesting problem
- EP: A bunch of data is non-Zarr; will be working on pushing it to the cloud and converting it to Zarr
