Zarr-Python Developer Meeting Notes

formerly Zarr-Python Refactor Meeting Notes

April 4, 2025

Davis Bennett / @d-v-b
Joe Hamman / @jhamman
Ian Hunt-Isaak / @ianhi
Sanket Verma / @sanketverma1704
Tom Augspurger /
Ryan Abernathey /
Josh Moore / @joshmoore
Tom Nicholas / @TomNicholas

Agenda

release?
(davis) learnings from tensorstore
(davis) A rough idea for a zarr-format-aware store API

Minutes

(ryan a.) update on the state of the spec w.r.t. extension names
Datetypes extension names
- datetime64
- timedelta
- string
The dtype plan:
- open issue in extensions repo for each new datatype
- get feedback on names / configuration
- split davis' pr into pieces
  - registry framework
  - new dtypes

March 21, 2025

Davis Bennett / @d-v-b
Joe Hamman / @jhamman
Ian Hunt-Isaak / @ianhi
Sanket Verma / @sanketverma1704
Kyle Barron /

Minutes

Core topic for the day: https://github.com/zarr-developers/zarr-python/pull/2874

Davis discovered a weird NumPy dtype in Windows
- Relevant NumPy issue: https://github.com/numpy/numpy/issues/9464

>>> from ml_dtypes import bfloat16
>>> import numpy as np
>>> np.zeros(4, dtype=bfloat16)
array([0, 0, 0, 0], dtype=bfloat16)

see https://github.com/zarr-developers/zarr-python/pull/2874#issuecomment-2701802998

March 7, 2025

Davis Bennett / @d-v-b
Josh Moore / @joshmoore
Joe Hamman / @jhamman
Tom Nicholas / @TomNicholas
Deepak Cherian / @dcherian
Tom Augspurger
Ian Hunt-Isaak / @ianhi

Minutes

Davis' Dtypes (https://github.com/zarr-developers/zarr-python/pull/2874)
- target: fixed length unicode strings
- also covering numpy dtypes
- and ml-dtypes
- Dtype wrapper class -> holds on to data needed to generate a dtype
- Planned reviewers:
  - Tom N
  - Nick (eye on custom dtypes)
  - Josh (eye on the spec)
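
As a rough illustration of the "dtype wrapper" idea, here is a minimal sketch of a class that stores only the data needed to generate a dtype, using fixed-length unicode strings as the target. The class name `FixedLengthUnicode`, its fields, and the JSON layout are hypothetical and are not taken from the PR; only the `"<U10"`-style strings follow NumPy's dtype string convention.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class FixedLengthUnicode:
    """Hypothetical wrapper: holds the parameters, not the dtype itself."""

    length: int
    endianness: Literal["little", "big"] = "little"

    def to_numpy_str(self) -> str:
        # NumPy spells fixed-length unicode as "<U{n}" or ">U{n}".
        prefix = "<" if self.endianness == "little" else ">"
        return f"{prefix}U{self.length}"

    def to_json(self) -> dict:
        # Assumed metadata layout, for illustration only.
        return {
            "name": "fixed_length_unicode",
            "configuration": {"length": self.length},
        }

    @classmethod
    def from_numpy_str(cls, spec: str) -> "FixedLengthUnicode":
        # Accept strings such as "<U10"; reject anything else.
        if (
            len(spec) < 3
            or spec[0] not in "<>"
            or spec[1] != "U"
            or not spec[2:].isdigit()
        ):
            raise ValueError(f"not a fixed-length unicode dtype string: {spec!r}")
        endianness = "little" if spec[0] == "<" else "big"
        return cls(length=int(spec[2:]), endianness=endianness)
```

With NumPy installed, `np.dtype(FixedLengthUnicode(10).to_numpy_str())` would produce the concrete dtype on demand.
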
Versioning policy (https://zarr.readthedocs.io/en/latest/developers/contributing.html#compatibility-and-versioning-policies)
- issue: https://github.com/zarr-developers/zarr-python/issues/2889
- started: https://github.com/zarr-developers/zarr-python/pull/2819
SciPy abstracts that went in
- Akshay - GPUs + Zarr
- Tom N. - Virtualizarr
- Joe - Icechunk
- Xarray Tutorial
- Ian - Xarray in Biology

February 28, 2025

Davis Bennett / @d-v-b
Josh Moore / @joshmoore
Joe Hamman / @jhamman
Ian Hunt-Isaak / @ianhi

Minutes

Josh and Davis on extension dtype naming
Davis is working on extension dtypes in zarr-python
- need to add support for parametric dtypes and extension dtypes
Akshay and co have been hacking on zarr/gpus this week

February 21, 2025

Davis Bennett / @d-v-b
Josh Moore / @joshmoore
Sanket Verma / @sanketverma1704
Ian Hunt-Isaak / @ianhi

Minutes

Yank the recent release: https://github.com/zarr-developers/zarr-python/issues/2852 due to a bug
https://github.com/zarr-developers/zarr-python/pull/2665 - would like to merge soon

February 14, 2025

Deepak Cherian / @dcherian
Josh Moore / @joshmoore
Norman Rzepka / @normanrz
Davis Bennett / @d-v-b
Ian Hunt-Isaak / @ianhi

Agenda

release?

February 7, 2025

Deepak Cherian / @dcherian
Josh Moore / @joshmoore
Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Max Jones / @maxrjones
Sanket Verma / @sanketverma1704

Agenda

codec/numcodecs issue: https://github.com/zarr-developers/zarr-python/issues/2800
Need reviews on:
- [reviewer=norman?] boundary chunk problem -> https://github.com/zarr-developers/zarr-python/pull/2784
- [reviewer=joe] batch creation PR -> https://github.com/zarr-developers/zarr-python/pull/2665
- [reviewer?] create_array explicit groups -> https://github.com/zarr-developers/zarr-python/pull/2795
- [reviewer=davis?] empty chunk contents -> https://github.com/zarr-developers/zarr-python/pull/2755
- [reviewer=david] scalar return type -> https://github.com/zarr-developers/zarr-python/pull/2718
- [reviewer=joe] zipstore pickling -> https://github.com/zarr-developers/zarr-python/pull/2807
- [reviewer=joe] obstore store -> https://github.com/zarr-developers/zarr-python/pull/1661
store hypothesis tests (Max ? for Deepak)
next steps for ObjectStore (https://github.com/zarr-developers/zarr-python/pull/1661)
- Docs: write a section here. Note that this store is experimental.
- Test coverage:

January 31, 2025

Deepak Cherian / @dcherian
Norman Rzepka / @normanrz
Josh Moore / @joshmoore
Akshay Subramaniam / @akshaysubr
Joe Hamman / @jhamman

Agenda

conda-forge?
Are V2 tests missing?
Feedback on https://github.com/zarr-developers/zarr-python/pull/2780
- Move function definitions from the api modules into core, e.g. api.sync.create_array -> core.array.sync
- Pull out into its own PR
Strict parsing of metadata
Feedback on https://github.com/zarr-developers/zarr-python/pull/2751
- Explicit CPU buffers for metadata
- Basic GPU docs

January 24, 2025

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Josh Moore / @joshmoore
Sanket Verma / @sanketverma1704
Nick Byrne / @nenb
Norman Rzepka / @normanrz
Ryan Abernathey / @rabernat
Max Jones / @maxrjones
Yurii Zubov
Akshay Subramaniam / @akshaysubr

Agenda

dtypes
- Nick talked through some slides - https://github.com/nenb/zarr-dtype-presentation/tree/main
- NR:
  - Array-to-bytes codec
  - Endianness
  - Order is now a runtime config
  - zarr.core is private api, would need zarr.dtypes module or something
extensions
store API / tests
- Max looking for guidance on a handful of questions - https://docs.google.com/presentation/d/17eXBCI3WoI3pELVt_uyaWxDF2G2ovuiXoHB5wPd68z4/edit?usp=sharing

January 10, 2025

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Norman Rzepka / @normanrz
Josh Moore
Sanket Verma / @MSanKeys963

Agenda

Object data type - https://github.com/zarr-developers/zarr-python/issues/2617
Next steps after 3.0
- variable chunking?
- deprecating more api?
- numcodecs thing that could use some thought / design work?
  - semi-circular dependency
    - what do we do with the v2 codecs
    - what do we do with the v3 things
  - future directions

January 3, 2025

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Norman Rzepka / @normanrz
Deepak Cherian
Sanket Verma / @MSanKeys963

Agenda

3.0 release schedule update (Joe)
a. 3.0.0-rc.1 went out yesterday
b. we will publish and socialize the migration guide today
c. we will make the full 3.0.0 release on Thursday Jan 9 at 10a ET
-> during this time, we will focus on documentation and bug fixes (no major feature additions)
release announcement
a. Joe has written a blog post. The full zarr-dev team has comment access here: https://www.notion.so/earthmover/Zarr-Python-3-Release-Blogpost-14b492ee309f80d28af3ebfdeedf96f7
b. sanket will prepare a social media thread
shape of array after the addition of filters/compressors to top-level api
- davis is opening an issue on this
Norman will write a docs section on sharding

`create_array` API design notes

We are struggling to find a user-facing API for creating new arrays.
We have decided to create a new top-level API function (create_array)
to handle this but questions remain about how to provide a simple / intuitive
API that covers both v2 and v3 arrays, and sharded/non-sharded arrays in one API.
This short design note lays out the goals and options we are considering.

goals

provide a single function to create v2 and v3 arrays
make it easy to create sharded arrays
provide a way to configure codecs (ala compressors and filters from v2)

non goals

extending sharding to v2 arrays
?

current proposal

```
async def create_array(
    store: str | StoreLike,
    *,
    name: str | None = None,
    shape: ShapeLike,
    dtype: npt.DTypeLike,
    chunk_shape: ChunkCoords | Literal["auto"] = "auto",
    shard_shape: ChunkCoords | None = None,
    filters: FiltersParam = "auto",
    compression: CompressionParam = "auto",
    fill_value: Any | None = 0,
    order: MemoryOrder | None = "C",
    zarr_format: ZarrFormat | None = 3,
    attributes: dict[str, JSON] | None = None,
    chunk_key_encoding: ChunkKeyEncoding | ChunkKeyEncodingParams | None = None,
    dimension_names: Iterable[str] | None = None,
    storage_options: dict[str, Any] | None = None,
    overwrite: bool = False,
    config: ArrayConfig | ArrayConfigParams | None = None,
    data: npt.ArrayLike | None = None,
) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
    ...
```

This function signature includes parameters that fall into the following categories:

Store parameters

store
storage_options

Runtime parameters

order
overwrite
config
data

V3-only parameters

dimension_names
shard_shape
chunk_key_encoding

Generic parameters

name
shape
dtype
chunk_shape
filters
compression
fill_value
attributes

Note 1: the focus of this document is on the parameters that control how the core array metadata is configured:

shape
dtype
chunk_shape
shard_shape
compression
filters

Note 2: it may be worth grouping the parameters in create_array using a
similar framework to above. This will help users navigate this fairly large
parameter space.
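
One possible reading of Note 2, sketched under assumptions: keep the categories above as data and sort incoming keyword arguments into them. The names `PARAMETER_GROUPS`, `group_kwargs`, and `v3_only_params_used` are hypothetical helpers invented for this sketch, not part of zarr-python.

```python
# Category -> parameter names, copied from the lists above.
PARAMETER_GROUPS = {
    "store": ("store", "storage_options"),
    "runtime": ("order", "overwrite", "config", "data"),
    "v3_only": ("dimension_names", "shard_shape", "chunk_key_encoding"),
    "generic": (
        "name", "shape", "dtype", "chunk_shape",
        "filters", "compression", "fill_value", "attributes",
    ),
}


def group_kwargs(kwargs: dict) -> dict:
    """Sort keyword arguments into the categories defined above."""
    lookup = {
        name: group
        for group, names in PARAMETER_GROUPS.items()
        for name in names
    }
    grouped = {group: {} for group in PARAMETER_GROUPS}
    for key, value in kwargs.items():
        if key not in lookup:
            raise TypeError(f"unexpected parameter: {key!r}")
        grouped[lookup[key]][key] = value
    return grouped


def v3_only_params_used(kwargs: dict) -> list:
    """Names of V3-only parameters that were explicitly set (not None)."""
    v3_only = group_kwargs(kwargs)["v3_only"]
    return sorted(k for k, v in v3_only.items() if v is not None)
```

A helper like `v3_only_params_used` could back an early error when V3-only parameters are combined with a v2 array.
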

Usage examples

minimal example w/o sharding:
this creates an array using default / inferred parameters for zarr_format, chunk_shape, etc., etc.
```
create_array(store=store, shape=(1000, 1000), dtype='f8')
```

create sharded array:
this creates a sharded array where chunks are compressed with Zstd
```
create_array(
    store=store,
    shape=(1000, 1000),
    shard_shape=(100, 100),
    chunk_shape=(10, 10),
    compressors=[ZstdCodec(level=3)],
    dtype='f8',
)
```

questions

what is the value/justification for providing arguments for filters and compressors instead of a single codecs parameter? Will we enforce that all filters are array->array codecs and all compressors are bytes->bytes?

@d-v-b –> this seems like the question we need to answer first. How will we des

(d-v-b): filters and compressors map on to the two types of variadic codecs allowed in the v3 codecs attribute. This makes those parameters simple to parse – filters must resolve to tuple[ArrayArrayCodec, ...], and compressors must resolve to tuple[BytesBytesCodec, ...]. I think we could have just 1 codecs parameter, but it would need to take a form that allowed separably specifying the ArrayArray and BytesBytes codecs. Something like this:
```
class CodecParams(TypedDict):
    filters: NotRequired[tuple[ArrayArrayCodec, ...]]
    compressors: NotRequired[tuple[BytesBytesCodec, ...]]
    array_serializer: NotRequired[ArrayBytesCodec]
```
any missing keys would resolve to the defaults set in the config.

But if codecs was tuple[Codec, ...] then users would be confused, and parsing it would be a headache.
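
A minimal sketch of the "missing keys resolve to the defaults" behaviour, assuming codec names are plain strings rather than codec classes. `DEFAULT_CODECS` is a stand-in for whatever the runtime config would hold, its values are made up for illustration, and `resolve_codecs` is a hypothetical helper.

```python
from typing import Optional, TypedDict


class CodecParams(TypedDict, total=False):
    # total=False makes every key optional, like NotRequired on each field.
    filters: tuple[str, ...]
    compressors: tuple[str, ...]
    array_serializer: str


# Stand-in for the defaults set in the config (values are assumptions).
DEFAULT_CODECS: CodecParams = {
    "filters": (),
    "compressors": ("zstd",),
    "array_serializer": "bytes",
}


def resolve_codecs(params: Optional[CodecParams] = None) -> CodecParams:
    """Fill in any key the caller left out with the configured default."""
    return {**DEFAULT_CODECS, **(params or {})}
```

Because each key is resolved independently, a caller can override only the compressors and still get the default filters and serializer.
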

(NR): I like filters and compressors because imo they better convey what the codecs are used for instead of "array->array" or "bytes->bytes" codecs. We should enforce that only the right type is used for both kwargs.

what is the correct type for the filters / compressors argument? Options include:

a. list of strings, e.g. ['gzip']
b. list of dicts, e.g. [{"name": 'gzip', "configuration": {"level": 4}}]
c. list of objects, e.g. [GZipCodec(level=4)]

(b) and (c) seem like reasonable choices.

(d-v-b) IMO the only option here is something that unambiguously represents a codec instance, which rules out a. If we can make constructing the dict representation of the codecs ergonomic (i.e., autocomplete), then I think b is a pretty nice option, because users don't need to import a bunch of classes to use the create function. but we should also accept the complete codec class instances as well, so c.

(NR): I like c best, but also fine with b. Agree that a is too ambiguous. I also cleaned that up for the default codecs https://github.com/zarr-developers/zarr-python/commit/5cb6dd8f62ad6ed5391a08535dc05ef9ac50bbad
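
To make options (b) and (c) concrete, here is a sketch of a parser that accepts either form and rejects bare strings. `GzipCodec` below is a stand-in dataclass, not the zarr-python class, and `CODEC_REGISTRY`, `parse_codec`, and `parse_compressors` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GzipCodec:
    """Stand-in bytes->bytes codec used only for this sketch."""

    level: int = 5


# Hypothetical name -> class lookup used to resolve dict specs.
CODEC_REGISTRY = {"gzip": GzipCodec}


def parse_codec(spec):
    """Accept option (b), a dict, or option (c), an instance."""
    if isinstance(spec, str):
        # Option (a): a bare name does not pin down the configuration.
        raise TypeError(f"ambiguous codec spec {spec!r}; pass a dict or an instance")
    if isinstance(spec, dict):
        try:
            cls = CODEC_REGISTRY[spec["name"]]
        except KeyError:
            raise ValueError(f"unknown codec: {spec.get('name')!r}") from None
        return cls(**spec.get("configuration", {}))
    if isinstance(spec, tuple(CODEC_REGISTRY.values())):
        return spec
    raise TypeError(f"cannot interpret {spec!r} as a codec")


def parse_compressors(specs):
    """Normalize a list of specs into a tuple of codec instances."""
    return tuple(parse_codec(s) for s in specs)
```

Both spellings normalize to the same instance, so downstream code only ever sees codec objects.
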

How do we want to parametrize the partitioning of the array? Right now the PR in question takes two parameters, chunk_shape: tuple[int, ...] | Literal["auto"] and shard_shape: tuple[int, ...] | Literal["auto"] | None. In the interest of backwards compatibility and brevity I would support the names chunks and shards. An alternative API would be to have a single parameter, e.g., chunking, that takes:
- tuple[int, ...] (no sharding, regular chunking),
- a dict like {"chunks": tuple[int, ...] | Literal["auto"], "shards": tuple[int, ...] | Literal["auto"]}
- and maybe more complicated types? This basically pushes complexity into a single parameter, but it's convenient given that chunk shape and shard shape have to be defined together.

(NR): I'd prefer chunks and shards. chunking: tuple[int, ...] | {"chunks": tuple[int, ...] | Literal["auto"], "shards": tuple[int, ...] | None | Literal["auto"]} would also be fine. Not really a fan of auto, though.

(JH): would it help reduce scope to remove auto chunking / chunk/shard alignment from this first version?

(DVB): I don't think the auto chunking / sharding adds a lot of complexity here, and I think it's a big win for usability to have some defaults that "just work" (whether the defaults in my pr actually "just work" is another question). As for auto, we need some way of expressing "pick chunks / shards automatically". Often we use None to mean "default", but if we are using shards=None to denote "no sharding", None can't mean "default" anymore, and we need to pick another value. I think auto is short and literate but I'd be up for alternatives.
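
A small sketch of why a separate "auto" sentinel is needed once shards=None means "no sharding". The heuristics in `auto_chunks` and the "up to 4 chunks per shard" rule are invented for illustration and are not the defaults in the PR; `resolve_partitioning` is a hypothetical helper.

```python
def auto_chunks(shape, target=64):
    # Hypothetical heuristic: cap every dimension at `target` elements.
    return tuple(min(n, target) for n in shape)


def resolve_partitioning(shape, chunks="auto", shards=None):
    """Return (chunks, shards).

    shards=None means "no sharding"; shards="auto" means "pick for me".
    """
    resolved_chunks = auto_chunks(shape) if chunks == "auto" else tuple(chunks)
    if shards is None:
        return resolved_chunks, None
    if shards == "auto":
        # Hypothetical rule: up to 4 chunks per shard along each dimension,
        # always a whole multiple of the chunk size.
        resolved_shards = tuple(
            c * min(4, -(-n // c)) for n, c in zip(shape, resolved_chunks)
        )
        return resolved_chunks, resolved_shards
    resolved_shards = tuple(shards)
    for s, c in zip(resolved_shards, resolved_chunks):
        if s % c != 0:
            raise ValueError("shard shape must be a multiple of chunk shape")
    return resolved_chunks, resolved_shards
```

With this split, `None` and `"auto"` never collide: the first turns sharding off, the second asks for a default.
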

December 20, 2024

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Norman Rzepka / @normanrz
Deepak Cherian
Josh Moore / @joshmoore
Sanket Verma / @MSanKeys963
Akshay Subramaniam / @akshaysubr

Release topics:

top level api
- https://github.com/zarr-developers/zarr-python/pull/2463
- open_foo(mode=r) defaults
- remove read_foo()
- remove read()
documentation
- add to migration guide
  - use create_foo and open_foo functions
  - create() and open() will be deprecated soon
- new page in user guide on runtime config

December 13, 2024

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Norman Rzepka / @normanrz
Deepak Cherian

Notes:

Davis worked on docstrings
Davis is working on concurrent array creation
Norman takes over write_empty_chunks
Joe will work on docs next week
Rename RemoteStore to FsspecStore
Array.__iter__ is slower compared to v2 because v2 loaded the entire array in memory upfront. not a release blocker
Next meeting next Wednesday 5pm CET, 8am PST

December 6, 2024

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Norman Rzepka / @normanrz
Sanket Verma / @MSanKeys963
Deepak Cherian /

Notes:

Davis will be working on some of the blocked PRs
Joe will share a V3 blog post next week for review
Deepak has been working on tests

Discussion points for today:

✅ beta release -> https://github.com/zarr-developers/zarr-python/releases/tag/v3.0.0-beta.3
Default codec -> https://github.com/zarr-developers/zarr-python/issues/2267
What's off spec?

string codecs and dtypes
consolidated metadata

November 29, 2024

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Norman Rzepka / @normanrz

Notes:

Concurrent members
Create array with sharding https://github.com/zarr-developers/zarr-python/issues/2170
Runtime config attribute on the Array and Group class
zarrs looks promising, great validation on the extensibility
Final release before the holidays, release right after new years

November 22, 2024

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Tom Augspurger / @TomAugspurger
Norman Rzepka / @normanrz
Sanket Verma / @MSanKeys963

Notes:

ChunkStore
ZipStore specification: https://github.com/zarr-developers/zarr-specs/pull/311
Should obstore-based Store be in zarr-python or its own package? https://github.com/zarr-developers/zarr-python/pull/1661
- Keep in zarr-python, make obstore an optional dep
- later: config what protocol to open in which store
Top-level sharding configuration in zarr.open etc.
- something like?: https://github.com/zarr-developers/zarr-python/blob/76904eac556a71817eb7ea2e54df703cba919a12/src/zarr/core/chunk_grids.py#L30C5-L37
- or add shards kwargs +10

November 15, 2024

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Tom Augspurger / @TomAugspurger
Sanket Verma / @MSanKeys963
Theodore Visvikis /
Josh Moore / @joshmoore
Akshay Subramaniam / @akshaysubr
Norman Rzepka / @normanrz

Notes:

Discussions on https://github.com/zarr-developers/zarr-python/blob/f74e53aca5311ec077da71585dd962c4af7b8a11/tests/test_api.py#L68-L78

November 8, 2024

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Tom Augspurger / @TomAugspurger
Sanket Verma / @MSanKeys963

Notes:

Joe is working on store stuff and needs help with review - Tom would help with the review
Davis would like this PR to be reviewed: https://github.com/zarr-developers/zarr-python/pull/2447

Discussion points

https://github.com/zarr-developers/zarr-python/issues/2412

November 1, 2024

Joe Hamman / @jhamman

October 25, 2024

Joe Hamman / @jhamman
Tom Augspurger / @TomAugspurger
Sanket Verma / @MSanKeys963

Notes:
- Updates from Tom — working on info, size and tree properties
- Joe - hopefully wrapping store mode refactor up today

October 18, 2024

Tom Augspurger / @TomAugspurger
Davis Bennett / @d-v-b
Ryan Abernathey / @rabernat
Joe Hamman / @jhamman
Norman Rzepka / @normanrz
Matt Iannucci / @mpiannucci
Akshay Subramaniam

Notes:

Updates
- Joe: getting icechunk out, interested in speaking about release blockers
- Davis: moving v3 tests, working on store api
- Norman: working on the numcodecs, sharding bug, filter/codecs for v2 arrays
- Ryan: worked on strings (out of spec), committed to dealing with spec problems (specifically on extensions)
- Tom: Xarray compat (probably ready to merge)
- Akshay: focusing on gpu compression codecs (nvcomp)
- Matt: working on getting v3 working with kerchunk and virtualizarr

Topics:

store api
- DB: two phases of IO: reading/writing chunks or initializing an array or a group
  - mode was added to the store
  - pathological situations where clear() is happening on reopen
- NR: Be able to use StorePath in zarr.open, e.g. zarr.open(LocalStore("...", mode="a") / "testdata.zarr")
- JH: https://github.com/zarr-developers/zarr-python/issues/2359
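
To illustrate NR's `LocalStore(...) / "testdata.zarr"` suggestion, here is a minimal sketch of how a `/` operator could build a store-plus-path object. Both classes below are simplified stand-ins written for this example; they are not the zarr-python implementations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorePath:
    """Stand-in: a store plus a path inside it."""

    store: "LocalStore"
    path: str = ""

    def __truediv__(self, other: str) -> "StorePath":
        # Join path segments and normalize stray slashes.
        joined = f"{self.path}/{other}".strip("/")
        return StorePath(self.store, joined)


@dataclass(frozen=True)
class LocalStore:
    """Stand-in store that only records its root and mode."""

    root: str
    mode: str = "r"

    def __truediv__(self, other: str) -> StorePath:
        # store / "name" yields a StorePath rooted in this store.
        return StorePath(self, other.strip("/"))
```

An `open` function could then accept the resulting `StorePath` directly, reading the mode from `path.store.mode`.
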
v2 filters/codecs
- https://github.com/zarr-developers/zarr-python/issues/2325
- NR: hasn't done much here yet
- MI: same, just looked at the issue
  - codec naming
- NR: numcodecs codec namespace will be just for v3 arrays
  - we may also want to split the kerchunk filters into compressor/filter categories
- RA: what would give us more developer velocity here?
release blockers
extensions

October 11, 2024

Tom Augspurger / @TomAugspurger
Davis Bennett / @d-v-b
Sanket Verma / @MSanKeys963
Josh Moore / @joshmoore
Ryan Abernathey / @rabernat
Joe Hamman / @jhamman
Norman Rzepka / @normanrz

Notes:

Summary
- JH: xarray & dask test suites passing with v3.
- JH: milestone for a beta release
  - string PR should be included
- NR: doc sprint?
  - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Docs.20sprint.20in.20September.3F/near/474312033
Big items
- Strings (RA) - braindump
  - confusing how it (ever) used to work
  - np<2 had no notion of varlen str
    - fixlen (utf-8)
    - or object array
  - zarr allowed both as valid dtypes, e.g. U4 or object
  - now np has a varlen
  - question of dtype+codec
  - PRs merged
    - 236: new dtype "string" and "bytes". also zarrs implements them.
      - NR: v2 compatibility? pickling mode and another mode.
      - RA: two questions
        
        API: strings in zarr and pull them out. works as expected? (with this PR: https://github.com/zarr-developers/zarr-python/pull/2036)
        
        data on disk stored the same between v2 & v3 (without rewriting). believe so if using vlenutf8 codec.
      - RA: assumption is always vlen, and impl can use the appropriate in memory data structure (i.e., different in c)
        
        in python, decoding to varlen if np>2 (breaking from zarr v2); decoding to object if np<2
      - NR: zarr array v2 a year ago will that work? believe so. tom working on the v2 side of it.
        
        TA: https://github.com/zarr-developers/zarr-python/pull/2323
        
        supported once that is supported
      - NR: for v3, bit of concern since there's no spec. a ZEP? (see stuck https://github.com/zarr-developers/zeps/pull/47)
        
        RA: two components need to be addressed in the spec (dtype+codec; leaky abstraction?)
        
        …integer array interpreted as bytes?! unsure.
        
        RA: far from being able to make changes to the spec.
        
        JM: good to confer with John Kirkham
        
        JH: pickled dtypes are not interoperable with other languages
      - NR: how do we want to deal with experiments?
        
        can't write specs without implementation and python is a great place to do that
        
        but maintain a few other implementations. not useful if the main implementation blazes ahead and everyone must follow
        
        suggest we be cautious about that.
        
        various ways to handle that
        
        previously environment variables
        
        issue a warning on non-standard dtypes
        
        some discussion then it'll be fine
        
        RA: see https://github.com/zarr-developers/zarr-specs/pull/312#issuecomment-2407444223
        
        SV: Isaac stepped down from https://github.com/zarr-developers/zeps/pull/47 (no one to steer)
        
        RA: different opinion – just everything through extensions.
        
        trying to get to parity. can we do the same things.
        
        still struggling to evolve it. we decided that it's unversioned, so it's unchangeable.
        
        would need to move to v4.
        
        immutability was to be balanced by extensions.
        
        haven't managed to develop a robust ecosystem/process for extensions. ZEPs have failed. nothing adopted. we can't agree…
        
        extensions and then let's go make them
        
        see TA's https://github.com/zarr-developers/zarr-specs/issues/316
        
        let them be free
        
        practical way forward
        
        DB: tried to address that in https://github.com/zarr-developers/zarr-specs/pull/312
        
        are we willing to change the parts of the spec that are blatantly contradictory
        
        if it's immutable, so be it.
        
        RA: clarifications are ok
        
        NR: state of zarr-specs is terrible. ZEPs are a symptom. people are fatigued. process broken.
        
        spec core team is a good path.
        
        will have the same issue treating everything as an exception. names need to be coordinated. two string dtypes?!
        
        who controls the namespace. need a process. even a repository, pypi style.
        
        JM: feedback on zarr versioning from other implementations
        
        RA: namespacing
        
        extensions need to be namespaced. URI ok. absolute. resolves to the document.
        
        need to figure out what the extensions are and what their scope is
        
        2 different extensions (different URIs) that define the same codec. that's ok.
        
        make whatever changes are needed to have that process, socialize it, etc. shutdown ZEPs.
        
        NR: that's not how they were meant to work in v3. extension points. lets you create, e.g., codecs. (nothing wraps it)
        
        RA's is a new concept. might work. there might be issues with composability.
        
        comfortable in zarr-python if you need to actively opt in.
        
        RA: ask everyone if the original intention of zarr-spec works in practice.
        
        haven't been able to make forward progress. incorporate learning
        
        face reality of how things work in the real world and adapt.
        
        look at others where it's working
        
        JH: laser focused on getting zarr-python out
        
        can set config (don't need environment variable). go off-road
        
        consolidated metadata, few codecs, etc. to add (not much more coming)
        
        SV: lack of implementations was definitely an issue. Tried to work that into https://github.com/zarr-developers/zeps/pull/59
        
        JH: on spec process, v3 is in accepted not final. missed that date by a year.
        
        reasonable to say that changes are going to have to be made
        
        change the status of the spec for a while?
        
        SV: previous conversation about when to set it to final
  - Technical things
    - Beta release (JH): ok when smashing merge on string PR?
      - NR: v2 filters in beta or after? JH: weird kerchunk
        
        RA: to make xarray work we had to special case everything ("working" isn't accurate)
        
        convenience for our users
        
        NR: backwards compatibility. (lack of) v2 spec are out in the wild that we have to define indefinitely. (see extensions above)
        
        2 camps: people that think it through for a long time and the others that want to forge ahead. a tension that we have to work out.
        
        JH: filters just land in the array metadata. never seen that in the wild.
    - Endianness (DB)
      - https://github.com/zarr-developers/zarr-python/issues/2324
      - no longer part of the dtype. if someone creates an array with a given endianness, then the zarr array doesn't report it (have to check the codec)
      - when creating a new array, the endianness gets dropped
      - don't care? in memory representation is decoupled from how it is stored.
      - NR: yes, that's what we did in zarrita. you can control how it lands out, but not how it is read into memory.
      - RA: prevent memmapping data? NR: that's what we have the metadata for. RA: can imagine an impl without codecs that wants memmap to access the data (though zarr-python doesn't work that way) NR: zarr-spec requires a bytes codec which defines the endianness.
      - JH: if you get a big endian array (e.g., zarr.save(np.array)) … round-trip so you get a big-endian array back out the other side
      - NR: zarr.save() would need to handle.
      - JH: yes, interpret at the top-level and do something smart about the bytes codec
      - DB: if we keep it, what was the point of parameterizing it in the codec.
      - NR: compatibility, you need to store it somewhere
      - DB: what comes out is undefined.
      - RA: use platform's preferred endianness
      - DB: then users won't round-trip. won't come out in the chosen endianness
      - NR: struggled with this, but it is an implementation detail (incl. exposing it to the user). matters only for some performance issues.
      - DB: what is the dtype of a zarr array relative to what the user puts in.
      - NR: zarr only cares about how it looks on disk (not in-memory)
      - JM: zarr_array.dtype calculated from zarr data_type and checking the codec ("dynamic dtype")
      - NR: need to check the read path and what it is doing
      - RA: similar to the strings that it's coupling dtype and codec
      - DB: in v4 would like to see codec & dtype together
        
        also want to put shape and chunk together (JM: plus shard)
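
JM's "dynamic dtype" idea from the endianness discussion can be sketched as follows: derive the reported dtype from the v3 `data_type` plus the `endian` setting of the `bytes` codec. The helper name `dynamic_dtype_str` is hypothetical; defaulting to little-endian when no `endian` key is present and falling back to native byte order when no top-level `bytes` codec is found are assumptions made for this sketch.

```python
# Subset of v3 data type names mapped to NumPy type codes.
_TYPE_CODES = {
    "int16": "i2",
    "int32": "i4",
    "int64": "i8",
    "float32": "f4",
    "float64": "f8",
}
_ENDIAN_PREFIX = {"little": "<", "big": ">"}


def dynamic_dtype_str(metadata: dict) -> str:
    """Build a NumPy dtype string from array metadata and its bytes codec."""
    code = _TYPE_CODES[metadata["data_type"]]
    for codec in metadata.get("codecs", []):
        if codec.get("name") == "bytes":
            # Assumption: treat a missing "endian" key as little-endian.
            endian = codec.get("configuration", {}).get("endian", "little")
            return _ENDIAN_PREFIX[endian] + code
    # Assumption: no top-level bytes codec found, so report native order.
    return "=" + code
```

This keeps the stored byte order discoverable from the array object without making endianness part of the zarr data type itself.
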

October 4, 2024

Tom Augspurger / @TomAugspurger
Davis Bennett / @d-v-b
Sanket Verma / @MSanKeys963
Josh Moore / @joshmoore
Ryan Abernathey / @rabernat

Notes:

Array metadata refactor needs some mypy fixes (or we accept overriding)
Discussions on https://github.com/zarr-developers/zarr-python/pull/2272 (Davis)
- https://yarl.aio-libs.org/en/latest/
- DB: not making stores mutable, but they do IO!..
- RA: inconsistency in the definition of where a store begins
  - can some stores disallow starting from inside?
- Similarity between store and URL (abstractly speaking)
- Ryan: would suggest (also for sharding) to use stores?
  - Josh: but how to bootstrap?
Strings (Ryan)
- https://github.com/zarr-developers/zarr-python/pull/2278
  - define a string dtype on a zarr DataType rather than np
- https://github.com/zarr-developers/zarr-python/pull/2036
  - just implemented and then do the spec later
  - JM: try this out as a community codec (i.e. extension)?
    - then can discuss a ZEP to make a core codec.
  - SV: know users who are interested. (feel for core or not)
    - e.g. geopandas
DataClasses (Tom)

September 20, 2024

Joe Hamman / @jhamman
Tom Augspurger / @TomAugspurger
Davis Bennett / @d-v-b
Sanket Verma / @MSanKeys963
Akshay Subramaniam

Notes:

Storage transformers - decision: error when creating an array - https://github.com/zarr-developers/zarr-python/pull/2180
Dtype validation for v3 - https://github.com/zarr-developers/zarr-python/pull/2209
Fill value validation for v3 - https://github.com/zarr-developers/zarr-python/pull/2216
Consolidated metadata discussion
Xarray integration
- https://github.com/pydata/xarray/issues/9515
Dask integration
- https://github.com/dask/dask/pull/11388
Doc sprint: https://github.com/zarr-developers/zarr-python/issues/2215
- What modules are we targeting for the sprint?
  - Create issues for different modules so that folks self-assign?
- Some functions have docstring but missing a code sample - should we have code sample for docstring?
  - highest priority:
    - zarr.Group
    - zarr.Array
    - zarr.api.synchronous
  - next tier:
    - zarr.AsyncGroup
    - zarr.AsyncArray
    - zarr.api.asynchronous
    - zarr.storage
    - zarr.metadata
- Should we also plan for tutorials?
consider removing chunk_shape kwarg

September 13, 2024

Attendees

Joe Hamman / @jhamman
Tom Augspurger / @TomAugspurger
Davis Bennett / @d-v-b
Sanket Verma / @MSanKeys963

Notes

Updates
- Davis is happy about recent improvements to
on people's minds:
- Davis is going to look at the synchronizer api
- Sanket: Doc sprint in September? Dates? How many days? Async?
  - Let's try for Sept. 30-Oct 1
  - Yes, Async with a kickoff on Sept. 30
- Tom: consolidated metadata is getting pretty close
  - reworking metadata layout
  - first iteration will support reading/writing v3 consolidated metadata and reading v2
  - should be possible to write new v2 metadata as well
  - will need to do more thinking on future proofing for metadata schemas
  - also thinking about the maximum depth of consolidation
- storage transformers issue: https://github.com/zarr-developers/zarr-python/issues/2178
  - may need to update the spec language around optional metadata fields

```
import zarr

kwargs = zarr.codecs.make_sharding_pipeline(
    read_chunks={...},
    write_chunks={},
    compressor=Gzip(),
)

zarr.create_array(shape=(...), **kwargs)
```

September 6, 2024

Attendees

Joe Hamman / @jhamman
Tom Augspurger / @TomAugspurger
Norman Rzepka / @normanrz
Josh Moore / @joshmoore
Davis Bennett / @d-v-b
Akshay Subramaniam

Notes

https://github.com/orgs/zarr-developers/projects/5/views/2 big lifts?
- d-v-b: shape for sharding? does it have to change for 3.0.0?
  - shape is currently dependent on which codec
  - should encourage thinking about it as a new interpretation of chunking
  - JH: define some preset pipelines? NR: similarly. doesn't have to change the array. i.e., top-level API.
  - DB: people want easy access to the configuration for looping
  - JM: .writing_chunks to go with .reading_chunks. (would dask also adopt?)
  - DB: agreed, might be the right level of detail for users
    - would also help to guard against other implementations (transformers, etc.)
  - NR: also produce an ergonomic way of creating them
  - DB: you'd also want to pass as an argument
  - NR: ok, and doesn't have to be set forever.
  - JH: xarray/dask zarr-readers didn't need the attribute. (just from_array needs as an argument)
  - JM: default? which one wouldn't fail.
  - JH: default today is write chunk? NR: yes. but can be too big.
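
A sketch of the `.reading_chunks` / `.writing_chunks` idea from this thread, assuming the read unit is the inner chunk and the write unit is the shard when sharding is on. `Partitioning` is a hypothetical container, not an existing zarr-python class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Partitioning:
    """Hypothetical holder for chunk and shard shapes."""

    chunk_shape: tuple
    shard_shape: Optional[tuple] = None

    @property
    def reading_chunks(self) -> tuple:
        # Smallest region that can be read on its own.
        return self.chunk_shape

    @property
    def writing_chunks(self) -> tuple:
        # Smallest region that can be written on its own:
        # the whole shard when sharding is used, else the chunk.
        if self.shard_shape is not None:
            return self.shard_shape
        return self.chunk_shape
```

Downstream libraries could then loop over `writing_chunks` for safe parallel writes without knowing which codec produced the layout.
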
- consolidated metadata
  - JH: reading/writing v2 metadata as a blocker
  - TA: writing, too? kinda yeah.
  - TA: status update
    - pretty straight-forward
    - issues with del item: do we synchronize out to the consolidated? (i.e. doing more IO)
    - relationship between group & store objects is just "call save metadata"?
  - JM: writing down the v2 schema in the v3 (since no v2 process)
  - JH: just do it in the v2 schema. people are using it.
- docs (NR)
  - sprint? still happening.
  - issue raised about the formatting. not using the left pane. (sad & empty)
- synchronizer API (JH)
  - issues ("it doesn't work")
  - hot potato: v2 has one but without distributed version
  - DB: how does it plug in in V2?
  - JH: mucked up the v2 code _set_item_nosync
  - DB: property of an array (i.e. high-up in the API)
  - JH: could go further down. store level?
  - DB: every store has a locking class
  - JH: zip store requires thread/asyncio locks (not-merged)
  - NR: not using synchronizers
  - JH: frequent bug reports in xarray
  - DB: does it have to live at arrays and groups because stores didn't know the key names
    - does it tie in to having the names knowledge in the store?
    - JH: possibly a high-level and a low-level store API
    - NR: higher-order store so that you can compose them
      - zip store always has it
- mutable mapping
  - JH: use memory store to adapt anything (no async stuff though)
- GPU (AS)
  - merged
  - testing with the codec interface. things look to be working.
  - JH: v3 branch works. other big lifts for 3.0.0?
  - batched store API. few minor issues. (rust under the hood?)
  - JH: definitely lots of small chunk calls. add to store API (bunch of keys)
    - DB: allow fetches to run out of order then it changes the API
    - gather runs sequentially
    - NR: async iterator? or as_completed
    - JH: streaming approach is the most powerful but also the most complicated
    - JM: add in delete and then it's approaching transactions
    - DB: no lazy execution model right now. leverage futures?
    - AS: gpu batch in kvikio does that, collects all the futures and then waits
    - DB: path is open for that. (if we're leaving the mutable mapping API) not too painful.
    - AS: also async events needs separate codec pipeline. affects more.
    - DB: (dreaming) if txn as a context manager, then it could take a region
    - NR:
- DB: docs as the most important
  - NR: agreed
  - DB: pay attention to what sucks.
  - NR: migration guide.
- CLI tools (convert v2 to v3) - DB
  - NR: not difficult for small arrays
  - TA: zarr v3 metadata refer to v2 data?
  - NR: most of the time. only if the codec is compatible. zarrita had a function for that. could do that.
time-permitting (Josh)
- impl tests, netcdf-c, bluesky/tiled
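
The batched store discussion above weighs returning results only once every fetch is done against handing them back as they finish (NR: "async iterator? or as_completed"). A minimal sketch of the two shapes using only asyncio; `fetch_chunk`, the key names, and the latencies are hypothetical stand-ins, not zarr-python's store API:

```python
import asyncio

# Hypothetical per-key latencies in seconds; a real store would do IO here.
LATENCY = {"c/0": 0.03, "c/1": 0.01, "c/2": 0.02}


async def fetch_chunk(key: str) -> tuple[str, bytes]:
    """Stand-in for a store get(): sleeps, then returns fake bytes."""
    await asyncio.sleep(LATENCY[key])
    return key, key.encode()


async def fetch_all_ordered(keys: list[str]) -> list[str]:
    # gather waits for every fetch and returns results in request order.
    results = await asyncio.gather(*(fetch_chunk(k) for k in keys))
    return [key for key, _ in results]


async def fetch_all_streaming(keys: list[str]) -> list[str]:
    # as_completed hands back each result as soon as that fetch finishes,
    # so the caller can start decoding before the slowest key arrives.
    finished = []
    for next_done in asyncio.as_completed([fetch_chunk(k) for k in keys]):
        key, _ = await next_done
        finished.append(key)
    return finished


ordered = asyncio.run(fetch_all_ordered(list(LATENCY)))
streamed = asyncio.run(fetch_all_streaming(list(LATENCY)))
print(ordered)   # request order
print(streamed)  # completion order, which depends on latency
```

The streaming form changes what the caller sees (results arrive out of order), which is the API change DB points out above.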

August 30, 2024

Attendees

Joe Hamman / @jhamman
Sanket Verma / @MSanKeys963
Akshay Subramaniam /
Davis Bennett / @d-v-b
Josh Moore /
Tom Augspurger / @TomAugspurger

Agenda

alpha release last week
2.18.3 is close (maybe today)
GPU PR is in
lots of stale PRs
async / sync boundary in store
- look at how tensorstore does this
  - probably pass the store name and a config dict?
- dvb: having users instantiate a store is kind of an anti pattern
  - want more of a declarative pattern
- as: could be useful to decouple protocol from store api
  - like what we have w/ codecs
consolidated metadata
- https://github.com/zarr-developers/zarr-specs/pull/309
- https://github.com/zarr-developers/zarr-python/pull/2113
- discussion about store api
  - any changes to the on disk format are a spec change
- discussion about cache consistency and invalidation
back to attrs? or something else?
- serialization of metadata is really hard
- Tom is looking at something here -> https://github.com/TomAugspurger/zarr-python/blob/feature/serde/src/zarr/_serialization.py
- probably don't need to go to attrs

August 23, 2024

Attendees

Joe Hamman / @jhamman
Sanket Verma / @MSanKeys963
Josh Moore / @joshmoore
Norman Rzepka
Akshay Subramaniam
Gustavo Hidalgo

Agenda

https://github.com/zarr-developers/zarr-python/pull/2102
- NR: important to have a written document
  - OME is also interested in support for reading v2 data
  - may be good to remove the v2 module asap
- JM: crux is supporting v2 and v3 data
  - does it make sense to create a zarr3 library
- NR: not a fan of the zarrv3
  - discoverability and aesthetics are not great
  - pitch weekly alpha releases
  - need to do the work and get the release out
- JM: when do we go from alpha to beta to full release
- SV: also address these questions: https://github.com/zarr-developers/zarr-python/discussions/2093#discussioncomment-10429985
alpha release frequency
- proposal: weekly release on Monday
consolidated metadata
- https://github.com/zarr-developers/zarr-specs/issues/136
- JM: no problem supporting this for v2 ala 2.*
  - add something that supports v2 data
  - add zep for v3
RemoteStore
- PR1956
- blocking: writing is completely broken because of the exists method
- doing naive synchronous user thing
  - open_array(s3://...)
  - using sync in user code
  - accessing fsspec directly
- store = await MyStore.open('s3://foo')
- store = sync(MyStore.open('s3://foo'))
- store = MyStore.open_sync('s3://foo', loop=...)
- sync_store = SyncWrapper(MyStore, 's3://foo')
  - sync_store.set(filename, bytes)
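
Two of the options listed above, `sync(...)` and `SyncWrapper`, can be sketched with the standard library alone. This is only an illustration under stated assumptions: `MyStore` here is a hypothetical in-memory class, and the background-loop approach is one possible design, not necessarily what RemoteStore does.

```python
import asyncio
import threading
from typing import Optional


class MyStore:
    """Hypothetical async store kept in memory; not zarr-python's RemoteStore."""

    def __init__(self, url: str) -> None:
        self.url = url
        self._data: dict = {}

    @classmethod
    async def open(cls, url: str) -> "MyStore":
        return cls(url)

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


# One background event loop shared by all synchronous callers.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()


def sync(coro):
    """Run a coroutine on the background loop and block until it finishes."""
    return asyncio.run_coroutine_threadsafe(coro, _loop).result()


class SyncWrapper:
    """Blocking facade: every call is forwarded through sync()."""

    def __init__(self, store_cls, url: str) -> None:
        self._store = sync(store_cls.open(url))

    def set(self, key: str, value: bytes) -> None:
        sync(self._store.set(key, value))

    def get(self, key: str) -> Optional[bytes]:
        return sync(self._store.get(key))


sync_store = SyncWrapper(MyStore, "s3://foo")
sync_store.set("a/zarr.json", b"{}")
print(sync_store.get("a/zarr.json"))
```

Keeping the loop on its own thread means `sync()` also works when the caller has no event loop of its own, which is the "naive synchronous user" case above.
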
GPU array progress
- squashing bugs around merge conflicts
- GPU CI is working now, need to sort out limiting the size of the matrix and installing cupy

August 9, 2024

Attendees

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Sanket Verma / @MSanKeys963
Eschal Najmi

Agenda

https://github.com/zarr-developers/zarr-python/issues/2008

PR updates
- v2/v3 metadata: https://github.com/zarr-developers/zarr-python/pull/2059 would love to see this merged –Davis
- picklable classes: https://github.com/zarr-developers/zarr-python/pull/2006
- GPU support: https://github.com/zarr-developers/zarr-python/pull/1967
  - blocked by GPU runner on GitHub
- also 2064, 2065 are close
set a docs sprint date
- target late september

July 26, 2024

Attendees

Joe Hamman / @jhamman
Norman Rzepka / @normanrz
Davis Bennett / @d-v-b
Hannes Spitz / @brokkoli71
Gustavo Hidalgo / @ghidalgo3
Sanket Verma /

Agenda

API surface
- Array API: https://github.com/zarr-developers/zarr-python/discussions/2052
- API survey: https://docs.google.com/spreadsheets/d/1ev4Hj_YU-QCiZJuxRYMrBqdrYYqP3tIdnYGmp9saJS8/edit?usp=sharing
  - https://github.com/zarr-developers/zarr-python/issues/2037
Second alpha release: https://github.com/zarr-developers/zarr-python/issues/2008

Notes:

sharding is complicating the .chunks attribute on Arrays

ideas

Array.chunks -> tuple[int]  # raise if variable chunks or sharding
Array.chunk_grid.read_chunks()  # or inner_chunks or chunks
Array.chunk_grid.write_chunks() # or outer_chunks or shards
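
One way to read the ideas above, as a sketch only: `ChunkGrid` and `ArraySketch` are hypothetical classes written for this note, not zarr-python types, and variable chunk grids are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChunkGrid:
    """Hypothetical regular grid: inner (read) chunks inside optional shards."""

    inner: tuple
    shards: Optional[tuple] = None

    def read_chunks(self) -> tuple:
        # Smallest unit that can be read independently.
        return self.inner

    def write_chunks(self) -> tuple:
        # Smallest unit that can be written independently: the shard when
        # sharding is in use, otherwise the chunk itself.
        return self.shards if self.shards is not None else self.inner


@dataclass(frozen=True)
class ArraySketch:
    chunk_grid: ChunkGrid

    @property
    def chunks(self) -> tuple:
        # Mirrors "raise if variable chunks or sharding" from the notes.
        if self.chunk_grid.shards is not None:
            raise ValueError(
                "chunks is ambiguous for a sharded array; "
                "use chunk_grid.read_chunks() or chunk_grid.write_chunks()"
            )
        return self.chunk_grid.inner


plain = ArraySketch(ChunkGrid(inner=(100, 100)))
sharded = ArraySketch(ChunkGrid(inner=(100, 100), shards=(1000, 1000)))
print(plain.chunks)
print(sharded.chunk_grid.read_chunks(), sharded.chunk_grid.write_chunks())
```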

sharding configuration is pretty complicated today
- template module
- zarr.open_array remains the top level API where we can do user-friendly things
- the Array.open method remains a low level entrypoint
async codec api
- sharding is the only codec that needs to be async / do IO
- NR: today we get scheduling in a threadpool
- assumption that codecs release the GIL
- need to do performance testing

Deprecate in 2.18.3

h5py compat methods

TODOs from this meeting

performance test suite
- dask + threaded scheduler
GPU runner billing

July 12, 2024

Attendees

Ryan Abernathey / @rabernat
Norman Rzepka / @normanrz
Davis Bennett / @d-v-b
Akshay Subramaniam / @akshaysubr

Agenda

What to do with numcodecs?
- Make a release, needs docs for
Move more codecs specs into Zarr

June 6, 2024

Attendees

Joe Hamman / @jhamman
Juan Nunez-Iglesias / @jni

May 30, 2024

Attendees

Joe Hamman / @jhamman
Norman Rzepka / @normanrz
Davis Bennett / @d-v-b
Akshay Subramaniam
Max Jones / @maxrjones

Agenda

Upcoming alpha release

Quick topics:

Norman, do we have an accessible api for extracting a shard index?
chunkstore API
- joe: ask Martin
MemoryStore has Buffer objects in it :(
out kwarg

Notes

Joe
- Store open mode is in, but incomplete
- Top level API is functional but needs a bunch of work
- Working on sharding codec, using fsspec branch + top level API branch
  - slow for now
Norman
- working on indexing, tests are working
  - stuck on typing
  - ready early next week
Davis
- Store tests for Martin to get fsspec
- Hierarchy api
- codec pipeline API
- typed dicts for metadata objects
Akshay
- On vacation, keeping track of Buffer/Indexing PRs
Max
- No updates, can contribute to the v3.0.0 docs task (starting with dev docs)

May 23, 2024

Attendees

Joe Hamman / @jhamman
John Kirkham / @jakirkham
Juan Nunez-Iglesias / @jni
Sanket Verma / @MSanKeys963

Agenda

Upcoming alpha release
Joe's demo of new features: https://gist.github.com/jhamman/8381dd971d928bf220405057107562b1

May 17, 2024

Attendees

Joe Hamman / @jhamman
Norman Rzepka / @normanrz
Davis Bennett / @d-v-b
Max Jones / @maxrjones

Agenda

Outstanding design topics for 3.0.0.alpha - https://github.com/zarr-developers/zarr-python/issues?q=is%3Aopen+is%3Aissue+label%3A"design+discussion"
Additional topics to consider before 3.0.0 (more deprecations may be desired)
- synchronizers? or move to design topic
  - move sync.py to new module
- object arrays? need a plan here
  - open issue
- meta_array
  - assign to nvidia folks
  - maybe move to config
- consolidated metadata (v2 and v3)
  - joe to take on
  - no support for v3
- write_empty_chunks
  - runtime array configuration
Test sprint soon?

Notes

release alpha next week, need top level api
chunks attribute
- for now, regular chunk grid
indexing
- oindex, vindex, integer, …
- https://zarr.readthedocs.io/en/stable/tutorial.html#indexing-with-coordinate-arrays

May 8, 2024

Attendees

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Norman Rzepka / @normanrz
Sanket Verma
Alden Keefe Sampson (AKS)
Akshay Subramaniam
John Kirkham

Notes:

Progress update (project board)
numcodecs codecs: numcodecs#524
zstd in numcodecs needs a review: numcodecs#519
HybridCodecPipeline (interleaved with configurable batch size) needs a review: #1670
Runtime configuration? #1772
Batched store discussion
Store metadata methods: zarr-python#1851
Initial NDBuffer implementation: zarr-python#1826
Proposed new meeting times
- week 1: Friday 7a PT
- week 2: Thursday 3p PT

Notes:
Major updates

JH:
- implicit groups are gone :)
NR: codecs are getting into a good place
- new rev on batched pipeline
DB:
- out last week, getting back into it
- group tests
- need a decision about removing v2 code paths
  - should go now
AKS:
- open PR generalizing array types

TODOs:

add tests for v2 and v3 arrays

April 24, 2024

Attendees

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Jack Kelly / @jackkelly
Ryan Abernathey / @rabernat
Max Jones / @maxrjones
Sanket Verma
Akshay Subramaniam
Norman Rzepka / @normanrz

Notes:

Progress update (project board)
Codecs
- Norman needs a review on #1670
Store API #1806
- discussion around batch vs interleave API
- someone could look at https://github.com/zarr-developers/zarr-python/pull/1661
Group API

April 22, 2024

Attendees

Joe Hamman / @jhamman
Norman Rzepka / @normanrz
Josh Moore / @joshmoore
Sanket Verma
John Kirkham
Martin Durant
Ryan Abernathey
Davis Bennett
Akshay Subramaniam

Excused: Juan Nuñez-Iglesias

Goals

Make sure we're all on the same page with what has been going on with the project
Organize around v3 efforts

Agenda

Recent efforts
- Updates to core team (JH)
  - Moved some to emeritus status, etc.
  - We should work to get more core devs. (Lots to do)
  - RA: candidates? JH: let's get people making commits for a while.
- meeting (JM)
  - propose to make it the regular meeting but find a time where everyone can join.
  - all aboard
- Zarr-Python 3 update (JH)
  - Design doc
  - Progress update (project board + notes below 👇)
  - Ambitious release schedule (#1777)
    - loose plan: May/alpha, June/release
    - roughly following the Pydantic 2 model (breaking API changes)
    - JM: all for getting pre-releases out
    - NR: need to get rid of the v3 folder (messes things up)
  - support/v2 branch may be useful (JM)
    - JM: we probably should be pushier
    - JH: sure, just our 100% confidence may lag
    - NR: define a window, e.g., by the end of the year everyone should move.
  - JH: pinned issue with the release plan? Yes.
    - https://github.com/zarr-developers/zarr-python/issues/1777
  - v3 update
    - DB:
      - v3 metadata is done, i.e., can create spec compliant v3 arrays
      - working on groups that would work as expected, e.g., listing children (one of 2 big PRs). nearly done. required getting into the async implementation which is one of the biggest changes for the storage layer. Also means that we're not able to just paste in old code.
      - high-level convenience APIs are not there
      - only the nucleus of a testing strategy. using a different strategy from v2. bringing in what we can.
    - NR:
      - codecs are pretty advanced. async …
        
        MD: that means thread pools? NR: Yes. They can choose how they do that.
        
        JH: core part of that is in the v3 release that will spend time on async/threading/scheduling. Lots of new behavior that we're going to learn about it. But now we have an API that can be tuned.
      - arrays are missing indexing
      - documentation is largely open. (pushed to post alpha for the launch)
    - JH:
      - on our way towards having 100% type hints
      - abstract base classes for the Store and Codecs that allow people to implement their own (outside of zarr-python) with an entrypoint system. perhaps something for chunk grids as well
      - store is no longer a mutable mapping but a custom class. list methods are async generators, etc. all synchronization happens upstream. Synchronize wrapper of Arrays and Groups, but wait until you're at the top-level API for sync.
      - build is cleaner. using hatch.
      - need to discuss numcodecs. currently isn't taking part in the protocol system. what does it mean to Zarr going forward?
        
        https://github.com/zarr-developers/numcodecs/issues/502
    - Discussion
      - JK: documented path for upgrading? JH: no, there's an open ticket. Need docs on upgrading code but also migrating data, e.g., metadata only changes. (Alistair did this for v1->v2).
      - MD: need to discuss what kerchunk is going to do. it will take some pretty deep working. the style of where the metadata is has changed (along with the codecs). JH: yeah, no filters, all one pipeline. i.e., just the metadata. MD: more involved (i.e. goes deeper) into Zarr than other things. RA: zarr data model that is independent of the spec could be super useful. DB: don't think there is an overlap of v2 and v3 arrays. i.e., it would be a UNION. you need to map between the names and the types. don't get that for free just with the hierarchies. RA: don't do it once off, but build something re-usable and then serialize those. JH: clearly separated metadata from the classes. can turn one dataclass into another dataclass. (work to be done)
      - DB: spec allows v2/v3 things to be mixed, so a coroutine of some form may need to be opinionated about what it prioritizes. JH: good point, since you might have to look for 4 things, or prioritize one or the other. we should just be clear and then let people suffer the consequences. RA: have some shim functions similar to the current open() which keeps things working. JH: zarr.open has a version flag. None could mean do both.
      - implicit groups (DB) basically anything is a group even if it has garbage in it. NR: haven't seen anyone who is against removing it.
      - DB: if so, also make mixing versions disallowed? JM: can we allow a complete mixing? JH: don't want to be polymorphic about children. DB: can't forbid having .zattrs in a v3 group/array. Agreed.
      - JM: if need be, can try to organize a ZIC meeting with SV.
    - Numcodecs (NR)
      - Opened PR today if someone wants to review that, but more generally where are we with numcodecs
      - have specialized codec classes in v3 branch. arrays-to-bytes, etc. etc. Different classes from in numcodecs. Do use it under the hood.
      - for v2 support in the v3 branch we use the code unchanged. we ask numcodecs to do it for us. we could pull that into the v3 arrays which would give us support for batching, async, etc. we will likely need some glue code. (that's with minimal effort). Do we move numcodecs in a direction such that it uses v3 abstract classes.
      - DB: I like the idea of there being a repo on github for people to go to. numcodecs should exist where we have these compression routines at a low level. It should be there to support zarr.
      - NR: closed list. How do we handle that?
        
        DB: spec says that it just needs a URL.
        
        NR: what if two implement blosc2 differently.
        
        DB: people are going to do what they want.
        
        NR: make it more difficult? or use the github URL to prefix?
        
        JH: raise a warning that users can turn off if they aren't using an approved list of codecs. for experimentation, we definitely want to make it possible (and easy). That's what zarr-python is known for.
        
        DB: what's the advantage of enumerating a list of codecs?
        
        NR: when creating an implementation, you can just follow the list.
        
        JM: allows us to say, "this implementation is not complete". plus also
        
        for a schema where possible.
        
        NR: ok to have optional codecs
      - JH: open issue with tifffile of a missing default flag (size parameter?)
        
        https://github.com/cgohlke/tifffile/issues/211
      - RA: a way out of this is to outsource as much as we can with blosc, has an ambition of being a meta compressor
        
        AS: blosc as the main library is that it also has sharding etc. under the hood. better IMO to just expose the compression stream formats. blosc is less flexible than numcodecs currently. (more difficult to add new compressors or options)
      - AS: gzip links to RFC not an implementation, i.e., a specific stream format. this is also an issue with numcodecs lz4. would be good to have these written down and link to a spec.
      - Continue conversation on https://github.com/zarr-developers/numcodecs/issues/502
- NASA Funding Opportunity (JH)
  - Planning to submit a LOI next week
  - targeting Zarr-Python, v3 feature development, and support at least 3 years
TODOs
- find a new time for the bi-weekly meeting. becomes zarr-python dev meeting but open invitation to anyone who would want to join.

Notes

April 10, 2024

SV: Zarr-Python B&P meeting discontinued - rename refactor meeting to 'Zarr-Python meeting'?
JH: updates since last meeting
- Project board
- Many new issues!
- Discuss timeline #1777

Active work

DB: removing old v3
JH: move v3 dir to root, remove v2 stuff
JH: list_* (AsyncGenerator[str])
- stream through members: https://github.com/zarr-developers/zarr-python/pull/1782/files#r1558820360
- https://stackoverflow.com/questions/78301926/asyncio-creating-a-producer-consumer-flow-with-async-generator-output
AS: generalized NDarray support
- two options for the design in https://github.com/zarr-developers/zarr-python/issues/1751
- can we use c++ for this? will make zero-copy memory sharing easier
- Qs: what does it mean for development process and pyodide support?
MJ: merged CI updates
- looking for the next thing
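
To illustrate the `list_*` (AsyncGenerator[str]) item above: an async generator lets the consumer stream through members and stop early instead of materializing every key first. A minimal sketch, where `ListingStore` and its `list_prefix` method are hypothetical stand-ins written for this note:

```python
import asyncio
from collections.abc import AsyncGenerator


class ListingStore:
    """Hypothetical in-memory store whose list method is an async generator."""

    def __init__(self, keys: list) -> None:
        self._keys = sorted(keys)

    async def list_prefix(self, prefix: str) -> AsyncGenerator[str, None]:
        for key in self._keys:
            await asyncio.sleep(0)  # yield control, as real IO would
            if key.startswith(prefix):
                yield key


async def first_n(store: ListingStore, prefix: str, n: int) -> list:
    """Consume lazily: stop after n keys without listing everything."""
    found = []
    async for key in store.list_prefix(prefix):
        found.append(key)
        if len(found) == n:
            break
    return found


store = ListingStore(["a/zarr.json", "a/c/0", "a/c/1", "b/zarr.json"])
first_two = asyncio.run(first_n(store, "a/", 2))
print(first_two)
```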

Apr 5, 2024

Davis Bennett / @d-v-b (DB)
Norman Rzepka / @normanrz (NR)
Joe Hamman / @jhamman
Deepak Cherian / @dcherian

Todo list

Note: the topics listed below have been converted to issues and placed on the v3 project board: https://github.com/orgs/zarr-developers/projects/5

p0 - must happen now
p1 - must happen before alpha release (target first week of May)
p2 - must happen before 3.0 release (target early June)
p3 - nice to have, can happen after 3.0 release

Arrays
Groups
- [p1] implement members / children
- [p3] (reach) declarative hierarchy API
Store
- [p0] Finalize store API
- [p1] Deprecate stores we don't want in Zarr-Python core anymore
- [p1] remote store support (s3, gcs, azure, http)
- [p3] request coalescing: where to implement?
- [p3] Storage transformer API – do we need it now (or ever)?
- [p3] (reach) http proxy - may not need to be in zarr-python
- [p3] (reach) caching bytes-chunks in the store, caching array chunks in the array
Tests
- [p1] Bring in as much of the existing test suite in as possible
- [p1] Test serialization of Arrays
- [p1] Add test that instruments traffic to the store – we should be very careful to only read what is needed
- [p2] Develop integration test suite in Zarr-Python – needed to validate new async tooling (could be xarray and Dask)
- [p2] Coordinate downstream testing (e.g. Dask + Xarray)
- [p3] Develop performance test suite in Zarr-Python
- [p3] Add hypothesis test hooks
Docs
- [p2] Update developer docs
- [p2] Update API docs
- [p2] Update tutorial docs
- [p2] Write a doc about how Zarr-Python thinks about consistency and how it operates when concurrent writers are acting on a store/group/array/chunk
- [p2] Start a release doc summarizing the major changes in V3
- [p3] docs for extending zarr
  - how to write a custom store
  - how to subclass array / group (e.g., to have typed attributes, typed members)
  - How to get good performance
Misc
- [p2] Add logging throughout
Migration
- [p3] 2 -> 3 conversion cli, (maybe in its own repo)
- [p1] remove v2 code
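
The "[p1] Add test that instruments traffic to the store" item in the list above could start from a counting wrapper. A minimal sketch under stated assumptions: `CountingStore` and `read_chunk` are hypothetical, and the store is a synchronous dict-backed toy rather than the async Store API.

```python
from collections import Counter
from typing import Optional


class CountingStore:
    """Hypothetical wrapper that records every key read from a dict-backed store."""

    def __init__(self, data: dict) -> None:
        self._data = data
        self.reads: Counter = Counter()

    def get(self, key: str) -> Optional[bytes]:
        self.reads[key] += 1
        return self._data.get(key)


def read_chunk(store: CountingStore, chunk_key: str) -> Optional[bytes]:
    """Toy reader: fetch the metadata document, then exactly one chunk."""
    store.get("zarr.json")
    return store.get(chunk_key)


store = CountingStore({"zarr.json": b"{}", "c/0": b"\x00", "c/1": b"\x01"})
payload = read_chunk(store, "c/1")

# The test then asserts that only the needed keys were touched.
print(dict(store.reads))
```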

March 27, 2024

Davis Bennett (DB)
Alden Keefe Sampson (AKS)
Norman Rzepka / @normanrz (NR)
Sanket Verma / @MSanKeys963 (SV)
Akshay Subramaniam / @akshaysubr (AS)
Max Jones / @maxrjones (MJ)
Raphael Hagen / @norlandrhagen (RH)

Meeting notes:

Sanket: Bi-weekly meeting ends on May 1st, 2024 - shall we continue after that?
- Yes! Schedule it until June end! - DONE
DB: Fleshing out the group API in v3 https://github.com/zarr-developers/zarr-python/pull/1726
NR: We need to find a common understanding of what we still need to work on for beta release. NR will create a tracking issue.

Akshay: Generalized array support

Where to create issue to track this? zarr-python or zarr-specs? Any direction for structuring the issue and proposal?
Create a native zarr NDArray class for typing and to interface with existing protocols. This includes:
- Buffer protocol
- __array_interface__
- __cuda_array_interface__
- DLPack
- Raw pointers

namespace zarr
{

namespace py = pybind11;
using namespace py::literals;

class Array
{
public:
    Array(zarrArrayInfo_t* array_info, int device_id);
    Array(py::object o, intptr_t cuda_stream = 0);

    py::dict array_interface() const;
    py::dict cuda_interface() const;

    py::tuple shape() const;
    py::tuple strides() const; // Strides of axes in bytes
    py::object dtype() const;

    zarrArrayBufferKind_t getBufferKind() const; // Device or Host buffer
    py::capsule dlpack(py::object stream) const; // Export to DLPack

    py::object cpu(); // Move array to CPU
    py::object cuda(bool synchronize, int device_id) const; // Move array to GPU

    const zarrArrayInfo_t& getArrayInfo() const
    {
        return array_info_;
    }
    static void exportToPython(py::module& m);
};

} // namespace zarr

Interoperability with Numpy

import numpy as np
import zarr

ascending = np.arange(0, 4096, dtype=np.int32)
zarray_h = zarr.ndarray.as_array(ascending)

print(ascending.__array_interface__)
print(zarray_h.__array_interface__)
print(zarray_h.__cuda_array_interface__)
print(zarray_h.buffer_size)
print(zarray_h.buffer_kind)
print(zarray_h.ndim)
print(zarray_h.dtype)

Interoperability with Cupy

import cupy as cp

data_gpu = cp.array(ascending)
zarray_d = zarr.ndarray.as_array(data_gpu)
print(data_gpu.__cuda_array_interface__)
print(zarray_d.__cuda_array_interface__)
print(zarray_d.buffer_kind)
print(zarray_d.ndim)
print(zarray_d.dtype)

Convert CPU to GPU

zarray_d_cnv = zarray_h.cuda()
print(zarray_d_cnv.__cuda_array_interface__)

Convert GPU to CPU

zarray_h_cnv = zarray_d.cpu()
print(zarray_h_cnv.__array_interface__)

Anything that supports the buffer protocol

with open('file.txt', "rb") as f:
    text = f.read()

zarray_txt_h = zarr.ndarray.as_array(text)
print(zarray_txt_h.__array_interface__)

zarray_txt_d = zarray_txt_h.cuda()
print(zarray_txt_d.__cuda_array_interface__)

March 13, 2024

Joe Hamman / @jhamman (JH)
Alden Keefe Sampson (AKS)
Norman Rzepka / @normanrz (NR)
Sanket Verma / @MSanKeys963 (SV)
Akshay Subramaniam / @akshaysubr (AS)
Max Jones / @maxrjones (MJ)
Agriya Khetarpal / @agriyakhetarpal (AK)

Meeting notes:

Alden / top level API
- https://github.com/zarr-developers/zarr-python/issues/1598#issuecomment-1994729420
- In the v3 library, are the top level methods
  - the primary way users interact with the library? or
  - the smoothest on ramp for v2 library users into v3? (and Array.xxx and Group.xxx become primary)
  - something else?
  - Notes:
    - Joe's thought: We want to provide a pretty similar interface, help massage or raise errors when args not compatible. We can start deprecating and changing behavior
    - Norman: like Array. and Group entrypoints, but we need to have these top level entry points. promote the Array and Group classmethods in the docs
    - Joe: don't love polymorphism of .open, but it exists
- The kwargs to any method that can create an array are currently v2 specific (filters vs codecs, etc), plus there are a number of performance/behavior modifying args (cache_[metadata|attrs], partial_decompress, write_empty_chunks, dim separator). Do we
  - try not to change the api at all and try to translate into spec V3 land
  - make the kwargs actually match those of the Array.xxx v3 library methods, but also take in **v2_kwargs and translate where possible, checking for conflicts with v3 kwargs if provided
  - make the top level methods kwargs align with those of Array.create, etc
    - Norman: for open: make it compatible, you get back your method, for create could make
      - Runtime parameters: many of these didn't know exist, can debate case by case
      - Joe: if run time param provided that v3 don't use, raise error
- Currently zarr.open and similar will return an array if it exists, even if the existing array's dtype, codecs, etc don't match those provided.
  - Keep this?
  - Joe: think we should raise an error, wide agreement on this
Norman / Batched and interleaved codec pipelines
- https://github.com/zarr-developers/zarr-python/pull/1670
- Hybrid interleaved batched codec pipeline
- How to set runtime configuration?
- Batched store API
- BatchedCodecPipeline as abstract class that can be overridden by user code
- Move thread dispatch from codec to pipeline to allow for coalescing and locality?
Akshay / Generalized array support
- Open issue and tag with v3
Sanket / Summary for the core-devs → potential blog post in the future
- JH: I can take this on – target April?
- SV: Sounds good!
Agriya / Zarr Pyodide support, out-of-tree
- Requires patching numcodecs, zarr here and there
  - Zarr is pure Python, so fewer patches there. Numcodecs needs more patches because it is Cython-based.
  - Already done by Pyodide devs per Pyodide/Emscripten release
    - The Emscripten and Pyodide versions are not decoupled yet
    - Leads to missing versions
- Establish CI job that runs on PRs and nightly — or just nightly
- Issue with this is maintainability and how to keep support?
- Interactive documentation (end goal).
- Action item: I will be opening an issue for this on the Zarr repository and link both previous discussions (the ones that I have found). Discussion may proceed there further

February 28, 2024

Joe Hamman / @jhamman (JH)
Tom Nicholas / @TomNicholas
Norman Rzepka / @normanrz (NR)
Davis Bennett / @d-v-b (DB)
Sanket Verma / @MSanKeys963 (SV)
Akshay Subramaniam / @akshaysubr (AS)
Charles Stern
Alden Keefe Sampson (AKS)

Meeting notes:

Numcodecs discussion
- https://github.com/zarr-developers/numcodecs/issues/502
How ready is the v3 branch for kerchunk-related experiments?
- i.e. chunk manifest ZEP, virtual concatenation ZEP
- https://hackmd.io/t9Myqt0HR7O0nq6wiHWCDA?view
v3 store discussion
- https://github.com/zarr-developers/zarr-python/discussions/1686

February 14, 2024

Norman Rzepka / @normanrz (NR)
Davis Bennett / @d-v-b (DB)
Sanket Verma / @MSanKeys963 (SV)
Akshay Subramaniam / @akshaysubr (AS)

Meeting notes:

AS: Planning to send a couple PRs around numcodecs and wanted to join the refactor meeting to get the sense of current state of things
NR: https://github.com/zarr-developers/zarr-python/pull/1660
AS: Plan to add the encode and decode batch in numcodecs and move the logic from the Zarr-Python to numcodecs
NR: In the current codec mechanism there will be place to add the encode/decode class
SV: Also a question of how you'd add a new codec in V3 - https://zarr-specs.readthedocs.io/en/latest/v3/codecs.html
NR: Could use entrypoints for the new codec registrations
AS: New codecs are added via KvikIO
AS: https://github.com/zarr-developers/zarr-python/issues/1398

January 31, 2024

Attendees

Sanket Verma / @MSanKeys963 (SV)
Norman Rzepka / @normanrz (NR)
Davis Bennett / @d-v-b (DB)
Max Jones / @maxrjones (MJ)
Alden Keefe Sampson / @aldenks (AS)
Raphael Hagen / @norlandrhagen (RH)
Charles Stern / @cisaacstern (CS)
Jeremy Maitin-Shepard / @jbms (JMS)

Meeting notes:

NR: Codec pipeline
- Open question: merging partial and full versions?
- Next - reading/writing partial chunks for uncompressed data
DB: Saransh helped with hatch and source layout updates
- providing a review on packaging PRs: https://github.com/zarr-developers/zarr-python/pull/1592
- new branch for V3 work
  - removing attrs
  - using frozen dataclasses
  - relies on handling to/from dict in each class with validation functions
MJ: No updates, participating in Joe's virtual sprint on Zarr refactor
- can test out setting up test env with Hatch, provide feedback
AS: Setup on dev environment, still intending to work on high-level methods.
- Also adding setup/dev environment doc improvements to https://github.com/zarr-developers/zarr-python/pull/1643
RH: No updates
CS: Interested in participating in Zarr sprint remotely
JMS: No updates, analogous decisions in tensorstore
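
The pattern DB describes above (frozen dataclasses, with to/from dict handled in each class via validation functions) might look like the following. This is a sketch, not the V3 branch code: `ArrayMetadataSketch` and `parse_shape` are hypothetical, and only two fields are modeled.

```python
from dataclasses import dataclass, replace
from typing import Any


def parse_shape(data: Any) -> tuple:
    """Validation function: accept any iterable of non-negative ints."""
    shape = tuple(data)
    if not all(isinstance(n, int) and n >= 0 for n in shape):
        raise ValueError(f"invalid shape: {data!r}")
    return shape


@dataclass(frozen=True)
class ArrayMetadataSketch:
    shape: tuple
    data_type: str

    def __post_init__(self) -> None:
        # Frozen dataclasses block normal assignment, so the validated
        # value is stored with object.__setattr__.
        object.__setattr__(self, "shape", parse_shape(self.shape))

    @classmethod
    def from_dict(cls, data: dict) -> "ArrayMetadataSketch":
        return cls(shape=data["shape"], data_type=data["data_type"])

    def to_dict(self) -> dict:
        return {"shape": list(self.shape), "data_type": self.data_type}


meta = ArrayMetadataSketch.from_dict({"shape": [10, 20], "data_type": "int32"})
# Changes produce a new validated object instead of mutating in place.
bigger = replace(meta, shape=(10, 40))
print(meta.to_dict(), bigger.shape)
```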

January 17, 2024

Attendees

Sanket Verma / @MSanKeys963 (SV)
Joe Hamman / @jhamman (JH)
Norman Rzepka / @normanrz (NR)
Davis Bennett / @d-v-b (DB)
Max Jones / @maxrjones (MJ)
Alden Keefe Sampson / @aldenks (AS)
Raphael Hagen / @norlandrhagen (RH)

Meeting notes:

SV: plans for numcodecs going forward
- TODO: connect with JK about this
JH: made some good progress on the Store API
- Thinking about what to do when keys are missing, raise KeyError or return None
- Needs work: getsize, move, tree, rmdir, open, close
  - NR: move should only exist on a store if it's cheap
- Open questions:
  - Store.list_* could change to return async generators
NR: working on codec api, removing array metadata
- not 100% happy with the API yet
- new methods: evolve and validate - check if the codec matches
- looking for input here https://github.com/zarr-developers/zarr-python/pull/1632
DB: working on a messy / unmergeable PR for the Array API
- end goal: unify array/group apis for v2 and v3
- added a new directory with v2 and v3 metadata
- stuck on dataclass inheritance
- JH: where will the normalization of metadata keys
MJ: not much, going to pick up the hatch PR
- JH: yay!
- DB: flag issue around imported modules from tests https://github.com/zarr-developers/zarr-python/pull/1601
AKS: playing with zarr in rust
- very fast!
- going to pick up the top level api this week
RH: No updates at this time

Discussion

What goes in the beta release

core array, group, and store api
thesis: almost feature complete but the api should be set
- we want people to start using it so we can get some feedback

January 3, 2024

Attendees

Sanket Verma / @MSanKeys963 (SV)
Alden Keefe Sampson / @aldenks (AS)
Joe Hamman / @jhamman (JH)
Norman Rzepka / @normanrz (NR)
Davis Bennett / @d-v-b (DB)
Max Jones / @maxrjones (MJ)

Meeting notes:

JH: Still working on Group and Store APIs
NR: Left off with codec api, sharding api, and sharding layouts
DB: Still working on array api
- considering a major change to indexing/slicing api (slicing a Zarr array gives NumPy array, which is weird and should give out a Zarr array)
- thinking about serialization of nested objects
- partial writes
MJ: Looking for a place to jump in
- Refactor metadata objects (e.g. ChunkGrid, ChunkKeyEncoding)
- Remove attrs and refactor (de)serialization
- Hatch PR https://github.com/zarr-developers/zarr-python/pull/1592
AKS: may want to jump in on the top level zarr.foo* api

December 20, 2023

Attendees

Charles Stern / @cisaacstern (CS)
Jack Kelly / @JackKelly (JK)
Sanket Verma / @MSanKeys963 (SV)
Alden Keefe Sampson / @aldenks (AS)
Joe Hamman / @jhamman (JH)

Meeting notes:

CS: nothing directly on zarr, keeping an eye on the zarr issues with help wanted tags
SV: looking at hatch pr from davis, zep 0 revisions, and zarr paper
JK: working on a light-speed-io project (rust), playing around with ideas for fast data loading
AS: seems too early to jump in, don't want to get in the way
JH: Many things in progress: Store, Codecs, Arrays, Groups

December 6, 2023

Attendees

Ryan Abernathey / @rabernat
Joe Hamman / @jhamman (JH)
Charles Stern / @cisaacstern (CS)
Jack Kelly / @JackKelly (JK)
Sanket Verma / @MSanKeys963 (SV)
Davis Bennett (DB) / @d-v-b
Raphael Hagen / @norlandrhagen
Alden Keefe Sampson / @aldenks
Norman Rzepka / @normanrz

Meeting notes:

Design doc for v3.0 has moved to GitHub. If you want to comment on the design then comment on the GitHub pull request.
Zarrita: Alistair started it as a place to experiment with the Zarr v3 spec (back in July 2020). Norman picked Zarrita up in Summer.
We've taken Zarrita, copied it, renamed & refactored things. There's a new Store interface (we're leaving behind the mutable mapping interface of Zarr stores.) Aiming for 100% coverage of static type checking.
Norman, Davis & Joe have been together for the last 3 days (in Berlin). JH has been working on the Group API (compatibility with Zarr-Python's group API). See this PR. For v2 there are two metadata docs which describe a group. Reads now happen async: which cuts the loading time for groups in half. Now working on listing contents of a group.
DB: In Zarrita we have representations of arrays for v2 and v3. DB has been working on a uniform interface to v2 and v3. Breaking lots of things :). Looking at how codecs decompress & compress. Overall strategy is to use the v3 way of doing things. See DB's PR.
RA: It's great that there are both async and sync APIs. But downstream datascience libraries will always want the sync API.
NR: The Zarrita code is based on fsspec, with some small changes.
JH: We're just using fsspec (for now). Very convinced that having an async API is the right call for Zarr-Python. Less convinced that the fsspec way of doing things will be the long-term solution.
NR: Adding sharding strategies. Customise how chunks are laid out in the file. e.g. if you want to do partial writes (where you can write to specific places). Instead of having to write entire shards at a time. Writing tests right now. First PR has been merged into the v3 branch (codecs).
JH: The codecs are now an entry point into the Zarr-Python code. Zarr-Python v2 basically supports any codec in numcodecs. Do we need to register all of those compressors and filters as codecs? Or should we limit them?
NR: Let's make a generic codec. bytes-to-bytes, and array-to-bytes.
JH: For now, we've decided not to work on variable chunk sizes. We could release a version of Zarr-Python without variable chunk sizes. Questions?
RA: Everyone's very supportive! This is what we need to get over the rut that zarr-python has been stuck in. A lot of folks would like to help, but don't know where. Are there concrete tasks that we can give to people? (The answer may be "no"! Some software projects are just not parallelizable like that.)
JH: Some of these first blocks of work have required us to already resolve conflicts. I'll jot down a couple of tasks which could be useful for folks to work on.
- the top-level API has not been ported over yet (e.g. zarr.open()). Most people use that top-level API.
- documentation! A lot of copy-and-pasting from Zarr-Python v2.0. But some function signatures will change.
- type checking needs work! MyPy isn't happy right now.
DB: v3 introduces the concept of a codec pipeline. We build an object which is a sequence of transformations of chunk data, which leads to it being stored, or - in reverse - leads to it being opened. The documentation for this doesn't exist yet. If anyone has an idea for a transformation, then work through the process of doing this with the v3 machinery, and write up some docs for how to do this. v3 is a lot more explicit about how chunks are encoded.
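A rough sketch of the pipeline idea DB describes: codecs are applied in order on write and in reverse order on read. The class names below are hypothetical, and the sketch covers only bytes-to-bytes steps; it is not the zarr-python codec API.

```python
import zlib
from typing import Protocol, Sequence


class BytesCodec(Protocol):
    """Assumed minimal interface for a bytes-to-bytes codec."""

    def encode(self, data: bytes) -> bytes: ...
    def decode(self, data: bytes) -> bytes: ...


class ZlibCodec:
    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class XorCodec:
    """Toy transformation: XOR every byte with a mask (its own inverse)."""

    def __init__(self, mask: int) -> None:
        self.mask = mask

    def encode(self, data: bytes) -> bytes:
        return bytes(b ^ self.mask for b in data)

    def decode(self, data: bytes) -> bytes:
        return self.encode(data)


class PipelineSketch:
    def __init__(self, codecs: Sequence[BytesCodec]) -> None:
        self.codecs = list(codecs)

    def encode(self, chunk: bytes) -> bytes:
        # Write path: apply each codec in declaration order.
        for codec in self.codecs:
            chunk = codec.encode(chunk)
        return chunk

    def decode(self, stored: bytes) -> bytes:
        # Read path: undo the codecs in reverse order.
        for codec in reversed(self.codecs):
            stored = codec.decode(stored)
        return stored


pipeline = PipelineSketch([XorCodec(0x5A), ZlibCodec()])
chunk = b"zarr" * 64
stored = pipeline.encode(chunk)
roundtrip = pipeline.decode(stored)
```

Reversing the list on decode is what makes the order of codecs part of the stored format: the same list must be known to every reader.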
NR: +1 to adding codecs, or wrapping numcodecs. Also:
- try wiring the new Zarr-Python to downstream libraries. That would tell us what's missing in terms of the public API.
- Also, it'd be useful to have a champion for variable chunking. The codec pipeline should know about variable chunking.
RA: The problem with the ZEP process is that it's hard for ZEP authors to just implement the ZEP.
JH: We should be able to find bite-sized chunks which folks can work on.
NR: It'd be great to keep up the momentum after this week. And it'd be great to have a beta by Jan 2024!
RA: Where is the discussion of the ZEP3 proposal (here's the PoC implementation PR)? And the discussion is here.

Looking for a champion on:

variable chunking
synchronizers
h5py compat

Agenda

Update from Joe + Davis on refactor progress

November 22, 2023

Attendees

Joe Hamman / @jhamman (JH)
Charles Stern / @cisaacstern (CS)
Jack Kelly / @JackKelly (JK)
Sanket Verma / @MSanKeys963 (SV)
Davis Bennett (DB)

Agenda

Zarr-Python 3.0 design doc: https://hackmd.io/0DVKP6d9QI-VaHc0zvOuxw

Meeting notes:

JH: We can start working from store interface - kind of leaky abstraction
JK: Looking to read a million chunks - sharding helps with that - discussions around batching requests in Zarr-Python - requesting a million chunks in a single request - is Zarr V3 a good place to pull in these performance bumps? (don't want to delay the existing work)
DB: To make get more efficient, you need to wrap it in something - mostly users are getting multiple chunks at a time
JH: In Dask/Xarray world you map a single chunk of Zarr at a time - At Earthmover there is 1-to-1 reads - to handle big size chunks we have rechunker - sharding codec sits above the store interface
JH: https://github.com/scalableminds/zarrita/blob/async/zarrita/sharding.py#L309 - indexing for sharding - the sharding codec will need access to store API whereas the other codecs don't need it
DB: Like the idea - add a new abstraction - we have leaky abstraction and we can use it
JH: Norman is willing to help but only if Zarr-Python is a first-class citizen
JH: https://docs.xarray.dev/en/stable/roadmap.html - publish the roadmap on Zarr-Python docs for the community
JH: Jack can help us with the fast concurrent loading problem
JH: Meeting with Davis and Norman in 1.5 weeks to work on Zarr-Python 3.0

November 8, 2023

Attendees

Joe Hamman / @jhamman
Charles Stern /
Raphael Hagen / @norlandrhagen
Sanket Verma / @MSanKeys963
Martin Durant /

Agenda

Zarr-Python 3.0 design doc: https://hackmd.io/0DVKP6d9QI-VaHc0zvOuxw
- how would batching work across arrays
- use pydantic zarr
- other dependencies
  - python 3.9 - drop in dec.?
- which sharding impl

November 1, 2023

Attendees

Joe Hamman / @jhamman
Davis Bennett / @d-v-b
Sanket Verma / @msankeys963
Raphael Hagen / @norlandrhagen
Charles Stern
Brian Davis
Thomas Nicholas

Agenda

Request - can someone try to take some notes today?
Request - can we move this meeting time to 8:30a PT (currently at 9a PT).
V3 API migration
- Now that we are starting to work on implementing v3, we're faced with the question of what to do with the existing API
- Observation: the current v2/v3 polymorphism is unsustainable (and incorrectly prioritizes v2 internally)
- Proposal - we create a v3 namespace within zarr-python where we can develop in an isolated space toward a complete v3-spec implementation
  - Included in this namespace:
    - classes: zarr.v3.{Array,Group,Store}
      - These classes implement an internal api that closely aligns with the v3 spec
    - high level functions: `zarr.v3.{create, open, …}`
      - As much as possible, these functions should look and feel like the v2 equivalents but should not be tied to the exact implementation
        
        e.g. zarr.create(shape=..., dtype=..., compressor=...) -> zarr.create(shape=..., data_type=..., codecs=..., attributes=...)
      - We may also want to deprecate and/or rename some of the existing top level functions
    - backward compatibility:
      - high-level functions in the v3 namespace should be able to create or open a v2 dataset
      - The Group or Array does not need to be backward compatible though.
  - All development toward v3 happens on the main branch in zarr-python
- Alternative proposal
  - We avoid the v3 namespace and instead take over the primary namespace in a development branch (e.g. v3)
  - When we feel that the v3 branch is complete, we merge to main and make a 3.0 release
  - Folks have less time to test out the v3 implementation but we have a cleaner development process
- Ideas
  - Idea of zarr array is to look like a numpy array
    - could move all the zarr array details to a polymorphic metadata object
    - trim things down to just the minimal array api interface
  - declarative hierarchy specification
  - type hints

Sanket Notes

DB: The definitions of Zarr and Dask chunks are different and that's not good
JH: Benefits of generative chunk indexing
- Impacts with sharding, variable chunking and other shiny features
- Large array with billions of chunks
JH: Maintaining both V2 and V3 at the same time is not ideal
- DB: V2 has a lot of stuff that people don't use - stores
TN: The current public facing APIs (V2 and V3) are conformant to the existing spec - but what we're thinking of working on is a new public facing API which is a wrapper of V2 and V3, and not exactly conformant to V3 - isn't that a bad thing?
- DB: The public-facing Zarr array object API is not covered by the spec anyway
- Also can't be, because syntax might be language-dependent
- Therefore we have full freedom in the public python API of the python zarr array type
- TN: Okay, in that case makes sense to follow python array API standard as much as possible
TN: Array API has granular functionality which is super useful (e.g. you can say "we don't support the statistical functions")
TN: Note that chunking is not part of the array API standard

October 18, 2023

Attendees

Joe Hamman / @jhamman
Max Jones / @maxrjones
Davis Bennett / @d-v-b
Tom Nicholas
Charles Stern
Sanket Verma
Ryan Abernathey

Agenda

Proposal: just use Zarrita :)
- 0.1% done: https://github.com/jhamman/zarr-python/pull/1
- Ryan added memorystore to Zarrita: https://github.com/scalableminds/zarrita/pull/12

September 20, 2023

Attendees

Joe Hamman / @jhamman
Charles Stern / @cisaacstern
Sanket Verma / @MSanKeys963
Raphael Hagen / @norlandrhagen

Agenda

Review ZEP 6 proposal and proposed implementation
- https://github.com/zarr-developers/zeps/pull/46
- https://github.com/zarr-developers/zarr-python/pull/1526
Goal with ZEP6 in Zarr-Python
- Clean up interface for Group/Array constructors from V2/V3 metadata
- Use ZOMs internally as part of the migration to V3 spec
- Use ZOMs in array/group constructors to consolidate initialization reads/writes
  - https://github.com/zarr-developers/zarr-python/issues/538 (repeated writes to set attrs)
  - https://github.com/pangeo-data/pangeo-eosc/issues/39 (many contains/iter calls)
- Expose ZOMs to third parties

September 6, 2023

Attendees

Joe Hamman / @jhamman
Ryan Abernathey / @rabernat
Sanket Verma / @msankeys963
Raphael Hagen / @norlandrhagen
Ryan Williams
Charles Stern / @cisaacstern
Davis Bennett / @d-v-b

Agenda

review scoping section (below)
performance
zarr + pydantic (https://github.com/janelia-cellmap/pydantic-zarr)
- observation: Zarr-python is missing specific data models for Groups / Arrays
- price of depending on pydantic is probably not worth it

Scoping V3 update (by @jhamman)

Written by @jhamman on September 5, 2023

In the Winter and Spring of 2022, while the V3 spec was still under development, an experimental V3 implementation was added to the Zarr-Python codebase (#898). This implementation followed the spec, as it was written at the time. However, in the months following these developments, major changes to the spec were made. This has left Zarr-Python out of sync with the V3 specification.

Summary of current status

V3 support is behind an experimental API (accessed by setting zarr_version=3 and ZARR_V3_EXPERIMENTAL_API=1).
A separate code path for V3 stores was implemented in zarr._storage.v3.

Major changes to the spec since the experimental implementation include:

Entrypoint metadata document (zarr.json) is no longer required
Metadata keys were renamed (e.g. meta/foo/bar.group.json -> /foo/bar/zarr.json)
Group and metadata documents are no longer distinguished by their key name (everything is zarr.json and a node_type field is included in all documents)
Various updates to metadata fields:
- format_version → int
- added dimension_names
- removed chunk_memory_layout (in favor of transpose codec)
- codecs now includes a list of codecs that was previously split between the filters and compressor fields
- etc.
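The metadata key renaming listed above (meta/foo/bar.group.json -> /foo/bar/zarr.json) can be sketched as a small helper. `migrate_metadata_key` is a hypothetical name, and handling an `.array.json` suffix the same way is an assumption, since the notes only show the group case.

```python
def migrate_metadata_key(old_key: str) -> str:
    """Map an experimental-V3 metadata key to the current single-document layout.

    Follows the example in the notes:
    "meta/foo/bar.group.json" -> "/foo/bar/zarr.json"
    """
    prefix = "meta/"
    if not old_key.startswith(prefix):
        raise ValueError(f"not an experimental metadata key: {old_key!r}")
    path = old_key[len(prefix):]
    # Assumption: array documents used an analogous ".array.json" suffix.
    for suffix in (".group.json", ".array.json"):
        if path.endswith(suffix):
            node_path = path[: -len(suffix)]
            # Every node now stores its metadata in zarr.json; the node type
            # moves from the key name into a node_type field in the document.
            return f"/{node_path}/zarr.json"
    raise ValueError(f"unrecognized metadata suffix: {old_key!r}")


new_key = migrate_metadata_key("meta/foo/bar.group.json")
```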

Open questions:

fallback data types

Actions

https://github.com/orgs/zarr-developers/projects/5/views/1

Zarr refactor meeting

Aug 16, 2023

Attendees

Joe Hamman (Earthmover)
- Xarray and Zarr dev
Sanket Verma (Zarr)
Tom White (independent dev)
- SGKit and Cubed
Max Jones (CarbonPlan)
- Data scientist
Raphael Hagen (CarbonPlan)
- Data eng.
Charles Stern (Columbia)
- Pangeo-forge

Discussion

Max: how do we view V3 extensions already in Zarr-python
Charles: how does Zarr python register plugins
Zarrita (https://github.com/scalableminds/zarrita/) - reference implementation
- no baggage / tech debt of Zarr-python
- not production ready
- also has sharding
Tom: Interop tests between implementations

Timeline

Goal: by the end of the year, have a fully-functional implementation of V3 in Zarr Python

Starting now: survey users to get an understanding of how a breaking change to the V3 implementation will impact users
Next two weeks: Break #1290 into smaller chunks and set up project board
September: start refactor efforts
Oct-Dec: Integration and interop testing

TODOs:

add regular call to community calendar
break out V3 implementation tasks into issues / project board (try to identify issues that can be picked up by others)
publish read out of this call

Joe Hamman

2024/04/05 19:02:41

2 -> 3 conversion cli, (maybe in its own repo)

Davis - can you open an issue here?

Davis Bennett

2024/04/17 09:33:08

https://github.com/zarr-developers/zarr-python/issues/1798

2024/04/05 18:12:59

[p3] (reach) http proxy - may not need to be in zarr-python

Davis - can you open a ticket for this one?
