Joe Hamman
# Zarr-Python Developer Meeting Notes

_formerly Zarr-Python Refactor Meeting Notes_

## September 5, 2025

- Davis Bennett / @d-v-b
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi
- Max Jones / @maxrjones
- Ryan Abernathey / @rabernat

### Agenda

- Joe's [ZEP8 PR](https://github.com/zarr-developers/zarr-python/pull/3369)
- Open PRs
- Zarr summit topics
- Virtual views / lazy indexing of Zarr arrays (usable zarr without dask)

## August 22, 2025

- Davis Bennett / @d-v-b
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi
- Max Jones / @maxrjones

### Agenda

- zep8
- codec configs
- zep10
  - paused in favor of registered attributes

## August 8, 2025

- Davis Bennett / @d-v-b
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi
- Tom Nicholas / @TomNicholas
- Tom Augspurger / @TomAugspurger
- Max Jones / @maxrjones

### Agenda

- codecs (https://github.com/zarr-developers/zarr-python/pull/3318)
- Tom's xarray benchmarking

## July 25, 2025

- Davis Bennett / @d-v-b
- Ilan Gold / @ilangold
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi

### Agenda

- sharding bug
  - @davis has a fix working
- codecs 💣
- Making public custom indexer APIs: https://github.com/zarr-developers/zarr-python/issues/3175

## July 18, 2025

- Davis Bennett / @d-v-b
- Ilan Gold / @ilangold
- Joe Hamman / @jhamman
- Tom Augspurger / @TomAugspurger
- Tom Nicholas / @TomNicholas

### Agenda

- Zarr config

## July 11, 2025

- Davis Bennett / @d-v-b
- Max Jones / @maxrjones
- Ilan Gold / @ilangold
- Joe Hamman / @jhamman

### Agenda

- 3.1.0 release
- Zarr summit
- Ilan's data loader
  - anecdotally zarrs is much faster for sharded data
- Codec refactor

## July 04, 2025

- Davis Bennett / @d-v-b
- Max Jones / @maxrjones

### Agenda

- 3.0.9 retro
- Store API
- Zarr spec changes

## June 27, 2025

- Davis Bennett / @d-v-b
- Ian Hunt-Isaak / @ianhi
- Josh Moore / @joshmoore
- Tom Augspurger / @TomAugspurger
- Ilan Gold / @ilangold
- Tom Nicholas / @TomNicholas

### Agenda

- Tom's performance optimizations for chunk encoding
  - avoid sloppy memory
allocations
  - use compute for compute-bound tasks, async for io-bound tasks
  - in main, io and compute-bound tasks are pipelined
  - chunk streaming would be great
  - codec ABCs don't support an output buffer

## June 20, 2025

- Davis Bennett / @d-v-b
- Ian Hunt-Isaak / @ianhi
- Max Jones / @maxrjones

### Agenda

- 3.1 release
- xarray's test failures
  - Ian plans to work on this next week
  - https://github.com/pydata/xarray/issues/10329
- docs for data type

## June 06, 2025

- Davis Bennett / @d-v-b
- Max Jones / @maxrjones
- Joe Hamman / @jhamman
- Tom Augspurger / @TomAugspurger

### Agenda

- figuring out endianness
  - [table](https://app.excalidraw.com/s/762mjh4w1Pv/3LxSUZJfy1P)
- specifying https://zarr.dev/codecs-registry/Others/OtherCodecs.html applies only to V2
- mkdocs-material update
- dtypes

## May 30, 2025

- Davis Bennett / @d-v-b
- Ian Hunt-Isaak / @ianhi
- Max Jones / @maxrjones

### Agenda

- zarr store modes
- dtypes blog posts
- what is breaking in dtypes PR?
  - not allowed to show up with a numpy object dtype, requires explicit definition
  - vlenbytes wasn't actually supported but now is
  - changing config, in that you can no longer specify a default compressor

## May 23, 2025

- Davis Bennett / @d-v-b
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi
- Max Jones / @maxrjones
- Tom Nicholas / @TomNicholas

### Agenda

- dtypes plan
  - merge to 3.1 branch
  - aim for pre-release in the next few weeks
  - also include a few other small breaking changes
- zarr-extensions PR for dtypes
  - https://github.com/zarr-developers/zarr-extensions/pull/5

## May 16, 2025

- Davis Bennett / @d-v-b
- Ian Hunt-Isaak / @ianhi
- Max Jones / @maxrjones

### Agenda

- 3.1.0 pre-release
- Codec API design
- Imagecodecs compatibility

## May 9, 2025

- Davis Bennett / @d-v-b
- Josh Moore / @joshmoore
- Tom Augspurger / @TomAugspurger
- Tom Nicholas / @TomNicholas

### Agenda

- 3.1.0 pre-release
- Buffer API testing
- CodecPipeline API changes
- Zarr-aware store API ideas

## May 2, 2025

- Davis
Bennett / @d-v-b
- Joe Hamman / @jhamman
- Ryan Abernathey / @rabernat
- Ian Hunt-Isaak / @ianhi

### Agenda

- my recent PRs (d-v-b)
- contents for a 3.1 release

## April 25, 2025

- Davis Bennett / @d-v-b
- Joe Hamman / @jhamman
- Max Jones / @maxrjones
- Ian Hunt-Isaak / @ianhi
- Tom Nicholas / @TomNicholas

### Agenda

- slow hypothesis CI job
  - Deepak opened a PR :crown:
- Need a separate issue for disallowing zarr.json groups
- codecs
  - regtest pytest?
  - Specs for different codecs
- restore 2.0 exceptions
- 3.1.0 release items
  - dtypes
  - exceptions?

## April 18, 2025

- Davis Bennett / @d-v-b
- Ian Hunt-Isaak / @ianhi
- Max Jones / @maxrjones
- Matt Iannucci / @mpiannucci

### Agenda

- dtypes:
  - still waiting on spec. some disagreement
    - davis position
      - explicitly aiming for numpy compat, not aiming for best datetime impl
    - quiet nan vs signaling nan
      - could represent
  - breaking change
    - updated version policy to effver from semver
    - how big of a change is this?
  - what can others do to help? blocked on things davis needs to do. clean up commits + break up PR
- Integration/regression testing
  - tests for if we can still read data written by old zarr versions/formats
    - could be a github repo
    - max has used dvc, which lets you store files on object storage and just check in the hashes, when doing a similar thing with lots of images
  - check against other zarr implementations, e.g.
can we read stuff written by tensorstore
    - should coordinate with other zarr implementations
- codecs (Max)
  - need to understand why imagecodecs cares what the datatype of the array (numpy or cupy) in the buffer is
  - should be fine as long as all of Zarr-Python's tests pass, meaning that we don't make assumptions about the datatype
  - Davis checked that the data represented as "B" or "b" produces the same output from to_bytes
  - general functionality of codecs
    - numcodecs and imagecodecs provide Cython bindings to C libraries
    - all numcodecs are pretty independent apart from an ABC
    - some people want to use system zstd, cramjam etc. with numcodecs so that numcodecs' zstd would be optional
- Zarr release?

## April 4, 2025

- Davis Bennett / @d-v-b
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi
- Sanket Verma / @sanketverma1704
- Tom Augspurger
- Ryan Abernathey
- Josh Moore / @joshmoore
- Tom Nicholas / @TomNicholas

### Agenda

- release?
- (davis) learnings from tensorstore
- (davis) A rough idea for a zarr-format-aware store API

### Minutes

- (ryan a.) update on the state of the spec w.r.t.
extension names
- Datatype extension names
  - datetime64
  - timedelta
  - string
- The dtype plan:
  - open issue in extensions repo for each new datatype
  - get feedback on names / configuration
  - split davis' pr into pieces
    - registry framework
    - new dtypes

## March 21, 2025

- Davis Bennett / @d-v-b
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi
- Sanket Verma / @sanketverma1704
- Kyle Barron

### Minutes

Core topic for the day: https://github.com/zarr-developers/zarr-python/pull/2874

- Davis discovered a weird NumPy dtype in Windows
- Relevant NumPy issue: https://github.com/numpy/numpy/issues/9464

```
>>> from ml_dtypes import bfloat16
>>> import numpy as np
>>> np.zeros(4, dtype=bfloat16)
array([0, 0, 0, 0], dtype=bfloat16)
```

- see https://github.com/zarr-developers/zarr-python/pull/2874#issuecomment-2701802998

## March 7, 2025

- Davis Bennett / @d-v-b
- Josh Moore / @joshmoore
- Joe Hamman / @jhamman
- Tom Nicholas / @TomNicholas
- Deepak Cherian / @dcherian
- Tom Augspurger
- Ian Hunt-Isaak / @ianhi

### Minutes

- Davis' Dtypes (https://github.com/zarr-developers/zarr-python/pull/2874)
  - target: fixed length unicode strings
  - also covering numpy dtypes
  - and ml-dtypes
  - Dtype wrapper class -> holds on to data needed to generate a dtype
  - Planned reviewers:
    - Tom N
    - Nick (eye on custom dtypes)
    - Josh (eye on the spec)
- Versioning policy (https://zarr.readthedocs.io/en/latest/developers/contributing.html#compatibility-and-versioning-policies)
  - issue: https://github.com/zarr-developers/zarr-python/issues/2889
  - started: https://github.com/zarr-developers/zarr-python/pull/2819
- SciPy abstracts that went in
  - Akshay
    - GPUs + Zarr
  - Tom N.
    - Virtualizarr
  - Joe
    - Icechunk
    - Xarray Tutorial
  - Ian
    - Xarray in Biology

## February 28, 2025

- Davis Bennett / @d-v-b
- Josh Moore / @joshmoore
- Joe Hamman / @jhamman
- Ian Hunt-Isaak / @ianhi

### Minutes

- Josh and Davis on extension dtype naming
- Davis is working on extension dtypes in zarr-python
  - need to add support for parametric dtypes and extension dtypes
- Akshay and co have been hacking on zarr/gpus this week

## February 21, 2025

- Davis Bennett / @d-v-b
- Josh Moore / @joshmoore
- Sanket Verma / @sanketverma1704
- Ian Hunt-Isaak / @ianhi

### Minutes

- Yank the recent release: https://github.com/zarr-developers/zarr-python/issues/2852 due to a bug
- https://github.com/zarr-developers/zarr-python/pull/2665
  - would like to merge soon

## February 14, 2025

- Deepak Cherian / @dcherian
- Josh Moore / @joshmoore
- Norman Rzepka / @normanrz
- Davis Bennett / @d-v-b
- Ian Hunt-Isaak / @ianhi

### Agenda

- release?

## February 7, 2025

- Deepak Cherian / @dcherian
- Josh Moore / @joshmoore
- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Max Jones / @maxrjones
- Sanket Verma / @sanketverma1704

### Agenda

- codec/numcodecs issue: https://github.com/zarr-developers/zarr-python/issues/2800
- Need reviews on:
  - [ ] [reviewer=norman?] boundary chunk problem -> https://github.com/zarr-developers/zarr-python/pull/2784
  - [x] [reviewer=joe] batch creation PR -> https://github.com/zarr-developers/zarr-python/pull/2665
  - [ ] [reviewer?] create_array explicit groups -> https://github.com/zarr-developers/zarr-python/pull/2795
  - [x] [reviewer=davis?] empty chunk contents -> https://github.com/zarr-developers/zarr-python/pull/2755
  - [x] [reviewer=david] scalar return type -> https://github.com/zarr-developers/zarr-python/pull/2718
  - [x] [reviewer=joe] zipstore pickling -> https://github.com/zarr-developers/zarr-python/pull/2807
  - [x] [reviewer=joe] obstore store -> https://github.com/zarr-developers/zarr-python/pull/1661
- store hypothesis tests (Max ?
for Deepak)
- next steps for ObjectStore (https://github.com/zarr-developers/zarr-python/pull/1661)
  - Docs: write a section [here](https://zarr.readthedocs.io/en/stable/user-guide/storage.html). Note that this store is experimental.
  - Test coverage:

## January 31, 2025

- Deepak Cherian / @dcherian
- Norman Rzepka / @normanrz
- Josh Moore / @joshmoore
- Akshay Subramaniam / @akshaysubr
- Joe Hamman / @jhamman

### Agenda

- conda-forge?
- Are V2 tests missing?
- Feedback on https://github.com/zarr-developers/zarr-python/pull/2780
  - Move function definitions from the api modules into core, e.g. api.sync.create_array -> core.array.sync
  - Pull out into its own PR
  - Strict parsing of metadata
- Feedback on https://github.com/zarr-developers/zarr-python/pull/2751
  - Explicit CPU buffers for metadata
  - Basic GPU docs

## January 24, 2025

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Josh Moore / @joshmoore
- Sanket Verma / @sanketverma1704
- Nick Byrne / @nenb
- Norman Rzepka / @normanrz
- Ryan Abernathey / @rabernat
- Max Jones / @maxrjones
- Yurii Zubov
- Akshay Subramaniam / @akshaysubr

### Agenda

- dtypes
  - Nick talked through some slides
    - https://github.com/nenb/zarr-dtype-presentation/tree/main
  - NR:
    - Array-to-bytes codec
    - Endianness
    - Order is now a runtime config
    - zarr.core is private api, would need zarr.dtypes module or something
- extensions
- store API / tests
  - Max looking for guidance on a handful of questions
    - https://docs.google.com/presentation/d/17eXBCI3WoI3pELVt_uyaWxDF2G2ovuiXoHB5wPd68z4/edit?usp=sharing

## January 10, 2025

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Norman Rzepka / @normanrz
- Josh Moore
- Sanket Verma / @MSanKeys963

### Agenda

- Object data type
  - https://github.com/zarr-developers/zarr-python/issues/2617
- Next steps after 3.0
  - variable chunking?
  - deprecating more api?
  - numcodecs thing that could use some thought / design work?
    - semi-circular dependency
    - what do we do with the v2 codecs
    - what do we do with the v3 things
- future directions

## January 3, 2025

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Norman Rzepka / @normanrz
- Deepak Cherian
- Sanket Verma / @MSanKeys963

### Agenda

1. 3.0 release schedule update (Joe)
   a. `3.0.0-rc.1` went out yesterday
   b. we will publish and socialize the migration guide today
   c. we will make the full 3.0.0 release on Thursday Jan 9 at 10a ET

   -> during this time, we will focus on documentation and bug fixes (no major feature additions)
2. release announcement
   a. Joe has written a blog post. The full zarr-dev team has comment access here: https://www.notion.so/earthmover/Zarr-Python-3-Release-Blogpost-14b492ee309f80d28af3ebfdeedf96f7
   b. sanket will prepare a social media thread
3. shape of array after the addition of filters/compressors to top-level api
   - davis is opening an issue on this
4. Norman will write a docs section on sharding

## `create_array` API design notes

We are struggling to find a user-facing API for creating new arrays. We have decided to create a new top-level API function (`create_array`) to handle this, but questions remain about how to provide a simple / intuitive API that covers both v2 and v3 arrays, and sharded/non-sharded arrays, in one API. This short design note lays out the goals and options we are considering.

### goals

1. provide a single function to create v2 and v3 arrays
2. make it easy to create sharded arrays
3. provide a way to configure codecs (à la compressors and filters from v2)

### non goals

1. extending sharding to v2 arrays
2. ?
### current proposal

```python
async def create_array(
    store: str | StoreLike,
    *,
    name: str | None = None,
    shape: ShapeLike,
    dtype: npt.DTypeLike,
    chunk_shape: ChunkCoords | Literal["auto"] = "auto",
    shard_shape: ChunkCoords | None = None,
    filters: FiltersParam = "auto",
    compression: CompressionParam = "auto",
    fill_value: Any | None = 0,
    order: MemoryOrder | None = "C",
    zarr_format: ZarrFormat | None = 3,
    attributes: dict[str, JSON] | None = None,
    chunk_key_encoding: ChunkKeyEncoding | ChunkKeyEncodingParams | None = None,
    dimension_names: Iterable[str] | None = None,
    storage_options: dict[str, Any] | None = None,
    overwrite: bool = False,
    config: ArrayConfig | ArrayConfigParams | None = None,
    data: npt.ArrayLike | None = None,
) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
    ...  # signature only; the body is not part of this proposal
```

This function signature includes parameters that fall into the following categories:

**Store parameters**

- `store`
- `storage_options`

**Runtime parameters**

- `order`
- `overwrite`
- `config`
- `data`

**V3-only parameters**

- `dimension_names`
- `shard_shape`
- `chunk_key_encoding`

**Generic parameters**

- `name`
- `shape`
- `dtype`
- `chunk_shape`
- `filters`
- `compression`
- `fill_value`
- `attributes`

**Note 1:** the focus of this document is on the parameters that control how the core array metadata is configured:

- `shape`
- `dtype`
- `chunk_shape`
- `shard_shape`
- `compression`
- `filters`

Note 2: it may be worth grouping the parameters in `create_array` using a similar framework to above. This will help users navigate this fairly large parameter space.

#### Usage examples

1. minimal example w/o sharding:

   _this creates an array using default / inferred parameters for zarr_format, chunk_shape, etc._

   ```python
   create_array(store=store, shape=(1000, 1000), dtype='f8')
   ```

2. create sharded array

   _this creates a sharded array where chunks are compressed with Zstd_

   ```python
   create_array(
       store=store,
       shape=(1000, 1000),
       shard_shape=(100, 100),
       chunk_shape=(10, 10),
       compressors=[ZstdCodec(level=3)],
       dtype='f8',
   )
   ```

### questions

1. what is the value/justification for providing arguments for `filters` and `compressors` instead of a single `codecs` parameter? Will we enforce that all `filters` are `array->array` codecs and all `compressors` are `bytes->bytes`?

   @d-v-b --> this seems like the question we need to answer first. How will we des

   (d-v-b): `filters` and `compressors` map on to the two types of variadic codecs allowed in the v3 `codecs` attribute. This makes those parameters simple to parse -- `filters` must resolve to `tuple[ArrayArrayCodec, ...]`, and `compressors` must resolve to `tuple[BytesBytesCodec, ...]`. I think we could have just 1 `codecs` parameter, but it would need to take a form that allowed separably specifying the `ArrayArray` and `BytesBytes` codecs. Something like this:

   ```python
   class CodecParams(TypedDict):
       filters: NotRequired[tuple[ArrayArrayCodec, ...]]
       compressors: NotRequired[tuple[BytesBytesCodec, ...]]
       array_serializer: NotRequired[ArrayBytesCodec]
   ```

   any missing keys would resolve to the defaults set in the config. But if `codecs` was `tuple[Codec, ...]` then users would be confused, and parsing it would be a headache.

   (NR): I like `filters` and `compressors` because imo they better convey what the codecs are used for instead of "array->array" or "bytes->bytes" codecs. We should enforce that only the right type is used for both kwargs.

2. what is the correct type for the `filters` / `compressors` argument? Options include:

   a. list of strings, e.g. `['gzip']`
   b. list of dicts, e.g. `[{"name": "gzip", "configuration": {"level": 4}}]`
   c. list of objects, e.g. `[GZipCodec(level=4)]`

   (b) and \(c) seem like reasonable choices.
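
As a rough illustration of the three candidate forms in question 2, the sketch below normalizes a string, a dict, and an object to a single codec instance. `GzipSpec`, `REGISTRY`, and `parse_codec` are hypothetical stand-ins invented for this example; they are not zarr-python's classes or its actual parsing logic.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GzipSpec:
    """Hypothetical stand-in for a codec class (not zarr-python's GzipCodec)."""

    level: int = 5  # default chosen for illustration only


# Hypothetical name -> class registry, invented for this sketch.
REGISTRY: dict[str, type] = {"gzip": GzipSpec}


def parse_codec(spec: Any) -> Any:
    """Normalize option (a), (b), or (c) to a codec instance."""
    if isinstance(spec, str):
        # (a) bare name: no way to pass configuration, so defaults apply
        return REGISTRY[spec]()
    if isinstance(spec, dict):
        # (b) JSON-style dict with "name" and optional "configuration"
        return REGISTRY[spec["name"]](**spec.get("configuration", {}))
    if isinstance(spec, tuple(REGISTRY.values())):
        # (c) already a codec instance: pass it through unchanged
        return spec
    raise TypeError(f"cannot interpret {spec!r} as a codec")


from_dict = parse_codec({"name": "gzip", "configuration": {"level": 4}})
from_obj = parse_codec(GzipSpec(level=4))
print(from_dict == from_obj)  # the dict and object forms describe the same codec
```

Under these assumptions the string form can only ever yield the defaults, which is one way to read the ambiguity concern raised about option (a).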
   (d-v-b) IMO the only option here is something that unambiguously represents a codec instance, which rules out `a`. If we can make constructing the dict representation of the codecs ergonomic (i.e., autocomplete), then I think `b` is a pretty nice option, because users don't need to import a bunch of classes to use the create function. But we should also accept the complete codec class instances as well, so `c`.

   (NR): I like `c` best, but also fine with `b`. Agree that `a` is too ambiguous. I also cleaned that up for the default codecs https://github.com/zarr-developers/zarr-python/commit/5cb6dd8f62ad6ed5391a08535dc05ef9ac50bbad

3. How do we want to parametrize the partitioning of the array? Right now the PR in question takes two parameters, `chunk_shape: tuple[int, ...] | Literal["auto"]` and `shard_shape: tuple[int, ...] | Literal["auto"] | None`. In the interest of backwards compatibility and brevity I would support the names `chunks` and `shards`.

   An alternative API would be to have a single parameter, e.g., `chunking`, that takes:

   - `tuple[int, ...]` (no sharding, regular chunking),
   - A dict like `{"chunks": tuple[int, ...] | Literal["auto"], "shards": tuple[int, ...] | Literal["auto"]}`
   - and maybe more complicated types?

   This basically pushes complexity into a single parameter, but it's convenient given that chunk shape and shard shape have to be defined together.

   (NR): I'd prefer `chunks` and `shards`. `chunking: tuple[int, ...] | {"chunks": tuple[int, ...] | Literal["auto"], "shards": tuple[int, ...] | None | Literal["auto"]}` would also be fine. Not really a fan of `auto`, though.

   (JH): would it help reduce scope to remove auto chunking / chunk/shard alignment from this first version?

   (DVB): I don't think the auto chunking / sharding adds a lot of complexity here, and I think it's a big win for usability to have some defaults that "just work" (whether the defaults in my pr actually "just work" is another question).
As for `auto`, we need *some* way of expressing "pick chunks / shards automatically". Often we use `None` to mean "default", but if we are using `shards=None` to denote "no sharding", `None` can't mean "default" anymore, and we need to pick another value. I think `auto` is short and literate but I'd be up for alternatives.

## December 20, 2024

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Norman Rzepka / @normanrz
- Deepak Cherian
- Josh Moore / @joshmoore
- Sanket Verma / @MSanKeys963
- Akshay Subramaniam / @akshaysubr

Release topics:

- top level api
  - https://github.com/zarr-developers/zarr-python/pull/2463
  - open_foo(mode=r) defaults
  - remove read_foo()
  - remove read()
- documentation
  - add to migration guide
    - use create_foo and open_foo functions
    - create() and open() will be deprecated soon
  - new page in user guide on runtime config

## December 13, 2024

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Norman Rzepka / @normanrz
- Deepak Cherian

Notes:

- Davis worked on docstrings
- Davis is working on concurrent array creation
- Norman takes over write_empty_chunks
- Joe will work on docs next week
- Rename RemoteStore to FsspecStore
- `Array.__iter__` is slower compared to v2 because v2 loaded the entire array in memory upfront. not a release blocker
- Next meeting next Wednesday 5pm CET, 8am PST

## December 6, 2024

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Norman Rzepka / @normanrz
- Sanket Verma / @MSanKeys963
- Deepak Cherian

Notes:

- Davis will be working on some of the blocked PRs
- Joe will share a V3 blog post next week for review
- Deepak has been working on tests

Discussion points for today:

1. ✅ beta release -> https://github.com/zarr-developers/zarr-python/releases/tag/v3.0.0-beta.3
2. Default codec -> https://github.com/zarr-developers/zarr-python/issues/2267
3. What's off spec?
   - string codecs and dtypes
   - consolidated metadata

## November 29, 2024

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Norman Rzepka / @normanrz

Notes:

- Concurrent `members`
- Create array with sharding https://github.com/zarr-developers/zarr-python/issues/2170
- Runtime config attribute on the `Array` and `Group` class
- zarrs looks promising, great validation on the extensibility
- Final release before the holidays, release right after new years

## November 22, 2024

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Tom Augspurger / @TomAugspurger
- Norman Rzepka / @normanrz
- Sanket Verma / @MSanKeys963

Notes:

- ChunkStore
- ZipStore specification: https://github.com/zarr-developers/zarr-specs/pull/311
- Should `obstore`-based Store be in zarr-python or its own package? https://github.com/zarr-developers/zarr-python/pull/1661
  - Keep in zarr-python, make `obstore` an optional dep
  - later: config what protocol to open in which store
- Top-level sharding configuration in `zarr.open` etc.
  - something like?: https://github.com/zarr-developers/zarr-python/blob/76904eac556a71817eb7ea2e54df703cba919a12/src/zarr/core/chunk_grids.py#L30C5-L37
  - or add `shards` kwargs +10

## November 15, 2024

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Tom Augspurger / @TomAugspurger
- Sanket Verma / @MSanKeys963
- Theodore Visvikis
- Josh Moore / @joshmoore
- Akshay Subramaniam / @akshaysubr
- Norman Rzepka / @normanrz

Notes:

- Discussions on https://github.com/zarr-developers/zarr-python/blob/f74e53aca5311ec077da71585dd962c4af7b8a11/tests/test_api.py#L68-L78

## November 8, 2024

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Tom Augspurger / @TomAugspurger
- Sanket Verma / @MSanKeys963

Notes:

- Joe is working on store stuff and needs help with review
  - Tom would help with the review
- Davis would like this PR to be reviewed: https://github.com/zarr-developers/zarr-python/pull/2447

Discussion points

- https://github.com/zarr-developers/zarr-python/issues/2412

## November 1, 2024

- Joe Hamman / @jhamman

## October 25, 2024

- Joe Hamman / @jhamman
- Tom Augspurger / @TomAugspurger
- Sanket Verma / @MSanKeys963

Notes:

- Updates from Tom: working on `info`, `size` and `tree` properties
- Joe: hopefully wrapping store mode refactor up today

## October 18, 2024

- Tom Augspurger / @TomAugspurger
- Davis Bennett / @d-v-b
- Ryan Abernathey / @rabernat
- Joe Hamman / @jhamman
- Norman Rzepka / @normanrz
- Matt Iannucci / @mpiannucci
- Akshay Subramaniam

Notes:

- Updates
  - Joe: getting icechunk out, interested in speaking about release blockers
  - Davis: moving v3 tests, working on store api
  - Norman: working on the numcodecs, sharding bug, filter/codecs for v2 arrays
  - Ryan: worked on strings (out of spec), committed to dealing with spec problems (specifically on extensions)
  - Tom: Xarray compat (probably ready to merge)
  - Akshay: focusing on gpu compression codecs (nvcomp)
  - Matt: working on getting v3 working with kerchunk and virtualizarr

Topics:

- store api
  - DB: two phases of IO: reading/writing chunks or initializing an array or a group
  - `mode` was added to the store
  - pathological situations where `clear()` is happening on reopen
  - NR: Be able to use StorePath in `zarr.open`, e.g. `zarr.open(LocalStore("...", mode="a") / "testdata.zarr")`
  - JH: https://github.com/zarr-developers/zarr-python/issues/2359
- v2 filters/codecs
  - https://github.com/zarr-developers/zarr-python/issues/2325
  - NR: hasn't done much here yet
  - MI: same, just looked at the issue
- codec naming
  - NR: numcodecs codec namespace will be just for v3 arrays
  - we may also want to split the kerchunk filters into compressor/filter categories
  - RA: what would give us more developer velocity here?
- release blockers
- extensions

## October 11, 2024

- Tom Augspurger / @TomAugspurger
- Davis Bennett / @d-v-b
- Sanket Verma / @MSanKeys963
- Josh Moore / @joshmoore
- Ryan Abernathey / @rabernat
- Joe Hamman / @jhamman
- Norman Rzepka / @normanrz

Notes:

- Summary
  - JH: xarray & dask test suites passing with v3.
  - JH: milestone for a beta release
    - string PR should be included
  - NR: doc sprint?
    - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Docs.20sprint.20in.20September.3F/near/474312033
- Big items
  - Strings (RA)
    - braindump
      - confusing how it (ever) used to work
      - `np<2` had no notion of varlen str
        - fixlen (utf-8)
        - or object array
      - zarr allowed both as valid dtypes, e.g. u4 or object
      - now np has a varlen
      - question of dtype+codec
    - PRs merged
      - 236: new dtype "string" and "bytes". also zarrs implements them.
    - NR: v2 compatibility? pickling mode and another mode.
    - RA: two questions
      - API: strings in zarr and pull them out. works as expected? (with this PR: https://github.com/zarr-developers/zarr-python/pull/2036)
      - data on disk stored the same between v2 & v3 (without rewriting). believe so if using vlenutf8 codec.
    - RA: assumption is always vlen, and impl can use the appropriate in memory data structure (i.e., different in c)
      - in python, decoding to varlen if `np>=2` (breaking from zarr v2); decoding to object if `np<2`
    - NR: zarr array v2 a year ago will that work? believe so. tom working on the v2 side of it.
      - TA: https://github.com/zarr-developers/zarr-python/pull/2323
      - supported once that is supported
    - NR: for v3, bit of concern since there's no spec. a ZEP? (see stuck https://github.com/zarr-developers/zeps/pull/47)
    - RA: two components need to be addressed in the spec (dtype+codec; leaky abstraction?)
      - ...integer array interpreted as bytes?! unsure.
    - RA: far from being able to make changes to the spec.
    - JM: good to confer with John Kirkham
    - JH: pickled dtypes are not interoperable with other languages
    - NR: how do we want to deal with experiments?
      - can't write specs without implementation and python is a great place to do that
      - but maintain a few other implementations. not useful if the main implementation blazes ahead and everyone must follow
      - suggest we be cautious about that.
      - various ways to handle that
        - previously environment variables
        - issue a warning on non-standard dtypes
      - some discussion then it'll be fine
    - RA: see https://github.com/zarr-developers/zarr-specs/pull/312#issuecomment-2407444223
    - SV: Isaac stepped down from https://github.com/zarr-developers/zeps/pull/47 (no one to steer)
    - RA: different opinion -- just everything through extensions.
      - trying to get to parity. can we do the same things.
      - still struggling to evolve it. we decided that it's unversioned, so it's unchangeable.
      - would need to move to v4.
      - immutability was to be balanced by extensions.
      - haven't managed to develop a robust ecosystem/process for extensions. ZEPs have failed. nothing adopted. we can't agree...
      - extensions and then let's go make them
      - see TA's https://github.com/zarr-developers/zarr-specs/issues/316
      - let them be free
      - practical way forward
    - DB: tried to address that in https://github.com/zarr-developers/zarr-specs/pull/312
      - are we willing to change the parts of the spec that are blatantly contradictory
      - if it's immutable, so be it.
    - RA: clarifications are ok
    - NR: state of zarr-specs is terrible. ZEPs are a symptom. people are fatigued. process broken.
      - spec core team is a good path.
      - will have the same issue treating everything as an exception. names need to be coordinated. two string dtypes?!
      - who controls the namespace. need a process. even a repository, pypi style.
    - JM: feedback on zarr versioning from other implementations
    - RA: namespacing
      - extensions need to be namespaced. URI ok. absolute. resolves to the document.
      - need to figure out what the extensions are and what their scope is
      - 2 different extensions (different URIs) that define the same codec. that's ok.
      - make whatever changes are needed to have that process, socialize it, etc. shut down ZEPs.
    - NR: that's not how they were meant to work in v3. extension *points*. lets you create, e.g., codecs. (nothing wraps it)
      - RA's is a new concept. might work. there might be issues with composability.
      - comfortable in zarr-python if you need to actively opt in.
    - RA: ask everyone if the original intention of zarr-spec works in practice.
      - haven't been able to make forward progress. incorporate learning
      - face reality of how things work in the real world and adapt.
      - look at others where it's working
    - JH: laser focused on getting zarr-python out
      - can set config (don't need environment variable). go off-road
      - consolidated metadata, few codecs, etc. to add (not much more coming)
    - SV: lack of implementations was definitely an issue. Tried to work that into https://github.com/zarr-developers/zeps/pull/59
    - JH: on spec process, v3 is in accepted not final. missed that date by a year.
    - reasonable to say that changes are going to have to be made
    - change the status of the spec for a while?
- SV: previous conversation about when to set it to final
- Technical things
    - Beta release (JH): ok when smashing merge on string PR?
        - NR: v2 filters in beta or after? JH: weird kerchunk
        - RA: to make xarray work we had to special case everything ("working" isn't accurate)
            - convenience for our users
        - NR: backwards compatibility. (lack of) v2 spec are out in the wild that we have to define indefinitely. (see extensions above)
            - 2 camps: people that think it through for a long time and the others that want to forge ahead. a tension that we have to work out.
        - JH: filters just land in the array metadata. never seen that in the wild.
    - Endianness (DB)
        - https://github.com/zarr-developers/zarr-python/issues/2324
        - no longer part of the dtype. if someone creates an array with a given endianness, then the zarr array doesn't report it (have to check the codec)
        - when creating a new array, the endianness gets dropped
        - don't care? in memory representation is decoupled from how it is stored.
        - NR: yes, that's what we did in zarrita. you can control how it lands out, but not how it is read into memory.
        - RA: prevent memmapping data? NR: that's what we have the metadata for. RA: can imagine an impl without codecs that wants memmap to access the data (though zarr-python doesn't work that way) NR: zarr-spec requires a bytes codec which defines the endianness.
        - JH: if you get a big endian array (e.g., zarr.save(np.array)) .... round-trip so you get a big-endian back out the other side
        - NR: zarr.save() would need to handle.
        - JH: yes, interpret at the top-level and do something smart about the bytes codec
        - DB: if we keep it, what was the point of parameterizing it in the codec.
        - NR: compatibility, you need to store it somewhere
        - DB: what comes out is undefined.
        - RA: use platform's preferred endianness
        - DB: then users won't round-trip.
won't come out in the chosen endianness
        - NR: struggled with this, but it is an implementation detail (incl. exposing it to the user). matters only for some performance issues.
        - DB: what is the dtype of a zarr array relative to what the user puts in.
        - NR: zarr only cares about how it looks on disk (not in-memory)
        - JM: zarr_array.dtype calculated from zarr data_type and checking the codec ("dynamic dtype")
        - NR: need to check the read path and what it is doing
        - RA: similar to the strings that it's coupling dtype and codec
        - DB: in v4 would like to see codec & dtype together
            - also want to put shape and chunk together (JM: plus shard)

## October 4, 2024

- Tom Augspurger / @TomAugspurger
- Davis Bennett / @d-v-b
- Sanket Verma / @MSanKeys963
- Josh Moore / @joshmoore
- Ryan Abernathey / @rabernat

Notes:

- Array metadata refactor needs some mypy fixes (or we accept overriding)
- Discussions on https://github.com/zarr-developers/zarr-python/pull/2272 (Davis)
    - https://yarl.aio-libs.org/en/latest/
    - DB: not making stores mutable, but they do IO!...
    - RA: inconsistency in the definition of where a store begins
        - can some stores disallow starting from inside?
    - Similarity between store and URL (abstractly speaking)
    - Ryan: would suggest (also for sharding) to use stores?
    - Josh: but how to bootstrap?
- Strings (Ryan)
    - https://github.com/zarr-developers/zarr-python/pull/2278
        - define a string dtype on a zarr DataType rather than np
    - https://github.com/zarr-developers/zarr-python/pull/2036
        - just implemented and then do the spec later
    - JM: try this out as a community codec (i.e. extension)?
        - then can discuss a ZEP to make a core codec.
    - SV: know users who are interested. (feel for core or not)
        - e.g.
geopandas
- DataClasses (Tom)

## September 20, 2024

- Joe Hamman / @jhamman
- Tom Augspurger / @TomAugspurger
- Davis Bennett / @d-v-b
- Sanket Verma / @MSanKeys963
- Akshay Subramaniam

Notes:

- Storage transformers
    - decision: error when creating an array
    - https://github.com/zarr-developers/zarr-python/pull/2180
- Dtype validation for v3
    - https://github.com/zarr-developers/zarr-python/pull/2209
- Fill value validation for v3
    - https://github.com/zarr-developers/zarr-python/pull/2216
- Consolidated metadata discussion
- Xarray integration
    - https://github.com/pydata/xarray/issues/9515
- Dask integration
    - https://github.com/dask/dask/pull/11388
- Doc sprint: https://github.com/zarr-developers/zarr-python/issues/2215
    - What modules are we targeting for the sprint?
    - Create issues for different modules so that folks self-assign?
    - Some functions have a docstring but are missing a code sample
        - should we have code sample for docstring?
    - highest priority:
        - zarr.Group
        - zarr.Array
        - zarr.api.synchronous
    - next tier:
        - zarr.AsyncGroup
        - zarr.AsyncArray
        - zarr.api.asynchronous
        - zarr.storage
        - zarr.metadata
    - Should we also plan for tutorials?
- consider removing `chunk_shape` kwarg

## September 13, 2024

#### Attendees

- Joe Hamman / @jhamman
- Tom Augspurger / @TomAugspurger
- Davis Bennett / @d-v-b
- Sanket Verma / @MSanKeys963

#### Notes

- Updates
    - Davis is happy about recent improvements to
    - on people's minds:
        - Davis is going to look at the synchronizer api
        - Sanket: Doc sprint in September? Dates? How many days? Async?
            - Let's try for Sept. 30-Oct 1
            - Yes, Async with a kickoff on Sept.
30
        - Tom: consolidated metadata is getting pretty close
            - reworking metadata layout
            - first iteration will support reading/writing v3 consolidated metadata and reading v2
            - should be possible to write new v2 metadata as well
            - will need to do more thinking on future proofing for metadata schemas
            - also thinking about the maximum depth of consolidation
        - storage transformers issue: https://github.com/zarr-developers/zarr-python/issues/2178
            - may need to update the spec language around optional metadata fields

```python
# Sketch of an API idea from the meeting; `make_sharding_pipeline` is a
# proposed helper, and `Gzip` and the `...` values are placeholders.
import zarr

kwargs = zarr.codecs.make_sharding_pipeline(
    read_chunks={...},
    write_chunks={},
    compressor=Gzip(),
)
zarr.create_array(shape=(...), **kwargs)
```

## September 6, 2024

#### Attendees

- Joe Hamman / @jhamman
- Tom Augspurger / @TomAugspurger
- Norman Rzepka / @normanrz
- Josh Moore / @joshmoore
- Davis Bennett / @d-v-b
- Akshay Subramaniam

#### Notes

- https://github.com/orgs/zarr-developers/projects/5/views/2 big lifts?
- d-v-b: shape for sharding? does it have to change for 3.0.0?
    - shape is currently dependent on which codec
    - should encourage thinking about it as a new interpretation of chunking
    - JH: define some preset pipelines? NR: similarly. doesn't have to change the array. i.e., top-level API.
    - DB: people want easy access to the configuration for looping
    - JM: .writing_chunks to go with .reading_chunks. (would dask also adopt?)
    - DB: agreed, might be the right level of detail for users
        - would also help to guard against other implementations (transformers, etc.)
    - NR: also produce an ergonomic way of *creating* them
    - DB: you'd also want to pass as an argument
    - NR: ok, and doesn't have to be set forever.
    - JH: xarray/dask zarr-readers didn't need the attribute. (just from_array needs as an argument)
    - JM: default? which one wouldn't fail.
    - JH: default today is write chunk? NR: yes. but can be too big.
- consolidated metadata
    - JH: reading/writing v2 metadata as a blocker
    - TA: writing, too? kinda yeah.
    - TA: status update
        - pretty straight-forward
        - issues with del item: do we **synchronize** out to the consolidated? (i.e. doing more IO)
        - relationship between group & store objects is just "call save metadata"?
    - JM: writing down the v2 schema in the v3 (since no v2 process)
    - JH: just do it in the v2 schema. people are using it.
- docs (NR)
    - sprint? still happening.
    - issue raised about the formatting. not using the left pane. (sad & empty)
- synchronizer API (JH)
    - issues ("it doesn't work")
    - hot potato: v2 has one but without distributed version
    - DB: how does it plug in in V2?
    - JH: mucked up the v2 code `_set_item_nosync`
    - DB: property of an array (i.e. high-up in the API)
    - JH: could go further down. store level?
    - DB: every store has a locking class
    - JH: zip store requires thread/asyncio locks (not-merged)
    - NR: not using synchronizers
    - JH: frequent bug reports in xarray
    - DB: does it have to live at arrays and groups because stores didn't know the key names
        - does it tie in to having the names knowledge in the store?
    - JH: possibly a high-level and a low-level store API
    - NR: higher-order store so that you can compose them
        - zip store always has it
        - mutable mapping
    - JH: use memory store to adapt anything (no async stuff though)
- GPU (AS)
    - merged :tada:
    - testing with the codec interface. things look to be working.
- JH: v3 branch *works*. other big lifts for 3.0.0?
    - batched store API. few minor issues. (rust under the hood?)
    - JH: definitely lots of small chunk calls. add to store API (bunch of keys)
    - DB: allow fetches to run out of order then it changes the API
        - gather runs sequentially
    - NR: async iterator? or as_completed
    - JH: streaming approach is the most powerful but also most complicated
    - JM: add in delete and then it's approaching transactions
    - DB: no lazy execution model right now. leverage futures?
    - AS: gpu batch in kvikio does that, collects all the futures and then waits
    - DB: path is open for that.
(if we leave the mutable mapping API) not too painful.
    - AS: also async events need a separate codec pipeline. affects more.
    - DB: (dreaming) if txn as a context manager, then it could take a region
    - NR: :heart:
- DB: docs as the most important
    - NR: agreed
    - DB: pay attention to what sucks.
    - NR: migration guide.
- CLI tools (convert v2 to v3)
    - DB
    - NR: not difficult for small arrays
    - TA: zarr v3 metadata refer to v2 data?
    - NR: most of the time. only if the codec is compatible. zarrita had a function for that. could do that.
- time-permitting (Josh)
    - impl tests, netcdf-c, bluesky/tiled

## August 30, 2024

#### Attendees

- Joe Hamman / @jhamman
- Sanket Verma / @MSanKeys963
- Akshay Subramaniam /
- Davis Bennett / @d-v-b
- Josh Moore /
- Tom Augspurger / @TomAugspurger

#### Agenda

- alpha release last week
- 2.18.3 is close (maybe today)
- GPU PR is in
- lots of stale PRs
- async / sync boundary in store
    - look at how tensorstore does this
        - probably pass the store name and a config dict?
    - dvb: having users instantiate a store is kind of an anti pattern
        - want more of a declarative pattern
    - as: could be useful to decouple protocol from store api
        - like what we have w/ codecs
- consolidated metadata
    - https://github.com/zarr-developers/zarr-specs/pull/309
    - https://github.com/zarr-developers/zarr-python/pull/2113
    - discussion about store api
        - any changes to the on disk format are a spec change
    - discussion about cache consistency and invalidation
- back to attrs? or something else?
    - serialization of metadata is really hard
    - Tom is looking at something here -> https://github.com/TomAugspurger/zarr-python/blob/feature/serde/src/zarr/_serialization.py
    - probably don't need to go to attrs

## August 23, 2024

#### Attendees

- Joe Hamman / @jhamman
- Sanket Verma / @MSanKeys963
- Josh Moore / @joshmoore
- Norman Rzepka
- Akshay Subramaniam
- Gustavo Hidalgo

#### Agenda

- https://github.com/zarr-developers/zarr-python/pull/2102
    - NR: important to have a written document
        - OME is also interested in support for reading v2 data
        - may be good to remove the `v2` module asap
    - JM: crux is supporting v2 and v3 data
        - does it make sense to create a zarr3 library
    - NR: not a fan of the zarrv3
        - discoverability and aesthetics are not great
    - pitch weekly alpha releases
        - need to do the work and get the release out
    - JM: when do we go from alpha to beta to full release
    - SV: also address these questions: https://github.com/zarr-developers/zarr-python/discussions/2093#discussioncomment-10429985
- alpha release frequency
    - proposal: weekly release on Monday
- consolidated metadata
    - https://github.com/zarr-developers/zarr-specs/issues/136
    - JM: no problem supporting this for v2 ala 2.*
    - add something that supports v2 data
    - add zep for v3
- RemoteStore
    - PR1956
    - blocking: writing is completely broken because of the exist method
    - doing naive synchronous user thing
        - `open_array(s3://...)`
    - using `sync` in user code
    - accessing fsspec directly
        - `store = await MyStore.open('s3://foo')`
        - `store = sync(MyStore.open('s3://foo'))`
        - `store = MyStore.open_sync('s3://foo', loop=...)`
        - `sync_store = SyncWrapper(MyStore, 's3://foo')`
        - `sync_store.set(filename, bytes)`
- GPU array progress
    - squashing bugs around merge conflicts
    - GPU CI is working now, need to sort out limiting the size of the matrix and installing cupy

## August 9, 2024

#### Attendees

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Sanket Verma / @MSanKeys963
- Eschal Najmi

#### Agenda
https://github.com/zarr-developers/zarr-python/issues/2008

- PR updates
    - v2/v3 metadata: https://github.com/zarr-developers/zarr-python/pull/2059 *would love to see this merged --Davis*
    - picklable classes: https://github.com/zarr-developers/zarr-python/pull/2006
    - GPU support: https://github.com/zarr-developers/zarr-python/pull/1967
        - blocked by GPU runner on GitHub
    - also 2064, 2065 are close
- set a docs sprint date
    - target late september

## July 26, 2024

#### Attendees

- Joe Hamman / @jhamman
- Norman Rzepka / @normanrz
- Davis Bennett / @d-v-b
- Hannes Spitz / @brokkoli71
- Gustavo Hidalgo / @ghidalgo3
- Sanket Verma /

#### Agenda

- API surface
    - Array API: https://github.com/zarr-developers/zarr-python/discussions/2052
    - API survey: https://docs.google.com/spreadsheets/d/1ev4Hj_YU-QCiZJuxRYMrBqdrYYqP3tIdnYGmp9saJS8/edit?usp=sharing
    - https://github.com/zarr-developers/zarr-python/issues/2037
- Second alpha release: https://github.com/zarr-developers/zarr-python/issues/2008

#### Notes:

- sharding is complicating the `.chunks` attribute on Arrays
    - ideas

```python
# API sketch from the discussion, not runnable code
Array.chunks -> tuple[int, ...]  # raise if variable chunks or sharding
Array.chunk_grid.read_chunks()  # or inner_chunks or chunks
Array.chunk_grid.write_chunks()  # or outer_chunks or shards
```

- sharding configuration is pretty complicated today
    - template module
    - zarr.open_array remains the top level API where we can do user-friendly things
    - the Array.open method remains a low level entrypoint
- async codec api
    - sharding is the only codec that needs to be async / do IO
    - NR: today we get scheduling in a threadpool
        - assumption that codecs release the GIL
    - need to do performance testing

Deprecate in 2.18.3

- h5py compat methods

TODOs from this meeting

- performance test suite
    - dask + threaded scheduler
- GPU runner billing

## July 12, 2024

#### Attendees

- Ryan Abernathey / @rabernat
- Norman Rzepka / @normanrz
- Davis Bennett / @d-v-b
- Akshay Subramaniam / @akshaysubr

#### Agenda

- What to do
with numcodecs?
    - Make a release, needs docs for
        - https://github.com/zarr-developers/numcodecs/pull/535
        - https://github.com/zarr-developers/numcodecs/pull/531
        - https://github.com/zarr-developers/numcodecs/pull/515
    - Move more codecs specs into Zarr

## June 6, 2024

#### Attendees

- Joe Hamman / @jhamman
- Juan Nunez-Iglesias / @jni

## May 30, 2024

#### Attendees

- Joe Hamman / @jhamman
- Norman Rzepka / @normanrz
- Davis Bennett / @d-v-b
- Akshay Subramaniam
- Max Jones / @maxrjones

#### Agenda

* Upcoming alpha release

Quick topics:

- Norman, do we have an accessible api for extracting a shard index?
- chunkstore API
    - joe: ask Martin
- MemoryStore has Buffer objects in it :(
- `out` kwarg

#### Notes

* Joe
    * Store open mode is in, but incomplete
    * Top level API is functional but needs a bunch of work
    * Working on sharding codec, using fsspec branch + top level API branch
        * slow for now
* Norman
    * working on indexing, tests are working
    * stuck on typing
    * ready early next week
* Davis
    * Store tests for Martin to get fsspec
    * Hierarchy api
    * codec pipeline API
    * typed dicts for metadata objects
* Akshay
    * On vacation, keeping track of Buffer/Indexing PRs
* Max
    * No updates, can contribute to the v3.0.0 docs task (starting with dev docs)

## May 23, 2024

#### Attendees

- Joe Hamman / @jhamman
- John Kirkham / @jakirkham
- Juan Nunez-Iglesias / @jni
- Sanket Verma / @MSanKeys963

#### Agenda

* Upcoming alpha release
* Joe's demo of new features: https://gist.github.com/jhamman/8381dd971d928bf220405057107562b1

## May 17, 2024

#### Attendees

- Joe Hamman / @jhamman
- Norman Rzepka / @normanrz
- Davis Bennett / @d-v-b
- Max Jones / @maxrjones

#### Agenda

- Outstanding design topics for 3.0.0.alpha
    - https://github.com/zarr-developers/zarr-python/issues?q=is%3Aopen+is%3Aissue+label%3A%22design+discussion%22
- Additional topics to consider before 3.0.0 (more deprecations may be desired)
    - synchronizers? or move to design topic
        - move sync.py to new module
    - object arrays?
need a plan here
        - open issue
    - meta_array
        - assign to nvidia folks
        - maybe move to config
    - consolidated metadata (v2 and v3)
        - joe to take on
        - no support for v3
    - write_empty_chunks
        - runtime array configuration
- Test sprint soon?

#### Notes

- release alpha next week, need top level api
- chunks attribute
    - for now, regular chunk grid
- indexing
    - oindex, vindex, integer, ...
    - https://zarr.readthedocs.io/en/stable/tutorial.html#indexing-with-coordinate-arrays

## May 8, 2024

#### Attendees

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Norman Rzepka / @normanrz
- Sanket Verma
- Alden Keefe Sampson (AKS)
- Akshay Subramaniam
- John Kirkham

Notes:

- Progress update ([project board](https://github.com/orgs/zarr-developers/projects/5/views/2))
    - numcodecs codecs: [numcodecs#524](https://github.com/zarr-developers/numcodecs/pull/524)
    - zstd in numcodecs needs a review: [numcodecs#519](https://github.com/zarr-developers/numcodecs/pull/519)
    - `HybridCodecPipeline` (interleaved with configurable batch size) needs a review: [#1670](https://github.com/zarr-developers/zarr-python/pull/1670)
    - Runtime configuration?
[#1772](https://github.com/zarr-developers/zarr-python/pull/1772)
    - Batched store discussion
    - Store metadata methods: [zarr-python#1851](https://github.com/zarr-developers/zarr-python/pull/1851)
    - Initial `NDBuffer` implementation: [zarr-python#1826](https://github.com/zarr-developers/zarr-python/pull/1826)
- Proposed new meeting times
    - week 1: Friday 7a PT
    - week 2: Thursday 3p PT

Notes: Major updates

- JH:
    - implicit groups are gone :)
- NR: codecs are getting into a good place
    - new rev on batched pipeline
- DB:
    - out last week, getting back into it
    - group tests
    - need a decision about removing v2 code paths
        - should go now
- AKS:
    - open PR generalizing array types

TODOs:

- add tests for ***v2*** and v3 arrays

## April 24, 2024

#### Attendees

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Jack Kelly / @jackkelly
- Ryan Abernathey / @rabernat
- Max Jones / @maxrjones
- Sanket Verma
- Akshay Subramaniam
- Norman Rzepka / @normanrz

Notes:

- Progress update ([project board](https://github.com/orgs/zarr-developers/projects/5/views/2))
    - Codecs
        - Norman needs a review on [#1670](https://github.com/zarr-developers/zarr-python/pull/1670)
    - Store API [#1806](https://github.com/zarr-developers/zarr-python/issues/1806)
        - discussion around batch vs interleave API
        - someone could look at https://github.com/zarr-developers/zarr-python/pull/1661
    - Group API

## April 22, 2024

#### Attendees

- Joe Hamman / @jhamman
- Norman Rzepka / @normanrz
- Josh Moore / @joshmoore
- Sanket Verma
- John Kirkham
- Martin Durant
- Ryan Abernathey
- Davis Bennett
- Akshay Subramaniam

Excused: Juan Nuñez-Iglesias

#### Goals

1. Make sure we're all on the same page with what has been going on with the project
2. Organize around v3 efforts

#### Agenda

- Recent efforts
    - [Updates to core team](https://github.com/zarr-developers/zarr-python/blob/main/TEAM.md) (JH)
        - Moved some to emeritus status, etc.
        - We should work to get more core devs. (Lots to do)
        - RA: candidates?
JH: let's get people making commits for a while.
    - meeting (JM)
        - propose to make it the regular meeting but find a time where everyone can join.
        - all aboard :tada:
- Zarr-Python 3 update (JH)
    - [Design doc](https://github.com/zarr-developers/zarr-python/blob/main/v3-roadmap-and-design.md)
    - Progress update ([project board](https://github.com/orgs/zarr-developers/projects/5/views/2) + notes below 👇)
    - Ambitious release schedule ([#1777](https://github.com/zarr-developers/zarr-python/issues/1777))
        - loose plan: May/alpha, June/release
        - roughly following the Pydantic 2 model (breaking API changes)
        - JM: all for getting pre-releases out
        - NR: need to get rid of the v3 folder (messes things up)
        - `support/v2` branch may be useful (JM)
        - JM: we probably should be pushier
        - JH: sure, just our 100% confidence may lag
        - NR: define a window, e.g., by the end of the year everyone should move.
        - JH: pinned issue with the release plan? Yes.
            - https://github.com/zarr-developers/zarr-python/issues/1777
- v3 update
    - DB:
        - v3 **metadata** is done, i.e., can create spec compliant v3 arrays
        - working on **groups** that would work as expected, e.g., listing children (one of 2 big PRs). nearly done. required getting into the async implementation which is one of the biggest changes for the storage layer. Also means that we're not able to just paste in old code.
        - high-level **convenience** APIs are not there
        - only the nucleus of a testing strategy. using a different strategy from v2. bringing in what we can.
    - NR:
        - codecs are pretty advanced. async ...
        - MD: that means thread pools? NR: Yes. They can choose how they do that.
        - JH: core part of that is in the v3 release that will spend time on async/threading/scheduling. Lots of new behavior that we're going to learn about it. But now we have an API that can be tuned.
        - arrays are missing **indexing**
        - **documentation** is largely open.
(pushed to post alpha for the launch)
    - JH:
        - on our way towards having 100% **type hints**
        - abstract **base classes** for the Store and Codecs that allow people to implement their own (outside of zarr-python) with an entrypoint system. perhaps something for **chunk grids** as well
        - store is no longer a mutable mapping but a custom class. list methods are async generators, etc. all synchronization happens upstream. Synchronize wrapper of Arrays and Groups, but wait until you're at the top-level API for sync.
        - build is cleaner. using **hatch**.
        - need to discuss **numcodecs**. currently isn't taking part in the protocol system. what does it mean to Zarr going forward?
            - https://github.com/zarr-developers/numcodecs/issues/502
- Discussion
    - JK: documented path for upgrading? JH: no, there's an open ticket. Need docs on upgrading code but also **migrating data**, e.g., metadata only changes. (Alistair did this for v1->v2).
    - MD: need to discuss what **kerchunk** is going to do. it will take some pretty deep working. the style of where the metadata is has changed (along with the codecs). JH: yeah, no filters, all one pipeline. i.e., just the metadata. MD: more involved (i.e. goes deeper) into Zarr than other things. RA: zarr data model that is independent of the spec could be super useful. DB: don't think there is an overlap of v2 and v3 arrays. i.e., it would be a UNION. you need to map between the names and the types. don't get that for free just with the hierarchies. RA: don't do it once off, but build something re-usable and then serialize those. JH: clearly separated metadata from the classes. can turn one dataclass into another dataclass. (work to be done)
    - DB: spec allows **v2/v3 things to be mixed**, so a coroutine of some form may need to be opinionated about what it prioritizes. JH: good point, since you might have to look for 4 things, or prioritize one or the other. we should just be clear and then let people suffer the consequences.
RA: have some shim functions similar to the current `open()` which keeps things working. JH: zarr.open has a version flag. None could mean do both.
    - **implicit groups** (DB) basically anything is a group even if it has garbage in it. NR: haven't seen anyone who is against removing it.
    - DB: if so, also make mixing versions disallowed? JM: can we allow a complete mixing? JH: don't want to be polymorphic about children. DB: can't **forbid** having .zattrs in a v3 group/array. Agreed.
    - JM: if need be, can try to organize a ZIC meeting with SV.
- Numcodecs (NR)
    - Opened PR today if someone wants to review that, but more generally where are we with numcodecs
    - have specialized codec classes in v3 branch. arrays-to-bytes, etc. etc. Different classes from in numcodecs. Do use it under the hood.
    - for v2 support in the v3 branch we use the code unchanged. we ask numcodecs to do it for us. we could pull that into the v3 arrays which would give us support for batching, async, etc. we will likely need some glue code. (that's with minimal effort). Do we move numcodecs in a direction such that it uses v3 abstract classes?
    - DB: I like the idea of there being a repo on github for people to go to. numcodecs should exist where we have these compression routines at a low level. It should be there to support zarr.
    - NR: **closed list**. How do we handle that?
    - DB: spec says that it just needs a URL.
    - NR: what if two implement blosc2 differently.
    - DB: people are going to do what they want.
    - NR: make it more difficult? or use the github URL to prefix?
    - JH: raise a warning that users can turn off if they aren't using an approved list of codecs. for experimentation, we definitely want to make it possible (and **easy**). That's what zarr-python is known for.
    - DB: what's the advantage of enumerating a list of codecs?
    - NR: when creating an implementation, you can just follow the list.
    - JM: allows us to say, "this implementation is not complete".
plus also :+1: for a schema where possible.
    - NR: ok to have optional codecs
    - JH: open issue with `tifffile` of a missing default flag (size parameter?)
        - https://github.com/cgohlke/tifffile/issues/211
    - RA: a way out of this is to outsource as much as we can with blosc, has an ambition of being a meta compressor
    - AS: the problem with blosc as the main library is that it also has sharding etc. under the hood. better IMO to just expose the compression stream formats. blosc is less flexible than numcodecs currently. (more difficult to add new compressors or options)
    - AS: gzip links to RFC not an implementation, i.e., a specific stream format. this is also an issue with numcodecs lz4. would be good to have these written down and link to a **spec**.
    - Continue conversation on https://github.com/zarr-developers/numcodecs/issues/502
- [NASA Funding Opportunity](https://nspires.nasaprs.com/external/solicitations/summary.do?solId=%7B910CC61E-4616-9958-C26F-F8D9BC5AB8D9%7D&path=&method=init) (JH)
    - Planning to submit a LOI next week
    - targeting Zarr-Python, v3 feature development, and support at least 3 years
- TODOs
    - [ ] find a new time for the bi-weekly meeting. becomes zarr-python dev meeting but open invitation to anyone who would want to join.

#### Notes

## April 10, 2024

- SV: Zarr-Python B&P meeting discontinued
    - rename refactor meeting to 'Zarr-Python meeting'?
- JH: updates since last meeting
    - [Project board](https://github.com/orgs/zarr-developers/projects/5/views/2)
    - Many new issues!
    - Discuss timeline [#1777](https://github.com/zarr-developers/zarr-python/issues/1777)

Active work

- DB: removing old v3
- JH: move v3 dir to root, remove v2 stuff
- JH: list_* (AsyncGenerator[str])
    - stream through `members`: https://github.com/zarr-developers/zarr-python/pull/1782/files#r1558820360
    - https://stackoverflow.com/questions/78301926/asyncio-creating-a-producer-consumer-flow-with-async-generator-output
- AS: generalized NDarray support
    - two options for the design in https://github.com/zarr-developers/zarr-python/issues/1751
    - can we use c++ for this? will make zero-copy memory sharing easier
    - Qs: what does it mean for development process and pyodide support?
- MJ: merged CI updates
    - looking for the next thing

## Apr 5, 2024

- Davis Bennett / @d-v-b (DB)
- Norman Rzepka / @normanrz (NR)
- Joe Hamman / @jhamman
- Deepak Cherian / @dcherian

### Todo list :point_down:

Note: the topics listed below have been converted to issues and placed on the v3 project board: https://github.com/orgs/zarr-developers/projects/5

p0 - must happen now
p1 - must happen before alpha release (target first week of May)
p2 - must happen before 3.0 release (target early June)
p3 - nice to have, can happen after 3.0 release

* Arrays
    * [p1] [Finalize codec API, including codec pipeline](https://github.com/zarr-developers/zarr-python/issues/1659)
    * [p1] [Try out codec entry points](https://github.com/zarr-developers/zarr-python/issues/1748)
    * [p1] [Array indexing feature parity with zarr-python 2](https://github.com/zarr-developers/zarr-python/issues/1749)
    * [p2] [Resolve numcodecs question](https://github.com/zarr-developers/zarr-python/issues/1750)
    * [p2] [Generalized array support (numpy, cupy, jax, torch etc)](https://github.com/zarr-developers/zarr-python/issues/1751)
    * [p3] [ChunkGrid API -- do we need it now (or ever)?](https://github.com/zarr-developers/zarr-python/issues/1752)
* Groups
    * [p1] [implement members /
children](https://github.com/zarr-developers/zarr-python/issues/1753)
    * [p3] [(reach) declarative hierarchy API](https://github.com/zarr-developers/zarr-python/issues/1754)
* Store
    * [p0] [Finalize store API](https://github.com/zarr-developers/zarr-python/issues/1755)
    * [p1] [Deprecate stores we don't want in Zarr-Python core anymore](https://github.com/zarr-developers/zarr-python/issues/1756)
    * [p1] [remote store support (s3, gcs, azure, http)](https://github.com/zarr-developers/zarr-python/issues/1757)
    * [p3] [request coalescing: where to implement?](https://github.com/zarr-developers/zarr-python/issues/1758)
    * [p3] [Storage transformer API -- do we need it now (or ever)?](https://github.com/zarr-developers/zarr-python/issues/1718)
    * [p3] (reach) http proxy - may not need to be in zarr-python
    * [p3] [(reach) caching bytes-chunks in the store, caching array chunks in the array](https://github.com/zarr-developers/zarr-python/issues/1500)
* Tests
    * [p1] [Bring in as much of the existing test suite as possible](https://github.com/zarr-developers/zarr-python/issues/1759)
    * [p1] [Test serialization of Arrays](https://github.com/zarr-developers/zarr-python/issues/1760)
    * [p1] [Add test that instruments traffic to the store -- we should be very careful to only read what is needed](https://github.com/zarr-developers/zarr-python/issues/1761)
    * [p2] [Develop integration test suite in Zarr-Python -- needed to validate new async tooling (could be xarray and Dask)](https://github.com/zarr-developers/zarr-python/issues/1762)
    * [p2] [Coordinate downstream testing (e.g.
Dask + Xarray)]
    * [p3] [Develop performance test suite in Zarr-Python](https://github.com/zarr-developers/zarr-python/issues/1763)
    * [p3] [Add hypothesis test hooks](https://github.com/zarr-developers/zarr-python/issues/1764)
* Docs
    * [p2] [Update developer docs](https://github.com/zarr-developers/zarr-python/issues/1765)
    * [p2] [Update API docs](https://github.com/zarr-developers/zarr-python/issues/1766)
    * [p2] [Update tutorial docs](https://github.com/zarr-developers/zarr-python/issues/1767)
    * [p2] [Write a doc about how Zarr-Python thinks about consistency and how it operates when concurrent writers are acting on a store/group/array/chunk](https://github.com/zarr-developers/zarr-python/issues/1768)
    * [p2] [Start a release doc summarizing the major changes in V3](https://github.com/zarr-developers/zarr-python/issues/1769)
    * [p3] [docs for extending zarr](https://github.com/zarr-developers/zarr-python/issues/1770)
        * how to write a custom store
        * how to subclass array / group (e.g., to have typed attributes, typed members)
        * How to get good performance
* Misc
    * [p1] [Top level zarr API (open, create)](https://github.com/zarr-developers/zarr-python/issues/1598)
    * [p1] [make mypy and pylint/ruff happy](https://github.com/zarr-developers/zarr-python/issues/1593)
    * [p1] [Dial in runtime config API -- e.g.
IO loop cannot be attached to](https://github.com/zarr-developers/zarr-python/issues/1772)
    * [p2] [`TypedDict` for all typed dictionaries](https://github.com/zarr-developers/zarr-python/issues/1773)
    * [p2] [develop synchronization API or declare it dropped](https://github.com/zarr-developers/zarr-python/issues/1596)
    * [p2] Add logging throughout
* Migration
    * [p3] 2 -> 3 conversion CLI (maybe in its own repo)
    * [p1] [remove v2 code](https://github.com/zarr-developers/zarr-python/issues/1771)

## March 27, 2024

- Davis Bennett (DB)
- Alden Keefe Sampson (AKS)
- Norman Rzepka / @normanrz (NR)
- Sanket Verma / @MSanKeys963 (SV)
- Akshay Subramaniam / @akshaysubr (AS)
- Max Jones / @maxrjones (MJ)
- Raphael Hagen / @norlandrhagen (RH)

### Meeting notes:

- Sanket: Bi-weekly meeting ends on May 1st, 2024 - shall we continue after that?
    - Yes! Schedule it until June end! - DONE
- DB: Fleshing out the group API in v3 https://github.com/zarr-developers/zarr-python/pull/1726
- NR: We need to find a common understanding of what we still need to work on for beta release. NR will create a tracking issue.
- Akshay: Generalized array support
    - Where to create issue to track this? zarr-python or zarr-specs? Any direction for structuring the issue and proposal?
    - Create a native zarr NDArray class for typing and to interface with existing protocols.
This includes:
        - Buffer protocol
        - `__array_interface__`
        - `__cuda_array_interface__`
        - DLPack
        - Raw pointers

      ```cpp
      namespace zarr {

      namespace py = pybind11;
      using namespace py::literals;

      class Array {
       public:
        Array(zarrArrayInfo_t* array_info, int device_id);
        Array(py::object o, intptr_t cuda_stream = 0);

        py::dict array_interface() const;
        py::dict cuda_interface() const;

        py::tuple shape() const;
        py::tuple strides() const;  // Strides of axes in bytes
        py::object dtype() const;

        zarrArrayBufferKind_t getBufferKind() const;  // Device or Host buffer

        py::capsule dlpack(py::object stream) const;  // Export to DLPack

        py::object cpu();  // Move array to CPU
        py::object cuda(bool synchronize, int device_id) const;  // Move array to GPU

        const zarrArrayInfo_t& getArrayInfo() const { return array_info_; }

        static void exportToPython(py::module& m);
      };

      }  // namespace zarr
      ```

    - Interoperability with NumPy

      ```python
      import numpy as np
      import zarr  # zarr.ndarray is the proposed module, not an existing API

      ascending = np.arange(0, 4096, dtype=np.int32)
      zarray_h = zarr.ndarray.as_array(ascending)
      print(ascending.__array_interface__)
      print(zarray_h.__array_interface__)
      print(zarray_h.__cuda_array_interface__)
      print(zarray_h.buffer_size)
      print(zarray_h.buffer_kind)
      print(zarray_h.ndim)
      print(zarray_h.dtype)
      ```

    - Interoperability with CuPy

      ```python
      import cupy as cp

      data_gpu = cp.array(ascending)
      zarray_d = zarr.ndarray.as_array(data_gpu)
      print(data_gpu.__cuda_array_interface__)
      print(zarray_d.__cuda_array_interface__)
      print(zarray_d.buffer_kind)
      print(zarray_d.ndim)
      print(zarray_d.dtype)
      ```

    - Convert CPU to GPU

      ```python
      zarray_d_cnv = zarray_h.cuda()
      print(zarray_d_cnv.__cuda_array_interface__)
      ```

    - Convert GPU to CPU

      ```python
      zarray_h_cnv = zarray_d.cpu()
      print(zarray_h_cnv.__array_interface__)
      ```

    - Anything that supports the buffer protocol

      ```python
      with open('file.txt', "rb") as f:
          text = f.read()

      zarray_txt_h = zarr.ndarray.as_array(text)
      print(zarray_txt_h.__array_interface__)

      zarray_txt_d = zarray_txt_h.cuda()
      print(zarray_txt_d.__cuda_array_interface__)
      ```

## March 13, 2024

- Joe Hamman / @jhamman (JH)
- Alden Keefe Sampson (AKS)
- Norman Rzepka / @normanrz (NR)
- Sanket Verma / @MSanKeys963 (SV)
- Akshay Subramaniam / @akshaysubr (AS)
- Max Jones / @maxrjones (MJ)
- Agriya Khetarpal / @agriyakhetarpal (AK)

### Meeting notes:

- Alden / top level API
    - https://github.com/zarr-developers/zarr-python/issues/1598#issuecomment-1994729420
    - In the v3 library, are the top level methods
        - the primary way users interact with the library? or
        - the smoothest on-ramp for v2 library users into v3? (and Array.xxx and Group.xxx become primary)
        - something else?
        - Notes:
            - Joe's thought: We want to provide a pretty similar interface, and help massage or raise errors when args are not compatible. We can start deprecating and changing behavior
            - Norman: likes the Array.xxx and Group.xxx entry points, but we need to have these top level entry points too. Promote the Array and Group classmethods in the docs
            - Joe: don't love the polymorphism of .open, but it exists
    - The kwargs to any method that can create an array are currently v2 specific (filters vs codecs, etc), plus there are a number of performance/behavior modifying args (cache_[metadata|attrs], partial_decompress, write_empty_chunks, dim separator). Do we
        - try not to change the api at all and try to translate into spec V3 land
        - make the kwargs actually match those of the Array.xxx v3 library methods, but also take in `**v2_kwargs` and translate where possible, checking for conflicts with v3 kwargs if provided
        - make the top level methods kwargs align with those of Array.create, etc
        - Norman: for open: make it compatible, you get back your method, for create could make
        - Runtime parameters: many of these we didn't know existed, can debate case by case
        - Joe: if a runtime param is provided that v3 doesn't use, raise an error
    - Currently zarr.open and similar will return an array if it exists, even if the existing array's dtype, codecs, etc don't match those provided.
        - Keep this?
            - Joe: think we should raise an error, wide agreement on this
- Norman / Batched and interleaved codec pipelines
    - https://github.com/zarr-developers/zarr-python/pull/1670
    - Hybrid interleaved batched codec pipeline
    - How to set runtime configuration?
    - Batched store API
    - BatchedCodecPipeline as abstract class that can be overridden by user code
    - Move thread dispatch from codec to pipeline to allow for coalescing and locality?
- Akshay / Generalized array support
    - Open issue and tag with v3
- Sanket / Summary for the core-devs → potential blog post in the future
    - JH: I can take this on -- target April?
    - SV: Sounds good!
- Agriya / Zarr Pyodide support, out-of-tree
    - Requires patching numcodecs, zarr here and there
        - Zarr is pure Python, so fewer patches there. Numcodecs needs more patches because it is Cython-based.
        - Already done by Pyodide devs per Pyodide/Emscripten release
    - The Emscripten and Pyodide versions are not decoupled yet
        - Leads to missing versions
    - Establish CI job that runs on PRs and nightly, or just nightly
        - Issue with this is maintainability and how to keep support?
    - Interactive documentation (end goal).
    - **Action item**: I will be opening an issue for this on the Zarr repository and link both previous discussions (the ones that I have found). Discussion may proceed there further

## February 28, 2024

- Joe Hamman / @jhamman (JH)
- Tom Nicholas / @TomNicholas
- Norman Rzepka / @normanrz (NR)
- Davis Bennett / @d-v-b (DB)
- Sanket Verma / @MSanKeys963 (SV)
- Akshay Subramaniam / @akshaysubr (AS)
- Charles Stern
- Alden Keefe Sampson (AKS)

### Meeting notes:

- Numcodecs discussion
    - https://github.com/zarr-developers/numcodecs/issues/502
- How ready is the v3 branch for kerchunk-related experiments?
    - i.e.
chunk manifest ZEP, virtual concatenation ZEP
    - https://hackmd.io/t9Myqt0HR7O0nq6wiHWCDA?view
- v3 store discussion
    - https://github.com/zarr-developers/zarr-python/discussions/1686

## February 14, 2024

- Norman Rzepka / @normanrz (NR)
- Davis Bennett / @d-v-b (DB)
- Sanket Verma / @MSanKeys963 (SV)
- Akshay Subramaniam / @akshaysubr (AS)

### Meeting notes:

- AS: Planning to send a couple of PRs around numcodecs and wanted to join the refactor meeting to get a sense of the current state of things
- NR: https://github.com/zarr-developers/zarr-python/pull/1660
- AS: Plan to add the encode and decode batch in numcodecs and move the logic from Zarr-Python to numcodecs
- NR: In the current codec mechanism there will be a place to add the encode/decode class
- SV: Also a question of how you'd add a new codec in V3
    - https://zarr-specs.readthedocs.io/en/latest/v3/codecs.html
- NR: Could use entrypoints for the new codec registrations
- AS: New codecs are added via KvikIO
- AS: https://github.com/zarr-developers/zarr-python/issues/1398

## January 31, 2024

### Attendees

- Sanket Verma / @MSanKeys963 (SV)
- Norman Rzepka / @normanrz (NR)
- Davis Bennett / @d-v-b (DB)
- Max Jones / @maxrjones (MJ)
- Alden Keefe Sampson / @aldenks (AS)
- Raphael Hagen / @norlandrhagen (RH)
- Charles Stern / @cisaacstern (CS)
- Jeremy Maitin-Shepard / @jbms (JMS)

### Meeting notes:

- NR: Codec pipeline
    - Open question: merging partial and full versions?
    - Next - reading/writing partial chunks for uncompressed data
- DB: Saransh helped with hatch and source layout updates
    - providing a review on packaging PRs: https://github.com/zarr-developers/zarr-python/pull/1592
    - new branch for V3 work
        - removing attrs
        - using frozen dataclasses
        - relies on handling to/from dict in each class with validation functions
- MJ: No updates, participating in Joe's virtual sprint on Zarr refactor
    - can test out setting up test env with Hatch, provide feedback
- AS: Set up dev environment, still intending to work on [high-level methods](https://github.com/zarr-developers/zarr-python/issues/1598).
    - Also adding setup/dev environment doc improvements to https://github.com/zarr-developers/zarr-python/pull/1643
- RH: No updates
- CS: Interested in participating in Zarr sprint remotely
- JMS: No updates, analogous decisions in tensorstore

## January 17, 2024

### Attendees

- Sanket Verma / @MSanKeys963 (SV)
- Joe Hamman / @jhamman (JH)
- Norman Rzepka / @normanrz (NR)
- Davis Bennett / @d-v-b (DB)
- Max Jones / @maxrjones (MJ)
- Alden Keefe Sampson / @aldenks (AS)
- Raphael Hagen / @norlandrhagen (RH)

### Meeting notes:

- SV: plans for numcodecs going forward
    - TODO: connect with JK about this
- JH: made some good progress on the Store API
    - Thinking about what to do when keys are missing, `raise KeyError` or `return None`
    - Needs work: `getsize`, `move`, `tree`, `rmdir`, `open`, `close`
        - NR: move should only exist on a store if it's cheap
    - Open questions:
        - `Store.list_*` could change to return async generators
- NR: working on codec api, removing array metadata
    - not 100% happy with the API yet
    - new methods: `evolve` and `validate`
        - check if the codec matches
    - looking for input here https://github.com/zarr-developers/zarr-python/pull/1632
- DB: working on a messy / unmergeable PR for the Array API
    - end goal: unify array/group apis for v2 and v3
    - added a new directory with v2 and v3 metadata
    - stuck on dataclass inheritance
    - JH:
where will the normalization of metadata keys
- MJ: not much, going to pick up the hatch PR
    - JH: yay!
- DB: flag issue around imported modules from tests https://github.com/zarr-developers/zarr-python/pull/1601
- AKS: playing with zarr in rust - very fast!
    - going to pick up the top level api this week
- RH: No updates at this time

### Discussion

What goes in the beta release

1. core array, group, and store api
2. thesis: almost feature complete but the api should be set - we want people to start using it so we can get some feedback

## January 3, 2024

### Attendees

- Sanket Verma / @MSanKeys963 (SV)
- Alden Keefe Sampson / @aldenks (AS)
- Joe Hamman / @jhamman (JH)
- Norman Rzepka / @normanrz (NR)
- Davis Bennett / @d-v-b (DB)
- Max Jones / @maxrjones (MJ)

### Meeting notes:

- JH: Still working on Group and Store APIs
- NR: Left off with codec api, sharding api, and sharding layouts
- DB: Still working on array api
    - considering a major change to indexing/slicing api (slicing a Zarr array gives NumPy array, which is weird and should give out a Zarr array)
    - thinking about serialization of nested objects
    - partial writes
- MJ: Looking for a place to jump in
    - Refactor metadata objects (e.g.
ChunkGrid, ChunkKeyEncoding)
    - Remove attrs and refactor (de)serialization
    - Hatch PR https://github.com/zarr-developers/zarr-python/pull/1592
- AKS - may want to jump in on the top level `zarr.foo*` api

## December 20, 2023

### Attendees

- Charles Stern / @cisaacstern (CS)
- Jack Kelly / @JackKelly (JK)
- Sanket Verma / @MSanKeys963 (SV)
- Alden Keefe Sampson / @aldenks (AS)
- Joe Hamman / @jhamman (JH)

### Meeting notes:

- CS: nothing directly on zarr, keeping an eye on the zarr issues with help wanted tags
- SV: looking at hatch pr from davis, zep 0 revisions, and zarr paper
- JK: working on a [light-speed-io](https://github.com/jackKelly/light-speed-io/) project (rust), playing around with ideas for fast data loading
- AS: seems too early to jump in, don't want to get in the way
- JH: Many things in progress: Store, Codecs, Arrays, Groups

## December 6, 2023

### Attendees

- Ryan Abernathey / @rabernat
- Joe Hamman / @jhamman (JH)
- Charles Stern / @cisaacstern (CS)
- Jack Kelly / @JackKelly (JK)
- Sanket Verma / @MSanKeys963 (SV)
- Davis Bennett (DB) / @d-v-b
- Raphael Hagen / @norlandrhagen
- Alden Keefe Sampson / @aldenks
- Norman Rzepka / @normanrz

### Meeting notes:

- Design doc for v3.0 has moved to GitHub. If you want to comment on the design then comment on [the GitHub pull request](https://github.com/zarr-developers/zarr-python/pull/1583).
- Zarrita: Alistair started it as a place to experiment with the Zarr v3 spec (back in July 2020). Norman picked Zarrita up in Summer.
- We've taken Zarrita, copied it, renamed & refactored things. There's a new Store interface (we're leaving behind the mutable mapping interface of Zarr stores). Aiming for 100% coverage of static type checking.
- Norman, Davis & Joe have been together for the last 3 days (in Berlin). JH has been working on the Group API (compatibility with Zarr-Python's group API). See [this PR](https://github.com/zarr-developers/zarr-python/pull/1590).
For v2 there are two metadata docs which describe a group. Reads now happen async, which cuts the loading time for groups in half. Now working on listing contents of a group.
- DB: In Zarrita we have representations of arrays for v2 and v3. DB has been working on a uniform interface to v2 and v3. Breaking lots of things :). Looking at how codecs decompress & compress. Overall strategy is to use the v3 way of doing things. See [DB's PR](https://github.com/zarr-developers/zarr-python/pull/1589).
- RA: It's great that there are both async and sync APIs. But downstream data science libraries will always want the sync API.
- NR: The Zarrita code is based on fsspec, with some small changes.
- JH: We're just using `fsspec` (for now). Very convinced that having an async API is the right call for Zarr-Python. Less convinced that the `fsspec` way of doing things will be the long-term solution.
- NR: Adding sharding strategies. Customise how chunks are laid out in the file, e.g. if you want to do partial writes (where you can write to specific places) instead of having to write entire shards at a time. Writing tests right now. First PR has been merged into the v3 branch (codecs).
- JH: The codecs are now an entry point into the Zarr-Python code. Zarr-Python v2 basically supports any codec in numcodecs. Do we need to register _all_ of those compressors and filters as codecs? Or should we limit them?
- NR: Let's make a generic codec. bytes-to-bytes, and array-to-bytes.
- JH: For now, we've decided not to work on variable chunk sizes. We could release a version of Zarr-Python without variable chunk sizes.

Questions?

- RA: Everyone's very supportive! This is what we need to get over the rut that zarr-python has been stuck in. A lot of folks would like to help, but don't know _where_. Are there concrete tasks that we can give to people? (The answer may be "no"! Some software projects are just not parallelizable like that.)
- JH: Some of these first blocks of work have required us to already resolve conflicts. I'll jot down a couple of tasks which could be useful for folks to work on.
    - the top-level API has not been ported over yet (e.g. `zarr.open()`). Most people use that top-level API.
    - documentation! A lot of copy-and-pasting from Zarr-Python v2.0. But some function signatures will change.
    - type checking needs work! MyPy isn't happy right now.
- DB: v3 introduces the concept of a codec pipeline. We build an object which is a sequence of transformations of chunk data, which leads to it being stored, or - in reverse - leads to it being opened. The documentation for this doesn't exist yet. If anyone has an idea for a transformation, then work through the process of doing this with the v3 machinery, and write up some docs for how to do this. v3 is a lot more explicit about how chunks are encoded.
- NR: +1 to adding codecs, or wrapping numcodecs. Also:
    - try wiring the new Zarr-Python to downstream libraries. That would tell us what's missing in terms of the public API.
    - Also, it'd be useful to have a champion for variable chunking. The codec pipeline should know about variable chunking.
- RA: The problem with the ZEP process is that it's hard for ZEP authors to just implement the ZEP.
- JH: We should be able to find byte-sized chunks which folks can work on.
- NR: It'd be great to keep up the momentum after this week. And it'd be great to have a beta by Jan 2024!
- RA: Where is the discussion of the ZEP3 proposal ([here's the PoC implementation PR](https://github.com/zarr-developers/zarr-python/pull/1483))? And the discussion is [here](https://github.com/orgs/zarr-developers/discussions/52).
Looking for a champion on:

- variable chunking
- synchronizers
- h5py compat

## Agenda

- Update from Joe + Davis on refactor progress

## November 22, 2023

### Attendees

- Joe Hamman / @jhamman (JH)
- Charles Stern / @cisaacstern (CS)
- Jack Kelly / @JackKelly (JK)
- Sanket Verma / @MSanKeys963 (SV)
- Davis Bennett (DB)

## Agenda

- Zarr-Python 3.0 design doc: https://hackmd.io/0DVKP6d9QI-VaHc0zvOuxw

### Meeting notes:

- JH: We can start working from store interface
    - kind of leaky abstraction
- JK: Looking to read a million chunks
    - sharding helps with that
    - discussions around batching requests in Zarr-Python
    - requesting a million chunks in a single request
    - is Zarr V3 a good place to pull in these performance bumps? (don't want to delay the existing work)
- DB: To make `get` more efficient, you need to wrap it in something
    - mostly users are getting multiple chunks at a time
- JH: In Dask/Xarray world you map a single chunk of Zarr at a time
    - At Earthmover there are 1-to-1 reads
    - to handle big size chunks we have rechunker
    - sharding codec sits above the store interface
- JH: https://github.com/scalableminds/zarrita/blob/async/zarrita/sharding.py#L309
    - indexing for sharding
    - the sharding codec will need access to the store API whereas the other codecs don't need it
- DB: Like the idea
    - add a new abstraction
    - we have a leaky abstraction and we can use it
- JH: Norman is willing to help but only if Zarr-Python is a first class citizen
- JH: https://docs.xarray.dev/en/stable/roadmap.html
    - publish the roadmap on [Zarr-Python docs](https://zarr.readthedocs.io/) for the community
- JH: Jack can help us with the fast concurrent loading problem
- JH: Meeting with Davis and Norman in 1.5 weeks to work on Zarr-Python 3.0

## November 8, 2023

### Attendees

- Joe Hamman / @jhamman
- Charles Stern /
- Raphael Hagen / @norlandrhagen
- Sanket Verma / @MSanKeys963
- Martin Durant /

## Agenda

- Zarr-Python 3.0 design doc: https://hackmd.io/0DVKP6d9QI-VaHc0zvOuxw
- how would batching
work across arrays
- use pydantic zarr
- other dependencies
    - python 3.9 - drop in dec.?
- which sharding impl

## November 1, 2023

### Attendees

- Joe Hamman / @jhamman
- Davis Bennett / @d-v-b
- Sanket Verma / @msankeys963
- Raphael Hagen / @norlandrhagen
- Charles Stern
- Brian Davis
- Thomas Nicholas

## Agenda

- Request - can someone try to take some notes today?
- Request - can we move this meeting time to 8:30a PT (currently at 9a PT).
- V3 API migration
    - Now that we are starting to work on implementing v3, we're faced with the question of what to do with the existing API
    - Observation: the current v2/v3 polymorphism is unsustainable (and incorrectly prioritizes v2 internally)
    - Proposal
        - we create a v3 namespace within zarr-python where we can develop in an isolated space toward a complete v3-spec implementation
        - Included in this namespace:
            - classes: `zarr.v3.{Array,Group,Store}`
                - These classes implement an internal api that closely aligns with the v3 spec
            - high level functions: `zarr.v3.{create, open, ...}`
                - As much as possible, these functions should look and feel like the v2 equivalents but should not be tied to the exact implementation
                - e.g. `zarr.create(shape=..., dtype=..., compressor=...) -> zarr.create(shape=..., data_type=..., codecs=..., attributes=...)`
                - We may also want to deprecate and/or rename some of the existing top level functions
            - backward compatibility:
                - high-level functions in the v3 namespace should be able to `create` or `open` a v2 dataset
                - The `Group` or `Array` does not need to be backward compatible though.
        - All development toward v3 happens on the `main` branch in zarr-python
    - Alternative proposal
        - We avoid the `v3` namespace and instead take over the primary namespace in a development branch (e.g.
`v3`)
        - When we feel that the `v3` branch is complete, we merge to main and make a `3.0` release
        - Folks have less time to test out the v3 implementation but we have a cleaner development process
- Ideas
    - Idea of zarr array is to look like a numpy array
        - could move all the zarr array details to a polymorphic metadata object
        - trim things down to just the minimal array api interface
    - declarative hierarchy specification
    - type hints

#### Sanket Notes

- DB: The definitions of Zarr and Dask chunks are different and that's not good
- JH: Benefits of generative chunk indexing
    - Impacts with sharding, variable chunking and other shiny features
    - Large array with billions of chunks
- JH: Maintaining both V2 and V3 at the same time is not ideal
- DB: V2 has a lot of stuff that people don't use - stores
- TN: The current public facing APIs (V2 and V3) are conformant to the existing spec - but what we're thinking of working on is a new public facing API which is a wrapper of V2 and V3, and not conformant to V3 exactly - isn't that a bad thing?
    - DB: The public-facing Zarr array object API is not covered by the spec anyway
        - Also can't be, because syntax might be language-dependent
        - Therefore we have full freedom in the public python API of the python zarr array type
    - TN: Okay, in that case it makes sense to follow the python array API standard as much as possible
- TN: Array API has granular functionality which is super useful (e.g.
you can say "we don't support the statistical functions")
- TN: Note that chunking is not part of the array API standard

## October 18, 2023

### Attendees

- Joe Hamman / @jhamman
- Max Jones / @maxrjones
- Davis Bennett / @d-v-b
- Tom Nicholas
- Charles Stern
- Sanket Verma
- Ryan Abernathey

## Agenda

- Proposal: just use Zarrita :)
    - 0.1% done: https://github.com/jhamman/zarr-python/pull/1
    - Ryan added memorystore to Zarrita: https://github.com/scalableminds/zarrita/pull/12

## September 20, 2023

### Attendees

- Joe Hamman / @jhamman
- Charles Stern / @cisaacstern
- Sanket Verma / @MSanKeys963
- Raphael Hagen / @norlandrhagen

## Agenda

- Review ZEP 6 proposal and proposed implementation
    - https://github.com/zarr-developers/zeps/pull/46
    - https://github.com/zarr-developers/zarr-python/pull/1526
- Goal with ZEP6 in Zarr-Python
    - Clean up interface for Group/Array constructors from V2/V3 metadata
    - Use ZOMs internally as part of the migration to V3 spec
    - Use ZOMs in array/group constructors to consolidate initialization reads/writes
        - https://github.com/zarr-developers/zarr-python/issues/538 (repeated writes to set attrs)
        - https://github.com/pangeo-data/pangeo-eosc/issues/39 (many contains/iter calls)
    - Expose ZOMs to third parties

## September 6, 2023

### Attendees

- Joe Hamman / @jhamman
- Ryan Abernathey / @rabernat
- Sanket Verma / @msankeys963
- Raphael Hagen / @norlandrhagen
- Ryan Williams
- Charles Stern / @cisaacstern
- Davis Bennett / @d-v-b

## Agenda

- review scoping section (below)
- performance
- zarr + pydantic (https://github.com/janelia-cellmap/pydantic-zarr)
    - observation: Zarr-python is missing specific data models for Groups / Arrays
    - price of depending on pydantic is probably not worth it

## Scoping V3 update (by @jhamman)

_Written by @jhamman on September 5, 2023_

In the Winter and Spring of 2022, while the V3 spec was still under development, an experimental V3 implementation was added to the Zarr-Python codebase
([#898](https://github.com/zarr-developers/zarr-python/pull/898)). This implementation followed the spec, as it was written at the time. However, in the months following these developments, major changes to the spec were made. This has left Zarr-Python out of sync with the V3 specification.

### Summary of current status

1. V3 support is behind an experimental API (accessed by setting `zarr_version=3` and `ZARR_V3_EXPERIMENTAL_API=1`).
2. A separate code path for V3 stores was implemented in `zarr._storage.v3`.

Major changes to the spec since the experimental implementation include:

- Entrypoint metadata document (`zarr.json`) is no longer required
- Metadata keys were renamed (e.g. `meta/foo/bar.group.json -> /foo/bar/zarr.json`)
- Group and array metadata documents are no longer distinguished by their key name (everything is `zarr.json` and a `node_type` field is included in all documents)
- Various updates to metadata fields:
    - `format_version` → `int`
    - added `dimension_names`
    - removed `chunk_memory_layout` (in favor of transpose codec)
    - `codecs` now includes a list of codecs that was previously split between the `filters` and `compressor` fields
    - etc.

Open questions:

- fallback data types

### Actions

https://github.com/orgs/zarr-developers/projects/5/views/1

## Zarr refactor meeting

_Aug 16, 2023_

### Attendees

- Joe Hamman (Earthmover) - Xarray and Zarr dev
- Sanket Verma (Zarr)
- Tom White (independent dev) - SGKit and Cubed
- Max Jones (CarbonPlan) - Data scientist
- Raphael Hagen (CarbonPlan) - Data eng.
- Charles Stern (Columbia) - Pangeo-forge

### Discussion

- Max: how do we view V3 extensions already in Zarr-python
- Charles: how does Zarr python register plugins
- Zarrita (https://github.com/scalableminds/zarrita/)
    - reference implementation
    - no baggage / tech debt of Zarr-python
    - not production ready
    - also has sharding
- Tom: Interop tests between implementations

### Timeline

Goal: by the end of the year, have a fully-functional implementation of V3 in Zarr Python

- Starting now: survey users to get an understanding of how a breaking change to the V3 implementation will impact users
- Next two weeks: Break [#1290](https://github.com/zarr-developers/zarr-python/issues/1290) into smaller chunks and set up project board
- September: start refactor efforts
- Oct-Dec: Integration and interop testing

### TODOs:

- add regular call to community calendar
- break out V3 implementation tasks into issues / project board (try to identify issues that can be picked up by others)
- publish read out of this call
