---
tags: zarr, ZEPs, Meeting
---
# ZEPs Bi-weekly Meetings
### **Check out the website: https://zarr.dev/zeps/meetings**
Joining instructions: [https://openmicroscopy-org.zoom.us/j/82447735305?pwd=U3VXTnZBSk84T1BRNjZxaXFnZVQvZz09](https://openmicroscopy-org.zoom.us/j/82447735305?pwd=U3VXTnZBSk84T1BRNjZxaXFnZVQvZz09)
Meeting ID: 82447735305
Password: 016623
GitHub repo: https://github.com/zarr-developers/zeps
## 2024-10-17
**Attending:** Dennis Heimbigner (DH), Sanket Verma (SV), Michael Sumner (MS), Jeremy Maitin-Shepard (JMS)
**TL;DR:**
**Updates:**
- Latest updates on ZEP0 Revision PR - https://github.com/zarr-developers/zeps/pull/59
- Proposal for how to add extensions to Zarr - https://github.com/zarr-developers/zarr-specs/issues/316
**Open agenda (add here 👇🏻):**
- ZEP Process and how to do extensions
- SV: Bioconductor intros w/ Michael
- MS: Interest in the Rust and R implementations - RSE @ Australian Antarctic Division
- SV: Impromptu intros with everyone
- MS: Interested in using VirtualiZarr to convert 20 TB of data to Zarr
- SV:
- JMS: Opening a PR against the zarr-specs is the way to go
- SV: Difference b/w community extensions and core extensions
- JMS: Need to get interest in implementing the proposals - we need to drive it
- DH: At NetCDF we need to find resources to complete them: finishing V3 support and becoming spec compliant
- DH: What should be the next focus for Zarr extensions?
- JMS: Additional codecs, working on implementing irregular chunking
- DH: How about variable length strings?
- SV: Added https://github.com/zarr-developers/zarr-python/pull/2036
- DH: Was thinking that it would be added to the spec first!
- DH: Would not like to put a lot of weight on the codecs - codecs should be for experimenting
- DH, JMS and SV: _discussion on semver_
- JMS and DH: How does an implementation detect an extension in the spec? And what does it do with it?
- DH: Removing the implicit groups should have been a version increment for the spec
## 2024-10-03
**Attending:** Sanket Verma (SV), Davis Bennett (DB), Josh Moore (JM)
**TL;DR:**
**Updates:**
- Added Tom and Deepak to Zarr-Python core devs: https://x.com/zarr_dev/status/1838965230438625684
- We had a documentation sprint for Zarr-Python V3
- Zarr-Python V3 team making good progress: alpha release every week, V3 main release soon!
**Open agenda (add here 👇🏻):**
- ZEP Process Revision
- https://github.com/zarr-developers/zeps/pull/59
- Not all PRs should be ZEPs, e.g., typo fixes and similar small changes
- Major changes should be ZEPs; patch-level changes can simply be PRs
- What does adding a codec count as?
- DB: Core codecs and community codecs
- JM: Hosting codecs and their schemas - how are we handling codec's breaking change?
- DB: We can add in the disclaimer that we're not handling the breaking changes
- What exactly are Lean ZEPs?
- JM: ZEPs are decision documents for the community
- JM: Next steps:
- Get rid of the lean ZEPs
- Merge the #59 PR
- Combine the websites (ZEPs + Zarr-Specs)
- Governance changes happen in parallel
- Reflecting the changes back to ZEP0
- https://github.com/zarr-developers/governance/pull/44
- PRs in `zarr-specs` repo:
- https://github.com/zarr-developers/zarr-specs/pull/313
- https://github.com/zarr-developers/zarr-specs/pull/304
- https://github.com/zarr-developers/zarr-specs/pull/308
## 2024-09-19
**Attending:** Sanket Verma (SV) and Dennis Heimbigner (DH)
**TL;DR:**
**Updates:**
- Zarr-Python V3 doc sprint: 30 Sep. and 1 Oct.
- Identify missing docs and start creating issues
- Link existing issues - https://github.com/zarr-developers/zarr-python/issues?q=is%3Aopen+is%3Aissue+label%3AV3+doc
- Async working via Zoom meetings
- MATLAB (Mathworks) interested in Zarr - want to add support - looking for a C++ implementation
- Worked on a zine comic a couple of weeks ago - https://github.com/numfocus/project-summit-2024-unconference/issues/27
**Open agenda (add here 👇🏻):**
- SV: Status update of Zarr from the last few weeks to DH
- DH: Consolidated metadata discussions
- https://github.com/zarr-developers/zarr-specs/pull/309
- https://github.com/zarr-developers/zarr-python/pull/2113
- DH: Will look into it and reply to it
- DH: If we can get consolidated metadata working in V3 it'd be easy to port it into V2
- DH: Survey for V2 to V3 data conversion and get an idea/response from the community
- NOAA has a lot of Zarr V2 data in S3
- DH: Can get me in touch with NOAA guys for outreach
- DH: NetCDF has Zarr V3 support now but it's not published
- Will be out soon
- DH: Working with Ward regularly
- Discussions on grants, funding, job opportunities and retirement! ;)
- **TABLED**
- PRs in `zeps` repo:
- https://github.com/zarr-developers/zeps/pull/59
- PRs in `zarr-specs` repo:
- https://github.com/zarr-developers/zarr-specs/pull/313
- https://github.com/zarr-developers/zarr-specs/pull/304
- https://github.com/zarr-developers/zarr-specs/pull/308
## 2024-09-05
**Meeting Notes:** TBA
## 2024-08-22
**Attending:** Josh Moore (JM), Sanket Verma (SV), Ward Fisher (WF)
**TL;DR:**
**Updates:**
- Zarr V3 Survey: https://github.com/zarr-developers/zarr-python/discussions/2093
- Zarr-Python developers meeting update: removed the 15:00 PT meeting and changed the 7:00 PT meeting from bi-weekly to weekly - https://zarr.dev/community-calls/
- Welcome David Stansby as core dev of Zarr-Python! 🎉
- https://github.com/zarr-developers/zarr-python/pull/2071
- WF: Preparing next NetCDF release on NCZarr
- Cleaning up bugs, working on documentation
- Command line syntax has grown over the last 3 decades
- Put up an RC last week
- Submitted an abstract to AGU on Zarr and NCZarr - Poster and a 12-min. talk (including 3 min. of Q&A)
- JM: Getting back from the backwoods of the Czech Republic
- Doing a 5 month challenge to get 1 PB data in Zarr V3
- Facing bugs when trying to connect two pieces of software together
- The feeling - maybe we didn't get the codecs right in V3
- WF: Would need to get back to spec after the releases
- WF: Need to have a room to evolve the Zarr V3 spec
- JM: Zarr-Python is the reference implementation and maybe we should write it down
**Open agenda (add here 👇🏻):**
- #44 - https://github.com/zarr-developers/governance/pull/44
- WF: ZIC moves slowly and refinements could be made to the process
- WF: Analogous to leaving the spec in the hands of core devs - they should look at the bigger picture beyond what's in front of them
- JM: Not in agreement with the PR
- WF: Like being involved and want to keep coming back to the meetings - know a lot about maintaining specs - I think I can contribute to the perspective
- JM: Having constructive feedback for Ryan would be great
- WF: Having a procedural charter would help the committee
- JM: Having difficulty getting the ZIC together, so choosing a chair would be challenging
- WF: Was under the impression that if I don't participate I will be kicked out of the ZIC
- PR that needs attention in zarr-specs repo:
- #304 - https://github.com/zarr-developers/zarr-specs/pull/304
- #308 - https://github.com/zarr-developers/zarr-specs/pull/308
## 2024-08-08
**Attending:** Sanket Verma (SV) and Davis Bennett (DB)
**TL;DR:**
**Updates:**
**Open agenda (add here 👇🏻):**
- SV: https://github.com/zarr-developers/zeps/pull/59
- DB: Summary of the conversation so far
- DB: Get this out of the door and go ahead with the implicit groups PR
- PR that needs attention in zarr-specs repo:
- #304 - https://github.com/zarr-developers/zarr-specs/pull/304
## 2024-07-25
**Attending:** Davis Bennett (DB), Sanket Verma (SV), Josh Moore (JM)
**TL;DR:**
**Updates:**
- Isaac won't be able to make progress on [ZEP7 - Strings](https://github.com/zarr-developers/zeps/pull/47) - need to find a new champion
**Open agenda (add here 👇🏻):**
- Talks about purchasing various types of saw 🪚
- Finding a new co-champion for ZEP3?
- JM: Ryan has been active on ZEP7, so maybe he can lead it
- JM: Everything is a codec!
- DB: But need to get strings
- DB: Ryan's idea seems sensible - https://github.com/zarr-developers/zarr-python/pull/2031
- ZEP5
- SV: Hailiang plans to revisit ZEP5 soon
- DB: Whether it needs to be a ZEP or Zarr+something?
- JM: Maybe it could be something added in the VirtualiZarr layer with concatenation
- DB: Aggregation along chunks could become messy because you're adding stuff which is proportional to array size
- JM: Does someone else want to use it apart from NASA?
- JM: If `geo` folks want to add a prefix, then that would help the Zarr community
- DB: If the ZEP doesn't change the spec, then why should we have it?
- JM: It would serve more as a recommendation for the general Zarr community
- ZEP4
- DB: The ZEP has some flaws - you should have tools that should define the metadata not the other way around
- JM: [JSON-LD](https://json-ld.org/) could be useful
- _More discussions about the JSON discoverability, manipulation and how to use it efficiently in the Zarr ecosystem_
- PR that needs attention in zarr-specs repo:
- #295
- Merged! Thanks, Josh!
- #301
- JM: Can you find these errors using JSON schemas?
- DB: Probably!
- Merged! Thanks!
- #302
- Merged! Thanks!
- #292
- Get the ZEP0 PR in before this gets merged!
## 2024-07-11
**Attending:** Josh Moore (JM), Davis Bennett (DB), Ward Fisher (WF), Jeremy Maitin-Shepard (JMS)
**Notes:**
- WF: An AI/ML expert is writing a blog post on Zarr based on conversations with the HPC crowd from NCAR.
- Hopefully it won't be taken negatively.
- Zarr banned on the NCAR backend due to the number of files.
- Footprint is 20% smaller, but the number of files chokes the system.
- **"The right tool to the right job"**
- DB: drove the dimension separator conversation. And in general, too many objects is bad, even in the cloud. Pay a price either in time or money. --> sharded.
- JM: https://github.com/CBI-PITT/zarr_stores etc. for serving Zarrs.
- WF: reversible conversion...
- DB: happy to read and benchmark. could make a positive spin i.e. what are the bounds to zarr usage.
- WF: was more "here's what we observed" but will talk to him about sharding
- Software Engineering Assembly (SEA) - opportunity to invite someone to speak
- Davis: (misc) long standing issue of wanting to fix how the spec talks about codecs.
- UTF-8 (JM): needs a champion
- see:
- https://github.com/zarr-developers/zeps/pull/47
- https://github.com/zarr-developers/zarr-specs/issues/83#issuecomment-2220586750
- JMS: Isaac's proposal stalled, but nothing terribly tricky. Just need to decide on the structure.
- DB: Ryan suggested copying arrow. JMS: arrow is part of the binary format, but you still have to pack it into a single binary file. DB: Parquet?
- JMS: would be a very sophisticated header format. Might be less than ideal though. Parsers would need to pull in FlatBuffers. Could be more like sharding in that there's a simple footer.
- DB: similar to sharding but just pushed down to original elements.
- JMS: could use *exactly* the sharding format, except it has offsets and lengths (for random access). Isaac suggested just storing lengths. Assumes the order and can't do random access.
- JM: Isaac moved to CZI. May need another champion. Pinged on ZEP7
- JMS: needs to be defined as a codec. The bytes codec doesn't do this. DB: why is it weird for the bytes codec to do this? JMS: this codec has various parameters. DB: rename bytes to ArrayBytes. You could think of it as a generic array of stuff to fixed length, and it could do something different for variable length. JMS: it was originally called "endian". For strings, there are various ways to encode them, so it's clearer to have it as a codec with a particular name. zarr-python can pick this as the default codec if the data type is string, i.e., users don't have to worry too much about it. DB: will need some validation of various combinations, but that's fine.
- JMS: some appeal to using Parquet but seems unfortunate to have such a heavyweight header.
- ... _side discussion on writing our own parser_ ...
- DB: prefer having a variable length type rather than ...
- in image segmentation, one representation is to keep a census of labels in a region and how often they appear. (personal agenda)
- JMS: can encode anything in a string.
- JM: doesn't need to be a string, though. couldn't the codec take non strings? but then what's the in-memory representation.
- DB: numpy can do anything. object type.
- JMS: if you add varlen, then can they be nested? at least, no partial indexing like awkward arrays do. i.e., if it's an array dimension then you can index. if it's a data type, you have to read the whole thing.
- DB: for genomics, they want the whole pattern. (should ask on issue)
- JMS: gene sequence? Don't know.
- DB: can see that codec gives us degrees of freedom for future encodings.
- JM: agreed, but a little worried that everything becomes a codec. are we missing an abstraction?
- JMS: alternatively, every codec accommodates varlen values.
- DB: index+data seems pretty generic. is there a representation of that so that it vanishes for constant-size types? so negligible that it's no burden to deal with it. (**breaking**)
- JM: thinking about documentation, we should be able to teach people that the containing JSON of sharding and varlen codecs "adds" the index header/footer.
- JMS: keeping multiple arrays in the same key, useful e.g. if they are being accessed together. similar to the index array and the data array. (or v2 interleaved values). More like the Parquet situation of multiple buffers in one file. Do want to support JPEG-like codecs. Existing format, no index. Ok to say some formats accommodate variable length data.
- JMS: String data type or UTF8 ... or generic way of describing varlen.
- DB: in the metadata, "string" has to appear somewhere.
- JM: is this a metadata-aliasing use case? i.e., `string` == `varlen<uint8, text>` (for example), but we should
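The two variable-length layouts discussed above (Isaac's lengths-only header versus JMS's offsets-and-lengths index) can be sketched as follows. This is a hypothetical illustration only, not the ZEP7 design or any existing Zarr codec, and all function names are made up. It packs UTF-8 strings into one buffer and shows why a lengths-only header needs a prefix sum before it can locate element `i`, whereas an (offset, length) index, similar in spirit to the sharding index, allows direct random access.

```python
from itertools import accumulate


def encode_lengths_only(strings):
    """Hypothetical layout: per-element byte lengths plus the concatenated UTF-8 data."""
    encoded = [s.encode("utf-8") for s in strings]
    lengths = [len(b) for b in encoded]
    return lengths, b"".join(encoded)


def encode_offsets_lengths(strings):
    """Hypothetical layout: an (offset, length) pair per element plus the same data buffer."""
    lengths, data = encode_lengths_only(strings)
    # Offsets are the exclusive prefix sums of the lengths.
    offsets = [0, *accumulate(lengths)][:-1]
    return list(zip(offsets, lengths)), data


def get_lengths_only(lengths, data, i):
    # Lengths only: the start of element i is the sum of all earlier lengths (O(i) work).
    start = sum(lengths[:i])
    return data[start:start + lengths[i]].decode("utf-8")


def get_offsets_lengths(index, data, i):
    # Explicit (offset, length) index: any element is a single slice (O(1) lookup).
    offset, length = index[i]
    return data[offset:offset + length].decode("utf-8")


if __name__ == "__main__":
    strings = ["zarr", "", "héllo", "v3"]
    index, data = encode_offsets_lengths(strings)
    print(get_offsets_lengths(index, data, 2))  # prints: héllo
```

The trade-off is header size versus random access: the lengths-only header is half as large but must be scanned. For constant-size types the offset of element `i` is simply `i * itemsize`, so such an index carries no extra information, which is one way to read DB's question about a representation that "vanishes" for constant-size types.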
## 2024-06-27
**Attending:** Davis Bennett (DB) and Sanket Verma (SV)
**TL;DR:**
**Updates:**
- Zarr-Python 3.0.0a0 out
- https://pypi.org/project/zarr/3.0.0a0/
- Good momentum and lots of things happening with ZP-V3 - aiming for mid July release
- SV represented Zarr at CZI Open Science 2024 meeting - various groups looking forward to V3 - https://x.com/MSanKeys963/status/1801073720288522466
- R users at Bioconductor looking to develop bindings for ZP-V3
- New blog post: https://zarr.dev/blog/nasa-power-and-zarr/
- ARCO-ERA5 got updated this week - ~6PB of Zarr data available - check: https://x.com/shoyer/status/1805732055394959819
- https://dynamical.org/ - making weather data easy and accessible to work with
- Check: https://dynamical.org/about/
- Video tutorial: https://youtu.be/uR6-UVO_3k8?si=cp0jOxrtKL_I6LfV
**Open agenda (add here 👇🏻):**
- SV: Would like to invite Norman for one of the showcase/lightning talks
- DB: Having Tensorstore as backend for Zarr array writers would be good for performance
- SV: How about Rust?
- DB: Similar to C++ (Tensorstore)
- DB: Slicing returns NumPy arrays - we should have lazy slicing API
- DB: Would be good to keep the momentum after V3
- SV: Anything we can do to keep them engaged?
- DB: Not as of now!
- https://github.com/zarr-developers/zeps/pull/59 - would like to go ahead with this
- DB: Implicit groups is a big change - maybe we need a major version bump
- SV: If there's unanimous agreement on a change, then it could be submitted as a PR / Lean ZEP
- DB: Sounds good!
- Move ZEP1 and ZEP2 to `Final`?