---
tags: zarr, ZEPs, Meeting
---
# ZEPs Bi-weekly Meetings
### **Check out the website: https://zarr.dev/zeps/meetings**
Joining instructions: [https://openmicroscopy-org.zoom.us/j/82447735305?pwd=U3VXTnZBSk84T1BRNjZxaXFnZVQvZz09](https://openmicroscopy-org.zoom.us/j/82447735305?pwd=U3VXTnZBSk84T1BRNjZxaXFnZVQvZz09)
Meeting ID: 82447735305
Password: 016623
GitHub repo: https://github.com/zarr-developers/zeps
## 2024-04-18
**Attending:** Josh Moore (JM), Vicent Immler (VI), Sanket Verma (SV), Ward Fisher (WF), Altay Sansal (AS), Jeremy Maitin-Shepard (JMS)
**TL;DR:**
**Updates:**
- Davis wants to remove implicit groups: https://github.com/zarr-developers/zarr-specs/pull/292
- Activity is ongoing on the ZEP4 review PR
**Open agenda (add here 👇🏻):**
- Introductions w/ last gift you got
- Sanket - cologne and clothes
- Vincent - wooden board forged with family crest
- Ward - camping tent
- Josh - pecan nuts
- Altay - lead data scientist - lego
- Removing Implicit groups
- JM: Discussed at community meeting - needs to go back to root node to figure out the group
- JM: Tensorstore doesn't use Zarr groups at all
- WF: Supposition from my side
- WF: Dennis completed the V3 implementation!
- JM: Are we closer to parity in V3 work - a question for Dennis!
- VI: How do implicit groups affect performance?
- JM: No, implicit groups mean a performance improvement
- VI: Working on a new software implementation for students
- JMS: No experience in working with groups
- JM: A lot of callbacks
- JMS: You'd definitely want to remove the upward lookup
- AS: Couldn't see a use case for parallel creation of groups
- JMS: You're ingesting a lot of data into S3 and they read group metadata and have implicit groups
- AS: Would `.zattrs` have a race condition?
- JMS: Kind of a niche use case
- AS: Are multi-processing locks a concern for metadata?
- JMS: Multiple machines can leverage this!
- AS: Removing would be a good idea!
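
For readers new to the implicit-groups discussion, here is a minimal sketch of the idea, not the specification's wording. It assumes a Zarr V3-style layout in which a node is explicit when a `zarr.json` document sits directly under its path; the function name `classify_nodes` and the sample keys are hypothetical.

```python
import posixpath


def classify_nodes(keys):
    """Split hierarchy paths into explicit nodes and implicit groups.

    Assumption: a node is explicit when the store holds `<path>/zarr.json`.
    Any ancestor of an explicit node that lacks its own `zarr.json` is
    treated as an implicit group.
    """
    explicit = {
        posixpath.dirname(key)
        for key in keys
        if posixpath.basename(key) == "zarr.json"
    }
    implicit = set()
    for node in explicit:
        # Walk upward towards the root ("" is the root path).
        while node:
            node = posixpath.dirname(node)
            if node not in explicit:
                implicit.add(node)
    return explicit, implicit


# Hypothetical store listing: the array at a/b/arr was written without
# creating metadata documents for the groups a and a/b.
keys = ["zarr.json", "a/b/arr/zarr.json", "a/b/arr/c/0/0"]
explicit, implicit = classify_nodes(keys)
```

In this sketch `a` and `a/b` come out as implicit groups. Finding them requires inspecting other keys in the store, which is the kind of lookup the discussion above is concerned with; without implicit groups, every group would need its own metadata document.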
- AS: ZEP4 and ZEP3 progress
- SV: AS, are you using V2 or V3?
- AS: Using V2 and would love to move to V3 - have 20-30 PB data
- AS: Want to work on `dimension_names` - what would be the best time to do it?
- SV: After V3 release
- VI: _explains GSoC application_
- AS: Hacked Zarr to submit reads in an async manner to the machine to circumvent the problem
- AS: Zarr V3 is going to be fully async, so it helps alleviate the problem
- VI: Would be good to have a way to improve the read speeds for Zarr
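
As an illustration of the async read pattern mentioned above, here is a minimal sketch using only Python's `asyncio`. It is not Zarr-Python code: `fetch_chunk` is a hypothetical stand-in for one high-latency store read, and the chunk keys and `max_in_flight` limit are made up for the example.

```python
import asyncio


async def fetch_chunk(key: str, latency: float = 0.01) -> bytes:
    """Hypothetical stand-in for a single high-latency store read."""
    await asyncio.sleep(latency)
    return key.encode()


async def read_chunks(keys, max_in_flight: int = 8):
    """Issue reads concurrently, capped by a semaphore, keeping input order."""
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded(key):
        async with limit:
            return await fetch_chunk(key)

    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(bounded(key) for key in keys))


chunks = asyncio.run(read_chunks([f"c/{i}" for i in range(4)]))
```

With the reads overlapped, total wall time is closer to one round trip than to the sum of all of them, which is why a fully async V3 would help with the read-speed problem raised here.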
## 2024-04-04
**Attending:** Sanket Verma (SV), Josh Moore (JM), Ward Fisher (WF)
**TL;DR:**
**Updates:**
- CZI EOSS6 Application not funded
**Open agenda (add here 👇🏻):**
- NASA Grant (WF)
- https://nspires.nasaprs.com/external/solicitations/summary.do?solId=%7b910CC61E-4616-9958-C26F-F8D9BC5AB8D9%7d&path=&method=init
- Townhall meeting slides: https://docs.google.com/presentation/d/14g5UPUQFsk4QW3gqwB4gtNSHm8vcdUVAmwzSFjtdVN4/edit?usp=sharing
- Looking towards sustaining the already established open source software
- NetCDF is looking for collaboration for their application
- JM: Collaborators in US could be NF, OpenCollective, NVIDIA, Columbia etc.
- JM: Will reach out to NF for their NASA grants' experience
## 2024-03-21
**Attending:** Sanket Verma (SV), Thomas Nicholas (TN), Ward Fisher (WF)
**TL;DR:**
**Updates:**
- Join ZulipChat: https://ossci.zulipchat.com/
- HTTP Extension meeting took place on 3/14
- Trying to figure out the best way forward, i.e. a ZEP or not
- Gauging interest and use cases from others in the community
**Open agenda (add here 👇🏻):**
- HTTP Extension
- WF: Can see the shape of it, and I think it would be useful
- SV: Existing thread: https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/HTTP.20Extension
- TN: Tom's company may have a use case for the HTTP work
- Showing [VirtualiZarr](https://github.com/TomNicholas/VirtualiZarr) (related to the "chunk manifest" ZEP)
- TN: Been working on the packages for the last 2 weeks - could potentially replace Kerchunk
- TN: _code walkthrough via screen sharing_
- TN: Storing the virtual Zarr manifests, not the actual array values
- TN: Could move `class ManifestArray` to Zarr-Python - arguments in favour and against it
- TN: Could see donating VirtualiZarr to zarr-developers
- SV: **Action items**
- TN to create a topic for VirtualiZarr to gather feedback/comments
- SV to try VirtualiZarr
- TN and SV to work on ZEP Extension proposal for virtual Zarr manifest and formally present it for broader feedback
- TABLED
- Revising ZEP0
- https://github.com/zarr-developers/zeps/pull/59
## 2024-03-07
**Attending:** Sanket Verma (SV), Ward Fisher (WF), Davis Bennett (DB), Josh Moore (JM), Thomas Nicholas (TN), Jeremy Maitin-Shepard (JMS)
**TL;DR:**
**Updates:**
- Zarr HTTP Extension Meeting next week
- Check here: https://zarr.dev/community-calls
- TN: https://hackmd.io/t9Myqt0HR7O0nq6wiHWCDA?view
- Had a conversation with folks over at Development Seed, NASA, Earthmover
**Open agenda (add here 👇🏻):**
- TN: Jed wants to have nice ways to browse the Zarr stores - they have nice ways to browse `.tiff` files already
- Wants to propose an extension to add more information in the metadata
- The end result would look more like an Xarray HTML wrapper
- DB: Pushing Kerchunk functionality into Zarr stores
- DB: Whether the feature could be file format agnostic?
- TN: Argues that it should be a ZEP - and can be read by every Zarr implementation
- JM: Having same thing implemented in FSSPEC
- DB: Would ZEP
- WF: HDF5 group may be open to a conversation
- SV: https://zarr.dev/zeps/meetings/2023/2023-08-10.html might have some useful information
- TN: _recaps the conversation for JMS_
- TN: Should concatenation be a part of the current ZEP?
- DB: Any reason you don't want to concatenate HDF5 and other file formats?
- TN: Chunk manifest would point inside the arrays - chunk manifest could let you create a Zarr store over other formats as well
- DB: This would make Zarr an API/access pattern
- TN: Can be created and tested fairly separately from Zarr - personally think chunk manifest is a neat feature - implementations can support/not support it
- DB: Array mutation can break the concatenation - having guidelines for archival arrays would help
- TN: Currently we're thinking about read-only case
- TN: Virtualisation in Kerchunk is a spotlight feature
- JMS: Manifest is a good idea and keeping it separate would be a minor difference - needs to align with Kerchunk
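
To make the chunk manifest idea concrete, here is a minimal sketch of a manifest that points into an existing non-Zarr file, so a chunk is read by byte range instead of being copied. The entry fields (`path`, `offset`, `length`), the chunk keys, and the helper `read_chunk` are assumptions for illustration only; the format was still under discussion at this meeting.

```python
import json
import os
import tempfile


def read_chunk(manifest: dict, chunk_key: str) -> bytes:
    """Resolve a chunk key to a byte range in another file and read it."""
    entry = manifest[chunk_key]
    with open(entry["path"], "rb") as f:
        f.seek(entry["offset"])
        return f.read(entry["length"])


with tempfile.TemporaryDirectory() as tmp:
    # A stand-in for a legacy file: a 6-byte header followed by two chunks.
    legacy = os.path.join(tmp, "legacy.bin")
    with open(legacy, "wb") as f:
        f.write(b"HEADERaaaabbbb")

    # Round-trip through JSON to show the manifest is plain metadata.
    manifest = json.loads(json.dumps({
        "0.0": {"path": legacy, "offset": 6, "length": 4},
        "0.1": {"path": legacy, "offset": 10, "length": 4},
    }))
    first = read_chunk(manifest, "0.0")
    second = read_chunk(manifest, "0.1")
```

The sketch covers the read-only case only. As noted above, mutating the underlying file would invalidate the recorded offsets.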
- JM: report/ZEP idea (time permitting)
- https://w3id.org/ro/crate/
- JM: Putting RO-Crate inside Zarr - or making the Zarr specification an IETF standard
- JM: Would probably go ahead and write a convention in NGFF space
- https://fairdo.org/
- JM: Have a mechanism for going up/down the hierarchy - useful for the HTTP extension discussions
- Revising ZEP0
- https://github.com/zarr-developers/zeps/pull/59 - comments/feedback welcome
- DB: :+1:
- DB: Would be easy to have a single PR for my ZEP
- JMS: Putting narrative document in PR description
- JM: Weird for commenting on the PR description and for the public visibility
- JMS: Rationale can be put down as a footnote
- JMS: Having numeric numbering is something Python follows
- JMS: The actual specification change can also serve as a ZEP narrative
- SV: We can pick certain sections out of the ZEP narrative document
- JMS: Having a PR template similar to ZEP's narrative could also help us
- WF: https://sea.ucar.edu/conference/2024
- In-person and virtual registrations are available
## 2024-02-22
**Attending:** Sanket Verma (SV), Ward Fisher (WF), Josh Moore (JM), Martin Durant (MD), Tom Nicholas (TN)
**TL;DR:**
**Updates:**
- HTTP Extension meeting: https://docs.google.com/document/d/14TJfrjbfU1R2REjrZ35GjV74MJ18j_m6geWdj0oB83Y/edit?usp=sharing
**Open agenda (add here 👇🏻):**
- LLMs and how WF is using them in trainings
- Feedback for new design for Zarr-Specs website (combines ZEP and Zarr-Specs together)
- Link: https://docs-test-sanket.readthedocs.io/en/latest/
- MD: How's V3 refactor work going on these days?
- JM: Quite good progress taking place these days
- SV: V3 PRs can be found here - https://github.com/zarr-developers/zarr-python/pulls?q=is%3Apr+is%3Aopen+label%3AV3
- MD: https://zarr.dev/zeps/draft/ZEP0003.html
- TN: Been discussing → https://github.com/zarr-developers/zarr-specs/issues/287
- interested in integrating kerchunk into zarr, especially two ZEPs
- (1) chunk manifest (Joe) - standardizing what chunk json files do
- (2) concatenation - https://github.com/zarr-developers/zarr-specs/issues/288
- 1. manifest: opinion that it's an incredible idea that is very popular
- fsspec relationship makes things complicated
- move to the zarr spec for other implementations?
- goal is readable in any language
- difficult position
- three things to think about
- read byte ranges
- write JSON
- combine module
- roadmap:
- standardize json for the chunks. manifest file?
- JM: storage in zarr array itself
- JM: log file anytime you read a full file into memory
- Josh: virtual zarr (access pattern)
- 2. concatenation
- multi-zarr-to-zarr leads to a loop
- more sense to think of concat of virtualized arrays objects
- see kerchunk array notebook
- read in byte ranges with kerchunk. array class which only stores byte-offset arrays in memory
- can be done in xarray. concat-classes can be put into xarray and can use higher-order API
- JM: store that xarray as a zarr :smile: (but need additional metadata for realizing the array)
- TN: part of notebook that isn't done. exactly.
- common case in geo. multiple NC files, concat those arrays.
- possibly compression options change over time.
- prevents it from being one zarr array
- JM: or just always serialize to the chunk manifest
- JM: i.e. where do we stop? (when does Zarr become Turing Complete?)
- TN: thought of concat (clear use case), but Jeremy thought of indexing (also clear use case)
- JM: starting to sound like transforms (https://github.com/ome/ngff/pull/138#issuecomment-1948424000)
- WF: periodically get requests for operations on the data
- no one has come close to making the argument for adding that into the storage
- so many math libraries that would do it better
- TN: no computation since you don't need the values. can do some subset of concat & indexing without values.
- TN: have now become a zarr producer :tada:
- JM: cross-language motivation
- SV: pyramiding ZEP discussions
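
The point that some concatenation can be done without reading any array values can be sketched as follows. This is an illustration under stated assumptions, not Kerchunk or VirtualiZarr code: the manifest layout (`path`, `offset`, `length`), the dot-separated chunk keys, the file names, and the helper `concat_manifests` are all hypothetical. It also assumes both arrays share chunk shape, dtype, and codecs, and that the first array's length along axis 0 is a whole number of chunks.

```python
def concat_manifests(first: dict, second: dict, first_chunks_along_axis0: int) -> dict:
    """Concatenate two chunk manifests along axis 0 by re-keying the second.

    Only chunk keys are rewritten; no chunk bytes are read or copied.
    """
    combined = dict(first)
    for key, entry in second.items():
        index = [int(part) for part in key.split(".")]
        index[0] += first_chunks_along_axis0  # shift along the concat axis
        combined[".".join(str(i) for i in index)] = entry
    return combined


# Hypothetical manifests for two monthly files, one chunk each.
jan = {"0.0": {"path": "jan.nc", "offset": 100, "length": 40}}
feb = {"0.0": {"path": "feb.nc", "offset": 100, "length": 40}}
year = concat_manifests(jan, feb, first_chunks_along_axis0=1)
```

If compression options differ between the files, as raised above, this re-keying is no longer enough, because a single Zarr array has one codec configuration.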
## 2024-02-08
**Attending:** Sanket Verma (SV), Josh Moore (JM), Jeremy Maitin-Shepard (JMS)
**TL;DR:**
Discussion revolves around refining storage transformer interfaces within ZEP1, exploring options for unified JSON representations, and considering the integration of Parquet with Zarr, alongside ZEP0 discussions focusing on streamlining processes while ensuring compatibility with existing bylaws.
**Open agenda (add here 👇🏻):**
- https://github.com/zarr-developers/zarr-specs/issues/287
- JM: The state we left ZEP1 and storage transformer, where does this fit in?
- JMS: Wrap the key-value interface in the existing implementation
- JMS: Kerchunk approach has 1 `.JSON` file and the proposed approach has 2 `.JSON` files
- JMS: Specify any array in-line?
- JM: May look like specifying kerchunk in Zarr which we may or may not want to do
- JMS: Kerchunk approach has keys and values - not exactly readable
- JMS: Various flavours of `.JSON`s can we somehow unify them? - Does it help to have a representation for inline arrays?
- JM: Will comment on Joe's issue
- JMS: Would be good to get Martin's POV
- JMS: Kerchunk parquet format is worth looking at
- JM: Parquet folks are looking to combine parquet and Zarr - could look at the tabular data as 2D array
- JMS: Do you need to download the whole parquet to access it?
- JM: I think the offset works in parquet
- JM: https://spatialdata.scverse.org/en/latest/
- JMS: https://github.com/google/neuroglancer/blob/master/src/datasource/precomputed/annotations.md - created annotations in Tensorstore - spatial query has multi-index grid - sort of like a sparse array
- JMS: general missing feature of a cloud database (Josh: cf. work on a graph/zarr version in Spain)
- JM: Will try to get together SpatialData and JMS for discussions to prevent duplicative efforts
- JM: Having URLs as indices and if not generate them on the fly and if you have write access then write to it
- JMS: Annotations don't end up being too large
- JM: Duckdb is worth looking at - https://duckdb.org/
- JMS: Cloud databases need regular maintenance
- JM: https://datamonkeysite.com/2022/08/27/running-a-serverless-duckdb-on-google-cloud/
- JM: Building index on the cloud or locally?
- ZEP0 discussions - https://github.com/zarr-developers/zeps/issues/55
- JM: Let's open a PR and go ahead!?
- JMS: Yes!
- JM: In favour of a lightweight process, but if we reach a point where we have contention then we should go back to the bylaws
- JMS: If the future ZEPs overlap then there would be a problem
- JM: Footnote is useful for future records