---
tags: zarr, Meeting
---
# Zarr Bi-weekly Community Calls
### **Check out the website for previous meeting notes and other information: https://zarr.dev/community-calls/**
Joining instructions: [https://zoom.us/j/300670033 (password: 558943)](https://zoom.us/j/300670033?pwd=OFhjV0FHQmhHK2FYbGFRVnBPMVNJdz09#success)
GitHub repo: https://github.com/zarr-developers/community-calls
Previous notes: https://j.mp/zarr-community-1
## 2024-04-17
**Attending:** Josh Moore (JM), Davis Bennett (DB), Liam Dennis (LD), Eric Perlman (EP), Altay Sansal (AS)
* Happy Birthday, Sanket
* Introductions
- Josh: beverage of choice: whisky --> gin.
- Liam: finance --> energy. forecasting things like weather. slicing regularly gridded data. NZ so kiwi fruit juice
- Eric: neuroscience/freelance. dirty chai latte.
- Davis: big image datasets/freelance.
- Altay: TGS lead data scientist. energy companies. wind, solar, gas. core is seismic. manage petabytes of data in zarr. Google cloud. (last 3 years). big fan. things slowing down. looking to help. coffee and old fashioneds.
* AS: ZEPs that are open. How to move them forward?
- JM: help on getting
- DB: Zarr Object Model is there. Don't think much about ZEPs.
- AS: discussion on metadata conventions?
- DB: not sure what's there.
- DB: to conventions, doesn't go far enough. want to validate the hierarchy as well.
- DB: on the ZOM, will wait until run into a need
- JM: probably when you cross language barriers
- DB: tried a typescript implementation. That's two. will revisit though as needed.
- AS: model is to validate the structure. We have a parallel effort. MDIO. https://mdio-python.readthedocs.io/en/stable/
- in the current stable branch (working on a v1 release) have json-schema to create zarrs (in the energy domain)
- also building a C++ API for this using tensorstore
- Another can of worms. lacking some tensorstore features like groups.
- DB: possible answer "zarr group is just a prefix with some JSON"
- AS: which zarr should I use. zarr-python slowing down.
- DB: yes, see the issues which are labeled "V3"
- JM: see https://github.com/zarr-developers/zarr-python/issues/1777
- EP: use different implementations depending on what I'm doing
- AS: talked to Joe.
- JM: scipy! can DB propose one?
- DB: logic to support v2 and v3 groups. runtime dispatch on what you read from the group. (not for arrays. get an exception if you hit a v2 array)
- perhaps 1774 (logging), 17773 (typedict/easy), ...
- testing! about to merge some work on group tests.
- in v2 the test suite, new one will be tighter. e.g., a template that you can build on.
- DB: but there are still APIs that aren't figured out. v2 had APIs that were intuitive coming from h5py, but weren't good for performance.
- DB: still need to answer the question "what is a zarr array?"
- AS: there's one in tensorstore, even if not perfect. (DB: happy to take issues like that)
- DB: tests for the array API?
- DB: (sidenote) slice a zarr and get a zarr (or future) without depending on dask.
- https://github.com/zarr-developers/zarr-python/discussions/1603
- DB: (sidenote) spec to define nouns not verbs...
- LD: re: slicing any advice/challenges
- efficient data structure for getting time-series slices and geographic slices (region over time)
- compute versus space
- AS: weather forecast models. pre-calculate summary statistics possibly. partial reads help, but still a bit clunky. (similar to sharding which also helps though the zarr-python implementation is slow. tensorstore is fast.)
- was using 128**3 cube. went to 32**3 in shards. and got 4x speed up. (slower in zarr-python)
- LD: colleague is pushing tiledb. lazy might help.
- AS: tileDB too expensive
- miscellaneous compression thoughts
- https://www.blosc.org/pages/btune/
- `zarr optimize --force` CLI? (Anyone?)
- data engineering malpractice! (JPEG-like)
- lossy compression
- in-place codec conversion
- oopsies. 1B calls for "does file exist"
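
DB's two remarks above, that a "zarr group is just a prefix with some JSON" and that the v3 work dispatches at runtime on what it reads from the group, can be illustrated with a minimal sketch. It assumes a plain local directory as the store. The function names `write_group` and `detect_group_format` are hypothetical and are not zarr-python's API. Only the metadata document names (`.zgroup` for v2, `zarr.json` for v3) come from the Zarr specifications.

```python
import json
import tempfile
from pathlib import Path


def write_group(prefix: Path, zarr_format: int) -> None:
    """Create a group by writing its metadata document under `prefix`."""
    prefix.mkdir(parents=True, exist_ok=True)
    if zarr_format == 3:
        # v3 keeps node metadata in a single zarr.json document.
        doc = {"zarr_format": 3, "node_type": "group", "attributes": {}}
        (prefix / "zarr.json").write_text(json.dumps(doc))
    elif zarr_format == 2:
        # v2 marks a group with a .zgroup document.
        (prefix / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    else:
        raise ValueError(f"unsupported zarr_format: {zarr_format}")


def detect_group_format(prefix: Path) -> int:
    """Return 2 or 3 depending on which group metadata document is found."""
    v3_doc = prefix / "zarr.json"
    if v3_doc.is_file():
        meta = json.loads(v3_doc.read_text())
        if meta.get("node_type") != "group":
            # Hypothetical policy for this sketch: only groups are handled here.
            raise ValueError(f"{prefix} is a v3 node but not a group")
        return 3
    if (prefix / ".zgroup").is_file():
        return 2
    raise FileNotFoundError(f"no group metadata under {prefix}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_group(root / "new", 3)
    write_group(root / "old", 2)
    for name in ("new", "old"):
        print(f"{name}: zarr_format {detect_group_format(root / name)}")
```

A caller would branch on the returned format to pick a v2 or v3 reader; on an object store the same check becomes two key lookups under the prefix.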
## 2024-04-03
**Attending:** Josh Moore (JM), Sanket Verma (SV), Thomas Nicholas (TN), Alfonso Ladino Rincon (ALR), Davis Bennett (DB), Florian Ziemen (FZ), Norman Rzepka (NR), Eric Perlman (EP), Gabor Kovacs (GB)
**TL;DR:**
**Updates:**
- CZI EOSS6 Application Update
- Not funded
- Looking for other grant opportunities if anyone has ideas
- Zarr-Python core-devs assemble!
- Please respond to the poll: https://whenisgood.net/tswj9kd (meeting timeline: 4/15-4/30)
- Looking to get the people who *don't* attend the meeting :smile:
**Open agenda (add here 👇🏻):**
- Introductions w/ where did you grow up
- Sanket - Delhi, India
- Tom - Small village in English countryside
- Alfonso - Bogotá, Colombia
- Davis - Chapel Hill, North Carolina
- Eric - Berkeley, California
- Josh - Fairhope, Alabama
- Florian - Places in Germany and US
- Norman - mid-sized city in Germany
- Gabor - Chaperone, Hungary
- VirtualiZarr
- (Tom: Sanket told me to talk about it in this meeting)
- [repo](https://github.com/TomNicholas/VirtualiZarr)
- [Zulip thread](https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/VirtualiZarr.20and.20chunk.20manifests)
- TN: _explains VirtualiZarr_
- DB: Pydantic model for Zarr?
- TN: Yes, because it loads in-memory objects
- DB: Made Pydantic model for Zarr: https://github.com/janelia-cellmap/pydantic-zarr - currently under Janelia but can ask to move it under zarr-developers
- DB: Several classes can be decorated in ZP-V3
- DB: Would be super interested in VirtualiZarr - have already uploaded legacy data to cloud
- TN: Would be good to go in a direction to have multiple readers for various file formats
- FZ: One index file has all the metadata in Kerchunk - is there something in your package?
- TN: Instead of storing chunks on disk you're storing `.JSON`s in memory - which would be language agnostic
- FZ: Have started using parquet instead of `.JSON` as JSON doesn't scale up - TN: We can store in parquet as well
- JM: Could write the filepaths/byte ranges into their own specialized zarr arrays - that would be scalable
- TN: To deal with the scaling -
- JM: Could place in `must_understand` flag while writing ZEP which looks up for Zarr
- DB: Is concatenation always chunk aligned? - TN: Treating it as chunk aligned for now
- DB: _shares screen_ and shows `test_array.py` from VirtualiZarr
- TN: Need to think a bit more about the implication of changing the concatenation style
- DB: Will create an issue in VirtualiZarr for slicing issue
- TN: Has there been any resolution? NR: Not yet, to not break the existing API
- TN: Lazy indexing problem - see xarray for example https://github.com/pydata/xarray/issues/5081
- EP: Will definitely take a look!
- NR: Steps to standardise it?
- TN: Waiting for Zarr-Python refactor to complete and then chunk manifest ZEP accepted
- Monthly meetings to close issues and merge PRs
- https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/.5BPROPOSAL.5D.20Monthly.20meetings.20to.20close.20PRs.20.26.20issues
- :thumbsup: from Davis, Joe and Josh; thoughts?
- EP: Napari did this and there were some good outcomes - PR and issues were merged and closed respectively but also some of them were kept open for continued discussion
- JM: Would also be good to have them quarterly
- SV to find a good time and schedule meeting
- The tests related to ABSStore are failing with an internal server error `azure.core.exceptions.HttpResponseError` very frequently these days - any workaround?
- DB: Pulling it out would be the best idea!
- https://github.com/zarr-developers/zarr-python/pull/1714 - good to go?
- Thanks, JM for merging!
- ALR: _shares screen_ and starts presenting
- ALR: Discovered issues
- TN: Haven't optimised `open_datatree` so the issue is not surprising
- TN: Development effort is to move datatree upstream into Xarray
- ALR: Will be opening an issue in datatree repo
- ALR: Slides: https://drive.google.com/file/d/1N9Zq4Uly3O1bNYzFVfeFNavyyS1Jn26o/view?usp=sharing
- DB: _shares screen_ and shows Pydantic-Zarr new features
- TABLED
- Appetite for Jupyter notebooks in tutorials? - https://github.com/zarr-developers/zarr-python/pull/1163
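
The VirtualiZarr discussion above (chunk references instead of chunk data, and concatenation treated as chunk aligned for now) can be sketched as follows. This is only an illustration under stated assumptions: a manifest is modelled as a dict from chunk index to a `(path, offset, length)` tuple, and `concat_manifests` is a hypothetical helper. Neither is VirtualiZarr's actual data model or API, and the file names are made up.

```python
from typing import Dict, Tuple

ChunkIndex = Tuple[int, ...]
ChunkRef = Tuple[str, int, int]  # (path, byte offset, byte length), assumed layout
Manifest = Dict[ChunkIndex, ChunkRef]


def concat_manifests(
    first: Manifest,
    second: Manifest,
    axis: int,
    first_shape: Tuple[int, ...],
    chunk_shape: Tuple[int, ...],
) -> Manifest:
    """Concatenate two manifests along `axis` without touching any chunk data."""
    if first_shape[axis] % chunk_shape[axis] != 0:
        # A partial trailing chunk would force chunks of `second` to be rewritten.
        raise ValueError("concatenation is not chunk aligned")
    # Number of chunks the first array occupies along the concatenation axis.
    shift = first_shape[axis] // chunk_shape[axis]
    combined = dict(first)
    for index, ref in second.items():
        shifted = index[:axis] + (index[axis] + shift,) + index[axis + 1:]
        combined[shifted] = ref
    return combined


a = {(0,): ("a.nc", 0, 80), (1,): ("a.nc", 80, 80)}
b = {(0,): ("b.nc", 0, 80)}
print(concat_manifests(a, b, axis=0, first_shape=(20,), chunk_shape=(10,)))
```

Because only the index keys change, the operation stays cheap regardless of how large the referenced files are, which is why the chunk-alignment assumption matters.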
## 2024-03-20
**Attending:** Davis Bennett (DB), Sanket Verma (SV), Janos Zimmermann (JZ), Gábor Kovács (GB)
**TL;DR:**
**Updates:**
- Join ZulipChat: https://ossci.zulipchat.com/
- HTTP Extension meeting took place on 3/14
- Trying to figure out the best way forward, i.e. a ZEP or not
- Gauging interest and use cases from others in the community
**Open agenda (add here 👇🏻):**
- DB: Zarr-Python doesn't use async while loading the chunks - but it's fairly easy to parallelize the chunks as they are mostly files
- JZ: User should have ability/freedom to do parallelism on their own
- DB: No concrete proposals rn - scheduling the reading of chunks via cloud
- JZ: Using Zarr rn to write data to S3 - chunk size 1MB
- DB: It's Dask's fault - every Dask task costs 1ms of task-graph overhead - with a million Zarr chunks you'll be spending a lot of time on the Dask graph
- JZ: Rechunking
- DB: Rechunking on Dask array?
- JZ: Yes!
- DB: Issues you'll face:
- No. of tasks will be very, very large
- Complexity in the order of O(no. of chunks)
- JZ: Got speed of 1TB/s for local - got 200MB/s when switched to S3 - S3 is inserting keys and putting it S3 store
- DB: Makes sense to benchmark S3 until you timeout - at some point you start getting error code - once you're there you can't go any faster
- JZ: Currently trying to find the sweet size of chunks - will also look into Dask
- DB: Dask introduces some complexity - I use Dask in its primitive form - and write custom functions for other stuff
- Building on https://github.com/zarr-developers/zarr-python/pull/1713
- Should we also add similar examples for AWS and Azure?
- Or move the existing material from [tutorial section](https://zarr.readthedocs.io/en/stable/tutorial.html#distributed-cloud-storage) to examples?
- DB: Basically fall under the tutorials sections
- SV: Will ask on the PR if any objections to put it under `tutorials.rst`
- Dimitri is working on applying Repo-Review suggestions to Zarr-Python `main`
- Any reason not to have them in the `V3` branch?
- DB: If there's nothing disrupting the day-to-day work it's good to have those PRs - we can always bring the stuff later in V3
- Fixing https://github.com/zarr-developers/zarr-python/pull/1671
- PR by David: https://github.com/zarr-developers/zarr-python/pull/1714/ - any objections?
- DB: Looks good for now but we should also check what changes Pytest 8.0.1 brings
- GB: Any new updates to Zarr-Python V3?
- DB: Will be funded until May to work on Zarr-Python by Earthmover
- DB: Do you use N5?
- GB: Little bit
- DB: There's a proposal to remove N5 from Zarr-Python - you'd need to install N5py to work with N5
- DB: Saalfeld is making changes/developing N5 - adding major versions
- GB: N5 may support V3 - had a chat with Saalfeld
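
Returning to the Dask discussion above: DB's point that scheduling cost grows with the number of chunks can be made concrete with a small back-of-the-envelope calculation. The sketch assumes one task per chunk and uses the 1 ms per task figure quoted in the call, which is not a measurement. The function names and the array shape are hypothetical.

```python
import math
from typing import Sequence


def count_chunks(shape: Sequence[int], chunks: Sequence[int]) -> int:
    """Number of chunks needed to cover `shape`, counting partial edge chunks."""
    return math.prod((s + c - 1) // c for s, c in zip(shape, chunks))


def scheduling_overhead_seconds(n_tasks: int, per_task_seconds: float = 1e-3) -> float:
    """Rough graph overhead, assuming one task per chunk at a fixed cost per task."""
    return n_tasks * per_task_seconds


# Illustrative array of 10000^3 one-byte elements.
shape = (10_000, 10_000, 10_000)
for chunks in [(100, 100, 100), (200, 200, 200), (500, 500, 500)]:
    n = count_chunks(shape, chunks)
    overhead = scheduling_overhead_seconds(n)
    print(f"chunks={chunks}: {n} tasks, roughly {overhead:.0f} s of graph overhead")
```

With 100^3 chunks (about 1 MB each for one-byte elements) the array has a million chunks, so the fixed per-task cost alone adds up to minutes before any I/O happens. Larger chunks shrink the task count cubically, at the price of coarser reads.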
## 2024-03-06
**Attending:** Sanket Verma (SV), Davis Bennett (DB), Josh Moore (JM), Eric Perlman (EP), Gabor Kovacs (GB), Norman Rzepka (NR), Felix Cremer (FC), Agriya Khetarpal (AK)
**TL;DR:**
**Updates:**
- New blog post: https://zarr.dev/blog/zulip-transition/
- Join ZulipChat: https://ossci.zulipchat.com/
- Next HTTP Extension Meeting on 3/14
- Previous meeting notes: https://docs.google.com/document/d/14TJfrjbfU1R2REjrZ35GjV74MJ18j_m6geWdj0oB83Y/edit?usp=sharing
- Info: https://zarr.dev/community-calls/
- Joe H. submitted a talk on Zarr-Python V3 @ SciPy 2024
- Davis re-initiated the effort for Thinning Zarr-Python: https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/thinning.20.60zarr-python.60
**Open agenda (add here 👇🏻):**
- Introductions w/ Favourite Python Package
- SV: Zarr-Python
- FC: From Julia Programming Language
- EP: Msgpack
- DB: Typing
- NR: Zarrita
- JM: [SnoopyCrimeCop](https://pypi.org/project/scc/)
- GB: NumPy
- AK: Improve Pyodide distribution; favourite package: SciPy
- HTTP Extension
- Davis: is it clear what they need/want?
- Josh: requirement seemed to be up/down navigation
- Eric: "nanocent" per listing. Would suggest avoiding it.
- EP: Would be against HTTP extension - would be in favour of it if it's an optional spec
- NR: Valid use cases to discover hierarchies - could be a _sidecar_ file
- EP: https://forum.image.sc/t/updated-ngff-support-in-fiji-hdf5-n5-zarr-ome-ngff/91705
- New release `2.17.1`? → https://github.com/zarr-developers/zarr-python/pull/1673
- JM: Doing a release via Zoom ScreenSharing
- Enable testing on big endian image → https://github.com/zarr-developers/zarr-python/pull/869#issuecomment-1978443218
- JM: Antonio Valentino from PyTables
- SV: Maybe involve Scientific Python community - they're looking for packages to upload nightly wheels
- Organising community meetings over at Gather
- For e.g.: https://napari.zulipchat.com/#narrow/stream/212875-general/topic/aint.20no.20party.20like.20a.20PR.20party
- https://gist.github.com/joshmoore/715b6cb74e74fce4feac7c610eef4d96
- JM: Helper to convert Geoparquet → GeoZarr
- DB: PangeoForge might have a solution
- DB: GeoParquet is mostly raster data - the lat and lon need to form a grid
- JM: Rasterize to Zarr!
- DB: Maybe Xarray support Zarr
- JM: Couldn't get Dask to open parquet file
- https://github.com/zarr-developers/zarr-python/issues/1695
- ZipStore can only write once
- DB: Released a new version of Pydantic-Zarr
- Release notes: https://github.com/janelia-cellmap/pydantic-zarr/releases
- AK: Would like to send a PR to improve the Pyodide support with numcodecs
- DB: Numcodecs is not super busy and PRs improving stuff are most welcome
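
On the big-endian testing item above: a short, generic illustration of why byte order deserves its own CI coverage. It uses only the standard library and does not reproduce the linked PR. The helper `decode_int32` is hypothetical. The `<` and `>` prefixes mirror the byte-order markers in Zarr v2 dtype strings such as `<i4` and `>i4`.

```python
import struct
import sys


def decode_int32(raw: bytes, byteorder: str) -> int:
    """Decode 4 bytes as a signed 32-bit integer; byteorder is '<' or '>'."""
    return struct.unpack(f"{byteorder}i", raw)[0]


value = 1
little = struct.pack("<i", value)  # least significant byte first
big = struct.pack(">i", value)     # most significant byte first

print("native byte order:", sys.byteorder)
print("little-endian bytes:", little.hex())
print("big-endian bytes:", big.hex())

# Decoding with the declared byte order round-trips the value...
print(decode_int32(little, "<"), decode_int32(big, ">"))
# ...while decoding little-endian bytes as big-endian silently gives a wrong number.
print(decode_int32(little, ">"))
```

The mismatch does not raise an error, so a bug of this kind only shows up in tests that compare decoded values, ideally on a machine whose native order differs from the stored one.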
## 2024-02-21
**Attending:** Sanket Verma (SV), Josh Moore (JM), Norman Rzepka (NR), Ward Fisher (WF)
**TL;DR:**
**Updates:**
- Lots going on with refactoring, v3, etc.
- Join the OSSCi ZulipChat here: https://ossci.zulipchat.com/
- Blog post coming out soon!
**Open agenda (add here 👇🏻):**
- NR: refactoring is going well. Bloat is possible, though. Would like to have something ready in a month or two. Won't be able to incorporate all wishes. E.g., virtual zarr array.
- JM: propose finding a time that more (all?) core devs can get to.
- SV: sent out a poll already. Joe on vacation until the 26th.
- SV: preparing a summary for John. (JM: blog post?)
- NR: question of what the scope of the refactoring is. concerned about momentum. could lose steam/attention.
- JM: a step I would like to see is a pypi pre release
- NR: need a feature freeze for that. e.g., should the array class be lazy?
- WF: on v3 in netcdf? Dennis is tapering off, but it is going. Working on a plan for moving forward. Bit slower but not abandoned. Nothing existential.
- Another developer Tara will be sitting in on the community meetings.
- JM: another rust implementation https://github.com/LDeakin/zarrs
- NR: they have a sharding implementation
- JM to SV: probably should mark those on the implementation list (per ZEP?)
- NR: i.e. JBMS compatibility list
- Also on OME-Zarr.
- Current versioning doesn't scale well.
- Major version.
- cf. HTML and features.
- JM to SV: extend an invitation to zarrs (LDeakin)
## 2024-02-07
**Attending:** Davis Bennett (DB), Sanket Verma (SV), Josh Moore (JM), Norman Rzepka (NR), Eric Perlman (EP), Gábor Kovács (GB), Jeremy Maitin-Shepard (JMS)
**TL;DR:**
Meeting covers Zarr Sprint updates, GSoC participation plans, and SPEC0 endorsement. Discussions include dataset conversion to Zarr V3, proposals for OME-NGFF integration, impending changes in Zarr V3, Java support, chat platform considerations, and collaboration for SciPy 2024 proposal.
**Updates:**
- Zarr Sprint @ Columbia University, NYC (taking place right now):
- Agenda and links: https://docs.google.com/document/d/1x62xVWxcjJJHQWdNE5jPHkVzh5k6_HfoifUJ9v-AwPA/edit?usp=sharing
- Slack: https://join.slack.com/t/cloudnativegeo/shared_invite/zt-235w8flfo-TW5Tpi1sPqQFWeMy~7ROHA
- Participating in GSoC 2024
- Ideas list: https://github.com/zarr-developers/gsoc/blob/main/2024/ideas-list.md
- [SPEC0](https://scientific-python.org/specs/spec-0000/) endorsement
**Open agenda (add here 👇🏻):**
- EP: Converted a huge dataset into Zarr - storing the data on GCS
- DB: Using Zarr V3 and converting to shards
- NR: If you already have it in Zarr then it's easy to convert it into V3/Shards
- Links:
- https://images.jax.org/webclient/?show=image-190714
- https://storage.googleapis.com/jax-public-ngff/data/0.4/public_data/3580/sms_107/2022-07/20/12-04-09.625/3215%20D%20-%202022-06-15%2012.09.50.zarr/0/
- https://ome.github.io/ome-ngff-validator/?source=https://storage.googleapis.com/jax-public-ngff/data/0.4/public_data/3580/sms_107/2022-07/20/12-04-09.625/3215%20D%20-%202022-06-15%2012.09.50.zarr/0/
- JM: Some benchmarks
- https://github.com/ome/bioimage-latency-benchmark
- https://www.nature.com/articles/s41592-021-01326-w
- JMS: Neuroglancer viewer: http://tinyurl.com/yzpnxz7u
- JM: NR will write a proposal for Zarr3 in OME-NGFF to test drive the NGFF RFC process
- SV: Endorsing SPEC4
- DB: Doesn't see a downside
- JMS: Zarr-Python might not need nightly wheels but maybe numcodecs do
- JM: Big V3 change is coming in - the diff. is getting bigger day by day
- DB: Beta release in a month would be ambitious
- SV: Joe wants to release directly from V3 branch
- JM: Good, if they're pre-releases
- GB: Zarr Java supporting V3
- NR: N5 supports Zarr V2 - trying to get funds to bring Zarr-Java and N5 together
- GB: Using the N5 approach and dependent on it and can't switch to V3
- DB: N5 would like to support V3
- Chat platform discussion: https://github.com/zarr-developers/community/issues/68
- OSSCi Zulipchat has Zarr stream
- https://ossci.zulipchat.com
- EP: Like the idea of having a shared Zulip - don't want to be logged in to several Zulips
- EP: Hierarchies can be useful for various topics
- SciPy 2024 proposal
- Looking for collaborators (currently we have: Sanket and Joe)
- SV is going to PyCon DE and PyData Berlin 2024 this year!
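
For the "converting to shards" item above, the index arithmetic behind sharding can be sketched in a few lines. The shapes are illustrative assumptions (a 128^3 shard holding 32^3 inner chunks) and the helper names are hypothetical. This is not the zarr-python or tensorstore implementation, only the mapping from an inner-chunk coordinate to the shard that stores it.

```python
from typing import Tuple

Coord = Tuple[int, ...]


def chunks_per_shard(shard_shape: Coord, chunk_shape: Coord) -> Coord:
    """Inner chunks per shard along each axis; shapes must divide evenly."""
    if any(s % c != 0 for s, c in zip(shard_shape, chunk_shape)):
        raise ValueError("shard shape must be a multiple of the inner chunk shape")
    return tuple(s // c for s, c in zip(shard_shape, chunk_shape))


def locate_chunk(chunk_coord: Coord, per_shard: Coord) -> Tuple[Coord, Coord]:
    """Map a global inner-chunk coordinate to (shard coordinate, position in shard)."""
    shard = tuple(i // n for i, n in zip(chunk_coord, per_shard))
    local = tuple(i % n for i, n in zip(chunk_coord, per_shard))
    return shard, local


per_shard = chunks_per_shard((128, 128, 128), (32, 32, 32))
print(per_shard)                           # (4, 4, 4)
print(locate_chunk((5, 0, 9), per_shard))  # ((1, 0, 2), (1, 0, 1))
```

With these shapes each shard object holds 64 inner chunks, so a store sees 64 times fewer objects than it would with unsharded 32^3 chunks while readers can still fetch a single inner chunk.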