# Zarr Summit Developer Days - Breakout Group Planning
**Purpose**: Ensure everyone works on their priority topics across the 3 days
**Survey data**: This schedule is informed by 7 developer survey responses and 14 adopter survey responses
---
## Instructions for Participants
**During the opening session (9:00-9:30 AM Monday)**, add your name to topics below:
- This will help us schedule breakout sessions
- **Step 1**: Add new topics if needed (3 minutes)
- **Step 2**: Add your name under the column that matches your interest level (3 minutes)
- You can add your name to multiple topics!
- **Step 3**: Answer questions at the bottom of the document (2 minutes)
---
## Core Technical Topics
### ~~Prototyping rectilinear chunk grids (variable-length/variable-shaped chunks)~~
*High priority: 57% of devs rated high interest*
- **Lead It**:
- **Work On It**: @keewis
- **Join If Space**: @TomNicholas, @jhamman, Sebastian, @LDeakin, @maxrjones, @alimanfoo, @malmans2, @rouault, Hugo Gruson
Readout: Productive; implemented in zarrs, with an extensions PR ready
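
For readers new to the idea, here is a minimal sketch of the lookup a rectilinear (variable-length) chunk grid needs along one axis. It is not the zarrs implementation or the extension's spec; the function names and the example chunk lengths are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_offsets(chunk_lengths):
    """Return the start offset of every chunk along one axis, plus the axis length."""
    return [0, *accumulate(chunk_lengths)]


def locate(index, offsets):
    """Map an array index to (chunk number, position inside that chunk)."""
    if not 0 <= index < offsets[-1]:
        raise IndexError(index)
    # The last offset that is <= index marks the start of the owning chunk.
    chunk = bisect_right(offsets, index) - 1
    return chunk, index - offsets[chunk]


# Hypothetical axis of length 10 split into chunks of length 3, 5 and 2.
offsets = chunk_offsets([3, 5, 2])
print(locate(7, offsets))
```

With a regular grid the same lookup is a single integer division, so the binary search over offsets is the main extra cost of variable-length chunks.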
### ~~Improving ease-of-use and performance of sharding~~
*7 adopter mentions*
- **Lead It**: Mark Kittisopikul (@markkitti)
- **Work On It**: @perlman
- **Join If Space**: @LDeakin, @keewis, @malmans2, @rouault, @markkitti, @maxrjones
Readout: Discussed how to implement sharding at the API level and implications for virtual zarr/icechunk
Group discussion:
- Should it be a codec?
- Seems like a missing abstraction (i.e., layout)
- layouts include Icechunk and VirtualiZarr
- what is the concern with it being a codec?
- impact on the abstractions in an implementation (i.e., contagious async)
- harder to profile with async
- chunks and sub-chunks would've been more clear, could be fixed in V4
- perhaps another breakout on this?
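
To make the "chunks and sub-chunks" framing concrete, here is a small sketch of the two-level addressing that sharding implies, assuming regular shard and inner-chunk shapes. The function name and the shapes are hypothetical and not taken from any implementation.

```python
def locate_in_shard(index, shard_shape, inner_chunk_shape):
    """Map an N-d array index to (shard coords, inner-chunk coords within that shard)."""
    shard = tuple(i // s for i, s in zip(index, shard_shape))
    inner = tuple(
        (i % s) // c for i, s, c in zip(index, shard_shape, inner_chunk_shape)
    )
    return shard, inner


# Hypothetical 2-D layout: 128x128 shards made of 32x32 inner chunks.
print(locate_in_shard((130, 70), (128, 128), (32, 32)))
```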
### Designing and prototyping sparse array storage
- **Lead It**: @keewis
- **Work On It**:
- **Join If Space**: @TomNicholas, Sebastian
### Designing and prototyping optional codecs
- See below
### ~~Advancing specs for data types, codecs, and extensions~~
*High priority: 71% of devs rated high interest*
- **Lead It**: @d-v-b
- **Work On It**: @keewis, @LDeakin, @maxrjones, @tomwhite
- **Join If Space**: @TomNicholas, @perlman, @tomwhite, @alimanfoo, @vincentsarago, Hugo Gruson
Readout: Discussed clarifications needed for zarr-extensions readout
### ~~Creating reference datasets and centralized benchmarks for comparing implementations~~
- **Lead It**: @maxrjones
- **Work On It**: @d-v-b, @jhamman, @perlman, @nishadhka, Hugo Gruson
- **Join If Space**: @alimanfoo @LDeakin @joshmoore
Readout: Discussed centralized vs. decentralized benchmarks; a later mind-meld recommends a CLI for implementations to be able to run
Discussion:
- Starts to sound like a standard API
### Optimizing performance and reducing memory usage
*Gap: Top adopter priority (11 mentions) but low dev interest.*
- **Lead It**:
- **Work On It**: @jhamman @LDeakin
- **Join If Space**: Sebastian, @maxrjones, Hugo Gruson, @perlman
---
## Ecosystem & Tools Topics
### ~~Advancing VirtualiZarr capabilities~~
*8 adopter mentions*
- **Lead It**: @TomNicholas
- **Work On It**: @maxrjones
- **Join If Space**: @LDeakin, @jsignell, @joshmoore (particularly interested in spec-ifying), @rouault
Readout: Discussed how to "spec-ify" virtual chunks as native Zarr. Landed on a proposal to store virtual references as Zarr arrays; all Zarr arrays could then be conceptualized as virtual arrays, where "native arrays" have an implicit structure
Discussion:
- would it be a new node type?
- probably not necessary, falls into a category of needs for a group of arrays that mean something to certain people
- perhaps an implementation session?
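
One possible reading of the readout, sketched with plain Python lists standing in for Zarr arrays: each virtual chunk gets an explicit (path, offset, length) entry, while a native chunk's reference is implied by its chunk key. Every name, path, and number here is made up for illustration; the proposal itself may look quite different.

```python
# Hypothetical 1-D array with three chunks. Each list stands in for one Zarr
# array in the proposal and holds one entry per chunk.
paths = ["data/a.nc", "data/a.nc", "data/b.nc"]
offsets = [0, 4096, 512]
lengths = [4096, 4096, 2048]


def virtual_reference(chunk):
    """Look up the explicit (path, offset, length) reference for a chunk."""
    return paths[chunk], offsets[chunk], lengths[chunk]


def native_reference(chunk):
    """A native chunk needs no stored reference: the path follows from the
    chunk key and the chunk is the whole object (length None = read it all)."""
    return f"c/{chunk}", 0, None


print(virtual_reference(2))
print(native_reference(2))
```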
### Upstreaming lessons learned from Icechunk
*7 adopter mentions*
- **Lead It**:
- **Work On It**: @TomNicholas, Sebastian
- **Join If Space**: @jhamman, @alimanfoo, @malmans2, @maxrjones
### Evolving the Store interface for modern use cases
- **Lead It**: @jhamman
- **Work On It**: Sebastian
- **Join If Space**: @alimanfoo @joshmoore @maxrjones
### Enabling better Zarr visualization and web access
*9 adopter mentions*
- **Lead It**:
- **Work On It**: @perlman, @vincentsarago
- **Join If Space**: @nishadhka, @maxrjones
---
## Community & Adoption Topics
### Improving Zarr documentation and resources
- **Lead It**:
- **Work On It**:
- **Join If Space**: @perlman, @alimanfoo
### Cross-disciplinary multi-scale extensions
*18 geospatial mentions, 4 microscopy mentions*
- **Lead It**: @maxrjones
- **Work On It**: @perlman, @keewis, @rouault
- **Join If Space**: @jhamman @LDeakin, @vincentsarago, @jsignell, @joshmoore
---
## Additional Topics
### Smoothing the V2 → V3 migration path
*4 adopter mentions. Goal: Create migration guides and tooling*
- **Lead It**: Mark Kittisopikul
- **Work On It**: Hugo Gruson
- **Join If Space**: @LDeakin @perlman @alimanfoo @maxrjones @tomwhite
### ~~Missing values~~
- **Topic**: How to support missing values in Zarr (has some overlap with Arrow bitmaps?)
- **Lead It**:
- **Work On It**: @tomwhite, @maxrjones
- **Join If Space**: @LDeakin, @keewis, @markkitti
Readout: Solved :tada:
### Arrow?
- **Topic**: Arrow is massive - can Zarr integrate deeply with it somehow?
- **Lead It**:
- **Work On It**: @rabernat, @markkitti
- **Join If Space**: @TomNicholas, @keewis, @LDeakin, @alimanfoo, @tomwhite @joshmoore, @maxrjones, @rouault
### ~~Optional codec~~
- **Topic**: Allow for codecs to be not applied on a per-chunk basis, matches HDF5 feature, allows for an upper-bound on chunk size, helps
- Could fold into sharding discussion
- **Lead It**: Mark Kittisopikul
- **Work On It**:
- **Join If Space**: @LDeakin
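
A rough sketch of the idea, assuming a one-byte per-chunk flag that records whether the codec was applied. The flag layout and function names are hypothetical; this is neither a proposed Zarr codec nor HDF5's actual mechanism.

```python
import zlib

# Hypothetical one-byte header recording whether the codec was applied.
RAW, COMPRESSED = b"\x00", b"\x01"


def encode_chunk(data: bytes) -> bytes:
    """Compress the chunk only when that actually makes it smaller."""
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return COMPRESSED + packed
    return RAW + data


def decode_chunk(stored: bytes) -> bytes:
    """Undo encode_chunk by checking the flag byte first."""
    flag, body = stored[:1], stored[1:]
    return zlib.decompress(body) if flag == COMPRESSED else body


repetitive = b"\x00" * 1024
tiny = b"a"
for chunk in (repetitive, tiny):
    stored = encode_chunk(chunk)
    # The stored chunk never exceeds the raw size plus the flag byte.
    print(len(chunk), len(stored), decode_chunk(stored) == chunk)
```

Because the raw bytes are kept whenever compression does not help, the stored size is bounded by the uncompressed chunk size plus the flag, which is the upper bound mentioned in the topic.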
### Zarr Steering Council (ZSC) / governance feedback session
- **Topic**: Opportunity to discuss the overall organization and any needs of the community. This is very much an optional topic and can also be done during coffee breaks. Please grab us.
- **Lead It**: @joshmoore, John Kirkham, Alistair Miles
- **Work On It**: @jhamman, @alimanfoo
- **Join If Space**: @LDeakin, @maxrjones
### Interaction with Foreign Standards
- **Topic**: How do we work with HDF5, (Geo)TIFF, and other formats that can encode "arrays"
- Add a chunk-key-encoding that allows chunks to have file extensions?
- @LDeakin I have an experimental extension for this in `zarrs`
- "Chunk key encoding transformers"?
- Dual encoding
- How does this relate to virtualizarr?
- summarize note on grib and scan_grib routine
- **Lead It**:
- **Work On It**:
- **Join If Space**: @jsignell @TomNicholas @nishadhka @LDeakin @maxrjones
- References and Prior work
- https://element84.com/software-engineering/is-zarr-the-new-cog/
- https://github.com/mkitti/simple_image_formats
- https://virtualizarr.readthedocs.io/en/stable/index.html
- https://fsspec.github.io/kerchunk/
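
As a starting point for the chunk-key-encoding question, here is a small sketch of a Zarr v3 "default"-style chunk key with an optional file extension appended. The `suffix` parameter is hypothetical and is not the API of the experimental `zarrs` extension mentioned above.

```python
def chunk_key(coords, separator="/", suffix=""):
    """Build a 'c' + separator + coordinates key, optionally ending in a file extension."""
    return separator.join(["c", *map(str, coords)]) + suffix


print(chunk_key((0, 1)))
print(chunk_key((0, 1), suffix=".tiff"))
```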
### Full stack control of concurrency/parallelism
- **Topic**: Zarr-Python 3 added significant concurrency through threading and asyncio. But interactions with upstream parallel array frameworks (e.g. Dask) have suffered.
- **Lead It**:
- **Work On It**: @jhamman
- **Join If Space**: @TomNicholas, @tomwhite
### Other topics (please add below with your name)
- **Topic**: _________________
- **Lead It**:
- **Work On It**:
- **Join If Space**:
---
## Quick Questions (Optional - helps with scheduling)
**Time constraints?** (Add your name if you have these)
- Leaving early on _____ (day):
- Arriving late on Monday: @rabernat, Tina
- Have conflict during _____ (time):
- @jsignell - At STAC sprint Tuesday, Wednesday, Thursday
- @joshmoore - Leaving before lunch on Tuesday
**What's one thing you hope to accomplish by Wednesday?**
- (Add your name + quick goal):
- @alimanfoo - collect any ideas for how the steering council could better support and reduce friction for the community
- Add a discussion on CMORPH AWS S3 NetCDF file references in VirtualiZarr and a note on pencil and pancake chunking benchmarking
## I'm done
- Josh
- @alimanfoo
- @jsignell
- @LDeakin
- Sebastian
- @jhamman
- @TomNicholas
- @mslyksVbToikcAGZIC-lDA
- @vincentsarago
- @tomwhite
- @malmans2
- @rouault
- @perlman
- @maxrjones
- @keewis
- Hugo
- @nishadhka
-
-
-
-