# Zarr Summit Developer Days - Breakout Group Planning

**Purpose**: Ensure everyone works on their priority topics across the 3 days

**Survey data**: This schedule is informed by 7 developer survey responses and 14 adopter survey responses

---

## Instructions for Participants

**During the opening session (9:00-9:30 AM Monday)**, add your name to the topics below. This will help us schedule breakout sessions.
- **Step 1**: Add new topics if needed (3 minutes)
- **Step 2**: Add your name under the field that matches your interest level (**Lead It**, **Work On It**, or **Join If Space**) (3 minutes)
  - You can add your name to multiple topics!
- **Step 3**: Answer the questions at the bottom of the document (2 minutes)

---

## Core Technical Topics

### ~~Prototyping rectilinear chunk grids (variable-length/variable-shaped chunks)~~
*High priority: 57% of devs rated high interest*
- **Lead It**:
- **Work On It**: @keewis
- **Join If Space**: @TomNicholas, @jhamman, Sebastian, @LDeakin, @maxrjones, @alimanfoo, @malmans2, @rouault, Hugo Gruson

Readout: Productive; implemented in `zarrs`, with an extensions PR ready

### ~~Improving ease-of-use and performance of sharding~~
*7 adopter mentions*
- **Lead It**: Mark Kittisopikul (@markkitti)
- **Work On It**: @perlman
- **Join If Space**: @LDeakin, @keewis, @malmans2, @rouault, @markkitti, @maxrjones

Readout: Discussed how to implement sharding at the API level and the implications for virtual Zarr/Icechunk

Group discussion:
- Should it be a codec?
  - Seems like a missing abstraction (i.e., layout)
    - Layouts include Icechunk and VirtualiZarr
  - What is the concern with it being a codec?
    - Impact on the abstractions in an implementation (i.e., contagious async)
    - Harder to profile with async
- Naming them chunks and sub-chunks would have been clearer; could be fixed in V4
- Perhaps another breakout on this?
### Designing and prototyping sparse array storage
- **Lead It**: @keewis
- **Work On It**:
- **Join If Space**: @TomNicholas, Sebastian

### Designing and prototyping optional codecs
- See "Optional codec" under Additional Topics below

### ~~Advancing specs for data types, codecs, and extensions~~
*High priority: 71% of devs rated high interest*
- **Lead It**: @d-v-b
- **Work On It**: @keewis, @LDeakin, @maxrjones, @tomwhite
- **Join If Space**: @TomNicholas, @perlman, @tomwhite, @alimanfoo, @vincentsarago, Hugo Gruson

Readout: Discussed clarifications needed for zarr-extensions readout

### ~~Creating reference datasets and centralized benchmarks for comparing implementations~~
- **Lead It**: @maxrjones
- **Work On It**: @d-v-b, @jhamman, @perlman, @nishadhka, Hugo Gruson
- **Join If Space**: @alimanfoo, @LDeakin, @joshmoore

Readout: Discussed centralized vs. decentralized benchmarks; a later mind-meld recommends a CLI so that implementations are able to run the benchmarks

Discussion:
- Starts to sound like a standard API

### Optimizing performance and reducing memory usage
*Gap: Top adopter priority (11 mentions) but low dev interest.*
- **Lead It**:
- **Work On It**: @jhamman, @LDeakin
- **Join If Space**: Sebastian, @maxrjones, Hugo Gruson, @perlman

---

## Ecosystem & Tools Topics

### ~~Advancing VirtualiZarr capabilities~~
*8 adopter mentions*
- **Lead It**: @TomNicholas
- **Work On It**: @maxrjones
- **Join If Space**: @LDeakin, @jsignell, @joshmoore (particularly interested in spec-ifying), @rouault

Readout: Discussed how to "spec-ify" virtual chunks as native Zarr and landed on a proposal to store virtual references as Zarr arrays. All Zarr arrays could then be conceptualized as virtual arrays, where "native arrays" have an implicit structure.

Discussion:
- Would it be a new node type?
  - Probably not necessary; it falls into a category of needs for a group of arrays that mean something to certain people
- Perhaps an implementation session?
### Upstreaming lessons learned from Icechunk
*7 adopter mentions*
- **Lead It**:
- **Work On It**: @TomNicholas, Sebastian
- **Join If Space**: @jhamman, @alimanfoo, @malmans2, @maxrjones

### Evolving the Store interface for modern use cases
- **Lead It**: @jhamman
- **Work On It**: Sebastian
- **Join If Space**: @alimanfoo, @joshmoore, @maxrjones

### Enabling better Zarr visualization and web access
*9 adopter mentions*
- **Lead It**:
- **Work On It**: @perlman, @vincentsarago
- **Join If Space**: @nishadhka, @maxrjones

---

## Community & Adoption Topics

### Improving Zarr documentation and resources
- **Lead It**:
- **Work On It**:
- **Join If Space**: @perlman, @alimanfoo

### Cross-disciplinary multi-scale extensions
*18 geospatial mentions, 4 microscopy mentions*
- **Lead It**: @maxrjones
- **Work On It**: @perlman, @keewis, @rouault
- **Join If Space**: @jhamman, @LDeakin, @vincentsarago, @jsignell, @joshmoore

---

## Additional Topics

### Smoothing the V2 → V3 migration path
*4 adopter mentions. Goal: Create migration guides and tooling*
- **Lead It**: Mark Kittisopikul
- **Work On It**: Hugo Gruson
- **Join If Space**: @LDeakin, @perlman, @alimanfoo, @maxrjones, @tomwhite

### ~~Missing values~~
- **Topic**: How to support missing values in Zarr (some overlap with Arrow bitmaps?)
- **Lead It**:
- **Work On It**: @tomwhite, @maxrjones
- **Join If Space**: @LDeakin, @keewis, @markkitti

Readout: Solved :tada:

### Arrow?
- **Topic**: Arrow is massive; can Zarr integrate deeply with it somehow?
- **Lead It**:
- **Work On It**: @rabernat, @markkitti
- **Join If Space**: @TomNicholas, @keewis, @LDeakin, @alimanfoo, @tomwhite, @joshmoore, @maxrjones, @rouault

### ~~Optional codec~~
- **Topic**: Allow codecs to be optionally skipped on a per-chunk basis; matches an HDF5 feature, allows for an upper bound on chunk size, helps
  - Could fold into the sharding discussion
- **Lead It**: Mark Kittisopikul
- **Work On It**:
- **Join If Space**: @LDeakin

### Zarr Steering Council (ZSC) / governance feedback session
- **Topic**: Opportunity to discuss the overall organization and any needs of the community. This is very much an optional topic and can also be done during coffee breaks. Please grab us.
- **Lead It**: @joshmoore, John Kirkham, Alistair Miles
- **Work On It**: @jhamman, @alimanfoo
- **Join If Space**: @LDeakin, @maxrjones

### Interaction with Foreign Standards
- **Topic**: How do we work with HDF5, (Geo)TIFF, and other formats that can encode "arrays"?
  - Add a chunk-key-encoding that allows chunks to have file extensions?
    - @LDeakin: I have an experimental extension for this in `zarrs`
  - "Chunk key encoding transformers"?
  - Dual encoding
  - How does this relate to VirtualiZarr?
  - Summarize notes on GRIB and the `scan_grib` routine
- **Lead It**:
- **Work On It**:
- **Join If Space**: @jsignell, @TomNicholas, @nishadhka, @LDeakin, @maxrjones
- References and prior work:
  - https://element84.com/software-engineering/is-zarr-the-new-cog/
  - https://github.com/mkitti/simple_image_formats
  - https://virtualizarr.readthedocs.io/en/stable/index.html
  - https://fsspec.github.io/kerchunk/

### Full stack control of concurrency/parallelism
- **Topic**: Zarr-Python 3 added significant concurrency through threading and asyncio, but interactions with upstream parallel array frameworks (e.g., Dask) have suffered.
- **Lead It**:
- **Work On It**: @jhamman
- **Join If Space**: @TomNicholas, @tomwhite

### Other topics (please add below with your name)
- **Topic**: _________________
- **Lead It**:
- **Work On It**:
- **Join If Space**:

---

## Quick Questions (Optional - helps with scheduling)

**Time constraints?** (Add your name if you have these)
- Leaving early on _____ (day):
- Arriving late on Monday: @rabernat, Tina
- Have a conflict during _____ (time):
  - @jsignell - At STAC sprint Tuesday, Wednesday, Thursday
  - @joshmoore - Leaving before lunch on Tuesday

**What's one thing you hope to accomplish by Wednesday?**
- (Add your name + quick goal):
  - @alimanfoo - Collect ideas for how the steering council could better support the community and reduce friction
  - Add a discussion on CMORPH AWS S3 NetCDF file references in VirtualiZarr and a note on pencil and pancake chunking benchmarking

## I'm done
- Josh
- @alimanfoo
- @jsignell
- @LDeakin
- Sebastian
- @jhamman
- @TomNicholas
- @mslyksVbToikcAGZIC-lDA
- @vincentsarago
- @tomwhite
- @malmans2
- @rouault
- @perlman
- @maxrjones
- @keewis
- Hugo
- @nishadhka
-
-
-
-