---
tags: zarr, zsc
---

# ZSC 2021-09-27

Attending: Alistair, Josh, Ryan Abernathey, Ryan W

* [TODOs from last time: 2021-09-01](https://hackmd.io/IRjXZ8lRRzCyQD76R9wyhw?view#Additional-paper-notes-amp-TODOs)
* Jupyter Book intro
    - https://distill.pub/ (https://distill.pub/2021/distill-hiatus/)
    - Use of templates, etc. Web first.
    - Need a couple of designers.
* Governance (from last time)
    - Twitter: TweetDeck & credentials
        - invited Ryan A to TweetDeck.
* carbonplan (mapbox)
    - Alistair: nice post
    - Ryan: think we should focus on web-first
    - Ryan: want smaller chunks for web (--> caterva)
        - That's what I'd wave a wand at
        - Want to work together
    - Alistair: clarity on functionality? blosc2 vs. caterva
        - blosc2 is n-dim aware...
        - RA: hard to understand (we aren't blosc experts anyway)
        - header stuff is useful (metadata layer that we don't need)
        - most of what we need is in blosc2 (we avoid the header because it's redundant on chunks)
        - good to try to get a caterva demo
        - (post-meeting) still need array order for blocks. Might be fun to experiment: space-filling curve for batching
    - similar to partial reads: need to know the byte range
        - RA: don't love how it's coded.
            - https://github.com/zarr-developers/zarr-python/blob/78eb8b728e92cf5cbb6ff58d7da0d4a26c54a0ec/zarr/util.py#L548
            - map array slice to bytes within (chunk) file
    - John has a different opinion: completely bypass numcodecs. Expose the caterva array directly in zarr (already ND)
        - would work, but caterva wouldn't be a codec (leak in abstraction)
        - propose instead to augment numcodecs with possible awareness of the underlying arrays & slicing capabilities
    - chicken & egg: playing with tools to figure out what we want to achieve
    - Josh: trying a non-blosc sharded backend?
    - Vital: keep clear abstractions (similar for V3)
    - **Is there a missing concept?** Key question.
    - Blosc2 Python example of chunked data: https://github.com/Blosc/python-blosc2/blob/main/examples/schunk.py
    - Josh: explanation of webknossos format (see last community call)
    - RyanA: extend to support uncompressed data; then we know how to find something. Pass-through codec for flat binary. Propose to structure the API so that anything that lives within a single file is accessed through numcodecs (a slight expansion of its scope). But numcodecs is for reading blobs; zarr is responsible for coordinating many of those blobs.
    - Alistair: have the feeling that doing it all through the numcodecs API isn't quite right, since it assumes you've already retrieved the data (i.e. it's just a sequence of bytes). Something is needed in the storage API.
    - Josh: a translation like fsspec-reference-maker? key --> (url, offset, length) ... and multiple chunks?!
    - RyanA: don't believe we can outsource it (just like V3).
    - Alistair: how do we solve these difficult design problems then? Previously AM, RA, Matt Rocklin, Stephan Hoyer: people were engaged. We need a forum. We need some input: experience and knowledge.
    - Josh: (1) a community manager to run design meetings? (2) think we can get webknossos (WK) to join the community
    - Josh: working on an xarray SoW. Not sure how to balance it (conflict).
        - roughly "support zarr multiscale in xarray"
    - Ryan: getting a third-party library to read & write image pyramids would force us to formalize the convention. xarray has a richer model.
        - being able to encode that in a round-trip-able way
        - show that it can be read by downstream software
        - increase adoption of the format
        - forces us to make zarr better (named dimensions)
        - cf. fsspec, which came through xarray
    - repository for zarr-conventions (registry, discovery)
        - sidelined by the V3 spec.
        - AM: keep extensions and conventions separate. Need a home.
        - JM: https://data-apis.org/ ?
        - JM: file-loading? Anything else?
        - AM: some conventions in the genomics community. Our names.
        - JM: tabular data...
    - AM: include B-Open in review, percentage of their time.
    - RA: hear about companies using zarr; people don't know.
        - "looking for maintainers."
        - AM: could try to help mentor maintainers. 1 hr/wk
        - community manager to coordinate, advertise, build capacity. Schedule with someone
        - RA: https://medium.com/pangeo/supporting-new-xarray-contributors-6c42b12b0811
* Governance
    - NumFOCUS updates (hire, meetings)
    - V3 Spec & Unidata funding (6-8 months until it begins)
    - any vetoes on:
        - logo
        - trademarking
        - zarr-format.org: [issue585](https://github.com/zarr-developers/zarr-python/issues/585)
* GitHub: (day-to-day business) (from last time)
    - Dependabot and other "noise"
    - CODEOWNERS
    - Reviewers...
* Brainstorming
    - Corporate involvement
        - Google 20% / xarray-beam
        - Protocol Labs
    - Contractors
        - Webknossos
        - B-Open
        - Blosc
* Zarr backlog
    * Please list issues here!
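
The partial-read item in the carbonplan discussion above ("map array slice to bytes within (chunk) file"), together with RyanA's suggestion of a pass-through codec for flat binary, can be illustrated for the simplest case: an uncompressed chunk stored in C (row-major) order. This is a minimal sketch with a hypothetical helper `slice_to_byte_ranges`, not the zarr-python code at the linked `util.py` line; it assumes a step-1 selection given as `(start, stop)` pairs in chunk-local coordinates and returns the `(offset, length)` byte ranges a store would need to fetch.

```python
from itertools import product


def slice_to_byte_ranges(chunk_shape, itemsize, selection):
    """Hypothetical helper: map a selection on an uncompressed, C-order chunk
    to the (offset, length) byte ranges that cover it.

    chunk_shape: tuple of ints, the chunk's shape
    itemsize:    bytes per element (e.g. 8 for float64)
    selection:   one (start, stop) pair per dimension, step 1 assumed
    """
    ndim = len(chunk_shape)
    # Byte strides for row-major (C) order: the last axis is contiguous.
    strides = [itemsize] * ndim
    for axis in range(ndim - 2, -1, -1):
        strides[axis] = strides[axis + 1] * chunk_shape[axis + 1]

    # Each combination of indices on the leading axes starts one contiguous
    # run of bytes along the last axis.
    *outer, (last_start, last_stop) = selection
    run_length = (last_stop - last_start) * itemsize
    ranges = []
    for index in product(*(range(start, stop) for start, stop in outer)):
        offset = sum(i * s for i, s in zip(index, strides)) + last_start * itemsize
        ranges.append((offset, run_length))

    # Coalesce runs that are adjacent in the file (e.g. full rows).
    merged = []
    for offset, length in ranges:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((offset, length))
    return merged


# Rows 1-2, columns 2-4 of a (4, 6) float64 chunk: two separate 24-byte runs.
print(slice_to_byte_ranges((4, 6), 8, ((1, 3), (2, 5))))  # [(64, 24), (112, 24)]
```

Selecting whole rows collapses into a single range, which is what makes flat binary easy to read partially. For compressed chunks (e.g. blosc2/caterva) the offsets would instead have to come from the codec's own block layout, which is the abstraction question raised in the notes.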