---
tags: zarr, zsc
---

# ZSC 2021-11-30

> *Previously*: [2021-09-27](https://hackmd.io/zRYgVxgMQZegcEqqNY3tMQ)

## Attending

* Josh Moore, Ryan Williams, Ryan Abernathey (i.e. Persephone), Alistair Miles, John Kirkham

## Agenda

* Status
  - Outreachy & Contractor status (Josh)
    - generally moving forward
  - Manager status (Josh)
    - 50% position (part-time)
    - stackoverflow
    - RA: anaconda, 2i2c, quansight ([sgkit](https://pystatgen.github.io/sgkit/latest/))
    - different skill set
    - push through pangeo? someone in Ryan's group (not currently)
* Decision making (e.g. ["rollcall" XLS](https://docs.google.com/spreadsheets/d/17kHn1ANinDjeOInNUQA2FX2MOdayJqShMRbTqQYJCyU/edit#gid=0)) (All)
  - logos: just pick
  - one pager:
    - https://docs.google.com/document/d/1uKwOpBeybNxh1daNNG-9kpYWTA9rIzSxHrZe5i1QPBM/edit
    - add C / R and get feedback from unidata
    - JM: spend money on it? difficult ask
    - JK: developers are more comfortable with C++
    - JK: convince netcdf to buy-in to C++? (extern C / linked to libstdc++/libc++)
    - RA: generally great to work with unidata
    - JM: use one pager to discuss with Ward. (going for funding)
    - RA: fsspec, numcodecs, and zarr all in C
    - AM: sensitive to what Ward wants. Feel good about the work they've put in.
  - tiledb: need to talk to stavros
    - "if you implemented Zarr as alternative underlying storage with feature parity"
    - AM: what's in it for them?
    - RA: need to think hard about this. don't know what they want out of this. what are they pitching to their investors? don't see a world where Zarr isn't a threat to their business. they sell a cloud business (closed versions of what we want all in our domains)
    - RA: kerchunk as game-changing, no transcoding.
    - RA: (*alternative*) deprecate zarr, keep API, switch backend to tiledb. delineates what the value is. tiledb performance.
    - AM: what we lose?
      - hackability/simplicity/transparency (RA: hard for interoperability)
      - the zarr playground community (openness/exploration would be diminished.
        see reference implementation in Python). Extensions were trying to get more ways for people to come in.
    - AM: say we want to protect *choice* (cloud portability). good to be able to migrate. "you can take over the world as long as people can get back to zarr".
    - JK: tiledb is focused on particular data that's not amenable to microscopy. bang their heads on it.
    - AM: "feature parity & translate between the two formats, that makes it easier for us to use"
  - zarr-format.org ([issue 585](https://github.com/zarr-developers/zarr-python/issues/585))
  - zarr.dev: https://domains.google.com/registrar/search?searchTerm=zarr.dev&utm_source=google&utm_medium=cpc&hl=en&_ga=2.41802148.783600009.1638294970-1238279762.1638294970
* Sharding
  - AM: spent a few hours looking at caterva & blosc2. Understood perfectly, but have since forgotten the reasoning, but:
    - feeling: caterva and blosc2 are a distraction. breaks the layers of abstraction. difficult to do certain things
    - would like to get value out of it and work with them, but a distraction.
    - scalableminds layout seemed sensible. want a per-array storage layout. seems like a good approach.
  - JM: is the complexity worth it?
  - AM: help to have specific use cases (for benchmarking)? drove the early stages of zarr development. packing into shards means you can cut out a lot of the back-n-forth. just intuition, not sure about it. (if there's enough interest as it seems, then)
  - RA: contiguous in space: satellite images chunked in time. fail-case is time-series from a single point. worst-case scenario.
  - AM: one use case is like that: primary access pattern and you sometimes need another; another set is where you sometimes big and sometimes large slices
  - RA: async helps with the latency. (i.e. reducing how much data is returned). Note: *must* chunk that way based on the imaging stream.
  - JM: push back from the company
  - AM: v2 or v3? JM: that's a looming issue.
  - JK: they did mention v3
  - AM: in v3 spec, there is no hook to change layout per array. there should be. important layer of abstraction to introduce. one option is the current zarr format. another would be the sharded layout. there might be other mappings. also: what's core & what's extension?
  - JM: does their PR help point to that abstraction?
  - AM: spec needs translational mapping between chunks and storage objects ("layer of indirection" but with right terminology)
  - JM: can do it with .zarray but it's a LOT of work.
  - AM: will this help push us over the line to V3? something substantial and new to offer the world.
  - RA: copy what tiledb does? AM: they store data by write operations. JM: our experience is that you need a compaction. RA: trade-off is concurrency.
  - AM: e.g. need awareness of shard boundaries. JM: think so. (could just use tiledb for writing)
* ZSC itself
  - next meeting: Josh to organize something in January
  - RW's email: January refresh is coming. Could offer a carrot to some contributor. Hailey?
    - Don't remember the process
    - Hailey as an option?
    - AM: good to look for opportunities. also haven't had time (hanging on by the skin of my teeth on the sharding conversation)
    - good to think about candidates. who should we invite?
    - JK: not a zero-sum game. people can regulate their involvement.
    - JM: just must be able to make decisions
    - JK: had an emeritus position (doesn't count if you don't vote)
    - need to codify
    - also monthly meetings?
    - RA: also guilty. lack of clarity on the expectations. forgotten/lost. there are different ways to contribute. Josh burnout is strategic risk. can add new people. no one *needs* to step down. good to have different perspectives. could do some process things to help. bringing enthusiastic people on board would help. in a similar position with pangeo. what helped was to get support. admin for 10 hrs per week. schedules meeting. updates website. etc.
    - *tl;dr* Focus on: how does the ZSC help the project?
      also more clarity on the process. and get some support (i.e. help the health of the ZSC)
    - RW: perhaps 2 orthogonal things. step-down and to add-others. whatever is best for the project.
  - last extension goes to the end of this year. would love to get that wrapped up somehow. (asked for 6 months last time) happy to not overstate what we achieved and say what did and did not work.
  - JK: should check on 5 signatories issue with numfocus
* Tabled
  * GitHub: day-to-day business
    - Contributors (no action taken)
    - Issues that need discussing (list here)
* Anything else?
  - JM: Nature Methods, "Lots of talk", Brain
* Post meeting actions (Josh)
  - [x] update manager position for part time: email sent to Nicole
  - [x] move forward with pink logo: email sent to carolyn
  - [x] update one pager with C: email sent to carolyn
  - [x] purchase zarr.dev domain name: email sent to carolyn
  - [ ] discuss v3 with scalableminds
  - [ ] discuss with Stavros
  - [ ] discuss C library with Ward
  - [ ] review governance for ZSC changes, process, expectations, emeritus position ([eg1](https://github.com/paketo-buildpacks/rfcs/blob/main/text/0025-emeritus-status.md))
  - [ ] organize meeting in January
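
## Appendix: chunk-to-shard mapping (sketch)

The Sharding item mentions a "mapping between chunks and storage objects" and RA's worst case (a time series at a single point, read from images chunked in time). The following is a minimal sketch of that idea, not the scalableminds layout and not any Zarr API. The function names `chunk_to_shard`, `chunks_in_selection`, and `count_objects`, the array shape, and the shard size are all hypothetical and chosen only for illustration. It counts distinct storage objects touched, not bytes transferred or requests issued.

```python
from itertools import product


def chunk_to_shard(chunk_index, chunks_per_shard):
    """Split a chunk index into (shard index, chunk position inside that shard)."""
    shard = tuple(c // n for c, n in zip(chunk_index, chunks_per_shard))
    within = tuple(c % n for c, n in zip(chunk_index, chunks_per_shard))
    return shard, within


def chunks_in_selection(selection, chunk_shape):
    """Return the indices of all chunks overlapping a selection.

    `selection` holds one half-open (start, stop) element range per dimension.
    """
    ranges = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(selection, chunk_shape)
    ]
    return product(*ranges)


def count_objects(selection, chunk_shape, chunks_per_shard):
    """Count (chunks touched, distinct shards touched) for a selection."""
    chunks = list(chunks_in_selection(selection, chunk_shape))
    shards = {chunk_to_shard(c, chunks_per_shard)[0] for c in chunks}
    return len(chunks), len(shards)


# Hypothetical array: 10_000 time steps of 1000 x 1000 images, one image per chunk.
chunk_shape = (1, 1000, 1000)
# Hypothetical shard size: 100 consecutive time steps packed into one storage object.
chunks_per_shard = (100, 1, 1)

# Worst case from the discussion: a time series at a single pixel.
point_series = [(0, 10_000), (500, 501), (500, 501)]
n_chunks, n_shards = count_objects(point_series, chunk_shape, chunks_per_shard)
print(n_chunks, n_shards)  # 10000 100
```

Under these assumed sizes the single-point time series touches 10000 chunks but only 100 shard objects, which is one way to read "packing into shards means you can cut out a lot of the back-n-forth". Whether that helps in practice still depends on how much of each shard has to be fetched.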