tags: zarr, ZEPs, Meeting
# ZEPs Bi-weekly Meetings
### **Check out the website: https://zarr.dev/zeps/meetings**
Joining instructions: [https://openmicroscopy-org.zoom.us/j/82447735305?pwd=U3VXTnZBSk84T1BRNjZxaXFnZVQvZz09](https://openmicroscopy-org.zoom.us/j/82447735305?pwd=U3VXTnZBSk84T1BRNjZxaXFnZVQvZz09)
Meeting ID: 82447735305
GitHub repo: https://github.com/zarr-developers/zeps
**Attending:** Sanket Verma (SV), Ward Fisher (WF), Jeremy Maitin-Shepard (JMS)
- No updates
**Open agenda (add here 👇🏻):**
- WF: Recording a virtual talk for AGU next week
- https://discourse.pangeo.io/t/pangeo-showcase-how-to-transform-thousands-of-cmip6-datasets-to-zarr-with-pangeo-forge-and-why-we-should-never-do-this-again/3856 - CMIP6 trying to get rid of NetCDF?
- WF: Zarr as archival? - Trying to be there - much younger format - but NetCDF is a robust archival format
- SV: The interoperability between Zarr and NetCDF is a good thing here
- WF: Zarr implementation in FORTRAN?
- JMS: Are people writing FORTRAN actively?
- WF: Yes - The latest book on FORTRAN came out last year - NCZarr is supported via FORTRAN
- JMS: Zarr V3 FORTRAN would be good
- WF: NetCDF FORTRAN is used by supercomputers across the US
- SV: Why end NetCDF FORTRAN?
- WF: Selfish reasons - takes a lot of time fixing the modules
- SV: Why do supercomputers still use FORTRAN?
- WF: Supercomputers' unparalleled performance using FORTRAN is just great!
- WF: E.g. Mathworks upgrading to newer NetCDF version is a long process
- Good to go with?
- JMS: Still need to work on this - also work on the implementation of the ZEP
- Let's go ahead with [zarr-developers/zarr-specs/#276](https://github.com/zarr-developers/zarr-specs/pull/276)?
- Thoughts on [zarr-developers/zarr-specs/#205](https://github.com/zarr-developers/zarr-specs/pull/205)
- Good to go with?
**Attending:** Sanket Verma (SV), Josh Moore (JM), Ward Fisher (WF), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS)
- Sanket recently added [PR#281](https://github.com/zarr-developers/zarr-specs/pull/281) to update ZEP2 status
- Discussions around ZEP8: https://github.com/zarr-developers/zeps/pull/48
- Jeremy to update the PR
- Davis to update ZEP6
- Question to Davis: Status of completion?
**Open agenda (add here 👇🏻):**
- Cleaning of zarr-specs.readthedocs.io
- Remove 'Under construction'
- Remove 'Array storage transformers'
- Maybe rename data types to extension data types?
- JMS: [bfloat16 dtype](https://github.com/zarr-developers/zarr-specs/pull/257) should be in core spec under data type table - required and optional table separately
- NR: May need some explanation on the data type
- NR: Remove version number from the codecs and stores
- JMS: Could track down the implementations adopting the different versions of codecs/stores
- NR: No version number in metadata, so not useful
- JMS: But you'd not be allowed to change the metadata
- JM: Helps to write down this; for example a ZEP
- JMS: Implementations might not implement all the versions
- JMS: How about STAC?
- SV: STAC uses incremental versions to evolve the specification
- NR: Zarr V2 & V3 compatibility issues may arise in the future
- JM: Flip side if we're dealing with a long list of codecs - versions may help here
- JM: For example: Bumping the Blosc2 codec
- JMS: Pretty rare to change the versions
- JMS: Having a shorter voting period may help
- JM: Having a step voting period may be troublesome - experience from NGFF and OME-Zarr - strong word for the first phase
- NR: Silence in the second phase?
- JM: It'll be good!
- NR: How do we make the vote earlier?
- JM: One month for roll call - done reading, start implementing, finish implementing, also no vetoes
- JM: Graph for the progress
- JMS: Make revisions after the voting - working for now but not great
- JMS: A word for grace period: `Final revision deadline`
- NR: Finding 1-2 champions for starting a new ZEP
- JM: Having a mailing list to ask for champions
- JM: Close the ZEP proposal if it's not active for some time
- JMS: The C++ standard is much more complicated than Zarr - only some people are capable of changing the wording
- SV: If someone is not able to find a champion should they not proceed with the ZEP? - Not in favour of the champion process to be the only condition to move forward
- JM: List of endorsers and endorsements for the ZEP process
- FYI: John Bogovic may join future ZEP meeting for ZEP8 discussions
- Thoughts on [PR#276](https://github.com/zarr-developers/zarr-specs/pull/276)?
**Attending:** Josh Moore (JM), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Jonathan Striebel (JS)
- [ZEP0002](https://zarr.dev/zeps/accepted/ZEP0002.html) finally accepted! 🎉
**Open agenda (add here 👇🏻):**
- Changes to ZEP0
- JMS: Don't think there's gonna be any changes to ZEP1 now
- JM: Could be 2 voting - but how do you get everyone to implement ZEP without voting?
- SV: Yeah! Good point.
- JMS: Don't need to go overhead for small changes
- JM: `index_location` is backwards compatible and maybe the best case scenario
- JMS: But endian codec is not backward compatible
- SV: Very natural to see this scenario coming up
- JM: Architect building a building - they go through submittals - 20% increment - show the adoption percentage
- SV: How would you define percentage?
- JM: Voting could be in phases - reading phase - implementation change - grace period
- *Jonathan joins*
- JS: Voting encourages people to read the spec
- JM: Setting a common calendar for the ZIC and ZSC - can help the author
- JS: It was good to see the response when we set the deadline - also the current examples of implementations are good
- JMS: Would be great to have a table of what part of the spec each implementation currently implements
- JMS: Having a compatibility table would be great - e.g. Neuroglancer doesn't support boolean type
- JMS: Once a ZEP is accepted the spec matters
- JM: Looking at RFC for NGFF these days
- JMS: Random reviews vs the expert group reviews
- SV: If you're going to implement the spec and how you're going to implement the spec - Form a voting procedure around that
- JS: Having a process definitely helped me to finish the sharding - find it good as a contributor!
- JM: Having rebuttal would change the tone a lil' bit
- JS: Asking people to vote before implementation is fine - as you can find things in the spec
- JM: Having it defined in ZEP0 would be great!
- JS: Doesn't like the grace period - leave the door open a little
- JS: Having an implementation phase would be good
- SV: Reading notification during the voting phase - could be helpful
- JM: Keeping a table would be helpful - 5 states
- JMS: Three phases -
- reading phase and express opinions
- implementation phase and raise issues
- solve issues raised in the last phase
- JM: It's clear we're in agreement of phases/periods - intent
- JM: People want general confidence from the audience that they're moving forward
- JMS: https://chromestatus.com/feature/6213121689518080
- JMS: Getting formal feedback from the ZIC before the vote
- SV: Let ZIC know when you're going to put up the ZEP for voting
- JM: Worst case scenario - no one reads the spec and someone vetoes the ZEP in the initial stage
- JM: Roll call helps
- JS: May not need a second vote - those are implementation details
- JM: Add a different state - a pre-implementation checkpoint
- Endian codecs to bytes
- JMS: Need to resolve git conflicts
- Changes to ZEP2:
- https://github.com/zarr-developers/zarr-specs/pull/280 (add index_location)
- https://github.com/zarr-developers/zarr-specs/issues/277 (versioning)
- https://github.com/zarr-developers/zarr-specs/issues/278 (tracking intermediate buffer sizes)
- Adding v1 and v2 spec to zarr-specs
**Attending:** Sanket Verma (SV), Mark Kittisopikul (MK), Ward Fisher (WF), Davis Bennett (DB), Norman Rzepka (NR), Isaac Virshup (IV)
- ZEP0002 voting closes on 31st October - https://github.com/zarr-developers/zarr-specs/issues/254
**Open agenda (add here 👇🏻):**
- MK: Changes to ZEP0001 are still coming in - how do we handle them?
- SV: ZEP1 was provisionally accepted but not at the final stage
- NR: These changes are minor and would need voting from ZIC
- MK: Mention potential grace period in ZEP0000 would be helpful
- NR: Needs to be written out!
- MK: Zarr shards as HDF5 file
- ZEP0002 should proceed as it is atm
- Having a null codec to ignore the checksum would be helpful - can work on this
- Recent numcodecs release helps a lot - getting the Jenkins lookup3 checksum
- Relation with ZEP0002 and Kerchunk?
- NR: Don't know if there is
- IV: Zarrita can read the HDF5 file using Kerchunk
- IV: Multiple arrays in a single Zarr shard
- NR: I don't think it'll be possible
- DB: Why the chunks in the directory called C?
- NR: Helps when scanning down the groups and arrays
- DB: Any questions for ZEP0006?
- NR: Getting a JSON schema out of the ZEP0006 would be helpful
- DB: This would also help us to get a container level validation
- IV: How does it differ from consolidated metadata?
- DB: Consolidated metadata
- NR: Serializing children metadata would be helpful - could be a different ZEP
- DB: Flattening array
- DB: Pydantic-Zarr would have the reference implementation for ZOM
- ZEP0002 discussions at Unidata
- Mostly going for 'Yes' - but looking for resources who can handle and complete it
- MK: NetCDF has adopted HDF filters? Making a part of Zarr filters?
- WF: Would like to see spec support - supports interoperability - but hasn't considered it
- MK: Between N5 and Zarr we encounter difficulties for LZ4 codec
- DB: Sounds like N5 problem to me! ;)
- IV: List of HDF5 filters? MK: Yes, there's a list
- MK: But I think NCZarr supports a select few only
- WF: Yes!
- MK: List of HDF5 Registered Filters: https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins - GitHub library for plugins: https://github.com/hdfGroup/hdf5_plugins
- DB: Getting away from storing F9 transformations in metadata
- IV: Thoughts on namespacing codecs?
- NR: Would be helpful
- WF: The overhead of maintenance and administration is daunting - experience from Unidata - How would I guarantee that information would be there in 10 years?
- IV: Pointing at URI where the codec is defined - Having this in Zenodo would be helpful
- IV: ZEP0003 progress
- SV: Waiting for technical review - would help if Martin/IV could make it the ZEP meetings to raise the discussion
- IV: Sure
**Attending:** Sanket Verma (SV), Josh Moore (JM), Thomas Nicholas (TN), Jonathan Striebel (JS), Jeremy Maitin-Shepard (JMS)
**Open agenda (add here 👇🏻):**
- Introductions with favourite book
- SV: ASOF by GRR Martin
- TN: Dispossessed
- JS: The Hitchhiker's Guide to the Galaxy
- JM: Kingkiller Trilogy (Rothfuss) - can HIGHLY recommend. Beware: only 2 of the 3 books are written.
- Issues and PRs to look at:
- JS and JMS: Mostly an implementation detail
- JMS: Make store less explicit
- JS: Should not enforce the parameter; will send a PR after 2 weeks
- JMS: Will try to make it more clear
- JMS: Chris Barnes' implementation hasn't made the change yet
- SV: Send an email for this to ZIC
- JMS: https://github.com/zarr-developers/zarr-specs/pull/264
- JS: Fortran array needs to be inverted
- JMS: C and Fortran array are contiguous in different ways
- JMS: Neuroglancer and Tensorstore have 100% V3 implementation 🎉
- Working on some CI issues and will merge it
- JM: https://github.com/ome/ngff/pull/206
- TN: Cubed discussions
- Anything which increases the performance would be useful - interested in Jack's work
- How can we get variable chunking into Zarr-Python?
- SV: Needs to finalise the ZEP0003 - will go into voting soon!
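On the C vs Fortran point raised in this meeting, the difference can be made concrete by computing where an element lands in the flat buffer under each layout. The following is a minimal sketch in plain Python, not code from any Zarr implementation; the helper names `strides` and `offset` are made up for illustration. It also shows one possible reading of "inverted": a Fortran-order array of shape `(2, 3)` has the same flat layout as a C-order array of shape `(3, 2)` with the index reversed.

```python
def strides(shape, order="C"):
    """Element strides (not bytes) for a contiguous array of the given shape."""
    dims = range(len(shape))
    # C order: the last axis varies fastest. Fortran order: the first axis does.
    axes = reversed(dims) if order == "C" else dims
    result = [0] * len(shape)
    step = 1
    for axis in axes:
        result[axis] = step
        step *= shape[axis]
    return tuple(result)


def offset(index, shape, order="C"):
    """Position of `index` in the flat buffer for the given layout."""
    return sum(i * s for i, s in zip(index, strides(shape, order)))


shape = (2, 3)
c_strides = strides(shape, "C")
f_strides = strides(shape, "F")

# Same element, different flat positions depending on the layout.
c_pos = offset((1, 2), shape, "C")
f_pos = offset((1, 2), shape, "F")

# Fortran order on (2, 3) matches C order on the reversed shape and index.
same_as_transposed = all(
    offset((i, j), shape, "F") == offset((j, i), shape[::-1], "C")
    for i in range(shape[0])
    for j in range(shape[1])
)
```
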
**Attending:** Sanket Verma (SV), Josh Moore (JM), Ward Fisher (WF), Davis Bennett (DB), Jeremy Maitin-Shepard (JMS)
- ZEP0006 by Davis Bennett: https://github.com/zarr-developers/zeps/pull/46 (Zarr Object Model)
- Implementation: https://github.com/zarr-developers/zarr-python/pull/1526
- ZEP0007 by Isaac Virshup: https://github.com/zarr-developers/zeps/pull/47 (String)
- ZEP0008 by Jeremy Maitin-Shepard: https://github.com/zarr-developers/zeps/pull/48 (URL Syntax)
**Open agenda (add here 👇🏻):**
- Overview of the newly submitted ZEPs
- JMS: ZEP8 Could be generalised apart from the Zarr ecosystem - provides parameters at each specific level
- DB: Considering query strings?
- JMS: Clearly diverging from conventional syntax
- JMS: Issue with `#` syntax - interpretation will be different - a few downsides of using it - argument for using fragment identifier
- JM: Descending down the attributes in N5 land?
- DB: Query syntax like JSON - the idea was shared among and used across in N5 land
- DB: ZEP0006 (ZOM) discussions - [Talley Lambert](https://github.com/tlambert03) from Napari was looking for a JSON schema for Zarr
- JMS: JSON Schema for tensorstore
- V2: https://github.com/google/tensorstore/blob/master/tensorstore/driver/zarr/schema.yml
- V3: https://github.com/google/tensorstore/blob/master/tensorstore/driver/zarr3/schema.yml
- DB: Wise thing to move towards the standardisation of the schema
- JMS: Consolidated metadata
- JM: Engage positively with consolidated metadata and not break it
- JMS: V3 consolidated metadata not in-line with ZOM would be a bad thing :)
- DB: Could be used with HDF5 as well and can be future proof for the upcoming Zarr specifications
- SV: Zarr shards as a valid HDF5
- DB: Need to have a mechanism where both the formats can talk to each other - otherwise may lead to brittleness
- JMS: The approach is hacky atm
- Martin wants to kick-off [ZEP0003](https://zarr.dev/zeps/draft/ZEP0003.html) voting ASAP
- Prototype implementation: https://github.com/zarr-developers/zarr-python/pull/1483
- Technical review needed
- DB will have at it soon!
- JMS: Useful for Tensorstore
**Attending:** Sanket Verma (SV), Norman Rzepka (NR), Ward Fisher (WF), Jeremy Maitin-Shepard (JMS)
- ZEP0006 by Davis Bennett: https://github.com/zarr-developers/zeps/pull/46
- ZEP0007 by Isaac Virshup: https://github.com/zarr-developers/zeps/pull/47
**Open agenda (add here 👇🏻):**
- WF: NASA F.15 grant could help hosting Zarr-Con over at Unidata
- SV: Update on new ZEPs by Davis and Isaac
- NR: Overview of the ZEP0007
- WF: Character encoding addressed? - Not implemented robustly across NetCDF
- SV: Norman as co-author?
- NR: No, just left some comments
- JMS: Define a name for the codec - array to bytes - can be applied to raw data buffer
- NR: Could model it as a data type - not clear how the translation from bytes would work in a codec
- JMS: Encourage a spec PR first - make things straightforward
- SV: ZEP document and spec PR - anyone can come first - depends which is the clear and straightforward way to introduce the changes
- SV: ZEP0002
- JMS: Extremely close on tensorstore
- NR: Zarrita.js can be added to the sharding implementation in the issue review
- NR: Adding ZipStore as a ZEP
- JMS: Added read-only support for ZipStore to Tensorstore
- NR: Certain features that can be included in the ZEP - like allowing different types of hierarchy
- JMS: Various ways to use ZipStore in Zarr-Python - depends on how you want to organise your data in the Zip
- NR: Maybe more of a convention - Zip on S3: How do you access it? (URL gets funky)
- JMS: `s3://bucket/path/to/zip.zip|zip:path/to/array/|zarr3` - pipe URL - convey what's happening - `:` downsides - they're valid in a URL
- NR: Go down further and address things further in the Zarr like array
- WF: We have Zip support in NCZarr - but not a similar URL style
- NR: Microscopy folks - napari folks - Saalfeld could be potential people who could work on the Zip ZEP
- NR: Protocol for Google storage? GS or GCS
- JMS: `gs://bucket/path` - GS
- JMS: General Zip requires sequential access
- JMS: Standardizing some kind of URL would be a good thing
- NR: Getting feedback from Ward, Stephan Saalfeld would be good
- WF: HTTP post style syntax in NetCDF is supported
- WF: What would happen if we try to read non-Zarr store by Zarr?
- JMS: Looking at the metadata file and then figuring it out?
- WF: Some part of NetCDF uses HDF5 and we try to open it with Zarr and it crashed
- WF: Curious to what the failed `open()` should look like? Having a defined behaviour would be good
- JMS: Launch missiles if the data is malformed! 😄🚀
- WF: NetCDF has certain error codes for when it can't read, instead of crashing the software - should be a part of the spec
- JMS: Could be a good addition
- WF: Just curious about the crashing!
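To make the pipe URL idea from this meeting concrete, here is a minimal sketch of how the example string `s3://bucket/path/to/zip.zip|zip:path/to/array/|zarr3` could be split into layers. This is an illustration only, not the ZEP8 grammar or the parser of any implementation: the function name `split_pipe_url` is hypothetical, and escaping (for instance a literal `|` inside a path) is deliberately not handled.

```python
def split_pipe_url(url):
    """Split a pipe-delimited URL into (scheme, rest) layers, left to right."""
    layers = []
    for part in url.split("|"):
        scheme, sep, rest = part.partition(":")
        # A layer such as "zarr3" has no ":" and therefore no path component.
        layers.append((scheme, rest if sep else ""))
    return layers


layers = split_pipe_url("s3://bucket/path/to/zip.zip|zip:path/to/array/|zarr3")
for scheme, rest in layers:
    print(f"{scheme:>6}  {rest}")
```

Each layer would then be handed to whatever opens that scheme, with the result of one layer becoming the input of the next: the S3 object first, then the Zip archive inside it, then the Zarr v3 array inside the archive.
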
**Attending:** Jeremy Maitin-Shepard (JMS), Sanket Verma (SV), Josh Moore (JM)
- ZEP0004 preliminary work for review: https://github.com/zarr-developers/zarr-specs/pull/262
- Open PRs:
- JMS: Lightweight ZEP process for adding codecs and datatypes
- JM: How do you think it should look like?
- JMS: Opening a PR and get the votes from ZIC and ZSC could be it
- SV: Minimalistic ZEP for adding codec and data types
- JM: The voting process can keep going for codecs and data types without hindrance from the bigger ZEPs
- JMS: Can work on a lightweight ZEP template for codec and dtype
- JM: Certain datatype additions may be possible blockers for some domains - having fast-track ZEPs would help with that
- JMS: Would like to work on dtypes for ML use-cases
- SV: Tensorstore implementation
- JMS: V3 and Sharding implementation almost complete - working on some bugs - will be finalising soon!
**Attending:** Sanket Verma (SV), Josh Moore (JM), Jonathan Striebel (JS), Norman Rzepka (NR)
- Zarr-Python working groups
- Benchmarking and performance: https://github.com/zarr-developers/zarr-python/discussions/1479
- Refactoring: https://github.com/zarr-developers/zarr-python/discussions/1480
- POC implementation of [ZEP0003](https://zarr.dev/zeps/draft/ZEP0003.html)
- SciPy 2023 proceedings
- Talk slides: https://doi.org/10.25080/gerudo-f2bc6f59-035
- Tools update slides: https://doi.org/10.25080/gerudo-f2bc6f59-038
- ZEP0001 Contributors section: https://zarr.dev/zeps/accepted/ZEP0001.html#contributors
- JM: JS are you using Zarr at work?
- JS: We're convinced that it's a good idea to use Zarr! 😄
- Growing rapidly - good thing
- SV: Presentation for EuroSciPy 2023!
- JS: Coming up
- SV: You can also cite SciPy 2023 talks
- JM: Will be using the EuroSciPy 2022 poster for an upcoming meeting
- SV: Tweeting about the contributors for ZEP0001 and tagging everyone
- JM: Thanks to everyone! 🙏🏻
- JS: Removes the watermark from the V3 spec: https://github.com/zarr-developers/zarr-specs/pull/260
- Figured out the CSS selector
- JM: Zeiss got back to JM
- SciPy 2023 discussions
- James Webb Space Telescope - how they can use Zarr
- JS: Misc. link: https://github.com/braingram/asdf_zarr
- JS: Met with Francesc (Blosc) ?
- JM: Yeah, we spent a lot of time and it was great!
- How Blosc and Sharding can co-exist together - like a package
- SV: Sharding can provide cloud enability to Blosc2 - discussions with Francesc
- JM: Recently filled out feedback for CZI EOSS - we can do joint grants as well
- NR: Writing a Blosc2 codec for Zarr could help
- JM: Dask chunking comes down to `.chunk()` property for the object - how about data API chunking specification around the chunks? - chunking could work across the whole PyData stack - and we can add sharding too - could help with what's the efficient access pattern for sharding chunk!?
- Unified package for Zarr (Sharding), Blosc2 and Dask and other packages
- NR: Interesting! Sharding has 2 access patterns
- Chunk level for read and shard level for write
- For Dask purposes you probably want the shard access pattern - because you're in an HPC environment
- JS: Writing it as a spec and collaborating with Dask and Blosc team
- NR: Agreed!
- JS: Sharding also needs a lot of explanation - lots of education needed
- NR: Limbo state rn - blog posts and videos can help a lot - maybe after 6 months
- JM: Unifying names - block.dev? - and same documentation as well
- SV: Can include HDF5 as well
- JM: HDF5 could be beneficial if you're working on cluster/HPC and Zarr can help you bring that data down from the cloud to your machine
- NR: Can apply to EOSS grant with same applications
- JM: Less chances of getting funded
- JM: Zarr can solve world hunger! 😁
- NR: Good momentum but need to deliver as well
- JM: Zarr as a sister project of Napari!?
- SV: Having conversations regarding fundraising for Zarr to keep the project funded
- We can work on joint grant or something similar
- NR: Lightweight ZEP process for codecs?
- JM: Light voting procedure?
- NR: Could be!
- JM: A new ZEP to update ZEP0000 and add a new type of ZEP in the types and loosen the restrictions
- NR: No problem with voting but how do we set the ZEPs up for voting - anyone can do it by creating an issue but that can lead to mayhem if everyone starts doing it!
- JM: We're following a chronological order but we can have a statement which can allow lightweight ZEPs to come in while big ones are on the way
- Having separate ZEPs for codecs and extensions with less voting burden
- How do we schedule the ZEPs for voting?
- Something for https://github.com/zarr-developers/zarr-specs/pull/256
- JM: There are improvements we can make to ZEP0000 and let's keep working on that
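The two sharding access patterns mentioned in this meeting (chunk level for reads, shard level for writes) suggest that a writer would want to batch chunk updates per shard so each shard is written once. Below is a small sketch of that grouping, assuming a regular grid where every shard holds a fixed number of chunks along each axis. The function names and the `chunks_per_shard` parameter are made up for illustration and do not come from Zarr-Python or Dask.

```python
from collections import defaultdict


def shard_of(chunk_index, chunks_per_shard):
    """Grid coordinates of the shard that contains the given chunk."""
    return tuple(c // n for c, n in zip(chunk_index, chunks_per_shard))


def group_by_shard(chunk_indices, chunks_per_shard):
    """Group chunk indices by shard, so each shard can be written in one go."""
    groups = defaultdict(list)
    for idx in chunk_indices:
        groups[shard_of(idx, chunks_per_shard)].append(idx)
    return dict(groups)


# Five chunks to update, with 2 x 2 chunks per shard.
chunks = [(0, 0), (0, 1), (1, 3), (2, 2), (3, 3)]
groups = group_by_shard(chunks, chunks_per_shard=(2, 2))
```

Here five chunk updates collapse into three shard writes, while a reader can still address any single chunk inside a shard.
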
**Attending:** Sanket Verma (SV), Norman Rzepka (NR), Ward Fisher (WF), Jonathan Striebel (JS), Jeremy Maitin-Shepard (JMS)
- Send [ZEP2](https://zarr.dev/zeps/draft/ZEP0002.html) to the ZIC
- Merged https://github.com/zarr-developers/zarr-specs/pull/253
- Need to merge https://github.com/zarr-developers/zeps/pull/40 -> SV
- SV will send out the email to the ZIC
- Try to fix crosslinks
- JS to close [PR #152](https://github.com/zarr-developers/zarr-specs/pull/152)
- NR to create an issue to gather votes and update the ZEP [PR #40](https://github.com/zarr-developers/zeps/pull/40)
- SV to send an email to the ZIC after issue creation and PR merging
- Everyone looking forward to it!
- Mark wanting to organise a ZEP2 review call - but didn't happen
- Sharding implementation
- JMS working on Tensorstore implementation
- V3 implementation on Tensorstore is close to completion
- Zarrita had a noticeable overhead while running the benchmarks
- NR: Zarr-Python sharding implementation has deviated over time
- JS: Makes sense to add sharding as a codec once V3 in Zarr-Python gets in
- JMS: Is there an aim to have a similar API for V2 and V3 in Zarr-Python?
- NR: Zarrita doesn't have various stores
- JMS: Is Zarrita compatible with FSSPEC?
- NR: Yes!
- ZEP1 and V3
- Once the PRs are merged, SV to send out the FYI to the ZIC
- SV: Any developments on Unidata side?
- WF: Not yet, but it's on our roadmap
→ SciPy 2023
**Attending:** Sanket Verma (SV), Josh Moore (JM), Jonathan Striebel (JS), Ryan Abernathey (RA), Ward Fisher (WF), Daniel Jahn (DJ), Jeremy Maitin-Shepard (JMS), Norman Rzepka (NR)
- SV: ZEPs 3 and 5 ...
- RA: feedback on the ZEP
- Need to be implementing as we go, otherwise leads to stalling
- JS: Zarr-Python has tech debt which makes it difficult to implement new stuff
- *Impromptu round of introduction*
- JMS: Reference implementation in any language is helpful for any new ZEPs
- RA: Having explicit tweet/statement about implementation would help
- JS: benchmark repo? also sample data?
- NR: Sample datasets
- RA: https://github.com/zarr-developers/zeps/pull/28
- best practices going forward
- but a way to support old conventions
- from OGC, "conformance class" determined with "conformance test"
- namespacing up to the convention
- would like to get it into draft form and then we can move forward with the process
- SV: few open comments?
- RA: just merge it and move the process forward.
- still need to open a template
- NR: add existing conventions at this point (OME-NGFF)
- JM: Thoughts on multiscales?
- RA: It should be an extension
- JMS: Multiscales doesn't lead to a lot of objects
- SV: ZEP0002 review process to kick-off by next week
- NR: Zarrita has sharding implementation
- JM: Kicking-off review process and having neat benchmarks, any idea how we could do both?
- NR: Make a new issue for ZIC feedback
**Attending:** Sanket Verma (SV), Alan Watson (AM), Jeremy Maitin-Shepard (JMS), Josh Moore (JM), Norman Rzepka (NR)
- SV gave a talk on Zarr @ Vrije Universiteit Amsterdam - https://vu-nl.libcal.com/event/4030713
- Slides and code: https://github.com/MSanKeys963/presentations/tree/main/vu_amsterdam_6_23
- Are there any command line tools for Zarr?
- Found this: https://github.com/BaroudLab/zarr-tools
- JM: Was working on something on this but didn't use it much
- NR: There are a bunch of tools which you can use with OME-Zarr; having something like `h5ls` would be cool
- JMS: Operations for small things makes sense - rechunking, copying
- JM: Nextflow - workflow engine
- AM: BIL is working on games
- Pushing them to use Zarr for their image (.png) data
- AM: Interest and benefit in attending brain conferences - someone from the Zarr community
- AM: Allen is using Zarr for their work
- SV: Recent paper out of Allen: https://www.biorxiv.org/content/10.1101/2023.05.22.541700v2
- Extensive usage at Allen have revealed some problems and it may be worth addressing them
- JMS: Writing electrophysiology data in Zarr rather than TIFF is a good way to go forward
- SV: Blog post on V3 coming out soon!
- JMS: Fallback data types issue needs to be addressed
- JMS: Not clear how it'll be specified
- JMS: How do you handle it in Zarrita?
- NR: Currently, we do not!
- SV: How serious is it? Implementation or spec issue?
- JMS: Kind of ignoring it for now! We're in implementation phase now!
- JM: If everybody is ignoring it, then it's fine
- NR: Would not be straightforward to add it later!
- JMS: If implementation doesn't support, it'll fail
- SV: Have you started on the implementation?
- JMS: Tensorstore has V3 minus sharding; planning to work on it this week
- NR: Rust implementation of V3 - https://github.com/clbarnes/zarr3-rs
- NR: Benchmarking in Zarrita
- SV: Benchmarks in Tensorstore?
- JMS: Seen bottlenecks in the IO layer rather than the array layer
- NR: What about codecs?
- NR: 10x performance improvement would be great!
- NR: ZEP0002 review timeline
- SV: V3 blog post, feedback for ZEP process, and then we can add ZEP0002 in the pipeline
- NR: Ok!
- SV: Maybe we need to invite Chris and others from the ZIC to the ZEP meeting
- JM: There used to be a Zarr-Rust meeting
**Attending:** Ward Fisher (WF), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Ryan Abernathey (RA), Norman Rzepka (NR), Josh Moore (JM)
- SV: ZEP 4 → https://github.com/zarr-developers/zeps/pull/28
- RA: Still working on it
- SV: How can we help?
- RA: Need to decide if it's going to be a general convention (for bio-imaging, geospatial, genomics etc.), or a convention for the geospatial domain alone
- RA: Also need to decide if the datasets are conforming to a convention or not - lots of legacy data out there which doesn't conform to it
- RA: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html
- WF: Conventions don't need to be broad, CF conventions are based on the NetCDF model - but there's nothing in the NetCDF library or code that mentions the CF conventions!
- RA: The existing conventions are broad, and it's difficult to place cf in a specific place
- WF: Agree with you
- RA: Define what's the process to put a new convention for the community
- JMS: You have group level attribute and array level attribute?
- RA: Mostly yes!
- WF: There
- RA: Getting all conventions on the website would be a good way for cross domain and community collaboration - conventions can be composable - conventions could not be universal
- WF: Conventions move slowly - take time to adapt to new things - took a good amount of time to solve SST (Sea Surface Temperature)
- RA: Will get another draft out soon
- JMS: Not feasible to namespace an attribute?
- RA: It would require a deep re-factoring for the software we use! - It would break Xarray, GDAL, NetCDF - Zarr-Python doesn't care about the attributes
- WF: Namespacing would definitely break the NetCDF library!
- RA: JMS, how do you handle conventions?
- JMS: Generally, doesn't invent new conventions, and implement existing conventions - the data formats I invented, I defined those conventions - these existing conventions doesn't lack _certain_ things
- NR: In the process of adopting Zarr V3, currently using `OME-Key`
- WF: Attributes are strictly defined - defined to be interpretable not changeable
- JMS: Maybe the best idea is to say clearly that _X_ datasets use the conventions and multiscales
- JMS: https://github.com/zarr-developers/zarr-specs/issues/242
- NR: No need to separate them - my opinion: to keep it as it is - maybe Chris's comment is coming out of the Rust world and separating the codecs will be convenient for him
- JMS: https://github.com/zarr-developers/zarr-specs/pull/236#issuecomment-1539188066
- JM: Cares about the codec pipeline - adding some transformations in the middle would be tricky!
- NR: Feel the same! - Sharding as a single codec makes sense but adding anything would make it complicated -
- JM: Current partial codec --- define the metadata format for shard codec would be great!
- NR: Adding 2 partial codecs would make it tricky to implement
- JMS: Blosc is kind-of sharding codec - not clear if using sharding as a partial read codec is good idea!
- JMS: How do you have partial writes for the codec?
- JMS: Fallback data types
- JMS: Need to have a broader discussion
- NR: Also, do extensions data type need to have a fallback?
- JMS: No, it's optional
- NR: The definition of fallback - like a tuple - having a datatype and the fallback value
- JMS: Maybe it's the way to go!
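NR's idea of treating a fallback "like a tuple" of a data type and its fallback could be pictured roughly as follows. This is a sketch of the idea as discussed, not the spec's definition or any implementation's behaviour: the `SUPPORTED` set, the function name `resolve_data_type` and the example names `bfloat16` and `r16` are all illustrative assumptions.

```python
# Hypothetical set of data types this toy reader understands.
SUPPORTED = {"uint8", "uint16", "float32", "r16"}


def resolve_data_type(spec, supported=SUPPORTED):
    """Pick the data type to decode with from a (data_type, fallback) pair."""
    data_type, fallback = spec
    if data_type in supported:
        return data_type
    # The fallback is optional, so None means "no fallback given".
    if fallback is not None and fallback in supported:
        return fallback
    raise ValueError(
        f"unsupported data type {data_type!r} and no usable fallback"
    )


chosen = resolve_data_type(("bfloat16", "r16"))
```

When neither the data type nor the fallback is supported, the sketch raises an error, in line with the earlier remark that an implementation without support will simply fail.
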
**Attending:** Sanket Verma (SV), Josh Moore (JM), Ryan Abernathey (RA), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS)
- Discussions about climate and weather 🌡️
- [ZEP0001](https://github.com/zarr-developers/zarr-specs/issues/227) is finally accepted! 🎉
- Implementors can start working on their implementations
- Will be moving the ZEP0001 under the new `Accepted` section
- Will move it under `Final` section once we have at least one complete reference implementation
- SV - updates and next steps for ZEP0001 and V3:
- 1 year into the process... (first discussion: https://zarr.dev/blog/v3-update/)
- [gdal](https://github.com/OSGeo/gdal/pull/7706) moving quickly
- will be checking in on the various implementations
- zarrita as one of the most complete (in terms of code, not docs or tests), i.e. PoC
- JM: Reference implementation needs to be useable!
- NR: Yeah!
- RA: why it's not a complete implementation?
- NR: no optimizations (sharding, etc.). meant to be easy to read code.
- lacks features like buffer protocol, etc. but could be used.
- don't currently plan to maintain it over a long period of time.
- SV: was thinking less of end-user and more of supporting all the features so others can refer to it.
- NR: that probably could be done now.
- NR: could write an intro for people to read. (don't want to write end-user docs)
- RA: NR's production implementation? use different file format currently. must have sharding.
- considering using zarrita as the implementation (for Python stack)
- also have a scala stack (baked into software)
- JM: ZEP0002 voting and discussions
- JM: We could open up the voting for ZEP0002 and give a month/two month/full summer for voting - any open issues?
- NR: None!
- JM: Shard as recursive Zarrs? - treat internals of shards like another Zarr
- NR: Sharding being a codec would work that way but it's more of an implementation detail
- JM: What would it look like from a URI structure? - similar to what Saalfeld is doing in N5 ecosystem - if I access a chunk inside a shard and I could treat it as a Zarr array and not as a blob
- NR: Fair enough! I could have something like this in Zarrita
- NR: Not have implementation details in the spec but rather point to the reference implementation for the details
- RA: ZEP process should be more of an iterative process and not an ultimatum
- JMS: I feel, most of the implementors would be working on sharding and V3 together
- RA: flat files to virtualization (experimentation). (Once ZEP0003 lands!)
- Also want to see linked referencing in metadata (to allow browsing through HTTP). Better than consolidated. For e.g. infinite hierarchies, allows browsing like a catalog
- Allow parent to list its children in Zarr groups
- Also multiscales. Relates the arrays. Within the array directory?
- NR: listing parent/children
- NR: paths...
- RA: see "link" in https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#link-object for any node (self, root, parent, child)
- NR: Discussions on unmaintained Zarr implementations
- SV: Dissolve them with the help of the maintainers
- JM: Tricky to get a hold of them
- SV: Start deprecating them and then removing them from https://zarr.dev/implementations/
- JMS: But all of that is V2 - if there's going to be something on V3 we can certainly help them