---
tags: draft
---
# CZI EOSS4 Reports
<details><summary>Josh editing</summary>
From https://chanzuckerberg.force.com/
* Zarr: a common backbone for the scalable storage of annotated tensor data (EOSS4) Due: 8/1/2022 RR-5765
* Zarr: a common backbone for the scalable storage of annotated tensor data (EOSS4) Due: 10/31/2023 RR-5766
# EOSS4 Interim Report #1 (RR-5765)
## I. Grant Overview
### Grantee Name
NumFOCUS Inc.
### Grantee Contact
Ms. Leah Silen
Executive Director
NumFOCUS Inc.
P.O. Box 90596
Austin, TX 78709
Email: leah@numfocus.org
### Key Personnel
| Name | Email | Affiliation | GitHub Handle |
| ---- | ----- | ----------- | ------------- |
| Josh Moore | j.a.moore@dundee.ac.uk | University of Dundee, Scotland | joshmoore |
| Sanket Verma | | | |
## II. Financial Overview
### Proposed Budget (FYI)
Our proposed budget of $397,625.00 consisted of:
* $150,000 for contract work
* $22,500 for contract overheads at 15%
* $165,000 for a Community Manager salary
* $28,875 for C.M. overheads at 15%
* $7,500 for C.M. travel
* $3,750 for C.M. hardware
* $20,000 for trademarking and other standardization efforts
which was funded in full.
### Budget narrative
* Changes to original budget: we expect a surplus for both the community manager and the contractors.
* Challenges to spending:
    - The hiring process ran throughout 2021; the community manager began in January 2022.
    - Contractors are on hold awaiting ZEP0001.
    - Unfortunately, open-source developers keep contributing projects:
        - xarray datatree
        - netcdf support for xarray (Deliverable A3)
* Plans for use:
    - assume extensions
    - advanced developers: sharding, IPFS
    - netcdf junior developer
    - with ZIC implemented, focus on community building (GSoC, blogs, etc.)
## III. Progress Overview
### Progress towards the deliverables
Below is a review of the most recently revised deliverables; more discussion follows:

| Deliverable | Progress | Notes |
| ----------- | -------- | ----- |
| A1. API unification | Some | Seamlessly interchangeable: data-apis, dask, numpy, napari, indexing |
| A2. FSSpec | TBD | fsspec zips, etc. |
| A3. netcdf | Some | |
| A4. xarray/multiscale | Lots? | |
| A5. sparse arrays, GSoC? | Some | |
| B1. upgrade tools | None | |
| B2. test_suite verifications | None (but GSoC) | |
| B3. OME/pangeo v3 specifications and extensions | Some | |
| B4. project maturity (DNS, website, logo, etc.) | Lots | Logo, trademarking; OGC standard |

* No progress on: A2
- Community manager
- Hiring (NumFOCUS, etc.)
- Blogs
- ZEP...
- ZSC/ZIC/ZEP
- Xarray/B-Open
### Major changes in scope or project plan
Q: Where do we include sharding? As an extension in V3, since there is lots of community support?

--> sharding here

- sharding
- scalableminds
- potential new member, seeking funding
### Key outputs and project recognition
- Community manager
- ZEP
- ZIC
- OGC
- xarray
- OSSci???
- Blog and website revamping
- Assisting with the releases
- Social media coverage
- Speaking at conferences and meet-ups
- ZEP and ZIC formation
- Zarr participation in GSoC for the first time
- V3 progress
- Community calls engagement
- Surfacing important topics for ZSC meetings
- OGC Standard
- Download numbers (PyPI, Conda, mamba)!?
-----
## TODO Tweet:
- respond to https://twitter.com/notjustmoore/status/1432795729890877441
## Original deliverables
A key feature of the Python data ecosystem is the reliance on simple but efficient primitives that follow
well-defined interfaces to make tools work seamlessly together (Cf. http://data-apis.org/). NumPy
provides an in-memory representation for tensors. Dask provides parallelization of tensor access. Xarray
provides metadata linking tensor dimensions. Zarr provides a missing feature, namely the scalable,
persistent storage for annotated hierarchies of tensors.
### A. Strengthening bridges
EOSS 1 funded Zarr to: develop a v3 of the format specification, extend support to other programming
languages, and solidify the project’s governance and operation. As a result, Zarr built bridges to several
open-source projects. We now seek to establish Zarr as a standard storage mechanism across these
communities.

Concretely, we propose the following list of independent milestones. Each has been discussed with the
referenced projects and is suitable for contracting via NumFOCUS:

A1. We will work with array-providing projects NumPy and Dask on API unification. This will free
developers to transparently choose between implementations, making algorithms more generalizable
and scalable.

A2. We will work with the fsspec community to foster the fsspec-reference-maker specification. This
effort will allow accessing non-Zarr files (HDF5, TIFF, Zip, etc.) as if they were Zarr.

A3. Similarly, we will work with the NetCDF community, a long-time provider of stable file format
solutions, to have transparent access to both Zarr and NetCDF4 (HDF5-based) files.

A4. We will work with the Xarray community to formalize the multiscale array representation that
resulted from Zarr’s EOSS 1 funding. The result will be a clear, public home for such cross-cutting
community conventions.

A5. We will work with the community to identify and implement extensions, as defined in Zarr’s EOSS 1
work. Sparse arrays, e.g. from the Awkward Array project, are a first candidate.

### B. Building community and trust
Beyond these bridges, we seek support to continue fostering users and contributors into our own
community.

B1. In addition to API stability, which is vital to growing OSS ecosystems, data formats have the additional
long-term burden of preventing data loss. With new Zarr versions, data in older formats may become less
accessible. To ensure data integrity, we will provide data producers with upgrade tools.

B2. To encourage the creation of new data in Zarr v3, we will engage with domain-specific organizations
like pangeo and the Open Microscopy Environment to define specifications which meet their annotation
needs, increasing the FAIRness of their tensor data.

B3. A primary difficulty of our EOSS 1 grant, exacerbated by the pandemic, was providing timely
feedback to paid and open-source developers on their work. We propose funding a dedicated community
manager to that end.

B4. Finally, as part of community management, a number of project maturity tasks could be addressed,
including: trademarking the Zarr logo, expanding the web presence, and shepherding the Zarr format
through certification bodies, e.g., OGC, ISO.

We hope with these activities to convince scientific communities to trust their long-term data to this
young yet powerful format.
</details>