tags: zarr, Meeting
# Zarr Bi-weekly Community Call
### **Check out the website: https://zarr.dev/community-calls/**
Joining instructions: [https://zoom.us/j/300670033 (password: 558943)](https://zoom.us/j/300670033?pwd=OFhjV0FHQmhHK2FYbGFRVnBPMVNJdz09#success)
GitHub repo: https://github.com/zarr-developers/community-calls
Previous notes: https://j.mp/zarr-community-1
**Attending:** Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Ahmet Can Solak (AS), Sanket & the Do-a-thon, Martin Durant (MD), Davis Bennett (DB)
**TL;DR:** With a new attendee joining from the CZI Open Science Summit, we took a deep dive into the best way to capture data directly from microscopes, comparing the pros and cons of Zarr/HDF5/Zip and more. Additionally, we worked through remotely visualizing a Zarr array created on a cluster from a Jupyter notebook.
- Sanket at CZI/NumFOCUS Summits
- Coming to San Fran next week, lunch!
**Open Agenda (add here 👇🏻):**
- Ahmet: BioHub
- Collaborators interested in Java implementation
- Need a good implementation
- ImageJ / BDV (folks at Janelia)
- V3: collaborators to help read it
- JMS: explicit opt-in for V3 (need to know _a priori_)
- Though auto-detection could be added
- neuroglancer likely has a stronger case for auto-detection
- AS: happy tensorstore users. Thanks a lot! :star:
- resize manually? more internal with a skinnier API
- JMS: assume things within old bounds are old?
- AS: perhaps request chunks (from last savepoint) more compute heavy
- keyword argument?
- MD: "don't bother writing where there's no new data"
- JM: see related https://github.com/zarr-developers/zarr-python/issues/1017
- JMS/MD: use selection to fill in the new bits
- AS: `append()` is only for one axis. This might be for arbitrary axes.
- perhaps `append_chunks()`
- use case
- instruments generating lots of data quickly.
- don't want to resize if not necessary. with fewer methods if possible.
- most efficient way?
- of course, better to know exact size.
- MD: just have the size much larger and have missing chunks?
- AS: only if we know when the biologists will stop
- Clarification: doesn't write the empty chunks
- MD: do edge chunks need special handling?
- JMS: no. always write the full chunk.
- (not in N5, and didn't implement in tensorstore)
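A minimal sketch (zarr-python v2 API; path, shapes, and dtype are made up) of the resize-then-fill pattern discussed above, growing only the time axis and writing only the new region:

```python
import numpy as np
import zarr

# Create a growable acquisition array; only the first (time) axis grows.
arr = zarr.open(
    "acquisition.zarr",
    mode="w",
    shape=(0, 512, 512),
    chunks=(1, 512, 512),
    dtype="u2",
)

def add_timepoint(frame: np.ndarray) -> None:
    """Grow the time axis by one and write only the newly added region."""
    t = arr.shape[0]
    arr.resize(t + 1, *arr.shape[1:])  # metadata-only change; no chunks rewritten
    arr[t] = frame                     # only the new chunks are written

add_timepoint(np.zeros((512, 512), dtype="u2"))
```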
- DB: wouldn't suggest having everything in one array
- 1 array per timepoint (doesn't work for NGFF)
- growable arrays
- or use HDF5 for the acquisition
- AS: why? faster than zarr-python. but tensorstore? Don't know.
- JM: let's do that benchmark
- DB: Windows doesn't like lots of small files
- MD: could write Zarr into Zip with no compression (basically what HDF5 does)
- DB: save data in the way that's most effective for the acquisition
- Zarr as a great format after that
- AS: that's what we were doing previously. but additional time adds up. people want the results faster. was asked to add ZarrWriter in the acquisition package. Can then easily transfer to data storage.
- DB: easier to transfer than HDF5? No, than the raw files. Compression is a benefit.
- AS: set chunk size bigger rather than using HDF5
- JM: per camera. but can't compress chunks.
- HDF5 can compress in parallel but not write in parallel
- JMS: eventually all use cases of HDF5 but not there yet
- granularity at which you can read and write
- AS: re-chunking is faster than converting camera offline
- AS: with two cameras we don't try to write to the same array with both, but to multiple places
- JM: zip support in tensorstore? JMS: not yet
- JMS: also thought about LMDB. single file. pretty efficient.
- zip e.g. doesn't support deleting.
- also only has one directory structure
- MD: HDF also has that problem.
- DB: re-writing isn't a problem for acquisition.
- JMS: do need to checkpoint the zip directory periodically.
- AS: saving single-array per timepoint, then zip might work quite well.
- converting to zip zarr showed somewhat worse performance. not sure where.
- MD: make sure the zip isn't compressed.
- JM: need Zip spec
- DB: would love to hear where this goes
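A minimal sketch of the uncompressed-zip idea above (zarr-python's `ZipStore`; path and shapes are made up) — the zip is only a container, so chunk compression stays with the Zarr codec:

```python
import zipfile
import zarr
from zarr.storage import ZipStore

# Store the whole hierarchy in a single uncompressed zip file.
store = ZipStore("acquisition.zip", mode="w", compression=zipfile.ZIP_STORED)
root = zarr.group(store=store)
frames = root.create_dataset(
    "frames", shape=(10, 512, 512), chunks=(1, 512, 512), dtype="u2"
)
frames[0] = 1
store.close()  # flushes the zip central directory
```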
- MD: **inverse problem**
- massive HDF5 files in tar file on S3 for the purpose of multi-file dataset
- desire to distribute them as individual files
- 20G tar containing HDF5
- Kerchunk's job was to point to these files within the tar
- or "find all the chunks in all of the files"
- works nicely!
- fetches are short but there are many of them.
- had to download it (for scanning) but don't want users to have to do that.
- i.e. if you push for a single file, perhaps you can get the best of both worlds.
- DB: lambda function? probably. (but this was custom S3)
- JM: need Java implementation of Kerchunk (for BDV)
- DB: generate from json-schema
- AS: with kerchunk can you point to your data centers...
- MD: each chunk is a key but is a URL
- JM: `"chunk-name"URL, offset, length)`
- JMS: can get the correct endpoint for a chunk
- add s3 syntax
- IPFS, mutable hashes, ...
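A hedged sketch of what such a kerchunk-style reference set looks like when read through fsspec's `reference` filesystem; the bucket, offsets, and lengths below are invented for illustration:

```python
import fsspec
import zarr

# Each chunk key maps to (URL, offset, length) inside some remote file.
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "data/.zarray": (
        '{"zarr_format": 2, "shape": [2, 10], "chunks": [1, 10], "dtype": "<f8",'
        ' "compressor": null, "filters": null, "fill_value": null, "order": "C"}'
    ),
    "data/0.0": ["s3://bucket/archive.tar", 512, 80],
    "data/1.0": ["s3://bucket/archive.tar", 4608, 80],
}

fs = fsspec.filesystem("reference", fo=refs, remote_protocol="s3",
                       remote_options={"anon": True})
arr = zarr.open(fs.get_mapper(""), mode="r")["data"]
```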
- DB: interesting workflow. any help?
- couldn't get napari on cluster over VDI
- transforming images and saving them as zarr.
- starting static server and pointing neuroglancer at it.
- would prefer to do things programmatically in neuroglancer and it spits out a URL
- also convenient to have static file server as background process from main python (notebook)
- JMS: definitely convenient and it's "just a web server"
- DB: don't save that to disk? dask arrays in memory?
- JMS: neuroglancer-py does have a way to share numpy array or tensorstore object
- Socket based? Internally starting a web server.
- DB: and if it gets updated? does it block? No, background thread
- There is a method to invalidate the cache.
- Python API for making URLs? Yes.
- Could be attractive to people (Janelia) for when computing on the cluster
- JM: See also Wei's imjoy-rpc for the usability
- JMS: works as iframe in jupyter now (DB: desirable)
- JMS: possibly using jupyter protocols would work around firewall
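A minimal sketch of the background static-file-server approach mentioned above (standard library only; the path and port are placeholders, and pointing neuroglancer at it would additionally require CORS headers):

```python
import functools
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Serve a Zarr directory from a background thread in the same notebook process.
handler = functools.partial(SimpleHTTPRequestHandler, directory="/path/to/data.zarr")
server = HTTPServer(("127.0.0.1", 8000), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
print("serving at http://127.0.0.1:8000/")
```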
**Attending:** Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS), Hailey Johnson (HJ), Brianna Pagán (BP), Isaac Virshup (IV)
**TL;DR:** *Will be completed after the meeting.*
- ZEP meetings will take place bi-weekly on Thursdays @ `21:30 IST/18:00 CEST/17:00 BST/12:00 EDT/9:00 PDT`
- Instructions: https://zarr.dev/community-calls/
- More focused on the spec than these meetings
- Check out our new illustrations here: https://github.com/zarr-developers/zarr-illustrations-falk-2022
- More ideas welcome!
- `copy` button for code snippets in Zarr documentation, check here: https://zarr--1124.org.readthedocs.build/en/1124/ @ Altay Sansal [#PR1124](https://github.com/zarr-developers/zarr-python/pull/1124)
- Approaching end of GSOC (12th of Sep)
- Looking to participate in Outreachy (https://outreachy.org/)
- New potential users & developers
**Open Agenda (add here 👇🏻):**
- JMS: plan for resolving v3 spec?
- SV: more on this tomorrow but some progress looking at open issues
- Upcoming work
- JM: proposal to have ZEP0001 moved to a "provisional ZEP" state (only blockers allowed)
- JMS: idea is no spec discussion at this meeting? SV: no, but we'll communicate back and forth
- SV: updates on Brianna/Hailiang's ZEP? BP: Not yet. SV: also welcome to join tomorrow
- IV: zarrR?
- JM: idea of having a hierarchy that builds a virtual n-dim array
- DB: adds brittleness. would say no.
- IV: kind of like kerchunk but with more indirection
- JMS: sometimes have use cases.
- stacks of images that you want to view as an array, or multiple images acquired separately.
- Do have stack driver in tensorstore (with specified origins. No stored representation)
- DB: similar problem when acquired in HDF. wrote own layer.
- JMS: should perhaps be a layer higher than zarr.
- DB: for bioimaging, if your app depends on this then you can only open HDF and Zarr and not other stuff.
- doesn't need to be compiled code. API problem.
- NaN/inf/other special values in user-defined attributes: https://github.com/zarr-developers/zarr-specs/issues/141
- zarr-python supports by encoding in non-JSON-compliant way
- DB: nothing that can be stored as data should be impossible to store as an attribute
- DH: was dealing with this recently. found "NaN" (quoted string) in existing datasets, expecting it to be treated as such. added support to nczarr (as well as unquoted versions)
- JM: will likely need a deprecation/warning/error cycle (royal pain)
- IV: keep JSON and use them as special values? nice that it is all just JSON.
- DH: nczarr (netcdf API) got ahead of this because typing is stored for attributes ("double"). possibility for v3?
- JMS: good point. perhaps decide the model for attributes in v3 (i.e. proposed change to v3 spec)
- JM: will need an upgrade path
- DB: haven't seen untyped attributes, but just that JSON is missing values
- DH: so extend constants that are definable
- JM: BSON?
- IV: there are also things that can't be encoded in zarr
- DH: one problem with extending JSON is that there are JSON parsers in C code that would choke
- JMS: zarr-python has essentially already done that
- **Enumerating options:**
1. extend JSON parser (generally :-1:)
2. support existing JSON-variants (BSON) (generally :-1:)
3. encode objects in JSON
4. add type information somewhere else (like .nczarr)
- JM: (2) might be a metadata-driver like separate chunk-stores
- DH: if it's binary, then you need a good spec. and need to show equivalence between binary version and JSON.
- JMS: you might be writing the non-JSON attribute late in the process, which would cause problems
- DH: binary could help with speed since string level parsing is expensive
- DB: always thought of the metadata as the stuff you want to read with an editor and you don't want performance issues
- DH: have seen a number of examples of NC-4 files that are enormous (10s of MB of metadata)
- also abusing grouping for "namespaces" (even if not a good idea)
- IV: is this Zarr's responsibility? cf. Pydantic which can turn your values into something else. (i.e. external schema)
- DB: but Zarr is responsible for storing "fill_value"
- IV: that's .zarray rather than .zattrs
- DB: would assume that the `.attrs` property takes care of encoding/decoding
- JMS: would see saying ".zattrs supports JSON + these encodings"
- IV: do all the languages support this?
- JMS: `Array[UInt8Array]`
- SV: an extension?
- JMS: could fail on invalid JSON now and then add encoding/decoding later (since there's already the issue with V2)
- IV: Arrow requires everything to be an arrow type (everything else is string with encoding)
- DH: did that in netcdf-4
- DB: sqlite is the same way
- DH: include numpy with json type (from string)
- IV: almost done with PR on awkward arrays (using this). depends on the JSONs
- JMS: would make sense to standardize that (decide: pure JSON or extended JSON)
- IV: see https://github.com/scverse/anndata/pull/569
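A small illustration of the problem and of option 3 above ("encode objects in JSON"); the attribute names are made up:

```python
import json
import math

attrs = {"offset": float("nan"), "scale": float("inf")}

# Python's json module emits NaN/Infinity tokens by default; they are not
# valid JSON, so strict parsers (e.g. in C or JavaScript) reject them.
print(json.dumps(attrs))               # {"offset": NaN, "scale": Infinity}
# json.dumps(attrs, allow_nan=False)   # would raise ValueError

def encode(value):
    """Replace non-finite floats with sentinel strings; reverse on read."""
    if isinstance(value, float) and not math.isfinite(value):
        if math.isnan(value):
            return "NaN"
        return "Infinity" if value > 0 else "-Infinity"
    return value

print(json.dumps({k: encode(v) for k, v in attrs.items()}))
```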
- SV: heading to California next week for NumFOCUS & CZI summit (also NJ & NYC)
**Attending**: Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Davis Bennett (DB), Eric Perlman (EP), Ward Fisher (WF), Martin Durant (MD), Hailiang Zhang (HZ), Ryan Abernathey (RA), John A. Kirkham (JK)
**TL;DR:** The Zarr community discussed two open PRs in the zarr-specs repo: [Support a list of `codecs` in place of `compressor`](https://github.com/zarr-developers/zarr-specs/pull/153) and [Change data type names and endianness](https://github.com/zarr-developers/zarr-specs/pull/155). The discussion was extensive, covering many good points, and the community overall favours merging both of these PRs. After this, Hailiang Zhang from NASA Goddard asked a few questions about the ZEP extension he and his colleagues are working on. They are making progress on the ZEP and will submit a draft in the upcoming weeks.
Finally, there was a discussion on working on and finishing the pending ZEP1. John A. Kirkham proposed an idea that everyone was in favour of. Also, the community would like to step up and help in the completion of ZEPs whenever and wherever needed.
- Zarr is attending CZI and [NumFOCUS Summit 2022](https://numfocus.org/2022-project-summit-openmbee), if you're there feel free to say Hi! 👋🏻
- [Jonathan Striebel](https://github.com/jstriebel) is presenting a poster on Zarr @ [EuroScipy 2022](https://www.euroscipy.org/2022/) next week. If you're attending the conference, please say hi 👋🏻!
- Final decision on [ZEP1](https://github.com/zarr-developers/zarr-specs/pull/149) most probably next week. Please leave your feedback now!
- Suggestions for themes/tech stack for website revamping
- JMS: https://github.com/zarr-developers/zarr-specs/pull/153
- Does anyone have any feedback on this?
- RA: Hard to change across all the specs - changing filters is possible as it deals with NP arrays - and compressors don’t do that
- DB: we can look at unified API for these type of changes
- MD: Codec should take context - and all the info like position, size of array and this could potentially solve the problem - by the time you compress the array - the codec could do and know what you’ve told
- JMS: each codec should have bytes as output
- DB: for numcodecs this means promoting/changing the function signature - no reason this could not be done
- MD: it says buffer and it can be amended
- MD: it can tell you where you are in the array - chop things where it is specified - For e.g. I want this key because it is this key in this chunk - also biased because worked on storage and helps me out
- JMS: first you read the chunk and then data is read - codec wants to make partial read - then codec could decide what to do from there
- MD: blosc takes care of this - codec will itself won’t interact with the storage layer - use case - kerchunk example - netcdf file - needs to know what file and size we are - doesn’t need to think about storage layer
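For context, a minimal sketch of how v2 splits the two concepts that the PR proposes to unify into one ordered `codecs` list; the codec choices below are illustrative:

```python
import numpy as np
import zarr
from numcodecs import Blosc, Delta

# v2: filters (array -> array) and a single compressor (bytes -> bytes)
# are configured separately.
arr = zarr.create(
    shape=(100, 100),
    chunks=(10, 10),
    dtype="i4",
    filters=[Delta(dtype="i4")],
    compressor=Blosc(cname="zstd", clevel=5),
)
arr[:] = np.arange(100 * 100, dtype="i4").reshape(100, 100)
```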
- JMS: https://github.com/zarr-developers/zarr-specs/pull/155
- MD: Seems good to me 👍🏻
- JMS: rename the data types and keep the endianness - minimal change - using different names makes sense - you can add other names - and it makes sense to use more conventional names
- DB: in favour 👍🏻
- JMS: if big endian array - this change will return array with big endian
- RA: lil’ experience with this - ocean model gives big endian data - sometimes they don’t work - never wanted to have those types - just accepted because it was there - trade off: computational cost and cannot convert it on the fly
- MD: if you can do it in places, temporary duplication of the memory can be done - all astropy data is big endian
- RA: row major vs. column major - something which Zarr should take care of 👀
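A small numpy illustration of the endianness point: today the dtype string carries byte order, while the PR's logical names would leave byte order to a codec and hand users native-endian data:

```python
import numpy as np

stored = np.arange(4, dtype=">i2")                       # big-endian int16 as encoded
native = stored.astype(stored.dtype.newbyteorder("="))   # what most users want back
print(stored.dtype.str, native.dtype.str)                # '>i2' '<i2' on little-endian hosts
```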
- HZ: Ryan and HZ’s colleagues (Brianna and Mahabal from NASA Goddard) had a discussion in a meeting - a proposal to build chunk-level statistics for performance - the idea: each chunk will have some statistics to describe its characteristics and how it is performing
- *Planning to submit an extension ZEP soon!* 🙌🏻
- Not a single value - will be along certain dimensions - allows it to be a vector instead of a scalar - along which dimensions should it be done!?
- It’s been a month - what’s the timeline for the release of the next major version of the spec? - No timeline yet, we're still working on it!
- Statistics need to have some knowledge - https://github.com/zarr-developers/zarr-specs/issues/73 - this is helpful for us - couldn’t find this in V3 - did I miss something about this?
- Dimensions could be lat. long. time - the statistics - the dimensions could be switched
- MD: adding a codec could solve this!
- Thank you!
- JK: Get Alistair to finish ZEP1!
- Use comments and make an individual PR for those comments - Nice idea!
- MD: Different uses and perspectives - move towards a same goal - other groups have structured issues
- MD: make a list of things we can include and solve them
- JK: break larger problems into simpler ones and then solve them!
- Discussion on community to take charge for ZEPs
- Everybody seems to be in favour and ready to step up
- Hoping to close ZEP1 soon and move forward!
Open Agenda (add below 👇🏻):
- Zarr v3 spec open issues:
- by JMS
- fill value required: https://github.com/zarr-developers/zarr-specs/pull/145
- C order vs F order vs arbitrary order
- Rename of array metadata files
- Dimension labels
- NaN/inf/other special values in user-defined attributes: https://github.com/zarr-developers/zarr-specs/issues/141
- Storage transformations: sharding, consolidated metadata
- Can storage transformers operate on the entire store or just arrays? https://github.com/zarr-developers/zarr-specs/pull/149#pullrequestreview-1078722828
**Attending**: Sanket Verma (SV), Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Alex Merose (AM), Brianna Pagán (BP), Hailey Johnson (HJ), Hailiang Zhang (HZ), Jonathan Striebel (JS), Martin Durant (MD), Norman Rzepka (NR), Shivank Chaudhary (SC), Ward Fisher (WF), Mahabal Hegde (MH), John Kirkham (JK), Isaac Virshup (IV)
- Favorite sport!
- Feel free to add links to your work here
- [ZEP1](https://github.com/zarr-developers/zarr-specs/pull/149) & [ZEP2](https://github.com/zarr-developers/zarr-specs/pull/152) are open for feedback!
- Review under https://zarr.dev/zeps/draft_zeps/
- Comments on https://github.com/zarr-developers/zarr-specs/pulls
- Browse https://zarr.dev/community-calls/ for previous meetings notes
- [Jonathan Striebel](https://github.com/jstriebel) is presenting a poster on Zarr @ [EuroScipy 2022](https://www.euroscipy.org/2022/). If you're attending the conference, please say hi 👋🏻!
- 2.13.0a1 releasing soon with updates from [Davis](https://github.com/zarr-developers/zarr-python/pull/1094), [Mads R.B.](https://github.com/zarr-developers/zarr-python/pull/934) and [Jonathan](https://github.com/zarr-developers/zarr-python/pull/1096)
- Josh: 2.13 next alphas
- Phase 1 of GSoC 2022 completed! 🎉 Check progress [here](https://alt-shivam.github.io/Codecs-Registry/)
- JMS: would be good to have specification (json-schema?) for each
- Goal: have clients interact with the registry to give users info/feedback
- async zarr https://github.com/martindurant/async-zarr
- anaconda hackweek per quarter (2-day hack)
- for discussion https://gitter.im/zarr-developers/community?at=62f3ed24d020d223d36587d5
- http only and other simplifications
- JMS: targeting runtimes outside of the browser?
- AM: cloudflare worker? WebAssembly support. (MD: already in pyodide)
- lightweight VMs (e.g., for security)
- IV: story for downstream library developer to use? rewrite to use await
- MD: definitely must use await (can't go in and out of the event loop)
- use case: first chunk of several arrays
- MD: e.g. how would xarray use it
- MD: some things already work: bokeh, etc.
- ...add here ...
- AM: Question about the Zarr Spec v3 (ZEP1)
- Like that it's very bare bones
- Thought experiment: Could a video codec be implemented?
- Compress across time; key frames
- JMS: 3d xyt chunks would work (individually)
- MD: variable length chunks. Critical to video compression (per key frame)
- MD: if each chunk has all the time points
- MD: but also in favor of variable length chunks
- JMS: what's the connection?
- MD: video compression supports a large range of chunk sizes based on how quickly the video is changing
- JMS: just make time chunking big enough and internal is a detail
- JS: remove chunking across the time dimension
- currently inefficient, but with partial reads it could work
- let video codec request chunks of data from the store
- MD: internal to codec or explicit at the zarr level
- JM: difference in the fundamental model? (atomicity)
- MD: 1-dimensional delta codec, make it across an arbitrary dimension?
- AM: if Zarr intends to be the metadata format, this is a stress test.
- JMS: with fixed key rate,
- JM: see also https://mpeg-g.org/
- HZ: extension proposal
- implementation for multidimensional data analysis
- introducing auxiliary datasets in reduced datasets (non-scalar, accumulation value)
- helps to speed up computation. Ryan A. suggested a spec extension
- MH: averaging over time or spatial extensions
- JM: cf. https://github.com/zarr-developers/zarr-specs/issues/50
- IV: perhaps like transforms https://github.com/ome/ngff/issues/101
- JS: difference of whether or not it leads to an additional array
- IV: Non-uniform chunks – timing?
- conversation with JK at SciPy
- broad desire to have them exist. any objections?
- has several master's students to put on this
- JM: ZIC?
- IV: can discuss if in spec or as an extension
- JS: would still have a formal spec even if an extension (eases adoption, clear interface)
- IV: ZEP0001 timeline?
- SV: on me. working with Alistair to apply the modifications. ASAP.
- JMS: meta-issue for scheduling time to work on the V3 spec
- way to speed up progress? additional meetings?
- SV: ZIC meeting?
- JM: editorial meeting? Add JMS?
- JS: happy to be in discussions with AM but also open issues that need discussion
Attending: Josh Moore (JM), Brianna Pagán (BP), Ryan Abernathey (RA), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS), Greg Lee (GL), Trevor Manz (TM), Ward Fisher (WF), Davis Bennett (DB), Matt McCormick (MM), John Kirkham (JK), Parth Tripathi (PT)
- RA: ongoing review process :tada:
- JMS: long-list. perhaps we should just go through them
- NR: higher-level -- status of the extensions? going in? ZEP0002, 3, 4...
- RA: sets ground work for extensions. like the idea of keeping them narrow in scope
- NR: worried about a ZEP a month and the happiness of the ZIC. Perhaps batching them?
- JMS: review sharding as part of ZEP0001 since it was the motivation for many to have gotten involved. main benefit of V3
- MM: on sharding, would like to look towards the future (i.e. not necessarily finalization) to get it adopted across the implementations
- JM: not a lot of movement (speaking for others). definitely implementation needs work.
- NR: Jonathan is in stand-by waiting for decision. Could be ZEP0002. (He has a conflict at this time)
- MM: Great comments https://github.com/thewtex/shardedstore/issues/17
- Using sharding in a general way for simplicity, incl. with different stores.
- Looking to go through this in practice for large scale data.
- See working prototype. Pretty efficient. Works with v2 as well out of the box
- DB: understanding sharding as introducing an abstraction between the array and the store. Will that generalize to all non-sharded stores? (No-op shard?)
- NR: yes. Store shouldn't need to know about the storage transformers. Partial reads are helpful but not required.
- NR: at specification level (i.e. not just zarr-python) need to know how it will look like on disk.
- MM: could see trying to get ZEP0001 out. (**Proposal?**)
- but also: yes you can shard arrays, but what about groups (as additional need for the spec)
- useful for content-addressable/verifiable storage
- unrelated to all of the hierarchical formats
- separate shards per scale along with the related metadata (same for xarray)
- JMS: in the interest of getting ZEP0001, perhaps we hold off on sharding. as a delta to the
- **tl;dr** --
- ZEP0001: focus on getting current work done but include storage transformer (:+1:)
- ZEP0002: Jonathan to start ~next week (making necessary adjustments to ZEP0001)
- MM to comment on PR or open alternative proposal
- then in that same batch or ZEP0003 definition of extensions
- RA: process -- inventing it as we go. Bootstrapping so there will likely be a lot of feedback on how things work, but try to use that structure for the moment.
- JMS: on the outstanding issues
- fill values: consensus that it must be specified?
- JM: replaces DB's smart fill value logic?
- DB: clients can have a mapping or a callable, but it wasn't easy to make it work with the semantics (in the zarr-python)
- DB: easier if we make it required. gets past fundamental ambiguity
- JM: the upgrade scripts will need to be aware of this too (EOSS4)
- DB: 0 as sane default for uint8? etc. etc.
- consensus: require fill value be specified
- case sensitivity
- v3 says "case sensitive", reasonable except on e.g. OSX.
- Add a note?
- Alternative of escaping? (add-on)
- WF: file-system rather than OS bound (despite tight correlation per platform). NC doesn't assume the technical debt of working around file system limitations. Ergo? User consideration, not something that can be fixed technically.
- path structure
- see previous meetings
- (A) removing "meta/" as nicer paths to metadata files
- Con: doesn't work for the consolidated metadata path
- JM: Workaround with "symlinks"
- (B) require suffix on the root (".zarr")
- (C) syntax for combining path and key: `path//key`, `path#key`, etc.
- JM: recently ran into a need/use-case for something like (A). import into OMERO, easier to work with the metadata as the main hierarchy.
- RA: good to think about having kerchunk style references encodable in Zarr
- discussions at scipy
- vibe, "why a new format?"
- needed the ability to have references to blocks in other files
- MM: "composite store" (like a sharded store, could also add in kerchunking possibly)
- adds in layer of indirection, doing indexing. tells you what's present.
- would need to be more well-supported than consolidated metadata.
- JM: how does it differ? MM: more flexibility in how it is broken up
- MM: very large dataset and doing analysis on one part of the dataset then that can be updated independently.
- JM: similar to Dennis' consolidation per hierarchy level. Yeah, like an octree.
- NR: but doesn't solve path
- JMS: if consolidated is a concern, then (A) won't work
- JM: plus symlink should work.
- JMS: proposing to drop "meta".
- RA: currently fsspec handles it. could be a formal URL scheme.
- JMS: some issues with URLs if you're opening RW
- RA: could see only RO to begin with.
- root directory
- reasoning is having a non-empty name
- JMS: just have ".zarr" as the name? best to skip that. (hidden files were an issue for V2)
- RA: talked about having the root document be a json file.
- JMS: special name?
- difference between .zgroup and .zarray
- leads to potential race condition
- DB: used to attributes.json from n5 and have never had an issue with it
- NR: easier if it's one name to not need to do two look ups
- JMS: currently include types `<i2`, etc. more logical to say data types are logical (16-bit-signed-integers)
- JMS: make it a codec issue to deal with endianness since it only matters for raw encoding
- NR: need to specify it somewhere, even if just in the codec. blosc would need to know (anything byte based)
- JMS: filter rather than a data type
- NR: downside of having it in the datatype? codec could ignore it. (JMS: happens at the moment)
- JMS: numpy's endianness is a bit unusual. often you want to just use it in the native endianness.
- (Lots of nodding from Trevor)
- JMS: main benefit is to always give the user native endianness
- people were happy to have them (yes?)
- MM: boolean as 1 bit or byte? JK: one byte. no bitpacking. (That could be a codec)
- vector bool as an example of over-optimized
- rawcodec (DB)
- never need to say "None"
- "raw"? intuitive?
- "identifiy", "noop", "dummy", "pass-through"
- JMS: similar to endianness. combining 2 things in codec. codec gets an array and not a stream of bytes. could arguably be **split**
- DB: separate configuration for each?
- JK: similar to filter vs. codec, not well spelled out in the spec. See Categorize for an example.
- JM: would make the choice to avoid compression explosion (e.g. for images)
- JK: there's already a meteorological compressor...
- JMS: linear chain of filter with a codec has issues
- current way to do it would be to encode a byte stream and use a compressor
- perhaps want separate compressors for different parts
- could the filter itself have additional filters/compressors for the labelled data vs. the indices
- JK: use cases? JMS: variable length strings, multisets, downsampling segmentations (similar to large number of categories)
- JMS: should be easy to fit it in now. have a tree and the filter becomes the codec.
- DB: filter vs codec? why a tree rather than an array?
- JK: original use case of filter is categorize `(RGB) -> (012)`
- filter as a transformation (on ndarray)
- DB: different type signatures?
- JMS: effectively not different in V2
- JK: mostly a terminology thing
- DB: "pipeline"
- TM: is "raw" just an empty list?
- JK: look at how parquet does it? (Ask Martin perhaps)
- TM: one pipeline with inputs/outputs for each codec then you could encode numpy/bytes as desired and confirm that it's valid
- JMS: one codec location to an array? (nodding)
- JK: do we have chained codec use cases
- DB: someone at Janelia was working on that for segmentation of volumes
- similar to categorical
- see related paper. "gzip on top"
- JMS: similar to something in neuroglancer
- TM: bitshuffle/gzip for kerchunking? (to read HDF5 file)
- DB: semantics come from HDF5
* Cancelled for SciPy
Attending: Davis Bennett (DB), Sanket Verma (SV), Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Parth Tripathi (PT), Ward Fisher (WF), Hailey Johnson (HJ), Shivank Chaudhary (SC), Ryan Abernathey (RA) +30 min
- [Zarr-Python 2.12.0](https://github.com/zarr-developers/zarr-python/pull/1038) has been released! 🎉
- ZEP1: https://github.com/zarr-developers/zeps/pull/1
- Authored by Alistair and Jonathan
- includes details on sharding & transformers
- addresses pain points & lack of clarity in v2
- Alistair to open spec changes against zarr-specs repo
- see https://zarr.dev/zeps for these changes
- comment on PR as desired
- otherwise, merging very soon
- further discussion to take place on the zarr-specs PR
- Briefly (Josh), NFDI recommended for funding :tada:
- JMS spec discussions
- NB: right forum? JM: just need to communicate thoughts back on the PR since there is no requirement to be at the community calls
- Dimension labels
- there seemed to be interest in writing it up as a spec
- requirement that they are unique strings OR the empty string to say that they are unlabeled
- DB: motivation for unlabeled? Currently all are unlabeled. DB: disagree they are all labeled with integers.
- JMS: then strings are optional/additional alternatives.
- DB: see it leading to issues. potentially: "if you add labels then you must add all"
- JMS: case of automating inputs to outputs could lead to inventing fake labels but perhaps that's preferable to empty
- DB: drawback from type theory is that you want the unlabeled case to be a different type. JMS: Use Null? disallow `""` anyway?
- WF: dimensions are label and id parent? or conflating NC/H5
- JMS: was just thinking within a given array. goal would be to not need to know it's "dimension"
- DB: could see having arrays logically identical with different dimension ordering. want to enable use of, e.g., `"z"`
- WF: "dim_$N" gets assigned automatically.
- JM: need for buy-in from xarray and nczarr
- JM: in .zattrs? .zarray? JMS: don't really mind.
- DB: err on the side of having zarrs more like numpy arrays
- JM: names in numpy are part of the dtype
- DB: backwards-compatible way to specify the defaults if they don't exist
- JMS: and added to the zarr-python library? Yes.
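Until dimension names land in the core metadata, a minimal sketch of the existing xarray Zarr convention, which records them in the `_ARRAY_DIMENSIONS` attribute (path, shape, and names are made up):

```python
import zarr

z = zarr.open("labelled.zarr", mode="w", shape=(5, 256, 256),
              chunks=(1, 256, 256), dtype="u2")
z.attrs["_ARRAY_DIMENSIONS"] = ["z", "y", "x"]
```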
- Single string to identify zarr root path + zarr array/dataset within root
- SV: Greg left a [comment](https://github.com/zarr-developers/zarr-python/issues/1039#issuecomment-1170034733) today. See also shoyer [issue](https://github.com/zarr-developers/zarr-python/issues/1039)
- DB: an issue. problematic ergonomics
- JMS: was hoping to find a resolution
- JM: couple proposed
- sensible defaults
- DB: reason for separate hierarchy
- JM: possible extensions (like consolidated)
- JMS: range-requests to see full listings
- RA: strongly believe that V3 doesn't introduce such a breaking change
- RA: NC uses path/to/file.h5/path/to/group
- JM: would require an increased number of lookups for the root JSON
- WF: correction -- NC uses two strings
- JMS: neuroglancer has a data source URL. can make up a convention but it would be nice to preserve the single-string semantics
- RA: xarray only opens groups. more complicated for arrays.
- RA: good to formalize the URI/URL semantics (good to specify your data with a string)
- JMS: applies to groups, too.
- RA: xarray supports extra path to a sub-group. also gaining datatree functionality.
- DB: going into mainline? RA: Yes. DB: super cool.
- DB: couldn't you just pass the absolute path?
- JM: you don't pass "data" or "meta". only the logical group.
- DB: that means that code completion won't work. could irritate people.
- DB: would pass the array. job of library is to find the array.
- RA: use hash tag or standardized file ending (.zarr) to parse URL
- DB: .zarr seems 100% reasonable (since slash is taken)
- DB: recommendation for people who want to live their truth
- JMS: would like to make this a MUST
- DB: jpeg vs jpg vs ...
- RA: mimetype
- JM: make the .json files the default?
- RA: getting Zarr into STAC was problematic because it's to a URL rather than a file. i.e. it fundamentally becomes a JSON file. Becomes a catalog.
- DB: like it. Directories are not real, files are real.
- JMS: could define a different ending?
- RA: .json is good
- JM: it's .zarr.json which isn't bad
- DB: natural when moving from local file system to a KVS
- RA: opens up absolute paths to chunks potentially
- JMS: with more changes to the spec, yeah.
- JM: consolidated metadata will be problematic.
- DB: PR
- mypy issues
- annotations breaks linter
- JM: generally :+1: for type annotations, also ok to start looking at dropping 3.7 now
- Support for inf/nan/binary data in attributes
- Zarr's website
- What do you feel about our current website?
- What would you like to see in the new website?
- Any ideas for good Jekyll/any static website generator themes?
Attending: Sanket Verma (SV), Ryan Abernathey (RA), Jeremy Maitin-Shepard (JM), John Kirkham (JK), Jackson Maxfield Brown (JMB), Dennis Heimbigner (DH), Martin Durant (MD)
- ZEP acceptance criteria: https://zarr.dev/zeps/active/ZEP0000.html#how-does-a-zep-become-accepted
- GSoC 2022 coding period has officially started! Check the progress for [Registry Codecs](https://hackmd.io/@uTe8Vo8gSYeCbwHsQI2Z2Q/SypXtPRD9) and [Benchmarks](https://hackmd.io/@I9Hj1bLETn6QIva97pA3Hw/By7rlRXd5).
- MD: Weekly tracking of GSoC 2022 Kerchunk contributors here: https://github.com/fsspec/GSoC-kechunk-2022
- Non-zero origins: https://github.com/zarr-developers/zarr-specs/pull/144
- RA: JM's [proposal](https://github.com/zarr-developers/zarr-specs/pull/144) like a comments/suggestions to the ZEP1?
- JM: Yes, it’s like a comment but not a full suggestion
- JM: Having a non-zero origin as an extension will be fine. Zarr doesn’t have a well-defined coordinate space - if you add a non-zero origin you need to add stuff when dealing with other types of arrays like reading or writing it to other file systems or arrays
- RA: uses Zarr also as a lower stack array - comfortable with the idea raw array space and coordinate space - and good with zarr doesn’t know about coordinate space - Xarray can build coordinate space - works with the metadata concept - Zarr can’t make Julia use index base - coordinate space is not suggestion, here we are changing the array index
- JM: certainly see Ryan’s argument - can use other libraries like Xarray for doing the index manipulation - also value of having array where you can talk about position
- DH: NetCDF coordinate system talks about latitude and longitude - introduce notion of coordinate variables - agree with Ryan’s - index level needs to be pure and standardised - whole variety of coordinate system that can be imposed later on - there are arbitrarily number of coordinate system that people use and bad to pick-up one here
- MD: agree with Ryan - in Xarray we can define coordinate system using other variables
- RA: JM also commented on the issue that the risk of not having in the core would be that client opening the Zarr arrays and would not able to access the array
- JM: unfortunate how Julia changes index - if you don’t talk about base index it doesn’t hurt anyone
- RA: HDF group is used to this - zero-based indexing - the language determines how the array data is exposed - `Xarray` can do it because it has a data model in Xarray - diffuse this out of zarr - we have a primitive array storage system and on top of that we have various conventions of metadata and that’s the beauty - no explicit support is required for that - many tools can open that
- RA: We can put a convention to address the issue - a page of conventions on the website, something like https://zarr.dev/conventions can document that - processing software can use those - Zarr ontology to other array ontology - if we put it in the Zarr core why are we catering to the microscopy group only and why not the Geo community!?
- MD: The word `convention` is super useful - if you have tools which can leverage the indexing
- RA to JM: if we don’t support in the core array - it’s also about the implementation - have you thought about implementation?
- JM: very simple if `not` in core spec - pretty clear boundary on how transformation can be done in dense integer space of Zarr array - index by coordinate array and other method - different data types where indexes are latitude longitude - having an extra level of translational array
- MD: Zarr array core design would need to behave like every language
- JM: if array is small - it’s in the memory and you can do a lot of stuff like read it store it and play around with it! - Zarr array and memory works in other ways!
- MD: naively do it in any language - use the language rules - you’d the do the selection as the array is in the memory
- JM: shifting the coordinate space - what about negative indexing? - How does Xarray handles it?
- MD: not possible - each variable has a unique set of coordinates - the NetCDF conventions would not allow it - NetCDF conventions are far more rigid than anything - Xarray could certainly implement a wide range of mappings - Xarray is born out of the `NetCDF model`
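A minimal sketch of the "conventions, not core" position above: the Zarr array stays zero-based, and xarray layers a coordinate space on top via metadata (file name and coordinate values are made up):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.rand(3, 4),
    dims=("y", "x"),
    coords={"y": [100, 101, 102], "x": [50, 51, 52, 53]},  # arbitrary "origin"
    name="signal",
)
da.to_dataset().to_zarr("with_coords.zarr", mode="w")
print(da.sel(y=101, x=52).values)  # select by coordinate, not by raw index
```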
- Negative Indexing
- JK: negative indexing - logical indexing and coordinate indexing problem - data exists somewhere and how we map that to meaningful coordinate
- MD: negative indexing is problem in Python and means a different thing over there
- JK: big change - specifying changes by coordinate having to list them in metadata and to update the metadata for all the previous arrays
- MD: the reference file systems could do the renaming - but it is complicated
- Discussion on: https://github.com/zarr-developers/zarr-specs/issues/141
- DH: is the issue representing the floating point numbers?
- JM: the attribute model is `.json` - need some way to indicate a value is intended as a number
- DH: `.json` has that distinction
- JM: Python implements the extension - generally extension doesn’t support that - in JS you need to write your own parser to take care of this - no way to represent this and we need to discuss this
- DH: binary and nan will be represented as a bit pattern
- JK: would love to see how coordinate space stack would look like - interesting to have it in extension - if Xarray would be interested in that - recasting the coordinate? - coordinate space extension? - changes the metadata? - graduate the metadata and see how to it behaves when the coordinate system is changed?
- JM: a few things that needs to be discussed:
- Zarr attributes and .json and infinity values binary things - wonder if there’s a solution to that in zarr v3 (https://github.com/zarr-developers/zarr-specs/issues/141)
- Zarr V2 array creation has an easy way to create arrays - whereas you need to mention the path in V3; Zarr v3 array creation is a pain because of the path - could be handled by having a `.v3` extension - `//` or any other special character to handle it
Attending: Davis Bennett (DV), Sanket Verma (SV), John A. Kirkham (JK), Trevor Manz (TM), Brianna Pagán (BP), Parth Tripathi (PT), Gregory Lee (GL), Ward Fisher (WF)
- Release 2.12 - [Blog](https://zarr.dev/blog/pre-release-2-12/) and [Tweet](https://twitter.com/zarr_dev/status/1529430764563013632)
- ZEP website: https://zarr.dev/zeps
- ZEP 1 - Alistair and Jonathan working on it, check [here](https://github.com/alimanfoo/zeps/blob/zep-1-2022-05-03/zep-1.md). PR [here](https://github.com/zarr-developers/zeps/pull/1)
- David(*xtensor*), Ward(*NetCDF-C & Java*), Trevor(*zarr.js*) and Gregory Lee(*zarr-python*) added to the [ZIC](https://github.com/zarr-developers/governance/blob/main/GOVERNANCE.md#zarr-implementation-council-zic)
- GSoC 2022 contributors updates - [Shivank](https://hackmd.io/@uTe8Vo8gSYeCbwHsQI2Z2Q/SypXtPRD9) and [Parth](https://hackmd.io/@I9Hj1bLETn6QIva97pA3Hw/By7rlRXd5)
- TM: https://excalidraw.com/ for making graphic representations, TM used them for the Zarr [slides](https://docs.google.com/presentation/d/1bKE3BYp9FEPcL7ZUyWkyyguRE5ptSiJYHDqcIn_nmkU/edit?usp=sharing) 🖼
- `fill_value` for empty chunks:
- DV: How could it be implemented to avoid breaking changes?
- DV: Is there a clear description of `fill_value` in Zarr Spec V2?
- JK: V3 spec aims to mitigate the issue of `fill_value` and have clear text about it
- DV: Haven't looked at V3 but if it specifies `fill_value` by default then it's good
- TM: Not writing `fill_value` is semantically correct; on the implementation side, having to implement something like this is more flexible and easy in JS as compared to Python
- DV: Problems representing numbers as `.json` metadata. There should not be constraints to `fill_data`/`fill_value` in `.json` as it's metadata
- TM: Serialising the scalar arrays as containers
- JK: Slightly hard to read metadata with `fill_value` as there are ambiguity issues
- JK: How does NetCDF handle `fill_value`?
- WF: Couple of different approaches to handle it.
- We do it for standard and compound data types
- In absence of user-input we have a default `fill_value`
- Mostly users need to specify the data type as it's difficult to come up with it on its own
- There's also a flag to suppress `fill_value`
- As NetCDF is mostly used for data archival and for that we need to maintain data integrity, it's difficult to know what's done if there are no defined data types or no `fill_values`
- We can also explicitly write `fill_value`
- Store in metadata
- JK: Is there a concept of not storing it?
- WF: `fill_value` can be changed, the default behaviour is to replace the old `fill_value` with the new `fill_value`, not the best case but yeah, it is what it is!
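A minimal sketch of the `fill_value` semantics under discussion (zarr-python v2; shapes and values are made up): chunks that were never written are synthesized from `fill_value` on read, so an explicit value removes the ambiguity.

```python
import zarr

arr = zarr.full((4, 4), fill_value=-1.0, chunks=(2, 2), dtype="f4")
arr[:2, :2] = 7.0               # only this one chunk is actually stored
print(arr[3, 3])                # -1.0, synthesized from fill_value
print(arr.nchunks_initialized)  # 1
```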
- TM: `FSSPEC` reference ability to record `base64` to `.json`. How data `URLs` work: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs
- SV: Conferences 📣
- BP: Speaking at [ESIP](https://2022esipjulymeeting.sched.com/info)(in-person) on July 19th-22nd on cloud processing and how they process internal Zarr & COGs data at NASA. Christine will be talking about Zarr and almost 50% of the content will be focused on Zarr. Link: (https://2022esipjulymeeting.sched.com/event/12etJ) and (https://2022esipjulymeeting.sched.com/event/12etV)
- WF: Will be talking about NCZarr at Rocky Mountain (hybrid), current state of NetCDF. Link to follow in the Gitter chat.
- JK: GTC(virtual) happened in spring, NVIDIA is presenting a few things at SciPy
- SV: CFP for EuroSciPy is open until 6/6 AoE, submit [here](https://www.euroscipy.org/2022/program.html)
Attending: Josh Moore (JM), Sanket Verma (SV), Dennis Heimbigner (DH), Eric Perlman (EP), Ward Fisher (WF), Martin Durant (MD), Vinasco Juan (VJ), John Kirkham (JK), Isaac Virshup (IV),
- New repo: https://github.com/zarr-developers/zeps (available from https://zarr.dev/zeps later this week)
- Work on ZEP1 started!
- Lays the ground work for V3
- More updates coming from Alistair in the next week.
- ZEPs have a lead author but multiple co-authors
- So feel free to add to the ZEP PR that will be opened soon!
- GSoC contributors coming on 20th May
- Recently merged PRs (Thanks! 💐)
- NumPy twitter poll: 1.20 or newer used by 60%+
- JK: looking to drop some versions (currently 1.17+)
- xarray and dask are on 1.18 or even later
- conda-forge is at 1.19
- hard to get older versions
- NEP provides a schedule for deprecating. In June 1.20 becomes the minimum
- also: want to use newer features to support more array types (numpy + dask + ...)
- MD: it's been stable for so long, so someone can use an older version.
- Also: Python 3.6 dropped
- https://github.com/zarr-developers/zarr-specs/pull/16 (only open PR)
- updated by Jonathan in https://github.com/zarr-developers/zarr-specs/pull/142
- good for everyone to take a look at that.
- JM: https://github.com/zarr-developers/zarr-python/pull/1020 and how conservative to be
## May the 4th
Attending: Sanket Verma (SV), Ryan Abernathey (RA), Josh Moore (JM), Ishan Bansal (IB), John Kirkham (JK), Hailey Johnson (HJ), Shivank (SH), Brianna Rita Pagán (BRP), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS), Parth Tripathi (PT), Eric Perlman (EP), Jonathan Striebel (JS), Martin Durant (MD), Greg Lee (GL), Matt McCormick (MM), Davis Bennett (DB)
Introductions (new things & favorite places)
- Brianna: tech. lead at NASA for migrating data & services to the cloud
- ZEP is on the verge of acceptance & merging. Check [here](https://github.com/zarr-developers/governance/pull/16)
- ZIC invites sent. Check issues [here](https://github.com/zarr-developers/governance/labels/ZIC)
- Merged PRs:
- Adding environment variable for Zarr V3 by Greg. Check [here](https://github.com/zarr-developers/zarr-python/pull/1007)
- Performance improvement while appending data to Zarr array in S3 by Hailiang Zhang. Check [here](https://github.com/zarr-developers/zarr-python/pull/1014)
- Pre-commit check by Shivank. Check [here](https://github.com/zarr-developers/zarr-python/pull/1015) and [here](https://github.com/zarr-developers/zarr-python/pull/1016)
- Suggestions for the Zarr Community. Add [here](https://hackmd.io/7jV1cE3pQeWXWI4e8siXng)
- Please have a look at the recent poll [here](https://twitter.com/zarr_dev/status/1521755830051389441)
- Thoughts on recording the community calls?
- EP: more no (except presentations)
- JK: more no
- IV: R implementation?
- JM: a libzarr out of netcdf-c
- DH: possibly. (need to look at turning NC3 off)
- WF: if we had someone to maintain it, then it's a no-brainer
- DH: what API would we use. NC API is a pretty good match.
- IV: good tooling for wrapping python in R. works almost seamlessly.
- JM: libzarr would also get us MATLAB
- WF: It would be a lot of work, and the code in the netCDF-C repo is available for poaching. Collaborating to create a pure C Zarr library would be in our (Unidata/netCDF-C) interest and an easier lift than splitting it out/maintaining it ourselves
- WF: license, etc. should not be an issue.
- IV: there were some C++ folks on the bioconductor side
- JK: invite bioconductor to next meeting?
- ZEP process:
- MD: approve :tada:
- etc. etc.
- _ergo_ ZEP 0 merged :tada:
- BRP: geospatial standards (matching to https://cfconventions.org/)
- RA: read the [OGC document](https://portal.ogc.org/files/?artifact_id=100727&version=1)?
- BRP: No, gone through geozarr
- RA: started mapping via xarray cf to zarr
- RA: OGC is voting on accepting zarr, wrapper with a preface on conventions (named dimensions, netcdf data model, coordinate reference systems)
- RA: NASA really cares about OGC and zarr is on track to be accepted
- RA: [geozarr](https://github.com/christophenoel/geozarr-spec/) is newer and more prescriptive
- BRP: battling with CRS since it's not in cf
- RA: unidata would say there's a way.
- BRP: but it's not required
- RA: suggest getting behind geozarr (1 person at ESA)
- ...add stuff here...
- RA: What's ZEP 1 going to be? start with JMS' comments on breakingness?
- see: https://github.com/zarr-developers/zarr-specs/issues/140
- MD: list of chunks!
- non-breaking is passing a range of a chunk to the backend storage ("simple sharding")
- RA: like being able to push selections to the store
- MD: in v2 getitems (that's fsstore only)
- JS: in sharding proposal, there are other methods for getting ranges for keys & multiple keys at once and as a combination (pre-requisite for efficient sharding)
- MD: want to uncompress things that you don't know anything about
- JS: there are hooks for blosc, e.g.. Adding this interface would help, since it's currently quite hacked.
- MD: simple enough and nice that it will enable sharding? good for prototyping a ZEP
- JMS: there are breaking changes that don't change the data model in a significant way; **"feature-flags"** as important breaking addition
- JS: transformer infrastructure also
- RA: does anyone expect current v3 implementations to break?
- JMS: v3 isn't a huge change; mostly isomorphic
- RA: _explicit_ extensibility of the protocol? (on top of the re-org)
- JK: `None` fill value etc. that just needed cleaning up
- also moving towards sparse arrays. a plea for people to explore.
- RA: tl;dr
- ZEP1 to get motivation (co-editors welcome)
- ZEP2 e.g. sharding
- ZEP3 e.g. variable chunks
- SV: doesn't need to be sequential editing but sequential merging!
- RA: worried about fragmentation
- will say in ZEP1, but want a strong core that all should implement
- avoid driving people away
- JS: happy to help with the spec, but not great for the ZEP
- JMS: also happy to help with the spec
- RA: will reach out to Alistair (SV: was waiting on ZEP0 to be merged)
- RA: THANKS SANKET!
- SV: Davis from April meeting, propose to add "auto" setting
- DB: "inaction item". Perhaps by the end of the week.
- new teams any comments? :thumbsup:
- build docs for PRs? good idea.
- SV: pyscript!
- MD: super interesting for Zarr. friendly for the browser. no sockets. no threads. suggestion that it might lead to a lot of hype. involved for the IO conversations. (couple of years until its really usable for heavy data workloads)
- DB: performance? MD: good, except for populating the browser (it's a VM)
- DB: had seen 2.5x in favor of native
- MD: more browser as the interface so that you don't need an ipython kernel running somewhere
- MD: long-term talking about how to run numba in the browser (acceleration tricks that make regular python fast). could be that numpy in the browser is <50% slower
- JK: fortran is a problem (e.g. scipy is hard to build)
- tile servers are good enough for pure visualization
- and/or people using zarr are doing data processing (parallelism)
- DB: an
Attending: Ryan Abernathey, Josh Moore, Eric Perlman, Sanket Verma, Jeremy Maitin-Shepard, Jonathan Striebel, Gregory Lee, Jim Pivarski, Ishan Bansal, Isaac Virshup, Parth Tripathi, Martin Durant, Ward Fisher, Dennis Heimbigner, Matthew McCormick
- GSoC deadline ended, we have 3 proposals this year! Between April 19 and May 12 we can decide how many slots we can take.
- Cloud Native Outreach Event went great. Videos will be live shortly!
- If you have any videos to share, let us know!
- intros: https://www.youtube.com/playlist?list=PLvkeNUPrCU04Xvcph4ErxsRkZq28Oucr7
- applications: https://www.youtube.com/playlist?list=PLvkeNUPrCU05qHkZso_T74yoayqLFHzkI
- Using https://github.com/orgs/zarr-developers/discussions
- higher level of repo discussions (specifically show up on the "community" repository.)
- ZEP final update!
- JM: implementation council to be invited
- JS: great to have the implementors on board to not fragment the landscape
- MD: some may not implement though, right? JM: true. multiple states of votes:
- will-implement, may-implement, wont-implement, breaks-us-veto
- MD: no clear status of what's up to date
- RA: veto power since would be bad to lead to forks. worth discussing that provision.
- MD: since we aim for consensus anyway (and veto is used rarely) should work fine
- JS: don't want to end in a place where the spec says something that will never be implemented.
- JS: separation on veto for core or extension. JM: agreed, focus all ZEPs on core for the moment
- SV: extensions are V3, which isn't done, so it's all core.
- JMS: only V3? JM: what about C/F order? RA: don't have to limit it (but we want to focus on V3)
- MD: agreed, the place to expose breakages
- RA: core vs. extension
- is core something that everyone must (eventually) implement
- MD: some things are already optional like filter
- MD: extensions were originally synonymous with conventions but dataset is openable without
- RA: convention is distinct from optional extension (cf. variable length chunks)
- JMS: another way of seeing extensions is the evolution of the spec. signalling to implementations that they are seeing new data. "must understand"
- JP: agree about the distinction. can't-read-data vs. might-need-a-library. have wanted to frame this as an extension that *labels* a convention, like an annotation.
- IV: would be useful to specify convention. if you don't have a way to store the metadata, then goes into the .zattrs
- good to have a field in the structural metadata to specify conventions
- JS: separate from convention. orthogonal questions.
- RA: hierarchy (or ontology)
- JP: A different example: I've seen HDF5 data files, from gravitational waves, that are valid HDF5 but can only be "understood" by the LIGO collaboration's code. It would have been good to have a label on that HDF5 file warning haphazard users.
- DH: is wont-implement a veto? or if they say wont-implement and breaks, then is veto? That's a lot of power.
- JMS: non-zero origin & data-orders other than C are both examples that cause issues (with e.g. Julia). potential vetos.
- DH: solvable, but they are saying the cost is high.
- RA: take sharding. major enhancement but pita to implement. is it core? need to show in ZEP? higher bar for core proposal?
- DH: have been looking at how to implement. it will be a challenge. decided with Ward that it's worth doing.
- RA: meta goal is to have that discussion before the ship has sailed.
- WF: feels a lot like an internal NetCDF conversation. What is an NC file? vs. what's in the doc
- NC file has to be more than a file written by netcdf-c library (the first party implementation)
- goal of tech. spec. is to take it and write software in any lang. that can write/read a NC
- needs to specify permissible deviations
- MD: how to go from v3 to v3.1? (sharding or variable length chunks)
- WF: have made many mistakes.... (e.g always refer to specific versions, NetCDF 3...)
- note: unidata doesn't yet have the iron clad backwards compatibility for nczarr
- v3 to v3.1 could potentially *not* be backwards compatible
- behavior versus definition (this message may self-destruct...)
- MD: parquet example. people mention v2 but that doesn't really exist
- still features that aren't implemented!
- JMS: similar in HTML, https://caniuse.com/ -- we need the same
- JS: agreed, important to know. needed to read the data or not.
- Even more core, must-understand flag & warnings about not being supported. All MUST have this for V3.
- sharding storage-transformer proposal: sharding could be an extension that uses transformers.
- then impl. council decides
- WF: NetCDF isn’t a great analog here; we have no forward-compatibility promise, and the solution when an old version cannot read a newer file is to suggest they upgrade to the latest version. But this is because there are not a lot of independent implementations in the wild.
- NetCDF is also fortunate to have a number of independently-developed utilities and tools (NetCDF Operators (NCO), and pnetcdf spring to mind). Perhaps a zdump (similar to ncdump or h5dump), provided by the core project, that could provide summary information for a file? This information could then be used to determine if a specific dataset could be read by an implementation in question.
- RA: useful concept here -- netcdf is focused on interoperability & preservation. Parquet is for performance. Zarr is mainly high-performance copy of data. But for sharing, might make different choices. Use different extensions then. i.e. have it both ways. Still need the minimal, most-operable version. And need to be clear & upfront about that.
- MD: perhaps a "maximal-flag" setting? IV: perhaps flags. JMS: agreed.
- MD: perhaps "conversative" to cover several of these
- JM: https://xmpp.org/extensions/xep-0115.html
- JP: version/flag-objects within spec, then could have storage-typed and performance-typed objects
- IV: do that in AnnData. Sparse array v1 or v2. (i.e. at the object level)
- RA: are we at the point that the core is the necessary stuff in v3 and we can go with that?
- JMS: expect to add non-optional features in the future
- JM: various data types are probably missing now
- JS: storage transformer falls under this too
- JMS: (...Josh missed a comment from JMS here...)
- JS: not clear how optional the extensions are.
- JM: strip extensions and add it back later? Agreed.
Attending: Josh Moore (JM), Sanket Verma (SV), Norman Rzepka (NR), Parth Tripathi (PT), Davis Bennett (DB), Shivank, Gregory Lee (GL), Ishan Bansal (IB), Hailey Johnson (HJ), Martin Durant (MD), Isaac Virshup (IV), Ward Fisher (WF), John Kirkham (JK)
- Introductions (incl. favorite food!)
- [Cloud Native Outreach Day](https://www.ogc.org/ogcevents/cloud-native-geospatial-outreach-event) is happening in 2 weeks i.e. April 19th. Register [here](https://na.eventscloud.com/website/36829/) if you still haven't.
- GSoC contributor proposals are open now until April 19th ([ideas-lists](https://github.com/zarr-developers/gsoc/blob/main/2022/ideas-list.md))
- Guest blog posts are welcome for http://zarr.dev/blog
- A small surprise for y'all! 🎉 ()
- `write_empty_chunks` debacle (DB)
- Wrong choice? If we can't handle the edge cases, probably.
- DB: make fill_value required? (breaking change). don't see zero as the obvious fill value even for numeric types.
- MD: would want a default since there is data that doesn't _need_ a `fill_value`
- "auto" which sets write_empty_chunks to False if a fill_value is passed
- DB: and/or a global mapping from data_type to fill_value
- WF: that's how NetCDF handles it (on by default, can be turned off by API). Anecdotally, few complaints.
- DB: proposing "auto" if no one objects (post-2.11.3)
- WF: is there a way to know what was requested? NC captures provenance metadata (now).
- MD: other provenance metadata that could be included: "written in python with zarr-python 2.11" (JK: great spec issue!)
- multi-resolution images (MD)
- JM: description of the xarray/datatree work
- MD: ok to concatenate multiple non-T volumes into a time-series? JM: think so
- DB: some clients may not be able to
- TBD in gitter: fsspec (MD)
- can fetch many chunks in an array concurrently
- not an API for fetching from many zarrs in a group
- DB: sounds like it is uncovering a larger issue (`Futures`)
- MD: like dask
- moving the sharding spec forward (NR)
- JM: getting the abstraction correct / we seem to be risk-averse
- JM: editor group? but regardless major decision is does sharding get defined before or after v3.
- NR: (voluntary) implementors group? Then a question of voting vs. consensus vs. ...
- JK: looks like there are different asks at the moment, so maybe the roadmap is the most important thing
- Davis Bennett to submit a follow-up PR to [this](https://github.com/zarr-developers/zarr-python/pull/1005) to propose an "auto" setting for `write_empty_chunks` (rough sketch below)
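- A rough sketch of the proposed "auto" behaviour, assuming it simply keys off whether a `fill_value` was given; the `write_empty_chunks` keyword exists in zarr-python 2.11, but the `"auto"` option and the helper below are hypothetical:

```python
import numpy as np
import zarr

def resolve_write_empty_chunks(fill_value, write_empty_chunks="auto"):
    """Hypothetical 'auto' rule: only skip writing all-fill chunks when a
    fill_value exists to stand in for them."""
    if write_empty_chunks == "auto":
        return fill_value is None  # no fill_value -> every chunk must be materialised
    return bool(write_empty_chunks)

# Usage sketch with the existing zarr-python 2.11 keyword:
z = zarr.open(
    "example.zarr", mode="w", shape=(100, 100), chunks=(10, 10),
    dtype="f4", fill_value=0.0,
    write_empty_chunks=resolve_write_empty_chunks(fill_value=0.0),
)
z[:10, :10] = np.arange(100, dtype="f4").reshape(10, 10)  # only this chunk gets stored
```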
**Attending:** Sanket V., Josh M., Hailey J., Greg L., Dennis H., Jim Pivarski, John K, Jeremy Maitin-Shepard
- GSoC and GSoD Updates
- Several people are showing up looking for good first issues.
- Feel free to help them out, but SV & JM will be chatting with them this weekend.
- Anyone who is interested in spearheading GSoD (deadline tomorrow)
- [ZEP](https://github.com/zarr-developers/governance/pull/16) feedback: Governance, template,
- JMS: propose/offer using an existing issue to test the ZEP. (:+1:)
- e.g. Data type rename issue: https://github.com/zarr-developers/zarr-specs/issues/131
- Description of some dtypes currently supported in Zarr-Python's v2 implementation, but are not part of the core v3 spec: https://github.com/zarr-developers/zarr-specs/pull/135
- Now or later? SV: will ping after another round of changes
- V3 / awkward arrays: extension mechanism
- JP: requires 5 1-D arrays, an integer with the size, the array names, and some JSON describing the types (see the sketch at the end of these notes)
- edits are not supported. memory is shared in python implementation
- one option might be to have arrow as a subspec of zarr
- DH: potentially having a different return type.
- JM: that might be used for xarray as well
- JP: looking to warn people about the need for awkward arrays (or even making the storage opaque)
- JM: e.g. having the extension mechanism allow/enable:
- mandatory (MUST): throws exception if not installed
- suggested (SHOULD): which raises a warning
- optional (MAY): which silently ignores
- JMS: would see `open_with` as a better pattern. e.g. choosing a backend without disallowing access to the details
- DH: this seems to follow the pattern of being "part of something bigger" while still made of parsable units (e.g. dimension arrays, multiscale, etc.)
- JMS: would be interesting to make use of chunking in the layout. JP: agreed, e.g. to enable parallel writes
- xarray/nczarr (Josh) - anyone interested from the xarray side? (tabled)
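- Not the proposed extension itself, just a minimal sketch of the idea behind it: a ragged array decomposes into a few flat 1-D buffers plus a small piece of JSON describing the types, which map naturally onto a Zarr group. The group layout and the `awkward_form` attribute below are illustrative names only:

```python
import numpy as np
import zarr

ragged = [[1.0, 2.0, 3.0], [], [4.5], [6.0, 7.0]]

# Flatten into two buffers: concatenated values plus offsets into them.
values = np.concatenate([np.asarray(x, dtype="f8") for x in ragged])
offsets = np.cumsum([0] + [len(x) for x in ragged])  # [0, 3, 3, 4, 6]

root = zarr.open_group("ragged.zarr", mode="w")
root.array("values", values)
root.array("offsets", offsets)
root.attrs["awkward_form"] = {"class": "ListOffsetArray", "content": "float64"}

# Reconstruct one row without reading the others:
g = zarr.open_group("ragged.zarr", mode="r")
off = g["offsets"][:]
fourth = g["values"][off[3]:off[4]]  # -> [6.0, 7.0]
```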
**Attending**: Jeremy Maitin-Shepard, Josh Moore, Isaac Virshup, Ward
Fisher, Gregory Lee, Sanket Verma, Dennis Heimbigner
- Xarray/NetCDF (Josh)
- Trying to get everyone at a (virtual) table
- Ward happy to be involved
- Dennis: nczarr already supports the xarray convention
- Josh: development of https://github.com/xarray-contrib/datatree
> may suggest more conversations around this
- Multiscale representation? (Jeremy)
- any relationship between the dimensions? No. Not yet.
- Typically done at a higher level (like cfconventions)
- Updates from Sanket
- Release 2.11.1
- outstanding 2.11 issue:
- Thoughts on changing the default branch?
- Jeremy: code doesn't look dangerous. Josh: agreed, more a
- Jeremy: another option would be branching by configuration.
- **Isaac: status page for v3 on what’s being developed would
- **Greg: consensus on file-ending? .zr3?**
- Jeremy: could see some benefit for inspecting paths
- GSoC 2022
- Awkward Arrays
- Isaac: all of AA or ragged arrays / vlen? TBD
- Dennis: another thing that’s not in the specification
- Josh: all agreed. Having a v3 to cover the ragged arrays
would be a great outcome
- Jeremy: only on a chunk? (since a codec)
- comparison to HDF5
- Cloud Native Outreach Day: CFP Deadline 15th March
- CZI EOSS Cycle 5
**Attending**: Dennis, Josh, Sanket, Jeremy, Ward, Hailey Johnson, John K., Eric Perlman, Greg Lee
- Updates from Sanket
- [*gsoc*](https://github.com/zarr-developers/gsoc) (cf.
- Open call for mentors
- Cloud Native Outreach Day -
> (Apr 19th/20th)
- talks & workshops
- [*lightning talk
- Release of numcodecs incl.
- Entrypoints (if Martin shows up)
> flatten: *need an attribute?*
- *John: Unsure.* zfpy remembers internally.
- HJ: NetCDF meeting on codecs soon. Good to know prioritization
- blosc of course
- JM: Twitter poll? “What compressor do you frequently use in
- EP: discussed with d-v-b that lossy compressors would be nice.
- jpeg chunk storage
- JK: tried zfpy? No. from seung lab. (Meteorological data)
- JMS: imagecodecs from Gohlke.
> more explicit std. in v3 (need json for each parameter)
- Jeremy (in order of importance)
- 1\. [*consistency in referring to coordinates / dimension
- a\. Unambiguously referring to dimensions / coordinates
- Ward: in NC order of dimensions is under the hood,
> indexing into set of arrays independently of the
> underlying order. file written by netcdf-fortran
> should be indistinguishable from one written by netcdf-c (due to
> work in the library to be
- JMS: tension cf. Julia’s desire of order
- WF: similar issue with endianness? JMS: big endian
> likely dead WF: sadly no. see netcdf repo for redhat
- b\. Support for different storage orders:
- c\. Support for non-zero origin
- 2\. appetite for zarr multiscale spec?
- Move metadata from .zattrs to .zarray?
- JMS: primarily getting it outside of OME, in discrete state
- 3\. Data type syntax
- 4\. URL syntax
- notes in [*https://hackmd.io*](https://hackmd.io)?
- Yes: Josh, Greg, Ward, …
- John: not all in one document
- “Motion carries”
- Dennis: could use help on how to work struct into extension
- JMS: one of the big changes in V3 is no support for structs
> (numpy array)
- GL: several datatypes are not in v3. currently missing a document
> describing those types. e.g.
- JMS: will numpy be supported in V3?
- GL: was easy to implement but they aren’t written up as
- Untested are: unicode and bytestring (only indirectly in
- JMS: numpy doesn’t support variable length strings
- JMS: numpy struct datatypes lead to interleaved data in memory. not great for compression. perhaps better to transpose them (sketch at the end of these notes). would be nice to not be tied to the numpy model.
- JK: need something in the spec, which is why it was left out of
> the spec so far.
- GL: just a few types currently in v3. complex would be easy to
> support. can choose what goes in or not. (or warn or error …)
- DH: trying to figure out how to support as much of NC/HDF5 core
> data model as possible. **Big missing piece are: structs,
> enumerations, vlenstrings, vlenobjects (sequences)**
- JMS: as multiple arrays? DH: question of how to specify in the
> extension mechanism. (lots of possible implementations)
- Feedback process (!)
- Sanket: spoke to Alistair
- Suggestions welcome.
- Fill_value issues
- [*change in indexing
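- A small sketch of the "transpose the struct" idea: instead of one interleaved structured array, each field goes into its own Zarr array, which keeps like bytes together for the compressor. Purely illustrative, not a spec proposal:

```python
import numpy as np
import zarr

# An interleaved numpy struct (record) array ...
records = np.zeros(1_000_000, dtype=[("x", "f4"), ("y", "f4"), ("label", "u2")])

# ... stored column-wise: one contiguous, homogeneous array per field.
root = zarr.open_group("columns.zarr", mode="w")
for name in records.dtype.names:
    root.array(name, np.ascontiguousarray(records[name]), chunks=100_000)

# Reading a single field only touches that field's chunks:
x_head = zarr.open_group("columns.zarr", mode="r")["x"][:1000]
```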
**Attending**: Jonathan Striebel, Davis Bennett, Eric Perlman, Josh
Moore (JAM), Sanket Verma, Jeremy Maitin-Shepard, Hailey Johnson, Erik
Welch (Anaconda → NVIDIA), John Kirkham, Dennis Heimbigner, Dave
Mellert, Gregory Lee, Matt McCormick
- Davis: gdoc → hackmd? Sure!
- Sanket: suggestions
- updated webpage, blog update
- Intros & various links here
- Erik: [*Mr.
- C(++) implementations: quick report to Dennis from
- v3 support for non-zero origin
- Dennis: would attribute specifying the origin be enough?
have benefited from cfconventions which standardize the
meaning of attributes (in atmospheric/related domains)
- JMS: benefit of allowing indexing to be affected.
- DH: second system syndrome problem
- DB: see the utility (working with cutouts) but the workflow
puts us outside what’s in the core spec. zarr array
shouldn’t have metadata that references another array.
perhaps a nice formalization of transforms, since it defines
a coordinate space.
- JMS: julia has offset arrays (numpy is always 0-origin’ed)
- DB: meaning of 0-origin is that there’s a coordinate space
- EP: like the functionality, but it can be at a different
- DH: specify the array it came from as well as the origin?
- JMS: have translate-to-origin (cf. the hypothetical sketch at the end of these notes)
- DB: in xarray use piece of data to coordinate-aware
indexing, two methods of getting into an array.
- JAM: prioritizing the many recent spec proposals
- DB: suggest talking to other consumers and see what they
- Addition of sharding in spec v3
- Status: 2 prototype PRs (plus a [*minimal
> in zarrita)
- Moving forward with v3 would be fine.
- Try a PR on the v3 spec for a translation layer (i.e.
- sharding, checksumming, IPFS, etc.
- JMS: don’t see the relationship between sharding & checksumming
- JAM: due to content-addressable storage
- DH: just as a compressor/filter that attaches the checksum
- JS: partial read would need to be handled somewhere that’s
not in the compression
- JAM: kerchunk API of `key → (uri, offset, length)`
- JMS: for the write path it is more complicated
- DH: makes me nervous when we worry about limitations of the underlying store w.r.t. the specification. spec should be
- DH: v2 is agnostic of how chunks/metadata are laid out on disc. Don’t need to be “next” to one another. This is introducing pieces **and** that they are
- Be sure you want to get rid of the independence property
- JK: simpler way of describing it. (An ordering).
- DH: the proposal needs to specify the relationships between
chunks that are supported.
- DB: agreed complicated but 100% worth it for some domains.
- re-writing methods?
- only if uncompressed?
- JS: not yet, but doesn’t currently exist for simplicity
- rewrite index
- v3 (Greg)
- In terms of the dtypes supported, I have not worked on those
> protocol extensions related to that. Is that something I
> should spend time on? The other thing I could do is make a WIP
> PR to Dask and Xarray with minimal changes for how they could
> support v3 as currently implemented in that branch.
- remote implementations: http/s3/etc
- good point
- EP to create an issue
> / [*datatree*](https://github.com/TomNicholas/datatree)
> ([*issue*](https://github.com/spatial-image/spatial-image-multiscale/issues/8)) -
> (if Matt McCormick shows up)
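- A hypothetical sketch of the "attribute specifying the origin" option discussed above: the offset lives in user attributes (the `origin` key here is made up, not a standardized convention) and a thin wrapper applies it while indexing:

```python
import zarr

z = zarr.open("cutout.zarr", mode="w", shape=(100, 100), chunks=(50, 50), dtype="u1")
z.attrs["origin"] = [2000, 3000]  # hypothetical attribute, not part of the core spec

def read_world(arr, slices):
    """Index `arr` with world coordinates, shifting by the stored origin."""
    origin = arr.attrs.get("origin", [0] * arr.ndim)
    shifted = tuple(slice(s.start - o, s.stop - o) for s, o in zip(slices, origin))
    return arr[shifted]

block = read_world(z, (slice(2010, 2020), slice(3000, 3050)))  # shape (10, 50)
```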
**Attending**: Josh Moore, Ryan Abernathey, Eric Perlman, Hanka Medová,
Sanket Verma, John Kirkham, Ward Fisher, Dennis Heimbigner, Greg Lee,
Matt McCormick, Fabian Gans, Jackson Maxfield Brown
- C/F ordering
- C-ordering is the default (natural in Python)
- in Julia, typically looks reversed (in metadata) but without
- Seems to be what most people who are doing it are used to
- But there is confusion if you save in Python (10 rows, 5
- when in Julia it’s different.
- Ryan: is it important to be able to *write* F-ordered data?
- What’s important is what bits are close together not the
> language convention. May not have understood the
- Didn’t transpose but only change the metadata. Then can
> read as is.
- Dennis: fortran impl. reverses the dimensions before it
calls the C library which gives the correct order for the
data. netcdf-c stores everything (by default) as row-major
order. For nczarr, must look at the ordering and then decide
whether or not to reverse.
- Ryan: the fact that it’s in the spec but not supported is a
- Don’t see the use case for it right now.
- ….lots of discussion…
- Fabian: remember the reason for F-ordering is because
compression in some orders is more efficient
- John: not sure that’s (still) the case, since carefully try
to **never** transpose.
- Dennis: should be talking about column- and row-major ordering rather than C/F (see the ordering sketch at the end of these notes)
- John: need to also discuss with N5
- Xtensor-zarr implementation status (Matt)
- Being applied here:
- API / testing issues in xtensor-zarr
- Matt: running into some bugs. Working with team in France.
- Josh: did you try tensorstore or z5? c++ netcdf is also an
- C++ implementations:
- e.g. also
- Josh: would love to find a way to share code (Constantin
could see moving z5py onto xtensor-zarr)
- NetCDF WebAssembly support (Matt)
- “Support hdf5debug.c compilation with Emscripten”
- Todo: follow-up on netcdf-c repository GitHub
- C#, etc.
- Dennis: interested in using WebAssembly (was thinking about
using it for compressors for Java). Perhaps move to
- Xarray/Zarr (Josh & Jackson, minimally)
- BOpen consultants working on:
- NetCDF / xarray / Zarr compatibility
- Hierarchical support
- Data trees
- Multiscale conventions in zarr repo:
- Jackson: aicsimageio only supports xarray for different
locations (scenes, position or OME:images. some other
- then multiscales, i.e. pyramids
- datatree stuff should help. we have different groups and
> they *MAY* have multiscales.
- seems like
> would work
- don’t know if it’s “enough” for aicsimageio. (Jackson’s) dream would be to have multiple datatrees where each is a position:
  - Dataset (diff position / scene / image)
    - Resolution 0
    - Resolution 1_2D …
    - Resolution 1_3D …
  - Dataset (diff position…)
- Fabian: would be nice to have **multiple chunkings** of the same
> data (Also from Brockmann Group in their viewer)
- Jackson: currently do this (fake it) with TIFF, CZI, LIF by allowing
> the user to say *how* they want to chunk (zyx, tzyx, …)
- (old reminder)
- Time permitting
- Meeting slot: poll to be opened.
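- A small sketch of what the ordering question means in practice with zarr-python: the `order` keyword changes how bytes are laid out inside a chunk (and therefore what compresses well), not how the array is indexed:

```python
import numpy as np
import zarr

data = np.arange(50, dtype="f8").reshape(10, 5)  # 10 rows, 5 columns

zc = zarr.array(data, chunks=(10, 5), order="C", store="c_order.zarr", overwrite=True)
zf = zarr.array(data, chunks=(10, 5), order="F", store="f_order.zarr", overwrite=True)

# Indexing is identical regardless of the storage order ...
assert (zc[3, :] == zf[3, :]).all()

# ... but the bytes inside each chunk are row-major vs. column-major.
# A column-major reader (Julia, Fortran) must either transpose or read the
# dimensions in reverse, which is where the confusion above comes from.
```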
*Happy New Year!* 🎉
**Attending**: Davis Bennett, Josh Moore, John Kirkham, Greg Lee, Eric
Perlman, Tobias Kölling, Hailey Johnson, Ward Fisher
- Davis: Move forward with set_write_empty_chunks?
- Eric: any metadata to say that it was written this way?
- John: not fatal just will be empty value.
- DVB: *different* problem with shards.
- Have a script to test performance
- Depends on latency, compressor, etc.
- Testing of emptiness isn’t the most performant
- **No objections. Moving forward with 2.11.**
- Anything for spec? It is good to go.
- Greg ok with 2.11? Yes. Nothing needs pulling or adding.
- Need a release note. (TBD)
- Spent some time on V3 PRs.
- Sixth passes CI.
- Consolidated metadata now works (but we want that to be an
- Unimplemented stores
- ABSStore & N5Store
- Davis to review N5Store
- Do we need a Hierarchy object? (e.g. when you create a group you have to give it a path) – main difference (causes issues in tests due to create_store needing a path)
- JK: Davis’ PR for trimming chunks?
- DVB: Languishing. Synchronization issue in appending with
- JK: design decision to make appending easy. But we need to
> design whether or not we will handle it in sharding.
- JM: Moving forward with community manager position (2 years)
- checksumming (structure looks like IPLD; store sketch at the end of these notes)
- JSON document with hash for chunks
- how could we write out [*content addressable
- MutableMapping that’s write only (remembers content
- DVB: use case?
- nice if you want to calculate checksums, even for part
> of an array. (helps if it’s in a merkle tree)
- makes it easier to share the data in multiple pieces.
> you can have copies everywhere.
- interesting optimizations for big datasets: to get
> difference between two variables, you want them
> defined on the same grid (even if you don’t care what
> the grid is)
- JK: cloud store that’s write-once, since you only need
> to write each hash once. (bonus of time exploration
> feature with history of the chunks)
- TK: discovered over the holidays that the
> content-addressable & the IPFS solutions could be the
- [*IPLD*](https://ipld.io/) has a special type that’s
a link via a content-identifier
- JAM: “fsspec concern”? TK: reading yes, but for writing
> it gets more difficult. No key-value concept. May also
> be useful to express the content-identifiers to a
> higher-level for optimizations.
- JK: explore implementing on top of MutableMapping
> interface and that’s what Zarr uses. Naive idea, Zarr
> special key that has metadata but address of each
> chunk. Gets difficult since the top-special-key needs
> to be writable. Josh: perhaps that key *is the* Zarr.
> JK: grab all of them? request small JSON things from
> the cloud. Takes place of consolidated metadata? (“one
> big request”) TK: you would also need to write it out
> as well (only visible afterwards). JK: start running
> in to ACID. Tertiary concern, but lot of writing might
> create chunks that you no longer care about that need
> cleaning. ZHierarchy could be like UTC time (time
> since last debug) → git for the cloud.
- On non-zarr stuff. (Big funded push is done now)
- Zarr API is properly working there.
- housekeeping, etc.
- Josh: interesting issue from the Julia community on order swapping
- Nvidia GPU-based Zarr to avoid host-memory transfer
- Requires 2 pieces
- CuPy part (JK needs to review that
- Compression (someone working on that now)
- DVB: which codecs? most basic ones. blosc is unclear.
> DVB: super interested! preferably a simple one incl.
> Java clients, but primarily transfer to GPU is a
> bottleneck. Happy to test.
> (snappy, etc.)
- DVB: unsigned bits. no compression automation. use case is getting output of an ML model, making predictions on 3D arrays. (writing use case includes multi-resolution pyramids)
- TK: climate models - too much data that needs
> compression (GRIB backed …) Trying to convince them to
> use Zarr. But **floats**.
- JK: happy to get a list of compressors. (on an issue or
> via email) TK: currently investigating.
- JAM: data-apis?
- JK: not sure we want to be a provider, but as a consumer. being
> a provider gets us into providing consumption.
- Ward: That is a long-standing issue with netCDF as well. re:
> pressure to start performing computations w/in the netCDF
- DVB: what happens with array inception? (dask-zarr-dask)
- JK: Martin adding entrypoint support into numcodecs, perhaps
> something similar to say a zarr.array entrypoint
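- An illustrative sketch (not an existing Zarr API) of the "write-only MutableMapping that remembers content" idea from the checksumming discussion: chunk bytes are stored under their own hash and a manifest records logical key → content identifier:

```python
import hashlib
from collections.abc import MutableMapping

import numpy as np
import zarr

class ContentAddressedStore(MutableMapping):
    """Toy content-addressed store: values live under their sha256 digest;
    a manifest maps Zarr keys (e.g. '0.0', '.zarray') to digests."""

    def __init__(self):
        self.objects = {}   # digest -> bytes (effectively write-once)
        self.manifest = {}  # zarr key -> digest

    def __setitem__(self, key, value):
        value = bytes(value)
        digest = hashlib.sha256(value).hexdigest()
        self.objects.setdefault(digest, value)  # identical content stored only once
        self.manifest[key] = digest

    def __getitem__(self, key):
        return self.objects[self.manifest[key]]

    def __delitem__(self, key):
        del self.manifest[key]  # objects are retained; garbage-collect separately

    def __iter__(self):
        return iter(self.manifest)

    def __len__(self):
        return len(self.manifest)

# zarr-python 2.x accepts any MutableMapping as a store:
store = ContentAddressedStore()
z = zarr.open_array(store=store, mode="w", shape=(4, 4), chunks=(2, 2), dtype="i4")
z[:] = np.eye(4, dtype="i4")
print(len(store.manifest), "keys ->", len(store.objects), "unique objects")
```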
**Attending**: Josh Moore, Eric Perlman, Tobias Kölling, Ryan Williams, John
- no outreachy
- dozen or so applications for comm. mgr
- scalable minds working on sharding
- b-open for xarray, multiscales, and more extension-like stuff
- State of the world, Eric? “why doesn’t Zarr…”
- (overall goal of replacing TIFF stacks with chunked zarr)
- [*axes*](https://github.com/ome/ngff/pull/57) (OME-Zarr)
- EP: read-only? still open
- EP: simple read-only would be a great start. (\~20 files)
- some overlap with HDF5. (Josh: some benchmarking)
- JM: kerchunk in front of 20 HDF5s
- EP: JSON too large
- TK: hierarchy of indices so you don’t have to load all
- TK: and recursive shards/zarr?
- JM: Jeremy said it would be difficult for the client to make
use of more than 2
- EP: dask lazy-arrays of dask lazy-arrays of ….
- JM: worried that something is in the python implementation
as opposed to the “protocol” (shape + chunks +
dimension_separator := all keys for chunks)
- EP: “logical encoding” versus actual “path”
- JK: indexing a point in an octree (paths on each dimension)
- separate issues. (optimal) access patterns for
> particular use
- two different things going on:
- how deeply to split
- how it gets implemented
- almost like a compression
- EP: logical key v. physical key
- …. lots of talking (unfortunately not recorded)
- JM: love the hierarchical “smart-mutable mapping” but Zarr
protocol *avoids* knowing how to split byte streams into
- This gets us to the issue of (offset, length)
- JK: spent lots of time with dask serialization protocol
- “have a header on the byte string” “how many do I have?”
- shift all the metadata somewhere else? **.zidx** (for the whole array?) (sketched at the end of these notes)
- mostly thinking of archival data (frequent read)
- JM: Allowing `Array._chunk_key` to return (key, offset,
- Tabled: netcdf-java (AGU permitting) - Josh
> [*Maven Central*](https://search.maven.org/search?q=g:edu.ucar) v.
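- A rough sketch of the logical-key versus physical-key idea (in the spirit of kerchunk's references and the **.zidx** suggestion above): a small index maps each Zarr chunk key to a byte range inside some larger object. Paths and numbers are made up for illustration:

```python
# Hypothetical index: logical Zarr key -> where the bytes physically live.
refs = {
    ".zarray": {"path": "big.shard", "offset": 0,       "length": 350},
    "0.0":     {"path": "big.shard", "offset": 350,     "length": 120_000},
    "0.1":     {"path": "big.shard", "offset": 120_350, "length": 118_432},
}

def fetch(key):
    """Resolve a logical chunk key with a single ranged read."""
    ref = refs[key]
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        return f.read(ref["length"])
```

- fsspec's "reference" filesystem (kerchunk) implements essentially this mapping on the read side; the open question above is what the write path and an `Array._chunk_key`-style hook would look like.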