---
tags: zarr, Meeting
---
# Zarr Bi-weekly Community Calls
### **Check out the website for previous meeting notes and other information: https://zarr.dev/community-calls/**
Joining instructions: [https://zoom.us/j/300670033 (password: 558943)](https://zoom.us/j/300670033?pwd=OFhjV0FHQmhHK2FYbGFRVnBPMVNJdz09#success)
GitHub repo: https://github.com/zarr-developers/community-calls
Previous notes: https://j.mp/zarr-community-1
## 2024-11-13
**Attending:** Davis Bennett, Eric Perlman, Josh Moore, Dennis Heimbigner, Jeremy Maitin-Shepard
**TL;DR:**
**Updates:**
**Open agenda (add here 👇🏻):**
- meetings (Josh)
- conversation
- zarr-python: going strong (weekly)
- ZEP: (not great for Dennis)
- one off as necessary
- community: combine into ZEP.
- or vice versa
- office hours: likely to end
- :point_right: run a doodle
- People
- Dennis: not before 10am MST
- Jeremy: ZEP meetings are less critical at the moment
- Decisions/TODOs
- Only drop ZEP for the moment. Re-evaluate community frequency & time later. (cf. napari timezones)
- Josh: remove ZEP calendar entry
- Josh: update on Zulip (anywhere else?)
- Davis: was "zarr.json" a mistake?
- Josh: good question. benefits were:
- only one GET (or HEAD) rather than needing a frequent 404
- non-hidden files with proper file-ending
- Davis: true. just have a pattern now where we want to iterate and bottom out on arrays
- Jeremy: often need to load the json anyway
- or storage transformers that are needed
- Dennis: preference is to have the directories marked (e.g., in the name)
- price is mostly paid with large numbers of groups/arrays
- cf. consolidated metadata -- locating all objects before reading them
- maybe be able to recognize that by name
- Davis: give it its own document?
- Josh: like `.zmetadata`; downside is it requires an extra GET but that's ok.
- Davis: using `_nc` for all attributes now. destroyed lazy evaluation of attributes. seemed to be the way that people were going. have to have a good use-case for a separate file (see S3 and performance issues)
- Josh: could additionally gzip the extra file. benchmarking?
- Dennis: how big can the total metadata get? (in characters)
- Davis: maybe tabular data.
- Dennis: in NetCDF, lot of use of groups (1000s) as namespaces.
- Davis: Store API
- getters take memory type (GPU, CPU, ...)
- Josh: good to track (or disallow?) copies
- Jeremy: most Stores are CPU, so actively copying for GPU.
- Davis: Separate stores as in v2 (regular and chunk)
- Davis: Store is simple key/value. Agnostic to Zarr formats.
- Is the Store API overloaded?
- Davis: On extra files, an extension where there's a sqlite file for every group and array. Good for tabular.
- Jeremy: sqlite doesn't work for cloud storage.
- What stops people from doing it today?
- Prototype
- Is this icechunk? That's more a Store API.
- Jeremy: WASM/sqlite forwards to HTTP requests to S3 with read ahead.
- Josh: duckdb?
- Davis: see BigStitcher's use of arrays.
- Jeremy: space of zarr-like stuff. cloud-based databases (queries & writing). Parquet might not support this for the indexing.
- Davis: GeoParquet has a spatial index
- https://github.com/opengeospatial/geoparquet/pull/191
- Theodoros: interested in adopting Zarr
- Problem is that we're dealing with really sparse datasets (mass spec imaging).
- Davis: working on Zarr Python v3. A barrier to sparse support is that when we fetch a chunk we turn it into a contiguous array. Requires a redesign.
- Efficient encoding of a single sample and "plugged into" zarr.
- TV: can also have a time axis. distribution time of ions (for the same mass). (even though not as many pixels)
- JMS: a codec could encode it as sparse (but in-memory is dense). Matter of hours.
- other step would be full sparse support. a ton of people have asked for this, but it has to be woven throughout.
- Davis: tell us what doesn't work for you. "we want to use Zarr but ..."
- Josh: https://github.com/GraphBLAS/binsparse-specification
- Jeremy: easy to store chunk in a sparse format. implementing all the APIs, e.g. in tensorstore would ask explicitly, "give it to me as a sparse array"
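JMS's suggestion above (encode a chunk sparsely on disk while the in-memory representation stays dense) can be sketched in a few lines. This is a hypothetical codec, not a real zarr-python or numcodecs API; the class name and signatures are invented for illustration.

```python
import numpy as np

class SparseCOOCodec:
    """Hypothetical codec: store a chunk as (index, value) pairs,
    decode back to a dense array so the rest of the stack is unchanged."""

    def encode(self, chunk: np.ndarray) -> bytes:
        flat = chunk.ravel()
        idx = np.flatnonzero(flat).astype(np.int64)   # positions of non-zeros
        vals = flat[idx]
        header = np.array([chunk.size, idx.size], dtype=np.int64)
        return header.tobytes() + idx.tobytes() + vals.tobytes()

    def decode(self, buf: bytes, dtype=np.float64, shape=None) -> np.ndarray:
        size, nnz = np.frombuffer(buf[:16], dtype=np.int64)
        idx = np.frombuffer(buf[16:16 + 8 * nnz], dtype=np.int64)
        vals = np.frombuffer(buf[16 + 8 * nnz:], dtype=dtype)
        flat = np.zeros(size, dtype=dtype)            # densify on read
        flat[idx] = vals
        return flat.reshape(shape) if shape is not None else flat

codec = SparseCOOCodec()
chunk = np.zeros((4, 4))
chunk[1, 2] = 3.5
buf = codec.encode(chunk)
assert np.array_equal(codec.decode(buf, shape=(4, 4)), chunk)
assert len(buf) == 32   # 16-byte header + one index + one value, vs 128 raw
```

For a mostly-zero mass-spec chunk the encoded buffer is far smaller than the raw bytes, which is the "matter of hours" version; full sparse support would need the in-memory representation to change too.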
## 2024-10-30
**Attending:** Davis Bennett (DB), Sanket Verma (SV), Gábor Kovács (GB), Jeremy Maitin-Shepard (JMS)
**TL;DR:**
**Updates:**
- DB: Finding bugs in Zarr-Python and removing them - expanding the scope of tests
- JMS: Back from the parental leave — the baby is doing great! :tada:
- Been working on bugs for tensorstore
- SV: GeoZarr spec meetings have been updated on the community calendar
**Open agenda (add here 👇🏻):**
- Frequency of the meetings
- DB: No strong feelings
- JMS: Less activity, so it makes sense
- GB: Fine by me
- DB: Unreliable attendance — how to mark meetings as successful or unsuccessful
- SV: Will open a wide discussion for the community to get everyone's thoughts
- DB: Zarr V2 arrays using sharded arrays?
- JMS: Not that simple because of overlapping arrays
- DB: Zarr V2 codecs can utilise sharding codec
- JMS: The JSON metadata differs for sharding
- JMS: Who's the user base?
- DB: Someone who's using Zarr V2 and wants to use sharding
- DB: People might be scared of switching to a new format
- DB: Planning to add consolidated metadata and new data types in Zarr-Python, write a doc and publish it for reference
- JMS: The idea of community and core codecs is not super impressive!
- DB: Would be good to avoid namespacing issues
- JMS: What would happen if there are multiple JPEG codecs in community and we want to move them to core but there's already one in the core?
- DB: Good question, need to come up with a process for this
- JMS: Adding a vendor name could work — value in having a vendor name
- Discussions on upcoming possible extensions
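JMS's point that "the JSON metadata differs for sharding" is the crux of why V2 arrays can't simply reuse the V3 sharding codec. Rough sketches of both metadata documents, following the specs; all field values here are illustrative, and some optional V3 fields are omitted.

```python
zarray_v2 = {                        # v2: stored as ".zarray"
    "zarr_format": 2,
    "shape": [10000, 10000],
    "chunks": [1000, 1000],          # one object per chunk
    "dtype": "<f4",
    "compressor": {"id": "blosc"},   # single compressor slot, no codec list
    "fill_value": 0,
    "order": "C",
    "filters": None,
}

zarr_json_v3 = {                     # v3: stored as "zarr.json"
    "zarr_format": 3,
    "node_type": "array",
    "shape": [10000, 10000],
    "chunk_grid": {"name": "regular",
                   "configuration": {"chunk_shape": [2000, 2000]}},
    "data_type": "float32",
    "fill_value": 0,
    "codecs": [{
        "name": "sharding_indexed",  # sharding is just a codec in v3
        "configuration": {
            "chunk_shape": [1000, 1000],   # inner chunks within a shard
            "codecs": [{"name": "bytes"}],
            "index_codecs": [{"name": "bytes"}, {"name": "crc32c"}],
        },
    }],
}

# v2 has no "codecs" list to hang a sharding codec on
assert "codecs" not in zarray_v2 and "codecs" in zarr_json_v3
```

So "sharded V2" would need either a metadata extension or an out-of-band convention, which is what the user-base question above is probing.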
## 2024-10-16
**Attending:** Sanket Verma (SV), Joe Hamman (JH), Eric Perlman (EP), Ilana Rood (IP), Davis Bennett (DB), Michael Sumner (MS), Daniel Adriaansen (DA)
**TL;DR:**
**Updates:**
- The default branch has been changed back to `main` to prepare for V3 main release - https://github.com/zarr-developers/zarr-python/pull/2335
- Numcodecs 0.13.1 release soon - https://github.com/zarr-developers/numcodecs/pull/592
- VirtualiZarr has a dedicated ZulipChat channel now - https://ossci.zulipchat.com/#narrow/stream/461625-VirtualiZarr
- Check VirtualiZarr repo: https://github.com/zarr-developers/VirtualiZarr
- New OS project release by Earthmover - https://earthmover.io/blog/icechunk
- Transactional storage engine for ND array data on cloud object storage
- Zarr-Python V3 updates
- Any other updates?
**Open agenda (add here 👇🏻):**
- Intro w/ favourite food
- Sanket - Dumplings
- Joe - Burrito
- Eric - Donuts
- Davis - Ethiopian, Mexican and Indian dishes
- Michael - RSE at Australian Antarctic Division - Burgers
- Ilana - Works at Earthmover
- Daniel - NCAR
- JH: _starts screen sharing_
- JH: Presents on Earthmover, Arraylake, Icechunk...
- JH: _presentation ends_ — time for questions
- DB: Question on performance plots - what's the difference b/w Zarr V3 and Zarr V3 + Dask?
- JH: Fetching is done by a different library - we're handling the concurrency better on the IO side
- DB: What lessons could be taken from this plot that can be applied to Zarr-Python?
- JH: Python binding to rust crate needs to be looked at
- JH: Doing decompression and IO in an interleaved fashion
- SV: Does Icechunk work with Zarr V2?
- JH: Only with V3 for some parts - but we can change that
- EP: Being able to leverage Zarr sharding in some way for Icechunk would be great
- JH: We had an opportunity to do something totally different with sharding as it is now in ZP V3, i.e. a codec
- JH: Implies sharding in a different manner
- DB: How coupled are you with the current Zarr V3 API?
- JH: Highly coupled
- JH: LDeakin has started filing issues
- JH: Can envision a high-level and a low-level store - that's what we built in the Rust store
- JH: We should ask store to do more, but we should be specific about it
- MS: Really interested in the Rust implementation - does the Rust part take over the encodings?
- JH: No. We haven't implemented all of ZP yet
- JH: Codecs, stores and all other stuff is currently handled by ZP - someone interested could come along and build bindings around
- SV: Bioconductor folks are interested in Zarr-Python V3 — maybe they can be of interest to you
- **Thanks, Joe for giving the presentation! The slides and video recording will be posted online soon! Check our social media for updates!**
- DB: Zarr-Python V3 defines sharding as a codec, not as an explicit V3 feature, so basically you can have sharded Zarr V2 arrays!
- JH: Try to get sharded V2 data to work, and let us know!
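JH's high-level vs low-level store split can be sketched as two layers: the low level is plain key/value bytes (what Zarr's store API is today), and a higher level is "asked to do more", e.g. return parsed metadata. All class and method names below are hypothetical, not the actual Icechunk or zarr-python API.

```python
import json
from typing import Optional, Protocol

class ByteStore(Protocol):
    """Low-level store: opaque keys mapping to raw bytes."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...

class DictStore:
    """Minimal in-memory ByteStore."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)
    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

class HighLevelStore:
    """Adds store-side interpretation on top of any ByteStore,
    e.g. returning metadata as a parsed document rather than bytes."""
    def __init__(self, inner: ByteStore) -> None:
        self.inner = inner
    def get_metadata(self, path: str) -> Optional[dict]:
        raw = self.inner.get(f"{path}/zarr.json")
        return None if raw is None else json.loads(raw)

low = DictStore()
low.set("a/zarr.json", b'{"zarr_format": 3, "node_type": "array"}')
high = HighLevelStore(low)
assert high.get_metadata("a")["node_type"] == "array"
```

Being "specific about it", per JH, means deciding exactly which interpreted operations the high level promises, so alternative backends (like a Rust store behind Python bindings) can implement the same contract.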
## 2024-10-02
**Attending:** Dennis Heimbigner (DH), Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP)
**TL;DR:**
**Updates:**
- Added Tom and Deepak to Zarr-Python core devs — https://x.com/zarr_dev/status/1838965230438625684
- We had a documentation sprint for Zarr-Python V3
- The doc sprint officially ended on 10/1 evening. The participants have sent PRs to document the `zarr.array` and `zarr.storage` modules. Here are the open PRs:
- https://github.com/zarr-developers/zarr-python/pull/2276
- https://github.com/zarr-developers/zarr-python/pull/2279
- https://github.com/zarr-developers/zarr-python/pull/2281
- Zarr-Python V3 team making good progress — alpha release every week — V3 main release soon!
- Making stuff consistent with V2 - looking at Xarray and Dask's tests and they pass
- OME Challenge
- EP: Was able to convert big JAX datasets into V3
- JM: Ran into issues and was able to convert them into Zarr-Python V3 issues
- More discussion down below
- Any other updates?
**Open agenda (add here 👇🏻):**
- OME Challenge
- EP: JAX ran into issues for remote access and it's good to point them out and later rectify that
- EP: Directory listing should not be enabled by default - it's computation-heavy and could hurt your pockets
- EP: Checking if an object exists before a write could cost us $2k!
- EP: JZarr is currently being written
- DH: Any decisions about deleting objects?
- EP: When you check for existing objects, you have the ability to rewrite them
- DH: That means you can delete it!
- DH: NetCDF implements a recursive delete operation
- EP:
- DB: If you're inside the array then your metadata would statically define the array and you can easily rewrite it and essentially delete them
- DH: Having consolidated metadata help in rewriting operations
- DB: Defining schema and knowing the entire hierarchy has been helpful
- DH: We have this in NetCDF
- DB: https://github.com/janelia-cellmap/pydantic-zarr/
- Endorsing SPEC8 — Securing the release process: https://scientific-python.org/specs/spec-0008/
- JM: Will need to read this
- DB: Seems good enough and harmless
- Endorsing SPEC6 - Keys to the Castle: https://scientific-python.org/specs/spec-0006/
- JM: Sharing password has been a challenge
- DB: https://github.com/zarr-developers/zarr-specs/pull/312
- JM: Need to merge on https://github.com/zarr-developers/governance/pull/44
- JM and DB: Discussion on how to move forward and defining the scope of the groups involved in the specification changes
- DH: Diversity defined by the structure of the internal architecture, not by programming-language implementation
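EP's "$2k" point is easy to quantify: on object stores, checking whether a key exists before every write doubles the billed request count, and unconditional overwrites are safe for Zarr chunks anyway. A toy store that counts requests, with invented names, makes the arithmetic concrete:

```python
class CountingStore:
    """Toy object store that counts billed requests."""
    def __init__(self):
        self.data = {}
        self.requests = 0
    def exists(self, key):          # a billed HEAD request
        self.requests += 1
        return key in self.data
    def put(self, key, value):      # a billed PUT request
        self.requests += 1
        self.data[key] = value

def write_check_first(store, chunks):
    for key, value in chunks.items():
        if not store.exists(key):   # one extra request per chunk
            store.put(key, value)

def write_unconditional(store, chunks):
    for key, value in chunks.items():
        store.put(key, value)       # one request per chunk, same result

chunks = {f"c/{i}": b"\x00" for i in range(100)}
a, b = CountingStore(), CountingStore()
write_check_first(a, chunks)
write_unconditional(b, chunks)
assert a.requests == 200 and b.requests == 100
assert a.data == b.data
```

At billions of objects, that factor of two is exactly the kind of cost EP describes.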
## 2024-09-18
**Attending:** Sanket Verma (SV), Gábor Kovács (GB), Davis Bennett (DB), Eric Perlman (EP)
**TL;DR:**
**Updates:**
- Zarr-Python V3 doc sprint — 30th Sep. and 1 Oct.
- Identify missing docs and start creating issues
- Link existing issues - https://github.com/zarr-developers/zarr-python/issues?q=is%3Aopen+is%3Aissue+label%3AV3+doc
- Async working via Zoom meetings
- MATLAB (Mathworks) interested in Zarr - want to add support - looking for a C++ implementation
- Worked on a zine comic couple weeks ago - https://github.com/numfocus/project-summit-2024-unconference/issues/27
- Updates from Zarr-Python V3 effort / OME challenge
- DB: Getting through issues from V2 and V3 compatibility
- Tom and Deepak taking care of Dask issues
- Alpha releases every week - https://pypi.org/project/zarr/3.0.0a4/#history
- Defining data types in Zarr V3 - you're gonna see an error if the dtype is not defined
- Main release by the end of year
**Open agenda (add here 👇🏻):**
- DB: https://github.com/zarr-developers/zarr-python/issues/2170
- The way of defining sharding codec is not intuitive and can be improved
- https://github.com/zarr-developers/zarr-python/pull/2169 - proposed solution
- DB: Will update this PR and make it ready for review
- DB: All stores should have cache: https://github.com/zarr-developers/zarr-python/issues/1500
- EP: Some stores like S3 would benefit from this
- EP: Compression and decompression on cache is expensive
- DB: We can default it to 0 and turn it on accordingly
- DB: FSSpec has a default cache enabled - we can look into it
- EP: Will try to join the Zarr-Python core devs meetings on Friday
- SV: Early morning for west coast
- EP: Can make it!
- SV: Early morning stuff: presented on Zarr V3 at EuroBioc: https://eurobioc2024.bioconductor.org/abstracts/paper-bioc4/
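The "all stores should have a cache" idea above can be sketched as a wrapper over any key/value store, with a size of 0 disabling caching to match DB's "default it to 0" suggestion. Class and attribute names are invented for illustration, not the zarr-python API.

```python
from collections import OrderedDict

class CachedStore:
    """Wrap a key/value store with a small LRU cache so repeated
    metadata reads don't hit storage (the S3 round-trip is the cost)."""
    def __init__(self, store: dict, max_items: int = 0):
        self.store = store
        self.max_items = max_items          # 0 means caching is off
        self.cache: OrderedDict[str, bytes] = OrderedDict()
        self.misses = 0
    def get(self, key: str) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as recently used
            return self.cache[key]
        self.misses += 1
        value = self.store[key]
        if self.max_items > 0:
            self.cache[key] = value
            if len(self.cache) > self.max_items:
                self.cache.popitem(last=False)   # evict LRU entry
        return value

backing = {"zarr.json": b"{}"}
s = CachedStore(backing, max_items=4)
s.get("zarr.json"); s.get("zarr.json")
assert s.misses == 1                        # second read served from cache
```

EP's caveat still applies: caching decoded (decompressed) chunks rather than raw bytes trades memory for repeated decompression cost, so what gets cached matters as much as whether.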
## 2024-09-04
**Attending:** Josh Moore (JM), Dennis Heimbigner (DH), Eric Perlman (EP)
**TL;DR:**
**Updates:**
**Meeting Minutes**:
* Consolidated v2
- DH: annex for v2, "officially"/loosely recognized (would be a **great** favor)
- JM: and if we put it in v3 to say, "this is the former version"?
- DH: add a forward pointer
- EP: how many edge cases?
* Deprecation (DH)
- JM: No plan to deprecate v2 format (format vs. library)
- DH: presumably people will use the new library, that will be the "test" of the consolidated metadata.
* Bugs between implementations (JM)
- DH: list of those bugs? JM: no, but good idea.
- DH: available data?
- JM: yes! see https://github.com/ome/ome2024-ngff-challenge?tab=readme-ov-file#challenge-overview
- EP: billions of objects isn't fun.
* Consolidated v3
- JM: pushed recently at zarr-python meeting for a spec (and with more design)
- DH: as soon as it's in the format, then it's not just caching
- metadata caching prevents multiple reads
- DH: caching -> "big set of objects, keeping subset in memory"
- JM: can be re-created? "index"?
- DH: regardless, have to specify construction of any block of JSON
- could say a subtree looks like some other pre-defined block
- JM: parameterized MetadataLoader (or "MetadataDriver")
- DH: that's what I was going to implement anyway
- DH: like StorageDrivers (not caching) -- "VirtualAccess"
- DH: but same wave length
- JM: would like to offload some JSON (speed vs size)
- DH: should that API do more than read/write the JSON?
- should it interpret it?
- "give me key X out of this dictionary"
- JM: like mongodb or jq queries
- DH: walk binary without needing to convert down to JSON
- EP: this gets back to N5 as an API rather than a format
- logical versus storage
- DH: netcdf had to support multiple storages as a wrapper (HDF4, HDF5, DAP)
- essential to have some virtual object/class
- hammer applied to everything ("common API")
* EP: https://github.com/zarr-developers/zarr_implementations
- why didn't that find the codec issues?
- JM: no v3!
- EP: hackathon?
- EP: need mozilla support for HTML things
- DH: agreed. hugely important
- JM: As a github action?
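The consolidated-metadata mechanics discussed above are essentially "gather every metadata document in a hierarchy into one JSON object, so a reader needs a single GET instead of one per node". A minimal sketch modeled on v2's `.zmetadata`; the store here is just a dict of key to bytes, and key names are illustrative.

```python
import json

def consolidate(store: dict) -> bytes:
    """Collect all metadata documents into one consolidated JSON blob."""
    meta_keys = [k for k in store
                 if k.endswith((".zgroup", ".zarray", ".zattrs"))]
    doc = {"zarr_consolidated_format": 1,
           "metadata": {k: json.loads(store[k]) for k in meta_keys}}
    return json.dumps(doc).encode()

store = {
    ".zgroup": b'{"zarr_format": 2}',
    "a/.zarray": b'{"zarr_format": 2, "shape": [10]}',
    "a/0": b"\x00" * 10,           # chunk data is not consolidated
}
store[".zmetadata"] = consolidate(store)

meta = json.loads(store[".zmetadata"])
assert "a/.zarray" in meta["metadata"]
assert "a/0" not in meta["metadata"]
```

DH's caching distinction shows up here too: because the consolidated document can always be re-created by walking the hierarchy, it is an index/cache rather than a new source of truth, unless the format spec says otherwise.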
## 2024-08-21
**Attending:** Eric Perlman (EP) and Sanket Verma (SV)
**TL;DR:**
**Updates:**
- Zarr V3 Survey: https://github.com/zarr-developers/zarr-python/discussions/2093
- Zarr-Python developers meeting update - removed the 15:00 PT meeting and changed the 7:00 PT meeting from bi-weekly to weekly
- Welcome David Stansby as core dev of Zarr-Python! 🎉
- https://github.com/zarr-developers/zarr-python/pull/2071
- Bunch of PRs got in Zarr-Python - changes around fixing dependencies, maintenance, and testing, see [here](https://github.com/zarr-developers/zarr-python/commits/v3/?since=2024-08-08&until=2024-08-21)
**Open agenda (add here 👇🏻):**
- EP: NGFF challenge to convert V2 data to V3 - EP had a chat with Josh Moore
- Repo: https://github.com/ome/ome2024-ngff-challenge
- EP: Jackson lab will be utilising the Docker image created by EP for converting V2 to V3 data
- SV and EP: Discussions on Tensorstore, Compressions & Codecs, Microscope vendor software and woes of moving places!
## 2024-08-07
**Attending:** Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Thomas Nicholas (TN)
**TL;DR:**
**Updates:**
- Benchmarking tool for determining the best spec for writing using Tensorstore: https://github.com/royerlab/czpeedy
- Zarr-Python updates
- DB: There have been some movements in ZP
- Discussion around the new API: https://github.com/zarr-developers/zarr-python/discussions/2052
- Chunks, shards, and other terminology - need to decide what to use
- Getting more active core-devs for ZP will help in having lively discussion
- TN: Applying for money to work on VirtualiZarr / Zarr upstream
- TN: Development Seed is applying for the NASA grant - Julia Signell would work on it
- DB: Non-zero origin for Zarr arrays would help
- Related issue: https://github.com/zarr-developers/zarr-specs/issues/122
**Open agenda (add here 👇🏻):**
- EP: Discussion about cycling, library and picking up books from library and reading them in the park! :bicyclist: :books:
- EP/DB: Tangent on write directly to sharding from microscopes...
- TN: Various ways of storing the large metadata for a huge Zarr array
- TN: Storing the large metadata in the form of a hidden Zarr array - sort of a common theme among the various ZEPs being proposed
- TN: Seems important because it has come up every time there's a discussion about scalability
- DB: Store the aggregated information in the header of the chunk
- SV: How does BSON scale as compared to JSON?
- TN: We would still need to have a pointer to the BSON in JSON
- DB: How do we introduce it to the Zarr V3 Spec?
- TN: Maybe a convention
- TN: Zarr is close to being a superformat!
- DB: We could also increment the spec to a major version to include the change
- TN: Discussions on whether it's possible for Zarr to be a _superformat_!
- TN: Some values in geoscience datasets are closely related, and compressing them together would be of huge value - but Zarr can't do that
- DB: A fundamental Zarr array could be a set of small Zarr arrays
- TN: VirtualiZarr basically does that
- TN: _starts screen sharing_
- DB: The current indexing in Zarr-Python is not ideal and having Zarr arrays made of small arrays sounds much cleaner
- TN: Hopefully I'd be able to work on this after VirtualiZarr
- Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages
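TN's large-metadata theme, including "a pointer to the BSON in JSON", boils down to keeping a small reference in the metadata document and parking the bulk payload elsewhere in the store. A sketch under invented conventions: the `_big/...` key prefix, the `attrs.json` name, and the `$ref` pointer shape are all hypothetical.

```python
import json

def write_attrs(store: dict, path: str, attrs: dict, limit: int = 1024):
    """Write attributes; oversized values go to a hidden sidecar key,
    leaving only a pointer in the JSON document."""
    small, pointers = {}, {}
    for name, value in attrs.items():
        blob = json.dumps(value).encode()
        if len(blob) > limit:
            key = f"{path}/_big/{name}"      # hidden sidecar object
            store[key] = blob
            pointers[name] = {"$ref": key}   # pointer left in the JSON
        else:
            small[name] = value
    store[f"{path}/attrs.json"] = json.dumps({**small, **pointers}).encode()

store: dict = {}
write_attrs(store, "a", {"units": "K", "table": list(range(5000))})
doc = json.loads(store["a/attrs.json"])
assert doc["units"] == "K"                        # small value stays inline
assert doc["table"] == {"$ref": "a/_big/table"}   # big value is a pointer
```

The trade-off DB notes applies directly: the pointer destroys lazy "everything in one document" reads and costs an extra GET, so it only pays off when the payload is genuinely large (e.g. tabular data).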
## 2024-07-24
**Attending:** Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Fernando Cervantes (FC), Eric Perlman (EP), Ward Fisher (WF), Thomas Nicholas (TN)
**TL;DR:**
**Updates:**
- SciPy 2024 was great! 🎉
- DB: Zarr-Python updates
- Sharding codec is pickleable
- Decisions need to be made about the array API
- How should the sharding codec look to the user?
- DB: Easy to find if your array is sharded
- JM: Partial reading this in Zarr V2
- TIFFfile sets a bunch of flags - wonder if those features are friendly for Zarr
- DB: All the arrays should have sharding configuration
- JM: Working with Tensorstore, the order of codecs didn't matter --> read_chunks / write_chunks
- DB: some weirdness when it comes to different backends when uncompressed
- New release - Numcodecs 0.13.0 - https://numcodecs.readthedocs.io/en/stable/release.html#release-0-13-0 - Thanks, Ryan!
- New codec added - Pcodec
- JM: Conda is unhappy
**Open agenda (add here 👇🏻):**
- Intros
- SV: Yosemite National Park
- JM: National Seashore in Florida - Gulf of Mexico
- FC: Jackson Lab working in ML - Saccida National Park
- EP: Zion National Park
- WF: Yellowstone National Park
- DB: Yellowstone National Park
- TN: Want to open issues on bunch of ideas
- 1. Zarr reader to read chunk manifests and byte offsets - currently Xarray handles this
- Can use Zarr to open NetCDF directly
- 2. VirtualiZarr has lazy concatenation of arrays - Xarray has lazy indexing operations for arrays
- Long standing issue in Xarray to separate the lazy indexing machinery from Xarray - https://github.com/pydata/xarray/issues/5081
- DB: Could be handled and should be a priority now
- TN:
- JM: Agree with Davis with indexing - not sure if the abstraction layer for concatenation is correct!
- JM: Talked to 2 Napari maintainers - on a problem of chunking
- TN: A lot of people want to solve the indexing problem but neither Zarr nor Xarray exposes that
- JM: Finding more people with similar interests would help us provide more engineering power
- DB: Create a PR copy-pasting code from Xarray!? - This could unlock a lot of use cases
- TN: VirtualiZarr does actually do that - but at the level of chunks rather than indices
- DB: Slicing and concatenation are duals - if you have both it's complete
- DB:
- JM: Query optimisation can be tweaked as we move forward
- TN: When you do concat and slice you have identified a directed graph - you can optimise that plan - you can also hand off that plan to some reader
- JM: What does user do with the plan? Do they do something with it?
- TN: Array API folks have deliberately made arrays lazy
- GPU CI for Zarr-Python - https://github.com/zarr-developers/zarr-python/issues/2041
- GitHub and Cirun sound good and easy to set up
- Who pays? - Earthmover is ready to pay the cost for initial months and then switch to NF
- NF has money reserved for projects in the infrastructure committee for similar costs
- JM: Good to have it!
- SV: Need to get it sooner rather than later
- Zarr paper - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Appetite.20for.20a.20Zarr.20paper.3F
- JM: My poster was cited multiple times in the last few weeks
- JM: JOSS is a potential venue - IETF is more work
- TN: Submitting to a computing journal - W3C, IEEE, etc.
- TN: Xarray: https://openresearchsoftware.metajnl.com/articles/10.5334/jors.148
- JM: NetCDF: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg00087.html
- **TABLED**
- Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages
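TN's point that "when you do concat and slice you have identified a directed graph" can be sketched with a tiny lazy-array wrapper: operations only record a plan, and nothing is evaluated until the plan is computed (or handed to a reader/optimizer). All names here are hypothetical, not Xarray or Zarr API.

```python
import numpy as np

class Lazy:
    """Node in a plan graph: either a leaf array or a deferred op."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __getitem__(self, sl):
        return Lazy("slice", self, sl)      # record, don't evaluate
    def compute(self) -> np.ndarray:
        if self.op == "leaf":
            return self.args[0]
        if self.op == "concat":
            return np.concatenate([a.compute() for a in self.args])
        if self.op == "slice":
            src, sl = self.args
            return src.compute()[sl]
        raise ValueError(self.op)

def concat(*arrays):
    return Lazy("concat", *arrays)          # also just records a node

a = Lazy("leaf", np.arange(3))
b = Lazy("leaf", np.arange(3, 6))
plan = concat(a, b)[1:4]                    # nothing evaluated yet
assert np.array_equal(plan.compute(), np.array([1, 2, 3]))
```

Because the graph is explicit, a smarter `compute` could push the slice below the concat and only load the chunks it touches, which is the query-optimisation point JM raises.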
## 2024-07-10
**Attending:** Josh Moore (JM), Davis Bennett (DB), Fernando Cervantes (FC)
**Updates:**
- SciPy! :tada:
- Josh: testing zarr v3
- issue for each problem? Davis: sure
- Davis: to be fixed:
- no validation of fill value
- multiple bugs with sharding: 1d
- Josh: missing "attributes"
- Josh: but neuroglancer working?
- Davis: not for all static file servers. need PR.
- Davis: various forks. Josh: plugins? Davis: tough
- or: neuroglancer as a component that can be embedded
- Janelia NG is a React component.
- "Visualization is tough."
- Motion for food :knife_fork_plate: Seconded.
## 2024-06-26
**Attending:** Brianna Pagān (BP), Thomas Nicholas (TN), Dennis Heimbigner (DH), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB)
**TL;DR:**
**Updates:**
- Zarr-Python 3.0.0a0 out
- https://pypi.org/project/zarr/3.0.0a0/
- Good momentum and lots of things happening with ZP-V3 - aiming for mid July release
- SV represented Zarr at CZI Open Science 2024 meeting - various groups looking forward to V3 - https://x.com/MSanKeys963/status/1801073720288522466
- R users at bio-conductor looking to develop bindings for ZP-V3
- New blog post: https://zarr.dev/blog/nasa-power-and-zarr/
- ARCO-ERA5 got updated this week - ~6PB of Zarr data available - check: https://x.com/shoyer/status/1805732055394959819
- https://dynamical.org/ - making weather data easy and accessible to work with
- Check: https://dynamical.org/about/
- Video tutorial: https://youtu.be/uR6-UVO_3k8?si=cp0jOxrtKL_I6LfV
**Open agenda (add here 👇🏻):**
- BP: Will be talking about how Zarr is utilised at NASA!
- _starts screen sharing and presenting_
- BP: I work at Goddard GES DISC - deputy manager at one of the centres - manages a team of developers and engineers - **not representing all the data centres**
- BP: Lot of people are coming into Zarr from the SMD (Science mission directorates)
- BP: Earth Science Division - EOSDIS and Distributed Active Archive Centres (DAACs) - DAACs focuses on data distribution and management
- BP: All the centres coming up with the suggestion on best practices and best format - we discuss with them the possibility of what they can, and should use
- BP: Moving to cloud optimized format - DAACs have ton of archival data in various formats
- BP: Projected growth for the entire Zarr store across all EOSDIS by 2030: 60PB -> 600PB!
- BP: GES DISC holds 7 PBs of data - we have 3000 different collections of datasets - really diverse!
- BP: Giovanni - an interactive web-based program with 20+ services associated with it - taking the existing data and grooming the metadata so it's accessible and useful across a broader range
- BP: Over at NASA, we do many Zarr stuff...
- Zarr V2 spec is approved data format convention for use in NASA Earth Science Data Systems (ESDS)
- Giovanni in cloud - duplicates Zarr (variable based)
- Open issue: continuously updating Zarr stores - Exploring lakeFS for managing dynamic data
- ZEP0005
- Brianna is leading the GeoZarr work
- VEDA - no. of things Zarr/STAC related going on in VEDA
- TN: Does Giovanni read Zarr directly? If so, which reader does it use? (Can Giovanni use VirtualiZarr?)
- BP: Giovanni promotes variable-first search - most of Giovanni has OPeNDAP attached to it - built with overhead from the GES DISC pipeline - in hindsight, yes!
- TN: From the slides - Xarray can take care of some of the stuff that Giovanni does
- TN: Very curious about the exact difference between the LakeFS idea and EarthMover’s ArrayLake
- BP: LakeFS is an open-source ArrayLake - no vendor lock-in
- SV: What does Giovanni actually do when you say, ‘it grooms metadata’?
- BP: Standardizes the grid - flip the grid - naming mechanism - smoothing the metadata so that it works across various services
- BP: other metadata grooming is, for example, that we have a lot of time-dimension issues. That's because of scattered best practices for how to store time metadata
- TN: Can we do the flipping with Zarr/VirtualiZarr?
- DB: If you flip at the store level - you'd need to find out how deep you'd need to go
- BP: Will try to make time standard across the datasets
- BP: https://github.com/briannapagan/quirky-data-checker
- BP: _from the Zoom chat_
- Zarr Storage Specification V2 is an approved data format convention for use in NASA Earth Science Data Systems (ESDS). https://www.earthdata.nasa.gov/esdis/esco/standards-and-practices
- Giovanni in the Cloud, duplicate archive, zarr, variable-based: https://cmr.earthdata.nasa.gov/search/variables.umm_json?instance-format=zarr&provider=GES_DISC&pretty=True
- Open issue: continuously updating zarr stores. Exploring lakeFS for managing dynamic data
- ZEP 0005: Zarr accumulation extension for optimizing data analysis
- Looking into a GIS service for zarr stores
- POWER https://power.larc.nasa.gov/data-access-viewer/
- https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
- https://discourse.pangeo.io/t/metadata-duplication-on-stac-zarr-collections/3193/7
- EP: Converting OME datasets in V3 in upcoming months - quirky tool can be useful
- DB: V2 chunk encoding matches the V3 encoding - you just need to re-write the JSON document
- DB: Playing with sharding - tensorstore is fast - need to figure out the nomenclature
- EP: The bio and geo world have parallel tracks and working in silos
- EP: https://forum.image.sc/t/ome2024-ngff-challenge/97363
- DB: The challenge doesn't seem interesting to me! - converting JSON documents - instead we should focus on converting existing data to sharded stores - a much more interesting problem
- EP: A bunch of the data is non-Zarr, and EP will work on pushing it to the cloud and converting it to Zarr
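The "flip the grid" grooming BP describes (e.g. latitude stored north-to-south when a service expects south-to-north) is just a reversed view when done at read time, with no data copy. A sketch with made-up coordinate values; only a single 1-D variable is shown for illustration.

```python
import numpy as np

lat = np.array([90.0, 45.0, 0.0, -45.0, -90.0])   # stored north-to-south
temp = np.arange(5.0)                             # one value per latitude

def flip_to_ascending(lat, data):
    """Return latitude and data in ascending-latitude order."""
    if lat[0] > lat[-1]:                 # detect a descending latitude axis
        return lat[::-1], data[::-1]     # NumPy views, not copies
    return lat, data

lat2, temp2 = flip_to_ascending(lat, temp)
assert lat2[0] == -90.0 and temp2[0] == 4.0
```

DB's caveat is the hard part: doing this "at the store level" means deciding how deep the flip reaches, since chunk keys, per-chunk element order, and coordinate metadata all encode the original orientation.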