---
tags: zarr, Meeting
---
# Zarr Bi-weekly Community Call
### **Check out the website for previous meeting notes and other information: https://zarr.dev/community-calls/**
Joining instructions: [https://zoom.us/j/300670033 (password: 558943)](https://zoom.us/j/300670033?pwd=OFhjV0FHQmhHK2FYbGFRVnBPMVNJdz09#success)
GitHub repo: https://github.com/zarr-developers/community-calls
Previous notes: https://j.mp/zarr-community-1
## 2023-03-22
**Attending:** Hailiang Zhang (HZ), Sanket Verma (SV), Dieu My Nguyen (DMN), Dennis Heimbigner (DH), John Kirkham (JK), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS), Johana Chazaro-Haraksin (JCH), Davis Bennett (DB)
**TL;DR:**
SV started the community meeting with a few updates. One of the most important updates was that we have a new page on the website for the Zarr adopters. Check here: https://zarr.dev/adopters/.
After this, HZ presented his ZEP, followed by a Q&A session. After the Q&A, we concluded the meeting with a few action items for HZ, which he'll take care of in the upcoming weeks.
**Updates:**
- New blog post: https://zarr.dev/blog/ome-2022/ 🎉
- Zarr Adopters live at: https://zarr.dev/adopters/
- Zarr office hours
- Josh fixed the conda-forge error, check here: https://github.com/zarr-developers/zarr-python/pull/1364
**Meeting minutes:**
- Hailiang's [ZEP0005](https://zarr.dev/zeps/draft/ZEP0005.html) presentation - check the recording [here](https://drive.google.com/file/d/13xkl-i8pCSnv42KeqX6KLtIRFln5sf6k/view?usp=share_link) 🎥
- Q&A from the session is below 👇🏻:
- JMS: Mathematical equation would be helpful
- HZ: Working on a paper for more details - internal for now - will be publishing it soon!
- DB: Is it similar to summed area table? - https://en.wikipedia.org/wiki/Summed-area_table
- HZ: Not exactly - trying to achieve something more Zarr-centric - making the accumulation flexible and dimension agnostic
- JMS: I think yours is trying to solve the same problem as a [summed-area table](https://en.wikipedia.org/wiki/Summed-area_table) solves. But I think you’re trying to do it without storing something the same size as the original array - perhaps by imposing some additional restrictions on the type of queries you can do (see the sketch at the end of this section)
- HZ: This is more Zarr related - Jeremy has more comments on the PR - stride boundary aligned with the Zarr boundary
- JMS: Don’t understand it fully - a mathematical equation would be helpful
- SV: Is there a reference implementation?
- HZ: I’m working on the code - will open-source it by the end of the summer
- HZ: What is not clear, Jeremy?
- JMS: Does the chunk need to be aligned?
- HZ: For this implementation the chunks need to be aligned
- JMS: Trying to understand the proposal in a general sense
- HZ: We can achieve the accumulation/averaging service faster - [Giovanni](https://giovanni.gsfc.nasa.gov/giovanni/) will be using this accumulation
- DB: Would you share some cloud backed example?
- HZ: Sure, can do!
- SV: Having example .zarr data from before and after the accumulation, with the chunk attributes, would be good for everyone!
- HZ: Sure, can do and can also add the script for the data generation!
- DB: Having a reference of summed area table would be good thing!
- HZ: Sure, can do that!
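For reference, the summed-area table trick that DB and JMS mentioned: precompute one cumulative sum per axis, and any axis-aligned box sum becomes a handful of lookups. A minimal numpy sketch (2-D case; the names are illustrative, not from the ZEP):

```python
import numpy as np

# Build a summed-area table: one cumulative sum per axis.
data = np.random.rand(100, 100)
sat = data.cumsum(axis=0).cumsum(axis=1)

def box_sum(sat, r0, r1, c0, c1):
    """Sum of data[r0:r1, c0:c1] via four lookups (inclusion-exclusion)."""
    total = sat[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= sat[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= sat[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += sat[r0 - 1, c0 - 1]
    return total
```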
## 2023-03-09
**Attending:** Josh Moore (JM), Dennis Heimbigner (DH), Alan Liddell (AL), Dieu My Nguyen (DMN), Davis Bennett (DB), Brianna Pagán (BP), Isaac Virshup (IV), Sanket Verma (SV), Martin Durant (MD), Jeremy Maitin-Shepard (JMS)
**TL;DR:**
SV started the meeting by announcing that the Outreachy phase was a success! 🚀 After this, we briefly discussed why sharding should be a codec, which JMS initiated. Then, DMN and BP had some questions about how you can continuously update your Zarr stores; everyone from the community had good insights on that. Next, DB showcased what he has been working on for some time, and IV brought up the discussion on strings as codecs or dtypes. Lastly, MD informed everyone about his recent PR to the numcodecs repo and showcased his work on a Rust-backed Python filesystem.
**Updates:**
- Outreachy ended on 3/9
- Check the work here: https://github.com/caviere/testing_zipstore
- [ZEP0001](https://zarr.dev/zeps/draft/ZEP0001.html) entering the review period soon!
- Sharding to a codec (JMS)
- MD: push it down to the storage layer
- IV: Documentation for transformers
- SV: open issue with purl URLs
**Meeting notes:**
- Slow but some progress on geozarr spec (BP)
- Anyone have contacts we (NASA) can chat with on how continuously updated zarr stores are handled? i.e. someone is in the middle of a calculation on a zarr store, but the zarr store is updated... best practices? https://github.com/CCI-Tools/zarr-cache (BP + DMN)
- DMN: trying to keep a consistent view for the viewer.
- MD: user should see old or new but nothing in-between
- IV: updating attributes which are out-of-date with the arrays
- see Ryan's Iceberg-based solution. Single kerchunk like file.
- MD: see recent blog post. IV: see Ryan's issue.
- https://martindurant.github.io/blog/berg/
- https://martindurant.github.io/blog/mutable-kerchunk/
- DMN: updating old data as easy? MD: writing a chunk is fine
- JMS: versioning would be more of an issue. MD: again see iceberg-y.
- https://earthmover.io/
- DB: example of pydantic classes for OME-NGFF
- helps with validation
- https://github.com/JaneliaSciComp/pydantic-ome-ngff
- IV: strings as codecs or dtypes
- JM: v2 issue is because lack of extension points
- JMS: implementation issue?
- IV: potentially meta-array? object arrays as they are now
- IV: dtype as the final codec (the buffer in memory)
- JMS: different for sparse data types (when/if)
- IV: sparse arrays aren't compatible with v3 data type system (and that's fine)
- JMS: saw having each chunk a sparse array
- IV: may over-complicate what zarr is. probably going to do sparse array == zarr group
- how to represent a sparse array is hard to agree on
- JMS: was attempting to move v3 towards dtype being more of an abstract representation
- IV: what gets written per dtype with no compression
- JMS: default for integer is little-endian
- IV: on big-endian machine would want to skip the final codec
- JMS: if you have an array after the codecs run, then you apply the default codec for the dtype (see the sketch at the end of this section)
- MD: Added a PR in numcodecs: https://github.com/zarr-developers/numcodecs/pull/422
- And also talked about: https://github.com/martindurant/rfsspec
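A rough sketch of the "dtype as the final codec" idea from the discussion above: the last codec in the chain fixes the on-disk byte layout (little-endian by default for integers), so an implementation on a machine with matching endianness could skip it. Helper names are made up for illustration, not taken from any spec draft:

```python
import numpy as np

def final_codec_encode(arr: np.ndarray) -> bytes:
    # fix the on-disk layout: little-endian, C order
    le = arr.astype(arr.dtype.newbyteorder("<"), copy=False)
    return np.ascontiguousarray(le).tobytes()

def final_codec_decode(buf: bytes, dtype) -> np.ndarray:
    # interpret the bytes using the same fixed layout
    return np.frombuffer(buf, dtype=np.dtype(dtype).newbyteorder("<"))
```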
## 2023-02-22
**Attending:** Davis Bennett (DB), Sanket Verma (SV), Ward Fisher (WF), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS), Norman Rzepka (NR), Eric Perlman (EP), Dieu My Nguyen (DMN), Virginia Scarlett (VS)
**TL;DR:**
We started the meeting by discussing some possible performance improvements for sharding. [Ryan Abernathey](https://github.com/rabernat) did some [benchmarks](https://github.com/zarr-developers/zarr-python/discussions/1338) a few weeks ago, which turned out well. Then, the community offered a few solutions to make it even better. Jonathan is working on a PR for the same. After that, DB wondered how he could store only the indexes in the cache using Zarr.
SV asked what everyone thinks about the [all-contributors](https://github.com/all-contributors/all-contributors) bot. He also asked whether the community would like to participate if Zarr were to organise a hack week. Both ideas received a 👍🏻 from everyone.
Lastly, DB showed us what he’s working on, and JMS discussed the `zarr-specs` issue he submitted recently.
**Updates:**
- Zarr-Python 2.14.1 release with sharding and new docs theme (PyData-Sphinx)
- SciPy 2023 deadline next week (03/01), if you want to collaborate on a tutorial with us, now's the time
**Meeting Notes:**
- DB: Sharding is slow - according to https://github.com/zarr-developers/zarr-python/issues/1343
- NR: Jonathan is working on a PR
- DB: Using MutableMapping as the store interface for `getitems` and `setitems` is the probable cause - the Zarr array API has some batching logic, which may also lead to slowing down
- DH: What type of caching model is it using?
- NR: There is no caching! - individual chunks are loaded through byte ranges
- DH: having cache would be fine
- NR: maybe! depends on the use case
- JMS: Zarr array API does have support for batch reading and writing - if you’re reading multiple chunks from a single shard, the overhead would be big - you need to tell the user to cache the index - maybe twice the read requests because of the index and the shard
- NR: Having index in cache makes sense
- DB: How to store the indexes in the cache? Maybe JSON? What would be a good storage format? How about SQLite? (for my use case)
- JMS: Unrelated to cache - you can list the chunk
- DB: recursive and expensive
- JMS: S3 can give you a flat list, may not solve the problem but the abstraction would help
- JMS: are you doing re-encoding?
- DB: yes
- JMS: SQLite could be a pretty reasonable solution (see the sketch at the end of this section)
- DB: Is there a clever way to do it using Zarr itself?
- JMS: keys can be ordered lexicographically
- DB: If your data is too big then your metadata becomes another type of data
- SV: What do you think about using [all-contributors](https://github.com/all-contributors/all-contributors) and hosting a Zarr hack week to sync `zarr-python` implementation to V3?
- All: Sounds good for `all-contributors` :+1:
- EP: Having something online would be good and I'd participate
- DB: Sounds good and I'd participate
- WF and JMS: :+1:
- DB: Defining abstract structure of zarr array like keys, properties, metadata - OME-Zarr has a try catch block - Have put some together with [pydantic](https://docs.pydantic.dev/) - it’s simple to generate zarr group - taking abstract tree representation and running it backwards to create Zarr groups
- Repo: https://github.com/JaneliaSciComp/pydantic-ome-ngff
- SV: Could be something like `pip install ztree`
- DB: Trying to define protocol for HDF5 as well - could dump the hierarchy into Zarr or HDF5 container
- DB: Structural subtyping stuff
- SV: Would it be good to show a demo?
- DB: Yes!
- JMS: https://github.com/zarr-developers/zarr-specs/issues/212
- DB: Maybe similar to what vanilla N5 does!
- JMS: Reasonable to add a metadata option - planning to add an extension ZEP for this
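On DB's question about where to keep shard indexes, a minimal sketch of the SQLite option JMS suggested (the schema and names are made up for illustration): each chunk key maps to its (offset, length) within a shard.

```python
import sqlite3

# Hypothetical cache of shard indexes: chunk key -> (offset, length).
con = sqlite3.connect("shard_index.sqlite")
con.execute(
    "CREATE TABLE IF NOT EXISTS chunk_index "
    "(key TEXT PRIMARY KEY, offset INTEGER, length INTEGER)"
)

def put(key, offset, length):
    with con:  # commits on success
        con.execute("INSERT OR REPLACE INTO chunk_index VALUES (?, ?, ?)",
                    (key, offset, length))

def get(key):
    row = con.execute("SELECT offset, length FROM chunk_index WHERE key = ?",
                      (key,)).fetchone()
    return row  # None if the index entry is not cached yet
```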
## 2023-02-08
**Attending:** Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Virginia Scarlett (VS), John Kirkham (JK), Dieu My Nguyen (DMN), Hailey Johnson (HJ), Isaac Virshup (IV), Martin Durant (MD)
**TL;DR:**
SV started the meeting by letting everyone know he is working on adding a webpage on our website to showcase Zarr adopters. So if you’re using Zarr in any way and want to showcase your logo, please follow the instructions in the issue here: https://github.com/zarr-developers/community/issues/60.
He also asked around if submitting a paper in [JOSS](https://joss.theoj.org/) for Zarr-Python would be a good idea, and overall there was a positive response. After this, we discussed submitting tutorials for SciPy 2023 and adding sharding in the next release.
The meeting ended with MD sharing what he was building during the HackWeek at Anaconda and DB asking about the performance difference with/without sharding.
**Updates:**
- Zarr Adopters - get your logos in!
- https://github.com/zarr-developers/community/issues/60
**Meeting Minutes:**
- [JOSS](https://joss.theoj.org/) (SV)
- DB: takes time but can't hurt.
- IV: JOSS review took over a year for anndata
- SciPy 2023 Tutorials (SV)
- Deadline: Feb 22 (Conference in July)
- Talk focuses on the evolution of the spec since 2019
- Duration: 4 hours
- Basics but also different domains (geospatial, bioimaging, ...)
- HJ: planning on being there
- DMN: hybrid? Unsure. Hope to attend. Earth science data.
- SV: New Zarr-Python release (2.14) including sharding and new theme?
- small issues before release
- MD: https://github.com/martindurant/rfsspec
- hack week at anaconda. async fetching of list of urls, start/stop ranges
- can load zarr data using rust native concurrency (not asyncio; python overhead)
- fsspec currently works around asyncio re-entrance issues by using a dedicated io thread
- i.e. the dask use case suffers from GIL contention
- some parquet use cases have substantial overhead (see the range-fetching sketch at the end of this section)
- also would enable pyodide or could be compiled to wasm
- a step towards browser-ability
- see https://github.com/fsspec/filesystem_spec/pull/1180
- DB: perf. difference? worst case the same. with lots of threads, could see speed up but unclear how much (maybe 2x)
- JK: with sharding? sure.
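The batched byte-range fetching MD described is roughly the pattern fsspec exposes through `cat_ranges`; a small sketch with made-up URLs (the Rust backend in rfsspec aims to serve the same access pattern without the asyncio overhead):

```python
import fsspec

fs = fsspec.filesystem("http")
# hypothetical chunk URLs and byte ranges
urls = [
    "https://example.org/data.zarr/0.0",
    "https://example.org/data.zarr/0.1",
]
# one call fetches all ranges concurrently: (paths, starts, ends)
blobs = fs.cat_ranges(urls, [0, 0], [65536, 65536])
```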
## 2023-01-25
**Attending:** Davis Bennett (DB), Sanket Verma (SV), Josh Moore (JM), Ethan Davis (ED) - Unidata, Ward Fisher (WF), Alan Liddell (AL), Brianna Pagán (BP), Erik Welch (EW), Hailey Johnson (HJ), Jeremy Maitin-Shepard (JMS), Isaac Virshup (IV)
**TL;DR:**
**Release update**: [Zarr Python](https://github.com/zarr-developers/zarr-python) 2.13.6 along with 2.13.4 and 2.13.5 is out! Check the release notes [here](https://zarr.readthedocs.io/en/stable/release.html#release-notes). [SciPy 2023](https://www.scipy2023.scipy.org/present) CFP is open until 22nd February. If you’re planning to submit a proposal involving Zarr, feel free to reach out to our [community manager](mailto:svsanketverma5@gmail.com), and he’ll help you out. The meeting started with DB discussing his recently opened [PR #1323](https://github.com/zarr-developers/zarr-python/pull/1323). Next, JM gave us updates on [AI4Life](https://ai4life.eurobioimaging.eu/) and initiated a discussion on GeoJSON. After this, we discussed [ZEP0004](https://github.com/zarr-developers/zeps/pull/28) and [Unicode names](https://github.com/zarr-developers/zarr-specs/issues/56). Next, BP gave an update on GeoZarr bi-weekly meetings, and lastly, SV asked a general question on visualising N-dimensional arrays.
**Updates:**
- Zarr-Python [2.13.4](https://zarr.readthedocs.io/en/stable/release.html#release-2-13-4) (outreachy updates), [2.13.5](https://zarr.readthedocs.io/en/stable/release.html#release-2-13-5) and [2.13.6](https://zarr.readthedocs.io/en/stable/release.html#release-2-13-6) is out!
- Josh: sorry, I stuttered.
- Migration to PyData Sphinx theme almost complete; check here: https://github.com/zarr-developers/zarr-python/pull/1242
- SciPy 2023 CFP is open until 22nd February
- https://www.scipy2023.scipy.org/
**Open Agenda (add here 👇🏻):**
- [automated formatting](https://github.com/zarr-developers/zarr-python/pull/1323) - Davis
- what's the line in the sand to get it in?
- ping the core devs, getting sharding in
- Declarative hierarchy (Davis)
- Create object then serialize it to disk
- JMS: maybe a layer on top of zarr (multi-formats)
- https://pypi.org/project/h5py-like/0.5.1/
- [AI4Life](https://ai4life.eurobioimaging.eu/), GeoJSON, etc. (Josh)
- JM: "GeoJSON is JSON. Zarr has JSON. Can you put GeoJSON in Zarr?"
- IV: something more like GeoArrow
- ED: cfconventions (trajectories, etc.) for bio?
- probably not
- IV: https://cfconventions.org/cf-conventions/cf-conventions.html#_contiguous_ragged_array_representation
- IV: convention for ragged array (start in Zarr space) - least-common denominator
- IV: shapely support for GeoArrow.
- [get_items](https://github.com/zarr-developers/zarr-python/pull/1131) (Sanket) - Tabled (comments welcome)
- ZEP4 (EW)
- looking to make an embeddable spec for sparse arrays and sparse tensors
- natural fit for a ZEP.
- JM: previously we had discussed an extension, right?
- EW: convention so it can be stored along with other things.
- IV: don't feel super strongly about zarr being aware. but want a consistent way to structure things.
- ZEP4 is more collecting what people might be doing
- IV: break ZEP4 into a completed set of conventions and then we can identify later.
- https://github.com/zarr-developers/geozarr-spec and https://github.com/zarr-developers/geozarr-spec/issues/2 (Brianna)
- BP: initial meeting last week on a governance group.
- Repo: https://github.com/zarr-developers/geozarr-spec
- Meeting Notes: https://hackmd.io/@MSBYE-SmSS-O706S4WXH0Q/geozarr-spec-swg-20230119
- meeting biweekly. invite open. Wednesday mornings 11am, Eastern.
- submitting to OGC approval by the summer.
- Unicode for names (Ethan): https://github.com/zarr-developers/zarr-specs/issues/56
- JMS: very helpful. normalization important.
- tension of zarr not being its own container format
- normalization by default breaks that in a subtle way
- ED: normalize on serialization, on query, etc. (see the normalization sketch at the end of this section)
- can see optional, but also advice on whether to do it and why.
- world going to UTF-8
- Visualization of Zarr arrays
- DB: neuroglancer (client-side website)
- HTTP/S3 accessible (even static fileserver)
- python code that spins up a browser
- http://vitessce.io
- https://imagej.net/plugins/bdv/
- https://www.unidata.ucar.edu/software/idv/
- https://github.com/google/neuroglancer
- https://github.com/bigdataviewer
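On the normalization point: the same visible name can be more than one codepoint sequence, which is why normalizing at serialization and query time matters. A quick illustration in Python:

```python
import unicodedata

# "é" can be one codepoint (U+00E9) or "e" plus a combining accent (U+0301)
a = "caf\u00e9"
b = "cafe\u0301"
assert a != b  # distinct keys in a plain store, though they render the same
assert unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```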
## 2023-01-11
**Attending:** Jeremy Maitin-Shepard (JMS), Josh Moore (JM), John Kirkham (JK), Sanket Verma (SV), Brianna Pagán (BP), Martin Durant (MD)
**TL;DR:** Happy New Year, folks! 🎉
Welcome to the first Zarr community meeting of 2023! The discussion started with John’s briefing about his recent visit to the Allen Institute. After this, Brianna initiated a discussion on the GeoZarr spec, which led to discussing the various .zarr datasets NASA is using and how they are storing them. Brianna has been working on .zarr datasets, and her team will publish them shortly. Finally, we concluded the meeting by discussing async-zarr, Zarr’s new R implementation, and PR #1131 in Zarr-Python.
**Updates:**
- Happy New Year! 🥂
- next zarr release
- release notes: https://hackmd.io/_P8Q0_cFT-6ymtYJSm9wnA?view#Final-Release-Notes-for-21342140
- for John: https://github.com/zarr-developers/zarr-python/pull/1285
- Mads added 2 commits
- followed by getting 1096 and 1111 out
**Open Agenda (add here 👇🏻):**
- Allen Institute Cell/Brain meeting (JK)
- day of meetings (7)
- deep learning, image processing (Zarr as input)
- storage management for multiple groups (TIFFs to Zarrs)
- invited people to this meeting
- "what should (we) do with our workflow"
- used poster from neuroscipy and some notebooks
- also explanations of HDF5 (hierarchical storage)
- interest in benchmarks
- daily mouse brains! ergo throughput!
- HHMI -> Allen -> CZI: write c++/rust (typed compiled) support
- Nathan Clack: https://github.com/nclack
- no one slide deck would have helped
- maybe "basic getting started" from the poster
- questions:
- pyramidal / ome-zarr (extensions)
- SV: showing Henning's drawings at NASA
- geo-zarr spec (BP)
- want to push for a v1 in the next 6 months
- organizing with Ryan, Chris Holmes (Planet) and folks from OGC
- BP: had group email chain
- branching / forking his repo
- specs for making zarr stores compatible with e.g. xarray
- MD: coordinate transforms. own interpretations in e.g. gdal
- BP: searchability of the zarr stores. keeping them inline with other collections
- MD: including bounding box? yes. like stac. QC units.
- BP: worried about the spec not being done and needing to re-publish
- MD: have a ZEP/discussion place about "should this be handled by Zarr or not?"
- SWG: steering working group for OGC to write the proposal
- MD: _could_ start on top of affine transform (but exists in cfconventions)
- CRS as a short fall of cfconventions
- SV: https://search.earthdata.nasa.gov/search ?
- BP: yes. where it's in AWS.
- first time having duplicate datasets (by format)
- what needs to be updated in the Common Metadata Repository
- Jennifer Wei discussing with SV, don't see Zarrs
- BP: don't have the ability to make them available
- giovanni-in-the-cloud: on-prem caches converted to zarr this year
- earthdata is the GUI of CMR (looking at AWS)
- MD: doing auth signing? LONG CONVERSATION. (HTTP Tokens via proxy)
- BP: https://github.com/nsidc/earthaccess/issues/188#issuecomment-1371626230
- BP: https://hackmd.io/T73AtFTnS4C_Ez9JfGNldA?view
- MD: anaconda interested in being a conda frontend for data via intake
- do search, pull credentials, redirect, etc.
- BP: people are moving to zarr stores with or without them. archives are alive. "static zarr store is not true"
- 14000 collections. tie into CMR API.
- https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
- MD: intake no listing
- BP: earthaccess is voluntarily maintained (APIs across NASA: https://github.com/nsidc/earthaccess)
- "store" vs "collection"
- store is the .zarr directory
- collection is a product
- files under that are granules
- NC and TIFF can be archived via granule
- (pangeo-forge is calling this a store)
- want to associate the store with the collection
- STAC calls those ... Need a glossary.
- JK: https://numpy.org/doc/stable/user/numpy-for-matlab-users.html
- async-zarr: https://github.com/martindurant/async-zarr
- Blog by Martin: http://martindurant.github.io/blog/async-zarr/
- Steps needed to release it as a package
- Transferring the repo under /zarr-developers?
- Writing tests?
- Adding Github actions?
- Testing the browser is tricky, but not something for MD (i.e. requires effort from someone else) but useful independently for the two use cases
- FYI: https://github.com/grimbough/Rarr
- getitems: https://github.com/zarr-developers/zarr-python/pull/1131
- JK: might be useful for other types of arrays (see the batched-store sketch at the end of this section)
- JMS: worried that every line of code needs to change. do it as core?
- JK: plugin pieces - store & compressors
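For context on the `getitems` discussion, a toy sketch of what a batched store entry point can look like (a hypothetical class, not PR #1131's actual interface): the store receives all keys at once and is free to fetch them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

class BatchedStore(dict):
    """Toy store: a dict with a batched read method."""

    def getitems(self, keys):
        # fetch every key in one call; a remote store could issue
        # concurrent range requests here instead of dict lookups
        with ThreadPoolExecutor() as pool:
            return dict(zip(keys, pool.map(self.__getitem__, keys)))

store = BatchedStore({"c/0": b"...", "c/1": b"..."})
print(store.getitems(["c/0", "c/1"]))
```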
## 2022-12-14
**Attending:** Davis Bennett (DB), Josh Moore (JM), Norman Rzepka (NR), Dieu My Nguyen (DMN), Dennis Heimbigner (DH), Sanket Verma (SV), Hailey Johnson (HJ), John Kirkham (JK)
**TL;DR:** [Numcodecs 0.11](https://github.com/zarr-developers/numcodecs/) has been released with support for Python 3.11. In addition, we’re planning to migrate the Zarr documentation to the PyData Sphinx Theme; please look at [PR #1242](https://github.com/zarr-developers/zarr-python/pull/1242). After sharing these updates, NR started a discussion on `ensure_bytes` by referring to [PR #1285](https://github.com/zarr-developers/zarr-python/pull/1285). Then DB asked whether anyone had tried GPU direct storage, to which JK referred to a blog post, which can be seen [here](https://xarray.dev/blog/xarray-kvikio). And we closed the year 2022 by asking everyone what they have been working on recently and their plans for the New Year!
**Updates:**
- Meetings cancelled for next round and will start again in the new year
- New game: something exciting/helpful from the last two weeks
- violin, escape rooms, snow, trains, car parties, travels!
- numcodecs 0.11 release with Python 3.11 support; see [#377](https://github.com/zarr-developers/numcodecs/issues/377)
- PyData sphinx theme migration [#1242](https://github.com/zarr-developers/zarr-python/pull/1242)
- see https://zarr--1242.org.readthedocs.build/en/1242/
**Open Agenda (add here 👇🏻):**
- (Tabled) async-zarr: https://github.com/martindurant/async-zarr
- Blog by Martin: http://martindurant.github.io/blog/async-zarr/
- Steps needed to release it as a package
- Transferring the repo under /zarr-developers?
- Writing tests?
- Adding Github actions?
- NR: `ensure_bytes` PR [#1285](https://github.com/zarr-developers/zarr-python/pull/1285)
- JK: trying to avoid copies
- F/C order wackiness ensues
- tldr: add a try/catch block with a fallback that does a copy (see the sketch at the end of this section)
- DB: anyone try GPU direct storage?
- JK: https://xarray.dev/blog/xarray-kvikio
- DB: and blosc, etc.?
- JK: blog didn't compress. but the library supports some standard ones
- JM: some interest arising in ZFS. benchmarking needed.
- DB: anything that indexes _existing_ chunks?
- NR: have a utility function over the storage keys (by listing the filesystem)
- in webknossos library
- SV: https://github.com/zarr-developers/zarr-python/issues/538 follow up?
- DB: will open a PR
- DB: discord?
- SV: using it for outreachy
- publicly opening a discussion on how/what/why/when/etc.
- DMN: will get in touch re: ZEP soon
- HJ: working on filters, scales & offsets
- :christmas_tree:
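A minimal sketch of the try/catch-with-copy-fallback pattern mentioned under #1285 (a hypothetical helper, not the PR's actual code): take a zero-copy bytes view when the buffer is C-contiguous, and copy otherwise — the Fortran-order case is exactly the "wackiness" noted above.

```python
import numpy as np

def ensure_bytes(buf):
    """Try a zero-copy bytes view first; fall back to a copy."""
    try:
        # works only for C-contiguous buffers; no copy is made
        return memoryview(buf).cast("B")
    except TypeError:
        # e.g. Fortran-ordered or non-contiguous arrays: copy as a fallback
        return np.ascontiguousarray(buf).tobytes()
```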
## 2022-11-30
**Attending:** Josh Moore (JM; aikido/books/coffee), John Kirkham (JK; hiking, reading), Sanket Verma (SV; video games & pixel art), Dennis Heimbigner (DH; drawing/guitar), Eric Perlman (EP; travel/bikes/trains), Martin Durant (MD; who has the time), Norman Rzepka (NR; kids & cooking), Ward Fisher (WF; video games/3d printers/wood & leather working), Isaac Virshup (IV; cooking < eating, bouldering), Davis Bennett (DB; birds & computer games), Brianna Pagán (BP; ultra running & my doggo)
**and welcoming**: Dieu My Nguyen (DMN; NASA, replacing Brianna; CS in computational bio; travel, house plants, hiking)
**TL;DR:** We have a new community member who joined today: Dieu My Nguyen from NASA. Thanks for attending the community meeting Dieu! 🙌🏻
Two Outreachy interns have been selected to work with Zarr for the next 3 months. BP asked if anyone is working on partial Zarr stores. After that, DB raised some opinions on the zarr-python slicing API, and lastly DH had some questions related to the V2 spec.
**Updates:**
- Outreachy Interns to work on Tutorials (AWA Brandon) and Testing Zip Stores (Weddy)! 🎉
- ZEP1 [Project board](https://github.com/orgs/zarr-developers/projects/2/views/2) and PRs: https://github.com/orgs/zarr-developers/projects/2/views/2
- PyData Global 2022:
- Talk: https://global2022.pydata.org/cfp/talk/DQSXAX/
- Sprint: https://pydata.org/global2022/sprints/#zarr
**Open Agenda (add here 👇🏻):**
- BP: Question around if anyone is doing work with partial zarr stores or live archives in zarrs.
- Growing in at least one dimension
- Good chunking strategies but not
- Pushing updates
- MD: Zarr has been _sort of_ archivey. Good to talk about "append, append, append". Updating data is less common. See Ryan's strategy.
- BP: hackathon during AGU. Hub in LA if anyone wants to join.
- DB: if you have a well-defined shape then it should be ok. Chunks are never partial on disk. (At least that's not variable)
- Changing origin would be expensive. (Rename everything)
- JK: if you are only using part of a chunk, it will fill with fill_value. Resizing could lead to zeroes. (MD: yes, recent shock) (see the resize sketch at the end of this section)
- IV: [ZEP3](https://zarr.dev/zeps/draft/ZEP0003.html)? BP: Yes. Should be contributing. Have done some work internally.
- DB: zarr-python slicing API: aged well? alternatives?
- critique points:
- process of adding static types to PRs led to looking at the slicing API
- https://github.com/zarr-developers/zarr-python/blob/main/zarr/indexing.py
- numpy has expressive slicing (integer, tuple, slice object, tuple of slice object, tuple of arrays)
- in Zarr, every type in the polymorphism is a class. no interface in common
- spits out 2 things: the set of chunks and, with the same arity, the set of operations to do on them
- maybe three levels of indexing, but perhaps order of operations depends on the compressor
- MD: looked at numpy's implementation, or is it in C?
- DB: assumed they were encumbered by organic growth
- MD: likely, but lots of testing. Also: normalizing to in-memory indexing (set of) then might be just one implementation
- DB: then imagined a package that solved the problem which could be shared with napari, etc.
- DH: looked at HDF5? (difficult) DB: maybe h5py since they handle the polymorphism.
- JM: possibly https://pypi.org/project/h5py-like/ too
- JK: dask array slicing? np slicing does a bunch of stuff. led to vindex and oindex.
- IV: views on arrays in Julia is really nice. JK: multidispatch helps (IV: and n-dim in the lang.)
- JK: there was a performance issue on slicing, now fixed, but likely just a subset.
- DH: questions about v2.
- feedback for JS about the current nczarr extensions
- shape of an array isn't fixed? stated anywhere (JM: explained recent issues)
- JK: https://github.com/zarr-developers/zarr-specs/issues/188
- scalars (arrays of rank 0) are ok?
- char type (in numpy)? (like a scalar string)
- IV: strings? DH: fixed length is in, but varlength proposed and close to varlen arrays.
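JK's point about resizing and fill_value (flagged above) is easy to demonstrate with zarr-python 2.x; a small sketch:

```python
import zarr

z = zarr.zeros((4,), chunks=(2,), dtype="i4")  # fill_value defaults to 0
z[:] = 1
z.resize(8)
print(z[:])  # [1 1 1 1 0 0 0 0] - the grown region reads back as fill_value
```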
## 2022-11-16
**Attending:** Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Ryan Abernathey (RA), Dennis Heimbigner (DH), Hailey Johnson (HJ), Ward Fisher (WF), Jonathan Striebel (JS)
**TL;DR:** SV is going to speak at [PyData Global 2022](https://pydata.org/global2022/) next week and is also going to run a sprint along with JM. Details for the talk are [here](https://global2022.pydata.org/cfp/talk/DQSXAX/) and the sprint [here](https://pydata.org/global2022/sprints/#zarr). During the meeting, DB raised a proposal to host a Discord server for chunked formats, which later turned into creating a Discord server for the Zarr community. JM gave updates revolving around zarr-java, and lastly SV initiated the discussion on how we can separate Zarr (the storage format) from zarr-python (the Python implementation of the Zarr spec).
**Updates:**
- ZEP1 Update, see [here](https://gitter.im/zarr-developers/community?at=6374fae6f9491f62c9b7ea61)
- Check out the ZEP1 GH Project board [here](https://github.com/orgs/zarr-developers/projects/2/views/2); maintained by Jonathan Striebel
- PyData Global 2022 Sprint next-to-next week(1st-3rd Dec.), anyone interested in helping out?
- Need to know by this week
- Also: Sanket giving a talk, "The Beauty of Zarr"
**Open Agenda (add here 👇🏻):**
- DB: Discord for Chunked Formats?
- Had problem and TypeScript Community was _very_ helpful.
- New thread gets created per issue
- Downside: not indexed by google
- WF: like it, use it socially, want this to be a solution
- WF: have people pushing people to _github_
- RA: pangeo uses discourse. bringing more dialogue together?
- critical is the granularity so we get take-homes
- DB: activation energy for a forum post is 10x that of a discord message
- WF: discord seems more synchronous
- wouldn't work for NC since there wouldn't be enough critical mass
- gitter straddles the line, since there's a lot to catch up on
- but one for sync and async would be good
- RA: don't see the other chunked formats being psyched to be in a channel with us. workflow might be the more useful framing
- zarr-java: discussions tomorrow about bringing back two forks of jzarr
- JM (re-surfacing) care with Python discord? Yes.
- HJ: always start by clarifying library vs file format
- WF: good feedback on Zarr at unidata user committee meeting
- being asked for in THREDDS
- DH: recent discussion around ragged arrays
- most people encounter zarr through python
- that leads to an imprinting
- when they run into nczarr, they are perplexed (things missing)
- still a big problem
- JS: incompatibilities between libraries could bring us down
- v3 is spec first with feedback from different implementors
- hopefully dropping python-specifics and making it possible to know what the incompatibilities are
- claim: "Zarr is X", not just "a community project"
- posters, repos, webpages, etc...
- currently hard to grasp
- SV: how to separate from Python
- DH: go through tutorial and move things that are not in the v2 spec
- DH: probably many go through the tutorial
- DH: e.g. fortran community
- JM: good points, but the same will be true for nczarr. v3 will give us the chance to label things more clearly.
- WF: have to be clear about "NetCDF". Was specific when talking to the user committee about the "Zarr data model", or "Zarr data storage"
- specificity in language. data model and the format should be cross-language.
- DH: ultimately goal is the nczarr extensions to be v3 extensions
- JM: wonderful :tada: when would be a good time to plan for that?
- DH: currently working on DAP4, but will shoehorn some time for bullet point list of extensions (and why)
## 2022-11-02
**Attending:** Sanket Verma (SV), Josh Moore (JM), Dennis Heimbigner (DH), John Kirkham (JK), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS), Davis Bennett (DB), Ward Fisher (WF), Martin Durant (MD), Isaac Virshup (IV)
**TL;DR:** The Outreachy contribution phase is ending after four weeks, and we had a fantastic time working with the applicants. We'll select 1-2 interns to work with us for three months (December-February). SV is speaking @ PyData Global 2022 next month and planning to host a Zarr Sprint. Special kudos to JK for working actively on numcodecs! After that, IV started a discussion on memory mapping.
**Updates:**
- Outreachy contribution phase ending on 11/4 (Thanks to all who helped us! 🙌🏻)
- 2 more days then time to choose an intern to work with us for 3 months
- JK: choose more than 1? Possible. Feedback welcome.
- #beautifulzarr born out of Outreachy: https://github.com/zarr-developers/beautiful-zarr
- Crowd-sourced collection of pretty stuff
- Feel free to add stuff under https://github.com/zarr-developers/beautiful-zarr/tree/main/_data
- Speaking at [PyData Global 2022](https://pydata.org/global2022/) 📣
- "The Beauty of Zarr"
- Planning a Zarr Sprint! 🏃🏻♂️ Anyone like to volunteer?
- Collect issues and attendees (some 2000+) get involved
- 3-4 maintainers/contributors should be present
- 1-2 hour commitment
- 1st-3rd of December
- ZEP0003 by [Martin](https://github.com/martindurant) and [Isaac](https://github.com/ivirshup) is in draft; read [here](https://zarr.dev/zeps/draft/ZEP0003.html) :tada:
**Open Agenda (add here 👇🏻):**
- MD: kudos to the push on numcodecs
- JK: fixing build things
- People _very_ excited about building for 3.11
- Want to get all of the build fixes into the upcoming release
- https://github.com/zarr-developers/numcodecs/issues/377
- pyproject.toml allows us to unvendor header
- _issue moved to private issue_
- IV: Question about where memory mapping is at (https://github.com/zarr-developers/zarr-python/pull/377, https://github.com/zarr-developers/zarr-python/pull/1131)
- MD: related to passing contexts down to the reading
- JK: added `_from_file()` in DirectoryStore to define how the reading is done (see the memory-mapping sketch at the end of this section)
- JK: https://github.com/zarr-developers/zarr-python/pull/377#issuecomment-915030522
- JK: https://github.com/zarr-developers/zarr-python/pull/377#issuecomment-1301159210
- MD: with codec, produces a regular array
- JK: previously updated to use pybuffer protocol (codec with decompression will do a copy)
- DB: use case? for large amounts of single cell data. resampling for neural networks. DB: chunk size doesn't help. IV: not because of the sparseness. but even if dense, the random access
- MD: even better: on slice selection of a zarr object, pass the byte range into the loader (done with blosc blocks in v2 - cheap sharding). Memory mapping just exposes an extra layer that you might not need.
- IV: "fast as possible reading" (where disk size isn't an issue)
- DB: using zarr array API? doesn't seem like that would work
- MD: similar to kerchunk. want to build a utility, pretend N chunks for random-access.
- IV: how much work until we can pass the byte-range down?
- MD: discussed in several places. 1131 is likely to win.
- JM: does FSStore support it? We _think_ so.
- IV: and ZipStore? MD: requires some work.
- IV: .. and gets passed to pytorch and is multi-threaded ...
- JK (from chat): Maybe this Dask PyTorch loader is useful?
- JK (from chat): https://github.com/rapidsai/cucim/pull/120
- DH: effect on caches? (general "good question" nodding)
- JMS: page cache? No. e.g. nczarr's chunk cache (DB problem)
- MD: if you're caching, then cache whole chunks and read partially from them rather than reading partial chunks. fsstore has something for parts of files, but it's messy
- DH: potential for optimizations.
- MD: good question when we get to subselections of a chunk
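A sketch of the memory-mapping hook JK pointed to in the #377 comments, assuming zarr-python 2.x's `DirectoryStore` (where the override is spelled `_fromfile`): chunk files are memory-mapped rather than read eagerly; a decompressing codec will still copy on decode.

```python
import numpy as np
import zarr

class MemoryMappedDirectoryStore(zarr.DirectoryStore):
    # read chunk files via mmap instead of loading them up front
    def _fromfile(self, fn):
        return memoryview(np.memmap(fn, mode="r"))

z = zarr.open(MemoryMappedDirectoryStore("data.zarr"), mode="r")
```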
## 2022-10-19
**Attending:** Sanket Verma (SV), Eric Perlman (EP), Erik Welch (EW), Hailey Johnson (HJ), Ward Fisher (WF), Jonathon Striebel (JS), Martin Durant (MD)
**TL;DR:** The Outreachy contribution phase has started, and the contribution PRs are coming in hard! JS eagerly seeks feedback on storage transformers PR; check [here](https://github.com/zarr-developers/zarr-python/pull/1111). There was a visible concern from the Zarr Community around the progress of ZEP0001. SV assured everyone that he’d communicate with the author and resolve any pending issues to move forward as quickly as possible. After that, there was a discussion on ZEP0003 (authored by Martin and Isaac).
**Updates:**
- Outreachy contributions coming in hard! 💪🏻
- Any more ideas how to engage the applicants?
- You can help us by reviewing the PRs and interacting on Gitter chat
- New ZEP by Martin Durant, check [here](https://github.com/zarr-developers/zeps/pull/18)
- TensorStore CMake integration now available: https://google.github.io/tensorstore/installation.html#cmake-integration
**Agenda:**
- JS: Made 2 PRs for storage transformers
- Looking for feedback
- https://github.com/zarr-developers/zarr-python/pull/1111
- https://github.com/alimanfoo/zarr-specs/pull/1
- Martin gave some feedback
- Looking to drive it forward
- SV
- Working with Alistair to get the feedback
- Moving things forward ASAP, taking into account the feedback and comments on the V3 PR
- MD: https://github.com/fsspec/kerchunk/pull/237
- JMS: What functionality is not in V2 that's in V3?
- MD: Zarr chunks can be made by concatenating - storage layer can be used to do this - similar to sharding storage layer - sharding is done at some levels
- Kerchunk would really like to do this
- JS: Sharding has no problem (in principle)
- JMS: Maybe create a fork `zarr-specs` repo
- Better way to work forward and make progress
- JS: Opening PR against the repo is fine
- MD: lot of changes would be hard to merge back in the main PR
- JS: Don’t personally mind merging things - ZIC has to decide
- JMS: Valuable to make incremental changes
- MD: JS can make changes to zarr-python
- JK: Make PR on Alistair’s PR and merge them
- MD: Sounds good!
- JMS: Make a place/fork and merge things over there and have working draft
- SV: Try to resolve things with Alistair and set a plan to move forward
- JS: Triage the PR which are on Alistair’s fork
- JK: Have a single place to have all the discussion to and not diverge things
- MD: wants to have variable length chunking (see the chunk-grid sketch at the end of this section)
- It’s a draft for now
- Edit it before merging if it doesn’t make sense
- JMS: no objections to Martin’s proposal
- V2 is obvious and V3 has multiple chunk grid types - but no example yet of having multiple chunk types
- MD: complete not separate
- JMS: do we want to have those special types?
- MD: can’t imagine how easy it is to put up these things - can have overlapping chunks - values can be duplicated over multiple chunks - different things can overlap, like the surface of a sphere - Dask may have a reference implementation for Martin’s ZEP
- MD: Do you have partial chunk?
- JK: How would the append work on this ZEP?
- JMS: A chunk should not resize after it’s created - if you append you should append the same size chunk
- MD: Merging and joining chunks would be easy to do - don’t really need to think about it for now - can rewrite a whole new array - ability to manipulate the chunks - once it’s in the spec we can work on the API
- JMS: Hadn’t had use for variable size chunk
- MD: streaming data - maybe in those cases you would want to have variable size chunks
- JS: Address is same? MD: Right!
- JS: may not have use for variable size chunks - current complexity is enough - other cases: move data, resize your chunk and write that chunk
- MD: or delete an index or 0
- JMS: Don’t want to implement non-zero index
- MD: now have a way to do it! Voila!
- JMS: We’re using a special structure at Google for non-zero origin - boxloses or something
- JS: If you use Zarr API
- EP: Using Jeremy’s implementation and then using Zarr’s metadata - can use same indexing for arrays
- JS: Fair for many things - don’t know a lot about offsets - you start at 0, 0, 0 - can ignore
- EP: involved with OME data and can fit into other use cases as well
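To make the variable-length chunking idea concrete, a toy lookup for one dimension (illustrative only, not ZEP0003's spec text): cumulative chunk boundaries turn "which chunk holds index i" into a binary search.

```python
from bisect import bisect_right

# Hypothetical variable chunk grid for one dimension: chunk sizes 10, 25, 5
sizes = [10, 25, 5]
bounds = [0]
for s in sizes:
    bounds.append(bounds[-1] + s)  # cumulative boundaries: [0, 10, 35, 40]

def chunk_of(index):
    """Map an array index to (chunk number, offset within that chunk)."""
    c = bisect_right(bounds, index) - 1
    return c, index - bounds[c]

assert chunk_of(12) == (1, 2)  # index 12 lives in chunk 1 at offset 2
```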
## 2022-10-05
**Attending:** Josh Moore (JM), Norman Rzepka (NR), Davis Bennett (DB), Ward Fisher (WF), Hailey Johnson (HJ), Martin Durant (MD), Jeremy Maitin-Shepard (JMS), Isaac Virshup (IV), John Kirkham (JK)
**TL;DR:** Scalableminds has forked `jzarr` and has started working on it. There were discussions around how the Java implementation (`jzarr`) would move forward and implement the V3 spec. After that, IV discussed the R implementation of Zarr, and we planned to discuss ragged arrays at the ZEP meeting tomorrow.
**Updates:**
- vEM sidenote
- NR: did Mathworks reach out to Unidata?
- WF: about NaN?
- NR: reached out and asked about Zarr for Matlab. They wanted a c/c++ library they could build on top of
- WF: not about that. Ellen Johnson is the main contact. Will add that to the list.
- :tada:
**Agenda:**
- Java and async
- scalableminds: Java backend. Forked jzarr and implementing in scala (read only)
- layers: fsspec, pure zarr, parallel (and async at all of those)
- (Tischi) imglib? NR: interop would be cool
- NR: Shared effort would be appreciated
- HJ: filters are pluggable and shareable
- HJ: definitely want to implement async for S3
- HJ: breaking THREDDS into microservices. also an option. (weren't breaking netcdf-java)
- MD: python async. zarr is sync. barrier in the storage layer.
- NR: important to have the lower levels async. then join/wait
- NR: but don't need cutouts. just interface to get specific chunks and (optionally) decode them
- abstractions: filesystems, zarr/n5/precomputed variants.
- MD: https://github.com/martindurant/async-zarr was a Python POC (targeted at Pyscript)
- https://github.com/bluesky/tiled (server)
- DB: would love async zarr in python
- JMS: tensorstore has python async. And s3 writing? No.
- JM: wrapping in tensorstore with Java?
- JMS: people tend to like native Java. Unsure how the memory measurement might work. Managing persistent buffers. Happy to help someone to look into it. (Little experience)
- MD: someone doing it for spark? pyspark talking to tensorstore
- JM: netcdf
- HJ: netcdf-java uses netcdf-c for writing NC4.
- WF: JNI/JNA tricky so didn't do more, but for the writing there was no alternative.
- MD: HDFS drivers in Java land saw lots of barriers. Seemed to be the case that the C native interface bridge raised uncertainties.
- R (IV)
- graphblas would also like to help get the libzarr c implementation outside of netcdf-c
- goal is a binary sparse format
- MD: Jim works on team. Thought community wasn't behind Zarr.
- IV: from last Tuesday. Tried Zarr. Had issues. Tried HDF5 had more issues.
- MD: see also awkward arrays (had prototype of AA in zarr. bit clunky. work on convention for v3. would also work for sparse)
- WF: There is a lot of modularization in the C library source, in terms of a proof of concept, we might be able to create a separate library. But I also recognize I'm looking for something to work on besides spinning up a grant proposal.
- WF: internal considerations. capacity. etc.
- but from code organization standup, code is in separate library. compiled as object that is linked against.
- likely wouldn't take a ton of work. need to examine it.
- HJ: the netcdf-java implementation also has the zarr package separated from CDM (i.e. ditto)
- WF: precedent with the udunits package. similar to how nczarr is now. (now maintained by a team outside unidata)
- IV: good to hear. maybe also what Jim wanted to hear. segfaults.
- WF: putting new release out soon. bug fixes & messaging. (IV: need conda package)
- MD: sparse data needs a binary storage. so good time to write down those requirements
- MD: gets us into variable chunk sizes which are needed in zarr for these uses.
- IV: don't care just yet. (not accessing by chunk right now)
- DB: why not just numpy sparse storage? MD: you want configurability
- DB: sparse as a codec? you don't want to build that in memory. i.e. you want a sparse representation
- also: what does it even mean
- JMS: think we can shoe horn sparse into Zarr but don't think just encoding them as dense arrays is the best way. need to add variable sized chunks to do something reasonable. maybe based on store abstraction but not on array abstraction.
- MD: if there are other reasons for variable sized chunks. JMS: even then it's shoehorning.
- IV: encoding as zgroup? JMS: depends on operations (e.g. chunks in spatial region)
- MD: you can definitely design a binary format (but there isn't one). zarr is already around though rather making a new binary format.
- IV: all formats are a collection of dense 1-D or maybe 2-D arrays
- JMS: in-memory arrays have different constraints. parallel writes?
- JMS: you don't really care about the linear mapping.
- IV: number of sparse formats is due to the number of use cases
- MD: cf. Arrow's feather format. dump of each buffer that it takes to reproduce it. and it's fine. (no chunk-wise access. no parallel reads from remote)
- i.e. Arrow is not better or worse
- Zarr with a convention is Arrow
- IV: chunking in arrow
- IV: another use case is variable length strings (in V3)? MD: not as numpy arrays
- JMS: v2 does but you can't index in the string
- MD: numpy object arrays
- JMS: most obvious zarr encoding is having each chunk sparse. DB: why isn't that the "zarrthonic" way?
- JMS: not on top of zarr-python. MD: don't want `foo[:]`. IV: most libraries don't comply to numpy. dtypes of each 1D array.
- JM: https://data-apis.org/
- MD: see cupy work
- DB: in dask you should be able to use. there's a meta keyword argument that gets passed around that contains a description of the soul/truth of the array
- JM: generally good :+1: let's work on it
- IV: have already done this for some sparse array types in anndata (see the group-layout sketch at the end of this section)
- JM: should capture as convention/extension
- IV: variable size chunks would help
- JM: Martin, someone already working on that spec?
- MD: spec change is easy, but implementation work is harder. (dask array may have some of the logic)
- DB: any objections?
- MD: there were some, but can't summarize
- JMS: boundaries in separate place or in metadata
- MD: Ryan Abernathey thinks only the storage layer should know about it (not in metadata)
- any read would need to pass not just a key but also where in the array it is
- IV: https://github.com/zarr-developers/zarr-specs/issues/62
- JMS: doesn't seem very natural in the storage layer. i.e. not MutableMapping API.
- MD: don't expect chunks in one dimension to get so large that the metadata is a problem
- IV: have students who could work on this
- JMS: what happens on re-sizing
- MD: API is to be determined but that's an implementation question
- JMS: unless you need a "default chunk size" metadata or similar.
- IV: ZEP meeting (tomorrow) the place to bring up ragged arrays
- JMS: harder than sparse
- IV: ok not being able to index into the variable length arrays
- vlencodecs ... need spec ...
- running out of steam ...
- IV: was looking at dtype extensions and thought perhaps ragged arrays could go there. but perhaps another one. which one is right?
- JMS: dtype, generic extensions, ... would need custom codec and generic.
- IV: might make sense to _reduce_ what's allowed for the dtype extensions
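A sketch of the "sparse array == zarr group" convention IV described (the attribute keys here are made up; anndata's actual on-disk convention differs in detail): each CSR buffer becomes a dense 1-D array inside one group.

```python
import scipy.sparse as sp
import zarr

csr = sp.random(1000, 1000, density=0.01, format="csr")

# One group, one dense 1-D array per CSR buffer, shape/format in attributes.
g = zarr.group(store="sparse.zarr")
g.attrs["encoding-type"] = "csr_matrix"  # hypothetical convention key
g.attrs["shape"] = list(csr.shape)
for name in ("data", "indices", "indptr"):
    g.array(name, getattr(csr, name))    # three dense 1-D arrays
```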
## 2022-09-21
**Attending:** Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Ahmet Can Solak (AS), Sanket & the Do-a-thon, Martin Durant (MD), Davis Bennett (DB)
**TL;DR:** Having a new attendee from the CZI Open Science Summit, we took a deep dive into the best way to capture data directly from microscopes, comparing the pros and cons of Zarr/HDF5/Zip and more. Additionally, we worked through remotely visualizing a Zarr when it's been created on the cluster in a Jupyter notebook.
**Updates:**
- Sanket at CZI/NumFOCUS Summits
- Coming to San Fran next week, lunch!
**Open Agenda (add here 👇🏻):**
- Ahmet: BioHub
- Collaborators interested in Java implementation
- Need a good implementation
- ImageJ / BDV (folks at Janelia)
- V3: collaborators to help read it
- JMS: explicit opt-in for V3 (need to know _a priori_)
- Though auto-detection could be added
- neuroglancer likely has a stronger case for auto-detection
- AS: happy tensorstore users. Thanks a lot! :star:
- https://github.com/zarr-developers/zarr-python/issues/1140
- resize manually? more internal with a skinnier API
- JMS: assume things within old bounds are old?
- AS: perhaps request chunks (from last savepoint) more compute heavy
- keyword argument?
- MD: "don't bother writing where there's no new data"
- JM: see related https://github.com/zarr-developers/zarr-python/issues/1017
- JMS/MD: use selection to fill in the new bits
- AS: `append()` is only for one axis. This might be for arbitrary axes.
- perhaps `append_chunks()`
- use case
- instruments generating lots of data quickly.
- don't want to resize if not necessary. with fewer methods if possible.
- most efficient way?
- of course, better to know exact size.
- MD: just make the size much larger and have missing chunks?
- AS: only if know when biologists will stop
- Clarification: doesn't write the empty chunks
- MD: do edge chunks need special handling?
- JMS: no. always write the full chunk.
- (not in N5, and didn't implement in tensorstore)
- DB: wouldn't suggest having everything in one array
- 1 array per timepoint (doesn't work for NGFF)
- growable arrays
- or use HDF5 for the acquisition
- AS: why? faster than zarr-python. but tensorstore? Don't know.
- JM: let's do that benchmark
- DB: Windows doesn't like lots of small files
- MD: could write Zarr into Zip with no compression (basically what HDF5 does)
- DB: save data in the way that's most effective for the acquisition
- Zarr as a great format after that
- AS: that's what we were doing previously. but additional time adds up. people want the results faster. was asked to add a ZarrWriter in the acquisition package. Can then easily transfer to data storage.
- DB: easier to transfer than HDF5? No, than the raw files. Compression is a benefit.
- AS: set chunk size bigger rather than using HDF5
- JM: per camera. but can't compress chunks.
- HDF5 compress in parallel but not write in parallel
- JMS: eventually all use cases of HDF5 but not there yet
- granularity at which you can read and write
- AS: re-chunking is faster than converting camera offline
- AS: with two cameras we don't try to write to the same array with both, but to multiple places
- JM: zip support in tensorstore? JMS: not yet
- JMS: also thought about LMDB. single file. pretty efficient.
- zip e.g. doesn't support deleting.
- also only has one directory structure
- MD: HDF also has that problem.
- DB: re-writing isn't a problem for acquisition.
- JMS: do need to checkpoint the zip directory periodically.
- AS: saving single-array per timepoint, then zip might work quite well.
- converting to zip zarr saw some worse performance. not sure where.
- MD: make sure the zip isn't compressed.
- JM: need Zip spec
- DB: would love to hear where this goes
- MD: **inverse problem**
- massive HDF5 files in tar file on S3 for the purpose of multi-file dataset
- desire to distribute them as individual files
- 20G tar containing HDF5
- Kerchunk's job was to point to these files within the tar
- or "find all the chunks in all of the files"
- works nicely!
- fetches are short but there are many of them.
- had to download it (for scanning) but don't want users to have to do that.
- i.e. if you push for a single file, perhaps you can get the best of both worlds.
- DB: lambda function? probably. (but this was custom S3)
- JM: need Java implementation of Kerchunk (for BDV)
- DB: generate from json-schema
- AS: with kerchunk can you point to your data centers...
- MD: each chunk is a key but is a URL
- JM: `"chunk-name"URL, offset, length)`
- JMS: can get the correct endpoint for a chunk
- add s3 syntax
- IPFS, mutable hashes, ...
- DB: interesting workflow. any help?
- couldn't get napari on cluster over VDI
- transforming images and saving them as zarr.
- starting static server and pointing neuroglancer at it.
- would prefer to do things programmatically in neuroglancer and it spits out a URL
- also convenient to have static file server as background process from main python (notebook)
- JMS: definitely convenient and it's "just a web server"
- DB: don't save that to disk? dask arrays in memory?
- JMS: neuroglancer-py does have a way to share numpy array or tensorstore object
- Socket based? Internally starting a web server.
- DB: and if it gets updated? does it block? No, background thread
- There is a method to invalidate the cache.
- Python API for making URLs? Yes.
- Could be attractive to people (Janelia) for when computing on the cluster
- JM: See also Wei's imjoy-rpc for the usability
- JMS: works as iframe in jupyter now (DB: desirable)
- JMS: possibly using jupyter protocols would work around firewall
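The reference layout JM sketched above, written out as a kerchunk-style mapping (paths and offsets invented for illustration); fsspec's "reference" filesystem can consume such a dict:

```python
import fsspec

# each chunk key points at (URL, byte offset, length) inside a bigger file
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "data/.zarray": "...",  # array metadata JSON (elided here)
        "data/0.0": ["s3://bucket/archive.tar", 1024, 65536],
    },
}
fs = fsspec.filesystem("reference", fo=refs, remote_protocol="s3")
```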
## 2022-09-07
**Attending:** Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS), Hailey Johnson (HJ), Brianna Pagán (BP), Isaac Virshup (IV)
**TL;DR:** SV announced new ZEP bi-weekly meetings apart from the regular bi-weekly community meetings. Now, the Zarr community will be meeting twice every two weeks. New illustrations by Henning Falk, check them [here](https://github.com/zarr-developers/zarr-illustrations-falk-2022). Google Summer of Code 2022 is finally coming to an end. After the updates, JM discussed having a hierarchy that builds a virtual n-dimensional array. Then, JMS started discussing one of the open issues in the `zarr-specs` repo; check the issue [here](https://github.com/zarr-developers/zarr-specs/issues/141).
**Updates:**
- ZEP meetings will take place bi-weekly on Thursdays @ `21:30 IST/18:00 CEST/17:00 BST/12:00 EDT/9:00 PDT`
- Instructions: https://zarr.dev/community-calls/
- More focused on the spec than these meetings
- Check out our new illustrations here: https://github.com/zarr-developers/zarr-illustrations-falk-2022
- More ideas welcome!
- Sharding...
- `copy` button for code snippets in Zarr documentation, check here: https://zarr--1124.org.readthedocs.build/en/1124/ @ Altay Sansal [#PR1124](https://github.com/zarr-developers/zarr-python/pull/1124)
- Approaching end of GSOC (12th of Sep)
- https://alt-shivam.github.io/Codecs-Registry
- Looking to participate in Outreachy (https://outreachy.org/)
- New potential users & developers
**Open Agenda (add here 👇🏻):**
- JMS: plan for resolving v3 spec?
- SV: more on this tomorrow but some progress looking at open issues
- Upcoming work
- JM: proposal to have ZEP0001 moved to a "provisional ZEP" state (only blockers allowed)
- JMS: idea is no spec discussion at this meeting? SV: no, but we'll communicate back and forth
- SV: updates on Brianna/Hailiang's ZEP? BP: Not yet. SV: also welcome to join tomorrow
- IV: zarrR?
- https://github.com/fsspec/kerchunk/issues/212
- Josh:
- JM: idea of having a hierarchy that builds a virtual n-dim array
- DB: adds brittleness. would say no.
- IV: kind of like kerchunk but with more indirection
- JMS: sometimes have use cases.
- stacks of images that you want to view as an array, or multiple images acquired separately.
- Do have stack driver in tensorstore (with specified origins. No stored representation)
- DB: similar problem when acquired in HDF. wrote own layer.
- JMS: should maybe be a layer higher than zarr.
- DB: for bioimaging, if your app depends on this then you can only open HDF and Zarr and not other stuff.
- doesn't need to be compiled code. API problem.
- NaN/inf/other special values in user-defined attributes: https://github.com/zarr-developers/zarr-specs/issues/141
- zarr-python supports them by encoding in a non-JSON-compliant way (see the encoding sketch at the end of this section)
- DB: anything that can be stored as data shouldn't be impossible to store as an attribute
- DH: was dealing with this recently. found "NaN" (quoted string) in existing datasets, expecting it to be treated as such. added support to nczarr (as well as unquoted versions)
- JM: will likely need a deprecation/warning/error cycle (royal pain)
- IV: keep JSON and use them as special values? nice that it is all just JSON.
- DH: nczarr (netcdf API) got ahead of this because typing is stored for attributes ("double"). possibility for v3?
- JMS: good point. perhaps decide the model for attributes in v3 (i.e. proposed change to v3 spec)
- JM: will need an upgrade path
- DB: haven't seen untyped attributes, but just that JSON is missing values
- DH: so extend constants that are definable
- JM: BSON?
- IV: there are also things that can't be encoded in zarr
- DH: one problem with extending JSON is that in C code that there are JSON parsers that would choke
- JMS: zarr-python has essentially already done that
- **Enumerating options:**
1. extend JSON parser (generally :-1:)
2. support existing JSON-variants (BSON) (generally :-1:)
3. encode objects in JSON
a. `{"attribute":...`
a. `{"@type":...`
4. add type information somewhere else (like .nczarr)
- JM: (2) might be a metadata-driver like separate chunk-stores
- DH: if it's binary, then you need a good spec. and need to show equivalence between binary version and JSON.
- JMS: you might be writing the non-JSON attribute late in the process, which would cause problems
- DH: binary could help with speed since string level parsing is expensive
- DB: always thought of the metadata as the stuff you want to read with an editor, and you don't want performance issues
- DH: have had a number of examples of NC-4 files that are enormous (10s of MB of metadata)
- also abusing grouping for "namespaces" (even if not a good idea)
- IV: is this Zarr's responsibility? cf. Pydantic which can turn your values into something else. (i.e. external schema)
- DB: but Zarr is responsible for storing "fill_value"
- IV: that's .zarray rather than .zattrs
- DB: would assume that the `.attrs` property takes care of encoding/decoding
- JMS: would see saying ".zattrs supports JSON + these encodings"
- IV: do all the languages support this?
- Javascript?
- JMS: `Array[UInt8Array]`
- SV: an extension?
- JMS: could fail on invalid JSON now and then add encoding/decoding later (since there's already the issue with V2)
- IV: Arrow requires everything to be an arrow type (everything else is string with encoding)
- DH: did that in netcdf-4
- DB: sqlite is the same way
- DH: include numpy with json type (from string)
- IV: almost done with PR on awkward arrays (using this). depends on the JSONs
- JMS: would make sense to standardize that (decide: pure JSON or extended JSON)
- IV: see https://github.com/scverse/anndata/pull/569
- SV: heading to California next week for NumFOCUS & CZI summit (also NJ & NYC)
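To make option (3b) above concrete, here is a minimal sketch of tagged-object encoding; the `@type` key is purely illustrative and not part of any Zarr spec. For contrast, zarr-python's current non-compliant behaviour corresponds to `json.dumps`'s default `allow_nan=True`, which emits bare `NaN`/`Infinity` tokens.

```python
import json
import math

def encode_attr(value):
    """Recursively replace non-finite floats with tagged, JSON-safe objects."""
    if isinstance(value, float) and not math.isfinite(value):
        return {"@type": "float", "value": str(value)}  # "nan", "inf", "-inf"
    if isinstance(value, dict):
        return {k: encode_attr(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode_attr(v) for v in value]
    return value

def decode_attr(value):
    """Inverse of encode_attr: restore tagged objects to Python floats."""
    if isinstance(value, dict):
        if value.get("@type") == "float":
            return float(value["value"])
        return {k: decode_attr(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_attr(v) for v in value]
    return value

attrs = {"scale_factor": float("nan"), "valid_max": float("inf")}
text = json.dumps(encode_attr(attrs), allow_nan=False)  # strictly valid JSON
print(text)                           # {"scale_factor": {"@type": "float", "value": "nan"}, ...}
print(decode_attr(json.loads(text)))  # {'scale_factor': nan, 'valid_max': inf}
```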
## 2022-08-24
**Attending**: Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Davis Bennett (DB), Eric Perlman (EP), Ward Fisher (WF), Martin Durant (MD), Hailiang Zhang (HZ), Ryan Abernathey (RA), John A. Kirkham (JK)
**TL;DR:** The Zarr community discussed two open PRs in the zarr-specs repo: [Support a list of `codecs` in place of `compressor`](https://github.com/zarr-developers/zarr-specs/pull/153) and [Change data type names and endianness](https://github.com/zarr-developers/zarr-specs/pull/155). The discussion was extensive, covering many good points, and the overall community favours merging both of these PRs. After this, Hailiang Zhang from NASA Goddard asked a few questions about the ZEP extension he and his colleagues are working on. They are making progress on the ZEP and will submit a draft in the upcoming weeks.
Finally, there was a discussion on working on and finishing the pending ZEP1. John A. Kirkham proposed an idea that everyone was in favour of. Also, the community would like to step up and help in the completion of ZEPs whenever and wherever needed.
**Updates:**
- Zarr is attending CZI and [NumFOCUS Summit 2022](https://numfocus.org/2022-project-summit-openmbee), if you're there feel free to say Hi! 👋🏻
- [Jonathan Striebel](https://github.com/jstriebel) is presenting a poster on Zarr @ [EuroScipy 2022](https://www.euroscipy.org/2022/) next week. If you're attending the conference, please say Hi 👋🏻!
- Final decision on [ZEP1](https://github.com/zarr-developers/zarr-specs/pull/149) most probably next week. Please leave your feedback now!
- Suggestions for themes/tech stack for website revamping
**Meeting minutes:**
- JMS: https://github.com/zarr-developers/zarr-specs/pull/153
- Does anyone have any feedback on this? (see the metadata sketch at the end of these minutes)
- RA: Hard to change across all the specs - changing filters is possible as it deals with NumPy arrays - compressors don't do that
- DB: we can look at a unified API for these types of changes
- MD: the codec should take context - all the info like position and size of the array - this could potentially solve the problem - by the time you compress the array, the codec could act on what it has been told
- JMS: each codec should have bytes as output
- DB: for numcodecs this means promoting/changing the function signature - no reason this could not be done
- MD: it says buffer and it can be amended
- MD: it can tell you where you are in the array - chop things where specified - e.g. I want this key because it is this key in this chunk - also biased because I've worked on storage and this helps me out
- JMS: first you read the chunk and then the data is read - if the codec wants to make a partial read, the codec could decide what to do from there
- MD: blosc takes care of this - the codec itself won't interact with the storage layer - use case: the kerchunk example - a netcdf file - needs to know which file and at what size we are - doesn't need to think about the storage layer
- JMS: https://github.com/zarr-developers/zarr-specs/pull/155
- MD: Seems good to me 👍🏻
- JMS: rename the data types and keep the endianness - minimal change - using different names makes sense - you can add other names - and it makes sense to use more conventional names
- DB: in favour 👍🏻
- JMS: if the array is big endian, this change will return a big-endian array
- RA: lil' experience with this - ocean models give big-endian data - sometimes they don't work - never wanted to have those types - just accepted them because they were there - trade-off: computational cost, and you cannot convert on the fly
- MD: if you can do it in place, temporary duplication of the memory can be done - all astropy data is big endian
- RA: row major vs. column major - something which Zarr should take care of 👀
- HZ: Ryan and HZ's colleagues (Brianna and Mahabal from NASA Goddard) had a discussion in a meeting - a proposal to build chunk-level statistics for performance - the idea: each chunk will have some statistics to describe its characteristics and how it is performing
- *Planning to submit an extension ZEP soon!* 🙌🏻
- Not a single value - it will be along certain dimensions - allowing it to be a vector instead of a scalar - along which dimensions should it be done!?
- It's been a month - what's the timeline for the release of the next major version of the spec? - No timeline yet, we're still working on it!
- Statistics need to have some knowledge - https://github.com/zarr-developers/zarr-specs/issues/73 - this is helpful for us - couldn't find this in V3 - did I miss something about this?
- Dimensions could be lat., long., time - the statistics - the dimensions could be switched
- MD: adding a codec could solve this!
- Thank you!
- JK: Get Alistair to finish ZEP1!
- Use comments and make an individual PR for those comments - Nice idea!
- MD: Different uses and perspectives - move towards the same goal - other groups have structured issues
- MD: make a list of things we can include and solve them
- JK: break the larger problem into simpler ones and then solve them!
- Discussion on community to take charge for ZEPs
- Everybody seems to be in favour and ready to step up
- Hoping to close ZEP1 soon and move forward!
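As referenced above, here is a rough sketch of how array metadata might look before and after the two PRs: the single `compressor` (plus `filters`) becomes one ordered `codecs` list (in the spirit of PR #153), and the data type gets a logical name, with byte order shown here as a codec concern rather than part of the type. All key names are illustrative, not quoted from the spec.

```python
# v2-style metadata: endianness is baked into the dtype string, and the
# transformation chain is split across "filters" and a single "compressor".
v2_style = {
    "dtype": "<i2",
    "filters": [{"id": "delta", "dtype": "<i2"}],
    "compressor": {"id": "blosc", "cname": "zstd", "clevel": 5},
}

# Sketch of the proposed shape: one ordered list of codecs, logical dtype name.
v3_style_sketch = {
    "data_type": "int16",  # logical name, no byte order
    "codecs": [
        {"name": "delta"},                                          # array -> array
        {"name": "endian", "configuration": {"endian": "little"}},  # array -> bytes
        {"name": "blosc", "configuration": {"cname": "zstd", "clevel": 5}},  # bytes -> bytes
    ],
}
```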
Open Agenda (add below 👇🏻):
- Zarr v3 spec open issues:
- by JMS
- Tabled:
- fill value required: https://github.com/zarr-developers/zarr-specs/pull/145
- C order vs F order vs arbitrary order
- Rename of array metadata files
- Dimension labels
- NaN/inf/other special values in user-defined attributes: https://github.com/zarr-developers/zarr-specs/issues/141
- Storage transformations: sharding, consolidated metadata
- Can storage transformers operate on the entire store or just arrays? https://github.com/zarr-developers/zarr-specs/pull/149#pullrequestreview-1078722828
## 2022-08-10
**Attending**: Sanket Verma (SV), Josh Moore (JM), Jeremy Maitin-Shepard (JMS), Alex Merose (AM), Brianna Pagán (BP), Hailey Johnson (HJ), Hailiang Zhang (HZ), Jonathan Striebel (JS), Martin Durant (MD), Norman Rzepka (NR), Shivank Chaudhary (SC), Ward Fisher (WF), Mahabal Hegde (MH), John Kirkham (JK), Isaac Virshup (IV)
Introductions:
- Favorite sport!
- Feel free to add links to your work here
Updates:
- [ZEP1](https://github.com/zarr-developers/zarr-specs/pull/149) & [ZEP2](https://github.com/zarr-developers/zarr-specs/pull/152) are open for feedback!
- Review under https://zarr.dev/zeps/draft_zeps/
- Comments on https://github.com/zarr-developers/zarr-specs/pulls
- Browse https://zarr.dev/community-calls/ for previous meeting notes
- [Jonathan Striebel](https://github.com/jstriebel) is presenting a poster on Zarr @ [EuroScipy 2022](https://www.euroscipy.org/2022/). If you're attending the conference, please say Hi 👋🏻!
- 2.13.0a1 releasing soon with updates from [Davis](https://github.com/zarr-developers/zarr-python/pull/1094), [Mads R.B.](https://github.com/zarr-developers/zarr-python/pull/934) and [Jonathan](https://github.com/zarr-developers/zarr-python/pull/1096)
- Josh: 2.13 next alphas
- https://zarr.readthedocs.io/en/latest/release.html#release-2-13-0
- Phase 1 of GSoC 2022 completed! 🎉 Check progress [here](https://alt-shivam.github.io/Codecs-Registry/)
- JMS: would be good to have specification (json-schema?) for each
- Goal: have clients interact with the registry to give users info/feedback
- async zarr https://github.com/martindurant/async-zarr
- Anaconda hackweek per quarter (2-day hack)
- for discussion https://gitter.im/zarr-developers/community?at=62f3ed24d020d223d36587d5
- http only and other simplifications
- JMS: targeting runtimes outside of the browser?
- AM: cloudflare worker? WebAssembly support. (MD: already in pyodide)
- lightweight VMs (e.g., for security)
- https://github.com/zarr-developers/community/issues/14
- IV: story for downstream library developer to use? rewrite to use await
- MD: definitely must use await (can't go in and out of the event loop)
- use case: fetching the first chunk of several arrays (see the sketch below)
- MD: e.g. how would xarray use it
- MD: some things already work: bokeh, etc.
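A minimal sketch of the concurrency pattern behind the "first chunk of several arrays" use case, assuming a hypothetical async store coroutine; this is the general asyncio pattern, not async-zarr's actual API:

```python
import asyncio

async def fetch_chunk(store, key):
    """Hypothetical coroutine standing in for an async store's get()."""
    ...  # e.g. an HTTP range/GET request for `key`

async def first_chunks(store, array_paths):
    # Assume 2-D arrays with v2-style chunk keys, so "0.0" is the first chunk.
    keys = [f"{path}/0.0" for path in array_paths]
    # gather() overlaps the requests instead of awaiting them one by one
    return await asyncio.gather(*(fetch_chunk(store, k) for k in keys))

# chunks = asyncio.run(first_chunks(store, ["temp", "salt", "pressure"]))
```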

Open agenda:
- ...add here ...
- AM: Question about the Zarr Spec v3 (ZEP1)
- Like that it's very bare bones
- Thought experiment: Could a video codec be implemented?
- Compress across time; key frames
- JMS: 3d xyt chunks would work (individually)
- MD: variable length chunks. Critical to video compression (per key frame)
- MD: if each chunk has all the time points
- MD: but also in favor of variable length chunks
- JMS: what's the connection?
- MD: video compression supports a large range of chunk sizes based on how quickly the video is changing
- JMS: just make the time chunking big enough and the internals become a detail
- JS: remove chunking across the time dimension
- currently inefficient, but with partial reads it could work
- let video codec request chunks of data from the store
- MD: internal to codec or explicit at the zarr level
- JM: difference in the fundamental model? (atomicity)
- MD: 1-dimensional delta codec, make it across an arbitrary dimension?
- AM: if Zarr intends to be the metadata format, this is a stress test.
- JMS: with fixed key rate,
- JM: see also https://mpeg-g.org/
- HZ: extension proposal
- implementation for multidimensional data analysis
- introducing auxiliary datasets in reduced datasets (non-scalar, accumulation value)
- helps to speed up computation. Ryan A. suggested a spec extension
- MH: averaging over time or spatial extensions
- JM: cf. https://github.com/zarr-developers/zarr-specs/issues/50
- IV: perhaps like transforms https://github.com/ome/ngff/issues/101
- JS: difference of whether or not it leads to an additional array
- IV: Non-uniform chunks – timing? (see the sketch after this agenda)
- conversation with JK at SciPy
- broad desire to have them exist. any objections?
- have several master's students to put on this
- JM: ZIC?
- IV: can discuss if in spec or as an extension
- JS: would still have a formal spec even if an extension (eases adoption, clear interface)
- IV: ZEP0001 timeline?
- SV: on me. working with Alistair to apply the modifications. ASAP.
- JMS: meta-issue for scheduling time to work on the V3 spec
- way to speed up progress? additional meetings?
- SV: ZIC meeting?
- JM: editorial meeting? Add JMS?
- JS: happy to be in discussions with AM but also open issues that need discussion
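Since non-uniform chunks came up twice above (variable-length chunks for video, and IV's question), here is a small sketch of one possible representation: an explicit list of chunk extents per dimension, which keeps the chunk grid well defined. Nothing here is spec'd; it is purely illustrative.

```python
import itertools

# Axis 0 is non-uniform (chunks of 10, 20, 15 elements); axis 1 has one 64-wide chunk.
chunk_sizes = [[10, 20, 15], [64]]

def chunk_slices(chunk_sizes):
    """Yield the n-dimensional slice covered by each chunk in the grid."""
    per_axis = []
    for sizes in chunk_sizes:
        offsets = [0, *itertools.accumulate(sizes)]
        per_axis.append([slice(offsets[i], offsets[i + 1]) for i in range(len(sizes))])
    yield from itertools.product(*per_axis)

for sl in chunk_slices(chunk_sizes):
    print(sl)  # (slice(0, 10), slice(0, 64)), (slice(10, 30), slice(0, 64)), ...
```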
## 2022-07-27
Attending: Josh Moore (JM), Brianna Pagán (BP), Ryan Abernathey (RA), Norman Rzepka (NR), Jeremy Maitin-Shepard (JMS), Greg Lee (GL), Trevor Manz (TM), Ward Fisher (WF), Davis Bennett (DB), Matt McCormick (MM), John Kirkham (JK), Parth Tripathi (PT)
Agenda:
- ZEP0001
- RA: ongoing review process :tada:
- JMS: long-list. perhaps we should just go through them
- NR: higher-level -- status of the extensions? going in? ZEP0002, 3, 4...
- RA: sets the groundwork for extensions. like the idea of keeping them narrow in scope
- NR: worried about a ZEP a month and the happiness of the ZIC. Perhaps batching them?
- JMS: review sharding as part of ZEP0001 since it was the motivation for many to have gotten involved. main benefit of V3
- MM: on sharding, would like to look towards the future (i.e. not necessarily finalization) to get it adopted across the implementations
- JM: not a lot of movement (speaking for others). definitely implementation needs work.
- NR: Jonathan is in stand-by waiting for decision. Could be ZEP0002. (He has a conflict at this time)
- MM: Great comments https://github.com/thewtex/shardedstore/issues/17
- Using sharding in a general way for simplicity, incl. with different stores.
- Looking to go through this in practice for large scale data.
- See working prototype. Pretty efficient. Works with v2 as well out of the box
- DB: understanding sharding as introducing an abstraction between the array and the store. Will that generalize to all non-sharded stores? (No-op shard?)
- NR: yes. Store shouldn't need to know about the storage transformers. Partial reads are helpful but not required.
- NR: at specification level (i.e. not just zarr-python) need to know how it will look like on disk.
- MM: could see trying to get ZEP0001 out. (**Proposal?**)
- but also: yes you can shard arrays, but what about groups (as additional need for the spec)
- useful for content-addressable/verifiable storage
- unrelated to all of the hierarchical formats
- separate shards per scale along with the related metadata (same for xarray)
- JMS: in the interest of getting ZEP0001 out, perhaps we hold off on sharding, as a delta to the…
- **tl;dr** --
- ZEP0001: focus on getting current work done but include storage transformer (:+1:)
- ZEP0002: Jonathan to start ~next week (making necessary adjustments to ZEP0001)
- MM to comment on PR or open alternative proposal
- then, in that same batch or in ZEP0003, the definition of extensions
- RA: process -- inventing it as we go. Bootstrapping so there will likely be a lot of feedback on how things work, but try to use that structure for the moment.
- JMS: on the outstanding issues
- fill values: consensus that it must be specified?
- JM: replaces DB's smart fill value logic?
- DB: clients can have a mapping or a callable, but it wasn't easy to make it work with the semantics (in the zarr-python)
- DB: easier if we make it required. gets past fundamental ambiguity
- JM: the upgrade scripts will need to be aware of this too (EOSS4)
- DB: 0 as sane default for uint8? etc. etc.
- consensus: require fill value be specified
- case sensitivity
- v3 says "case sensitive", reasonable except on e.g. OSX.
- Add a note?
- Alternative of escaping? (add-on)
- WF: file-system rather than OS bound (despite tight correlation per platform). NC doesn't assume the technical debt of working around file system limitations. Ergo? A user consideration, not something that can be fixed technically.
- path structure
- see previous meetings
- Options
- (A) removing "meta/" as nicer paths to metadata files
- Con: doesn't work for the consolidated metadata path
- JM: Workaround with "symlinks"
- (B) require suffix on the root (".zarr")
- (C) syntax for combining path and key: `path//key`, `path#key`, etc.
- JM: recently ran into a need/use-case for something like (A). import into OMERO, easier to work with the metadata as the main hierarchy.
- RA: good to think about having kerchunk style references encodable in Zarr
- discussions at scipy
- vibe, "why a new format?"
- needed the ability to have references to blocks in other files
- MM: "composite store" (like a sharded store, could also add in kerchunking possibly)
- adds in layer of indirection, doing indexing. tells you what's present.
- would need to be more well-supported than consolidated metadata.
- JM: how does it differ? MM: more flexibility in how it is broken up
- MM: very large dataset and doing analysis on one part of the dataset then that can be updated independently.
- JM: similar to Dennis' consolidation per hierarchy level. Yeah, like an octree.
- NR: but doesn't solve path
- JMS: if consolidated is a concern, then (A) won't work
- JM: plus symlink should work.
- JMS: proposing to drop "meta".
- RA: currently fsspec handles it. could be a formal URL scheme.
- JMS: some issues with URLs if you're opening RW
- RA: could see only RO to begin with.
- root directory
- reasoning is having a non-empty name
- JMS: just have ".zarr" as the name? best to skip that. (hidden files were an issue for V2)
- RA: talked about having the root document be a json file.
- JMS: special name?
- difference between .zgroup and .zarray
- leads to potential race condition
- DB: used to attributes.json from n5 and have never had an issue with it
- NR: easier if it's one name to not need to do two look ups
- endianness
- JMS: currently include types `<i2`, etc. more logical to say data types are logical (16-bit-signed-integers)
- JMS: make it a codec issue to deal with endianness since it only matters for raw encoding
- NR: need to specify it somewhere, even if just in the codec. blosc would need to know (anything byte based)
- JMS: filter rather than a data type
- NR: downside of having it in the datatype? codec could ignore it. (JMS: happens at the moment)
- JMS: numpy's endianness is a bit unusual. often you want to just use it in the native endianness.
- (Lots of nodding from Trevor)
- JMS: main benefit is to always give the user native endianness
- boolean/complex
- people were happy to have them (yes?)
- MM: boolean as 1 bit or 1 byte? JK: one byte. no bit-packing. (That could be a codec)
- `vector<bool>` as an example of over-optimization
- rawcodec (DB)
- never need to say "None"
- "raw"? intuitive?
- "identifiy", "noop", "dummy", "pass-through"
- JMS: similar to endianness. combining 2 things in codec. codec gets an array and not a stream of bytes. could arguably be **split** (see the pipeline sketch at the end of these notes)
- DB: separate configuration for each?
- JK: similar to filter vs. codec, not well spelled out in the spec. See Categorize for an example.
- JM: would make the choice to avoid compression explosion (e.g. for images)
- JK: there's already a meteorological compressor...
- JMS: linear chain of filter with a codec has issues
- current way to do it would be to encode a byte stream and use a compressor
- perhaps want separate compressors for different parts
- could the filter itself have additional filters/compressors for the labelled data vs. the indices
- JK: use cases? JMS: variable length strings, multisets, downsampling segmentations (similar to large number of categories)
- JMS: should be easy to fit it in now. have a tree and the filter becomes the codec.
- DB: filter vs codec? why a tree rather than an array?
- JK: original use case of filter is categorize `(RGB) -> (012)`
- filter as a transformation (on ndarray)
- DB: different type signatures?
- JMS: effectively not different in V2
- JK: mostly a terminology thing
- DB: "pipeline"
- TM: is "raw" just an empty list?
- JK: look at how parquet does it? (Ask Martin perhaps)
- TM: one pipeline with inputs/outputs for each codec, then you could encode numpy/bytes as desired and confirm that it's valid
- JMS: one codec location to an array? (nodding)
- JK: do we have chained codec use cases
- DB: someone at Janelia was working on that for segmentation of volumes
- similar to categorical
- see related paper. "gzip on top"
- JMS: similar to something in neuroglancer
- TM: bitshuffle/gzip for kerchunking? (to read HDF5 file)
- DB: semantics come from HDF5
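As an illustration of the **split** JMS suggests above (endianness and array-to-bytes conversion pulled out of both the data type and the compressors), here is a hedged sketch of a linear codec pipeline; class and method names are assumed for illustration and this is not zarr-python's API:

```python
import zlib
import numpy as np

class EndianBytes:
    """Single array -> bytes stage that owns the on-disk byte order."""
    def __init__(self, byteorder="<"):
        self.byteorder = byteorder
    def encode(self, arr):
        return arr.astype(arr.dtype.newbyteorder(self.byteorder)).tobytes()
    def decode(self, buf, dtype, shape):
        stored = np.dtype(dtype).newbyteorder(self.byteorder)
        return np.frombuffer(buf, dtype=stored).reshape(shape).astype(dtype)

class ZlibCodec:
    """bytes -> bytes compressor."""
    def encode(self, buf):
        return zlib.compress(buf)
    def decode(self, buf):
        return zlib.decompress(buf)

def encode_chunk(arr, to_bytes, compressors):
    buf = to_bytes.encode(arr)       # array -> bytes (endianness fixed here)
    for c in compressors:            # then any number of bytes -> bytes steps
        buf = c.encode(buf)
    return buf

def decode_chunk(buf, to_bytes, compressors, dtype, shape):
    for c in reversed(compressors):  # undo compressors in reverse order
        buf = c.decode(buf)
    return to_bytes.decode(buf, dtype, shape)

chunk = np.arange(12, dtype="int16").reshape(3, 4)
encoded = encode_chunk(chunk, EndianBytes("<"), [ZlibCodec()])
decoded = decode_chunk(encoded, EndianBytes("<"), [ZlibCodec()], "int16", (3, 4))
assert np.array_equal(decoded, chunk)
```

The point of the split is that the data type stays logical (`int16`) while exactly one stage in the chain decides byte order, and everything after it works on plain bytes.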
## 2022-07-13
* Cancelled for SciPy