Zarr BoF

https://tinyurl.com/zarr-bof

Attendance

Discussion

  • EUMETSAT
    • Should we use Zarr in addition to NetCDF?
    • specifically for reprocessed climate data records and training ML models
  • Interested in hearing about NetCDF -> Zarr via Kerchunk workflows and experiences
  • New users to learn what is possible and why to use it
  • How can STAC and Zarr work well together, and how to map between them?
  • Browser visualization of Zarr
  • Status of GeoZarr
  • Daniel Loos: Building a native file format for Discrete Global Grid Systems
  • Zarr as file format for Copernicus S1/S2/S3 imagery
  • How important is chunking for performance?
  • How to assess the performance, especially in a distributed environment/cluster? What should be taken into account besides execution time?
  • How suited is Zarr as an archival format?
  • How to build data platform components w/ Zarr
  • Comparison to HDF5 and interested in Metadata
  • Relationship of Zarr with Pangeo (Interested in Executable Workflows, such as openEO/OGC-API-Processes - which leads to Pangeo 'workflows' -> Zarr)
  • Want to know about the capability of Zarr (& Kerchunk) for improving access to n-dimensional L1b/L2 SAFE-format Copernicus products for Sentinel-3 marine (as well as regularly gridded data) - perhaps this relates to ragged arrays?
  • Best practices for updating a .zarr file and considerations for (re)chunking.
  • Support for sparse arrays? -> implementation expected within a year
  • What is the difference between GeoZarr and NCZarr?
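On the chunking questions above, a minimal sketch of why chunk shape matters: it counts how many chunks a read has to fetch for two access patterns under two chunk layouts. The array dimensions, chunk shapes, and the `chunks_touched` helper are all hypothetical and chosen for illustration only; they are not part of any Zarr API, and real performance also depends on compression, storage latency, and parallelism.

```python
from math import prod


def chunks_touched(selection, chunk_shape):
    """Count the chunks overlapped by a rectangular selection.

    selection:   one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size          # index of first chunk hit
        last = (stop - 1) // size      # index of last chunk hit
        per_dim.append(last - first + 1)
    return prod(per_dim)


# Hypothetical array with dimensions (time, lat, lon) = (3650, 720, 1440).
time_series = [(0, 3650), (100, 101), (200, 201)]  # one pixel, all time steps
single_map = [(0, 1), (0, 720), (0, 1440)]         # one time step, full grid

map_chunks = (1, 720, 1440)     # layout favouring map access
series_chunks = (3650, 10, 10)  # layout favouring time-series access

for name, sel in [("time series", time_series), ("single map", single_map)]:
    print(name,
          chunks_touched(sel, map_chunks),
          chunks_touched(sel, series_chunks))
```

A layout that needs one chunk for one access pattern can need thousands for the other, which is the usual motivation for rechunking a dataset to match the dominant read pattern.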

Topics

  • Intro to Zarr
    • Created in 2015 by Alistair Miles
    • 21 core devs
    • Sponsored by NumFOCUS
    • Generic format for Scientific Array data
    • Container of items
    • Wide range of compression options
    • Retrieve chunks only when needed
    • Read/Write in parallel
    • All dimensions are treated equally
    • Implementations in many programming languages: Python, C, C++, Julia, Java, JavaScript, R
    • Governed by the Zarr Implementation Council (No 'Zarr-Ruler')
    • Zarr rechunker (https://github.com/pangeo-data/rechunker)
  • GeoZarr
    • GeoZarr Spec: https://github.com/zarr-developers/geozarr-spec
    • standard to store geospatial data in Zarr
    • Started by Christophe Noël @ Spacebel
    • Differences in perspective between the CF Conventions and GIS communities
    • Is an OGC Standards Working Group
    • storing multidimensional georeferenced gridded geodata (including raster) -> first off, raster and grids need a sound definition
    • Software in which the (GIS) community would like to be able to use the Zarr format: R, SNAP, QGIS, ArcGIS, and other GIS software
    • ZEPs as a way forward to push the conventions (https://zarr.dev/zeps)
  • Zarr + STAC
  • Zarr and / or NetCDF
  • Zarr for ML
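The "container of items", compression, and "retrieve chunks only when needed" points from the intro can be sketched with a toy example. This is not the Zarr format or the API of any Zarr library: `ToyChunkedArray` is a hypothetical class that uses only the Python standard library, with zlib standing in for the wider range of compressors Zarr supports.

```python
import struct
import zlib


class ToyChunkedArray:
    """1-D array of floats stored as fixed-size, individually compressed chunks.

    Illustration only: a dict plays the role of the key-value container.
    """

    def __init__(self, values, chunk_len):
        self.chunk_len = chunk_len
        self.store = {}   # chunk key -> compressed bytes
        self.reads = 0    # how many chunks have been decompressed so far
        for i in range(0, len(values), chunk_len):
            block = values[i:i + chunk_len]
            raw = struct.pack(f"<{len(block)}d", *block)
            self.store[str(i // chunk_len)] = zlib.compress(raw)

    def _load(self, index):
        """Decompress a single chunk and return it as a list of floats."""
        self.reads += 1
        raw = zlib.decompress(self.store[str(index)])
        return list(struct.unpack(f"<{len(raw) // 8}d", raw))

    def read(self, start, stop):
        """Return values[start:stop], touching only the chunks that overlap it."""
        if start >= stop:
            return []
        out = []
        first = start // self.chunk_len
        last = (stop - 1) // self.chunk_len
        for index in range(first, last + 1):
            chunk = self._load(index)
            base = index * self.chunk_len
            lo = max(start, base) - base
            hi = min(stop, base + len(chunk)) - base
            out.extend(chunk[lo:hi])
        return out


arr = ToyChunkedArray([float(i) for i in range(1000)], chunk_len=100)
part = arr.read(250, 320)
print(len(arr.store), "chunks stored,", arr.reads, "chunks read")
```

Because every chunk is an independent item under its own key, separate readers or writers can work on different chunks without coordinating, which is what makes parallel access straightforward.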