
Zarr Bi-weekly Community Calls

Check out the website for previous meeting notes and other information: https://zarr.dev/community-calls/

Joining instructions: https://zoom.us/j/300670033 (password: 558943)
GitHub repo: https://github.com/zarr-developers/community-calls
Previous notes: https://j.mp/zarr-community-1

2025-03-20

Attending: Davis Bennett (DB), Abhiram Reddy (AR), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

2025-02-19

Attending: Josh Moore (JM), Sanket Verma (SV), Michael Sumner (MS), Davis Bennett (DB), Eric Perlman (EP)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

2025-02-05

Notes TBA

2025-01-22

Attending: Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Gábor Kovács (GB)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • N5
    • JM: There'll be a new release to add support for N5 in Zarr-Python 3
    • EP: They can leverage sharding and other useful features
  • Zarr-Python 3
    • DB: Gave a presentation on Zarr-Python 3 at Allen Institute
    • DB: Realised some issues in Zarr-Python 2 when listing groups
    • JMS: Because ZP 2 listing processes were taking place in parallel
    • DB: In the Zarr V2 spec there's nothing that says that groups and arrays are different
    • JMS: Looked at both the spec and the implementation when working on my own implementation
  • JMS: Added support for ZEP8 in Tensorstore
  • Discussion on URL for Neuroglancer
    • Deciding the right characters to use
    • Tricky to decide the right URL
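
A rough sketch of the V2 listing issue raised above: in V2 a node is only recognisable by which metadata key (`.zarray` or `.zgroup`) sits under its prefix, so listing a group means probing keys for every child. The `store` dict and both helper names are hypothetical stand-ins for a flat key/value store; this is not how Zarr-Python implements listing.

```python
import json

# Hypothetical flat key/value store standing in for a directory or bucket.
store = {
    ".zgroup": json.dumps({"zarr_format": 2}),
    "raw/.zarray": json.dumps({"zarr_format": 2, "shape": [10], "chunks": [5]}),
    "raw/0": b"chunk-bytes",
    "raw/1": b"chunk-bytes",
    "labels/.zgroup": json.dumps({"zarr_format": 2}),
    "labels/cells/.zarray": json.dumps({"zarr_format": 2, "shape": [4], "chunks": [4]}),
}


def node_type_v2(store, path):
    """Classify a V2 node by probing for its metadata key."""
    prefix = f"{path}/" if path else ""
    if prefix + ".zarray" in store:
        return "array"
    if prefix + ".zgroup" in store:
        return "group"
    return None  # neither key exists, so this prefix is not a Zarr node


def list_children_v2(store, path=""):
    """Return {child_name: node_type} for the direct children of a group."""
    prefix = f"{path}/" if path else ""
    names = set()
    for key in store:
        if key.startswith(prefix):
            rest = key[len(prefix):]
            if "/" in rest:
                names.add(rest.split("/", 1)[0])
    children = {}
    for name in sorted(names):
        kind = node_type_v2(store, prefix + name)
        if kind is not None:
            children[name] = kind
    return children
```

On a remote store each probe in `node_type_v2` would be a separate request, which is one reason listing large V2 hierarchies gets expensive.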

2025-01-09

Attending: Josh Moore, Eric Perlman, Sanket Verma, Joe Hamman, Davis Bennett, Gábor Kovács, Dennis Heimbigner, Thomas Nicholas, Jeremy Maitin-Shepard

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • EP: the month wait was good to get other projects like napari up-to-speed
  • DB: reached out to people using n5 in python. They weren't pinning to zarr-python<3. Sent an email. No response. EP anyone? No. Using Zarr.
  • JH: Virtualizarr ready for 3.0.0? Failing test (xarray?) but Matt is looking at it.
    • TN: Kerchunk doesn't support zarr-python 3.x (API usage)
    • without kerchunk: fits & netcdf won't work.
    • lose access to anything in the future (in-progress HDF4)
    • JH: requires rethinking of MultiZarrToZarr logic
    • TN: Doesn't directly interact with zarr-python v3. But want to (to use the v2 to v3 compat objects)
    • JH: Would be good to unblock the ZEP process and get the ZSC behind the changes - it's confusing to see ZSTD codec in Zarr-Python 3.0 and not in the spec
    • JM: I'll get the ZSC to respond on the long-pending issues
    • DH: https://github.com/Unidata/netcdf-c/pull/3068 (Ward will get to reviewing it)
    • DB: Sample V3 sharded data: https://github.com/d-v-b/zarr-workbench/tree/main/v3-sharding-compat/data/zarr-3
    • JMS: Planning to add Icechunk support to Tensorstore

2024-12-11

Attending: Eric Perlman (EP), Sanket Verma (SV), Gábor Kovács (GB), Ward Fisher (WF), Davis Bennett (DB), Camille Teicheira (CT), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

  • Zarr is on BlueSky — follow us https://bsky.app/profile/zarr.dev
  • Norman Rzepka has joined Zarr Steering Council — https://zarr.dev/blog/steering-council-update-2024/! Welcome Norman!
  • A group of Zarr-Python devs are at AGU this week including Joe and Ryan
  • Zarr-Python V3 release before holidays!
    • DB has a bunch of PRs coming in soon!
    • Planning to expose sharding in a user-friendly way
  • WF: Had meetings with Florian Ziemann - putting up a PR for V2 consolidated metadata

Open agenda (add here 👇🏻):

  • Intros w/ favourite places for holidays
    • Sanket — into Himalayas
    • Ward - near Colorado
    • Davis — Italy
    • Eric - coming to India soon!
    • Gábor — Canada for Skiing
    • Camille - tech lead at https://www.sofarocean.com/ - has a lot of weather NetCDF data
  • DB: Sharding chunk sizes: can we allow imperfect partitioning of the shard shape?
    • JMS: Would be possible to support, but with the current config you have a regular grid for shards and chunks, also resizing would be difficult
    • JMS: Could also be based on user preference
    • DB: The sharding spec doesn't specifically say anything about the shape — so how and where should we define it?
    • JMS: The non-regular/partial chunks would not compose across shards
    • DB: I see! The proposal is off the table then!
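
The regular-grid constraint JMS described can be sketched as a simple check: every inner chunk size has to divide the shard size along its axis. `chunks_per_shard` is a hypothetical helper written for these notes, not code from any Zarr implementation.

```python
def chunks_per_shard(shard_shape, chunk_shape):
    """Return the inner-chunk count along each axis, or raise if uneven."""
    if len(shard_shape) != len(chunk_shape):
        raise ValueError("shard and chunk shapes must have the same rank")
    counts = []
    for axis, (shard, chunk) in enumerate(zip(shard_shape, chunk_shape)):
        # A regular grid requires the chunk size to divide the shard size.
        if chunk <= 0 or shard % chunk != 0:
            raise ValueError(
                f"axis {axis}: chunk size {chunk} does not evenly divide shard size {shard}"
            )
        counts.append(shard // chunk)
    return tuple(counts)
```

For example, a (64, 64) shard with (32, 16) chunks holds a 2 x 4 grid of inner chunks, while a 64-wide shard with 48-wide chunks is rejected, which is the "imperfect partitioning" case DB asked about.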

2024-11-27

Attending: Sanket Verma (SV), Eric Perlman (EP), Davis Bennett (DB), Josh Moore (JM), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • DB: OME-NGFF hackathon update:
    • They worked on a Python library which will render DB's library obsolete - good news for DB as he doesn't need to maintain it!
    • https://github.com/BioImageTools/ome-zarr-models-py
    • EP: John Bogovic made good progress on Zarr Java (in the NGFF land)
      • FIJI being able to open Zarr V3
  • DB: zarr.open() and zarr.create() are confusing - instead we could have zarr.create_array() or zarr.create_group() to make things clear
    • DB: Norman has a PR and he's also experimenting to have a Zarr sharded create routine
    • JM: Would be cool if zarr.open() could figure out if it's an array or a group
  • Zarrs-Python
    • JM: The nomenclature could've been better! A bit confusing.
    • DB: Major scope is to improve the IO
  • JM: https://bsky.app/profile/zarr.dev
    • Follow us!
  • https://github.com/zarr-developers/zarr-specs/pull/311
    • JMS: The only part which requires spec is the URL
    • JM: I care about the internal directory structure
    • DB: The folks I spoke with at the OME hackathon want the drag-and-drop feature
    • JM: In ZP land

2024-11-13

Attending: Davis Bennett (DB), Eric Perlman (EP), Josh Moore (JM), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • meetings (Josh)
    • conversation
      • zarr-python: going strong (weekly)
      • ZEP: (not great for Dennis)
        • one off as necessary
      • community: combine into ZEP.
        • or vice versa
      • office hours: likely to end
      • run a doodle
      • People
        • Dennis: not before 10am MST
        • Jeremy: ZEP meetings are less critical at the moment
    • Decisions/TODOs
      • Only drop ZEP for the moment. Re-evaluate community frequency & time later. (cf. napari timezones)
      • Josh: remove ZEP calendar entry
      • Josh: update on Zulip (anywhere else?)
  • Davis: was "zarr.json" a mistake?
    • Josh: good question. benefits were:
      • only one GET (or HEAD) rather than needing a frequent 404
      • non-hidden files with proper file-ending
    • Davis: true. just have a pattern now where want to iterate and bottom out on arrays
    • Jeremy: often need to load the json anyway
      • or storage transformers that are needed
    • Dennis: preference is to have the directories marked (e.g., in the name)
      • price is mostly paid with large numbers of groups/arrays
      • cf. consolidated metadata locating all objects before reading them
      • maybe be able to recognize that by name
      • Davis: give it its own document?
      • Josh: like .zmetadata; downside is it requires an extra GET but that's ok.
      • Davis: using _nc for all attributes now. destroyed lazy evaluation of attributes. seemed to be the way that people were going. have to have a good use-case for a separate file (see S3 and performance issues)
      • Josh: could additionally gzip the extra file. benchmarking?
      • Dennis: how big can the total metadata get? (in characters)
      • Davis: maybe tabular data.
      • Dennis: in NetCDF, lot of use of groups (1000s) as namespaces.
  • Davis: Store API
    • getters take memory type (GPU, CPU, )
    • Josh: good to track (or disallow?) copies
    • Jeremy: most Stores are CPU, so actively copying for GPU.
    • Davis: Separate stores as in v2 (regular and chunk)
    • Davis: Store is simple key/value. Agnostic to Zarr formats.
    • Is the Store API overloaded?
  • Davis: On extra files, an extension where sqlite for every group and array. Good for tabular.
    • Jeremy: sqlite doesn't work for cloud storage.
    • What stops people from doing it today?
      • Prototype
      • Is this icechunk? That's more a Store API.
      • Jeremy: WASM/sqlite forwards to HTTP requests to S3 with read ahead.
      • Josh: duckdb?
      • Davis: see BigStitcher's use of arrays.
      • Jeremy: space of zarr-like stuff. cloud-based databases (queries & writing). Parquet might not support this for the indexing.
      • Davis: GeoParquet has a spatial index
      • https://github.com/opengeospatial/geoparquet/pull/191
  • Theodoros: interested in adopting Zarr
    • Problem is that we're dealing with really sparse datasets (mass spec imaging).
    • Davis: working on Zarr Python v3. A barrier to sparse support is that when we fetch a chunk we turn it into a contiguous array. Requires a redesign.
    • Efficient encoding of a single sample and "plugged into" zarr.
    • TV: can also have a time axis. distribution time of ions (for the same mass). (even though not as many pixels)
    • JMS: a codec could encode it as sparse (but in-memory is dense). Matter of hours.
      • other step would be full sparse support. ton of people have asked for this. but has to be woven throughout.
    • Davis: tell us what doesn't work for you. "we want to use Zarr but "
    • Josh: https://github.com/GraphBLAS/binsparse-specification
    • Jeremy: easy to store chunk in a sparse format. implementing all the APIs, e.g. in tensorstore would ask explicitly, "give it to me as a sparse array"
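
A minimal sketch of JMS's suggestion that a codec could store a chunk sparsely while the in-memory chunk stays dense. It assumes 1-D Python lists and a simple index/value layout; both function names are hypothetical, and this is neither the binsparse format nor an actual Zarr codec.

```python
def encode_sparse(chunk, fill_value=0):
    """Encode a dense 1-D chunk as (length, indices, values) of non-fill entries."""
    indices = [i for i, v in enumerate(chunk) if v != fill_value]
    values = [chunk[i] for i in indices]
    return len(chunk), indices, values


def decode_sparse(encoded, fill_value=0):
    """Rebuild the dense chunk; the decoded result is dense again in memory."""
    length, indices, values = encoded
    dense = [fill_value] * length
    for i, v in zip(indices, values):
        dense[i] = v
    return dense
```

Only the stored form shrinks here. Handing the caller a sparse array without densifying it is the larger redesign discussed above.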

2024-10-30

Attending: Davis Bennett (DB), Sanket Verma (SV), Gábor Kovács (GB), Jeremy Maitin-Shepard (JMS)

TL;DR:

Updates:

  • DB: Finding bugs in Zarr-Python and removing them - expanding the scope of tests
  • JMS: Back from the parental leave — the baby is doing great!
    • Been working on bugs for tensorstore
  • SV: GeoZarr spec meetings have been updated on the community calendar

Open agenda (add here 👇🏻):

  • Frequency of the meetings
    • DB: No strong feelings
    • JMS: Less activity, so it makes sense
    • GB: Fine by me
    • DB: Unreliable attendance - how to mark meetings as successful and unsuccessful
    • SV: Will open a wide discussion for the community to get everyone's thoughts
  • DB: Zarr V2 arrays using sharded arrays?
    • JMS: Not simple enough to do that because of overlapping arrays
    • DB: Zarr V2 codecs can utilise sharding codec
    • JMS: The JSON metadata is different for sharding
    • JMS: Who's the user base?
    • DB: Someone who's using Zarr V2 and wants to use sharding
    • DB: People might be scared of switching to a new format
  • DB: Planning to add consolidated metadata and new data types in Zarr-Python, write a doc and publish it for reference
  • JMS: The idea of community and core codecs is not super impressive!
    • DB: Would be good to avoid namespacing issues
    • JMS: What would happen if there are multiple JPEG codecs in community and we want to move them to core but there's already one in the core?
    • DB: Good question, need to come up with a process for this
    • JMS: Adding a vendor name could work — value in having a vendor name
  • Discussions on upcoming possible extensions

2024-10-16

Attending: Sanket Verma (SV), Joe Hamman (JH), Eric Perlman (EP), Ilana Rood (IP), Davis Bennett (DB), Michael Sumner (MS), Daniel Adriaansen (DA)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • Intro w/ favourite food
    • Sanket - Dumplings
    • Joe - Burrito
    • Eric - Donuts
    • Davis - Ethiopian, Mexican and Indian dishes
    • Michael - RSE at Australian Antarctic Division - Burgers
    • Ilana - Works at Earthmover
    • Daniel - NCAR
  • JH: starts screen sharing
    • JH: Presents on Earthmover, Arraylake, Icechunk
    • JH: presentation ends — time for questions
    • DB: Question on performance plots - what's the difference b/w Zarr V3 and Zarr V3 + Dask?
      • JH: Fetching is done by a different library - we're handling the concurrency better on the IO side
      • DB: What lessons could we take from this plot that can be applied to Zarr-Python?
      • JH: Python binding to rust crate needs to be looked at
      • JH: Doing decompression and IO in an interleaved fashion
    • SV: Does Icechunk work with Zarr V2?
      • JH: Only with V3 for some parts - but we can change that
    • EP: Able to leverage Zarr sharding in some way for Icechunk would be great
      • JH: We had an opportunity to do something totally different with sharding as it is now in ZP V3, i.e. a codec
      • JH: Implies sharding in a different manner
    • DB: How coupled are you with the current Zarr V3 API?
      • JH: Highly coupled
      • JH: LDeakin has started filing issues
      • JH: Can envision a high-level and a low-level store - that's what we build in the rust store
      • JH: We should ask store to do more, but we should be specific about it
    • MS: Really interested in Rust implementation - Does Rust part take over the encodings?
      • JH: No. We haven't implemented all of ZP yet
      • JH: Codecs, stores and all other stuff is currently handled by ZP - someone interested could come along and build bindings around
      • SV: Bioconductor folks are interested in Zarr-Python V3 — maybe they can be of interest to you
    • Thanks, Joe for giving the presentation! The slides and video recording will be posted online soon! Check our social media for updates!
  • DB: Zarr-Python V3 defines sharding as a codec, not as an explicit V3 feature, so basically you can have sharded Zarr V2 arrays!
    • JH: Try to get sharded V2 data to work, and let us know!
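
To illustrate what "sharding as a codec" looks like in V3 array metadata, here is a sketch that builds a metadata dict: the regular chunk grid carries the shard shape and the codec configuration carries the inner chunk shape. The field names follow my reading of the Zarr V3 core spec and the sharding codec spec and should be checked against them; `sharded_array_metadata` is a hypothetical helper, not a Zarr-Python function.

```python
import json


def sharded_array_metadata(shape, shard_shape, chunk_shape, data_type="uint16"):
    """Build a V3-style array metadata dict that uses the sharding codec."""
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(shape),
        "data_type": data_type,
        # The outer chunk grid describes whole shards (one stored object each).
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": list(shard_shape)},
        },
        "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
        "fill_value": 0,
        "codecs": [
            {
                "name": "sharding_indexed",
                "configuration": {
                    # Inner chunks are the unit of reading within a shard.
                    "chunk_shape": list(chunk_shape),
                    "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
                    "index_codecs": [
                        {"name": "bytes", "configuration": {"endian": "little"}},
                        {"name": "crc32c"},
                    ],
                    "index_location": "end",
                },
            }
        ],
        "attributes": {},
    }


meta = sharded_array_metadata((1024, 1024), (256, 256), (32, 32))
metadata_json = json.dumps(meta, indent=2)  # what would be written to zarr.json
```

Because sharding appears only as an entry in the codec list, nothing in the rest of the document is specific to it, which is the basis of DB's remark.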

2024-10-02

Attending: Dennis Heimbigner (DH), Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • OME Challenge
    • EP: JAX ran into issues for remote access and it's good to point them out and later rectify that
    • EP: Directory list should not be present by default - as it's computation heavy and could hurt your pockets
    • EP: Checking if an object exists before a write could cost us $2k!
    • EP: JZarr is currently being written
    • DH: Any decisions about deleting objects?
    • EP: When you check for existing objects, you have the ability to rewrite them
    • DH: That means you can delete it!
    • DH: NetCDF implements a recursive delete operation
    • EP:
    • DB: If you're inside the array then your metadata would statically define the array and you can easily rewrite it and essentially delete them
    • DH: Having consolidated metadata helps in rewriting operations
    • DB: Defining schema and knowing the entire hierarchy has been helpful
    • DH: We have this in NetCDF
    • DB: https://github.com/janelia-cellmap/pydantic-zarr/
  • Endorsing SPEC8 — Securing the release process: https://scientific-python.org/specs/spec-0008/
    • JM: Will need to read this
    • DB: Seems good enough and harmless
  • Endorsing SPEC6 - Keys to the Castle: https://scientific-python.org/specs/spec-0006/
    • JM: Sharing password has been a challenge
  • DB: https://github.com/zarr-developers/zarr-specs/pull/312
    • JM: Need to merge on https://github.com/zarr-developers/governance/pull/44
    • JM and DB: Discussion on how to move forward and defining the scope of the groups involved in the specification changes
    • DH: Diversity defined on the structure of the internal architecture and not programming language implementation

2024-09-18

Attending: Sanket Verma (SV), Gábor Kovács (GB), Davis Bennett (DB), Eric Perlman (EP)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

2024-09-04

Attending: Josh Moore (JM), Dennis Heimbigner (DH), Eric Perlman (EP)

TL;DR:

Updates:

Meeting Minutes:

  • Consolidated v2
    • DH: annex for v2, "officially"/loosely recognized (would be a great favor)
    • JM: and if we put it in v3 to say, "this is the former version"?
    • DH: add a forward pointer
    • EP: how many edge cases?
  • Deprecation (DH)
    • JM: No plan to deprecate v2 format (format vs. library)
    • DH: presumably people will use the new library, that will be the "test" of the consolidated metadata.
  • Bugs between implementations (JM)
  • Consolidated v3
    • JM: pushed recently at zarr-python meeting for a spec (and with more design)
    • DH: as soon as it's in the format, then it's not just caching
      • metadata caching prevents multiple reads
    • DH: caching -> "big set of objects, keeping subset in memory"
      • JM: can be re-created? "index"?
      • DH: regardless, have to specify construction of any block of JSON
        • could say a subtree looks like some other pre-defined block
    • JM: parameterized MetadataLoader (or "MetadataDriver")
      • DH: that's what I was going to implement anyway
      • DH: like StorageDrivers (not caching) "VirtualAccess"
      • DH: but on the same wavelength
      • JM: would like to offload some JSON (speed vs size)
      • DH: should that API do more than read/write the JSON?
        • should it interpret it?
        • "give me key X out of this dictionary"
        • JM: like mongodb or jq queries
        • DH: walk binary without needing to convert down to JSON
      • EP: this gets back to N5 as an API rather than a format
        • logical versus storage
      • DH: netcdf had to support multiple storages as a wrapper (HDF4, HDF5, DAP)
        • essential to have some virtual object/class
        • hammer applied to everything ("common API")
  • EP: https://github.com/zarr-developers/zarr_implementations
    • why didn't that find the codec issues?
    • JM: no v3!
    • EP: hackathon?
    • EP: need mozilla support for HTML things
    • DH: agreed. hugely important
    • JM: As a github action?
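
The consolidated-metadata discussion above can be sketched with a toy store: every per-node `zarr.json` is gathered into one document keyed by node path, so a reader needs one GET instead of one per node. `CountingStore` and `consolidate` are hypothetical names, and the layout of the consolidated document is an assumption for illustration, not the layout used by any spec or by Zarr-Python.

```python
import json


class CountingStore:
    """Hypothetical read-only key/value store that counts GET requests."""

    def __init__(self, data):
        self._data = dict(data)
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self._data[key]

    def keys(self):
        return list(self._data)


def consolidate(store):
    """Gather every node's zarr.json into one JSON document keyed by node path."""
    docs = {}
    for key in store.keys():
        if key == "zarr.json" or key.endswith("/zarr.json"):
            node_path = key[: -len("zarr.json")].rstrip("/")
            docs[node_path] = json.loads(store.get(key))
    return json.dumps(docs)


store = CountingStore({
    "zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}),
    "a/zarr.json": json.dumps({"zarr_format": 3, "node_type": "group"}),
    "a/b/zarr.json": json.dumps({"zarr_format": 3, "node_type": "array"}),
    "a/b/c/0": "chunk-bytes",
})
blob = consolidate(store)  # costs one GET per node, paid once by the writer
```

The consolidated document here is fully derivable from the per-node documents, which relates to JM's question of whether it can be re-created and to DH's point that once it is part of the format it is more than a cache.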

2024-08-21

Attending: Eric Perlman (EP) and Sanket Verma (SV)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • EP: NGFF challenge to convert V2 data to V3 - EP had a chat with Josh Moore
  • SV and EP: Discussions on Tensorstore, Compressions & Codecs, Microscope vendor software and woes of moving places!

2024-08-07

Attending: Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Thomas Nicholas (TN)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • EP: Discussion about cycling, library and picking up books from library and reading them in the park!
  • EP/DB: Tangent on write directly to sharding from microscopes
  • TN: Various ways of storing the large metadata for a huge Zarr array
    • TN: Storing the large metadata in the form of a hidden Zarr array - sort of a common theme among the various ZEPs being proposed
    • TN: Seems important because it has come up every time there's a discussion about scalability
    • DB: Store the aggregated information in the header of the chunk
    • SV: How does BSON scale as compared to JSON?
    • TN: We would still need to have a pointer to the BSON in JSON
    • DB: How do we introduce it to the Zarr V3 Spec?
    • TN: Maybe a convention
    • TN: Zarr is close to being a superformat!
    • DB: We could also increment the spec to a major version to include the change
  • TN: Discussions on whether it's possible for Zarr to be a superformat!
    • TN: Some values in the geoscience datasets that are closely related and if compressed will be of huge value - but Zarr can't do that
  • DB: A fundamental Zarr array could be a set of small Zarr arrays
    • TN: VirtualiZarr basically does that
    • TN: starts screen sharing
    • DB: The current indexing in Zarr-Python is not ideal and having Zarr arrays made of small arrays sounds much cleaner
    • TN: Hopefully I'd be able to work on this after VirtualiZarr
  • Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages

2024-07-24

Attending: Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Fernando Cervantes (FC), Eric Perlman (EP), Ward Fisher (WF), Thomas Nicholas (TN)

TL;DR:

Updates:

  • SciPy 2024 was great! 🎉
  • DB: Zarr-Python updates
    • Sharding codec is pickleable
    • A decision needs to be made about the array API
    • How should the sharding codec look to the user?
      • DB: Easy to find if your array is sharded
      • JM: Partial reading this in Zarr V2
        • TIFFfile set a bunch of flags - wonder if those features are friendly for Zarr
        • DB: All the arrays should have sharding configuration
        • JM: Working with Tensorstore, the order of codecs didn't matter > read_chunks / write_chunks
          • DB: some weirdness when it comes to different backends when uncompressed
  • New release - Numcodecs 0.13.0 - https://numcodecs.readthedocs.io/en/stable/release.html#release-0-13-0 - Thanks, Ryan!
    • New codec added - Pcodec
    • JM: Conda is unhappy

Open agenda (add here 👇🏻):

  • Intros
    • SV: Yosemite National Park
    • JM: National Seashore in Florida - Gulf of Mexico
    • FC: Jackson Lab working in ML - Acadia National Park
    • EP: Zion National Park
    • WF: Yellowstone National Park
    • DB: Yellowstone National Park
  • TN: Want to open issues on a bunch of ideas
      1. Zarr reader to read chunk manifest and byte offsets - currently Xarray handles this
         • Can use Zarr to open NetCDF directly
      2. VirtualiZarr has lazy concatenation of arrays - Xarray has lazy indexing operations for arrays
    • DB: Could be handled and should be a priority now
    • TN:
    • JM: Agree with Davis with indexing - not sure if the abstraction layer for concatenation is correct!
    • JM: Talked to 2 Napari maintainers - on a problem of chunking
    • TN: A lot of people want to solve the indexing problem but neither Zarr nor Xarray exposes that
    • JM: Finding more people with similar interests would help us provide more engineering power
    • DB: Create a PR with copy pasting code from Xarray!? - This could unlock a lot of use cases
    • TN: VirtualiZarr does actually do that - but at the level of chunks rather than indices
    • DB: Slicing and concatenation are duals - if you have both it's complete
    • DB:
    • JM: Query optimisation can be tweaked as we move forward
    • TN: When you do concat and slice you have identified a directed graph - you can optimise that plan - you can also hand off that plan to some reader
    • JM: What does user do with the plan? Do they do something with it?
    • TN: Array API folks have deliberately made arrays lazy
  • GPU CI for Zarr-Python - https://github.com/zarr-developers/zarr-python/issues/2041
    • GitHub and Cirun sound good and easy to set up
    • Who pays? - Earthmover is ready to pay the cost for initial months and then switch to NF
    • NF has money reserved for projects in the infrastructure committee for similar costs
    • JM: Good to have it!
    • SV: Need to get it sooner than later
  • Zarr paper - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Appetite.20for.20a.20Zarr.20paper.3F
  • TABLED
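
The lazy concatenation and slicing discussion above (concat plus slice yields a plan that can be optimised or handed to a reader) can be sketched on 1-D sequences. `LazyConcat` is a hypothetical class for illustration, assuming contiguous ranges with step 1; it is not the VirtualiZarr or Xarray implementation.

```python
class LazyConcat:
    """Lazily concatenated 1-D sources; slicing builds a plan instead of copying."""

    def __init__(self, sources):
        self.sources = list(sources)

    def __len__(self):
        return sum(len(s) for s in self.sources)

    def plan(self, start, stop):
        """Map a [start, stop) range on the virtual array to per-source ranges."""
        start = max(0, start)
        stop = min(len(self), stop)
        steps = []
        offset = 0  # position of the current source within the virtual array
        for i, src in enumerate(self.sources):
            lo = max(start, offset) - offset
            hi = min(stop, offset + len(src)) - offset
            if lo < hi:
                steps.append((i, lo, hi))
            offset += len(src)
        return steps

    def read(self, start, stop):
        """Execute the plan, touching only the sources that overlap the range."""
        out = []
        for i, lo, hi in self.plan(start, stop):
            out.extend(self.sources[i][lo:hi])
        return out


virtual = LazyConcat([list(range(0, 4)), list(range(4, 10)), list(range(10, 12))])
```

`virtual.plan(5, 7)` touches only the second source, so a reader that receives the plan can skip the other two entirely.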

2024-07-10

Attending: Josh Moore (JM), Davis Bennett (DB), Fernando Cervantes (FC)

Updates:

  • SciPy!
  • Josh: testing zarr v3
    • issue for each problem? Davis: sure
    • Davis: to be fixed:
      • no validation of fill value
      • multiple bugs with sharding: 1d
    • Josh: missing "attributes"
    • Josh: but neuroglancer working?
      • Davis: not for all static file servers. need PR.
      • Davis: various forks. Josh: plugins? Davis: tough
      • or: neuroglancer as a component that can be embedded
      • Janelia NG is a React component.
      • "Visualization is tough."
  • Motion for food
    Seconded.

2024-06-26

Attending: Brianna Pagán (BP), Thomas Nicholas (TN), Dennis Heimbigner (DH), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB)

TL;DR:

Updates:

Open agenda (add here 👇🏻):

  • BP: Will be talking about how Zarr is utilised at NASA!
    • starts screen sharing and presenting
    • BP: I work at Goddard GES DISC - deputy manager at one of the centres - manages team of developers and engineers - not representing all the data centres
    • BP: Lot of people are coming into Zarr from the SMD (Science mission directorates)
    • BP: Earth Science Division - EOSDIS and Distributed Active Archive Centres (DAACs) - DAACs focuses on data distribution and management
    • BP: All the centres coming up with the suggestion on best practices and best format - we discuss with them the possibility of what they can, and should use
    • BP: Moving to cloud optimized format - DAACs have a ton of archival data in various formats
    • BP: Projected growth of the entire Zarr store across all EOSDIS by 2030: 60 PB -> 600 PB!
    • BP: GES DISC holds 7 PB of data - we have 3000 different collections of datasets - really diverse!
    • BP: Giovanni - interactive web-based program with 20+ services associated with it - taking the existing data and grooming the metadata so it's accessible and useful across a broader range
    • BP: Over at NASA, we do a lot of Zarr work
      • Zarr V2 spec is approved data format convention for use in NASA Earth Science Data Systems (ESDS)
      • Giovanni in cloud - duplicates Zarr (variable based)
        • Open issue: continuously updating Zarr stores - Exploring lakeFS for managing dynamic data
      • ZEP0005
      • Brianna is leading the GeoZarr work
      • VEDA - a number of Zarr/STAC related things going on in VEDA
    • TN: Does Giovanni read Zarr directly? If so which reader does it use? (Can Giovanni use VirtualiZarr?)
      • BP: Giovanni promotes variable-first search - most of Giovanni has OpenDAP attached to it - built with overhead from the GES DISC pipeline - in hindsight - Yes!
      • TN: From the slides - Xarray can take care of some of the stuff that Giovanni does
    • TN: Very curious about the exact difference between the LakeFS idea and EarthMover’s ArrayLake
      • BP: LakeFS is OS ArrayLake - no vendor lock-in
    • SV: What does Giovanni actually do when you say, ‘it grooms metadata’?
      • BP: Standardizes the grid - flip the grid - naming mechanism - smoothing the metadata so that it works across various services
      • BP: other metadata grooming is, for example, for the many time dimension issues we have. That's because of scattered best practices for how to store time metadata
      • TN: Can we do the flipping with Zarr/VirtualiZarr?
      • DB: If you flip at the store level - you'd need to find out how deep you'd need to go
      • BP: Will try to make time standard across the datasets
      • BP: https://github.com/briannapagan/quirky-data-checker
    • BP: from the Zoom chat
  • EP: Converting OME datasets in V3 in upcoming months - quirky tool can be useful
    • DB: V2 chunk encoding matches V3 encoding - you just need to re-write the JSON document
    • DB: Playing with sharding - tensorstore is fast - need to figure out the nomenclature
    • EP: The bio and geo world have parallel tracks and working in silos
    • EP: https://forum.image.sc/t/ome2024-ngff-challenge/97363
      • DB: The challenge doesn't seem interesting to me! - converting JSON documents - instead we should be focusing on converting existing data to sharded stores - a much more interesting problem
    • EP: A bunch of data is non-Zarr; will be working on pushing it to the cloud and converting it to Zarr
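
DB's remark that converting mostly means re-writing the JSON document can be sketched as a metadata translation. This covers only a narrow case (C order, no filters, gzip or no compressor, a handful of little-endian dtypes) and follows my reading of the V2 and V3 specs; `v2_to_v3_metadata` and the dtype table are hypothetical and this is not the NGFF challenge tooling.

```python
# Assumed mapping from V2 (NumPy-style) dtype strings to V3 data type names.
V2_TO_V3_DTYPE = {
    "|u1": "uint8",
    "<u2": "uint16",
    "<i4": "int32",
    "<f4": "float32",
    "<f8": "float64",
}


def v2_to_v3_metadata(zarray, zattrs=None):
    """Translate V2 .zarray/.zattrs dicts into a V3-style zarr.json dict."""
    if zarray.get("order", "C") != "C" or zarray.get("filters"):
        raise NotImplementedError("sketch handles C order without filters only")

    # A V3 codec chain needs exactly one array-to-bytes codec.
    codecs = [{"name": "bytes", "configuration": {"endian": "little"}}]
    compressor = zarray.get("compressor")
    if compressor is not None:
        if compressor.get("id") != "gzip":
            raise NotImplementedError("sketch handles gzip or no compressor only")
        codecs.append({"name": "gzip", "configuration": {"level": compressor["level"]}})

    fill_value = zarray.get("fill_value")
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": zarray["shape"],
        "data_type": V2_TO_V3_DTYPE[zarray["dtype"]],
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": zarray["chunks"]}},
        # The "v2" key encoding keeps the existing chunk keys (e.g. "0.1") valid.
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": zarray.get("dimension_separator", ".")},
        },
        # V3 requires a fill value; using 0 for a missing one is this sketch's assumption.
        "fill_value": 0 if fill_value is None else fill_value,
        "codecs": codecs,
        "attributes": dict(zattrs or {}),
    }
```

Chunk files are left untouched in this approach. Re-chunking into shards, the problem DB finds more interesting, would require rewriting the data itself.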