Zarr Bi-weekly Community Calls

--- tags: zarr, Meeting --- # Zarr Bi-weekly Community Calls ### **Check out the website for previous meeting notes and other information: https://zarr.dev/community-calls/** Joining instructions: [https://zoom.us/j/300670033 (password: 558943)](https://zoom.us/j/300670033?pwd=OFhjV0FHQmhHK2FYbGFRVnBPMVNJdz09#success) GitHub repo: https://github.com/zarr-developers/community-calls Previous notes: https://j.mp/zarr-community-1 ## 2025-09-17 **Attending:** Eric Perlman (EP) (No one else joined in first 20 minutes.) ## 2025-09-03 **Attending:** Eric Perlman (EP), Justus Magin (JMa) * JMa: * sparse container for faster slicing / concatenation: https://github.com/keewis/sparse-indexing-container/ * fixed a roundtrip bug on BytesCodec: https://github.com/zarr-developers/zarr-python/pull/3417 * causes a bunch of failing tests in ArrayV3Metadata tests * might be visible from the public API? ## 2025-08-20 **Attending:** Lachlan Deakin (LD), Ryan Abernathey (RA), Jeremy Maitin-Shepard (JMS), Davis Bennett (DB), Max Jones (MJ), Josh Moore (JM) Special Session on [ZEP10](https://github.com/zarr-developers/zeps/pull/67) - Lachlan presents https://zarrs.dev/slides/zarr_generic_extensions_20250820 - LD: Like the "extensions" *object* a lot - JMS: from [issue](https://github.com/zarr-developers/zeps/pull/67#issuecomment-3207787301) we can use syntax like a `_` prefix to mark things as optional - DB: roughly 10 verbs create/remove array/group, read/write chunks, metadata (subsubgroups are a separate question) - what are we designing `must_understand` for, to tune the level granularity? - Ryan - Agenda (in 45 minutes) - what ZEP10 should look like? 
    - alignment on `must_understand`
    - extension field
    - "registered attributes"
- use cases (what do we want)
    - consolidated metadata
    - groups declaring members
    - subarrays (array under another array; relation to a parent grid)
    - multiscales
    - differently chunked copies (for differing access patterns)
    - chunk aliases
    - declaring separate source for the attributes
        - icechunk (group storage transformer?)
        - encrypted or DB-stored attributes
- discussion
    - LD: consolidated metadata can benefit from constraints
    - DB: zarr aware routine that copies an existing hierarchy to a new store; direct copying should be allowed
    - JMS: if there's a relative path
    - DB: the verbs
        - create_array
        - create_group
        - remove_array
        - remove_group
        - update_array_metadata
        - update_group_metadata
        - read_chunk
        - write_chunk
        - read_array_metadata
        - read_group_metadata
    - RA: unevenly chunked array (chunk grid)
    - DB: extensibility of the chunk grid is narrowly scoped. doesn't interact with anything else
    - DB: top-level are unscoped
    - LD: lets us look at new stuff (essentially 4.0)
    - RA: chunk statistics (or as another array within a group)
        - all about updates, and who is responsible
        - new opinion while developing icechunk. zarr is data at rest, spec only partially deals with consistency.
    - JM: building up a language of constraints to unaware implementations
    - DB: error is giving someone mutable access (cf. chunk statistic)
    - JMS: not untrusted users, but missing a plugin or an old version. failing is useful
        - however, tricky, since you don't want to necessarily update the multiscales and the statistics right away (i.e. implementation might not be able to maintain the )
        - theoretical consistent maintenance might not be feasible
    - DB: data model that has relationship between nodes and an API that doesn't, then it can be difficult.
- RA: everything is out the window during the non-zero window of time - only solution is a transactional mechanism - can declare what a dataset at rest looks like (up to implementation & store) - RA: propose (semi-controversial) - define all the above as attributes - LD: fine until we hit something that can't deal with a silent failure - JMS: example is forcing consolidated metadata to get updated - fill_array - MJ: consider not being able to handle cm - formalize the relationship between zarr and icechunk - don't over index on cm - JM: on the list of things to avoid, false reads - JMS: move to registered attributes? - RA: include `must_understand` there? - LD: no, because in 3.1 there is nothing about that. - JMS: preventing writing is not that big of a deal, limited. reading is a bigger problem. - RA: support for pushing functionality to attributes for a higher-level framework. may lead to data structure that requires synchronization ## 2025-08-20 **Attending:** Josh Moore (JMo), Justus Magin (JMa), Davis Bennett (DB), Jeremy Maitin-Shepard (JMS), Eric Perlman (EP), Gábor Kovács (GK) - Rome - DB: People signing up - Limit on the developer days - Discussing limit on the adopter days - Sparse - JMa: something that works :tada: (with modifications to zarr) - changed Zarr to take the prototype from the codec - array-to-array then array-to-bytes codec to see which NDBuffer it should take - sparse codec needs a specific buffer - not attached to it. (could use a global option) - modified zarr to pass along the chunk size when creating a buffer - not fast; spends most of its time indexing into the sparse array (CPU bound) - DB: fixed with a faster sparse library? - JMa: yes, and being smarter with number of operations - DB: which type of memory is an unsolved issue; don't allow codec to publicize which memory - i.e. have to handle an arbitrary NDBuffer - cf. 
every store method needs a buffer to write into
- wider codec pipeline discussion
    - JMS: tensorstore doesn't do GPU or sparse but ...
        - think of it more as a top-down choice
        - rather than depending on the metadata array, read op returns a type of buffer
        - instead have an explicit API for reading sparse (type annotations can be stricter)
    - DB: Zarr doesn't support any operation where dense vs. sparse matters
        - people currently expect a dense numpy-like array
    - JMS: modifying an existing array is the expensive bit
    - JMa: screen shares with the zarr-python changes needed ![image](https://hackmd.io/_uploads/rJ41Z97Klg.png) ![image](https://hackmd.io/_uploads/BJ6kbc7Klg.png)
    - JMS: similar to GPU, you may want to do more in parallel
    - DB: getting the parallel GPU story working is difficult
        - CUDA has a different model of concurrency
        - want a GPU specific pipeline that's smart about allocating memory and then batching with respect to that
        - codec pipeline is the target for a lot of these optimizations
        - cf. scverse/rust speed up is from having a smarter pipeline class
    - JMo: wonder if we're missing an abstraction, "Operation", etc.
    - DB: would most like to have Zarr arrays be composed of other Zarr arrays.
        - chunks could negotiate that they can only go on certain devices
        - everything is just a collection of chunks
    - JMS: tensorstore thinking of an array as a set of operations
        - can be decomposed into smaller arrays
        - proposed chunkgrid as just being a top-level of a codec stack
        - you could imagine concatenations of other arrays
        - currently awkward. (codecs below the level of chunking)
    - DB: what's the function signature of a codec. not currently defined.
        - so e.g. we don't track endianness
        - need to be formal about what an array is
        - then it's clear what information the codec has access to
    - JMS: do some form of propagation up and down the stack
        - i.e. implementation specific (GPU, endianness, ...)
- DB: perhaps then each implementation needs to know in its own language - JMS: what codecs attributes are dynamic and which are fixed? - e.g. datatype and shape fixed. DB: there are dtype - JMS: given once you have the config, not chunk to chunk - JMS: in ts there's resolution where each codec says (forward & backward passes) given X I put out Y (dtype, memory layout, ...) - DB: certainly not multiple passes in zarr-python. Added TODOs. Have a forward pass and rewrite the codecs dynamically based on the previous' advertised output - JMo: listening to the conversation, I wonder if metadata needs specifying - JMS: you need to do a resolution process, necessary information is there - we specify the overall output as dtype, that gets propagated to the lowest level codec - you could imagine the other way around, you just save what the lowest level stores and then calculate the top-level data type - comes into play in imagecodecs. lot of complexity in how they store thing (RGB, channels, etc.; color spaces). 
if you want to read that in, the zarr data type for the array is a bit arbitrary
    - same for endian conversion, could be implicitly propagated back up
- JMa: issue of storing metadata about the array can only be a codec property (e.g., the kind of the sparse array)

## 2025-08-06

**Attending:** Josh Moore (JMo), Eric Perlman (EP), Justus Magin (JMa), Davis Bennett (DB), Gábor Kovács (GK), Jeremy Maitin-Shepard (JMS)

- Find the meeting notes under https://zarr.dev/community-calls/
- Sparse (JMa)
    - Trying to figure out how to customize NDBuffer (which assumes numpy array representation)
    - Possibly figured it out
    - Creating a NDBuffer, have to pass the chunks to make a chunk grid
    - Assignment to sparse array isn't a good idea
    - Have an object-type numpy array to represent the grid
    - Then concatenate only when you want the data
    - Need to know the chunking for mapping the slicing
        - some functionality in zarr-python that can be used
    - not all changes pushed to zarr-sparse (lots of experimentation)
    - performance metrics yet?
        - next part of the problem: bytes to compressed, then uncompress and put into sparse arrays
        - so not yet
    - nested codec? probably
        - there will be some rules (i.e. sparse must be first)
    - anything to get the default codec?
    - future of the repo? pip installable?
        - haven't thought about it yet
        - options: keep it separate or merge into zarr-python
        - open to discussion
    - JMo: open question of having a non-dense API (AMR, APR, multiscales, etc.)
    - DB: array API? JMa: not all of them.
    - DB: zarr-python currently takes all chunks and packs them into a numpy array. a mistake.
    - JMa: *summary on NDBuffer from above*
    - DB: separate sparse array per chunk? (Big change in business operation)
    - JMa: with NDBuffer can control what is contained
    - DB: don't want an object-type
    - JMa: already translating between arrays and chunks
    - DB: see indexing module (copied from zarr-python 2). room for improvement.
        - Draft PRs welcome.
- ZEP10:
    - Josh owes Jeremy a
    - JMS: leave purely metadata in a separate realm
    - DB: Examples?
    - JMS: fill value as json array that broadcasts. motivates the PR.
    - DB: Breaking change to the array model "soft zarr 4"
    - JMo: difference between 3.x and 4.x
    - DB: inviting fragmentation
        - don't see how this leads to fragmentation
        - stac doesn't describe behavior; chunk encoding describes a function. makes zarr function.
    - JMS: adding a feature that *requires* implementation updates
        - burden of reducing that is on the implementations
    - DB: difficult to motivate the change; difficult it's an application.
    - JMS: way to play/evolve the spec. you need a way to implement it (create your keys) that doesn't cause a problem
        - "URL" following the spec today.
        - doesn't seem like that big of an idea either way
        - another example: inline array
    - DB: take for granted that it is for an implementation that only some will be able to read
        - alternative would be just to do that today
    - JMS: possible examples: fill_value_array, offset, consolidated metadata
    - DB: differentiating between metadata writing and chunk writing
    - DB: don't make it hypothetical, more focused on solving the engineering problem
    - EP: you set the bar really hard. have been bit in the foot on default fill values
        - bar of nice to have is fine. (otherwise is a core feature)
        - extension is the place for nice to have
    - DB: v3 spec contained contradictions so making the bar high
    - closing
        - DB: suggest -- build everything about consolidated metadata and these other things
        - JMS: seemed like most people are for metadata-only extensions
            - danger of how implementations are implemented if attributes aren't registerable
            - same time or handling attribute registration first?
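
The sparse discussion in this meeting notes that the chunking must be known in order to map a slice onto chunks. A minimal sketch of that mapping along a single axis is shown here; `chunks_for_slice` is a hypothetical helper written for illustration, not the zarr-python indexing module, and it assumes a regular chunk grid with non-negative, step-1 slices.

```python
def chunks_for_slice(start, stop, chunk_size):
    """Yield (chunk_index, local_slice) pairs covering [start, stop) on one axis.

    Hypothetical helper: assumes a regular chunk grid and 0 <= start <= stop.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if stop <= start:
        return  # empty selection touches no chunks
    first = start // chunk_size       # index of the first chunk touched
    last = (stop - 1) // chunk_size   # index of the last chunk touched
    for idx in range(first, last + 1):
        chunk_start = idx * chunk_size
        # Clip the requested range to this chunk and shift to chunk-local coordinates.
        lo = max(start, chunk_start) - chunk_start
        hi = min(stop, chunk_start + chunk_size) - chunk_start
        yield idx, slice(lo, hi)


# Elements 5..22 of an axis stored in chunks of 10 touch chunks 0, 1 and 2.
for idx, local in chunks_for_slice(5, 23, 10):
    print(idx, local)
```

With a per-chunk container (for example one sparse array per chunk), only the chunks yielded here would need to be decoded and concatenated.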
## 2025-07-23 **Attending:** Ward Fisher (WF), Eric Perlman (EP), Davis Bennett (DB) **Meeting Minutes:** - DB: Discussion and quick demo of `uv` for managing python environments - uv headers are now used in zarr-python issue template - uv powers the current zarr2-tests in zarr3-python tests ## 2025-07-09 ## 2025-06-25 **Attending:** Ward Fisher (WF), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB) **Meeting Minutes:** - DB: https://github.com/zarr-developers/zarr-python/pull/2874 — got merged! - Next steps: - Fix the codecs situation — unify them - Stop depending on numcodecs - Variable size chunks - Other things: - Indexing issues - NVIDIA folks are interested in how GPU can be added into Zarr-Python codebase - SV: New data type addition is also included in DB's PR - DB: Also been working on the new data types stuff - WF: Unidata is back in operation. We are also collaborating with DKRZ (German Climate Computing Center) on some proposed `ncZarr` work they would like to see (consolidated metadata, amongst other things). - There have been questions from DKRZ devs re: functionality in various Zarr Python packages that deviate from the specification (v2, specifically). - DB: Having code samples in the Zarr-Python repository, using PEP723 - DB: Found a funny bug in Zarr-Python 2 ## 2025-06-11 **Attending:** Josh Moore (JM), Gábor Kovács (GK) - GK: any obvious issues to get involved on. - JM: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/zarr.2Eload.20deletes.20data/with/522616083 - Other suggestions welcome. - People still interested in the meeting? New time? Monthly? ## 2025-05-28 **Attending:** Josh Moore (JM) - No show. ## 2025-04-16 **Attending:** Josh Moore (JMo), Eric Perlman (EP), Justus Magin (JMa), Gábor Kovács (GK) **TL;DR:** **Updates:** **Meeting Minutes:** - JMa: "signed URLs" - EP: looking at them for raw data - e.g. HTTP parameters NOT the AWS thing - JMo: bug? 
JMa: Zarr is oversimplifying the use of paths - Planetary computer pulls an access token, appends to any URL (Zarr or Geo-TIFF or ...) - EP: would try hacking it in in a little fsspec wrapper - JMa: use-case -- trying to access something S3-like that needs parameters - JMo: does obstore "do the right thing"? - JMa: Kyle has something that will work for planetary computer, but that's just one endpoint - EP: with shards can almost use *proper* signed URLs - JMo: will likely require Virtualizarr - also: https://earthmover.io/blog/announcing-flux - JMa: sparse arrays - looked at binsparse - which decomposes into one dimensional arrays - another level of nesting - JMo: good format but need library support like `.chunks`. need to be aware of the metadata. - JMa: encoding the sparseness **per chunk**? ## 2025-04-02 **Attending:** Davis Bennett (DB), Sanket Verma (SV), Eric Perlman (EP), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS) **TL;DR:** **Updates:** - Version policy update: https://github.com/zarr-developers/zarr-python/pull/2910 - Blog post PR: https://github.com/zarr-developers/blog/pull/67 - Obstore-base store implementation PR has landed: https://github.com/zarr-developers/zarr-python/pull/1661 - Zarr-Python V2 Support release [2.18.5](https://github.com/zarr-developers/zarr-python/releases/tag/v2.18.5) took place last week! Thanks, David! **Meeting Minutes:** - DB: Working to add support for V3 and Tensorstore in [Pydantic Zarr](https://github.com/zarr-developers/pydantic-zarr) - Also to add group support in Pydantic for Tensorstore - Appreciate the results by reading and writing in Tensorstore, i.e. 
returns an object
- DB: _elaborates on the version policy change_
    - DB: [Effver](https://jacobtomlinson.dev/effver/), mostly a function of efforts put in by the users
- JMS:
- DB: https://github.com/zarr-developers/numcodecs/issues/686 (formalise old and new styles of JSON serialisation)
    - DB: Numcodecs doesn't interoperate well with Zarr-Python, also there's code in Cython which only a handful of folks can maintain
- MS: There have been great developments in the Zarr ecosystem but things have been moving so fast that I worry it will start to proliferate. It's difficult to keep track of all the things
    - GDAL: https://lists.osgeo.org/pipermail/gdal-dev/2025-April/060414.html
    - List of EOPF product samples publicly available from the EOPF s3 public bucket: https://eopf-public.s3.sbg.perf.cloud.ovh.net/product.html
    - https://cpm.pages.eopf.copernicus.eu/eopf-cpm/main/PSFD/4-storage-formats.html
- EP: Most of the Jackson Lab data is in V3 sharded effort
    - EP helped in conversion and Eric Ratamero led the effort

## 2025-03-19

**Attending:** Davis Bennett (DB), Abhiram Reddy (AR), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)

**TL;DR:**

**Updates:**

**Open agenda (add here 👇🏻):**

- AR had GSoC questions
- https://github.com/zarr-developers/zarr-python/pull/2910 : at least a social media announcement would be good, blog post ++ (the more the merrier)
- DB: Dtype addition to Zarr-Python (https://github.com/zarr-developers/zarr-python/pull/2874)
    - JMS: Would the data type be mapped 1-1?
    - DB: Currently support NumPy and CuPy datatypes
    - DB: More tests needed to handle endianness, need to change the API too
- JMS: Norman registered new codecs. How are they gonna work in Zarr-Python 3?
    - DB: We haven't made a final decision on that

## 2025-02-19

**Attending:** Josh Moore (JM), Sanket Verma (SV), Michael Sumner (MS), Davis Bennett (DB), Eric Perlman (EP)

**TL;DR:**

**Updates:**

- Zarr participating in GSoC this year, ideas list: https://github.com/numfocus/gsoc/blob/master/2025/ideas-list.md
- ZEP9 Draft published: https://zarr.dev/zeps/draft/ZEP0009.html
    - Follow up PRs in zarr-specs:
        - https://github.com/zarr-developers/zarr-specs/pull/330
        - https://github.com/zarr-developers/zarr-specs/pull/331
    - Setting up https://zarr.dev/extensions
- EP: Jackson lab will be converting the datasets to sharded Zarr V3 arrays
- SV: https://2025.pycon.de/talks/ABWHSD/ (speaking at PyCon DE 2025!)

**Open agenda (add here 👇🏻):**

- MS using Pizzarr for their work
    - GDAL and Pizzarr work for virtual references as well
    - Has datasets in HDF5 and NetCDF; both have their peculiarities
- JM: _gives an overview of ZEP9_
- EP: Jackson lab data conversion
    - JM: Any benefits in performance?
    - EP:
    - EP: Sticking with OMERO 5D arrays
    - MS: Public link for the data
    - EP: https://images.jax.org/webclient/userdata/?experimenter=-1 (can be added to https://zarr.dev/datasets)
    - JM: could also add https://ome.github.io/ome2024-ngff-challenge/
    - JM: Zarrs (Rust implementation) would be useful
    - EP: Solely using OMERO but could pass the URL to Neuroglancer to view it
    - EP: Cloudflare is potentially working with OS projects and giving them reasonable tier prices
    - EP: Raw 10TB nbytes - how do you convert it to sharded V3 array?
    - DB: Using np.memmap might be useful
    - JM: Can also use Kerchunk
    - JM: Can use Tensorstore to convert the data as well
    - DB: Could potentially use Zarr-Python, but it is 10x slower
    - JM: https://github.com/LDeakin/zarr_benchmarks
    - JM: Satra ran into memory issues with Tensorstore: https://github.com/ome/ome2024-ngff-challenge/issues/83
    - JM: Best way to shard large arrays - a good GSoC project
- JM: https://github.com/asdf-format/asdf-standard
    - https://github.com/asdf-format/asdf-zarr
- EP: Zarr being used in bio space
    - Folks at the Allen are looking to submit a proposal at SciPy 2025
    - JM: Francesc Alted did a nice presentation at SciPy 2023: https://youtu.be/0GX5nDqUUZE?si=WvE6asx5zjtrBcHI

## 2025-02-05

Notes TBA

## 2025-01-22

**Attending:** Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Gábor Kovács (GB)

**TL;DR:**

**Updates:**

- Zarr-Python 3 released on January 9th, 2025!
    - Blog post: https://zarr.dev/blog/zarr-python-3-release/

**Open agenda (add here 👇🏻):**

- N5
    - JM: There'll be a new release to add support for N5 in Zarr-Python 3
    - EP: They can leverage sharding and other useful features
- Zarr-Python 3
    - DB: Gave a presentation on Zarr-Python 3 at Allen Institute
    - DB: Realised some issues in Zarr-Python 2 when listing groups
    - JMS: Because ZP 2 listing processes were taking place in parallel
    - DB: In the Zarr V2 spec there's nothing that says that groups and arrays are different
    - JMS: Looked at the spec as well as the implementation when working with my implementation
- JMS: Added support for ZEP8 in Tensorstore
    - PR: https://github.com/google/neuroglancer/pull/696
    - JM: Too many files, JMS: Includes linting changes and test files
    - JMS: The path resolution for Zarr V3 in Neuroglancer: https://host/path/to/n5/group/|n5:path/to/array/ would look for array and not go up in the group directory
    - JM: The searching is mostly top down in OME land but we should work towards more completeness
    - JMS: Also,
planning to add Icechunk support to Neuroglancer - Discussion on URL for Neuroglancer - Deciding the right characters to use - Tricky to decide the right URL ## 2025-01-09 **Attending:** Josh Moore, Eric Perlman, Sanket Verma, Joe Hamman, Davis Bennett, Gábor Kovács, Dennis Heimbigner, Thomas Nicholas, Jeremy Maitin-Shepard **TL;DR:** **Updates:** - Happy New Year! :clinking_glasses: - Zarr-Python [v3.0.0-rc.1](https://github.com/zarr-developers/zarr-python/releases/tag/v3.0.0-rc.1) and [rc.2](https://github.com/zarr-developers/zarr-python/releases/tag/v3.0.0-rc.2) out now! Full release coming up tomorrow! - Zarr has a wikipedia page now! — https://en.wikipedia.org/wiki/Zarr_(data_format) - A group of Zarr-Python devs are at AMS next week including Joe and Ryan - CFP for SciPy 2025 are open: https://www.scipy2025.scipy.org/ **Open agenda (add here 👇🏻):** - EP: the month wait was good to get other projects like napari up-to-speed - DB: reached out to people using n5 in python. They weren't pinning to `zarr-python<3`. Sent an email. No response. EP anyone? No. Using Zarr. - JH: Virtualizarr ready for 3.0.0? Failing test (xarray?) but Matt is looking at it. - TN: Kerchunk doesn't support zarr-python 3.x (API usage) - without kerchunk: fits & netcdf won't work. - lose access to anything in the future (in-progress HDF4) - JH: requires rethinking of MultiZarrToZarr logic - TN: Doesn't directly interact with zarr-python v3. 
But want to (to use the v2 to v3 compat objects)
- JH: Would be good to unblock the ZEP process and get ZSC behind on the changes; it's confusing to see ZSTD codec in Zarr-Python 3.0 and not in the spec
    - JM: I'll get the ZSC to respond on the long-standing issues
- DH: https://github.com/Unidata/netcdf-c/pull/3068 (Ward will get to reviewing it)
- DB: Sample V3 sharded data: https://github.com/d-v-b/zarr-workbench/tree/main/v3-sharding-compat/data/zarr-3
- JMS: Planning to add Icechunk support to Tensorstore

## 2024-12-11

**Attending:** Eric Perlman (EP), Sanket Verma (SV), Gábor Kovács (GB), Ward Fisher (WF), Davis Bennett (DB), Camille Teicheira (CT), Jeremy Maitin-Shepard (JMS)

**TL;DR:**

**Updates:**

- Zarr is on BlueSky, follow us: https://bsky.app/profile/zarr.dev
- Norman Rzepka has joined the Zarr Steering Council (https://zarr.dev/blog/steering-council-update-2024/). Welcome Norman!
- A group of Zarr-Python devs are at AGU this week including Joe and Ryan
- Zarr-Python V3 release before holidays!
    - DB has a bunch of PRs coming in soon!
    - Planning to expose sharding in a user friendly way
- WF: Had meetings with Florian Ziemann; putting up a PR for V2 consolidated metadata

**Open agenda (add here 👇🏻):**

- Intros w/ favourite places for holidays
    - Sanket: into the Himalayas
    - Ward: near Colorado
    - Davis: Italy
    - Eric: coming to India soon!
    - Gábor: Canada for skiing
    - Camille: tech lead at https://www.sofarocean.com/, has a lot of weather NetCDF data
- DB: Sharding chunk sizes: can we allow imperfect partitioning of the shard shape?
    - JMS: Would be possible to support, but with the current config you have a regular grid for shards and chunks, also resizing would be difficult
    - JMS: Could also be based on user preference
    - DB: The sharding spec doesn't specifically say anything about the shape, so how and where should we define it?
    - JMS: The non-regular/partial chunks would not compose across shards
    - DB: I see! The proposal is off the table then!
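
The "regular grid for shards and chunks" constraint from the sharding discussion above can be illustrated with a small check. This is a sketch only, assuming the constraint means the inner chunk shape must divide the shard shape evenly along every axis; `inner_chunks_per_shard` is a hypothetical helper, not part of zarr-python or the sharding spec.

```python
def inner_chunks_per_shard(shard_shape, chunk_shape):
    """Return how many inner chunks fit along each axis of one shard.

    Hypothetical helper: raises ValueError when the chunk shape does not
    partition the shard shape perfectly (the "imperfect partitioning" case).
    """
    if len(shard_shape) != len(chunk_shape):
        raise ValueError("shard_shape and chunk_shape must have the same rank")
    counts = []
    for axis, (shard, chunk) in enumerate(zip(shard_shape, chunk_shape)):
        if chunk <= 0 or shard % chunk != 0:
            raise ValueError(
                f"axis {axis}: chunk size {chunk} does not evenly divide shard size {shard}"
            )
        counts.append(shard // chunk)
    return tuple(counts)


print(inner_chunks_per_shard((64, 64), (16, 32)))  # (4, 2)
```

A shard of 64 with chunks of 24 would be rejected here, which is the partial-chunk situation that JMS notes would not compose across shards.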
## 2024-11-27 **Attending:** Sanket Verma (SV), Eric Perlman (EP), Davis Bennett (DB), Josh Moore (JM), Jeremy Maitin-Shepard (JMS) **TL;DR:** **Updates:** - Zarr-Python V3 release in first week of December - New numcodecs release (includes fixes and improvements) - Check here: https://github.com/zarr-developers/numcodecs/releases/tag/v0.14.1 - Zarrs-Python: - Check: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/Announcing.20zarrs-python! **Open agenda (add here 👇🏻):** - DB: OME-NGFF hackathon update: - They worked on a Python library which will render DB's library obsolete - good news for DB as he doesn't need to maintain it! - https://github.com/BioImageTools/ome-zarr-models-py - EP: John Bogoviç made good progress on Zarr Java (in the NGFF land) - FIJI being able to open Zarr V3 - DB: `zarr.open()` and `zarr.create()` are confusing - instead we could have `zarr.create_array()` or `zarr.create_group()` to make things clear - DB: Norman has a PR and he's also experimenting to have a Zarr sharded create routine - JM: Would be cool if `zarr.open()` could figure out if it's an array or a group - Zarrs-Python - JM: The nomenclature could've been better! A bit confusing. - DB: Major scope is to improve the IO - JM: https://bsky.app/profile/zarr.dev - Follow us! - https://github.com/zarr-developers/zarr-specs/pull/311 - JMS: The only part which requires spec is the URL - JM: I care about the internal directory structure - DB: The folks who I spoke at the OME hackathon they want the drag and drop feature - JM: In ZP land ## 2024-11-13 **Attending:** Davis Bennett (DB), Eric Perlman (EP), Josh Moore (JM), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS) **TL;DR:** **Updates:** **Open agenda (add here 👇🏻):** - meetings (Josh) - conversation - zarr-python: going strong (weekly) - ZEP: (not great for Dennis) - one off as necessary - community: combine into ZEP. 
- or vice versa - office hours: likely to end - :point_right: run a doodle - People - Dennis: not before 10am MST - Jeremy: ZEP meetings are less critical at the moment - Decisions/TODOs - Only drop ZEP for the moment. Re-evaluate community frequency & time later. (cf. napari timezones) - Josh: remove ZEP calendar entry - Josh: update on Zulip (anywhere else?) - Davis: was "zarr.json" a mistake? - Josh: good question. benefits were: - only one GET (or HEAD) rather than needing a frequent 404 - non-hidden files with proper file-ending - Davis: true. just have a pattern now where want to iterate and bottom out on arrays - Jeremy: often need to load the json anyway - or storage transformers that are needed - Dennis: preference is to have the directories marked (e.g., in the name) - price is mostly paid with large numbers of groups/arrays - cf. consolidated metadata -- locating all objects before reading them - maybe be able to recognize that by name - Davis: give it its own document? - Josh: like `.zmetadata`; downside is it requires an extra GET but that's ok. - Davis: using `_nc` for all attributes now. destroyed lazy evaluation of attributes. seemed to be the way that people were going. have to have a good use-case for a separate file (see S3 and performance issues) - Josh: could additionally gzip the extra file. benchmarking? - Dennis: how big can get the total metadata? (in characters) - Davis: maybe tabular data. - Dennis: in NetCDF, lot of use of groups (1000s) as namespaces. - Davis: Store API - getters take memory type (GPU, CPU, ...) - Josh: good to track (or disallow?) copies - Jeremy: most Stores are CPU, so actively copying for GPU. - Davis: Separate stores as in v2 (regular and chunk) - Davis: Store is simple key/value. Agnostic to Zarr formats. - Is the Store API overloaded? - Davis: On extra files, an extension where sqlite for every group and array. Good for tabular. - Jeremy: sqlite doesn't work for cloud storage. 
    - What stops people from doing it today?
    - Prototype
    - Is this icechunk? That's more a Store API.
    - Jeremy: WASM/sqlite forwards to HTTP requests to S3 with read ahead.
    - Josh: duckdb?
    - Davis: see BigStitcher's use of arrays.
    - Jeremy: space of zarr-like stuff. cloud-based databases (queries & writing). Parquet might not support this for the indexing.
    - Davis: GeoParquet has a spatial index
        - https://github.com/opengeospatial/geoparquet/pull/191
- Theodoros: interested in adopting Zarr
    - Problem is that we're dealing with really sparse datasets (mass spec imaging).
    - Davis: working on Zarr Python v3. A barrier to sparse support is that when we fetch a chunk we turn it into a contiguous array. Requires a redesign.
        - Efficient encoding of a single sample and "plugged into" zarr.
    - TV: can also have a time axis. distribution time of ions (for the same mass). (even though not as many pixels)
    - JMS: a codec could encode it as sparse (but in-memory is dense). Matter of hours.
        - other step would be full sparse support. ton of people have asked for this. but has to be woven throughout.
    - Davis: tell us what doesn't work for you. "we want to use Zarr but ..."
    - Josh: https://github.com/GraphBLAS/binsparse-specification
    - Jeremy: easy to store chunk in a sparse format. implementing all the APIs, e.g. in tensorstore would ask explicitly, "give it to me as a sparse array"

## 2024-10-30

**Attending:** Davis Bennett (DB), Sanket Verma (SV), Gábor Kovács (GB), Jeremy Maitin-Shepard (JMS)

**TL;DR:**

**Updates:**

- DB: Finding bugs in Zarr-Python and removing them - expanding the scope of tests
- JMS: Back from the parental leave, the baby is doing great!
:tada:
    - Been working on bugs for tensorstore
- SV: GeoZarr spec meetings have been updated on the community calendar

**Open agenda (add here 👇🏻):**

- Frequency of the meetings
    - DB: No strong feelings
    - JMS: Less activity, so makes sense
    - GB: Fine by me
    - DB: Unreliable attendance: how to mark meetings as successful and unsuccessful
    - SV: Will open a wide discussion for the community to get everyone's thoughts
- DB: Zarr V2 arrays using sharded arrays?
    - JMS: Not simple enough to do that because of overlapping arrays
    - DB: Zarr V2 codecs can utilise sharding codec
    - JMS: The JSON metadata differs for sharding
    - JMS: Who's the user base?
    - DB: Someone who's using Zarr V2 and wants to use sharding
    - DB: People might be scared of switching to a new format
- DB: Planning to add consolidated metadata and new data types in Zarr-Python, write a doc and publish it for reference
    - JMS: The idea of community and core codecs is not super impressive!
    - DB: Would be good to avoid namespacing issues
    - JMS: What would happen if there are multiple JPEG codecs in community and we want to move them to core but there's already one in the core?
    - DB: Good question, need to come up with a process for this
    - JMS: Adding a vendor name could work; there is value in having a vendor name
- Discussions on upcoming possible extensions

## 2024-10-16

**Attending:** Sanket Verma (SV), Joe Hamman (JH), Eric Perlman (EP), Ilana Rood (IP), Davis Bennett (DB), Michael Sumner (MS), Daniel Adriaansen (DA)

**TL;DR:**

**Updates:**

- The default branch has been changed back to `main` to prepare for V3 main release
    - https://github.com/zarr-developers/zarr-python/pull/2335
- Numcodecs 0.13.1 release soon
    - https://github.com/zarr-developers/numcodecs/pull/592
- VirtualiZarr has a dedicated ZulipChat channel now
    - https://ossci.zulipchat.com/#narrow/stream/461625-VirtualiZarr
    - Check VirtualiZarr repo: https://github.com/zarr-developers/VirtualiZarr
- New OS project release by Earthmover
    - https://earthmover.io/blog/icechunk
    - Transactional storage engine for ND array data on cloud object storage
- Zarr-Python V3 updates
- Any other updates?

**Open agenda (add here 👇🏻):**

- Intro w/ favourite food
    - Sanket: Dumplings
    - Joe: Burrito
    - Eric: Donuts
    - Davis: Ethiopian, Mexican and Indian dishes
    - Michael: RSE at Australian Antarctic Division, Burgers
    - Ilana: Works at Earthmover
    - Daniel: NCAR
- JH: _starts screen sharing_
- JH: Presents on Earthmover, Arraylake, Icechunk...
- JH: _presentation ends_ (time for questions)
- DB: Question on performance plots - what's the difference b/w Zarr V3 and Zarr V3 + Dask?
    - JH: Fetching is done by a different library - we're handling the concurrency better on the IO side
- DB: What lessons can be taken from this plot that can be applied to Zarr-Python?
    - JH: Python binding to rust crate needs to be looked at
    - JH: Doing decompression and IO in a relieved fashion
- SV: Does Icechunk work with Zarr V2?
- JH: Only with V3 for some parts - but we can change that - EP: Being able to leverage Zarr sharding in some way for Icechunk would be great - JH: We had an opportunity to do something totally different with sharding as it is now in ZP V3, i.e. a codec - JH: Implies sharding in a different manner - DB: How coupled are you with the current Zarr V3 API? - JH: Highly coupled - JH: LDeakin has started filing issues - JH: Can envision a high-level and a low-level store - that's what we build in the rust store - JH: We should ask store to do more, but we should be specific about it - MS: Really interested in Rust implementation - Does Rust part take over the encodings? - JH: No. We haven't implemented all of ZP yet - JH: Codecs, stores and all other stuff are currently handled by ZP - someone interested could come along and build bindings around - SV: Bioconductor folks are interested in Zarr-Python V3 - maybe they can be of interest to you - **Thanks, Joe, for giving the presentation! The slides and video recording will be posted online soon! Check our social media for updates!** - DB: Zarr-Python V3 defines sharding as a codec, not as an explicit V3 feature, so basically you can have Zarr-Python V2 sharded arrays! - JH: Try to get sharded V2 data to work, and let us know! ## 2024-10-02 **Attending:** Dennis Heimbigner (DH), Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP) **TL;DR:** **Updates:** - Added Tom and Deepak to Zarr-Python core devs - https://x.com/zarr_dev/status/1838965230438625684 - We had a documentation sprint for Zarr-Python V3 - The doc sprint officially ended on 10/1 evening. The participants have sent PRs to document the `zarr.array` and `zarr.storage` modules. 
Here are the open PRs: - https://github.com/zarr-developers/zarr-python/pull/2276 - https://github.com/zarr-developers/zarr-python/pull/2279 - https://github.com/zarr-developers/zarr-python/pull/2281 - Zarr-Python V3 team making good progress - alpha release every week - V3 main release soon! - Making stuff consistent with V2 - looking at Xarray's and Dask's tests and they pass - OME Challenge - EP: Was able to convert a big JAX dataset into V3 - JM: Ran into issues and was able to convert them into Zarr-Python V3 issues - More discussion down below - Any other updates? **Open agenda (add here 👇🏻):** - OME Challenge - EP: JAX ran into issues for remote access and it's good to point them out and later rectify that - EP: Directory listing should not be present by default - as it's computation heavy and could hurt your pockets - EP: Checking if an object exists before a write could cost us $2k! - EP: JZarr is currently being written - DH: Any decisions about deleting objects? - EP: When you check for existing objects, you have the ability to rewrite them - DH: That means you can delete it! 
- DH: NetCDF implements a recursive delete operation - EP: - DB: If you're inside the array then your metadata would statically define the array and you can easily rewrite it and essentially delete them - DH: Having consolidated metadata helps in rewriting operations - DB: Defining schema and knowing the entire hierarchy has been helpful - DH: We have this in NetCDF - DB: https://github.com/janelia-cellmap/pydantic-zarr/ - Endorsing SPEC8 - Securing the release process: https://scientific-python.org/specs/spec-0008/ - JM: Will need to read this - DB: Seems good enough and harmless - Endorsing SPEC6 - Keys to the Castle: https://scientific-python.org/specs/spec-0006/ - JM: Sharing passwords has been a challenge - DB: https://github.com/zarr-developers/zarr-specs/pull/312 - JM: Need to merge on https://github.com/zarr-developers/governance/pull/44 - JM and DB: Discussion on how to move forward and defining the scope of the groups involved in the specification changes - DH: Diversity defined on the structure of the internal architecture and not programming language implementation ## 2024-09-18 **Attending:** Sanket Verma (SV), Gábor Kovács (GB), Davis Bennett (DB), Eric Perlman (EP) **TL;DR:** **Updates:** - Zarr-Python V3 doc sprint - 30 Sep. and 1 Oct. 
- Identify missing docs and start creating issues - Link existing issues - https://github.com/zarr-developers/zarr-python/issues?q=is%3Aopen+is%3Aissue+label%3AV3+doc - Async working via Zoom meetings - MATLAB (Mathworks) interested in Zarr - want to add support - looking for a C++ implementation - Worked on a zine comic a couple of weeks ago - https://github.com/numfocus/project-summit-2024-unconference/issues/27 - Updates from Zarr-Python V3 effort / OME challenge - DB: Getting through issues from V2 and V3 compatibility - Tom and Deepak taking care of Dask issues - Alpha releases every week - https://pypi.org/project/zarr/3.0.0a4/#history - Defining data types in Zarr V3 - you're gonna see an error if the dtype is not defined - Main release by the end of year **Open agenda (add here 👇🏻):** - DB: https://github.com/zarr-developers/zarr-python/issues/2170 - The way of defining sharding codec is not intuitive and can be improved - https://github.com/zarr-developers/zarr-python/pull/2169 - proposed solution - DB: Will update this PR and make it ready for review - DB: All stores should have cache: https://github.com/zarr-developers/zarr-python/issues/1500 - EP: Some stores like S3 would benefit from this - EP: Compression and decompression on cache is expensive - DB: We can default it to 0 and turn it on accordingly - DB: FSSpec has a default cache enabled - we can look into it - EP: Will try to join the Zarr-Python core devs meetings on Friday - SV: Early morning for west coast - EP: Can make it! - SV: Early morning stuff: presented on Zarr V3 at EuroBioc: https://eurobioc2024.bioconductor.org/abstracts/paper-bioc4/ ## 2024-09-04 **Attending:** Josh Moore (JM), Dennis Heimbigner (DH), Eric Perlman (EP) **TL;DR:** **Updates:** **Meeting Minutes**: * Consolidated v2 - DH: annex for v2, "officially"/loose recognized (would be a **great** favor) - JM: and if we put it in v3 to say, "this is the former version"? - DH: add a forward pointer - EP: how many edge cases? 
* Deprecation (DH) - JM: No plan to deprecate v2 format (format vs. library) - DH: presumably people will use the new library, that will be the "test" of the consolidated metadata. * Bugs between implementations (JM) - DH: list of those bugs? JM: no, but good idea. - DH: available data? - JM: yes! see https://github.com/ome/ome2024-ngff-challenge?tab=readme-ov-file#challenge-overview - EP: billions of objects isn't fun. * Consolidated v3 - JM: pushed recently at zarr-python meeting for a spec (and with more design) - DH: as soon as it's in the format, then it's not just caching - metadata caching prevents multiple reads - DH: caching -> "big set of objects, keeping subset in memory" - JM: can be re-created? "index"? - DH: regardless, have to specify construction of any block of JSON - could say a subtree looks like some other pre-defined block - JM: parameterized MetadataLoader (or "MetadataDriver") - DH: that's what I was going to implement anyway - DH: like StorageDrivers (not caching) -- "VirtualAccess" - DH: but same wavelength - JM: would like to offload some JSON (speed vs size) - DH: should that API do more than read/write the JSON? - should it interpret it? - "give me key X out of this dictionary" - JM: like mongodb or jq queries - DH: walk binary without needing to convert down to JSON - EP: this gets back to N5 as an API rather than a format - logical versus storage - DH: netcdf had to support multiple storages as a wrapper (HDF4, HDF5, DAP) - essential to have some virtual object/class - hammer applied to everything ("common API") * EP: https://github.com/zarr-developers/zarr_implementations - why didn't that find the codec issues? - JM: no v3! - EP: hackathon? - EP: need mozilla support for HTML things - DH: agreed. hugely important - JM: As a github action? 
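
Note on the consolidated-metadata thread above: the idea that "metadata caching prevents multiple reads" can be illustrated with a minimal sketch. It assumes a plain Python `dict` standing in for a store and uses the Zarr v2 consolidated layout (a single `.zmetadata` document with `zarr_consolidated_format` and `metadata` fields). The helper names `consolidate` and `lookup` are hypothetical and are not part of Zarr-Python; `lookup` is only meant to echo the "give me key X out of this dictionary" question.

```python
import json

# Zarr v2 metadata documents are stored under these key names.
METADATA_NAMES = (".zarray", ".zgroup", ".zattrs")


def consolidate(store):
    """Gather every v2 metadata document in `store` into one JSON blob.

    `store` is assumed to be a mapping from key to bytes (hypothetical
    stand-in for a real store).
    """
    metadata = {}
    for key in sorted(store):
        name = key.rsplit("/", 1)[-1]
        if name in METADATA_NAMES:
            metadata[key] = json.loads(store[key])
    doc = {"zarr_consolidated_format": 1, "metadata": metadata}
    return json.dumps(doc).encode("utf-8")


def lookup(consolidated, key):
    """Return one metadata document without touching the individual keys."""
    return json.loads(consolidated)["metadata"][key]


# A tiny in-memory hierarchy: one group holding one 1-D array.
array_meta = {
    "zarr_format": 2, "shape": [4], "chunks": [4], "dtype": "<f4",
    "compressor": None, "fill_value": 0, "order": "C", "filters": None,
}
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "temperature/.zarray": json.dumps(array_meta).encode("utf-8"),
    "temperature/.zattrs": b'{"units": "K"}',
    "temperature/0": b"\x00" * 16,  # chunk data, not metadata
}

# One write of the consolidated document, then one read serves all lookups.
store[".zmetadata"] = consolidate(store)
units = lookup(store[".zmetadata"], "temperature/.zattrs")["units"]
```

In this sketch chunk keys are skipped and only metadata documents are consolidated, so a reader on object storage would need a single request to learn the whole hierarchy. Whether the consolidated document counts as a cache or as part of the format is exactly the open question in the notes.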
## 2024-08-21 **Attending:** Eric Perlman (EP) and Sanket Verma (SV) **TL;DR:** **Updates:** - Zarr V3 Survey: https://github.com/zarr-developers/zarr-python/discussions/2093 - Zarr-Python developers meeting update - removed 15:00 PT meeting and changed 7:00 PT to weekly from bi-weekly occurrence - Welcome David Stansby as core dev of Zarr-Python! 🎉 - https://github.com/zarr-developers/zarr-python/pull/2071 - Bunch of PRs got in Zarr-Python - changes around fixing dependencies, maintenance, and testing, see [here](https://github.com/zarr-developers/zarr-python/commits/v3/?since=2024-08-08&until=2024-08-21) **Open agenda (add here 👇🏻):** - EP: NGFF challenge to convert V2 data to V3 - EP had a chat with Josh Moore - Repo: https://github.com/ome/ome2024-ngff-challenge - EP: Jackson Lab will be utilising the Docker setup created by EP for converting V2 data to V3 - SV and EP: Discussions on Tensorstore, Compressions & Codecs, Microscope vendor software and woes of moving places! ## 2024-08-07 **Attending:** Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Thomas Nicholas (TN) **TL;DR:** **Updates:** - Benchmarking tool for determining best spec for writing using Tensorstore: https://github.com/royerlab/czpeedy - Zarr-Python updates - DB: There have been some movements in ZP - Discussion around the new API: https://github.com/zarr-developers/zarr-python/discussions/2052 - Chunks, shards, and other terminology - need to decide what to use - Getting more active core-devs for ZP will help in having lively discussion - TN: Applying for money to work on VirtualiZarr / Zarr upstream - TN: Development Seed is applying for the NASA grant - Julia Signell would work on it - DB: Non-zero origin for Zarr arrays would help - Related issue: https://github.com/zarr-developers/zarr-specs/issues/122 **Open agenda (add here 👇🏻):** - EP: Discussion about cycling, library and picking up books from library and reading them in the park! 
:bicyclist: :books: - EP/DB: Tangent on writing directly to sharding from microscopes... - TN: Various ways of storing the large metadata for a huge Zarr array - TN: Storing the large metadata in form of a hidden Zarr array - sort of a common theme among the various ZEPs being proposed - TN: Seems important because it has come up every time there's a discussion about scalability - DB: Store the aggregated information in the header of the chunk - SV: How does BSON scale as compared to JSON? - TN: We would still need to have a pointer to the BSON in JSON - DB: How do we introduce it to the Zarr V3 Spec? - TN: Maybe a convention - TN: Zarr is close to being a superformat! - DB: We could also increment the spec to a major version to include the change - TN: Discussions on if it's possible for Zarr to be a _superformat_! - TN: Some values in the geoscience datasets that are closely related and if compressed will be of huge value - but Zarr can't do that - DB: A fundamental Zarr array could be a set of small Zarr arrays - TN: VirtualiZarr basically does that - TN: _starts screen sharing_ - DB: The current indexing in Zarr-Python is not ideal and having Zarr arrays made of small arrays sounds much cleaner - TN: Hopefully I'd be able to work on this after VirtualiZarr - Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages ## 2024-07-24 **Attending:** Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Fernando Cervantes (FC), Eric Perlman (EP), Ward Fisher (WF), Thomas Nicholas (TN) **TL;DR:** **Updates:** - SciPy 2024 was great! 🎉 - DB: Zarr-Python updates - Sharding codec is pickleable - Decision needs to be made about array API - What should the sharding codec look like to the user? 
- DB: Easy to find if your array is sharded - JM: Partial reading this in Zarr V2 - tifffile sets a bunch of flags - wonder if those features are friendly for Zarr - DB: All the arrays should have sharding configuration - JM: Working with Tensorstore, the order of codecs didn't matter --> read_chunks / write_chunks - DB: some weirdness when it comes to different backends when uncompressed - New release - Numcodecs 0.13.0 - https://numcodecs.readthedocs.io/en/stable/release.html#release-0-13-0 - Thanks, Ryan! - New codec added - Pcodec - JM: Conda is unhappy **Open agenda (add here 👇🏻):** - Intros - SV: Yosemite National Park - JM: National Seashore in Florida - Gulf of Mexico - FC: Jackson Lab working in ML - Saccida National Park - EP: Zion National Park - WF: Yellowstone National Park - DB: Yellowstone National Park - TN: Want to open issues on a bunch of ideas - 1. Zarr reader to read chunk manifest and bytes offset - currently Xarray handles this - Can use Zarr to open NetCDF directly - 2. VirtualiZarr has lazy concatenation of arrays - Xarray has lazy indexing operations for arrays - Long-standing issue in Xarray to separate the lazy indexing machinery from Xarray - https://github.com/pydata/xarray/issues/5081 - DB: Could be handled and should be a priority now - TN: - JM: Agree with Davis on indexing - not sure if the abstraction layer for concatenation is correct! - JM: Talked to 2 Napari maintainers - on a problem of chunking - TN: A lot of people want to solve the indexing problem but neither Zarr nor Xarray exposes that - JM: Finding more people with similar interests would help us provide more engineering power - DB: Create a PR with copy pasting code from Xarray!? 
- This could unlock a lot of use cases - TN: VirtualiZarr does actually do that - but at the level of chunks rather than indices - DB: Slicing and concatenation are duals - if you have both it's complete - DB: - JM: Query optimisation can be tweaked as we move forward - TN: When you do concat and slice you have identified a directed graph - you can optimise that plan - you can also hand off that plan to some reader - JM: What does the user do with the plan? Do they do something with it? - TN: Array API folks have deliberately made arrays lazy - GPU CI for Zarr-Python - https://github.com/zarr-developers/zarr-python/issues/2041 - GitHub and Cirun sounds good and easy to set up - Who pays? - Earthmover is ready to pay the cost for initial months and then switch to NF - NF has money reserved for projects in the infrastructure committee for similar costs - JM: Good to have it! - SV: Need to get it sooner than later - Zarr paper - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Appetite.20for.20a.20Zarr.20paper.3F - JM: My poster was cited multiple times in the last few weeks - JM: JOSS is a potential venue - IETF is more work - TN: Submitting to a computing journal - W3C, IEEE, etc. - TN: Xarray: https://openresearchsoftware.metajnl.com/articles/10.5334/jors.148 - JM: NetCDF: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg00087.html - **TABLED** - Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages ## 2024-07-10 **Attending:** Josh Moore (JM), Davis Bennett (DB), Fernando Cervantes (FC) **Updates:** - SciPy! :tada: - Josh: testing zarr v3 - issue for each problem? Davis: sure - Davis: to be fixed: - no validation of fill value - multiple bugs with sharding: 1d - Josh: missing "attributes" - Josh: but neuroglancer working? - Davis: not for all static file servers. need PR. - Davis: various forks. Josh: plugins? 
Davis: tough - or: neuroglancer as a component that can be embedded - Janelia NG is a React component. - "Visualization is tough." - Motion for food :knife_fork_plate: Seconded. ## 2024-06-26 **Attending:** Brianna Pagán (BP), Thomas Nicholas (TN), Dennis Heimbigner (DH), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB) **TL;DR:** **Updates:** - Zarr-Python 3.0.0a0 out - https://pypi.org/project/zarr/3.0.0a0/ - Good momentum and lots of things happening with ZP-V3 - aiming for mid July release - SV represented Zarr at CZI Open Science 2024 meeting - various groups looking forward to V3 - https://x.com/MSanKeys963/status/1801073720288522466 - R users at Bioconductor looking to develop bindings for ZP-V3 - New blog post: https://zarr.dev/blog/nasa-power-and-zarr/ - ARCO-ERA5 got updated this week - ~6PB of Zarr data available - check: https://x.com/shoyer/status/1805732055394959819 - https://dynamical.org/ - making weather data easy and accessible to work with - Check: https://dynamical.org/about/ - Video tutorial: https://youtu.be/uR6-UVO_3k8?si=cp0jOxrtKL_I6LfV **Open agenda (add here 👇🏻):** - BP: Will be talking about how Zarr is utilised at NASA! - _starts screen sharing and presenting_ - BP: I work at Goddard GES DISC - deputy manager at one of the centres - manages team of developers and engineers - **not representing all the data centres** - BP: Lot of people are coming into Zarr from the SMD (Science Mission Directorate) - BP: Earth Science Division - EOSDIS and Distributed Active Archive Centres (DAACs) - DAACs focus on data distribution and management - BP: All the centres coming up with the suggestion on best practices and best format - we discuss with them the possibility of what they can, and should use - BP: Moving to cloud optimized format - DAACs have a ton of archival data in various formats - BP: Projected growth for entire Zarr store across all EOSDIS by 2030: 60PB -> 600PB! 
- BP: GES DISC holds 7 PBs of data - we have 3000 different collections of datasets - really diverse! - BP: Giovanni - interactive web-based program has 20+ services associated with it - taking the existing data and grooming the metadata so it's accessible and useful across broader range - BP: Over at NASA, we do a lot of Zarr stuff... - Zarr V2 spec is an approved data format convention for use in NASA Earth Science Data Systems (ESDS) - Giovanni in cloud - duplicates Zarr (variable based) - Open issue: continuously updating Zarr stores - Exploring lakeFS for managing dynamic data - ZEP0005 - Brianna is leading the GeoZarr work - VEDA - a number of things Zarr/STAC related going on in VEDA - TN: Does Giovanni read Zarr directly? If so which reader does it use? (Can Giovanni use VirtualiZarr?) - BP: Giovanni promotes variable first search - most of Giovanni has OpenDAP attached to it - built with overhead with GES DISC pipeline - in hindsight - Yes! - TN: From the slides - Xarray can take care of some of the stuff that Giovanni does - TN: Very curious about the exact difference between the LakeFS idea and EarthMover’s ArrayLake - BP: LakeFS is OS ArrayLake - no vendor lock-in - SV: What does Giovanni actually do when you say, ‘it grooms metadata’? - BP: Standardizes the grid - flip the grid - naming mechanism - smoothing the metadata so that it works across various services - BP: other metadata grooming: for example, we have a lot of time dimension issues. that's because of scattered best practices for how to store time metadata - TN: Can we do the flipping with Zarr/VirtualiZarr? - DB: If you flip at the store level - you'd need to find out how deep you'd need to go - BP: Will try to make time standard across the datasets - BP: https://github.com/briannapagan/quirky-data-checker - BP: _from the Zoom chat_ - Zarr Storage Specification V2 is an approved data format convention for use in NASA Earth Science Data Systems (ESDS). 
https://www.earthdata.nasa.gov/esdis/esco/standards-and-practices - Giovanni in the Cloud, duplicate archive, zarr, variable-based: https://cmr.earthdata.nasa.gov/search/variables.umm_json?instance-format=zarr&provider=GES_DISC&pretty=True - Open issue: continuously updating zarr stores. Exploring lakeFS for managing dynamic data - ZEP 0005: Zarr accumulation extension for optimizing data analysis - Looking into a GIS service for zarr stores - POWER https://power.larc.nasa.gov/data-access-viewer/ - https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html - https://discourse.pangeo.io/t/metadata-duplication-on-stac-zarr-collections/3193/7 - EP: Converting OME datasets to V3 in upcoming months - quirky tool can be useful - DB: V2 chunk encoding matches the V3 encoding - you just need to re-write the JSON document - DB: Playing with sharding - tensorstore is fast - need to figure out the nomenclature - EP: The bio and geo worlds have parallel tracks and are working in silos - EP: https://forum.image.sc/t/ome2024-ngff-challenge/97363 - DB: The challenge doesn't seem interesting to me! - converting `JSON` documents - instead we should be focusing on converting existing data to sharded stores - a much more interesting problem - EP: Bunch of data is non-Zarr; will be working on pushing it to the cloud and converting it to Zarr