---
tags: zarr, Meeting
---

# Zarr Bi-weekly Community Calls

### **Check out the website for previous meeting notes and other information: https://zarr.dev/community-calls/**

Joining instructions: [https://zoom.us/j/300670033 (password: 558943)](https://zoom.us/j/300670033?pwd=OFhjV0FHQmhHK2FYbGFRVnBPMVNJdz09#success)

GitHub repo: https://github.com/zarr-developers/community-calls

Previous notes: https://j.mp/zarr-community-1

## 2025-09-03

**Attending:** Eric Perlman (EP), Justus Magin (JMa)

* JMa:
    * sparse container for faster slicing / concatenation: https://github.com/keewis/sparse-indexing-container/
    * fixed a roundtrip bug on BytesCodec: https://github.com/zarr-developers/zarr-python/pull/3417
        * causes a bunch of failing tests in ArrayV3Metadata tests
        * might be visible from the public API?

## 2025-08-20

**Attending:** Lachlan Deakin (LD), Ryan Abernathey (RA), Jeremy Maitin-Shepard (JMS), Davis Bennett (DB), Max Jones (MJ), Josh Moore (JM)

Special Session on [ZEP10](https://github.com/zarr-developers/zeps/pull/67)

- Lachlan presents https://zarrs.dev/slides/zarr_generic_extensions_20250820
    - LD: Like the "extensions" *object* a lot
    - JMS: from [issue](https://github.com/zarr-developers/zeps/pull/67#issuecomment-3207787301) we can use syntax like a `_` prefix to mark things as optional
    - DB: roughly 10 verbs create/remove array/group, read/write chunks, metadata (subsubgroups are a separate question)
        - what are we designing `must_understand` for, to tune the level of granularity?
- Ryan - Agenda (in 45 minutes)
    - what ZEP10 should look like?
        - alignment on `must_understand`
        - extension field
        - "registered attributes"
    - use cases (what do we want)
        - consolidated metadata
        - groups declaring members
        - subarrays (array under another array; relation to a parent grid)
        - multiscales
        - differently chunked copies (for differing access patterns)
        - chunk aliases
        - declaring separate source for the attributes
            - icechunk (group storage transformer?)
            - encrypted or DB-stored attributes
    - discussion
        - LD: consolidated metadata can benefit from constraints
        - DB: zarr aware routine that copies an existing hierarchy to a new store; direct copying should be allowed
        - JMS: if there's a relative path
        - DB: the verbs
            - create_array
            - create_group
            - remove_array
            - remove_group
            - update_array_metadata
            - update_group_metadata
            - read_chunk
            - write_chunk
            - read_array_metadata
            - read_group_metadata
        - RA: unevenly chunked array (chunk grid)
        - DB: extensibility of the chunk grid is narrowly scoped. doesn't interact with anything else
        - DB: top-level are unscoped
        - LD: lets us look at new stuff (essentially 4.0)
        - RA: chunk statistics (or as another array within a group)
            - all about updates, and who is responsible
            - new opinion while developing icechunk. zarr is data at rest, spec only partially deals with consistency.
        - JM: building up a language of constraints to unaware implementations
        - DB: error is giving someone mutable access (cf. chunk statistic)
        - JMS: not untrusted users, but missing a plugin or an old version. failing is useful
            - however, tricky, since you don't want to necessarily update the multiscales and the statistics right away (i.e. implementation might not be able to maintain the )
            - theoretical consistent maintenance might not be feasible
        - DB: data model that has relationship between nodes and an API that doesn't, then it can be difficult.
        - RA: everything is out the window during the non-zero window of time
            - only solution is a transactional mechanism
            - can declare what a dataset at rest looks like (up to implementation & store)
        - RA: propose (semi-controversial)
            - define all the above as attributes
        - LD: fine until we hit something that can't deal with a silent failure
        - JMS: example is forcing consolidated metadata to get updated
            - fill_array
        - MJ: consider not being able to handle cm
            - formalize the relationship between zarr and icechunk
            - don't over index on cm
        - JM: on the list of things to avoid, false reads
        - JMS: move to registered attributes?
        - RA: include `must_understand` there?
        - LD: no, because in 3.1 there is nothing about that.
        - JMS: preventing writing is not that big of a deal, limited. reading is a bigger problem.
        - RA: support for pushing functionality to attributes for a higher-level framework. may lead to data structure that requires synchronization

## 2025-08-20

**Attending:** Josh Moore (JMo), Justus Magin (JMa), Davis Bennett (DB), Jeremy Maitin-Shepard (JMS), Eric Perlman (EP), Gábor Kovács (GK)

- Rome
    - DB: People signing up
    - Limit on the developer days
    - Discussing limit on the adopter days
- Sparse
    - JMa: something that works :tada: (with modifications to zarr)
        - changed Zarr to take the prototype from the codec
            - array-to-array then array-to-bytes codec to see which NDBuffer it should take
            - sparse codec needs a specific buffer
            - not attached to it. (could use a global option)
        - modified zarr to pass along the chunk size when creating a buffer
        - not fast; spends most of its time indexing into the sparse array (CPU bound)
    - DB: fixed with a faster sparse library?
    - JMa: yes, and being smarter with number of operations
    - DB: which type of memory is an unsolved issue; don't allow codec to publicize which memory
        - i.e. have to handle an arbitrary NDBuffer
        - cf. every store method needs a buffer to write into
        - wider codec pipeline discussion
    - JMS: tensorstore doesn't do GPU or sparse but ...
        - think of it more as a top-down choice
        - rather than depending on the metadata array, read op returns a type of buffer
        - instead have an explicit API for reading sparse (type annotations can be stricter)
    - DB: Zarr doesn't support any operation where dense vs. sparse matters
        - people currently expect a dense numpy-like array
    - JMS: modifying an existing array is the expensive bit
    - JMa: screen shares with the zarr-python changes needed

        ![image](https://hackmd.io/_uploads/rJ41Z97Klg.png)

        ![image](https://hackmd.io/_uploads/BJ6kbc7Klg.png)

    - JMS: similar to GPU, you may want to do more in parallel
    - DB: getting the parallel GPU story working is difficult
        - CUDA has a different model of concurrency
        - want a GPU specific pipeline that's smart about allocating memory and then batching with respect to that
        - codec pipeline is the target for a lot of these optimizations
        - cf. scverse/rust speed up is from having a smarter pipeline class
    - JMo: wonder if we're missing an abstraction, "Operation", etc.
    - DB: would most like to have Zarr arrays be composed of other Zarr arrays.
        - chunks could negotiate that they can only go on certain devices
        - everything is just a collection of chunks
    - JMS: tensorstore thinking of an array as a set of operations
        - can be decomposed into smaller arrays
        - proposed chunkgrid as just being a top-level of a codec stack
        - you could imagine concatenations of other arrays
        - currently awkward. (codecs below the level of chunking)
    - DB: what's the function signature of a codec. not currently defined.
        - so e.g. we don't track endianness
        - need to be formal about what an array is
        - then it's clear what information the codec has access to
    - JMS: do some form of propagation up and down the stack
        - i.e. implementation specific (GPU, endianness, ...)
    - DB: perhaps then each implementation needs to know in its own language
    - JMS: which codec attributes are dynamic and which are fixed?
        - e.g. datatype and shape fixed. DB: there are dtype
        - JMS: given once you have the config, not chunk to chunk
    - JMS: in ts there's resolution where each codec says (forward & backward passes) given X I put out Y (dtype, memory layout, ...)
    - DB: certainly not multiple passes in zarr-python. Added TODOs. Have a forward pass and rewrite the codecs dynamically based on the previous' advertised output
    - JMo: listening to the conversation, I wonder if metadata needs specifying
    - JMS: you need to do a resolution process, necessary information is there
        - we specify the overall output as dtype, that gets propagated to the lowest level codec
        - you could imagine the other way around, you just save what the lowest level stores and then calculate the top-level data type
        - comes into play in imagecodecs. lot of complexity in how they store things (RGB, channels, etc.; color spaces). if you want to read that in, the zarr data type for the array is a bit arbitrary
        - same for endian conversation, could be implicitly propagated back up
    - JMa: issue of storing metadata about the array can only be a codec property (e.g., the kind of the sparse array)

## 2025-08-06

**Attending:** Josh Moore (JMo), Eric Perlman (EP), Justus Magin (JMa), Davis Bennett (DB), Gábor Kovács (GK), Jeremy Maitin-Shepard (JMS)

- Find the meeting notes under https://zarr.dev/community-calls/
- Sparse (JMa)
    - Trying to work out how to customize NDBuffer (which assumes numpy array representation)
    - Possibly figured it out
        - Creating a NDBuffer, have to pass the chunks to make a chunk grid
        - Assignment to sparse array isn't a good idea
        - Have an object-type numpy array to represent the grid
        - Then concatenate only when you want the data
        - Need to know the chunking for mapping the slicing
            - some functionality in zarr-python that can be used
    - not all changes pushed to zarr-sparse (lots of experimentation)
    - performance metrics yet?
        - next part of the problem: bytes to compressed, then uncompress and put into sparse arrays
        - so not yet
    - nested codec? probably
        - there will be some rules (i.e. sparse must be first)
        - anything to get the default codec?
    - future of the repo? pip installable?
        - haven't thought about it yet
        - options: keep it separate or merge into zarr-python
        - open to discussion
    - JMo: open question of having a non-dense API (AMR, APR, multiscales, etc.)
    - DB: array API? JMa: not all of them.
    - DB: zarr-python currently takes all chunks and packs them into a numpy array. a mistake.
    - JMa: *summary on NDBuffer from above*
    - DB: separate sparse array per chunk? (Big change in business operation)
    - JMa: with NDBuffer can control what is contained
    - DB: don't want an object-type
    - JMa: already translating between arrays and chunks
    - DB: see indexing module (copied from zarr-python 2). room for improvement.
        - Draft PRs welcome.
- ZEP10:
    - Josh owes Jeremy a
    - JMS: leave purely metadata in a separate realm
    - DB: Examples?
    - JMS: fill value as json array that broadcasts. motivates the PR.
    - DB: Breaking change to the array model "soft zarr 4"
    - JMo: difference between 3.x and 4.x
    - DB: inviting fragmentation
        - don't see how this leads to fragmentation
        - stac doesn't describe behavior; chunk encoding describes a function. makes zarr function.
    - JMS: adding a feature that *requires* implementation updates
        - burden of reducing that is on the implementations
    - DB: difficult to motivate the change; difficult it's an application.
    - JMS: way to play/evolve the spec. you need a way to implement it (create your keys) that doesn't cause a problem
        - "URL" following the spec today.
        - doesn't seem like that big of an idea either way
        - another example: inline array
    - DB: take for granted that it is for an implementation that only some will be able to read
        - alternative would be just to do that today
    - JMS: possible examples: fill_value_array, offset, consolidated metadata
    - DB: differentiating between metadata writing and chunk writing
    - DB: don't make it hypothetical, more focused on solving the engineering problem
    - EP: you set the bar really hard. have been bit in the foot on default fill values
        - bar of nice to have is fine. (otherwise is a core feature)
        - extension is the place for nice to have
    - DB: v3 spec contained contradictions so making the bar high
    - closing
        - DB: suggest -- build everything about consolidated metadata and these other things
        - JMS: seemed like most people are for metadata-only extensions
            - danger of how implementations are implemented if attributes aren't registerable
            - same time or handling attribute registration first?

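
Sketch for the Sparse item in the 2025-08-06 notes ("Need to know the chunking for mapping the slicing"): a minimal, pure-Python illustration of how a 1-D selection could be mapped onto a regular chunk grid, so that only the touched chunks need to be fetched and concatenated. The helper name `slice_to_chunks` and its return layout are hypothetical and are not taken from zarr-python or zarr-sparse; the indexing module mentioned above is the real implementation to look at.

```python
def slice_to_chunks(start, stop, chunk_size):
    """Map the half-open range [start, stop) along one axis onto regular chunks.

    Returns a list of (chunk_index, start_in_chunk, stop_in_chunk) tuples.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")

    pieces = []
    pos = start
    while pos < stop:
        chunk_index = pos // chunk_size          # which chunk holds `pos`
        chunk_start = chunk_index * chunk_size   # first array index of that chunk
        piece_stop = min(stop, chunk_start + chunk_size)
        # Record the sub-range in coordinates local to the chunk.
        pieces.append((chunk_index, pos - chunk_start, piece_stop - chunk_start))
        pos = piece_stop
    return pieces


# A selection of elements 5..22 over chunks of length 10 touches three chunks.
print(slice_to_chunks(5, 23, 10))
```

With a mapping like this, a grid of per-chunk sparse arrays can stay untouched until data is requested, and only the listed pieces are sliced and concatenated.
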
## 2025-07-23

**Attending:** Ward Fisher (WF), Eric Perlman (EP), Davis Bennett (DB)

**Meeting Minutes:**

- DB: Discussion and quick demo of `uv` for managing python environments
    - uv headers are now used in zarr-python issue template
    - uv powers the current zarr2-tests in zarr3-python tests

## 2025-07-09

## 2025-06-25

**Attending:** Ward Fisher (WF), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB)

**Meeting Minutes:**

- DB: https://github.com/zarr-developers/zarr-python/pull/2874 - got merged!
    - Next steps:
        - Fix the codecs situation: unify them
        - Stop depending on numcodecs
        - Variable size chunks
    - Other things:
        - Indexing issues
        - NVIDIA folks are interested in how GPU can be added into Zarr-Python codebase
- SV: New data type addition is also included in DB's PR
- DB: Also been working on the new data types stuff
- WF: Unidata is back in operation. We are also collaborating with DKRZ (German Climate Computing Center) on some proposed `ncZarr` work they would like to see (consolidated metadata, amongst other things).
    - There have been questions from DKRZ devs re: functionality in various Zarr Python packages that deviate from the specification (v2, specifically).
- DB: Having code samples in the Zarr-Python repository, using PEP723
- DB: Found a funny bug in Zarr-Python 2

## 2025-06-11

**Attending:** Josh Moore (JM), Gábor Kovács (GK)

- GK: any obvious issues to get involved on.
    - JM: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/zarr.2Eload.20deletes.20data/with/522616083
    - Other suggestions welcome.
- People still interested in the meeting? New time? Monthly?

## 2025-05-28

**Attending:** Josh Moore (JM)

- No show.

## 2025-04-16

**Attending:** Josh Moore (JMo), Eric Perlman (EP), Justus Magin (JMa), Gábor Kovács (GK)

**TL;DR:**

**Updates:**

**Meeting Minutes:**

- JMa: "signed URLs"
    - EP: looking at them for raw data
        - e.g. HTTP parameters NOT the AWS thing
    - JMo: bug? JMa: Zarr is oversimplifying the use of paths
        - Planetary computer pulls an access token, appends to any URL (Zarr or Geo-TIFF or ...)
    - EP: would try hacking it in in a little fsspec wrapper
    - JMa: use-case -- trying to access something S3-like that needs parameters
    - JMo: does obstore "do the right thing"?
    - JMa: Kyle has something that will work for planetary computer, but that's just one endpoint
    - EP: with shards can almost use *proper* signed URLs
    - JMo: will likely require Virtualizarr
        - also: https://earthmover.io/blog/announcing-flux
- JMa: sparse arrays
    - looked at binsparse
        - which decomposes into one dimensional arrays
        - another level of nesting
    - JMo: good format but need library support like `.chunks`. need to be aware of the metadata.
    - JMa: encoding the sparseness **per chunk**?

## 2025-04-02

**Attending:** Davis Bennett (DB), Sanket Verma (SV), Eric Perlman (EP), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)

**TL;DR:**

**Updates:**

- Version policy update: https://github.com/zarr-developers/zarr-python/pull/2910
- Blog post PR: https://github.com/zarr-developers/blog/pull/67
- Obstore-based store implementation PR has landed: https://github.com/zarr-developers/zarr-python/pull/1661
- Zarr-Python V2 Support release [2.18.5](https://github.com/zarr-developers/zarr-python/releases/tag/v2.18.5) took place last week! Thanks, David!

**Meeting Minutes:**

- DB: Working to add support for V3 and Tensorstore in [Pydantic Zarr](https://github.com/zarr-developers/pydantic-zarr)
    - Also to add group support in Pydantic for Tensorstore
    - Appreciate the results by reading and writing in Tensorstore, i.e. returns an object
- DB: _elaborates on the version policy change_
    - DB: [Effver](https://jacobtomlinson.dev/effver/): mostly a function of efforts put in by the users
    - JMS:
- DB: https://github.com/zarr-developers/numcodecs/issues/686 - formalise old and new styles of JSON serialisation
    - DB: Numcodecs doesn't interoperate well with Zarr-Python, also there's code in Cython which only a handful of folks can maintain
- MS: There have been great developments in the Zarr ecosystem but things have been moving so fast that I worry it will start to proliferate. It's difficult to keep track of all the things
    - GDAL: https://lists.osgeo.org/pipermail/gdal-dev/2025-April/060414.html
    - List of EOPF product samples publicly available from the EOPF s3 public bucket: https://eopf-public.s3.sbg.perf.cloud.ovh.net/product.html
    - https://cpm.pages.eopf.copernicus.eu/eopf-cpm/main/PSFD/4-storage-formats.html
- EP: Most of the Jackson Lab data is in V3 sharded effort
    - EP helped in conversion and Eric Ratamero led the effort

## 2025-03-19

**Attending:** Davis Bennett (DB), Abhiram Reddy (AR), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)

**TL;DR:**

**Updates:**

**Open agenda (add here 👇🏻):**

- AR had GSoC questions
- https://github.com/zarr-developers/zarr-python/pull/2910 - at least a social media announcement would be good, blog post ++ (the more the merrier)
- DB: Dtype addition to Zarr-Python (https://github.com/zarr-developers/zarr-python/pull/2874)
    - JMS: Would the data type be mapped to 1-1?
    - DB: Currently support NumPy and CuPy datatypes
    - DB: More tests needed to handle endianness, need to change the API too
- JMS: Norman registered new codecs. How are they gonna work in Zarr-Python 3?
    - DB: We haven't made a final decision on that

## 2025-02-19

**Attending:** Josh Moore (JM), Sanket Verma (SV), Michael Sumner (MS), Davis Bennett (DB), Eric Perlman (EP)

**TL;DR:**

**Updates:**

- Zarr participating in GSoC this year, ideas list: https://github.com/numfocus/gsoc/blob/master/2025/ideas-list.md
- ZEP9 Draft published: https://zarr.dev/zeps/draft/ZEP0009.html
    - Follow up PRs in zarr-specs:
        - https://github.com/zarr-developers/zarr-specs/pull/330
        - https://github.com/zarr-developers/zarr-specs/pull/331
    - Setting up https://zarr.dev/extensions
- EP: Jackson lab will be converting the datasets to sharded Zarr V3 arrays
- SV: https://2025.pycon.de/talks/ABWHSD/ - speaking at PyCon DE 2025!

**Open agenda (add here 👇🏻):**

- MS using Pizzarr for their work
    - GDAL and Pizzarr work for virtual references as well
    - Has datasets in HDF5 and NetCDF - both have their peculiarities
- JM: _gives an overview of ZEP9_
- EP: Jackson lab data conversion
    - JM: Any benefits in performance?
    - EP:
    - EP: Sticking with OMERO 5D arrays
    - MS: Public link for the data
    - EP: https://images.jax.org/webclient/userdata/?experimenter=-1 (can be added to https://zarr.dev/datasets)
    - JM: could also add https://ome.github.io/ome2024-ngff-challenge/
    - JM: Zarrs (Rust implementation) would be useful
    - EP: Solely using OMERO but could pass the URL to Neuroglancer to view it
    - EP: Cloudflare is potentially working with OS projects and giving them reasonable tier prices
- EP: Raw 10TB nbytes - how do you convert it to sharded V3 array?
    - DB: Using np.memmap might be useful
    - JM: Can also use Kerchunk
    - JM: Can use Tensorstore to convert the data as well
    - DB: Could potentially use Zarr-Python - but it is 10x slower
    - JM: https://github.com/LDeakin/zarr_benchmarks
    - JM: Satra ran into memory issues with Tensorstore: https://github.com/ome/ome2024-ngff-challenge/issues/83
    - JM: Best way to shard large arrays - a good GSoC project
- JM: https://github.com/asdf-format/asdf-standard
    - https://github.com/asdf-format/asdf-zarr
- EP: Zarr being used in bio space
    - Folks at the Allen are looking to submit a proposal at SciPy 2025
    - JM: Francesc Alted did a nice presentation at SciPy 2023: https://youtu.be/0GX5nDqUUZE?si=WvE6asx5zjtrBcHI

## 2025-02-05

Notes TBA

## 2025-01-22

**Attending:** Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Gábor Kovács (GB)

**TL;DR:**

**Updates:**

- Zarr-Python 3 released on January 9th, 2025!
    - Blog post: https://zarr.dev/blog/zarr-python-3-release/

**Open agenda (add here 👇🏻):**

- N5
    - JM: There'll be a new release to add support for N5 in Zarr-Python 3
    - EP: They can leverage sharding and other useful features
- Zarr-Python 3
    - DB: Gave a presentation on Zarr-Python 3 at Allen Institute
    - DB: Realised some issues in Zarr-Python 2 when listing groups
        - JMS: Because ZP 2 listing processes were taking place in parallel
        - DB: In Zarr V2 spec there's nothing that says that groups and arrays are different
        - JMS: Looked at the spec as well as the implementation when working with my implementation
- JMS: Added support for ZEP8 in Tensorstore
    - PR: https://github.com/google/neuroglancer/pull/696
    - JM: Too many files, JMS: Includes linting changes and test files
    - JMS: The path resolution for Zarr V3 in Neuroglancer: `https://host/path/to/n5/group/|n5:path/to/array/` would look for array and not go up in the group directory
    - JM: The searching is mostly top down in OME land but we should work towards more completeness
    - JMS: Also, planning to add Icechunk support to Neuroglancer
- Discussion on URL for Neuroglancer
    - Deciding the right characters to use
    - Tricky to decide the right URL

## 2025-01-09

**Attending:** Josh Moore, Eric Perlman, Sanket Verma, Joe Hamman, Davis Bennett, Gábor Kovács, Dennis Heimbigner, Thomas Nicholas, Jeremy Maitin-Shepard

**TL;DR:**

**Updates:**

- Happy New Year! :clinking_glasses:
- Zarr-Python [v3.0.0-rc.1](https://github.com/zarr-developers/zarr-python/releases/tag/v3.0.0-rc.1) and [rc.2](https://github.com/zarr-developers/zarr-python/releases/tag/v3.0.0-rc.2) out now! Full release coming up tomorrow!
- Zarr has a Wikipedia page now! https://en.wikipedia.org/wiki/Zarr_(data_format)
- A group of Zarr-Python devs are at AMS next week including Joe and Ryan
- CFP for SciPy 2025 is open: https://www.scipy2025.scipy.org/

**Open agenda (add here 👇🏻):**

- EP: the month wait was good to get other projects like napari up-to-speed
- DB: reached out to people using n5 in python. They weren't pinning to `zarr-python<3`. Sent an email. No response. EP anyone? No. Using Zarr.
- JH: Virtualizarr ready for 3.0.0? Failing test (xarray?) but Matt is looking at it.
- TN: Kerchunk doesn't support zarr-python 3.x (API usage)
    - without kerchunk: fits & netcdf won't work.
    - lose access to anything in the future (in-progress HDF4)
    - JH: requires rethinking of MultiZarrToZarr logic
    - TN: Doesn't directly interact with zarr-python v3. But want to (to use the v2 to v3 compat objects)
- JH: Would be good to unblock the ZEP process and get ZSC behind on the changes - it's confusing to see ZSTD codec in Zarr-Python 3.0 and not in the spec
    - JM: I'll get the ZSC to respond on the longing issues
- DH: https://github.com/Unidata/netcdf-c/pull/3068 (Ward will get to reviewing it)
- DB: Sample V3 sharded data: https://github.com/d-v-b/zarr-workbench/tree/main/v3-sharding-compat/data/zarr-3
- JMS: Planning to add Icechunk support to Tensorstore

## 2024-12-11

**Attending:** Eric Perlman (EP), Sanket Verma (SV), Gábor Kovács (GB), Ward Fisher (WF), Davis Bennett (DB), Camille Teicheira (CT), Jeremy Maitin-Shepard (JMS)

**TL;DR:**

**Updates:**

- Zarr is on BlueSky - follow us https://bsky.app/profile/zarr.dev
- Norman Rzepka has joined the Zarr Steering Council: https://zarr.dev/blog/steering-council-update-2024/ Welcome Norman!
- A group of Zarr-Python devs are at AGU this week including Joe and Ryan
- Zarr-Python V3 release before holidays!
    - DB has a bunch of PRs coming in soon!
    - Planning to expose sharding in a user friendly way
- WF: Had meetings with Florian Ziemann - putting up a PR for V2 consolidated metadata

**Open agenda (add here 👇🏻):**

- Intros w/ favourite places for holidays
    - Sanket - into Himalayas
    - Ward - near Colorado
    - Davis - Italy
    - Eric - coming to India soon!
    - Gábor - Canada for Skiing
    - Camille - tech lead at https://www.sofarocean.com/ - has lot of weather NetCDF data
- DB: Sharding chunk sizes: can we allow imperfect partitioning of the shard shape?
    - JMS: Would be possible to support, but with the current config you have a regular grid for shards and chunks, also resizing would be difficult
    - JMS: Could also be based on user preference
    - DB: The sharding spec doesn't specifically say anything about the shape - so how and where should we define it?
    - JMS: The non-regular/partial chunks would not compose across shards
    - DB: I see! The proposal is off the table then!

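
Sketch of the regular-grid constraint from the 2024-12-11 sharding discussion: a minimal check that an inner chunk shape partitions a shard shape perfectly, which is the case JMS describes as composing across shards. `chunks_per_shard` is a hypothetical helper written for these notes, not zarr-python API, and it illustrates the discussion rather than quoting the sharding spec.

```python
def chunks_per_shard(shard_shape, chunk_shape):
    """Return how many inner chunks fit along each axis of one shard.

    Raises ValueError for an "imperfect partitioning", i.e. when the inner
    chunk shape does not evenly divide the shard shape on some axis.
    Hypothetical helper for illustration only.
    """
    if len(shard_shape) != len(chunk_shape):
        raise ValueError("shard_shape and chunk_shape must have the same rank")

    counts = []
    for axis, (shard_len, chunk_len) in enumerate(zip(shard_shape, chunk_shape)):
        if chunk_len <= 0:
            raise ValueError(f"axis {axis}: chunk length must be positive")
        if shard_len % chunk_len != 0:
            # A partial chunk at the shard edge would not line up with the
            # chunk grid of the neighbouring shard.
            raise ValueError(
                f"axis {axis}: chunk length {chunk_len} does not evenly "
                f"divide shard length {shard_len}"
            )
        counts.append(shard_len // chunk_len)
    return tuple(counts)


# A 64x64 shard split into 16x32 inner chunks gives a 4x2 grid per shard.
print(chunks_per_shard((64, 64), (16, 32)))
```

A shape such as `(64, 64)` with chunks of `(24, 32)` is rejected, which mirrors why the proposal was dropped: the partial chunk at the shard edge breaks the regular grid.
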
## 2024-11-27 **Attending:** Sanket Verma (SV), Eric Perlman (EP), Davis Bennett (DB), Josh Moore (JM), Jeremy Maitin-Shepard (JMS) **TL;DR:** **Updates:** - Zarr-Python V3 release in first week of December - New numcodecs release (includes fixes and improvements) - Check here: https://github.com/zarr-developers/numcodecs/releases/tag/v0.14.1 - Zarrs-Python: - Check: https://ossci.zulipchat.com/#narrow/channel/423692-Zarr/topic/Announcing.20zarrs-python! **Open agenda (add here 👇🏻):** - DB: OME-NGFF hackathon update: - They worked on a Python library which will render DB's library obsolete - good news for DB as he doesn't need to maintain it! - https://github.com/BioImageTools/ome-zarr-models-py - EP: John Bogoviç made good progress on Zarr Java (in the NGFF land) - FIJI being able to open Zarr V3 - DB: `zarr.open()` and `zarr.create()` are confusing - instead we could have `zarr.create_array()` or `zarr.create_group()` to make things clear - DB: Norman has a PR and he's also experimenting to have a Zarr sharded create routine - JM: Would be cool if `zarr.open()` could figure out if it's an array or a group - Zarrs-Python - JM: The nomenclature could've been better! A bit confusing. - DB: Major scope is to improve the IO - JM: https://bsky.app/profile/zarr.dev - Follow us! - https://github.com/zarr-developers/zarr-specs/pull/311 - JMS: The only part which requires spec is the URL - JM: I care about the internal directory structure - DB: The folks who I spoke at the OME hackathon they want the drag and drop feature - JM: In ZP land ## 2024-11-13 **Attending:** Davis Bennett (DB), Eric Perlman (EP), Josh Moore (JM), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS) **TL;DR:** **Updates:** **Open agenda (add here 👇🏻):** - meetings (Josh) - conversation - zarr-python: going strong (weekly) - ZEP: (not great for Dennis) - one off as necessary - community: combine into ZEP. 
- or vice versa - office hours: likely to end - :point_right: run a doodle - People - Dennis: not before 10am MST - Jeremy: ZEP meetings are less critical at the moment - Decisions/TODOs - Only drop ZEP for the moment. Re-evaluate community frequency & time later. (cf. napari timezones) - Josh: remove ZEP calendar entry - Josh: update on Zulip (anywhere else?) - Davis: was "zarr.json" a mistake? - Josh: good question. benefits were: - only one GET (or HEAD) rather than needing a frequent 404 - non-hidden files with proper file-ending - Davis: true. just have a pattern now where want to iterate and bottom out on arrays - Jeremy: often need to load the json anyway - or storage transformers that are needed - Dennis: preference is to have the directories marked (e.g., in the name) - price is mostly paid with large numbers of groups/arrays - cf. consolidated metadata -- locating all objects before reading them - maybe be able to recognize that by name - Davis: give it its own document? - Josh: like `.zmetadata`; downside is it requires an extra GET but that's ok. - Davis: using `_nc` for all attributes now. destroyed lazy evaluation of attributes. seemed to be the way that people were going. have to have a good use-case for a separate file (see S3 and performance issues) - Josh: could additionally gzip the extra file. benchmarking? - Dennis: how big can get the total metadata? (in characters) - Davis: maybe tabular data. - Dennis: in NetCDF, lot of use of groups (1000s) as namespaces. - Davis: Store API - getters take memory type (GPU, CPU, ...) - Josh: good to track (or disallow?) copies - Jeremy: most Stores are CPU, so actively copying for GPU. - Davis: Separate stores as in v2 (regular and chunk) - Davis: Store is simple key/value. Agnostic to Zarr formats. - Is the Store API overloaded? - Davis: On extra files, an extension where sqlite for every group and array. Good for tabular. - Jeremy: sqlite doesn't work for cloud storage. 
- What stops people from doing it today? - Prototype - Is this icechunk? That's more a Store API. - Jeremy: WASM/sqlite forwards to HTTP requests to S3 with read ahead. - Josh: duckdb? - Davis: see BigStitcher's use of arrays. - Jeremy: space of zarr-like stuff. cloud-based databases (queries & writing). Parquet might not support this for the indexing. - Davis: GeoParquet has a spatial index - https://github.com/opengeospatial/geoparquet/pull/191 - Theodoros: interested in adopting Zarr - Problem is that we're dealing with really sparse datasets (mass spec imaging). - Davis: working on Zarr Python v3. A barrier to sparse support is that when we fetch a chunk we turn it into a contiguous array. Requires a redesign. - Efficient encoding of a single sample and "plugged into" zarr. - TV: can also have a time axis. distribution time of ions (for the same mass). (even though not as many pixels) - JMS: a codec could encode it as sparse (but in-memory is dense). Matter of hours. - other step would be full spare support. ton of people have asked for this. but has to be woven throughout. - Davis: tell us what doesn't work for you. "we want to use Zarr but ..." - Josh: https://github.com/GraphBLAS/binsparse-specification - Jeremy: easy to store chunk in a sparse format. implementing all the APIs, e.g. in tensorstore would ask explicitly, "give it to me as a sparse array" ## 2024-10-30 **Attending:** Davis Bennett (DB), Sanket Verma (SV), Gábor Kovács (GB), Jeremy Maitin-Shepard (JMS) **TL;DR:** **Updates:** - DB: Finding bugs in Zarr-Python and removing it - expanding the scope of tests - JMS: Back from the parental leave — the baby is doing great! 
:tada:
    - Been working on bugs for tensorstore
- SV: GeoZarr spec meetings have been updated on the community calendar

**Open agenda (add here 👇🏻):**

- Frequency of the meetings
    - DB: No strong feelings
    - JMS: Less activity, so makes sense
    - GB: Fine by me
    - DB: Unreliable attendance - how to mark meetings as successful and unsuccessful
    - SV: Will open a wide discussion for the community to get everyone's thoughts
- DB: Zarr V2 arrays using sharded arrays?
    - JMS: Not simple enough to do that because of overlapping arrays
    - DB: Zarr V2 codecs can utilise sharding codec
    - JMS: The JSON metadata is different for sharding
    - JMS: Who's the user base?
    - DB: Someone who's using Zarr V2 and wants to use sharding
    - DB: People might be scared of switching to a new format
- DB: Planning to add consolidated metadata and new data types in Zarr-Python, write a doc and publish it for reference
- JMS: The idea of community and core codecs is not super impressive!
    - DB: Would be good to avoid namespacing issues
    - JMS: What would happen if there are multiple JPEG codecs in community and we want to move them to core but there's already one in the core?
    - DB: Good question, need to come up with a process for this
    - JMS: Adding a vendor name could work - value in having a vendor name
- Discussions on upcoming possible extensions

## 2024-10-16

**Attending:** Sanket Verma (SV), Joe Hamman (JH), Eric Perlman (EP), Ilana Rood (IP), Davis Bennett (DB), Michael Sumner (MS), Daniel Adriaansen (DA)

**TL;DR:**

**Updates:**

- The default branch has been changed back to `main` to prepare for V3 main release
    - https://github.com/zarr-developers/zarr-python/pull/2335
- Numcodecs 0.13.1 release soon
    - https://github.com/zarr-developers/numcodecs/pull/592
- VirtualiZarr has a dedicated ZulipChat channel now
    - https://ossci.zulipchat.com/#narrow/stream/461625-VirtualiZarr
    - Check VirtualiZarr repo: https://github.com/zarr-developers/VirtualiZarr
- New OS project release by Earthmover
    - https://earthmover.io/blog/icechunk
    - Transactional storage engine for ND array data on cloud object storage
- Zarr-Python V3 updates
- Any other updates?

**Open agenda (add here 👇🏻):**

- Intro w/ favourite food
    - Sanket - Dumplings
    - Joe - Burrito
    - Eric - Donuts
    - Davis - Ethiopian, Mexican and Indian dishes
    - Michael - RSE at Australian Antarctic Division - Burgers
    - Ilana - Works at Earthmover
    - Daniel - NCAR
- JH: _starts screen sharing_
- JH: Presents on Earthmover, Arraylake, Icechunk...
- JH: _presentation ends_ - time for questions
    - DB: Question on performance plots - what's the difference b/w Zarr V3 and Zarr V3 + Dask?
        - JH: Fetching is done by a different library - we're handling the concurrency better on the IO side
    - DB: What lessons could be taken from this plot that can be applied to Zarr-Python?
        - JH: Python binding to rust crate needs to be looked at
        - JH: Doing decompression and IO in an interleaved fashion
    - SV: Does Icechunk work with Zarr V2?
        - JH: Only with V3 for some parts - but we can change that
    - EP: Able to leverage Zarr sharding in some way for Icechunk would be great
        - JH: We had an opportunity to do something totally different with sharding as it is now in ZP V3, i.e. a codec
        - JH: Implies sharding in a different manner
    - DB: How coupled are you with the current Zarr V3 API?
        - JH: Highly coupled
        - JH: LDeakin has started filing issues
        - JH: Can envision a high-level and a low-level store - that's what we build in the rust store
        - JH: We should ask store to do more, but we should be specific about it
    - MS: Really interested in Rust implementation - Does Rust part take over the encodings?
        - JH: No. We haven't implemented all of ZP yet
        - JH: Codecs, stores and all other stuff is currently handled by ZP - someone interested could come along and build bindings around
    - SV: Bioconductor folks are interested in Zarr-Python V3 - maybe they can be of interest to you
- **Thanks, Joe for giving the presentation! The slides and video recording will be posted online soon! Check our social media for updates!**
- DB: Zarr-Python V3 defines sharding as a codec, not as an explicit V3 feature, so basically you can have Zarr-Python V2 sharded arrays!
    - JH: Try to get sharded V2 data to work, and let us know!

## 2024-10-02

**Attending:** Dennis Heimbigner (DH), Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP)

**TL;DR:**

**Updates:**

- Added Tom and Deepak to Zarr-Python core devs - https://x.com/zarr_dev/status/1838965230438625684
- We had a documentation sprint for Zarr-Python V3
    - The doc sprint officially ended on 10/1 evening. The participants have sent PRs to document the `zarr.array` and `zarr.storage` modules.
Here are the open PRs:
        - https://github.com/zarr-developers/zarr-python/pull/2276
        - https://github.com/zarr-developers/zarr-python/pull/2279
        - https://github.com/zarr-developers/zarr-python/pull/2281
- Zarr-Python V3 team good progress - alpha release every week - V3 main release soon!
    - Making stuff consistent with V2 - looking at Xarray and Dask's tests and they pass
- OME Challenge
    - EP: Was able to convert a big JAX dataset into V3
    - JM: Ran into issues and was able to convert them into Zarr-Python V3 issues
    - More discussion down below
- Any other updates?

**Open agenda (add here 👇🏻):**

- OME Challenge
    - EP: JAX ran into issues for remote access and it's good to point them out and later rectify that
    - EP: Directory list should not be present by default - as it's computation heavy and could hurt your pockets
    - EP: Checking if an object exists before a write could cost us $2k!
    - EP: JZarr is currently being written
    - DH: Any decisions about deleting objects?
        - EP: When you check for existing objects, you have the ability to rewrite them
        - DH: That means you can delete it!
        - DH: NetCDF implements a recursive delete operation
        - EP:
        - DB: If you're inside the array then your metadata would statically define the array and you can easily rewrite it and essentially delete them
        - DH: Having consolidated metadata helps in rewriting operations
        - DB: Defining schema and knowing the entire hierarchy has been helpful
        - DH: We have this in NetCDF
        - DB: https://github.com/janelia-cellmap/pydantic-zarr/
- Endorsing SPEC8 - Securing the release process: https://scientific-python.org/specs/spec-0008/
    - JM: Will need to read this
    - DB: Seems good enough and harmless
- Endorsing SPEC6 - Keys to the Castle: https://scientific-python.org/specs/spec-0006/
    - JM: Sharing password has been a challenge
- DB: https://github.com/zarr-developers/zarr-specs/pull/312
    - JM: Need to merge on https://github.com/zarr-developers/governance/pull/44
    - JM and DB: Discussion on how to move forward and defining the scope of the groups involved in the specification changes
    - DH: Diversity defined on the structure of the internal architecture and not programming language implementation

## 2024-09-18

**Attending:** Sanket Verma (SV), Gábor Kovács (GB), Davis Bennett (DB), Eric Perlman (EP)

**TL;DR:**

**Updates:**

- Zarr-Python V3 doc sprint - 30th Sep. and 1 Oct.
    - Identify missing docs and start creating issues
    - Link existing issues - https://github.com/zarr-developers/zarr-python/issues?q=is%3Aopen+is%3Aissue+label%3AV3+doc
    - Async working via Zoom meetings
- MATLAB (Mathworks) interested in Zarr - want to add support - looking for a C++ implementation
- Worked on a zine comic a couple of weeks ago - https://github.com/numfocus/project-summit-2024-unconference/issues/27
- Updates from Zarr-Python V3 effort / OME challenge
    - DB: Getting through issues from V2 and V3 compatibility
    - Tom and Deepak taking care of Dask issues
    - Alpha releases every week - https://pypi.org/project/zarr/3.0.0a4/#history
    - Defining data types in Zarr V3 - you're gonna see an error if the dtype is not defined
    - Main release by the end of year

**Open agenda (add here 👇🏻):**

- DB: https://github.com/zarr-developers/zarr-python/issues/2170
    - The way of defining sharding codec is not intuitive and can be improved
    - https://github.com/zarr-developers/zarr-python/pull/2169 - proposed solution
    - DB: Will update this PR and make it ready for review
- DB: All stores should have cache: https://github.com/zarr-developers/zarr-python/issues/1500
    - EP: Some stores like S3 would benefit from this
    - EP: Compression and decompression on cache is expensive
    - DB: We can default it to 0 and turn it on accordingly
    - DB: FSSpec has a default cache enabled - we can look into it
- EP: Will try to join the Zarr-Python core devs meetings on Friday
    - SV: Early morning for west coast
    - EP: Can make it!
- SV: Early morning stuff: presented on Zarr V3 at EuroBioc: https://eurobioc2024.bioconductor.org/abstracts/paper-bioc4/

## 2024-09-04

**Attending:** Josh Moore (JM), Dennis Heimbigner (DH), Eric Perlman (EP)

**TL;DR:**

**Updates:**

**Meeting Minutes**:

* Consolidated v2
    - DH: annex for v2, "officially"/loose recognized (would be a **great** favor)
    - JM: and if we put it in v3 to say, "this is the former version"?
    - DH: add a forward pointer
    - EP: how many edge cases?
* Deprecation (DH)
    - JM: No plan to deprecate v2 format (format vs. library)
    - DH: presumably people will use the new library, that will be the "test" of the consolidated metadata.
* Bugs between implementations (JM)
    - DH: list of those bugs? JM: no, but good idea.
    - DH: available data?
    - JM: yes! see https://github.com/ome/ome2024-ngff-challenge?tab=readme-ov-file#challenge-overview
    - EP: billions of objects isn't fun.
* Consolidated v3
    - JM: pushed recently at zarr-python meeting for a spec (and with more design)
    - DH: as soon as it's in the format, then it's not just caching
        - metadata caching prevents multiple reads
    - DH: caching -> "big set of objects, keeping subset in memory"
    - JM: can be re-created? "index"?
    - DH: regardless, have to specify construction of any block of JSON
        - could say a subtree looks like some other pre-defined block
    - JM: parameterized MetadataLoader (or "MetadataDriver")
    - DH: that's what I was going to implement anyway
    - DH: like StorageDrivers (not caching) -- "VirtualAccess"
    - DH: but same wave length
    - JM: would like to offload some JSON (speed vs size)
    - DH: should that API do more than read/write the JSON?
        - should it interpret it?
        - "give me key X out of this dictionary"
        - JM: like mongodb or jq queries
        - DH: walk binary without needing to convert down to JSON
    - EP: this gets back to N5 as an API rather than a format
        - logical versus storage
    - DH: netcdf had to support multiple storages as a wrapper (HDF4, HDF5, DAP)
        - essential to have some virtual object/class
        - hammer applied to everything ("common API")
* EP: https://github.com/zarr-developers/zarr_implementations
    - why didn't that find the codec issues?
    - JM: no v3!
    - EP: hackathon?
    - EP: need mozilla support for HTML things
    - DH: agreed. hugely important
    - JM: As a github action?
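
The "parameterized MetadataLoader" idea from the 2024-09-04 minutes (a driver that answers "give me key X out of this dictionary" regardless of whether the JSON lives in per-node documents or in one consolidated block) could look roughly like the sketch below. Everything in it is hypothetical and only for illustration: the class names `MetadataLoader`, `PerNodeLoader` and `ConsolidatedLoader`, the helper `build_consolidated`, the `consolidated.json` key, and the plain `dict` standing in for a store are assumptions, not part of any Zarr spec or implementation.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict


class MetadataLoader(ABC):
    """Hypothetical driver interface: resolve one key of one node's metadata."""

    @abstractmethod
    def get(self, node_path: str, key: str) -> Any:
        ...


class PerNodeLoader(MetadataLoader):
    """Reads `<node>/zarr.json` from the key/value store on every lookup."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store
        self.reads = 0  # number of store reads, to make the cost visible

    def get(self, node_path: str, key: str) -> Any:
        self.reads += 1
        document = json.loads(self.store[f"{node_path}/zarr.json"])
        return document[key]


class ConsolidatedLoader(MetadataLoader):
    """Reads one consolidated document once, then answers from memory."""

    def __init__(self, store: Dict[str, bytes], name: str = "consolidated.json") -> None:
        self.reads = 1  # the single read of the consolidated document
        self.index = json.loads(store[name])

    def get(self, node_path: str, key: str) -> Any:
        return self.index[node_path][key]


def build_consolidated(store: Dict[str, bytes], name: str = "consolidated.json") -> None:
    """Collect every `<node>/zarr.json` into one document keyed by node path."""
    suffix = "/zarr.json"
    index = {
        key[: -len(suffix)]: json.loads(value)
        for key, value in store.items()
        if key.endswith(suffix)
    }
    store[name] = json.dumps(index).encode()


# Tiny in-memory "store" with one array and one group (made-up content).
store = {
    "a/zarr.json": json.dumps({"node_type": "array", "shape": [4, 4]}).encode(),
    "g/zarr.json": json.dumps({"node_type": "group"}).encode(),
}
build_consolidated(store)

per_node = PerNodeLoader(store)
consolidated = ConsolidatedLoader(store)
shapes = (per_node.get("a", "shape"), consolidated.get("a", "shape"))
kinds = (per_node.get("g", "node_type"), consolidated.get("g", "node_type"))
```

Both loaders return the same answers; the difference is that `per_node.reads` grows with every lookup while `consolidated.reads` stays at one, which is the "metadata caching prevents multiple reads" trade-off noted in the minutes.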
## 2024-08-21

**Attending:** Eric Perlman (EP) and Sanket Verma (SV)

**TL;DR:**

**Updates:**

- Zarr V3 Survey: https://github.com/zarr-developers/zarr-python/discussions/2093
- Zarr-Python developers meeting update
    - removed 15:00 PT meeting and changed 7:00 PT to weekly from bi-weekly occurrence
- Welcome David Stansby as core dev of Zarr-Python! 🎉
    - https://github.com/zarr-developers/zarr-python/pull/2071
- Bunch of PRs got in Zarr-Python - changes around fixing dependencies, maintenance, and testing, see [here](https://github.com/zarr-developers/zarr-python/commits/v3/?since=2024-08-08&until=2024-08-21)

**Open agenda (add here 👇🏻):**

- EP: NGFF challenge to convert V2 data to V3
    - EP had a chat with Josh Moore
    - Repo: https://github.com/ome/ome2024-ngff-challenge
    - EP: Jackson lab will be utilising the docker created by EP for converting V2 data to V3
- SV and EP: Discussions on Tensorstore, Compressions & Codecs, Microscope vendor software and woes of moving places!

## 2024-08-07

**Attending:** Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Thomas Nicholas (TN)

**TL;DR:**

**Updates:**

- Benchmarking tool for determining best spec for writing using Tensorstore: https://github.com/royerlab/czpeedy
- Zarr-Python updates
    - DB: There have been some movements in ZP
    - Discussion around the new API: https://github.com/zarr-developers/zarr-python/discussions/2052
    - Chunks, shards, and other terminology - need to decide what to use
    - Getting more active core-devs for ZP will help in having lively discussion
- TN: Applying for money to work on VirtualiZarr / Zarr upstream
    - TN: Development Seed is applying for the NASA grant - Julia Signell would work on it
    - DB: Non-zero origin for Zarr arrays would help
    - Related issue: https://github.com/zarr-developers/zarr-specs/issues/122

**Open agenda (add here 👇🏻):**

- EP: Discussion about cycling, library and picking up books from library and reading them in the park!
:bicyclist: :books:
- EP/DB: Tangent on writing directly to sharding from microscopes...
- TN: Various ways of storing the large metadata for a huge Zarr array
    - TN: Storing the large metadata in form of a hidden Zarr array - sort of a common theme among the various ZEPs being proposed
    - TN: Seems important because it has come up every time there's a discussion about scalability
    - DB: Store the aggregated information in the header of the chunk
    - SV: How does BSON scale as compared to JSON?
        - TN: We would still need to have a pointer to the BSON in JSON
    - DB: How do we introduce it to the Zarr V3 Spec?
        - TN: Maybe a convention
        - TN: Zarr is close to being a superformat!
        - DB: We could also increment the spec to a major version to include the change
- TN: Discussions on if it's possible for Zarr to be a _superformat_!
    - TN: Some values in the geoscience datasets that are closely related and if compressed will be of huge value - but Zarr can't do that
    - DB: A fundamental Zarr array could be a set of small Zarr arrays
    - TN: VirtualiZarr basically does that
    - TN: _starts screen sharing_
    - DB: The current indexing in Zarr-Python is not ideal and having Zarr arrays made of small arrays sounds much cleaner
    - TN: Hopefully I'd be able to work on this after VirtualiZarr
- Using MyST for Zarr webpages
    - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages

## 2024-07-24

**Attending:** Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Fernando Cervantes (FC), Eric Perlman (EP), Ward Fisher (WF), Thomas Nicholas (TN)

**TL;DR:**

**Updates:**

- SciPy 2024 was great! 🎉
- DB: Zarr-Python updates
    - Sharding codec is pickleable
    - Decision needs to be made about array API
    - How should the sharding codec look to the user?
        - DB: Easy to find if your array is sharded
        - JM: Partial reading this in Zarr V2
        - TIFFfile set a bunch of flags - wonder if those features are friendly for Zarr
        - DB: All the arrays should have sharding configuration
        - JM: Working with Tensorstore, the order of codecs didn't matter --> read_chunks / write_chunks
        - DB: some weirdness when it comes to different backends when uncompressed
- New release - Numcodecs 0.13.0
    - https://numcodecs.readthedocs.io/en/stable/release.html#release-0-13-0
    - Thanks, Ryan!
    - New codec added - Pcodec
    - JM: Conda is unhappy

**Open agenda (add here 👇🏻):**

- Intros
    - SV: Yosemite National Park
    - JM: National Seashore in Florida - Gulf of Mexico
    - FC: Jackson Lab working in ML - Acadia National Park
    - EP: Zion National Park
    - WF: Yellowstone National Park
    - DB: Yellowstone National Park
- TN: Want to open issues on bunch of ideas
    1. Zarr reader to read chunk manifest and bytes offset - currently Xarray handles this
        - Can use Zarr to open NetCDF directly
    2. VirtualiZarr has lazy concatenation of arrays - Xarray has lazy indexing operations for arrays
        - Long standing issue in Xarray to separate the lazy indexing machinery from Xarray - https://github.com/pydata/xarray/issues/5081
    - DB: Could be handled and should be a priority now
    - TN:
    - JM: Agree with Davis with indexing - not sure if the abstraction layer for concatenation is correct!
    - JM: Talked to 2 Napari maintainers - on a problem of chunking
    - TN: A lot of people want to solve the indexing problem but neither Zarr nor Xarray exposes that
    - JM: Finding more people with similar interests would help us provide more engineering power
    - DB: Create a PR with copy pasting code from Xarray!?
        - This could unlock a lot of use cases
    - TN: VirtualiZarr does actually do that - but at the level of chunks rather than indices
    - DB: Slicing and concatenation are duals - if you have both it's complete
    - DB:
    - JM: Query optimisation can be tweaked as we move forward
    - TN: When you do concat and slice you have identified a directed graph - you can optimise that plan - you can also hand off that plan to some reader
    - JM: What does user do with the plan? Do they do something with it?
    - TN: Array API folks have deliberately made arrays lazy
- GPU CI for Zarr-Python
    - https://github.com/zarr-developers/zarr-python/issues/2041
    - GitHub and Cirun sounds good and easy to set up
    - Who pays?
        - Earthmover is ready to pay the cost for initial months and then switch to NF
        - NF has money reserved for projects in the infrastructure committee for similar costs
    - JM: Good to have it!
    - SV: Need to get it sooner than later
- Zarr paper
    - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Appetite.20for.20a.20Zarr.20paper.3F
    - JM: My poster was cited multiple times in the last few weeks
    - JM: JOSS is a potential venue - IETF is more work
    - TN: Submitting to a computing journal - W3C, IEEE, etc.
    - TN: Xarray: https://openresearchsoftware.metajnl.com/articles/10.5334/jors.148
    - JM: NetCDF: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg00087.html
- **TABLED** - Using MyST for Zarr webpages
    - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages

## 2024-07-10

**Attending:** Josh Moore (JM), Davis Bennett (DB), Fernando Cervantes (FC)

**Updates:**

- SciPy! :tada:
- Josh: testing zarr v3
    - issue for each problem? Davis: sure
    - Davis: to be fixed:
        - no validation of fill value
        - multiple bugs with sharding: 1d
    - Josh: missing "attributes"
    - Josh: but neuroglancer working?
        - Davis: not for all static file servers. need PR.
        - Davis: various forks. Josh: plugins?
Davis: tough
        - or: neuroglancer as a component that can be embedded
        - Janelia NG is a React component.
        - "Visualization is tough."
- Motion for food :knife_fork_plate: Seconded.

## 2024-06-26

**Attending:** Brianna Pagān (BP), Thomas Nicholas (TN), Dennis Heimbigner (DH), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB)

**TL;DR:**

**Updates:**

- Zarr-Python 3.0.0a0 out - https://pypi.org/project/zarr/3.0.0a0/
    - Good momentum and lots of things happening with ZP-V3 - aiming for mid July release
- SV represented Zarr at CZI Open Science 2024 meeting - various groups looking forward to V3
    - https://x.com/MSanKeys963/status/1801073720288522466
    - R users at bio-conductor looking to develop bindings for ZP-V3
- New blog post: https://zarr.dev/blog/nasa-power-and-zarr/
- ARCO-ERA5 got updated this week - ~6PB of Zarr data available - check: https://x.com/shoyer/status/1805732055394959819
- https://dynamical.org/ - making weather data easy and accessible to work with
    - Check: https://dynamical.org/about/
    - Video tutorial: https://youtu.be/uR6-UVO_3k8?si=cp0jOxrtKL_I6LfV

**Open agenda (add here 👇🏻):**

- BP: Will be talking about how Zarr is utilised at NASA! - _starts screen sharing and presenting_
    - BP: I work at Goddard GES DISC - deputy manager at one of the centres - manages team of developers and engineers - **not representing all the data centres**
    - BP: Lot of people are coming into Zarr from the SMD (Science mission directorates)
    - BP: Earth Science Division - EOSDIS and Distributed Active Archive Centres (DAACs) - DAACs focus on data distribution and management
    - BP: All the centres coming up with the suggestion on best practices and best format - we discuss with them the possibility of what they can, and should use
    - BP: Moving to cloud optimized format - DAACs have ton of archival data in various formats
    - BP: Projected growth for entire Zarr store across all EOSDIS by 2030 60PB -> 600PB!
    - BP: GES DISC holds 7 PBs of data - we have 3000 different collections of datasets - really diverse!
    - BP: Giovanni - interactive web-based program with 20+ services associated with it - taking the existing data and grooming the metadata so it's accessible and useful across broader range
    - BP: Over at NASA, we do many Zarr stuff...
        - Zarr V2 spec is approved data format convention for use in NASA Earth Science Data Systems (ESDS)
        - Giovanni in cloud - duplicates Zarr (variable based)
        - Open issue: continuously updating Zarr stores - Exploring lakeFS for managing dynamic data
        - ZEP0005
        - Brianna is leading the GeoZarr work
        - VEDA - no. of things Zarr/STAC related going on in VEDA
    - TN: Does Giovanni read Zarr directly? If so which reader does it use? (Can Giovanni use VirtualiZarr?)
        - BP: Giovanni promotes variable first search - most of Giovanni has OpenDAP attached to it - built with overhead with GES DISC pipeline - in hindsight - Yes!
    - TN: From the slides - Xarray can take care of some of the stuff that Giovanni does
    - TN: Very curious about the exact difference between the LakeFS idea and EarthMover’s ArrayLake
        - BP: LakeFS is OS ArrayLake - no vendor lock-in
    - SV: What does Giovanni actually do when you say, ‘it grooms metadata’?
        - BP: Standardizes the grid - flip the grid - naming mechanism - smoothing the metadata so that it works across various services
        - BP: other grooming metadata is for example we have a lot of time dimension issues. that's because of scattered best practices for how to store time metadata
    - TN: Can we do the flipping with Zarr/VirtualiZarr?
        - DB: If you flip at the store level - you'd need to find out how deep you'd need to go
        - BP: Will try to make time standard across the datasets
        - BP: https://github.com/briannapagan/quirky-data-checker
    - BP: _from the Zoom chat_
        - Zarr Storage Specification V2 is an approved data format convention for use in NASA Earth Science Data Systems (ESDS).
https://www.earthdata.nasa.gov/esdis/esco/standards-and-practices
        - Giovanni in the Cloud, duplicate archive, zarr, variable-based: https://cmr.earthdata.nasa.gov/search/variables.umm_json?instance-format=zarr&provider=GES_DISC&pretty=True
        - Open issue: continuously updating zarr stores. Exploring lakeFS for managing dynamic data
        - ZEP 0005: Zarr accumulation extension for optimizing data analysis
        - Looking into a GIS service for zarr stores
        - POWER https://power.larc.nasa.gov/data-access-viewer/
        - https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
        - https://discourse.pangeo.io/t/metadata-duplication-on-stac-zarr-collections/3193/7
- EP: Converting OME datasets to V3 in upcoming months - quirky tool can be useful
    - DB: V2 chunk encoding matches the V3 encoding - you just need to re-write the JSON document
    - DB: Playing with sharding - tensorstore is fast - need to figure out the nomenclature
- EP: The bio and geo world have parallel tracks and working in silos
- EP: https://forum.image.sc/t/ome2024-ngff-challenge/97363
    - DB: The challenge doesn't seem interesting to me! - converting `JSON` documents - instead we should be focusing on converting existing data to sharded stores - much more interesting problem
- EP: Bunch of data is non-Zarr and would be working on to push them to cloud and convert it to Zarr
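
DB's remark in the 2024-06-26 notes, that converting existing data to V3 largely comes down to re-writing the JSON document, can be illustrated with a rough sketch. The function name `v2_array_to_v3`, the partial dtype table, and the choice to handle only C-order, filter-free, uncompressed or gzip-compressed arrays are assumptions made for this example; the field names follow one reading of the V2 and V3 specifications and should be checked against them before use. Chunk files are not touched: the sketch relies on the V3 `v2` chunk key encoding so the existing chunk keys stay valid.

```python
import json
from typing import Optional

# Assumed, partial mapping from V2 NumPy-style dtype strings to V3 data type names.
V2_TO_V3_DTYPE = {
    "|b1": "bool", "|i1": "int8", "|u1": "uint8",
    "<i2": "int16", "<u2": "uint16", "<i4": "int32", "<u4": "uint32",
    "<i8": "int64", "<u8": "uint64", "<f4": "float32", "<f8": "float64",
}


def v2_array_to_v3(zarray: dict, zattrs: Optional[dict] = None) -> dict:
    """Build a V3-style `zarr.json` dict from V2 `.zarray` / `.zattrs` dicts."""
    if zarray.get("order", "C") != "C":
        raise NotImplementedError("F order is out of scope for this sketch")
    if zarray.get("filters"):
        raise NotImplementedError("V2 filters are out of scope for this sketch")
    if zarray.get("fill_value") is None:
        raise NotImplementedError("a null fill_value needs an explicit choice")

    dtype = zarray["dtype"]
    # Array-to-bytes step: only little-endian and single-byte types are mapped.
    bytes_codec = {"name": "bytes"}
    if dtype.startswith("<"):
        bytes_codec["configuration"] = {"endian": "little"}
    codecs = [bytes_codec]

    compressor = zarray.get("compressor")
    if compressor is not None:
        if compressor.get("id") != "gzip":
            raise NotImplementedError("only gzip is mapped in this sketch")
        codecs.append({"name": "gzip", "configuration": {"level": compressor["level"]}})

    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": zarray["shape"],
        "data_type": V2_TO_V3_DTYPE[dtype],  # KeyError for dtypes not in the table
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": zarray["chunks"]}},
        # The "v2" key encoding keeps chunk keys such as "0.0" where they already are.
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": zarray.get("dimension_separator", ".")},
        },
        "fill_value": zarray["fill_value"],
        "codecs": codecs,
        "attributes": zattrs or {},
    }


# Made-up V2 metadata for a small gzip-compressed float32 array.
example = v2_array_to_v3(
    {
        "zarr_format": 2, "shape": [100, 100], "chunks": [10, 10], "dtype": "<f4",
        "compressor": {"id": "gzip", "level": 5}, "fill_value": 0,
        "order": "C", "filters": None, "dimension_separator": "/",
    },
    {"units": "K"},
)
print(json.dumps(example, indent=2))
```

Anything the sketch does not map (other compressors, filters, big-endian or structured dtypes, F order) raises instead of guessing, since those cases need decisions that a pure JSON rewrite cannot make on its own.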
