Joining instructions: https://zoom.us/j/300670033 (password: 558943)
GitHub repo: https://github.com/zarr-developers/community-calls
Previous notes: https://j.mp/zarr-community-1
2025-04-16
Attending: Josh Moore (JMo), Eric Perlman (EP), Justus Magin (JMa), Gábor Kovács (GK)
Open agenda (add here 👇🏻):
- JMa: "signed URLs"
- EP: looking at them for raw data
- e.g. HTTP parameters NOT the AWS thing
- JMo: bug? JMa: Zarr is oversimplifying the use of paths
- Planetary computer pulls an access token, appends to any URL (Zarr or Geo-TIFF or …)
- EP: would try hacking it in in a little fsspec wrapper
- JMa: use-case – trying to access something S3-like that needs parameters
- JMo: does obstore "do the right thing"?
- JMa: Kyle has something that will work for planetary computer, but that's just one endpoint
- EP: with shards can almost use proper signed URLs
- JMa: sparse arrays
- looked at binsparse
- which decomposes into one dimensional arrays
- another level of nesting
- JMo: good format but needs library support, like `.chunks`. Need to be aware of the metadata.
- JMa: encoding the sparseness per chunk?
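For the signed-URL item earlier in this agenda, a minimal sketch of the "append a token to any URL" pattern, roughly what EP's "little fsspec wrapper" would need to do per request. The function name, endpoint and token are hypothetical placeholders, not part of Zarr, fsspec, obstore or any real service. The point it illustrates is JMa's: the store key has to be joined onto the URL path, before the query string, rather than onto the end of the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def signed_key_url(base_url: str, key: str, token_query: str) -> str:
    """Return the URL of one store key with the token parameters appended.

    `base_url` may already carry query parameters; the key is appended to the
    path component only, and all parameters are merged into one query string.
    """
    parts = urlsplit(base_url)
    path = f"{parts.path.rstrip('/')}/{key}"
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += parse_qsl(token_query.lstrip("?"), keep_blank_values=True)
    return urlunsplit(parts._replace(path=path, query=urlencode(params)))


if __name__ == "__main__":
    # Placeholder endpoint and token (assumptions for illustration only).
    base = "https://example.invalid/container/data.zarr"
    token = "sig=abc123"
    print(signed_key_url(base, "zarr.json", token))
    print(signed_key_url(base, "temperature/c/0/0", token))
```

Naive string concatenation (`base + "/" + key`) would put the key after the query string when the base URL is already signed, which is one way a path-only view of stores breaks down.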
2025-04-02
Attending: Davis Bennett (DB), Sanket Verma (SV), Eric Perlman (EP), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)
TL;DR:
Updates:
Open agenda (add here 👇🏻):
- DB: Working to add support for V3 and Tensorstore in Pydantic Zarr
- Also to add group support in Pydantic for Tensorstore
- Appreciate the results by reading and writing in Tensorstore, i.e. returns an object
- DB: elaborates on the version policy change
- DB: EffVer, which is mostly a function of the effort required from users
- JMS:
- DB: https://github.com/zarr-developers/numcodecs/issues/686 (formalise old and new styles of JSON serialisation)
- DB: Numcodecs doesn't interoperate well with Zarr-Python, and some of its code is in Cython, which only a handful of people can maintain
- MS: There have been great developments in the Zarr ecosystem, but things have been moving so fast that I worry it will start to proliferate. It's difficult to keep track of everything
- EP: Most of the Jackson Lab data is in V3 sharded effort
- EP helped with the conversion and Eric Ratamero led the effort
2025-03-19
Attending: Davis Bennett (DB), Abhiram Reddy (AR), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Michael Sumner (MS)
TL;DR:
Updates:
Open agenda (add here 👇🏻):
2025-02-19
Attending: Josh Moore (JM), Sanket Verma (SV), Michael Sumner (MS), Davis Bennett (DB), Eric Perlman (EP)
TL;DR:
Updates:
Open agenda (add here 👇🏻):
- MS using Pizzarr for their work
- GDAL and Pizzarr work for virtual references as well
- Has datasets in HDF5 and NetCDF; both have their peculiarities
- JM: gives an overview of ZEP9
- EP: Jackson lab data conversion
- EP: Cloudflare is potentially working with OS projects and giving them reasonable tier prices
- EP: Raw 10TB nbytes - how do you convert it to sharded V3 array?
- JM: https://github.com/asdf-format/asdf-standard
- EP: Zarr being used in bio space - Folks at the Allen are looking to submit a proposal at SciPy 2025
2025-02-05
Notes TBA
2025-01-22
Attending: Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Jeremy Maitin-Shepard (JMS), Gábor Kovács (GB)
TL;DR:
Updates:
- Zarr-Python 3 released on January 9th, 2025!
Open agenda (add here 👇🏻):
- N5
- JM: There'll be new release to add support for N5 in Zarr-Python 3
- EP: They can leverage sharding and other useful features
- Zarr-Python 3
- DB: Gave a presentation on Zarr-Python 3 at Allen Institute
- DB: Realised some issues in Zarr-Python 2 when listing groups
- JMS: Because ZP 2 listing processes were taking place in parallel
- DB: In the Zarr V2 spec there's nothing that says that groups and arrays are different
- JMS: Looked at the spec as well as the implementation when working with my implementation
- JMS: Added support for ZEP8 in Tensorstore
- Discussion on URL for Neuroglancer
- Deciding the right characters to use
- Tricky to decide the right URL
2025-01-09
Attending: Josh Moore, Eric Perlman, Sanket Verma, Joe Hamman, Davis Bennett, Gábor Kovács, Dennis Heimbigner, Thomas Nicholas, Jeremy Maitin-Shepard
TL;DR:
Updates:
Open agenda (add here 👇🏻):
- EP: the month wait was good to get other projects like napari up-to-speed
- DB: reached out to people using n5 in python. They weren't pinning to `zarr-python<3`. Sent an email. No response. EP anyone? No. Using Zarr.
- JH: Virtualizarr ready for 3.0.0? Failing test (xarray?) but Matt is looking at it.
- TN: Kerchunk doesn't support zarr-python 3.x (API usage)
- without kerchunk: fits & netcdf won't work.
- lose access to anything in the future (in-progress HDF4)
- JH: requires rethinking of MultiZarrToZarr logic
- TN: Doesn't directly interact with zarr-python v3. But want to (to use the v2 to v3 compat objects)
- JH: Would be good to unblock the ZEP process and get ZSC behind on the changes — it's confusing to see ZSTD codec in Zarr-Python 3.0 and not in the spec
- JM: I'll get the ZSC to respond on the long-standing issues
- DH: https://github.com/Unidata/netcdf-c/pull/3068 (Ward will get to reviewing it)
- DB: Sample V3 sharded data: https://github.com/d-v-b/zarr-workbench/tree/main/v3-sharding-compat/data/zarr-3
- JMS: Planning to add Icechunk support to Tensorstore
2024-12-11
Attending: Eric Perlman (EP), Sanket Verma (SV), Gábor Kovács (GB), Ward Fisher (WF), Davis Bennett (DB), Camille Teicheira (CT), Jeremy Maitin-Shepard (JMS)
TL;DR:
Updates:
- Zarr is on BlueSky — follow us https://bsky.app/profile/zarr.dev
- Norman Rzepka has joined Zarr Steering Council — https://zarr.dev/blog/steering-council-update-2024/! Welcome Norman!
- A group of Zarr-Python devs are at AGU this week including Joe and Ryan
- Zarr-Python V3 release before holidays!
- DB has bunch of PRs coming in soon!
- Planning to expose sharding in a user-friendly way
- WF: Had meetings with Florian Ziemann; putting up a PR for V2 consolidated metadata
Open agenda (add here 👇🏻):
- Intros w/ favourite places for holidays
- Sanket — into Himalayas
- Ward - near Colorado
- Davis — Italy
- Eric - coming to India soon!
- Gábor — Canada for Skiing
- Camille - tech lead at https://www.sofarocean.com/ — has lot of weather NetCDF data
- DB: Sharding chunk sizes: can we allow imperfect partitioning of the shard shape?
- JMS: Would be possible to support, but with the current config you have a regular grid for shards and chunks, also resizing would be difficult
- JMS: Could also be based on user preference
- DB: The sharding spec doesn't specifically say anything about the shape — so how and where should we define it?
- JMS: The non-regular/partial chunks would not compose across shards
- DB: I see! The proposal is off the table then!
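A small sketch of the "regular grid for shards and chunks" constraint JMS describes above: every shard holds a whole number of inner chunks along each axis. The helper name is hypothetical and this is not the Zarr-Python or Tensorstore API, only an illustration of the divisibility check that an "imperfect partitioning" would violate.

```python
from math import prod


def chunks_per_shard(shard_shape: tuple, chunk_shape: tuple) -> tuple:
    """Return the number of inner chunks per shard along each axis.

    Raises ValueError when the chunk shape does not evenly divide the shard
    shape, i.e. when the inner chunks would not form a regular grid.
    """
    if len(shard_shape) != len(chunk_shape):
        raise ValueError("shard and chunk shapes must have the same number of dimensions")
    counts = []
    for axis, (shard, chunk) in enumerate(zip(shard_shape, chunk_shape)):
        if chunk <= 0 or shard % chunk != 0:
            raise ValueError(
                f"axis {axis}: chunk size {chunk} does not evenly divide shard size {shard}"
            )
        counts.append(shard // chunk)
    return tuple(counts)


if __name__ == "__main__":
    grid = chunks_per_shard((64, 64), (16, 32))
    print(grid, "->", prod(grid), "chunks per shard")
```

With partial chunks at the shard edge, a chunk's position in the array could no longer be computed from a single global chunk grid, which is the composition problem JMS raises.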
2024-11-27
Attending: Sanket Verma (SV), Eric Perlman (EP), Davis Bennett (DB), Josh Moore (JM), Jeremy Maitin-Shepard (JMS)
TL;DR:
Updates:
- Zarr-Python V3 release in first week of December
- New numcodecs release (includes fixes and improvements)
- Zarrs-Python:
Open agenda (add here 👇🏻):
- DB: OME-NGFF hackathon update:
- They worked on a Python library which will render DB's library obsolete - good news for DB as he doesn't need to maintain it!
- https://github.com/BioImageTools/ome-zarr-models-py
- EP: John Bogoviç made good progress on Zarr Java (in the NGFF land)
- FIJI being able to open Zarr V3
- DB: `zarr.open()` and `zarr.create()` are confusing; instead we could have `zarr.create_array()` or `zarr.create_group()` to make things clear
- DB: Norman has a PR and he's also experimenting to have a Zarr sharded create routine
- JM: Would be cool if `zarr.open()` could figure out if it's an array or a group
- Zarrs-Python
- JM: The nomenclature could've been better! A bit confusing.
- DB: Major scope is to improve the IO
- JM: https://bsky.app/profile/zarr.dev
- https://github.com/zarr-developers/zarr-specs/pull/311
- JMS: The only part which requires spec is the URL
- JM: I care about the internal directory structure
- DB: The folks I spoke to at the OME hackathon want the drag-and-drop feature
- JM: In ZP land
2024-11-13
Attending: Davis Bennett (DB), Eric Perlman (EP), Josh Moore (JM), Dennis Heimbigner (DH), Jeremy Maitin-Shepard (JMS)
TL;DR:
Updates:
Open agenda (add here 👇🏻):
- meetings (Josh)
- conversation
- zarr-python: going strong (weekly)
- ZEP: (not great for Dennis)
- community: combine into ZEP.
- office hours: likely to end
run a doodle
- People
- Dennis: not before 10am MST
- Jeremy: ZEP meetings are less critical at the moment
- Decisions/TODOs
- Only drop ZEP for the moment. Re-evaluate community frequency & time later. (cf. napari timezones)
- Josh: remove ZEP calendar entry
- Josh: update on Zulip (anywhere else?)
- Davis: was "zarr.json" a mistake?
- Josh: good question. benefits were:
- only one GET (or HEAD) rather than needing a frequent 404
- non-hidden files with proper file-ending
- Davis: true. just have a pattern now where want to iterate and bottom out on arrays
- Jeremy: often need to load the json anyway
- or storage transformers that are needed
- Dennis: preference is to have the directories marked (e.g., in the name)
- price is mostly paid with large numbers of groups/arrays
- cf. consolidated metadata – locating all objects before reading them
- maybe be able to recognize that by name
- Davis: give it its own document?
- Josh: like `.zmetadata`; downside is it requires an extra GET but that's ok.
- Davis: using `_nc` for all attributes now. destroyed lazy evaluation of attributes. seemed to be the way that people were going. have to have a good use-case for a separate file (see S3 and performance issues)
- Josh: could additionally gzip the extra file. benchmarking?
- Dennis: how big can the total metadata get? (in characters)
- Davis: maybe tabular data.
- Dennis: in NetCDF, lot of use of groups (1000s) as namespaces.
- Davis: Store API
- getters take memory type (GPU, CPU, …)
- Josh: good to track (or disallow?) copies
- Jeremy: most Stores are CPU, so actively copying for GPU.
- Davis: Separate stores as in v2 (regular and chunk)
- Davis: Store is simple key/value. Agnostic to Zarr formats.
- Is the Store API overloaded?
- Davis: On extra files, an extension where sqlite for every group and array. Good for tabular.
- Jeremy: sqlite doesn't work for cloud storage.
- What stops people from doing it today?
- Prototype
- Is this icechunk? That's more a Store API.
- Jeremy: WASM/sqlite forwards to HTTP requests to S3 with read ahead.
- Josh: duckdb?
- Davis: see BigStitcher's use of arrays.
- Jeremy: space of zarr-like stuff. cloud-based databases (queries & writing). Parquet might not support this for the indexing.
- Davis: GeoParquet has a spatial index
- https://github.com/opengeospatial/geoparquet/pull/191
- Theodoros: interested in adopting Zarr
- Problem is that we're dealing with really sparse datasets (mass spec imaging).
- Davis: working on Zarr Python v3. A barrier to sparse support is that when we fetch a chunk we turn it into a contiguous array. Requires a redesign.
- Efficient encoding of a single sample and "plugged into" zarr.
- TV: can also have a time axis. distribution time of ions (for the same mass). (even though not as many pixels)
- JMS: a codec could encode it as sparse (but in-memory is dense). Matter of hours.
- other step would be full sparse support. ton of people have asked for this. but has to be woven throughout.
- Davis: tell us what doesn't work for you. "we want to use Zarr but …"
- Josh: https://github.com/GraphBLAS/binsparse-specification
- Jeremy: easy to store chunk in a sparse format. implementing all the APIs, e.g. in tensorstore would ask explicitly, "give it to me as a sparse array"
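To illustrate JMS's point that a codec could store a chunk sparsely while the in-memory chunk stays dense, here is a toy encode/decode pair. It is a sketch under stated assumptions: the function names and the JSON payload layout are invented for readability, a flat Python list stands in for the chunk, and this is neither the binsparse format nor the numcodecs or Zarr-Python codec API.

```python
import json


def encode_sparse(chunk: list, fill_value: float = 0.0) -> bytes:
    """Encode a dense, flattened chunk as positions and values of non-fill entries."""
    indices = [i for i, value in enumerate(chunk) if value != fill_value]
    payload = {
        "size": len(chunk),
        "fill_value": fill_value,
        "indices": indices,
        "values": [chunk[i] for i in indices],
    }
    return json.dumps(payload).encode("utf-8")


def decode_sparse(data: bytes) -> list:
    """Rebuild the dense chunk: start from the fill value, then scatter the stored values."""
    payload = json.loads(data.decode("utf-8"))
    dense = [payload["fill_value"]] * payload["size"]
    for index, value in zip(payload["indices"], payload["values"]):
        dense[index] = value
    return dense


if __name__ == "__main__":
    chunk = [0.0] * 1000
    chunk[3] = 1.5
    chunk[700] = 2.0
    encoded = encode_sparse(chunk)
    assert decode_sparse(encoded) == chunk
    print(len(encoded), "bytes encoded vs", len(json.dumps(chunk)), "bytes as dense JSON")
```

The reader still receives a dense chunk, which is why this is the quick option; handing the caller a sparse array directly is the larger change Davis and Jeremy describe.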
2024-10-30
Attending: Davis Bennett (DB), Sanket Verma (SV), Gábor Kovács (GB), Jeremy Maitin-Shepard (JMS)
TL;DR:
Updates:
- DB: Finding bugs in Zarr-Python and removing them; expanding the scope of tests
- JMS: Back from the parental leave — the baby is doing great!
- Been working on bugs for tensorstore
- SV: GeoZarr spec meetings have been updated on the community calendar
Open agenda (add here 👇🏻):
- Frequency of the meetings
- DB: No strong feelings
- JMS: Less activity, so it makes sense
- GB: Fine by me
- DB: Unreliable attendance: how to mark meetings as successful or unsuccessful
- SV: Will open a wide discussion for the community to get everyone's thoughts
- DB: Zarr V2 arrays using sharded arrays?
- JMS: Not simple enough to do that because of overlapping arrays
- DB: Zarr V2 codecs can utilise sharding codec
- JMS: The JSON metadata is different for sharding
- JMS: Who's the user base?
- DB: Someone who's using Zarr V2 and wants to use sharding
- DB: People might be scared of switching to a new format
- DB: Planning to add consolidated metadata and new data types in Zarr-Python, write a doc and publish it for reference
- JMS: The idea of community and core codecs is not super impressive!
- DB: Would be good to avoid namespacing issues
- JMS: What would happen if there are multiple JPEG codecs in community and we want to move them to core but there's already one in the core?
- DB: Good question, need to come up with a process for this
- JMS: Adding a vendor name could work — value in having a vendor name
- Discussions on upcoming possible extensions
2024-10-16
Attending: Sanket Verma (SV), Joe Hamman (JH), Eric Perlman (EP), Ilana Rood (IP), Davis Bennett (DB), Michael Sumner (MS), Daniel Adriaansen (DA)
TL;DR:
Updates:
Open agenda (add here 👇🏻):
- Intro w/ favourite food
- Sanket - Dumplings
- Joe - Burrito
- Eric - Donuts
- Davis - Ethiopian, Mexican and Indian dishes
- Michael - RSE at Australian Antarctic Division - Burgers
- Ilana - Works at Earthmover
- Daniel - NCAR
- JH: starts screen sharing
- JH: Presents on Earthmover, Arraylake, Icechunk…
- JH: presentation ends — time for questions
- DB: Question on performance plots - what's the difference b/w Zarr V3 and Zarr V3 + Dask?
- JH: Fetching is done by a different library - we're handling the concurrency better on the IO side
- DB: What lessons could be taken from this plot that can be applied to Zarr-Python?
- JH: Python binding to rust crate needs to be looked at
- JH: Doing decompression and IO in an interleaved fashion
- SV: Does Icechunk work with Zarr V2?
- JH: Only with V3 for some parts - but we can change that
- EP: Being able to leverage Zarr sharding in some way for Icechunk would be great
- JH: We had an opportunity to do something totally different with sharding as it is now in ZP V3, i.e. a codec
- JH: Implies sharding in a different manner
- DB: How coupled are you with the current Zarr V3 API?
- JH: Highly coupled
- JH: LDeakin has started filing issues
- JH: Can envision a high-level and a low-level store - that's what we build in the rust store
- JH: We should ask store to do more, but we should be specific about it
- MS: Really interested in Rust implementation - Does Rust part take over the encodings?
- JH: No. We haven't implemented all of ZP yet
- JH: Codecs, stores and all other stuff is currently handled by ZP - someone interested could come along and build bindings around
- SV: Bioconductor folks are interested in Zarr-Python V3 — maybe they can be of interest to you
- Thanks, Joe for giving the presentation! The slides and video recording will be posted online soon! Check our social media for updates!
- DB: Zarr-Python V3 defines sharding as a codec, not as an explicit V3 feature, so basically you can have Zarr-Python V2 sharded arrays!
- JH: Try to get sharded V2 data to work, and let us know!
2024-10-02
Attending: Dennis Heimbigner (DH), Sanket Verma (SV), Josh Moore (JM), Davis Bennett (DB), Eric Perlman (EP)
TL;DR:
Updates:
- Added Tom and Deepak to Zarr-Python core devs — https://x.com/zarr_dev/status/1838965230438625684
- We had a documentation sprint for Zarr-Python V3
- Zarr-Python V3 team good progress — alpha release every week — V3 main release soon!
- Making stuff consistent with V2 - looking at the Xarray and Dask tests, and they pass
- OME Challenge
- EP: Was able to convert a big JAX dataset into V3
- JM: Ran into issues and was able to convert them into Zarr-Python V3 issues
- More discussion down below
- Any other updates?
Open agenda (add here 👇🏻):
- OME Challenge
- EP: JAX ran into issues for remote access and it's good to point them out and later rectify that
- EP: Directory list should not be present by default - as it's computation heavy and could hurt your pockets
- EP: Checking if an object exists before a write could cost us $2k!
- EP: JZarr is currently being written
- DH: Any decisions about deleting objects?
- EP: When you check for existing objects, you have the ability to rewrite them
- DH: That means you can delete it!
- DH: NetCDF implements a recursive delete operation
- EP:
- DB: If you're inside the array then your metadata would statically define the array and you can easily rewrite it and essentially delete them
- DH: Having consolidated metadata help in rewriting operations
- DB: Defining schema and knowing the entire hierarchy has been helpful
- DH: We have this in NetCDF
- DB: https://github.com/janelia-cellmap/pydantic-zarr/
- Endorsing SPEC8 — Securing the release process: https://scientific-python.org/specs/spec-0008/
- JM: Will need to read this
- DB: Seems good enough and harmless
- Endorsing SPEC6 - Keys to the Castle: https://scientific-python.org/specs/spec-0006/
- JM: Sharing password has been a challenge
- DB: https://github.com/zarr-developers/zarr-specs/pull/312
- JM: Need to merge on https://github.com/zarr-developers/governance/pull/44
- JM and DB: Discussion on how to move forward and defining the scope of the groups involved in the specification changes
- DH: Diversity defined on the structure of the internal architecture and not programming language implementation
2024-09-18
Attending: Sanket Verma (SV), Gábor Kovács (GB), Davis Bennett (DB), Eric Perlman (EP)
TL;DR:
Updates:
- Zarr-Python V3 doc sprint — 30th Sep. and 1 Oct.
- MATLAB (Mathworks) interested in Zarr - want to add support - looking for a C++ implementation
- Worked on a zine comic couple weeks ago - https://github.com/numfocus/project-summit-2024-unconference/issues/27
- Updates from Zarr-Python V3 effort / OME challenge
- DB: Getting through issues from V2 and V3 compatibility
- Tom and Deepak taking care of Dask issues
- Alpha releases every week - https://pypi.org/project/zarr/3.0.0a4/#history
- Defining data types in Zarr V3 - you're going to see an error if the dtype is not defined
- Main release by the end of year
Open agenda (add here 👇🏻):
2024-09-04
Attending: Josh Moore (JM), Dennis Heimbigner (DH), Eric Perlman (EP)
TL;DR:
Updates:
Meeting Minutes:
- Consolidated v2
- DH: annex for v2, "officially"/loose recognized (would be a great favor)
- JM: and if we put it in v3 to say, "this is the former version"?
- DH: add a forward pointer
- EP: how many edge cases?
- Deprecation (DH)
- JM: No plan to deprecate v2 format (format vs. library)
- DH: presumably people will use the new library, that will be the "test" of the consolidated metadata.
- Bugs between implementations (JM)
- Consolidated v3
- JM: pushed recently at zarr-python meeting for a spec (and with more design)
- DH: as soon as it's in the format, then it's not just caching
- metadata caching prevents multiple reads
- DH: caching -> "big set of objects, keeping subset in memory"
- JM: can be re-created? "index"?
- DH: regardless, have to specify construction any block of JSON
- could say a subtree looks like some other pre-defined block
- JM: parameterized MetadataLoader (or "MetadataDriver")
- DH: that's what I was going to implement anyway
- DH: like StorageDrivers (not caching) – "VirtualAccess"
- DH: but same wave length
- JM: would like to offload some JSON (speed vs size)
- DH: should that API do more than read/write the JSON?
- should it interpret it?
- "give me key X out of this dictionary"
- JM: like mongodb or jq queries
- DH: walk binary without needing to convert down to JSON
- EP: this gets back to N5 as an API rather than a format
- DH: netcdf had to support multiple storages as a wrapper (HDF4, HDF5, DAP)
- essential to have some virtual object/class
- hammer applied to everything ("common API")
- EP: https://github.com/zarr-developers/zarr_implementations
- why didn't that find the codec issues?
- JM: no v3!
- EP: hackathon?
- EP: need mozilla support for HTML things
- DH: agreed. hugely important
- JM: As a github action?
2024-08-21
Attending: Eric Perlman (EP) and Sanket Verma (SV)
TL;DR:
Updates:
- Zarr V3 Survey: https://github.com/zarr-developers/zarr-python/discussions/2093
- Zarr-Python developers meeting update - removed 15:00 PT meeting and changed 7:00 PT to weekly from bi-weekly occurrence
- Welcome David Stansby as core dev of Zarr-Python! 🎉
- Bunch of PRs got in Zarr-Python - changes around fixing dependencies, maintenance, and testing, see here
Open agenda (add here 👇🏻):
- EP: NGFF challenge to convert V2 data to V3 - EP had a chat with Josh Moore
- SV and EP: Discussions on Tensorstore, Compressions & Codecs, Microscope vendor software and woes of moving places!
2024-08-07
Attending: Davis Bennett (DB), Eric Perlman (EP), Sanket Verma (SV), Thomas Nicholas (TN)
TL;DR:
Updates:
- Benchmarking tool for determining the best spec for writing using Tensorstore: https://github.com/royerlab/czpeedy
- Zarr-Python updates
- DB: There have been some movements in ZP
- TN: Applying for money to work on VirtualiZarr / Zarr upstream
Open agenda (add here 👇🏻):
- EP: Discussion about cycling, library and picking up books from library and reading them in the park!
- EP/DB: Tangent on write directly to sharding from microscopes…
- TN: Various ways of storing the large metadata for a huge Zarr array
- TN: Storing the large metadata in form of a hidden Zarr array - sort of a common theme among the various ZEPs being proposed
- TN: Seems important because it has come up every time there's a discussion about scalability
- DB: Store the aggregated information in the header of the chunk
- SV: How does BSON scale compared to JSON?
- TN: We would still need to have a pointer to the BSON in JSON
- DB: How do we introduce it to the Zarr V3 Spec?
- TN: Maybe a convention
- TN: Zarr is close to being a superformat!
- DB: We could also increment the spec to a major version to include the change
- TN: Discussions on whether it's possible for Zarr to be a superformat!
- TN: Some values in geoscience datasets are closely related, and compressing them together would be of huge value, but Zarr can't do that
- DB: A fundamental Zarr array could be a set of small Zarr arrays
- TN: VirtualiZarr basically does that
- TN: starts screen sharing
- DB: The current indexing in Zarr-Python is not ideal and having Zarr arrays made of small arrays sounds much cleaner
- TN: Hopefully I'd be able to work on this after VirtualiZarr
- Using MyST for Zarr webpages - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Moving.20from.20Jekyll.20.E2.80.94.20Zarr.20webpages
2024-07-24
Attending: Davis Bennett (DB), Josh Moore (JM), Sanket Verma (SV), Fernando Cervantes (FC), Eric Perlman (EP), Ward Fisher (WF), Thomas Nicholas (TN)
TL;DR:
Updates:
- SciPy 2024 was great! 🎉
- DB: Zarr-Python updates
- Sharding codec is pickleable
- A decision needs to be made about the array API
- How should the sharding codec look to the user?
- DB: Easy to find if your array is sharded
- JM: Partial reading this in Zarr V2
- TIFFfile set a bunch of flags - wonder if those features are friendly for Zarr
- DB: All the arrays should have sharding configuration
- JM: Working with Tensorstore, the order of codecs didn't matter –> read_chunks / write_chunks
- DB: some weirdness when it comes to different backends when uncompressed
- New release - Numcodecs 0.13.0 - https://numcodecs.readthedocs.io/en/stable/release.html#release-0-13-0 - Thanks, Ryan!
- New codec added - Pcodec
- JM: Conda is unhappy
Open agenda (add here 👇🏻):
- Intros
- SV: Yosemite National Park
- JM: National Seashore in Florida - Gulf of Mexico
- FC: Jackson Lab working in ML - Acadia National Park
- EP: Zion National Park
- WF: Yellowstone National Park
- DB: Yellowstone National Park
- TN: Want to open issues on bunch of ideas
-
- Zarr reader to read chunk manifest and bytes offset - currently Xarray handles this
- Can use Zarr to open NetCDF directly
-
- VirtualiZarr has lazy concatenation of arrays - Xarray has lazy indexing operations for arrays
- DB: Could be handled and should be a priority now
- TN:
- JM: Agree with Davis with indexing - not sure if the abstraction layer for concatenation is correct!
- JM: Talked to 2 Napari maintainers - on a problem of chunking
- TN: A lot of people want to solve the indexing problem but neither Zarr nor Xarray exposes that
- JM: Finding more people with similar interests would help us provide more engineering power
- DB: Create a PR by copy-pasting code from Xarray!? - This could unlock a lot of use cases
- TN: VirtualiZarr does actually do that - but at the level of chunks rather than indices
- DB: Slicing and concatenation are duals - if you have both, it's complete
- DB:
- JM: Query optimisation can be tweaked as we move forward
- TN: When you do concat and slice you have identified a directed graph - you can optimise that plan - you can also hand off that plan to some reader
- JM: What does user do with the plan? Do they do something with it?
- TN: Array API folks have deliberately made arrays lazy
- GPU CI for Zarr-Python - https://github.com/zarr-developers/zarr-python/issues/2041
- GitHub and Cirun sounds good and easy to setup
- Who pays? - Earthmover is ready to pay the cost for initial months and then switch to NF
- NF has money reserved for projects in the infrastructure committee for similar costs
- JM: Good to have it!
- SV: Need to get it sooner rather than later
- Zarr paper - https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/Appetite.20for.20a.20Zarr.20paper.3F
- TABLED
2024-07-10
Attending: Josh Moore (JM), Davis Bennett (DB), Fernando Cervantes (FC)
Updates:
- SciPy!
- Josh: testing zarr v3
- issue for each problem? Davis: sure
- Davis: to be fixed:
- no validation of fill value
- multiple bugs with sharding: 1d
- Josh: missing "attributes"
- Josh: but neuroglancer working?
- Davis: not for all static file servers. need PR.
- Davis: various forks. Josh: plugins? Davis: tough
- or: neuroglancer as a component that can be embedded
- Janelia NG is a React component.
- "Visualization is tough."
- Motion for food
Seconded.
2024-06-26
Attending: Brianna Pagān (BP), Thomas Nicholas (TN), Dennis Heimbigner (DH), Eric Perlman (EP), Sanket Verma (SV), Davis Bennett (DB)
TL;DR:
Updates:
Open agenda (add here 👇🏻):
- BP: Will be talking about how Zarr is utilised at NASA!
- starts screen sharing and presenting
- BP: I work at Goddard GES DISC - deputy manager at one of the centres - manages team of developers and engineers - not representing all the data centres
- BP: Lot of people are coming into Zarr from the SMD (Science mission directorates)
- BP: Earth Science Division - EOSDIS and Distributed Active Archive Centres (DAACs) - DAACs focus on data distribution and management
- BP: All the centres coming up with the suggestion on best practices and best format - we discuss with them the possibility of what they can, and should use
- BP: Moving to cloud optimized format - DAACs have ton of archival data in various formats
- BP: Projected growth for entire Zarr store across all EOSDIS by 2030 60PB -> 600PB!
- BP: GES DISC holds 7 PBs of data - we have 3000 different collections of datasets - really diverse!
- BP: Giovanni - an interactive web-based program with 20+ services associated with it - taking the existing data and grooming the metadata so it's accessible and useful across a broader range
- BP: Over at NASA, we do many Zarr stuff…
- Zarr V2 spec is approved data format convention for use in NASA Earth Science Data Systems (ESDS)
- Giovanni in cloud - duplicates Zarr (variable based)
- Open issue: continuously updating Zarr stores - Exploring lakeFS for managing dynamic data
- ZEP0005
- Brianna is leading the GeoZarr work
- VEDA - a number of Zarr/STAC-related things going on in VEDA
- TN: Does Giovanni read Zarr directly? If so which reader does it use? (Can Giovanni use VirtualiZarr?)
- BP: Giovanni promotes variable-first search - most of Giovanni has OpenDAP attached to it - built with overhead from the GES DISC pipeline - in hindsight - Yes!
- TN: From the slides - Xarray can take care of some of the stuff that Giovanni does
- TN: Very curious about the exact difference between the LakeFS idea and EarthMover’s ArrayLake
- BP: LakeFS is OS ArrayLake - no vendor lock-in
- SV: What does Giovanni actually do when you say, ‘it grooms metadata’?
- BP: Standardizes the grid - flip the grid - naming mechanism - smoothing the metadata so that it works across various services
- BP: Other metadata grooming: for example, we have a lot of time dimension issues. That's because of scattered best practices for how to store time metadata
- TN: Can we do the flipping with Zarr/VirtualiZarr?
- DB: If you flip at the store level - you'd need to find out how deep you'd need to go
- BP: Will try to make time standard across the datasets
- BP: https://github.com/briannapagan/quirky-data-checker
- BP: from the Zoom chat
- EP: Converting OME datasets in V3 in upcoming months - quirky tool can be useful
- DB: V2 chunk encoding matches the V3 encoding - you just need to re-write the JSON document
- DB: Playing with sharding - tensorstore is fast - need to figure out the nomenclature
- EP: The bio and geo world have parallel tracks and working in silos
- EP: https://forum.image.sc/t/ome2024-ngff-challenge/97363
- DB: The challenge doesn't seem interesting to me! - converting `JSON` documents - instead we should be focusing on converting existing data to sharded stores - a much more interesting problem
- EP: Bunch of data is non-Zarr and would be working on to push them to cloud and convert it to Zarr
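DB's remark that converting only requires rewriting the JSON document can be sketched for the simplest case. This is a hedged illustration, not the conversion tool used in the challenge: the function name is hypothetical, it only handles uncompressed, C-order, little-endian arrays without filters, and it assumes a fill value of 0 when the V2 metadata has none. It relies on the V3 `v2` chunk key encoding so that existing chunk files can stay where they are.

```python
from typing import Optional

# Partial mapping from V2 (NumPy-style) dtype strings to V3 data type names.
# Only little-endian multi-byte types are covered in this sketch.
_V2_TO_V3_DTYPE = {
    "<i2": "int16", "<i4": "int32", "<i8": "int64",
    "<u2": "uint16", "<u4": "uint32", "<u8": "uint64",
    "<f4": "float32", "<f8": "float64",
}


def v2_array_to_v3(zarray: dict, zattrs: Optional[dict] = None) -> dict:
    """Build V3 `zarr.json` content from parsed V2 `.zarray` and `.zattrs` documents."""
    if zarray.get("zarr_format") != 2:
        raise ValueError("expected Zarr V2 array metadata")
    if zarray.get("compressor") is not None or zarray.get("filters"):
        raise NotImplementedError("compressor/filter mapping is out of scope for this sketch")
    if zarray.get("order", "C") != "C":
        raise NotImplementedError("only C order is handled in this sketch")
    dtype = zarray["dtype"]
    if dtype not in _V2_TO_V3_DTYPE:
        raise NotImplementedError(f"dtype {dtype!r} is not handled in this sketch")
    fill_value = zarray.get("fill_value")
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": zarray["shape"],
        "data_type": _V2_TO_V3_DTYPE[dtype],
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": zarray["chunks"]}},
        # Keep V2-style chunk keys so the chunk files themselves are untouched.
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": zarray.get("dimension_separator", ".")},
        },
        # Assumption: fall back to 0 when V2 stored no fill value.
        "fill_value": 0 if fill_value is None else fill_value,
        "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
        "attributes": zattrs or {},
    }
```

Compressed arrays would additionally need each V2 compressor mapped to a V3 codec entry, and rewriting into sharded stores (the problem DB finds more interesting) requires touching the chunk data itself.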