# Zarr chunk/shard introspection
what users want to know vs. what the API provides
Status: written 2026-06-03, verified against zarr-python `main` (`3.2.2.dev40+gfe229107`) and the xarray `poc/unified-zarr-chunk-grid` branch. All example outputs below were produced by running the code, not inferred.
## Context
Zarr v3.2.0 introduced rectilinear (variable-sized) chunk grids behind the `array.rectilinear_chunks` config flag. This split the formerly one-property world (`Array.chunks`) into a family of partially overlapping accessors: `.chunks`, `.shards`, `.read_chunk_sizes`, `.write_chunk_sizes`. Downstream consumers (xarray's zarr backend being the motivating case) need to answer a small set of questions about any array, and today some of those answers require exception handling or imports from `zarr.core` internals. This doc maps the questions to the API and lists concrete improvement proposals.
## The four array configurations
| | Regular grid | Rectilinear grid |
|---|---|---|
| **Unsharded** | the classic case | chunk edges enumerated in metadata |
| **Sharded** | regular outer shards, regular inner chunks | rectilinear outer shards, regular inner chunks (rectilinear *inner* chunks with sharding are rejected at creation: `"Rectilinear chunks with sharding"`) |
## Question matrix
Each cell says how to get the answer, or why you can't.
| What the user wants to know | Regular | Regular + sharded | Rectilinear | Rectilinear + sharded |
|---|---|---|---|---|
| **Declared chunk shape** (what was passed at creation, unclipped) | `.chunks` → `(30,)` | `.chunks` → inner chunk shape | n/a — no single shape; the declared edge listing is in metadata | n/a for outer; `.chunks` raises even though inner chunks are regular ⚠️ |
| **Declared shard shape** (unclipped) | `.shards` → `None` | `.shards` → `(60,)` | `.shards` → `None` | **raises `NotImplementedError`** ⚠️ |
| **Is the array sharded?** | `.shards is not None` | `.shards is not None` | `.shards is not None` | **unanswerable without `try/except`** — `.shards` raises ⚠️ |
| **Is the chunk grid regular or rectilinear?** | **no public API** — either `try: .chunks / except NotImplementedError` or `isinstance(array.metadata.chunk_grid, RegularChunkGridMetadata)` with imports from `zarr.core.metadata.v3` ⚠️ | same | same | same |
| **Per-chunk read sizes** (dask convention, clipped to extent, inner under sharding) | `.read_chunk_sizes` → `((30, 30, 30, 10),)` | `.read_chunk_sizes` → inner sizes | `.read_chunk_sizes` → `((10, 20, 30),)` | `.read_chunk_sizes` → inner sizes, e.g. `((10,)*12,)` |
| **Per-storage-chunk sizes** (outer/shard granularity, clipped) | `.write_chunk_sizes` (= read sizes) | `.write_chunk_sizes` → `((60, 40),)` | `.write_chunk_sizes` (= read sizes) | `.write_chunk_sizes` → `((60, 40, 20),)` |
| **Chunk boundaries/offsets** (for write-alignment validation) | `itertools.accumulate` over a sizes listing | same | same | same |
| **Number of chunks per dimension** | `len()` of a sizes listing, or `ceil(shape / chunks)` per dimension from `.shape` and `.chunks` | same | `len()` of a sizes listing only (`.chunks` raises) | same as rectilinear |
| **Can I open/create this at all?** | always | always | requires `zarr.config.set({"array.rectilinear_chunks": True})` for both read and write; metadata parse raises otherwise | same |
⚠️ marks the cells where the API forces exception-driven control flow, internal imports, or has no answer.
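The boundary and count rows of the matrix need nothing beyond a sizes listing. A minimal sketch, using the `read_chunk_sizes` value from the regular column; `chunk_bounds` and `is_chunk_aligned` are hypothetical helper names, not zarr or xarray API:

```python
from itertools import accumulate


def chunk_bounds(sizes):
    """Cumulative chunk edges for one dimension, starting at 0."""
    return (0, *accumulate(sizes))


def is_chunk_aligned(start, stop, sizes):
    """True if the half-open region [start, stop) begins and ends on chunk edges."""
    bounds = set(chunk_bounds(sizes))
    return start in bounds and stop in bounds


# Listing as reported for shape=(100,), chunks=(30,) in the matrix above.
read_chunk_sizes = ((30, 30, 30, 10),)

bounds = tuple(chunk_bounds(dim) for dim in read_chunk_sizes)
n_chunks = tuple(len(dim) for dim in read_chunk_sizes)

print(bounds)    # ((0, 30, 60, 90, 100),)
print(n_chunks)  # (4,)
print(is_chunk_aligned(30, 90, read_chunk_sizes[0]))  # True
print(is_chunk_aligned(30, 95, read_chunk_sizes[0]))  # False
```

For write-alignment checks on a sharded array, the listing to pass would presumably be `write_chunk_sizes`, since that is the storage granularity.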
## Key semantic distinctions (easy to get wrong)
**Clipped vs. declared.** Both `read_chunk_sizes` and `write_chunk_sizes` clip boundary chunks to the array extent. The declared shape is only recoverable from `.chunks` / `.shards`. Verified examples:
```python
zarr.create_array({}, shape=(5,), chunks=(100,), dtype="i4").read_chunk_sizes # ((5,),) — the declared 100 is gone
zarr.create_array({}, shape=(100,), chunks=(30,), dtype="i4").read_chunk_sizes # ((30, 30, 30, 10),)
zarr.create_array({}, shape=(100,), chunks=(10,), shards=(60,), dtype="i4").write_chunk_sizes # ((60, 40),) — declared (60,) clipped
```
Consequence for round-trips: a consumer that captures only the clipped listings and writes them back will change the on-disk declaration. For a short array, a declared chunk size of 100 becomes 5. A regular grid with a partial final chunk becomes a rectilinear grid, since zarr's `ChunkGrid.from_sizes` only collapses uniform *full-coverage* edge lists to a regular grid.
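To make the information loss concrete, the clipping can be modelled in plain Python. This is a sketch of the arithmetic only, assuming a regular grid along one dimension; `clipped_sizes` is a hypothetical helper and not how zarr computes the listings internally:

```python
def clipped_sizes(extent, chunk):
    """Per-chunk sizes for a regular grid, with the last chunk clipped to the extent."""
    full, remainder = divmod(extent, chunk)
    return (chunk,) * full + ((remainder,) if remainder else ())


print(clipped_sizes(100, 30))  # (30, 30, 30, 10)
print(clipped_sizes(100, 60))  # (60, 40)
print(clipped_sizes(5, 100))   # (5,)

# Two different declarations yield the same clipped listing, so the listing
# alone cannot tell a consumer which declaration to write back.
print(clipped_sizes(5, 100) == clipped_sizes(5, 5))  # True
```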
**Inner vs. outer under sharding.** `.chunks` and `read_chunk_sizes` report inner (read-granularity) chunks; `.shards` and `write_chunk_sizes` report outer (storage-granularity) shards. For unsharded arrays the two listings coincide, but equality does not imply that an array is unsharded: a sharded array with one inner chunk per shard produces identical listings. `read_chunk_sizes == write_chunk_sizes` therefore cannot be used as a sharding test.
**`.shards` is tri-state.** `None` means unsharded, a tuple means sharded with a regular shard grid, and a raised `NotImplementedError` means sharded with a rectilinear shard grid. The `None`-ness doubles as the "is sharding enabled" signal, so the raise in the rectilinear case takes that signal down with it.
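Until a non-raising predicate exists, a consumer can fold the three behaviours of `.shards` into explicit states. The sketch below relies only on the behaviour described in this section; `ShardState`, `shard_state`, `is_sharded`, and the `FakeArray` stand-in are hypothetical names introduced for illustration, and `FakeArray` exists only so the example runs without a store:

```python
from enum import Enum


class ShardState(Enum):
    UNSHARDED = "unsharded"
    REGULAR = "sharded, regular shards"
    RECTILINEAR = "sharded, rectilinear shards"


def shard_state(array):
    """Map the three behaviours of `.shards` onto explicit states."""
    try:
        shards = array.shards
    except NotImplementedError:
        # Assumption taken from the text: only sharded rectilinear arrays raise here.
        return ShardState.RECTILINEAR
    return ShardState.UNSHARDED if shards is None else ShardState.REGULAR


def is_sharded(array):
    """Non-raising sharding test built on shard_state."""
    return shard_state(array) is not ShardState.UNSHARDED


class FakeArray:
    """Hypothetical stand-in for zarr.Array that mimics only `.shards`."""

    def __init__(self, shards=None, raises=False):
        self._shards = shards
        self._raises = raises

    @property
    def shards(self):
        if self._raises:
            raise NotImplementedError("shard sizes vary")
        return self._shards


print(shard_state(FakeArray()))              # ShardState.UNSHARDED
print(shard_state(FakeArray(shards=(60,))))  # ShardState.REGULAR
print(shard_state(FakeArray(raises=True)))   # ShardState.RECTILINEAR
print(is_sharded(FakeArray(raises=True)))    # True
```

The caveat with this approach is that it treats any `NotImplementedError` from `.shards` as "sharded rectilinear", which holds only as long as that is the sole reason the property raises.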
## Improvement proposals (zarr-side)
Ordered roughly by value to downstream consumers. These fit naturally into the rectilinear stabilization window (the config flag is slated for removal within ~6 months of 3.2.0).
1. **Public grid-kind predicate.** `Array.is_regular_chunk_grid -> bool` (or expose the grid kind as an enum/property). Today the only options are catching `NotImplementedError` from `.chunks` or `isinstance` against `zarr.core.metadata.v3.RegularChunkGridMetadata`, which is not public API. Every consumer that supports both grid kinds needs this branch; none should need internal imports for it.
2. **Non-raising sharding predicate.** `Array.is_sharded -> bool`. Restores the "is sharding enabled" signal that `.shards` loses for rectilinear grids. The `.shards` raise itself is defensible (the return type cannot express varying shard sizes, mirroring `.chunks`), so add a predicate rather than change the raise.
3. **Declared (unclipped) sizes accessor.** Something like `Array.declared_chunk_sizes`, or a public view of the chunk-grid declaration, so that round-trip consumers can preserve on-disk declarations without stitching together `.chunks`, `.shards`, and two exception handlers. For rectilinear grids the listing is the declaration, so this is mostly about regular grids with partial or oversized boundary chunks.
4. **`.chunks` for sharded rectilinear arrays.** The inner chunk shape is regular in this configuration (rectilinear inner chunks are rejected with sharding), yet `.chunks` raises because the check is on the outer grid type. Arguably `.chunks` could answer with the inner chunk shape here; at minimum the error message could note that `read_chunk_sizes` gives the (regular) inner sizes.
5. **Flag-removal timeline.** Library consumers cannot reasonably ask their users to set `zarr.config.set({"array.rectilinear_chunks": True})`; xarray currently documents this requirement. Removing the flag (or providing a per-call opt-in) unblocks making rectilinear support transparent downstream.
## What this means for xarray (current consumer workarounds)
- `open_store_variable` branches with `try: tuple(zarr_array.chunks) / except NotImplementedError: <sizes listing>` — would become an `is_regular_chunk_grid` check under proposal 1.
- `encoding["shards"] = zarr_array.shards` is currently unguarded and crashes `open_zarr` on any sharded rectilinear store (verified end-to-end); the fix needs `try/except NotImplementedError → write_chunk_sizes`, which proposal 2 would reduce to a plain conditional.
- `encoding["chunks"]` must come from `.chunks` for regular grids (declared shape, see clipping above) and from `read_chunk_sizes` for rectilinear grids; proposal 3 would collapse this to one accessor.
- The `requires_zarr_rectilinear_chunks` test gate and the documented `zarr.config.set` requirement both disappear with proposal 5.
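The first three workarounds can be sketched together as one helper. This is an illustration of the fallback logic described above, not xarray's actual code; `chunk_encoding` and `FakeArray` are hypothetical names, and the sample values are the ones quoted in the question matrix:

```python
def chunk_encoding(zarr_array):
    """Collect chunk and shard encoding, falling back to sizes listings when a property raises."""
    try:
        chunks = tuple(zarr_array.chunks)      # regular grid: declared, unclipped shape
    except NotImplementedError:
        chunks = zarr_array.read_chunk_sizes   # rectilinear grid: per-chunk listing
    try:
        shards = zarr_array.shards             # None, or the declared shard shape
    except NotImplementedError:
        shards = zarr_array.write_chunk_sizes  # sharded rectilinear: per-shard listing
    return {"chunks": chunks, "shards": shards}


class FakeArray:
    """Hypothetical stand-in for zarr.Array that mimics the raises described in this doc."""

    def __init__(self, read_chunk_sizes, write_chunk_sizes, chunks=None, shards=None,
                 rectilinear=False, sharded=False):
        self.read_chunk_sizes = read_chunk_sizes
        self.write_chunk_sizes = write_chunk_sizes
        self._chunks = chunks
        self._shards = shards
        self._rectilinear = rectilinear
        self._sharded = sharded

    @property
    def chunks(self):
        if self._rectilinear:
            raise NotImplementedError("chunk sizes vary")
        return self._chunks

    @property
    def shards(self):
        if self._rectilinear and self._sharded:
            raise NotImplementedError("shard sizes vary")
        return self._shards


regular = FakeArray(((30, 30, 30, 10),), ((30, 30, 30, 10),), chunks=(30,))
rect_sharded = FakeArray(((10,) * 12,), ((60, 40, 20),), rectilinear=True, sharded=True)

print(chunk_encoding(regular))       # {'chunks': (30,), 'shards': None}
print(chunk_encoding(rect_sharded))  # chunks: the twelve inner sizes; shards: ((60, 40, 20),)
```

Under proposals 1 to 3 both `try/except` blocks would reduce to plain conditionals or a single accessor call.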