# Problems with the registered attributes mechanism (and a proposed solution)
## Governance issues
- 1.1 Centralized approval
    - Requiring the ZSC to bless chunk-grids, chunk-key-encodings, and data-types seems appropriate given their impact on Zarr implementations. Many registered attributes, however, do not affect implementations, so requiring them to go through the ZSC seems like an unnecessary delay. A broader group of people could maintain registered attributes.
- 1.2 Steering Council lacks domain expertise for all areas
    - Some extension names may be confusing to the entire ZSC yet useful to the community proposing them. Requiring ZSC approval would delay adoption and reduce momentum in those communities.
- 1.3 No clear resolution process if ZSC members disagree about a registered attribute
    - E.g., [#21](https://github.com/zarr-developers/zarr-extensions/pull/21) - Ryan approved and Josh has reservations; how does the proposer know what happens next?
- 1.4 Contradictory normative/non-normative messages in the zarr-extensions repository
    - The [zarr-extensions](https://github.com/zarr-developers/zarr-extensions) repository starts with "This repository is the normative source of registered names for the Zarr v3 specification." but later says "Strictly speaking, registered attributes are not extensions, because the attributes dictionary in Zarr arrays and groups may be populated with arbitrary metadata" and "implementations do not have strict guarantees about the contents of the attributes dictionary and are not required to fail if the attributes dictionary contains unknown keys." Taken together, these statements are confusing. It would be better to separate the management of normative components from the optional catalog of conventions.
- 1.5 Registered attributes require licensing the convention under the CC Attribution 3.0 Unported License
    - This seems unnecessarily restrictive; it would be nice to at least have a mechanism for sharing commercial extensions.
## Discoverability issues
- 2.1 No URLs in the metadata for finding specifications, so a library must first search the zarr-extensions repository to discover the options
- 2.2 Metadata doesn't provide sufficient information to determine whether a convention is used
    - E.g., while unlikely, someone could be using `geo:proj` for a different purpose than https://github.com/zarr-developers/zarr-extensions/pull/21. Registered attributes map to keys in the Zarr metadata, but the presence of a key does not imply the registered attribute, so a tool cannot guarantee that the convention is in use. The same applies to `_FillValue`.
## Evolution issues
- 3.1 No maturity classification system
    - This leads to a very high bar for centralized approval. For example, with a maturity system we could have assigned a `pilot` classification to the `geo:proj` proposal ([#21](https://github.com/zarr-developers/zarr-extensions/pull/21)) to expedite merging.
- 3.2 No deprecation mechanism
    - This also leads to a high bar for merging and makes it harder for users to decide whether to adopt a registered attribute. If a `deprecated` status were available for registered attributes, the keys they use could easily be released for use by other conventions.
- 3.3 No clear versioning strategy for attributes
    - There is no mechanism to iterate on an attribute convention, or for a Zarr implementation or tool to know which iteration was used.
## Proposed solution
Adopt STAC's mechanism for attribute-based extensions, which addresses the governance, discoverability, and evolution issues identified above.
### Key components of the STAC approach
#### 1. Self-describing extensions with schema URIs
Each extension includes a schema URI directly in the metadata that points to its specification. For example:
```json
{
  "attributes": {
    "zarr_extensions": [
      "https://zarr-extensions.github.io/projection/v1.0.0/schema.json"
    ],
    "proj:epsg": 4326
  }
}
```
This solves **discoverability issues 2.1 and 2.2** by:
- Providing direct URLs to specifications in the metadata
- Making it explicit which conventions are being used
- Allowing tools to programmatically validate against the schema
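
As a rough sketch of how a tool might use such a declaration, the Python snippet below only treats a convention as in use when its URI is explicitly listed. The `zarr_extensions` key and the schema URL are copied from the example above; the helper names `declared_conventions` and `uses_convention` are hypothetical and not part of any existing Zarr API.

```python
import json

# URI copied from the example above; treated here as an opaque identifier.
PROJECTION_URI = "https://zarr-extensions.github.io/projection/v1.0.0/schema.json"


def declared_conventions(metadata: dict, key: str = "zarr_extensions") -> list:
    """Return the convention URIs declared under `attributes`, or an empty list."""
    declared = metadata.get("attributes", {}).get(key, [])
    # Ignore malformed declarations rather than failing on them.
    if not isinstance(declared, list):
        return []
    return [uri for uri in declared if isinstance(uri, str)]


def uses_convention(metadata: dict, uri: str) -> bool:
    """True only if the convention is explicitly declared in the metadata."""
    return uri in declared_conventions(metadata)


declared = json.loads("""
{
  "attributes": {
    "zarr_extensions": [
      "https://zarr-extensions.github.io/projection/v1.0.0/schema.json"
    ],
    "proj:epsg": 4326
  }
}
""")
# Same key, but no declaration: the tool cannot assume the convention applies.
undeclared = {"attributes": {"proj:epsg": 4326}}

print(uses_convention(declared, PROJECTION_URI))    # True
print(uses_convention(undeclared, PROJECTION_URI))  # False
```

The second call illustrates issue 2.2: without a declaration, the presence of `proj:epsg` alone tells the tool nothing about which convention, if any, the writer followed.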
#### 2. Decentralized governance with multiple registries
STAC maintains a [central registry](https://github.com/stac-extensions/stac-extensions.github.io) but allows extensions to be hosted anywhere. Extensions are classified by maturity:
- **Stable**: Well-tested, widely adopted, breaking changes require new major version
- **Candidate**: Implemented and in use, but may still evolve
- **Proposal**: Under discussion, experimental
- **Deprecated**: No longer recommended for new implementations
This addresses **governance issues 1.1, 1.2, and 1.3** by:
- Removing the requirement for centralized approval for all extensions
- Allowing domain experts to maintain their own extensions
- Providing a clear maturity pathway without blocking innovation
And addresses **evolution issue 3.1** by providing a maturity classification system.
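
One way a registry could expose these levels to tools is sketched below. The four level names come from the list above; the `REGISTRY` contents, the `example.org` URIs, and the `adoption_advice` helper are purely illustrative assumptions, not an existing registry format.

```python
from enum import Enum


class Maturity(Enum):
    """Maturity levels from the list above, modeled as a registry field."""
    PROPOSAL = "proposal"
    CANDIDATE = "candidate"
    STABLE = "stable"
    DEPRECATED = "deprecated"


# Hypothetical registry entries; the URIs are placeholders, not real conventions.
REGISTRY = {
    "https://example.org/conv-a/v1.0.0/schema.json": Maturity.STABLE,
    "https://example.org/conv-b/v0.1.0/schema.json": Maturity.PROPOSAL,
    "https://example.org/conv-c/v1.0.0/schema.json": Maturity.DEPRECATED,
}


def adoption_advice(uri: str) -> str:
    """Map a convention's maturity to a short message for users."""
    maturity = REGISTRY.get(uri)
    if maturity is None:
        # Unlisted conventions remain usable; the registry just has no opinion.
        return "unregistered: usable, but not listed in this registry"
    if maturity is Maturity.DEPRECATED:
        return "deprecated: avoid in new datasets"
    if maturity is Maturity.STABLE:
        return "stable: safe to adopt"
    return f"{maturity.value}: may still change"


for uri in REGISTRY:
    print(uri, "->", adoption_advice(uri))
```

Because the maturity level lives in the registry rather than in an approval gate, a convention can be listed early as a proposal and promoted later without changing the datasets that already use it.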
#### 3. Semantic versioning in schema URIs
Extension URIs include version numbers (e.g., `v1.0.0`, `v1.1.0`, `v2.0.0`), making it explicit which version is being used:
```json
{
  "stac_extensions": [
    "https://stac-extensions.github.io/eo/v1.1.0/schema.json",
    "https://example.org/custom-extension/v0.2.0/schema.json"
  ]
}
```
This solves **evolution issues 3.2 and 3.3** by:
- Providing clear versioning for each extension
- Allowing deprecation of older versions while maintaining backward compatibility
- Enabling tools to know exactly which version of a convention is used
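
A tool could recover the version from such a URI roughly as follows. This is a minimal sketch that assumes every URI ends in `/v<major>.<minor>.<patch>/schema.json`, as in the examples above; the `ConventionRef` type, the helper names, and the "same major version" compatibility rule are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class ConventionRef(NamedTuple):
    base: str   # URI prefix that identifies the convention itself
    major: int
    minor: int
    patch: int


# Assumed URI shape: <base>/v<major>.<minor>.<patch>/schema.json
_URI_PATTERN = re.compile(
    r"^(?P<base>.+)/v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)/schema\.json$"
)


def parse_convention_uri(uri: str) -> Optional[ConventionRef]:
    """Split a schema URI into convention base and version, or return None."""
    m = _URI_PATTERN.match(uri)
    if m is None:
        return None
    return ConventionRef(m["base"], int(m["major"]), int(m["minor"]), int(m["patch"]))


def same_major_version(declared: str, supported: str) -> bool:
    """True if both URIs name the same convention with the same major version."""
    a, b = parse_convention_uri(declared), parse_convention_uri(supported)
    if a is None or b is None:
        return False
    return a.base == b.base and a.major == b.major


ref = parse_convention_uri("https://stac-extensions.github.io/eo/v1.1.0/schema.json")
print(ref.base, ref.major, ref.minor, ref.patch)
```

Under semantic versioning, a reader written against `v1.0.0` can reasonably attempt to read `v1.1.0` metadata, while a `v2.0.0` URI signals a breaking change it should not silently accept.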
#### 4. Namespace prefixes tied to extensions
Attributes are namespaced (e.g., `proj:epsg`, `eo:cloud_cover`) and the prefix is tied to the extension URI, preventing collisions and ambiguity.
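
The tie between prefix and URI can be checked mechanically. The sketch below flags prefixed keys whose prefix is not owned by any declared convention. The `PREFIX_BY_URI` table is a hypothetical stand-in for whatever lookup a registry or schema would provide, and the `zarr_extensions` key follows the earlier example.

```python
# Hypothetical lookup from convention URI to the namespace prefix it owns.
PREFIX_BY_URI = {
    "https://zarr-extensions.github.io/projection/v1.0.0/schema.json": "proj",
    "https://stac-extensions.github.io/eo/v1.1.0/schema.json": "eo",
}


def undeclared_prefixed_keys(attributes: dict, declaration_key: str = "zarr_extensions") -> list:
    """Return prefixed attribute keys whose prefix no declared convention owns."""
    declared_uris = attributes.get(declaration_key, [])
    declared_prefixes = {
        PREFIX_BY_URI[uri]
        for uri in declared_uris
        if isinstance(uri, str) and uri in PREFIX_BY_URI
    }
    flagged = []
    for key in attributes:
        prefix, sep, _ = key.partition(":")
        # Keys without a ":" are not namespaced and are left alone.
        if sep and prefix not in declared_prefixes:
            flagged.append(key)
    return sorted(flagged)


attrs = {
    "zarr_extensions": [
        "https://zarr-extensions.github.io/projection/v1.0.0/schema.json"
    ],
    "proj:epsg": 4326,
    "eo:cloud_cover": 12,
}
print(undeclared_prefixed_keys(attrs))  # ['eo:cloud_cover']
```

A tool would likely treat flagged keys as opaque user metadata rather than as errors, since the attributes dictionary may hold arbitrary content.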
### How this would work for Zarr
#### Array metadata example
```json
{
  "zarr_format": 3,
  "node_type": "array",
  "attributes": {
    "zarr_conventions": [
      "https://zarr.dev/conventions/v1.0.0/schema.json",
      "https://zarr.dev/conventions/geo-proj/v0.1.0/schema.json"
    ],
    "_FillValue": 0,
    "geo:proj": {
      "type": "GeographicCRS",
      "id": "EPSG:4326"
    }
  }
}
```
`_FillValue` Model 1, where `_FillValue` is a sentinel value:
```json
{
  "zarr_format": 3,
  "node_type": "array",
  "attributes": {
    "_FillValue": 0
  }
}
```
`_FillValue` Model 2, where `_FillValue` is the uninitialized value:
```json
{
  "zarr_format": 3,
  "node_type": "array",
  "attributes": {
    "_FillValue": 0
  }
}
```
`_FillValue` Model 1 with a declared convention, where `_FillValue` is a sentinel value:
```json
{
  "zarr_format": 3,
  "node_type": "array",
  "attributes": {
    "zarr_conventions": [
      "https://zarr.dev/conventions/fill-value/v1.0.0/schema.json"
    ],
    "_FillValue": 0
  }
}
```
`_FillValue` Model 2 with a declared convention, where `_FillValue` is the uninitialized value:
```json
{
  "zarr_format": 3,
  "node_type": "array",
  "attributes": {
    "zarr_conventions": [
      "https://zarr.dev/conventions/fill-value/v2.0.0/schema.json"
    ],
    "_FillValue": 0
  }
}
```
`_FillValue` Model 2 declared through a different convention URI, where `_FillValue` is the uninitialized value:
```json
{
  "zarr_format": 3,
  "node_type": "array",
  "attributes": {
    "zarr_conventions": [
      "https://zarr.dev/conventions/my-fill-value-value/v1.0.0/schema.json"
    ],
    "_FillValue": 0
  }
}
```
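
To make the contrast between these `_FillValue` examples concrete, the sketch below shows how a reader might decide what `_FillValue` means. The three URIs and their meanings are taken from the examples above; the `FILL_VALUE_SEMANTICS` table and the `fill_value_semantics` helper are hypothetical names used only for illustration.

```python
from typing import Optional

# URIs from the examples above, mapped to the meaning each example assigns.
FILL_VALUE_SEMANTICS = {
    "https://zarr.dev/conventions/fill-value/v1.0.0/schema.json": "sentinel",
    "https://zarr.dev/conventions/fill-value/v2.0.0/schema.json": "uninitialized",
    "https://zarr.dev/conventions/my-fill-value-value/v1.0.0/schema.json": "uninitialized",
}


def fill_value_semantics(metadata: dict) -> Optional[str]:
    """Return 'sentinel', 'uninitialized', 'ambiguous', or None if no _FillValue."""
    attrs = metadata.get("attributes", {})
    if "_FillValue" not in attrs:
        return None
    declared = attrs.get("zarr_conventions", [])
    meanings = {
        FILL_VALUE_SEMANTICS[uri]
        for uri in declared
        if isinstance(uri, str) and uri in FILL_VALUE_SEMANTICS
    }
    # Exactly one declared meaning is the only unambiguous case.
    if len(meanings) == 1:
        return meanings.pop()
    return "ambiguous"


bare = {"attributes": {"_FillValue": 0}}
declared_v1 = {
    "attributes": {
        "zarr_conventions": ["https://zarr.dev/conventions/fill-value/v1.0.0/schema.json"],
        "_FillValue": 0,
    }
}
print(fill_value_semantics(bare))         # ambiguous
print(fill_value_semantics(declared_v1))  # sentinel
```

The two undeclared examples are byte-for-byte identical, so no reader can tell them apart; only the declared variants carry enough information to choose an interpretation.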
#### Governance model
1. **Core extensions** (those affecting implementations like chunk grids, codecs): Require ZSC approval, maintained in the official zarr-specs repository
2. **Community extensions** (domain-specific conventions): Can be hosted anywhere, maintained by domain experts, listed in a community registry with maturity badges
3. **Private/experimental extensions**: Can use any URI, no registration required
This resolves **governance issue 1.4** by clearly separating normative components (core extensions) from optional conventions (community extensions).
#### Licensing flexibility
Extensions can use any license appropriate for their domain, with the license specified in the extension schema. The registry only requires that the license be clearly stated.
This addresses **governance issue 1.5** by not mandating a single license for all extensions.
### Benefits of this approach
1. **Faster iteration**: Domain experts can publish and iterate on extensions without waiting for ZSC approval
2. **Clear semantics**: Metadata is self-describing with explicit version information
3. **Tool support**: Tools can validate metadata against schemas and handle unknown extensions gracefully
4. **Backward compatibility**: Old metadata remains valid as extensions evolve
5. **Reduced governance burden**: ZSC focuses on core specification, community maintains conventions
6. **Discoverability**: Extensions are discoverable both through URLs in metadata and through a searchable registry
## Notes
- Recommend a namespace for the attributes
`namespace:`
- Talk about them as conventions rather than extensions
- Overloading of the word extension vs. metadata convention vs. composition with other formats (e.g., putting STAC)
- URL
    - How do we preserve?
    - Are URIs available long term?
    - Are they unique?
    - URL should be resolvable, but implementations do not need to resolve it
- Background
    - ZEP9, ZEP10 tried this
    - Why a centralized repository?
    - easier than ZEP
    - got stuck on URLs