Upstreaming Kerchunk
Summary
We aim to upstream much of the functionality of kerchunk into Zarr and Zarr-python, through a series of individually-useful features.
Context / motivation:
Problems with kerchunk as-is:
Proposal:
VirtualiZarr package
VirtualZarrArray allows for wrapping with xarray, greatly streamlining the user experience for data providers tasked with giving access to data via Zarr.
Roadmap
We are really talking about a whole roadmap of features here. They can be broken up, and each has an MVP. The top-level list gives the features; the inner-level list gives the steps that should be tried to create each MVP.
Feature 0: Storage transformers in zarr-python v3
Idea: Make sure the Zarr-Python 3.0 implementation is developed enough to allow adding features 1 and 2 below.
Steps:
Feature 1: "Chunk Manifest" indexing into legacy formats
Idea: Formalize kerchunk’s format for storing byte ranges via a new zarr extension, the so-called “chunk manifest”.
Steps:
kerchunk.hdf.SingleHDF5ToZarr and manipulating the result),
MVP: Read this test array from multiple languages
Milestone: Get the chunk manifest ZEP accepted into the Zarr Spec, and implemented in zarr-python
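To make the chunk manifest idea concrete, here is a minimal sketch in Python. It assumes a manifest is a plain mapping from a chunk key to the path, byte offset, and byte length of that chunk inside an existing file. The layout, the `read_chunk` helper, and the stand-in `legacy.bin` file are all hypothetical, chosen for illustration; they are not the format proposed in the ZEP.

```python
import json
import os
import tempfile


def read_chunk(manifest, key):
    """Return the raw bytes of one chunk by seeking into the referenced file."""
    entry = manifest[key]
    with open(entry["path"], "rb") as f:
        f.seek(entry["offset"])
        return f.read(entry["length"])


# Build a stand-in "legacy" file: a 4-byte header followed by two 8-byte chunks.
tmpdir = tempfile.mkdtemp()
legacy_path = os.path.join(tmpdir, "legacy.bin")
with open(legacy_path, "wb") as f:
    f.write(b"HDR0" + b"A" * 8 + b"B" * 8)

# Hypothetical manifest layout: chunk key -> byte range in the legacy file.
manifest = {
    "0": {"path": legacy_path, "offset": 4, "length": 8},
    "1": {"path": legacy_path, "offset": 12, "length": 8},
}

# The manifest is plain data, so it survives a JSON round trip unchanged.
manifest_roundtrip = json.loads(json.dumps(manifest))

chunk0 = read_chunk(manifest_roundtrip, "0")
chunk1 = read_chunk(manifest_roundtrip, "1")
```

Because the manifest holds only paths and byte ranges, a reader in any language with a JSON parser and file or HTTP range reads could resolve the same chunks, which is what the multi-language MVP would exercise.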
Feature 2: Virtual Concatenation inside Zarr stores
Idea: Formalize the idea of virtual concatenation at the Zarr level via another new zarr extension
Steps:
MVP: Read a Zarr array that was defined through concatenation
Milestone: Get the virtual concatenation ZEP accepted into the Zarr Spec, and implemented in zarr-python
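One way to picture virtual concatenation is as a re-keying of chunk references, with no chunk data read or copied. The sketch below assumes 1-D arrays that share the same chunk length, arrays whose lengths are whole multiples of that chunk length, and manifests keyed by contiguous integer chunk indices starting at 0. The function name `concatenate_manifests` and the file names are hypothetical; the actual extension may represent concatenation quite differently.

```python
def concatenate_manifests(manifests):
    """Virtually concatenate 1-D chunked arrays along axis 0.

    Each manifest maps an integer chunk index to a (path, offset, length)
    reference. Only the chunk indices are shifted; the references themselves
    are reused untouched.
    """
    combined = {}
    offset = 0
    for manifest in manifests:
        for index in sorted(manifest):
            combined[offset + index] = manifest[index]
        # Assumes the keys of each manifest are exactly 0..len(manifest)-1.
        offset += len(manifest)
    return combined


# Two hypothetical legacy files, holding two chunks and one chunk respectively.
a = {0: ("a.nc", 100, 40), 1: ("a.nc", 140, 40)}
b = {0: ("b.nc", 100, 40)}

combined = concatenate_manifests([a, b])
```

Here `combined` has three chunks: the first two still point into `a.nc` and the third points into `b.nc`, so the concatenated array is defined entirely by references.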
Feature 3: VirtualZarrArray python object
Idea: Replace the overloaded kerchunk.combine.MultiZarrToZarr class with a virtual array type so that all combining of legacy file data can be expressed as array concatenations.
Steps:
KerchunkArray prototype),
np.empty_like
VirtualZarrArray objects and serializable too.
MVP: Prototype VirtualZarrArray class that supports concatenation and serialization to Zarr on-disk
Milestone: Fully-developed VirtualZarrArray class that supports concatenation, indexing, NaNs, and serialization, which lives either in zarr-python or in a separate new package ("VirtualiZarr")
Feature 4: Xarray wrapping VirtualZarrArray objects
Idea: Make it easy to use xarray semantics (e.g. xr.concat or xr.open_mfdataset) to combine many legacy files into one Zarr store.
Steps:
VirtualZarrArray instead of as a numpy/dask array) - see the KerchunkArray notebook linked above.
VirtualZarrArray to disk as a new valid zarr array.
MVP: Gist showing how to open legacy files as xarray-wrapped VirtualZarrArrays and concatenate them
Milestone: Provide the xarray backend and accessor along with documentation, living either in zarr-python or in a separate new package ("VirtualiZarr").
Impact
The end result of this would allow us to:
Example datasets