There are several concurrent requirements that our erasure coding and data placement need to satisfy, so we first need to understand what these requirements are before engaging in the minutiae of specific layouts. I'll try to outline these requirements at a high level and suggest a few directions that I believe might be interesting to explore in this context.
Our requirements are concurrent and in some cases conflicting, so some level of compromise is expected; however, we should be careful to make the right kind of compromises. I've been using the following framework to help decide where compromises are most acceptable (in order of importance):
We have several high-level requirements, which can be classified as immediate and future.
Our immediate requirements are:
Our future requirements are:
NOTE: We should future-proof our solution with these requirements in mind.
Another factor to keep in mind is metadata. Metadata carries significant storage overhead and thus has the potential to become a complicating factor if not managed carefully.
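To make the overhead concrete, here is a rough back-of-the-envelope calculation; the 64 KiB block size and ~36-byte hash entry are illustrative assumptions, not actual Codex parameters:

```python
# Back-of-the-envelope manifest size. Block size and per-hash entry
# size are illustrative assumptions, not actual Codex parameters.
DATASET_SIZE = 1 << 40    # 1 TiB dataset
BLOCK_SIZE = 64 * 1024    # assumed 64 KiB blocks
HASH_ENTRY = 36           # assumed ~36 bytes per multihash entry

num_blocks = DATASET_SIZE // BLOCK_SIZE   # 16,777,216 blocks
manifest_size = num_blocks * HASH_ENTRY   # ~604 MB of hashes alone
print(f"{num_blocks:,} blocks -> {manifest_size / 1e6:.0f} MB manifest")
```

Even before any erasure coding metadata is added, the hash list alone grows linearly with the dataset, which is why metadata management deserves attention up front.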
In our current system, all metadata is kept in a manifest file, which is stored and distributed as a regular block on the Codex network. The manifest contains the list of hashes that comprise the dataset. A manifest can be "protected", which means that it contains additional erasure coding parameters, as well as the parity blocks alongside the original ones.
Note that the manifest doesn't contain the actual blocks, only their hashes. Blocks are stored independently of the manifest, and several manifests can point to the same subset of blocks, which may be stored uniquely or duplicated across storage sets or other unrelated nodes.
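As a rough illustration (not the actual wire format; all field names here are hypothetical), a manifest along these lines could be modeled as:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ErasureParams:
    # Hypothetical EC parameters carried by a "protected" manifest.
    k: int  # original (data) blocks per group
    m: int  # parity blocks per group

@dataclass
class Manifest:
    # Hashes of the dataset's blocks, in dataset order. The blocks
    # themselves are stored and distributed independently, so several
    # manifests may reference the same blocks.
    blocks: list[str] = field(default_factory=list)
    # Present only on "protected" manifests; parity block hashes are
    # recorded alongside the original ones.
    erasure: Optional[ErasureParams] = None
```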
So far, we've been able to keep the complexity of the metadata to a minimum. Concretely, block hashes are recorded sequentially, and their position in the blocks list/array determines their position in the dataset. This approach has kept the complexity of retrieval to a minimum: downloading a dataset simply means reading all the hashes in the blocks array and requesting them sequentially from the network or the local store. This might not be the case anymore once we factor in all the requirements. For example, the actual layout of the data might change based on the specific EC layout we choose, i.e. blocks might no longer be laid out sequentially, which would require some level of correction at the access level; this seems like a reasonable overhead/complexity given all of the above requirements.
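To illustrate the access-level correction this implies, the sketch below contrasts today's sequential scan with a hypothetical column-wise EC layout; `fetch_block` and the specific interleaving are assumptions for illustration only:

```python
def download(manifest, fetch_block):
    # Current behavior: position in the manifest's block list is the
    # position in the dataset, so retrieval is a straight scan.
    return [fetch_block(h) for h in manifest.blocks]

def download_interleaved(manifest, fetch_block, k: int):
    # Hypothetical column-wise EC layout with k rows (assumes k evenly
    # divides the block count): dataset position i lives at manifest
    # index (i % k) * (n // k) + (i // k), so access needs exactly one
    # extra index-remapping step over the sequential case.
    n = len(manifest.blocks)
    cols = n // k
    return [fetch_block(manifest.blocks[(i % k) * cols + (i // k)])
            for i in range(n)]
```

The point is that the correction is a pure index transformation: the manifest itself stays a flat list of hashes, and only the reader's access pattern changes.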
Given all the current and future requirements, it seems reasonable to think that we need some level of future-proofing for our manifests. In this context, Cids already provide some level of control, but versioning might be a better way of signaling features?
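As a sketch of what version-based signaling might look like (the `version` field and both decoders are purely hypothetical, not an existing Codex format):

```python
def decode_v1(raw: dict) -> dict:
    # Current-style format: a flat, sequential list of block hashes.
    return {"blocks": raw["blocks"]}

def decode_v2(raw: dict) -> dict:
    # Hypothetical future format that adds EC layout metadata.
    return {"blocks": raw["blocks"], "layout": raw["layout"]}

def decode_manifest(raw: dict) -> dict:
    # Dispatch on an explicit version field rather than inferring
    # features from the Cid; the field name "version" is an assumption.
    version = raw.get("version", 1)
    if version == 1:
        return decode_v1(raw)
    if version == 2:
        return decode_v2(raw)
    raise ValueError(f"unsupported manifest version: {version}")
```

An explicit version makes new features opt-in and lets old clients fail loudly on manifests they can't interpret, rather than misreading them.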
Given the complexity of the requirements, I propose we address them iteratively. The current approach documented in #85 has some obvious flaws that conflict with the requirements as laid out above. Some ideas are proposed in #119.