Let’s start from an idealised world. Assume we have an ideal code that can encode K symbols into N symbols such that any K of the N can restore the original K, for arbitrarily large K and N. We call R = K/N the code rate.
We encode our data (K symbols) to N symbols and sample Q of them. A false positive occurs when the data is not recoverable (at most K−1 pieces present, i.e. at least N−K+1 pieces missing) yet all Q samples happen to be there; its probability is P_FP <= ((K-1)/N)^Q < (K/N)^Q = R^Q
Let’s assume symbols are bits. That sounds cool: with R = 1/2 and a target P_FP < 1e-9, we just need to sample 30 bits, since (1/2)^30 ≈ 9.3e-10.
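The sample count above follows directly from the bound P_FP <= R^Q. A minimal sketch (the function name `min_samples` is just illustrative) that solves R^Q <= target for the smallest integer Q:

```python
import math

def min_samples(rate, target_fp):
    """Smallest Q such that rate**Q <= target_fp, from P_FP <= R^Q."""
    return math.ceil(math.log(target_fp) / math.log(rate))

print(min_samples(0.5, 1e-9))  # 30
```

With R = 1/2 the required Q grows only logarithmically in 1/P_FP: each extra sample halves the false positive bound.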
In both cases, if the test passes, we know with high probability that the data is there. If instead the test fails, the only thing we know is that we can’t be sure whether the data is there. Note, however, that there are large regions of loss where the data is in fact recoverable, yet the test fails almost certainly.
Even in this ideal case, we should distinguish between the centralised and the distributed setting:
In between the two, there is the case where groups of symbols are stored per node, but there are also multiple nodes. How should we pick Q in this case?
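One way to start exploring that question is a Monte Carlo sketch (everything here — the function `estimate_fp`, the parameter choices, and the adversary model — is a hypothetical assumption, not part of the scheme above): put N symbols on M nodes with G = N/M symbols each, let the adversary withhold just enough whole nodes that fewer than K symbols survive, and estimate the false positive rate of sampling q random nodes, all of whose symbols must be present.

```python
import random

def estimate_fp(n_nodes, per_node, k, q, trials=100_000, seed=0):
    """Estimate P_FP when whole nodes are erased and whole nodes are sampled."""
    rng = random.Random(seed)
    n = n_nodes * per_node
    # minimum number of whole nodes to erase so that fewer than K symbols remain
    erased = (n - k) // per_node + 1
    present = n_nodes - erased
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(n_nodes), q)
        # model: nodes 0..present-1 are intact, the rest fully erased
        if all(node < present for node in sample):
            hits += 1
    return hits / trials

# e.g. N = 1024 symbols on M = 64 nodes, K = 512, sampling q = 10 nodes
print(estimate_fp(64, 16, 512, 10))
```

Since each sampled node checks G symbols at once, far fewer node samples than individual symbol samples are needed for the same confidence, at least against this whole-node erasure model.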
We should start thinking about
Now, let’s move away from some of the idealistic assumptions:
Now, for this last one (C, D), the “unit” has many names: vector, block, cell, segment… but overall it is the “unit of erasure” from the erasure code’s point of view. This means that the guarantees of the erasure code hold only if I treat the whole unit as follows: if any bit of the vector/block/cell/segment is changed, I consider the whole vector/block/cell/segment erased.
This, however, is just the guarantee coming from the erasure code itself. Things can still be repairable depending on what was actually lost from the underlying vectors, and under assumptions of random erasures we can still quantify what happens.
Importantly, it does NOT mean that I have to reconstruct the whole vector if there is some loss in it. If I can somehow identify which part of the vector is lost, I can restore that part individually. Even if the whole vector is lost, I do not need to restore it as a whole: I can do it symbol by symbol, going through the vector linearly.
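To make the symbol-by-symbol point concrete, here is a toy sketch using a simple XOR parity across two data vectors (a hypothetical (3,2) code, much weaker than the general codes discussed above, but enough to illustrate the mechanics): a fully lost vector is rebuilt one symbol at a time from the surviving vectors.

```python
# Two data vectors and one parity vector (illustrative values).
a = bytes([1, 2, 3, 4])
b = bytes([9, 8, 7, 6])
parity = bytes(x ^ y for x, y in zip(a, b))

# Suppose vector `a` is entirely lost: walk the surviving vectors
# linearly and restore each symbol of `a` independently.
recovered = bytearray()
for i in range(len(parity)):
    recovered.append(b[i] ^ parity[i])

assert bytes(recovered) == a
```

Each position i is recovered using only position i of the other vectors, so partial loss within a vector needs only the corresponding positions repaired.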
As mentioned before, we can also go probabilistic in this dimension; this is what I call sub-sampling. And if we pre-commit to an erasure-coded version of this, we can even reinforce the sub-sampling. This is what I called sub-sampling with erasure code. The important part is committing to the erasure-coded version; it does not matter whether the parity symbols are actually stored or not.