Outsourcing local erasure coding
--------------------------------
The purpose of local erasure coding (we used to call this "slot-level EC" in old Codex) is to increase the strength of storage proofs based on random sampling.
The core principle behind this idea is the distance amplification property of Reed-Solomon codes.
The concept is simple: If we encode $K$ data symbols into $N$ code symbols, then for the data to be irrecoverably lost, you need to lose at least $N-K+1$ symbols (it's a bit more complicated with data corruption, but let's ignore that for now). In a typical case of $N=2K$, this means that checking just one randomly chosen symbol gives you approximately $p\approx 1/2$ probability of detecting data loss.
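As a quick back-of-the-envelope illustration, here is a small sketch with illustrative parameters (sampling is modeled with replacement):

```python
# Illustrative parameters: rate-1/2 Reed-Solomon code, and an adversary who
# deletes the minimum number of symbols (N - K + 1) making the data irrecoverable.
K = 2**20                       # data symbols
N = 2 * K                       # code symbols
lost = N - K + 1                # minimal irrecoverable loss

p_detect = lost / N             # chance that one random sample hits a lost symbol (~1/2)
for q in (1, 20, 80, 160):
    p_miss = (1 - p_detect) ** q    # all q samples hit intact symbols
    print(f"{q:4d} samples -> probability of missing the loss ~ {p_miss:.2e}")
```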
### Outsourcing to the provider
In "old Codex", this encoding (together with the network-level erasure coding) was done by the client before uploading.
However, it would be preferable to outsource the local encoding to the providers, for several reasons:
- the providers typically have more computational resources than the clients (especially if the client is for example a mobile phone)
- because the network chunks are held by different providers, the work could be distributed among several providers, further decreasing the per-provider work
- if it's the provider who does it, it can be postponed until enough data (possibly many small pieces from many different clients) is accumulated to make the resulting data unit size economical
However, in general we don't want to trust the provider(s), but instead verify that they did it correctly.
We need to verify 3 things:
- the original data (authenticated by a Merkle root or similar commitment) is kept intact;
- the encoding is done correctly;
- the new Merkle root corresponds to the encoded data.
Fortunately, there is a way to do all this.
### FRI low-degree check
The FRI protocol (short for "Fast Reed-Solomon Interactive Oracle Proofs of Proximity") is an interactive protocol in which the prover convinces the verifier that a Merkle-committed vector of finite field elements is _close to_ a Reed-Solomon codeword (with rate $\rho=2^{-r}$).
Note that we obviously cannot do better than "close to" without checking every single element of the vector (which again wouldn't be a useful approach), so "close to" must be an acceptable compromise.
However, in the ideal situation, if the precise distance bound $\delta$ of the "close to" concept is small enough, then there is at most one codeword within that radius ("unique decoding regime"). In that situation errors can be corrected simply by replacing the committed vector with the closest codeword. A somewhat more complicated situation is the so-called "list decoding regime".
As usual we can make this non-interactive via the Fiat-Shamir heuristic.
This gives us a relatively simple plan of attack:
- the client (uploader) calculates the Merkle root of the original data (assumed to have size $K=2^k$)
- the provider calculates the parity data (same amount, so the encoding has rate $\rho=1/2$), and its Merkle root
- the Merkle tree of the codeword (the concatenation of the original data and the parity data) is built by attaching these two subtrees to a single root node. That is, the Merkle root of the codeword is the hash of the pair of roots, of the original data and of the parity data (see the sketch after this list)
- the provider executes the FRI protocol to prove that the codeword is in fact (close to) a Reed-Solomon codeword
- the provider also distributes the Merkle root of the parity data together with the FRI proof. This root is effectively the proof (a single-element Merkle path) connecting the original data Merkle root and the codeword Merkle root (storage proofs will be validated against the latter)
- the metadata is updated: the new content ID will be the Merkle root of the codeword, against which storage proofs will be required in the future. Of course one will also need a mapping from the original content ID(s) to the new locations (root + pointer(s) inside the RS encoded data)
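A minimal sketch of this commitment structure (the helper names are hypothetical, and `sha256` merely stands in for whatever hash the Merkle trees actually use):

```python
import hashlib

def h(data: bytes) -> bytes:
    # stand-in hash; the real construction would use a ZK-friendly hash (see below)
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # plain binary Merkle tree over a power-of-two number of leaves
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

# dummy stand-ins for the original data and the provider-computed parity
original_leaves = [bytes([i]) * 32 for i in range(8)]
parity_leaves   = [bytes([128 + i]) * 32 for i in range(8)]

original_root = merkle_root(original_leaves)   # computed by the client
parity_root   = merkle_root(parity_leaves)     # computed by the provider
# the codeword root is just the hash of the two subtree roots, so a single
# hash connects the old commitment (original data) with the new one (codeword)
codeword_root = h(original_root + parity_root)
```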
Remark: If the 2x storage overhead is too big, after executing the protocol, we may try to trim some of the parity (say 50% of it). You can probably still prove some new Merkle root with a little bit of care, but non-power-of-two sizes make everything more complicated.
Remark #2: There are also improved versions of the FRI protocol like STIR and WHIR. I believe those can be used in the same way. But as FRI is already rather complicated, for now let's concentrate on that.
### Batching
FRI is a relatively expensive protocol. I expect this proposal to work well for say up to 1 GB data sizes, and acceptably up to 10 GB of data. But directly executing FRI on such data sizes would be presumably very expensive.
Fortunately, FRI can be batched, in a very simple way: Suppose you want to prove that $M$ vectors $v_i\in \mathbb{F}^N$ are all codewords. To do that, just consider a random linear combination (recall that Reed-Solomon is a linear code)
$$ V := \sum_{i=1}^M \alpha^i v_i $$
with a randomly chosen $0\neq\alpha\in\mathbb{F}$ (chosen by the verifier or via Fiat-Shamir). Intuitively, if any of the $v_i$ is _not a codeword_, then it's very unlikely that $V$ is one (this can be quantified precisely). So it's enough to run the FRI on the combined vector $V$.
Note: If the field is not big enough, you may need to either repeat this with several different $\alpha$-s, or consider a field extension. This is the case for example with the Goldilocks field, which has size $|\mathbb{F}|\approx 2^{64}$. Plonky2 for example chooses $\alpha\in\widetilde{\mathbb{F}}$ from a degree two field extension $\widetilde{\mathbb{F}}$ (so approx. 128 bits), which is big enough for all practical purposes. FRI is then executed in that bigger field.
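As a small illustration of the batching step (working directly in the Goldilocks base field and ignoring the extension-field subtlety above; the function name is ours):

```python
P = 2**64 - 2**32 + 1     # Goldilocks prime

def batch_combine(vectors, alpha):
    """Random linear combination V = sum_i alpha^i * v_i over F_P.

    `vectors` is a list of equal-length vectors of field elements and
    `alpha` is the verifier's (or Fiat-Shamir derived) nonzero challenge.
    """
    V = [0] * len(vectors[0])
    power = alpha % P                       # alpha^1, alpha^2, ...
    for v in vectors:
        for j, x in enumerate(v):
            V[j] = (V[j] + power * x) % P
        power = power * alpha % P
    return V

# toy example: three "columns" of length four
V = batch_combine([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], alpha=0xdeadbeef)
```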
This approach has another nice consequence: Now instead of doing one big RS encoding, we have to do many smaller ones. This is good, because:
- that's always faster (because of the $O(N\log(N))$ scaling of FFT)
- it needs much less memory
- can be trivially parallelized
This "wide" approach is basically the same as the one used in Plonky2 and Plonky3.
### Field and hash choice
We need to choose a prime field (but see below) for the Reed-Solomon encoding, and one (or more) hash functions to construct the Merkle tree.
Just for executing the FRI protocol, the hash function could be any (cryptographic) hash, and we could even use different hash functions for constructing the row hashes and the Merkle tree. However, if in the future we want to do recursive proof aggregation, then since in that situation the Merkle path proofs need to be calculated inside ZK too, it's better to choose a ZK-friendly hash.
With these in mind, a reasonable choice seems to be the Goldilocks field ($p=2^{64}-2^{32}+1$) and the Monolith hash function (which is one of the fastest ZK-friendly hashes). This way the Reed-Solomon encoding and the hash function use compatible structures.
Remark: While in principle both FFT and FRI should work with a binary field instead of a prime field (see eg. FRI-Binius), I'm not at all familiar with those variations, so let's leave that for future work. Also, if we want to do recursive proof aggregation, again using prime fields for such proof systems is more common (but again, in principle that should be possible too with a binary field).
### Data layout
So the basic idea is to organize the data into a $2^n\times M$ matrix of field elements.
Then extend each column via a rate $\rho=1/2$ Reed-Solomon code to a matrix of size $2^{n+1}\times M$, so that the top half is the original data, and the bottom half is parity.
Then hash each row, and build a Merkle tree on top of the row hashes (this is the structure we need for the batched FRI argument).
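A minimal sketch of this column-wise extension over the Goldilocks field (assuming the multiplicative group generator 7, as used in Plonky2; helper names are ours). The column is read as evaluations on the small domain, interpolated, and re-evaluated on the doubled domain; in this natural evaluation order the original values land on the even indices, so placing them literally in the top half corresponds to the row reordering mentioned below.

```python
P = 2**64 - 2**32 + 1     # Goldilocks prime; P - 1 is divisible by 2**32
G = 7                     # generator of the multiplicative group (as in Plonky2)

def ntt(coeffs, w):
    """Evaluate the polynomial with the given coefficients at w^0, w^1, ...
    (w must be a primitive root of unity of order len(coeffs))."""
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    even = ntt(coeffs[0::2], w * w % P)
    odd  = ntt(coeffs[1::2], w * w % P)
    out, x = [0] * n, 1
    for i in range(n // 2):
        t = x * odd[i] % P
        out[i]          = (even[i] + t) % P
        out[i + n // 2] = (even[i] - t) % P
        x = x * w % P
    return out

def low_degree_extend(column):
    """Rate-1/2 systematic Reed-Solomon extension of one column."""
    n = len(column)                                  # power of two
    w_small = pow(G, (P - 1) // n, P)                # primitive n-th root of unity
    w_big   = pow(G, (P - 1) // (2 * n), P)          # primitive 2n-th root of unity
    inv_n   = pow(n, P - 2, P)
    # interpolate: inverse NTT of the column (viewed as evaluations on the small domain)
    coeffs  = [c * inv_n % P for c in ntt(column, pow(w_small, P - 2, P))]
    # re-evaluate the same degree < n polynomial on the double-sized domain
    return ntt(coeffs + [0] * n, w_big)

column = [1, 2, 3, 4, 5, 6, 7, 8]
lde = low_degree_extend(column)
assert lde[0::2] == column        # original data sits at the even indices
```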
#### Row hashes
However, we have a lot of freedom in how to construct the row hashes. The simplest is of course just a linear (sequential) hash. This is efficient (a linear sponge hash with $t=12$ should be faster than building a Merkle tree, as you can consume 8 field elements with a single permutation call, instead of an average of 4 with a binary Merkle tree); however, a disadvantage is that a Merkle path proof needs to include a whole row, which can potentially be a large amount of data if $M$ is big.
The other end of the spectrum is to use a Merkle tree over the rows too. Then the Merkle path is really only a Merkle path. However, to make this reasonably simple, we need $M$ to be a power-of-two.
We can also go anywhere in between: Split the row into say 8, 16 or 32 chunks; hash those individually; and build a tiny Merkle tree (of depth 3, 4 or 5, respectively) on top of them. The Merkle root of this small tree will be the row hash. This looks like a nice compromise with the good properties of both extreme cases, while also keeping the Merkle trees complete binary trees (that is, with a power-of-two number of leaves).
A problem with this approach, though, is that a single Merkle path doesn't really "prove" the full file, except when you also include the full row data, which can be much bigger than the Merkle path itself...
So maybe actually including full rows is the right approach, and we just have to accept larger proofs (e.g. with 2048-byte rows and a 16 GB dataset, we have a Merkle tree of depth 23; that is, a Merkle proof is 736 bytes, which together with the 2048 bytes of row data gives a 2784-byte proof per sample).
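A tiny helper for playing with this proof-size arithmetic (assuming 32-byte hash digests; the function name is ours):

```python
import math

def proof_bytes_per_sample(n_rows, row_bytes, hash_bytes=32):
    """Merkle path (one hash per level) plus the full row data."""
    depth = int(math.log2(n_rows))
    return depth * hash_bytes + row_bytes

# the example from the text: 2048-byte rows, a tree of depth 23
print(proof_bytes_per_sample(2**23, 2048))   # 23*32 + 2048 = 2784
```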
We also have the freedom to reorder the rows (except keeping the original data in the top half and the parity in the bottom; this is needed to be able to connect the Merkle root of the original data with the Merkle root of the encoded data).
Such reorderings can help making the FRI protocol more efficient.
#### Choice of the width
We may need to constrain the width $M$ too. This is unfortunate, because the height is already constrained to be a power of two, so we may end up with too many constraints (requiring too much padding and thus making the system less efficient).
One reason to constrain the width is if we want our Merkle tree to be compatible with the network block structure. Recall that from the network point of view, the datasets are organized in blocks of uniform size (currently 64 kB blocks), with a SHA256 Merkle tree on top of them.
Though I'm not sure at the moment if this compatibility is required at all?
But if we want this, then we want a row, or a small (power of two) number of rows to contain exactly a single network block's data.
Recall that with Goldilocks field we can pack either 7 bytes into a single field element, or somewhat better, 31 bytes into 4 field elements.
So one possible arrangement could be for example to have rows of width $M=268$: each row then consists of $67\times 4$ field elements, which can hold $67\times 31 = 2077 > 2048$ bytes. Then 32 rows can hold 64 kB.
Another (wider) version could be $265\times 4$ field elements per row, fitting $265\times 31 = 8215 > 8192$ bytes, so that 8 rows hold 64 kB. But see above about including a full row in the proof.
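A sketch of the "31 bytes into 4 field elements" packing: 62 bits per element, $4\times 62 = 248 = 31\times 8$ bits, and $2^{62} < p$, so any byte pattern yields valid field elements. The exact packing is of course a design choice:

```python
MASK62 = (1 << 62) - 1    # 62-bit limbs; 2**62 < 2**64 - 2**32 + 1, so always a valid element

def pack_31_bytes(chunk: bytes) -> list[int]:
    """Pack 31 bytes into 4 Goldilocks field elements of 62 bits each."""
    assert len(chunk) == 31
    value = int.from_bytes(chunk, "little")
    return [(value >> (62 * i)) & MASK62 for i in range(4)]

def unpack_31_bytes(elems: list[int]) -> bytes:
    value = sum(e << (62 * i) for i, e in enumerate(elems))
    return value.to_bytes(31, "little")

blob = bytes(range(31))
assert unpack_31_bytes(pack_31_bytes(blob)) == blob
```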
### On-disk layout
Another subtle issue is how to order this data on the disk. Unfortunately spinning disks are very slow (150-300 MB/sec sequential at best, and really slow seeking: typically well below 1000 seeks per second).
What are our typical access patterns?
- to do the FFT encoding, we need to access all the columns, independently (and then they will be processed in parallel)
- to do the query phase of the FRI protocol, we need to access some randomly selected rows (maybe 50--100 of them)
- when doing the randomly sampled "storage proofs" (which is supposed to happen periodically!), again we need to access random rows
Accessing rows efficiently and accessing columns efficiently are pretty much contradictory requirements...
Maybe a compromise could be something like a "cache-oblivious" layout on disk: for example, partition the matrix into medium-sized squares, so that both row and column access are somewhat painful, but neither of them too painful.
On the other hand, ideally the encoding and the FRI proof are done only once, while storage proofs are done periodically, so maybe row access should have priority? It's a bit hard to estimate the cost-benefit profile of this; it also depends on the average lifetime. We have a rather heavy one-time cost, and a rather light but periodically occurring cost...
By the way, the hard drive _sequential_ speed limit means that unless the data is on an SSD or some other fast medium, the bottleneck will almost always be the spinning drive. For example, while Goldilocks FFT has similar speed to sequential hard drive reads/writes, it can be trivially parallelized (as we do many independent FFTs, one for each column).
### Bundling of files
The idea was that when we have to deal with lots of small files, we can start collecting and merging them, until the size of this bundle reaches an economical size (we don't want too many proofs circulating in the network).
When a good size is reached (or too much time has passed), we can then do the erasure coding, kind of "sealing" the bundle (though in principle we could collect even more data and redo the encoding the same way).
#### Discrepancy of lifetimes
If we have a lot of small files (from potentially many users), they most probably have different expected lifetimes.
However, I expect that typical small files have similar lifetimes (probably something on the order of 1 month). And keeping a file for a little bit longer shouldn't be a big issue.
Furthermore it's the provider who chooses what to bundle together, so they can select for similar expiries (maybe building up several bundles at the same time).
So I don't see this as a serious issue.
#### Discrepancy of file sizes
Files can also have different sizes, and non-power-of-two sizes.
Merging two Merkle trees is simplest when they both have the same number of leaves, and that number is a power of two.
We can in theory pad any file to have power-of-two number of Merkle leaves, though this results in waste.
But again, if we expect a distribution of (padded) file sizes, then we can try some heuristics to pack them nicely.
For example: `(((1+1)+2)+4+4+4)+16 = 32`
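One possible (purely illustrative) greedy heuristic along these lines, padding each file to a power-of-two leaf count and filling the bundle largest-first:

```python
def pad_to_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

def plan_bundle(file_leaf_counts, bundle_leaves):
    """Greedily fill a bundle of `bundle_leaves` (a power of two) with files
    padded to power-of-two leaf counts, largest first; returns the accepted
    sizes and the leftover (padding/waste) leaves."""
    accepted, free = [], bundle_leaves
    for size in sorted((pad_to_pow2(n) for n in file_leaf_counts), reverse=True):
        if size <= free:
            accepted.append(size)
            free -= size
    return accepted, free

# the example from the text tiles a 32-leaf bundle exactly:
print(plan_bundle([16, 4, 4, 4, 2, 1, 1], bundle_leaves=32))   # ([16, 4, 4, 4, 2, 1, 1], 0)
```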
#### How does this interact with repair?
Well, obviously when such a bundle is lost, the cost of repair is bigger than if a chunk of a small file is lost.
However, we still have the information about where the other chunks of the constituent files are located (that is required for downloading them anyway), so in principle the repair can be done, albeit with a larger number of network connections (because it's unlikely that the other chunks of the constituent files are located at the same providers).
Another question is whether we want to repair the whole bundle, or the constituent files separately. By this I mean: since the chunks have to be moved to other providers, should that be a single provider for the whole bundle, or different ones for the different files in that bundle?
### The FRI protocol
The FRI protocol (invented by Eli Ben-Sasson et al.) consists of two phases:
- the commit phase
- and the query phase
Note that somewhat confusingly, the commit phase is NOT calculating the Merkle commitment of the Reed-Solomon encoded data above; that's kind of "external" to the protocol, as we want to prove that an already committed vector is (close to) a codeword.
We will first describe the simplest version, then describe optimizations which are commonly used.
#### The commit phase
First we calculate the combined vector (a random linear combination of the columns); this is completely straightforward.
This vector is interpreted as the values of a polynomial on a multiplicative subgroup (of initial size $2^{n}/\rho = 2^{n+1}$, as we use rate $\rho=1/2$; the polynomial however has degree at most $2^n-1$).
Then the prover will repeatedly "fold" this polynomial, halving both the size and the degree, until it reaches a constant polynomial (which will be represented by a vector of size two, both values being the same).
Each folded version is committed to using a Merkle tree, and at the very end the final singleton coefficient is sent in clear. In the query phase, we will then check (probabilistically) that the folding was done correctly.
The folding step works as follows: Let $p_k(x)$ be the polynomial before the $k$-th folding; then if
$$ p_k(x) = p_k^{\textrm{even}}(x^2) + x \cdot p_k^{\textrm{odd}}(x^2) $$
the next, folded polynomial will be
$$ p_{k+1}(y) := p_{k}^{\textrm{even}}(y) + \beta_k\cdot p_{k}^{\textrm{odd}}(y) $$
with $\beta_k \in\widetilde{\mathbb{F}}$ chosen by the verifier (or in practice, via Fiat-Shamir); and we evaluate it on the half-sized domain (multiplicative subgroup) $D_{k+1} := (D_{k})^2$, generated by $\omega^2$ if the previous one was generated by $\omega$.
So both the size of the vector and the degree of the polynomial (whose evaluations are the elements of the vector) are halved in each such folding step. After $n$ such steps we get a constant polynomial, represented (in this simplest version) by a vector consisting of two equal numbers.
Remark: In practice the domains are usually shifted to be a coset instead of a subgroup. (This is distinct from the DEEP technique, which samples points outside the domain.)
Note: By simple calculation, we have
$$
\begin{align*}
p_{\textrm{even}}(x^2) &= \;\;\frac{1}{2}\Big[ p(x) + p(-x) \Big] \\
p_{\textrm{odd}}(x^2) &= \frac{1}{2 x}\Big[ p(x) - p(-x)
\Big]
\end{align*}
$$
This should be instantly recognizable as the "butterfly" step in the inverse FFT algorithm.
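A minimal sketch of one folding step on evaluation vectors, directly following the formulas above (over the Goldilocks base field for simplicity; in practice the folded values live in the extension field $\widetilde{\mathbb{F}}$):

```python
P = 2**64 - 2**32 + 1     # Goldilocks prime

def fold(evals, beta, eta, omega):
    """One FRI folding step.

    `evals[i]` = p(eta * omega**i) on a domain of size m (omega a primitive
    m-th root of unity, eta the coset shift). Returns the evaluations of
    p_even + beta * p_odd on the squared domain of size m/2.
    """
    m = len(evals)
    half = m // 2
    inv2 = pow(2, P - 2, P)
    out, x = [], eta % P
    for i in range(half):
        a, b = evals[i], evals[i + half]                  # p(x) and p(-x), since omega^(m/2) = -1
        p_even = (a + b) * inv2 % P                       # p_even(x^2) = (p(x) + p(-x)) / 2
        p_odd  = (a - b) * inv2 * pow(x, P - 2, P) % P    # p_odd(x^2)  = (p(x) - p(-x)) / (2x)
        out.append((p_even + beta * p_odd) % P)
        x = x * omega % P
    return out
```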
#### The query phase
In the query phase, we do several query rounds (enough to match the required security level); each round verifies the correctness of the folding from a randomly chosen starting point (with some probability of a false positive; but enough such checks drive the overall probability down to negligible).
First, the verifier chooses a random
$$x_{0}=\eta\cdot\omega^{\mathsf{idx}}\quad\in\quad D_{0}=\{\,\eta\cdot\omega^i\;:\; 0 \le i < 2^{n+1}\,\}$$
The corresponding full row (the $\mathsf{idx}$-th row of the LDE matrix) is opened with a Merkle proof, and the verifier computes the linear combination with the powers of $\alpha\in\widetilde{\mathbb{F}}^\times$. This will become the initial "upstream" value, against which we do the consistency check.
Next the following folding consistency check is repeated, until we reach a constant (degree 0) polynomial.
We want to check that for $x_{1}:=(x_0)^2=\eta^2\cdot\omega^{2\times\mathsf{idx}}$ we have the expected value of $p_{1}(x_{1})$. From the above equations, the verifier can compute this from $p_{0}(x_{0})$ and $p_{0}(-x_{0})$.
To do this, the prover opens these two values (with Merkle proofs against the commit phase Merkle commitment). The first value $p_{0}(x_{0})$ is checked to match the "upstream" value.
Then, using the above formulas, the verifier can compute the expected $p_{1}(x_{1})$:
$$
\begin{align*}
p_1(x_1) &= p_1(x_0^2) = p_0^{\textrm{even}}(x_0^2) + \beta_0\cdot p_0^{\textrm{odd}}(x_0^2) = \\
&=
\frac{1}{2}\Big[ p_0(x_0) + p_0(-x_0) \Big] +
\beta_0 \cdot \frac{1}{2 x_0}\Big[ p_0(x_0) - p_0(-x_0)
\Big] \\
&= \left(\frac{1}{2}+\frac{\beta_0}{2x_0}\right)p_0(x_0) +
\left(\frac{1}{2}-\frac{\beta_0}{2x_0}\right)p_0(-x_0)
\end{align*}
$$
which becomes the "downstream" value, that is, the "upstream" value of the next folding step.
The same kind of step is iterated with $x_{i+1}:=(x_{i})^2$, until we get down to degree zero. Finally, the last "downstream" value is checked against the pre-committed final constant value.
Note: opening a full row only happens once, before the very first folding step. But that's required, as otherwise we would have no connection between the data matrix and the combined vector(s).
Remark: The verifier should sample the query locations for _all rounds_ before the first round; otherwise a malicious prover could try to cheat round-by-round.
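On the verifier side, the per-step consistency check is just a few field operations; a sketch (again in the base field for simplicity):

```python
P = 2**64 - 2**32 + 1     # Goldilocks prime

def expected_next_value(p_x, p_neg_x, beta, x):
    """Verifier-side evaluation of the folding formula: given the two opened
    values p_k(x) and p_k(-x) and the challenge beta, compute the value
    p_{k+1}(x^2) that the next layer must be consistent with."""
    inv2  = pow(2, P - 2, P)                  # 1/2
    inv2x = pow(2 * x % P, P - 2, P)          # 1/(2x)
    return ((inv2 + beta * inv2x) * p_x + (inv2 - beta * inv2x) * p_neg_x) % P
```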
#### Optimizations
_Merkle caps._ Instead of using Merkle roots to commit, we can use "Merkle caps". This simply means cutting the Merkle tree at a given level, so the commitment will be say 16 hashes instead of a single root hash. This is a tradeoff between commitment size and Merkle proof size.
_Pruning Merkle paths_. Alternatively, we could merge Merkle paths: the top parts will have a lot of shared values. This however makes recursive proofs too complicated.
_Folding in larger steps_ (eg. $16\mapsto 1$ or $8\mapsto 1$ instead of $2\mapsto 1$). In this case the computation of a single step will be a (very small) FFT. See the companion document for more details.
_Opening full cosets_. We can reorder the rows, so that when we need to open values (a small coset for folding), we can do that with a single Merkle path proof.
_Early stopping at a low degree polynomial_. We can stop at say a degree 31 polynomial instead of a constant (degree 0) one. Sending the coefficients of this polynomial in clear is usually a win.
_Grinding_. Between the commit phase and the query phase we can add a proof-of-work challenge. This ensures that a malicious prover can only brute-force the Fiat-Shamir challenges at a much slower rate. Note: the challenge (and response) must go into the Fiat-Shamir transcript.
#### Security
The math behind all this is very complicated; however it's a somewhat intuitive conjecture that the security level (assuming big enough fields) should be approximately
$$\lambda \approx \mathrm{rate\_bits} \times \mathrm{num\_query\_rounds} + \mathrm{proof\_of\_work\_bits}$$
(TODO: justify this somehow?)
A standard security target is 100 bits. In our case $\mathrm{rate\_bits} = -\log_2(\rho)$ is 1; so with 16 bits of grinding, we need 84 query rounds. The rate is thus more-or-less a tradeoff between prover speed and proof size.
While we need at most a 2:1 expansion from the original data (otherwise the storage overhead would be too big), in theory we could use a larger expansion (say $\rho=1/4$ or $\rho=1/8$), and then, after executing the protocol, keep only as much parity data as original data. This could result in a smaller proof size (fewer query rounds) but larger prover time.
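A back-of-the-envelope helper for this tradeoff, using nothing but the conjectured formula above:

```python
import math

def query_rounds_needed(security_bits, rate_bits, grinding_bits):
    """Number of query rounds suggested by the heuristic
    lambda ~ rate_bits * num_query_rounds + proof_of_work_bits."""
    return math.ceil((security_bits - grinding_bits) / rate_bits)

for rate_bits in (1, 2, 3):              # rate 1/2, 1/4, 1/8
    print(rate_bits, query_rounds_needed(100, rate_bits, grinding_bits=16))
# -> 1 84, 2 42, 3 28
```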
See also the companion security document for more details.
### References
- Eli Ben-Sasson, Iddo Bentov, Yinon Horesh and Michael Riabzev: _"Fast Reed-Solomon Interactive Oracle Proofs of Proximity"_
- Eli Ben-Sasson, Lior Goldberg, Swastik Kopparty and Shubhangi Saraf: _"DEEP-FRI: Sampling Outside the Box Improves Soundness"_
- Ulrich Haböck: _"A summary on the FRI low degree test"_
- Alexander R. Block et al: _"Fiat-Shamir Security of FRI and Related SNARKs"_