owned this note
owned this note
Published
Linked with GitHub
*When Chuck Norris uses erasure code, he doesn't just protect data redundancy; he creates a fortress of redundancy so impenetrable that even a single byte missing feels like an apology to him.*
# Erasure code using polynomial interpolation
:::info
**Definition**
An erasure code is a forward error correction (FEC) code under the assumption of bit erasures (rather than bit errors), which transforms a message of `k` symbols into a longer message (code word) with `n` symbols such that the original message can be recovered from a subset of the `n` symbols.
:::
We'll now see an example of erasure core using polynomials.
:::warning
**Warning**
This sections lacks a lot of mathematical formalism, and hides a lot of details about real implementation (like the fact that we'll in reality operate on polynomials over finite fields, while this section operates on polynomials over the real numbers).
However, this section lets the reader to grasp the idea of erasure code using polynomial interpolation.
:::
## Problem setup
Alice has the following 4-bytes message `0x160C0B0D` that she wants to transmit to Bob. To do that, she uses an unreliable network (the internet) with a terrible, very lossy connection.
Alice will encode the message into a 6-bytes encoded message, and Bob will be able to reconstruct the initial 4-bytes message if he receives only **ANY** 75% of the encoded 6-bytes message.
The word **ANY** is really important here. Bob must be able to reconstruct the initial data, if he receives from example:
- Only the first 75% of the encoded message, or
- Only the last 75% of the encoded message, or
- Any combination, as long as Bob received at least 75% (6 byte) of the encoded message
:::info
**Note**
In this section, we will use a code rate of 3/4 (75%), meaning that the initial message size will increase by 50%. Consequently, only 3/4 (75%) of the coded message will be needed to reconstruct the initial message.
:::
## Coding
Let's divide Alice's 4-bytes message into 4 bytes: `[0x16, 0x0C, 0x0B, 0x0D]`.
Each byte will represent one polynomial constraint. We have 4 constraints.
Let's define a polynomial `P` will the following constraints:
- `P(0) = 22` (`0x16`)
- `P(1) = 12` (`0x0C`)
- `P(2) = 11` (`0x0B`)
- `P(3) = 13` (`0x0D`)
`x` evaluated values are integers (0, 1, 2, 3).
`P(x)` evaluated values are the bytes of the message Alice wants to encode.
Using the polynomial Lagrange interpolation, Alice finds the unique polynomial of degree at most `4-1 = 3` which complies with these 4 constraints. She gets:
$$
P(x) =-x^3 + \frac{15}{2}x^2 - \frac{33}{2}x + 22
$$

## Extension
Alice adds 2 extension points. She chooses 2 points corresponding to `x=3` and `x=4`.
Evaluating `P` at these points gives:
- `P(4) = 12` (`0x0C`)
- `P(5) = 2` (`0x02`)

## Communication
Instead sending `0x160C0B0D` to Bob, Alice will:
- Agrees with Bob she will send him 6 `(x, P(x))` data points, with `P` a polynomial of degree at most `3`.
- Sends to Bob: `[(0x00, 0x16), (0x01, 0x0C), (0x02, 0x0B), (0x03, 0x0D), (0x04, 0x0C), (0x05, 0x02)]` corresponding to 6 `(x, P(x))` tuples.
- Agree with Bob that the initial data is: `[P(0), P(1), P(2), P(3)]`.
Because the connection between Alice and Bob is lossy, Bob receives only 4 tuples out of 6:
- `(0x00, 0x16)`,
- `(0x02, 0x0B)`,
- `(0x04, 0x0C)`, and
- `(0x05, 0x02)`
## Reconstruction
From received bytes, Bob deduces:
- `P(0) = 22` (`0x16`)
- `P(2) = 11` (`0x0B`)
- `P(4) = 12` (`0x0C`)
- `P(5) = 2` (`0x02`)

Bob knows the original message is `[P(0), P(1), P(2), P(3)]`.
However, `P(1)`and `P(3)`are missing.
Bob knows the 4 points belong to the the unique polynomial of degree at most `3`, so he can reconstruct the polynomial using the Lagrange interpolation (just as Alice previously did, but with the `4` points corresponding to the 4-bytes initial message).
Bob ends up with this polynomial (which is exactly the same Alice computed previously):
$$
P(x) =-x^3 + \frac{15}{2}x^2 - \frac{33}{2}x + 22
$$

Bobs now evaluates `P` at `x=1` and `x=3`.
He finds:
- `P(1) = 12` (`0x0C`)
- `P(3) = 13` (`0x0D`)

Bob is now able to reconstruct the first 4 tuples: `[(0x00, 0x16), (0x01, 0x0C), (0x02, 0x0B), (0x03, 0x0D)]` and the initial message: `0x160C0B0D`.
:::info
**Note**
If Bob receives less than 4 points, then he will be unable to reconstruct the corresponding unique polynomial with degree at most `3`.
Actually, he will be able to reconstructe an infinity of polynomials with degree at most `3`, including the original polynomial.
However, he won't know exactly which one is the good one.
:::
:::info
**Note**
The advantage of using such an erasure code technique may not be obvious at the first sight:
- Without the erasure code, Alice sends 4 bytes of data to Bob.
- With the erasure code, Alice sends 6 bytes of data (+ `x` coordinates) to Bob.
**Erasure code consumes more data.**
However, the important point here is:
- Without erasure code, Bob must receive **exactly** 4 bytes out of 4 (100%) sent by Alice.
- With erasure code, Bob may receive only **any** 4 bytes out of 6 (75%) sent by Alice.
:::
## Some erasure code real life usage
### QR codes
This QR codes embeds the offchainlabs.com link.

If we deteriorate some parts of this QR code, it is still possible to retrieve the (erasure coded) information embedded inside it.

### Compact disc
A scratches deteriorated compact disc (up to a certain extent...) may continue to be readable, because data stored into is erasure coded.
### Ethereum - beacon chain blobs peer DAS
This precise case will be studied in details.