# A quick barycentric evaluation tutorial
One powerful technique when working with polynomials is taking a set of evaluations of that polynomial and using that _directly_ to compute an evaluation at a different point. For example, if $P(x)$ is a degree-100 polynomial, you can use the evaluations $P(0), P(1) ... P(100)$ to directly compute $P(101)$, or $P(1247130)$, in $O(N)$ time, without ever reconstructing the polynomial. This post describes how this is done.
_See also, this earlier and more detailed paper by Oxford researchers on the topic: https://people.maths.ox.ac.uk/trefethen/barycentric.pdf_
## General technique
Let $P(x)$ be the polynomial, and $x_1 ... x_N$ as the set of points for which we have evaluations of $P(x)$. Call these evaluations $y_1 ... y_N$.
Think of $P(x)$ as a linear combination $\sum_i y_i L_i(x)$, where $L_i(x)$ is the polynomial that equals 1 at $x_i$ and 0 at all the other x-coordinates in the set.
Now, let us explore how these $L_i$ can be computed. Each $L_i(x)$ can be expressed as:
$$\frac{(x - x_1)(x - x_2)...(x - x_{i-1})(x - x_{i+1}) ... (x - x_N)}{(x_i - x_1)(x_i - x_2)...(x_i - x_{i-1})(x_i - x_{i+1}) ... (x_i - x_N)}$$
To see why this is true, notice that the numerator of the fraction is a polynomial that equals 0 at every evaluation point except $x_i$, and the denominator is just whatever the numerator equals at $x_i$. Hence, the result equals 0 at every evaluation point except $x_i$, and 1 at $x_i$.
Let us denote this denominator $d_i$. Let us also denote the product $(x - x_1)(x - x_2) ... (x - x_N)$ (_without_ skipping any indices) $M(x)$. We can now express $L_i(x)$ as:
$$\frac{M(x)}{d_i(x - x_i)}$$
The $d_i$ values take $O(N^2)$ time to compute (because each one takes $O(N)$ time and there are $N$ of them), but their values depend only on the evaluation points, not the specific polynomial. Hence, they can be precomputed and stored and don't need to be re-computed each time. $M(x)$ takes $O(N)$ time to compute at any specific point. Given these values, each $L_i(x)$ can be computed in $O(N)$ time. Hence, the expression:
$$P(x) = M(x) * \sum_i \frac{y_i}{d_i(x - x_i)}$$
can be computed in $O(N)$ time.
## Special case: roots of unity
In many applications of barycentric evaluation, we can choose the evaluation points. One very convenient choice is the set $\{1, \omega, \omega^2 ... \omega^{N-1}\}$, for an $\omega$ where $\omega^N = 1$ and $N$ is a power of two. The main advantage of this choice is that it removes the need to pre-compute the $d_i$ values, because there is a simple closed-form value for them.
To see why, let us try to simplify the expression for $d_i$:
$$\prod_{0 \le j \ne i < N} (\omega^i - \omega^j)$$
Notice that $\omega^{\frac{N}{2}} = -1$. We can re-express the above as:
$$2 \omega^i * \prod_{0 \le j \ne i < \frac{N}{2}} (\omega^i - \omega^j)(\omega^i + \omega^j)$$
Each term inside the product groups together $(\omega^i - \omega^j)$ and $(\omega^i - \omega^{j + \frac{N}{2}})$, as $\omega^{j + \frac{N}{2}} = -\omega^j$. The term on the left is for $(\omega^i - \omega^{i + \frac{N}{2}})$, which simplifies to $2\omega^i$ for the same reason.
Now, notice that $(\omega^i - \omega^j)(\omega^i + \omega^j)$ combines into $(\omega^{2i} - \omega^{2j})$. So we have:
$$2 \omega^i * \prod_{0 \le j \ne i < \frac{N}{2}} (\omega^{2i} - \omega^{2j})$$
Repeat this, and we have:
$$2 \omega^i * 2 \omega^{2i} * \prod_{0 \le j \ne i < \frac{N}{4}} (\omega^{4i} - \omega^{4j})$$
Keep repeating, and we get:
$$2 \omega^i * 2 \omega^{2i} * 2\omega^{4i} * ... * 2\omega^{\frac{N}{2}i}$$
The product term on the right falls away because the product eventually becomes empty. The remaining term simply becomes $N * \omega^{(N-1)i}$, or:
$$d_i = \frac{N}{\omega^i}$$
Additionally, note that $M(x) = \prod_i (x - \omega^i)$ simply becomes $x^N - 1$, so the entire calculation simplifies down to:
$$P(x) = \frac{x^N - 1}{N} * \sum_i \frac{y_i * \omega^i}{x - \omega^i}$$

Suppose you have a polynomial $P$, and the sample proofs $Q_i = \lfloor \frac{P}{X^{16} - \omega^{i * 16}} \rfloor$. Goal: verify all proofs. Note that for all $i$, $Q_i * (X^{16} - \omega^{i * 16}) = P - I_i$, where $I_i$ is the $deg < 16$ interpolant of the i'th subgroup. We can combine all of these equations with a random linear combination: $\sum Q_i * (X^{16} - \omega^{i * 16}) * r_i = \sum (P - I_i) * r_i$

11/1/2022Along with proof of stake, the other central feature in the eth2 design is sharding. This proposal introduces a limited form of sharding, called "data sharding", as per the rollup-centric roadmap: the shards would store data, and attest to the availability of ~250 kB sized blobs of data. This availability verification provides a secure and high-throughput data layer for layer-2 protocols such as rollups. To verify the availability of high volumes of data without requiring any single node to personally download all of the data, two techniques are stacked on top of each other: (i) attestation by randomly sampled committees, and (ii) data availability sampling (DAS). ELI5: randomly sampled committees Suppose you have a big amount of data (think: 16 MB, the average amount that the eth2 chain will actually process per slot, at least initially). You represent this data as 64 "blobs" of 256 kB each. You have a proof of stake system, with ~6400 validators. How do you check all of the data without (i) requiring anyone to download the whole thing, or (ii) opening the door for an attacker who controls only a few validators to sneak an invalid block through? We can solve the first problem by splitting up the work: validators 1...100 have to download and check the first blob, validators 101...200 have to download and check the second blob, and so on. The validators in each of these subgroups (or "committees") simply make a signature attesting that they have verified the blob, and the network as a whole only accepts the blob if they have seen signatures from the majority of the corresponding committee. But this leads to a problem: what if the attacker controls some contiguous subset of validators (eg. 1971....2070)? If this were the case, then even though the attacker controls only ~1.5% of the whole validator set, they would dominate a single committee (in this case, they would have ~70% of committee 20, containing validators 2001...2100), and so they would be able to control the committee and push even invalid/unavailable blobs into the chain. Random sampling solves this by using a random shuffling algorithm to select the committees. We use some hash as the seed of a random number generator, which we then use to randomly shuffle the list [1..6400]. The first 100 values in the shuffled list are the first committee, the next 100 are the second committee, etc.

10/26/2022WIP The purpose of this document is to describe in more detail a proposal for how phase 1 can be structured based on a "data-availability-focused" approach. The main addition to the beacon chain will be a Vector of ShardDataHeader objects, one for each shard. A ShardDataHeader is a small object which represents a large amount of underlying data (roughly 0-512 kB in size). A block is only valid if the underlying data that the ShardDataHeader points to is available - that is, it has been published to the network and anyone can download it. However, to preserve scalability, clients will not try to download the full underlying data of every ShardDataHeader to verify the block. Instead, they will verify that the data is available using an indirect technique called data availability sampling. Parameters Parameter Value Description

5/20/2022Two paths to a solution exist, and have existed for a long time: weak statelessness and state expiry: State expiry: remove state that has not been recently accessed from the state (think: accessed in the last year), and require witnesses to revive expired state. This would reduce the state that everyone needs to store to a flat ~20-50 GB. Weak statelessness: only require block proposers to store state, and allow all other nodes to verify blocks statelessly. The good news is that recently, there have been major improvements on both of these paths, that greatly reduce the downsides to both: Some techniques for how a ReGenesis-like epoch-based expiry scheme can be adapted to minimize resurrection conflicts Piper Merriam's work on transaction gossip networks that add witnesses to be stateless-client-friendly, and his work on distributed state storage and on-demand availability Verkle trees, which can reduce worst-case witness sizes from ~4 MB to ~800 kB (this is definitely small enough, because existing worst-case blocks that are full of calldata are already 12.5M / 16 ~= 780 kB and we have to handle those anyway). See slides, doc, code.

1/4/2022
Published on ** HackMD**