# Parallel polynomial evaluation
We have some set of expressions expressed as a DAG, that we need to compute on polynomials of different basis sizes.
* $L_0$ is the normal Lagrange basis with $n = 2^k$ evaluations.
* $L_i$ is the extended Lagrange basis with $2^i \cdot n = 2^{k+i}$ evaluations.
$L_1$ has $2n$, $L_2$ has $4n$, $L_3$ has $8n$.
Assume we have $p$ threads. We split each evaluation domain $L_i$ into $p$ sections, each of size $2^{k+i}/p$.
> Example: $gate_1 + y \cdot gate_2 + y^2 \cdot gate_3 + y^3 \cdot gate4 + y^4 \cdot gate_5$
>
> Let's say `[gate1, gate2, gate5]` have degree 2, and `[gate3, gate4]` have degree 3 (so they are separate cosets).
>
> We produce DAGs for the following expressions:
>
> - $e_1(x) = gate_1(x) + y \cdot gate_2(x) + y^4 \cdot gate_5(x)$ over $L_1$
> - We need to evaluate this polynomial a total of $2n$ times on different $x$.
> - $e_2(x) = y^2 \cdot gate_3(x) + y^3 \cdot gate_4(x)$ over $L_2$
> - We need to evaluate this polynomial a total of $4n$ times on different $x$.
We compute the maximal degree extension for all columns, from which we can efficiently obtain the required extension for any particular coset group.
We split up the work as follows:
- For $L_1$, each thread works on a width-$2n/p$ section of the polynomial.
- For $L_2$, each thread works on a width-$4n/p$ section of the polynomial.
Each processor thread takes its independent chunk of work (across all coset groups).
PARALLEL WORK HERE
Once all threads have completed their work, we reconstruct the full polynomial from the pieces:
- Construct the result polynomial for each coset group from its per-thread sections.
- IFFT result polynomials for all except the last (highest-degree) coset group.
- Use whatever paralleization strategy makes sense (probably just as we currently do: parallelize inside each FFT, so compute these sequentially).
- Add those in coefficient form.
- FFT that result.
- Add that to the final coset group.
- Divide by the vanishing polynomial (and continue on as usual).