Week 43 Notes/Report
Working on gpu evaluation this week which included but wasnt limited to understanding the evaluation_gpu code in addition to the prover and reading up on acceleration techniques for Plonk using GPU. Looking more into the `ec_gpu` repo. Working on understanding the running stucture of the kernel w.r.t evaluation_gpu and ec_gpu.
### evaluate_h
looking into the Evaluation
x.gpu_eval()
retouching some stuff from week41
```Rust!
x: &ProveExpression<<<C as CurveAffine>::CurveExt as CurveExt>::ScalarExt>
//with prove expression in evaluation_gpu.rs
pub enum ProveExpression<F> {
Unit(ProveExpressionUnit),
Op(Box<ProveExpression<F>>, Box<ProveExpression<F>>, Bop),
Y(BTreeMap<u32, F>), // gate:a=0, gate:2a=0 --> a + y*2a + y^2*3a = a*(f(y))
Scale(Box<ProveExpression<F>>, BTreeMap<u32, F>),
}
```
`_eval_gpu()` is:
```Rust!
pub(crate) fn _eval_gpu<C: CurveAffine<ScalarExt = F>>(...) {
// match on:
ProveExpression::Op(l, r, op)
ProveExpression::Y(ys)
ProveExpression::Unit(u)
ProveExpression::Scale(l, ys)
}
```
Expressions GPU evaluation is done by
processing the elements of `pk.ev.gpu_gates_expr`, which is a vector of `ProveExpression`'s. The function that does this processing (or evaluation) is `x.eval_gpu(...)`, i.e.,
```Rust!
x.eval_gpu(group_idx, pk, &unit_stat, &advice_poly[0], &instance_poly[0], y))
```
which is called for each element in th vec --> below:
```Rust!
gpu_gates_expr: Vec<ProveExpression<<C as CurveAffine>::ScalarExt>, Global>
```
The result of which is a `Polynomial` with CurveExt & ScalarExt. The final result is stored in `values`.
Does this using Rayon `.par_iter()`
### eval_gpu:
in `evaluation_gpu.rs`
sets `closures` to the cuda program (from ec_gpu_gen), sets up cache_size, and other vars.
helper = gen_do_extended_fft(pk, program)
buffers the values of the `_eval_gpu(...)`
**WIP**: Still working on the writeup for each `ProveExpression`.
### general notes on MSM, NTT, prover
Prover commits to wire polynomials $A(x)$, $B(x)$, $C(x)$ that encode the private inputs and gate polynomials $Q_L(X), Q_R(X), Q_M(X), Q_O(X), Q_C(X)$ that encode the structure of the circuit.
The prover also commits to permutation polynomials $\sigma_1(X)$, $\sigma_2(X)$, $\sigma_3(X)$ that encode all copy constraints to ensure consistency between wires of different gates.
The quotient polynomial $Q(x)$ summarizes the entire circuit, and computing the commitment to $Q(x)$ along with an evaluation proof serves as the proof $\pi$.
**NTT:**
The wire polynomials (that the prover commits to) $A(x)$, $B(x)$, $C(x)$ are constructed by performing a series of Lagrange Interpolations. Recall given a set of n points, Lagrange Interpolation constructs a polynomial $F(x)$ of degree $N-1$ that passes through all of them. This algorithm incurs a deficient $O(n^2)$ time complexity, prompting Number Theoretic Transforms (NTTs) with $O(n log n)$ complexity to be employed instead.
### other reading encountered:
https://github.com/AztecProtocol/barretenberg
https://scroll.io/blog/proofGeneration#phase-1-filling-in-the-trace-table
https://www.paradigm.xyz/2022/04/zk-hardware
https://ieeexplore.ieee.org/document/9499783
https://github.com/matter-labs/z-prize-msm-gpu
https://eprint.iacr.org/2022/1321.pdf