Week 43 - HackMD

Week 43 Notes/Report Working on gpu evaluation this week which included but wasnt limited to understanding the evaluation_gpu code in addition to the prover and reading up on acceleration techniques for Plonk using GPU. Looking more into the `ec_gpu` repo. Working on understanding the running stucture of the kernel w.r.t evaluation_gpu and ec_gpu. ### evaluate_h looking into the Evaluation x.gpu_eval() retouching some stuff from week41 ```Rust! x: &ProveExpression<<<C as CurveAffine>::CurveExt as CurveExt>::ScalarExt> //with prove expression in evaluation_gpu.rs pub enum ProveExpression<F> { Unit(ProveExpressionUnit), Op(Box<ProveExpression<F>>, Box<ProveExpression<F>>, Bop), Y(BTreeMap<u32, F>), // gate:a=0, gate:2a=0 --> a + y*2a + y^2*3a = a*(f(y)) Scale(Box<ProveExpression<F>>, BTreeMap<u32, F>), } ``` `_eval_gpu()` is: ```Rust! pub(crate) fn _eval_gpu<C: CurveAffine<ScalarExt = F>>(...) { // match on: ProveExpression::Op(l, r, op) ProveExpression::Y(ys) ProveExpression::Unit(u) ProveExpression::Scale(l, ys) } ``` Expressions GPU evaluation is done by processing the elements of `pk.ev.gpu_gates_expr`, which is a vector of `ProveExpression`'s. The function that does this processing (or evaluation) is `x.eval_gpu(...)`, i.e., ```Rust! x.eval_gpu(group_idx, pk, &unit_stat, &advice_poly[0], &instance_poly[0], y)) ``` which is called for each element in th vec --> below: ```Rust! gpu_gates_expr: Vec<ProveExpression<<C as CurveAffine>::ScalarExt>, Global> ``` The result of which is a `Polynomial` with CurveExt & ScalarExt. The final result is stored in `values`. Does this using Rayon `.par_iter()` ### eval_gpu: in `evaluation_gpu.rs` sets `closures` to the cuda program (from ec_gpu_gen), sets up cache_size, and other vars. helper = gen_do_extended_fft(pk, program) buffers the values of the `_eval_gpu(...)` **WIP**: Still working on the writeup for each `ProveExpression`. ### general notes on MSM, NTT, prover Prover commits to wire polynomials $A(x)$, $B(x)$, $C(x)$ that encode the private inputs and gate polynomials $Q_L(X), Q_R(X), Q_M(X), Q_O(X), Q_C(X)$ that encode the structure of the circuit. The prover also commits to permutation polynomials $\sigma_1(X)$, $\sigma_2(X)$, $\sigma_3(X)$ that encode all copy constraints to ensure consistency between wires of different gates. The quotient polynomial $Q(x)$ summarizes the entire circuit, and computing the commitment to $Q(x)$ along with an evaluation proof serves as the proof $\pi$. **NTT:** The wire polynomials (that the prover commits to) $A(x)$, $B(x)$, $C(x)$ are constructed by performing a series of Lagrange Interpolations. Recall given a set of n points, Lagrange Interpolation constructs a polynomial $F(x)$ of degree $N-1$ that passes through all of them. This algorithm incurs a deficient $O(n^2)$ time complexity, prompting Number Theoretic Transforms (NTTs) with $O(n log n)$ complexity to be employed instead. ### other reading encountered: https://github.com/AztecProtocol/barretenberg https://scroll.io/blog/proofGeneration#phase-1-filling-in-the-trace-table https://www.paradigm.xyz/2022/04/zk-hardware https://ieeexplore.ieee.org/document/9499783 https://github.com/matter-labs/z-prize-msm-gpu https://eprint.iacr.org/2022/1321.pdf