# Week 42 / Notes / Report

Working on zkWasm's halo2 GPU-specific implementation, with --features = gpu.

Working on understanding:

```Rust!
ProveExpression::Op(l, r, op)
ProveExpression::Y(ys)
ProveExpression::Unit(u)
ProveExpression::Scale(l, ys)
```

and the GPU evaluation mechanics.

Looked briefly at `ecc-gpu` branch `halo2-opt-v2` to understand kernel/cl.

Test environment:

```
Ubuntu 22.04 5.15.0-86-generic
NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2
RTX 4080
```

For repo `halo2-gpu-specific`, working in branch `xgao-gpu-expriments`.

### `gen pkey:`

Start: Assign
End: Assign ....................................................................444.266ms
Start: generate pkey
··Start: prepare ev
gpus number is 1
total vs group is 36
····Start: group exprs
total vs group is 36
elements:
cells are <>

```
elements:
cells are <a0f8> ...
elements:
cells are <a1f8> ...
```

etc
····End: group exprs ...........................................................153.795ms
depth is 58
depth is 45
`...<print depths down to 0>...`
depth is 2
depth is 0
--------- expr part 0 ---------
Dump the complexity is ComplexityProfiler
··End: prepare ev ..............................................................461.505ms
End: generate pkey .............................................................1.104s

### `create proof:`

Start: create proof
··Start: instance
k is 18
··End: instance ................................................................12.749ms
··Start: advice -->
··End: advice ..................................................................282.877ms
··Start: lookups 30
··End: lookups 30 ..............................................................541.343ms
··Start: lookups commit product
··End: lookups commit product ..................................................423.172ms
··Start: lookups add blinding value
··End: lookups add blinding value ..............................................833.204µs
··Start: lookups msm and fft
··End: lookups msm and fft .....................................................836.289ms
··Start: permutation commit
··End: permutation commit ......................................................93.359ms
··Start: vanishing commit
··End: vanishing commit ........................................................31.464ms

### `h_poly:`

··Start: h_poly
····Start: lagrange_to_coeff_st
····End: lagrange_to_coeff_st ..................................................239.386ms
····Start: expressions gpu eval -->
····End: expressions gpu eval ..................................................1.590s
····Start: permutations --> `evaluation.rs` L877
····End: permutations ..........................................................182.230ms
····Start: eval_h_lookups -->
····End: eval_h_lookups ........................................................567.423ms
··End: h_poly ..................................................................2.579s

### `vanishing arg / challenger:`

··Start: vanishing construct
··End: vanishing construct .....................................................95.446ms

### `poly eval (compute instance, advice, fixed):`

··Start: eval poly
··End: eval poly ...............................................................306.040ms
··Start: eval poly vanishing
··End: eval poly vanishing .....................................................6.605ms
··Start: eval poly permutation
··End: eval poly permutation ...................................................38.088ms
··Start: eval poly lookups
··End: eval poly lookups .......................................................163.562ms

### `open:`

··Start: multi open
··End: multi open ..............................................................534.920ms
End: create proof ..............................................................6.115s
write transcript to "/home/x/Workspace/zkWasm/output2/zkwasm.0.transcript.data"

--------------------------

## ev setup & elements/cells dump & complexity dump

From `evaluation.rs`: creating a new `Evaluator` dumps the elements and their cells for the `ProveExpression`:

```Rust!
let mut e = ProveExpression::new();
...
println!("elements:");
for s in &e {
    println!("cells are {}", ProveExpression::<C::Scalar>::string_of_bundle(&s.0))
}
```

This happens in the `prepare ev` --> `group exprs` phase. The complexity is dumped next:

```Rust!
for (i, e) in es.iter().enumerate() {
    let complexity = e.get_complexity();
    ev.unit_ref_count = complexity.ref_cnt.into_iter().collect();
    ev.unit_ref_count.sort_by(|(_, l), (_, r)| u32::cmp(l, r));
    ev.unit_ref_count.reverse();

    println!("--------- expr part {} ---------", i);
    println!("complexity is {:?}", e.get_complexity());
    println!("sorted ref cnt is {:?}", ev.unit_ref_count);
    println!("r deep is {}", e.get_r_deep());
}
```

Finally an `Evaluator` is returned.

### evaluate_h

`prover.rs` (the prover) calls `evaluate_h` (in `evaluation.rs`, the GPU-featured version) on `pk.ev`: `pk.ev.evaluate_h(...)`, i.e. `ProvingKey.Evaluator.evaluate_h(...)`.

It creates two new BTreeMaps: `units`, which is a `BTreeMap<usize, ProveExpression>`, and `unit_stat`, which is a `BTreeMap<usize, usize>`.
For `gpu_gates_expr` (`pk.ev.gpu_gates_expr`, whose elements are of type `ProveExpression`), it prefetches `units` and `unit_stat`.
It gets the values (type `Polynomial`).
It runs permutations.
It runs lookups.

------------

Evaluator data struct

```Rust!
#[derive(Default, Debug)]
pub struct Evaluator<C: CurveAffine> {
    /// Constants
    pub constants: Vec<C::ScalarExt>,
    /// Rotations
    pub rotations: Vec<i32>,
    /// Calculations
    pub calculations: Vec<CalculationInfo>,
    /// Value parts
    pub value_parts: Vec<ValueSource>,
    /// Lookup results
    pub lookup_results: Vec<Calculation>,
    /// GPU
    pub gpu_gates_expr: Vec<ProveExpression<C::ScalarExt>>,
    pub gpu_lookup_expr: Vec<LookupProveExpression<C::ScalarExt>>,
    pub unit_ref_count: Vec<(usize, u32)>,
}
```

------------

### expressions gpu eval:

Performs the GPU evaluation on each element of the `gpu_gates_expr` field of `pk.ev`, in parallel (Rayon). The call is

```Rust!
x.eval_gpu(group_idx, pk, &unit_stat, &advice_poly[0], &instance_poly[0], y)
```

i.e., `eval_gpu` (in `evaluation_gpu.rs`) with the group index, proving key, memory cache, advice poly, instance poly & challenge `y`.

In `evaluation_gpu.rs`,

```Rust!
fn eval_gpu<C: CurveAffine<ScalarExt = F>>(
```

generates the extended FFT from the proving key with the GPU program. `self._eval_gpu()` is then called.

**more to come WIP.**

------------

### ProveExpression::Y

`y` is the challenge, in the extended scalar field.

`y: ChallengeScalar<C, Y>`

`ys` is a Vec with two elements: the scalar field value of 1 & the challenge value (from above).

**working on this WIP.**

------------

### ProveExpression::Op

`op` is the operation (Sum or Product) applied to `l` & `r`.

```Rust
ProveExpression::Op(l, r, op) => {...}
```

e.g.,

```
l = Op(Scale(Unit(Advice { column_index: 36, rotation: Rotation(0) }), {407: 0x0000000000000000000000000000000000000000000000000000000000000001}), Scale(Unit(Advice { column_index: 36, rotation: Rotation(1) }), {1: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000}), Sum)
r = Unit(Fixed { column_index: 15, rotation: Rotation(0) })
op = Product
-----------------------------------------------
--> ProveExpression::Op
l = Scale(Unit(Advice { column_index: 36, rotation: Rotation(0) }), {407: 0x0000000000000000000000000000000000000000000000000000000000000001})
r = Scale(Unit(Advice { column_index: 36, rotation: Rotation(1) }), {1: 0x30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000})
op = Sum
```

**working on this WIP.**
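
To make the nesting in the dump easier to follow, here is a minimal CPU sketch of how such a tree could be walked for a single row. It is not the halo2 code: `Expr`, `Bop`, `eval`, `eval_ys`, `example` and the toy modulus `P = 97` are hypothetical stand-ins. It also assumes that the `{power: coefficient}` map printed for `Scale` pairs a power of `y` with a coefficient, so `{407: 1}` would read as `1 * y^407`; that reading is a guess from the dump and still needs checking against the source. If the field is the BN254 scalar field, the second constant in the dump (`0x3064...f0000000`) is the modulus minus one, i.e. `-1`.

```rust
use std::collections::BTreeMap;

/// Toy prime modulus standing in for the real scalar field (assumption).
const P: u64 = 97;

#[derive(Clone, Copy, Debug)]
enum Bop {
    Sum,
    Product,
}

/// Hypothetical, simplified stand-in for `ProveExpression`.
/// A unit is just an index into one row of column values.
#[derive(Debug)]
enum Expr {
    Unit(usize),
    Op(Box<Expr>, Box<Expr>, Bop),
    Y(BTreeMap<u32, u64>),
    Scale(Box<Expr>, BTreeMap<u32, u64>),
}

/// Square-and-multiply exponentiation modulo P.
fn pow_mod(mut base: u64, mut exp: u32) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Evaluates sum_k coeff_k * y^k for a {power: coefficient} map (assumed meaning).
fn eval_ys(ys: &BTreeMap<u32, u64>, y: u64) -> u64 {
    ys.iter()
        .fold(0, |acc, (&k, &c)| (acc + c % P * pow_mod(y, k)) % P)
}

/// Recursively evaluates the tree on one row of values.
fn eval(e: &Expr, row: &[u64], y: u64) -> u64 {
    match e {
        Expr::Unit(i) => row[*i] % P,
        Expr::Y(ys) => eval_ys(ys, y),
        Expr::Scale(l, ys) => eval(l, row, y) * eval_ys(ys, y) % P,
        Expr::Op(l, r, Bop::Sum) => (eval(l, row, y) + eval(r, row, y)) % P,
        Expr::Op(l, r, Bop::Product) => eval(l, row, y) * eval(r, row, y) % P,
    }
}

/// Same shape as the dump: (Scale(a, {3: 1}) + Scale(b, {1: -1})) * f,
/// with a small power 3 in place of 407 and P - 1 playing the role of -1.
fn example() -> Expr {
    let l = Expr::Op(
        Box::new(Expr::Scale(Box::new(Expr::Unit(0)), BTreeMap::from([(3, 1)]))),
        Box::new(Expr::Scale(Box::new(Expr::Unit(1)), BTreeMap::from([(1, P - 1)]))),
        Bop::Sum,
    );
    Expr::Op(Box::new(l), Box::new(Expr::Unit(2)), Bop::Product)
}
```

With `row = [5, 7, 2]` (values for a, b, f) and `y = 3`, `eval(&example(), &[5, 7, 2], 3)` computes `(5 * 27 - 7 * 3) * 2 mod 97`. The real prover works on whole extended-domain polynomials on the GPU rather than on one row at a time, so this only illustrates the recursion over `Op`, `Scale`, `Unit` and `Y`.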