## Week 2

Working on understanding the patch for `halo2_proofs`, doing some benchmarks and understanding what's going on with the cache mechanics.

For a `ProveExpressionUnit` (in `evaluation.rs`) the patch goes,

From:

```rust
let es = es.into_iter().map(|e| {
    ProveExpression::reconstruct(e.as_slice())
}).collect::<Vec<_>>();
```

To:

```rust
let mut es = es.into_iter().map(|e| {
    // Debug output: print every bundle before it is reconstructed.
    println!("elements:");
    for s in &e {
        println!("cells are {}", ProveExpression::<C::Scalar>::string_of_bundle(&s.0));
    }
    let a = ProveExpression::reconstruct(e.as_slice());
    a
}).collect::<Vec<_>>();
```

Looking into `evaluation_gpu.rs`, `impl ProveExpression<F>`, checking out the modifications of the cache policy generator.

From:

```rust
pub(crate) fn gen_cache_policy(&self, unit_cache: &mut Cache<Buffer<F>>) {
    match self {
        ProveExpression::Unit(u) => unit_cache.access(u.get_group()),
        ProveExpression::Op(l, r, _) => {
            l.gen_cache_policy(unit_cache);
            r.gen_cache_policy(unit_cache);
        }
        ProveExpression::Y(_) => {}
        ProveExpression::Scale(l, _) => {
            l.gen_cache_policy(unit_cache);
        }
    }
}
```

which in the case of `ProveExpression::Unit` performs a cache access. If the expression is an `::Op` or a `::Scale`, it recursively calls `gen_cache_policy` on the operands (left and right for `::Op`, the single inner expression for `::Scale`). `::Y` does nothing.
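To make the traversal order of this unpatched version concrete, here is a minimal toy model. Everything in it is a stand-in for illustration: `Expr`, the `u32` group ids, the `Vec<u32>` access log and the `access_order` helper are hypothetical and replace the real `ProveExpression<F>` and `Cache<Buffer<F>>`, whose internals are not shown in these notes.

```rust
// Toy stand-in for ProveExpression: a unit carries only its group id.
#[derive(Debug)]
enum Expr {
    Unit(u32),
    Op(Box<Expr>, Box<Expr>),
    Y,
    Scale(Box<Expr>),
}

impl Expr {
    // Same shape as the unpatched gen_cache_policy: record one access per
    // unit, depth first, left operand before right operand.
    fn gen_cache_policy(&self, accesses: &mut Vec<u32>) {
        match self {
            Expr::Unit(group) => accesses.push(*group),
            Expr::Op(l, r) => {
                l.gen_cache_policy(accesses);
                r.gen_cache_policy(accesses);
            }
            Expr::Y => {}
            Expr::Scale(l) => l.gen_cache_policy(accesses),
        }
    }
}

// Hypothetical helper: run the traversal and return the access log.
fn access_order(e: &Expr) -> Vec<u32> {
    let mut accesses = Vec::new();
    e.gen_cache_policy(&mut accesses);
    accesses
}

fn main() {
    // (unit(1) + scale(unit(1))) + (y + unit(2))
    let e = Expr::Op(
        Box::new(Expr::Op(
            Box::new(Expr::Unit(1)),
            Box::new(Expr::Scale(Box::new(Expr::Unit(1)))),
        )),
        Box::new(Expr::Op(Box::new(Expr::Y), Box::new(Expr::Unit(2)))),
    );
    println!("{:?}", access_order(&e)); // prints [1, 1, 2]
}
```

In this toy model group 1 is accessed once per occurrence, so a sum of scaled units over the same group shows up as repeated accesses in the log.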
To:

```rust
pub(crate) fn gen_cache_policy(&self, unit_cache: &mut Cache<Buffer<F>>) {
    let handle_flat = if let Some((uid, exprs)) = self.flat_unique_unit_scale() {
        if exprs.len() > 1 {
            uid.gen_cache_policy(unit_cache);
            Some(())
        } else {
            None
        }
    } else {
        None
    };

    if handle_flat.is_none() {
        match self {
            ProveExpression::Unit(u) => unit_cache.access(u.get_group()),
            ProveExpression::Op(l, r, _) => {
                l.gen_cache_policy(unit_cache);
                r.gen_cache_policy(unit_cache);
            }
            ProveExpression::Y(_) => {}
            ProveExpression::Scale(l, _) => {
                l.gen_cache_policy(unit_cache);
            }
        }
    }
}
```

This first checks whether `self.flat_unique_unit_scale()` returns `Some`. If so, and the flattened list has more than one expression, it generates the cache policy for `uid` and sets `handle_flat` to `Some(())`; otherwise `handle_flat` is `None`. If `handle_flat` is `None`, it matches on the expression exactly as before.

Breaking down:

```rust
pub fn flat_unique_unit_scale(&self) -> Option<(Self, Vec<Self>)> {
    ...
}
```

Basically it matches an `::Op`, checking if it's a `Bop::Sum`. If its operands can be flattened into unique unit scales (that is, `flat_unique_unit_scale` returns `Some`) and the unit groups are the same, it concatenates their expressions and returns `Some` with the flattened expression and the associated `Vec`. If it matches on `::Scale`, then if the scaled expression is a `Unit` it returns `Some`: the original expr and a `Vec<ProveExpression>`. If it's not a `Unit` expr it returns `None`.

A little bit of `pub(crate) fn do_fft_core<F: FieldExt>()`:

If `log_n` is 25, change the degree to 7 (3*7 = 21), and for the next round use degree 4 (21 + 4 = 25). This gives a 70% improvement of the FFT with k = 27. There is a performance bug when degree = 1.

Still a WIP, working out what's going on with the performance bug.
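A rough sketch of where a degree-1 round can come from, to go with the FFT note above. This is a guess at the mechanics, not the code in `do_fft_core`: it assumes rounds are chosen greedily (largest allowed degree first), reads the adjusted degree as 7 because that is what 3*7 = 21 implies, and `greedy_degrees` / `adjusted_degrees` are hypothetical names with a hypothetical rule picked only to reproduce the 7, 7, 7, 4 example.

```rust
// Assumed greedy schedule: each round takes the largest degree allowed,
// capped by what is left of log_n.
fn greedy_degrees(log_n: u32, max_log2_radix: u32) -> Vec<u32> {
    // Clamp to at least 1 so the loop always makes progress.
    let max_deg = max_log2_radix.min(log_n).max(1);
    let mut degs = Vec::new();
    let mut log_p = 0;
    while log_p < log_n {
        let deg = max_deg.min(log_n - log_p);
        degs.push(deg);
        log_p += deg;
    }
    degs
}

// Hypothetical adjustment: lower the cap while the greedy schedule would
// end on a leftover round of degree 1.
fn adjusted_degrees(log_n: u32, max_log2_radix: u32) -> Vec<u32> {
    let mut max_deg = max_log2_radix.min(log_n);
    while max_deg > 2 && log_n % max_deg == 1 {
        max_deg -= 1;
    }
    greedy_degrees(log_n, max_deg)
}

fn main() {
    // With max_log2_radix = 8, log_n = 25 ends on a degree-1 round.
    println!("{:?}", greedy_degrees(25, 8)); // prints [8, 8, 8, 1]
    // Lowering the cap to 7 avoids it.
    println!("{:?}", adjusted_degrees(25, 8)); // prints [7, 7, 7, 4]
    // log_n = 22 has no degree-1 round, so nothing changes.
    println!("{:?}", adjusted_degrees(22, 8)); // prints [8, 8, 6]
}
```

Under these assumptions the slow case is exactly the one where `log_n` is one more than a multiple of the maximum degree.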
```
-------------------------------------------------------
do_fft_core()
log_n = 22
n = 4194304
max_log2_radix = 8
max_log2_local_work_size = 6
-------------------------------------------------------
```

TODO pull apart:

```
uint lid = GET_LOCAL_ID();
uint lsize = GET_LOCAL_SIZE();
uint index = GET_GROUP_ID();
uint t = n >> deg;
uint p = 1 << lgp;
uint k = index & (p - 1);

x += index;
y += ((index - k) << deg) + k;

uint count = 1 << deg; // 2^deg
uint counth = count >> 1; // Half of count

uint counts = count / lsize * lid;
uint counte = counts + count / lsize;
```

```opencl
KERNEL void FIELD_eval_batch_scale(
    GLOBAL FIELD* res,
    GLOBAL FIELD* l,
    GLOBAL int* l_rot,
    uint nb_scale,
    uint size,
    GLOBAL FIELD* c
) {
    uint gid = GET_GLOBAL_ID();
    uint idx = gid;

    uint lidx = (idx + size + l_rot[0]) & (size - 1);
    res[idx] = FIELD_mul(l[lidx], c[0]);

    for (uint i = 1; i < nb_scale; i++) {
        uint lidx = (idx + size + l_rot[i]) & (size - 1);
        res[idx] = FIELD_add(res[idx], FIELD_mul(l[lidx], c[i]));
    }
}
```
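As a first step in pulling `FIELD_eval_batch_scale` apart, a CPU reference of what one launch computes: `res[idx]` is the sum over `i` of `l[(idx + l_rot[i]) mod size] * c[i]`. This is a sketch under assumptions: `u64` wrapping arithmetic stands in for `FIELD_mul` / `FIELD_add`, `size` is taken to be a power of two (the `& (size - 1)` mask only acts as a modulo in that case), and the function name and signature are made up for the example.

```rust
// CPU reference for the batch-scale kernel, one output per global id.
// u64 wrapping ops are a stand-in for the real field arithmetic.
fn eval_batch_scale(l: &[u64], l_rot: &[i32], c: &[u64]) -> Vec<u64> {
    let size = l.len();
    assert!(size.is_power_of_two(), "mask trick needs a power-of-two size");
    assert_eq!(l_rot.len(), c.len(), "one coefficient per rotation");
    assert!(!l_rot.is_empty(), "the kernel always reads l_rot[0]");

    (0..size)
        .map(|idx| {
            let mut acc = 0u64;
            for (&rot, &coeff) in l_rot.iter().zip(c) {
                // Rotated index, wrapped into 0..size with the same mask
                // as the kernel; adding size first mirrors the kernel.
                let shifted = idx as i64 + size as i64 + rot as i64;
                let lidx = (shifted as usize) & (size - 1);
                acc = acc.wrapping_add(l[lidx].wrapping_mul(coeff));
            }
            acc
        })
        .collect()
}

fn main() {
    let l = [1, 2, 3, 4];
    // Rotations 0, +1 and -1, scaled by 1, 10 and 100.
    let res = eval_batch_scale(&l, &[0, 1, -1], &[1, 10, 100]);
    println!("{:?}", res); // prints [421, 132, 243, 314]
}
```

For `idx = 0` the three terms are `l[0]*1 + l[1]*10 + l[3]*100 = 421`, which shows the negative rotation wrapping around to the last element.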