## Notes/Report 46 Revision of fields, field ext, pairing, https://hackmd.io/l6_TauMHSvS1eqym3CiU-g Still understanding gpu poly eval. ## Multiexp revising algorithmms for multiexp, i.e., evaluating in vanilla ec-gpu they use a modofied version of the algorith from bellman Source: The American Mathematical Monthly, Vol. 71, No. 7 (Aug. - Sep., 1964), pp. 806-808 basically `parallel_multiexp()`, calls within ```Rust! match kern.multiexp(bases, exps) { Ok(result) => acc.add_assign(&result), Err(e) => { *error.write().unwrap() = Err(e); break; } } ``` with bases in the affine coordinates, and exponents in a jacobian representation. the kernel itseld operates on a set of threads where each thread is responsible for a window of bits within the exponent. in testing: ``` window_size: 2 num_windows: 128 num_groups : 152 bucket_len : 4 threads : 19456 buckets : 77824 ``` At initilization each thread initalizes its set of buckets to zero. The kernel processes a subset of the window values. i.e., the ones that correspond to the current window using some x-bits -to- (x-bits + window_size) for each exponent. For each element in the subset, it computes an index `ind` based on the bits from the current window, then adds the corresponding base point`bases[i]`(in Affine repr) to the appropriate bucket: ```opencl! for(uint i = nstart; i < nend; i++) { uint ind = EXPONENT_get_bits(exps[i], bits, w); #if defined(OPENCL_NVIDIA) || defined(CUDA) // O_o, weird optimization, having a single special case makes it // tremendously faster! // 511 is chosen because it's half of the maximum bucket len, but // any other number works... Bigger indices seems to be better... if(ind == 511) buckets[510] = POINT_add_mixed(buckets[510], bases[i]); else if(ind--) buckets[ind] = POINT_add_mixed(buckets[ind], bases[i]); #else if(ind--) buckets[ind] = POINT_add_mixed(buckets[ind], bases[i]); #endif } ``` Afterwards, it does a summation by parts. For bones modified ec-gpu repo, `multiexp()` has been modified to `multiexp_bound()` (in halo2_opt_v2 branch) ```Rust! pub fn multiexp(&self, bases: &[G], exponents: &[G::Scalar]) -> EcResult<G::Curve> { self.multiexp_bound(bases, exponents, 256usize) } ``` This has a modified closure to run two additional kernels "`{}_multiexp_group_step`" and "`{}_multiexp_group_collect`" ```opencl! KERNEL void POINT_multiexp_group_step( GLOBAL POINT_jacobian *results, uint num_groups, uint num_windows, uint step ) { const uint gid = GET_GLOBAL_ID(); uint i_windows = gid % num_windows; uint i_slot = gid / num_windows; if (i_slot < step && i_slot + step < num_groups) { uint l = i_windows + i_slot * num_windows; uint r = l + step * num_windows; results[i_windows + i_slot * num_windows] = POINT_add(results[l], results[r]); } } ``` ```opencl! KERNEL void POINT_multiexp_group_collect( GLOBAL POINT_jacobian *results, GLOBAL POINT_jacobian *acc, uint num_windows ) { const uint gid = GET_GLOBAL_ID(); if (gid < num_windows) { acc[gid] = results[gid]; } } ```