## Notes/Report 46
Revision of fields, field ext, pairing,
https://hackmd.io/l6_TauMHSvS1eqym3CiU-g
Still understanding gpu poly eval.
## Multiexp
revising algorithmms for multiexp, i.e., evaluating
in vanilla ec-gpu they use a modofied version of the algorith from bellman
Source: The American Mathematical Monthly, Vol. 71, No. 7 (Aug. - Sep., 1964), pp. 806-808
basically `parallel_multiexp()`, calls within
```Rust!
match kern.multiexp(bases, exps) {
Ok(result) => acc.add_assign(&result),
Err(e) => {
*error.write().unwrap() = Err(e);
break;
}
}
```
with bases in the affine coordinates, and exponents in a jacobian representation.
the kernel itseld operates on a set of threads where each thread is responsible for a window of bits within the exponent.
in testing:
```
window_size: 2
num_windows: 128
num_groups : 152
bucket_len : 4
threads : 19456
buckets : 77824
```
At initilization each thread initalizes its set of buckets to zero.
The kernel processes a subset of the window values. i.e., the ones that correspond to the current window using some x-bits -to- (x-bits + window_size) for each exponent.
For each element in the subset, it computes an index `ind` based on the bits from the current window, then adds the corresponding base point`bases[i]`(in Affine repr) to the appropriate bucket:
```opencl!
for(uint i = nstart; i < nend; i++) {
uint ind = EXPONENT_get_bits(exps[i], bits, w);
#if defined(OPENCL_NVIDIA) || defined(CUDA)
// O_o, weird optimization, having a single special case makes it
// tremendously faster!
// 511 is chosen because it's half of the maximum bucket len, but
// any other number works... Bigger indices seems to be better...
if(ind == 511) buckets[510] = POINT_add_mixed(buckets[510], bases[i]);
else if(ind--) buckets[ind] = POINT_add_mixed(buckets[ind], bases[i]);
#else
if(ind--) buckets[ind] = POINT_add_mixed(buckets[ind], bases[i]);
#endif
}
```
Afterwards, it does a summation by parts.
For bones modified ec-gpu repo, `multiexp()` has been modified to `multiexp_bound()` (in halo2_opt_v2 branch)
```Rust!
pub fn multiexp(&self, bases: &[G], exponents: &[G::Scalar]) -> EcResult<G::Curve> {
self.multiexp_bound(bases, exponents, 256usize)
}
```
This has a modified closure to run two additional kernels "`{}_multiexp_group_step`" and "`{}_multiexp_group_collect`"
```opencl!
KERNEL void POINT_multiexp_group_step(
GLOBAL POINT_jacobian *results,
uint num_groups,
uint num_windows,
uint step
) {
const uint gid = GET_GLOBAL_ID();
uint i_windows = gid % num_windows;
uint i_slot = gid / num_windows;
if (i_slot < step && i_slot + step < num_groups) {
uint l = i_windows + i_slot * num_windows;
uint r = l + step * num_windows;
results[i_windows + i_slot * num_windows] = POINT_add(results[l], results[r]);
}
}
```
```opencl!
KERNEL void POINT_multiexp_group_collect(
GLOBAL POINT_jacobian *results,
GLOBAL POINT_jacobian *acc,
uint num_windows
) {
const uint gid = GET_GLOBAL_ID();
if (gid < num_windows) {
acc[gid] = results[gid];
}
}
```