# Week 42 / Notes / Report
Working on understanding the impl for `ProveExpression` in `evaluation_gpu.rs` particularly `_eval_gpu()` within it. Working on understanding `Evaluator` and particularly `evaluate_h()` within it, in `evaluation.rs`. This is stillvery much a WIP to fully understand and document the evaluation using GPU kernel, modify and improve the running time for poly_h.
### Evaluator
Inside `evaluation.rs`:
L275
```Rust!
impl<C: CurveAffine> Evaluator<C> {
new();
add_rotation();
add_constant();
add_expression();
evaluate_h(); // non cuda feautre
evaluate_h(); // cuda gpu feautre
both evaluate_h() return -> Polynomial<C::ScalarExt, ExtendedLagrangeCoeff>
}
```
where i left off from last week.
### evaluate_h()
Investigate:
Inside `evaluation.rs` cuda feature for `evaluate_h()` L801
create new BTreemap
"units" which is `BTreeMap<usize, ProveExpression>` and
"unit_stat" which is `BTreeMap<usize, usize>`
for `gpu_gates_expr ` prepetch units and unit_stat
get values (type Polynomial)
run permutations.
run lookups.
### ProveExpression & reconstruct
Inside `evaluation_gpu.rs`
```Rust!
impl<F: FieldExt> ProveExpression<F> {
pub(crate) fn new() -> Self
pub(crate) fn from_expr(e: &Expression<F>) -> Self
pub(crate) fn add_gate(self, e: &Expression<F>) -> Self
fn reconstruct_coeff(coeff: BTreeMap<u32, F>) -> Self
fn reconstruct_units(mut us: BTreeMap<ProveExpressionUnit, u32>) -> Self
fn reconstruct_units_coeff(
us: BTreeMap<ProveExpressionUnit, u32>,
coeff: BTreeMap<u32, F>,
) -> Self
fn reconstruct_tree(
mut tree: Vec<(BTreeMap<ProveExpressionUnit, u32>, BTreeMap<u32, F>)>,
r_deep_limit: u32,
) -> Self
pub(crate) fn reconstruct(
tree: &[(Vec<ProveExpressionUnit>, BTreeMap<u32, F>)]
) -> Self
pub(crate) fn disjoint(
es: &Vec<ProveExpressionUnit>, src: &Vec<ProveExpressionUnit>
) -> bool
pub(crate) fn disjoint_group(
gs: &Vec<(Vec<ProveExpressionUnit>, BTreeMap<u32, F>)>,
src: &Vec<ProveExpressionUnit>
) -> bool
pub(crate) fn mk_group(
tree: &Vec<(Vec<ProveExpressionUnit>, BTreeMap<u32, F>)>
) -> Vec<Vec<(Vec<ProveExpressionUnit>, BTreeMap<u32, F>)>>
pub(crate) fn get_complexity(&self) -> ComplexityProfiler
pub(crate) fn get_r_deep(&self) -> u32
fn ys_add_assign(l: &mut BTreeMap<u32, F>, r: BTreeMap<u32, F>)
fn ys_mul(l: &BTreeMap<u32, F>, r: &BTreeMap<u32, F>) -> BTreeMap<u32, F>
pub(crate) fn flatten(self) -> BTreeMap<Vec<ProveExpressionUnit>, BTreeMap<u32, F>>
}
```
L1419:
```Rust!
pub(crate) fn reconstruct(tree: &[(Vec<ProveExpressionUnit>, BTreeMap<u32, F>)]) -> Self {
let tree = tree
.into_iter()
.map(|(us, v)| {
let mut map = BTreeMap::new();
for u in us {
if let Some(c) = map.get_mut(u) {
*c = *c + 1;
} else {
map.insert(u.clone(), 1);
}
}
(map, v.clone())
})
.collect();
let r_deep = std::env::var("HALO2_PROOF_GPU_EVAL_R_DEEP").unwrap_or("6".to_owned());
let r_deep = u32::from_str_radix(&r_deep, 10).expect("Invalid HALO2_PROOF_GPU_EVAL_R_DEEP");
Self::reconstruct_tree(tree, r_deep)
}
```
L1437:
reconstruct_tree()
## interpolation
from [here](https://github.com/ZK-Garage/plonk/blob/master/plonk-book/src/chapter_1.md)
The Lagrange interpolation consists of 2 steps:
1. Construct a **Lagrange basis**. This is a set of $n$ polynomials of degree $n-1$ that
take the value 0 at all points of the set except one, where their value is 1. Expressed in a formula:
$$
L_i(x) =
\begin{cases}
0, & \text{if }\ x = {x_j}, j \in [n], j\neq i\\
1, & \text{if }\ x = x_i
\end{cases}
$$
The polynomials $L_i$ can be constructed in the following way:
$$
L_i(x) = \prod_{0 \leq j < n\text{, }j\neq i}
\frac{x-x_j}{x_i - x_j}
$$
Notice that this product has $n-1$ terms, and therefore results in a degree $n-1$ polynomial.
2. Scale and sum the polynomials of the basis.
$$
f(x) = \sum_{i=0}^{n-1} y_i \cdot L_i(x)
$$
The properties of the Lagrange basis now allow us to scale each polynomial to its target value --
multiplying by $y_i$ and then add up all the terms.
The important observation we can extract from the Lagrange interpolation is that given a fixed set
of points $x_1,\dots,x_n$ (an evaluation domain) we can represent any polynomial of degree $d<n$
by its evaluations $f(x_i)$ at $d+1$ points in the set. As it turns out, this representation is
much more convenient than the usual coefficient representation as it provides a very simple and fast way of computing sums and multiplication of polynomials. However, the coefficient form is still useful for evaluating the polynomial at points outside the evaluation domain.
Switching between these two forms of representation is very useful. The coefficient form is
preferred when the polynomial must be evaluated at a random point (outside of the evaluation
domain). The evaluation form is better suited for operations between polynomials such as
addition, multiplication and exact quotients. The algorithm that allows us to efficiently switch
between representations is the Fast Fourier Transform (FFT). This is an efficient algorithm for the
more general discrete Fourier Transform (DFT). It has a complexity of $\mathcal{O}(n \cdot log(n))$
with $n$ being the degree of the polynomial.
### GPU evaluation --> _eval_gpu() this is a WIP
looking into `LookupProveExpression` and its ascoiated prove expression blocks
```Rust!
impl<F: FieldExt> LookupProveExpression<F> {
// match for type:
LookupProveExpression::Expression(e)
LookupProveExpression::LcTheta(l, r)
LookupProveExpression::LcBeta(l, r)
LookupProveExpression::AddGamma(l)
}
```
and looking into `ProveExpression` with its functions, particularly `_eval_gpu()`
```Rust!
impl<F: FieldExt> ProveExpression<F> {
pub(crate) fn gen_cache_policy(&self, unit_cache: &mut Cache<Buffer<F>>)
pub(crate) fn eval_gpu<C: CurveAffine<ScalarExt = F>>(...)
pub(crate) fn _eval_gpu_buffer<C: CurveAffine<ScalarExt = F>>(...)
pub(crate) fn _eval_gpu<C: CurveAffine<ScalarExt = F>>(...)
}
```
`_eval_gpu()` is:
```Rust!
pub(crate) fn _eval_gpu<C: CurveAffine<ScalarExt = F>>(...) {
// match on:
ProveExpression::Op(l, r, op)
ProveExpression::Y(ys)
ProveExpression::Unit(u)
ProveExpression::Scale(l, ys)
}
```
looking at and writing documentation down for the below:
op = Operation (Sum or Product) the l & r.
```Rust
ProveExpression::Op(l, r, op) => {...}
```
Multiplication with a fixed order.
```Rust
ProveExpression::Y(ys) => {...}
```
for the below a `ProveExpressionUnit` is made up of fixed, advice & instance coulmns.
```Rust
ProveExpression::Unit(u) => {...}
```
if cached return v, if not in the cache then calculate de_extenxed_fft(), which is an ExtendedField FFT calculation.
The poly that is passed in to _eval_gpu() is in coefficient form. If the polynomial is in the coefficient form we dont know the evaulation of each point. If its in the evaluation from then we dont have to run the else{ code, ext fft etc}.
**still working on this...**
Task: understand and study the evaluation of binary tree in the evaluaor, in terms of the running kernel on GPU. modify the kernel to take different operands (not just two) and bench performace.