# Verkle Trees - Proof creation/verification notes
*Update (2023-11-28): this document was created ~2023-05 -- all the ideas are still relevant. I've also worked on a parallelized version in the Go library that you might also find interesting [here](https://github.com/crate-crypto/go-ipa/pull/63).*
This document contains notes about implementation optimizations for Verkle Tree proof generation and verification.
Some of these optimizations are present [in the spec implementation](https://github.com/crate-crypto/verkle-trie-ref), while others aren't since the spec code is meant to optimize to be understandable. We list them here and provide links to the reference implementation and other documents with explanations (if available).
## Proof generation
- Preprocessing
- Collecting polynomials in eval form involves doing `toFr` at each relevant internal node to prove, which means doing inversions.
- [x] Optimization: If you're collecting polynomials of relevant internal nodes recursively, at least batch inverses per node (i.e.: between 1 and 256 Point->Fr transformations per internal node). Do the same on leaf nodes.
- Idea: Note there's no dependency between levels so that you can batch everything in a single batch.
- Idea: The EL pipeline might already serialized the tree. Try caching the already calculating Frs and avoid redoing work altogether. (Might consume extra memory).
- Multiproof
- [x] Fiat-Shamir: We start by adding all `Cs` to the transcript. This requires serializing the points. Probably, you store them in projective form in memory, normalize them (i.e: to affine) in batch mode (i.e: Montgometry trick for $Z$ coordinate). [[go-ipa](https://github.com/crate-crypto/go-ipa/pull/46/files#diff-15732f9d80d142e52a9de146c6be3c65b99fa8c81c28540dcb6d8e2061ce6002R35)]
- [X] Aggregate all polynomials by evaluation domain before starting with any heavy work. This avoids repeated heavy work when you're aggregating many openings (which is usually the case). [[go-ipa](https://github.com/crate-crypto/go-ipa/pull/48)]
- When calculating $g(x)$, we have to do polynomial division on the domain
- [x] Rewrite $q_m$ in terms of $q_j$. [[spec code](https://github.com/crate-crypto/verkle-trie-ref/blob/master/multiproof/quotient.py#L20-L21), [explanation](https://hackmd.io/mJeCRcawTRqr9BooVpHv5g#1-Rewrite-q_m-in-terms-of-q_j)]
- [x] Removing field inversions in $q_j$ [[spec code](https://github.com/crate-crypto/verkle-trie-ref/blob/master/multiproof/quotient.py#L9), [explanation](https://hackmd.io/mJeCRcawTRqr9BooVpHv5g#2-Removing-field-inversions-in-q_j)]
- [x] Leverage precomputed terms when calculating $\frac{A'(x_m)}{A'(x_i)}$ [[spec code](https://github.com/crate-crypto/verkle-trie-ref/blob/master/multiproof/quotient.py#L21), [explanation](https://hackmd.io/mJeCRcawTRqr9BooVpHv5g#3-Precompute-fracAx_mAx_i)].
- [x] When calculating $g_1(x)$, do batch inversion for $\frac{1}{t-z_i}$ terms.
- Inner product argument
- When evaluating outside of the domain step $f(z)= A(z)\sum_{i=0}^{d-1}\frac{f_i}{A'(x_i)(z-x_i)}$:
- [x] Remember that we already have precomputed $\frac{1}{A'(x)}$, use it directly. [[spec code](https://github.com/crate-crypto/verkle-trie-ref/blob/master/polynomial/precomputed_weights.py#L76)]
- [x] We have many $\frac{1}{z-x_i}$, batch those inverse calculations. [[spec code](https://github.com/crate-crypto/verkle-trie-ref/blob/master/polynomial/precomputed_weights.py#L73)]
## Proof verification
- Proof deserialization
- [X] Make sure you use Tonelli-Shanks with precomputed tables [not in spec code, [original impl in Gottfried repo](https://github.com/GottfriedHerold/Bandersnatch/blob/f665f90b64892b9c4c89cff3219e70456bb431e5/bandersnatch/fieldElements/field_element_square_root.go), [port in go-ipa with gnark](https://github.com/crate-crypto/go-ipa/pull/38), [explanation](https://ihagopian.com/posts/tonelli-shanks-with-precomputed-dlog-tables), [perf impact results](https://hackmd.io/@jsign/Bandersnatch-optimized-sqrt-notes)]
- Multiproof
- To calculate $g_2(t)$ (and $E$):
- [x] You'll need $\frac{r^i}{t-z_i}$ terms, batch the denominator inverses [[go-ipa](https://github.com/crate-crypto/go-ipa/pull/42/files)]
- [x] As mentioned for the prover, remember to aggregate polynomials by evaluation point first which will save some work down the road. [[go-ipa](https://github.com/crate-crypto/go-ipa/pull/49)]
- [X] Note that calculating `E` is a MSM. Consider using a MSM algorithm, and not the naive sum of scalar multiplications [[go-ipa](https://github.com/crate-crypto/go-ipa/pull/43/files)].
- Inner product argument
- [X] When evaluating outside the domain step, use the same optimizations mentioned for the prover.
- [X] Only compute basis changes at the end [not in spec code, [EIP code](https://github.com/ethereum/research/blob/master/verkle_trie_eip/ipa_utils.py#L82-L93), [Dankrad's blog post](https://dankradfeist.de/ethereum/2021/07/27/inner-product-arguments.html), [my handwritten "explanation" of folding trick](https://hackmd.io/_uploads/rybHSG343.png), [go-ipa](https://github.com/crate-crypto/go-ipa/pull/41)]
- [X] Do batch inversion for the needed challenges for the basis changes. [[go-ipa](https://github.com/crate-crypto/go-ipa/pull/40/files)]
## Impact
As an example of what kind of speedup you can get from an implementation that can be missing some of these benchmarks, here's some results on `go-ipa`:
- [Benchmark results before some important optimizations](https://gist.github.com/jsign/3ea6d6d23ee691472c48f45732d31799)
- [Benchmark results after all optimizations](https://gist.github.com/jsign/f79f87af09ad3a14f0bfcfc5ba4e8c3e)
As an example, for 16.000 random polynomial openings:
```
For 16000 polynomials:
Proving:
Proof serialization duration: (0.11ms -> 0.11ms)
Proof creation duration: (664ms -> 271ms [2.45x speedup])
...
Verification:
Proof deserialization duration: 0.11ms
Total duration: (1166ms -> 38ms [30x speedup]
...
```
Legend: `(BEFOREms -> AFTERms, [SPEEDUPx])`
The optimized implementation still is single-threaded-ish, and doesn't require extra memory (e.g: precomputed tables).
## Closing
Do you know any other missing trick? [Ping me](https://t.me/ignaciohagopian)!

This document describes some high-level description of test-vectors to generate.

12/1/2023Last updated: 2023-10-30

11/6/2023This document aims to explore if using batch affine additions is faster than adding in projective form for VKT MSM.

8/20/2023block execution: 99.992976msINFO [07-15|13:10:28.917] Collected key values from base tree count=10000 duration=293.583654ms last account=008f65…483efbINFO [07-15|13:10:29.300] Prepared key values from base tree duration=382.488744msINFO [07-15|13:10:30.000] Inserted key values in overlay tree count=10000 duration=1.083027557smigration took: 1.376893246sfillLevels took 4.422121mscollecting nodes took 56.29µstoFrMultiple took 208.536µsupdating commitments took 3.090994mscollecting nodes took 5.472675mstoFrMultiple took 21.644539msupdating commitments took 52.035324mscollecting nodes took 2.3216mstoFrMultiple took 13.16166msupdating commitments took 54.440922mscollecting nodes took 127.455µstoFrMultiple took 639.024µsupdating commitments took 15.478009msFinalize: 211.469993msblock execution: 293.354953msINFO [07-15|13:10:30.971] Collected key values from base tree count=10000 duration=330.318923ms last account=008f65…483efbINFO [07-15|13:10:31.372] Prepared key values from base tree duration=400.638925msINFO [07-15|13:10:32.097] Inserted key values in overlay tree count=10000 duration=1.126096392smigration took: 1.456677516sfillLevels took 2.508844mscollecting nodes took 47.249µstoFrMultiple took 247.035µsupdating commitments took 4.936315mscollecting nodes took 7.428826mstoFrMultiple took 15.575423msupdating commitments took 50.524827mscollecting nodes took 2.413472mstoFrMultiple took 12.737298msupdating commitments took 41.038935mscollecting nodes took 139.996µstoFrMultiple took 648.065µsupdating commitments took 15.552383msFinalize: 193.08911msblock execution: 252.886918msINFO [07-15|13:10:32.963] Collected key values from base tree count=10000 duration=282.597556ms last account=00ab98…a45379INFO [07-15|13:10:33.531] Prepared key values from base tree duration=567.853577msINFO [07-15|13:10:33.900] Inserted key values in overlay tree count=10000 duration=937.057289msmigration took: 1.219926088sfillLevels took 7.629195mscollecting nodes took 136.204µstoFrMultiple took 414.446µsupdating commitments took 7.367287mscollecting nodes took 14.461581mstoFrMultiple took 18.437174msupdating commitments took 81.609759mscollecting nodes took 3.4579mstoFrMultiple took 19.02924msupdating commitments took 70.383834mscollecting nodes took 121.622µstoFrMultiple took 631.732µsupdating commitments took 15.461385msFinalize: 281.168431msblock execution: 239.645635msfillLevels took 269.784µscollecting nodes took 7.291µstoFrMultiple took 25.958µsupdating commitments took 641.648µscollecting nodes took 345.906µstoFrMultiple took 779.019µsupdating commitments took 5.172558mscollecting nodes took 198.328µstoFrMultiple took 601.982µsupdating commitments took 5.064353mscollecting nodes took 48.999µstoFrMultiple took 390.531µs

8/7/2023
Published on ** HackMD**