Try   HackMD

Verkle Trees - Proof creation/verification notes

Update (2023-11-28): this document was created ~2023-05 all the ideas are still relevant. I've also worked on a parallelized version in the Go library that you might also find interesting here.

This document contains notes about implementation optimizations for Verkle Tree proof generation and verification.

Some of these optimizations are present in the spec implementation, while others aren't since the spec code is meant to optimize to be understandable. We list them here and provide links to the reference implementation and other documents with explanations (if available).

Proof generation

  • Preprocessing
    • Collecting polynomials in eval form involves doing toFr at each relevant internal node to prove, which means doing inversions.
      • Optimization: If you're collecting polynomials of relevant internal nodes recursively, at least batch inverses per node (i.e.: between 1 and 256 Point->Fr transformations per internal node). Do the same on leaf nodes.
      • Idea: Note there's no dependency between levels so that you can batch everything in a single batch.
      • Idea: The EL pipeline might already serialized the tree. Try caching the already calculating Frs and avoid redoing work altogether. (Might consume extra memory).
  • Multiproof
    • Fiat-Shamir: We start by adding all Cs to the transcript. This requires serializing the points. Probably, you store them in projective form in memory, normalize them (i.e: to affine) in batch mode (i.e: Montgometry trick for
      Z
      coordinate). [go-ipa]
    • Aggregate all polynomials by evaluation domain before starting with any heavy work. This avoids repeated heavy work when you're aggregating many openings (which is usually the case). [go-ipa]
    • When calculating
      g(x)
      , we have to do polynomial division on the domain
    • When calculating
      g1(x)
      , do batch inversion for
      1tzi
      terms.
  • Inner product argument
    • When evaluating outside of the domain step
      f(z)=A(z)i=0d1fiA(xi)(zxi)
      :
      • Remember that we already have precomputed
        1A(x)
        , use it directly. [spec code]
      • We have many
        1zxi
        , batch those inverse calculations. [spec code]

Proof verification

Impact

As an example of what kind of speedup you can get from an implementation that can be missing some of these benchmarks, here's some results on go-ipa:

As an example, for 16.000 random polynomial openings:

For 16000 polynomials:
        Proving:
                Proof serialization duration: (0.11ms -> 0.11ms)
                Proof creation duration: (664ms -> 271ms [2.45x speedup])
                ...
        Verification:
                Proof deserialization duration: 0.11ms
                Total duration: (1166ms -> 38ms [30x speedup]
                ...

Legend: (BEFOREms -> AFTERms, [SPEEDUPx])

The optimized implementation still is single-threaded-ish, and doesn't require extra memory (e.g: precomputed tables).

Closing

Do you know any other missing trick? Ping me!