Try   HackMD

2024 zkVM team technicals design summary

TODO add change log.

In summary, we are exploring the "linear zkVM prover"

O(c1N+c2) for
c1
-> 1 ,
c2
->0, along with the least commitment data as possible. The linear prover should be optimal complexity, since no matter how we still need to go through all the data.

We think a "linear prover" should satisfy

  • No FFT (NTT) => sumcheck protocol based

    sumcheck is all we needed in the endgame, probably :) ?

  • Less commitment => based on GKR
  • Non-uniform: prover time complexity per proof just pay-as-you-go.

We shared the interests with Scroll based on this paper Ceno aiming to build first PoC of zkVM

summary of technical stacks for version 1 exploration (June - Oct 2024)

  • circuit frontend are based on Expression arithmetcis, similar as Halo2
  • PIOP based on multi-linear polynomial + sumcheck
  • small + extension field: Goldilock64 + Extension field size 2 for 128 bits security.
  • Rely on GKR-Logup: range check + constrain step of program execution (pc, opcode).
  • consistency across zkVM state (pc, memory read/write) are via GKR-product-argument for offline memory check: comparing
    Rset
    ==
    Wset
    at the end

Some demystifications

  • Symmetric pair of (prover, verifier)

    (P(C)(x,w),V(C)(x,π))
    where
    x
    is statement,
    w
    is witness,
    π
    is proof.
    we can define aymmetric pair of (prover, verifier) as
    (P(C)(x,w),V()(C,x,π))

    where a universal "static" verifier take circuit description
    C
    as extra input. Its equivalently that verifier can verify proof submmit by a classes of prover.
    We apply this idea on sumcheck protocol, where different circuit description mapping with a public input n (Ceno Fig 3)
    Image Not Showing Possible Reasons
    • The image was uploaded to a note which you don't have access to
    • The note which the image was originally uploaded to has been deleted
    Learn More →

    Which achieve non-uniform prover design thanks to sumcheck protocol nice amendibility

  • GKR arithmetics: still bring non-negligible constant effort on GKR linear prover, refer SOTA Libra p15 Sec 3.3. It turned out, for simplicity of implementing each zkVM opcode, we only need the "sumcheck reduction" functionalty of GKR, but not GKR gates. We observe that if we stick to plonkish (GKR can generalize to R1CS/Plonkish) by accessing the witness in well defined index, then we only need "one" sumcheck per layer

    For instance, GKR "layer" design brings the following flexibility to access any index in previous layer:

    Image Not Showing Possible Reasons
    • The image was uploaded to a note which you don't have access to
    • The note which the image was originally uploaded to has been deleted
    Learn More →

    But this flexibility results in extra #fanin sumchecks, each for binding the variables to random challenges. We can make each index appear in order, and actually it fallback to plonkish, where each column index are deterministic
    Image Not Showing Possible Reasons
    • The image was uploaded to a note which you don't have access to
    • The note which the image was originally uploaded to has been deleted
    Learn More →

  • GKR reduce: with "sumcheck reduction", it's more like a sumcheck graph that we reduce from output layer to input layer. We adopt a simpler design: log(N) + 1 sumchecks. Top log(N) layers just do the layer reduction, more like GKR LogUp style. 1 is to batch all the zero/non-zero gate as a single sumcheck at the last layer. 1 design will end up with some intermediate variable commitment. But due to the simplicity of most riscv opcodes, most of expressions are degree == 1, and we don't need to commit to those intermediate variables.

Structural GKR tower for Ceno zkVM

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

90% times spent on log(N) sumcheck. 1 sumcheck is efficient. The sqeeze of constraint in 1 sumcheck will end up with few extra column commitment. Good news is most of high-frequency riscv opcode constrain just degree = 1 (arithmetic r-type), so we dont need to commit those intemediate value.

Prove same instances of opcodes as tower structure

We can render the same kind of opcode instances (e.g. add) in a plonkish table.

Notes

  1. multi-variate and non-rotation, each instance occupied one row
  2. the add opcode just 12 columns, we can further reduce to 9.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Specially note that for ROM we rely on LogUp, and for RAM check we rely on product arguments

For more background about the choice of LogUp and Product Argument, please see here How state proof/range-check proof works in Ceno zkVM

1 + Log(N) sumcheck works as follow

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

where the 1 sumcheck is interleaving RAM/ROM expression into a huge vector for the GKR style reduction, and the Log(N) is to reduce the LogUp/Product Argument into 2 single scalar

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

on Prover, this is an highly concurrenct structure, such that each opcode can be proven in either single node or different nodes as cluster. At the end, aggregate all final layer field element + all sumcheck proofs into a transcript. There is no complex aggregation circuit, instead it just take field element and insert into transcript.

This structure also exposes a very powerful characteristic: instance could be "position agnostic", i.e. out-of-order. An opcode input/output will always be encoded into RAM/ROM sets regardless its execution order. It enquip us without "sorting" effort.

On the verifier side, just get all #opcode final tower scalar value, and do simple field arithmetics:

  • verify RAM:
    init×Wset=final×Rset
  • verify ROM: take scalar from GKR fraction sum and check LogUp formula.
  • verify #opcode sumcheck: from output to input layer, collect all polynomial evaluations
  • verify all degree-1 expressions evaluate to 0
  • verify PCS opening

other improvements

  • we apply 2 stage sumcheck on single node multi-core structure to achieve less cpu thread syncronization, based on the simple idea of distributed sumcheck in devirgo "Sec 4.2"

Simple benchmark result (Oct 2024)

5x faster on proving 2^20 add instances on CPU compare to SP1 (til Aug codebase) on exiting naive implementation.

Ceno still has many pending improvements, so the performance is expected to get better still.

The add workload benchmark result is inspiring and representitive because it's occupies ~ 1/4 of the execution trace in workloads, e.g. we tested on revm/fabonacci.

TODO more details

Milestones

M1.5 (end of Sept -> Oct 2024)

  • researching & discussion on public I/O
  • expand the scope finish riv32im opcode

Unknows

  • Binary field Adoption.
    • pros:
      1. less embedding cost: we can save quite lot of memory consumption that comes from encoding small value in field, e.g. encoding bit in BN254
      2. no range check: each column gets the bits it requires, e.g. 8 bit for a byte, so values are always within bytes
    • cons:
      1. arithmetics in tower binary field with 2^128 elements is VERY different with native field of size "2^128". Especially multiplication, it usually comes with table booking + divide & conquer.
      2. pack/unpack of sumcheck field
  • SNARKed recursive verifier or not
    • on L1 I claim it might not be nessesary because we dont need to be verify on-chain

      e.g. we can choose non-zkfriendly but fast hash, e.g. Poseidon -> Keccak. 10x fast on native CPU

    • Scroll working on Spartan to encode verifier.
  • Precompile nessesary ? and how to build it with less manual effort?
    • less manual effort is possible
  • Diversity
    • what kind of diversify we need?