2024 zkVM team technicals design summary

TODO add change log.

In summary, we are exploring the "linear zkVM prover"

O (c_{1} N + c_{2})

for

c 1

-> 1 ,

c_{2}

->0, along with the least commitment data as possible. The linear prover should be optimal complexity, since no matter how we still need to go through all the data.

We think a "linear prover" should satisfy

No FFT (NTT) => sumcheck protocol based

sumcheck is all we needed in the endgame, probably :) ?
Less commitment => based on GKR
Non-uniform: prover time complexity per proof just pay-as-you-go.

We shared the interests with Scroll based on this paper Ceno aiming to build first PoC of zkVM

summary of technical stacks for version 1 exploration (June - Oct 2024)

circuit frontend are based on Expression arithmetcis, similar as Halo2
PIOP based on multi-linear polynomial + sumcheck
small + extension field: Goldilock64 + Extension field size 2 for 128 bits security.
Rely on GKR-Logup: range check + constrain step of program execution (pc, opcode).
consistency across zkVM state (pc, memory read/write) are via GKR-product-argument for offline memory check: comparing
$R_{s e t}$ ==
$W_{s e t}$ at the end

Some demystifications

Symmetric pair of (prover, verifier)

$(P^{(C)} (x, w), V^{(C)} (x, π))$
where
$x$ is statement,
$w$ is witness,
$π$ is proof.
we can define aymmetric pair of (prover, verifier) as

$(P^{(C)} (x, w), V^{(\cdot)} (⟨ C ⟩, x, π))$
where a universal "static" verifier take circuit description
$⟨ C ⟩$ as extra input. Its equivalently that verifier can verify proof submmit by a classes of prover.
We apply this idea on sumcheck protocol, where different circuit description mapping with a public input n (Ceno Fig 3)
Image Not Showing Possible Reasons
- The image was uploaded to a note which you don't have access to
- The note which the image was originally uploaded to has been deleted
Learn More →
Which achieve non-uniform prover design thanks to sumcheck protocol nice amendibility
GKR arithmetics: still bring non-negligible constant effort on GKR linear prover, refer SOTA Libra p15 Sec 3.3. It turned out, for simplicity of implementing each zkVM opcode, we only need the "sumcheck reduction" functionalty of GKR, but not GKR gates. We observe that if we stick to plonkish (GKR can generalize to R1CS/Plonkish) by accessing the witness in well defined index, then we only need "one" sumcheck per layer

For instance, GKR "layer" design brings the following flexibility to access any index in previous layer:
Image Not Showing Possible Reasons
- The image was uploaded to a note which you don't have access to
- The note which the image was originally uploaded to has been deleted
Learn More →
But this flexibility results in extra #fanin sumchecks, each for binding the variables to random challenges. We can make each index appear in order, and actually it fallback to plonkish, where each column index are deterministic
Image Not Showing Possible Reasons
- The image was uploaded to a note which you don't have access to
- The note which the image was originally uploaded to has been deleted
Learn More →
GKR reduce: with "sumcheck reduction", it's more like a sumcheck graph that we reduce from output layer to input layer. We adopt a simpler design: log(N) + 1 sumchecks. Top log(N) layers just do the layer reduction, more like GKR LogUp style. 1 is to batch all the zero/non-zero gate as a single sumcheck at the last layer. 1 design will end up with some intermediate variable commitment. But due to the simplicity of most riscv opcodes, most of expressions are degree == 1, and we don't need to commit to those intermediate variables.

Structural GKR tower for Ceno zkVM

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

90% times spent on log(N) sumcheck. 1 sumcheck is efficient. The sqeeze of constraint in 1 sumcheck will end up with few extra column commitment. Good news is most of high-frequency riscv opcode constrain just degree = 1 (arithmetic r-type), so we dont need to commit those intemediate value.

Prove same instances of opcodes as tower structure

We can render the same kind of opcode instances (e.g. add) in a plonkish table.

Notes

multi-variate and non-rotation, each instance occupied one row

the add opcode just 12 columns, we can further reduce to 9.

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

Specially note that for ROM we rely on LogUp, and for RAM check we rely on product arguments

For more background about the choice of LogUp and Product Argument, please see here How state proof/range-check proof works in Ceno zkVM

1 + Log(N) sumcheck works as follow

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

where the 1 sumcheck is interleaving RAM/ROM expression into a huge vector for the GKR style reduction, and the Log(N) is to reduce the LogUp/Product Argument into 2 single scalar

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

on Prover, this is an highly concurrenct structure, such that each opcode can be proven in either single node or different nodes as cluster. At the end, aggregate all final layer field element + all sumcheck proofs into a transcript. There is no complex aggregation circuit, instead it just take field element and insert into transcript.

This structure also exposes a very powerful characteristic: instance could be "position agnostic", i.e. out-of-order. An opcode input/output will always be encoded into RAM/ROM sets regardless its execution order. It enquip us without "sorting" effort.

On the verifier side, just get all #opcode final tower scalar value, and do simple field arithmetics:

verify RAM:
$i n i t \times W_{s e t} = f i n a l \times R_{s e t}$
verify ROM: take scalar from GKR fraction sum and check LogUp formula.
verify #opcode sumcheck: from output to input layer, collect all polynomial evaluations
verify all degree-1 expressions evaluate to 0
verify PCS opening

other improvements

we apply 2 stage sumcheck on single node multi-core structure to achieve less cpu thread syncronization, based on the simple idea of distributed sumcheck in devirgo "Sec 4.2"

Simple benchmark result (Oct 2024)

5x faster on proving 2^20 add instances on CPU compare to SP1 (til Aug codebase) on exiting naive implementation.

Ceno still has many pending improvements, so the performance is expected to get better still.

The add workload benchmark result is inspiring and representitive because it's occupies ~ 1/4 of the execution trace in workloads, e.g. we tested on revm/fabonacci.

TODO more details

Milestones

M1.5 (end of Sept -> Oct 2024)

researching & discussion on public I/O
expand the scope finish riv32im opcode

Unknows

Binary field Adoption.
- pros:
  1. less embedding cost: we can save quite lot of memory consumption that comes from encoding small value in field, e.g. encoding bit in BN254
  2. no range check: each column gets the bits it requires, e.g. 8 bit for a byte, so values are always within bytes
- cons:
  1. arithmetics in tower binary field with 2^128 elements is VERY different with native field of size "2^128". Especially multiplication, it usually comes with table booking + divide & conquer.
  2. pack/unpack of sumcheck field
SNARKed recursive verifier or not
- on L1 I claim it might not be nessesary because we dont need to be verify on-chain
  
  e.g. we can choose non-zkfriendly but fast hash, e.g. Poseidon -> Keccak. 10x fast on native CPU
- Scroll working on Spartan to encode verifier.
Precompile nessesary ? and how to build it with less manual effort?
- less manual effort is possible
Diversity
- what kind of diversify we need?

Chih-Cheng Liang

2024/10/23 08:06:42

along

There are some log(N) term behind c2

2024/10/23 08:09:54

Non-uniform

Trace. Small block size using a small proof; big proof for big block. Use non-uniform prover can be more efficient

Ali

2024/10/23 10:46:59

Ming: compare this with the situation in plonkish where uniformity of the table leads a lot of padding

2024/10/23 08:12:06

Extension field

Combination of two base fields

2024/10/23 08:13:10

program

Similar to zkevm bytecode lookup

2024/10/23 08:14:53

opcode

Read and Write to the register id are constrained by the lookup table

2024/10/23 08:16:45

is this correct: there are different types of opcodes in riscv, each with specific structure, that structure is constrained with lookup. Not to be confused with the constraining of an opcode's concrete business logic (Edited)

2024/10/23 08:17:05

comparing

This is read set and write set

2024/10/23 08:32:35

so we dont need to commit those intemediate value.

This is the main reason to reduce cost.

2024/10/23 08:38:22

degree

branching opcodes, and load bytes have high degree. But the degree is just 2

2024/10/23 08:50:10

details

By any chance we can get a bench on proving a 15m gas block?

sungmingwu

2024/10/23 14:34:16

We should aiming it in high priority, after we finish the nessesary opcode to represent the computation of a block

2024/10/23 08:51:16

Binary

Scroll, Jolt, irreducible, and excu... are researching into it

2024/10/23 08:54:52

table lookup

Very slow compared to Goldilock

2024/10/23 08:55:28

Not friendly in CPU, AMD, Intel, and even Cuda, GPU

2024/10/23 08:56:00

Binius from Irreducible might be the future

2024/10/23 08:55:56

cons

+ not gpu friendly

2024/10/23 08:58:13

Diversity

This is for Ethereum protocol. We want diversity to prevent single point of failure

2024/10/23 09:17:46

kind

multi-IOPs?

2024/10/23 09:22:27

down the road, once zkVM framework is stable, maybe a Go implementation of the same protocol

2024/10/23 09:24:49

linear

Both time and space? Looking at the interleaving huge vector below

2024/10/23 10:44:08

The linear prover should be optimal complexity, since no matter how we still need to go through all the data.

alt for clarity: Linear complexity is maximally optimal because we must at a minimum incur O(N) for simply reading/processing each step of the trace N.

2024/10/23 11:06:05

Log(N) is to reduce the LogUp/Product Argument into 2 single scalar

Could you clarify this more @sungmingwu ?

2024/10/23 11:08:14

for opcodes instance

"because opcode instances are" ?

2024/10/23 12:43:11

I rewrite to be more precise "This structure also exposes a very powerful characteristic: instance could be "position agnostic", i.e. out-of-order"

2024 zkVM team technicals design summary

summary of technical stacks for version 1 exploration (June - Oct 2024)

Some demystifications

Prove same instances of opcodes as tower structure

other improvements

Simple benchmark result (Oct 2024)

Milestones

M1.5 (end of Sept -> Oct 2024)

Unknows

Read more

Halo2 Peak Memory Usage

2024-01-11 Milestone review

2023-11-18 PSE tooling

zkEVM - Aggregation