TODO add change log.
In summary, we are exploring the "linear zkVM prover"
We think a "linear prover" should satisfy
sumcheck is all we needed in the endgame, probably :) ?
We shared the interests with Scroll based on this paper Ceno aiming to build first PoC of zkVM
offline memory check
: comparing Symmetric pair of (prover, verifier)
where
we can define aymmetric pair of (prover, verifier) as
where a universal "static" verifier take circuit description
We apply this idea on sumcheck protocol, where different circuit description mapping with a public input n
(Ceno Fig 3)
GKR arithmetics: still bring non-negligible constant
effort on GKR linear prover, refer SOTA Libra p15 Sec 3.3. It turned out, for simplicity of implementing each zkVM opcode, we only need the "sumcheck reduction" functionalty of GKR, but not GKR gates. We observe that if we stick to plonkish (GKR can generalize to R1CS/Plonkish) by accessing the witness in well defined index, then we only need "one" sumcheck per layer
For instance, GKR "layer" design brings the following flexibility to access any index in previous layer:
GKR reduce: with "sumcheck reduction", it's more like a sumcheck graph that we reduce from output layer to input layer. We adopt a simpler design: log(N) + 1 sumchecks. Top log(N) layers just do the layer reduction, more like GKR LogUp style. 1
is to batch all the zero/non-zero gate as a single sumcheck at the last layer. 1
design will end up with some intermediate variable commitment. But due to the simplicity of most riscv opcodes, most of expressions are degree == 1, and we don't need to commit to those intermediate variables.
Structural GKR tower for Ceno zkVM
90% times spent on log(N) sumcheck. 1
sumcheck is efficient. The sqeeze of constraint in 1
sumcheck will end up with few extra column commitment. Good news is most of high-frequency riscv opcode constrain just degree = 1 (arithmetic r-type), so we dont need to commit those intemediate value.
We can render the same kind of opcode instances (e.g. add
) in a plonkish table.
Notes
- multi-variate and non-rotation, each instance occupied one row
- the add opcode just 12 columns, we can further reduce to 9.
Specially note that for ROM we rely on LogUp, and for RAM check we rely on product arguments
For more background about the choice of LogUp and Product Argument, please see here How state proof/range-check proof works in Ceno zkVM
1 + Log(N) sumcheck works as follow
where the 1
sumcheck is interleaving RAM/ROM expression into a huge vector for the GKR style reduction, and the Log(N) is to reduce the LogUp/Product Argument into 2 single scalar
on Prover, this is an highly concurrenct structure, such that each opcode can be proven in either single node or different nodes as cluster. At the end, aggregate all final layer field element + all sumcheck proofs into a transcript. There is no complex aggregation circuit, instead it just take field element and insert into transcript.
This structure also exposes a very powerful characteristic: instance could be "position agnostic", i.e. out-of-order. An opcode input/output will always be encoded into RAM/ROM sets regardless its execution order. It enquip us without "sorting" effort.
On the verifier side, just get all #opcode final tower scalar value, and do simple field arithmetics:
5x faster on proving 2^20 add
instances on CPU compare to SP1 (til Aug codebase) on exiting naive implementation.
Ceno still has many pending improvements, so the performance is expected to get better still.
The add
workload benchmark result is inspiring and representitive because it's occupies ~ 1/4 of the execution trace in workloads, e.g. we tested on revm/fabonacci.
TODO more details
field
with 2^128 elements is VERY different with native field of size "2^128". Especially multiplication, it usually comes with table booking + divide & conquer.e.g. we can choose non-zkfriendly but fast hash, e.g. Poseidon -> Keccak. 10x fast on native CPU
Spartan
to encode verifier.