Introduction to ZKVM

# Introduction to ZKVM ZKVM is a [SNARK](https://hackmd.io/IA2U8Bq0ThmZxIGQ4KHd1w?both) that lets prover P to prove it ran a program cycle by cycle on some witness (an input). The program respresents a bytecode for specific VM. Pros: * usability, maintenance (use high level language Rust to write a program ) Cons: * perfomance overhead for prover * prover implementation is complex and buggy ## VM categorization: * Risc-V * simplest VM existed before ZKVM came in. * It has 32 bit data types * uses existing tooling to complile Rust code into Risc-V bytecode * SNARK-friendly VMs They tend to have less registers and simpler instructions * Cairo * 2 registers, * less than 10 primitive instructions which are designed to be SNARK friendly (finite field arithmetic) * Miden * Valida --------------------------------------------- ## ZKVM workflow ### VM simplified workflow VM repeatedly calls "fetch-decode-execute" in one cycle Running a computer program means calling "fetch-decode-execute" number of times 1. PC register stores which instruction to execute next 2. Fetch that instruction of bytecode 3. Increment PC by 4 (unless branch or jump) 4. Run INSTR rs1 rs2 rd (execute instruction INSTR for inputs rs1, rs2 and write result into rd) 6. Move to the next cycle ### High level ZKVM workflow Suppose, T = $2^{40}$ times (1 trillion) is the running time of a program A constraint system fetch-decode-execute T times apply a SNARK to that constraint system Repeat this process T times $C_{cycle}$ - circuit that implements a cycle | $C_{cycle}$ | ... | $C_{cycle}$ | final state of machine $C_{cycle}$ outputs entire memory cell of the VM every time ### Memory checking Memory checking ensures that execution trace contains a value thas has been read from some address in RAM = value that indeed most recent written into this address in RAM Info for cycle1 in a trace Registers: PC RS1 RS2 RD * PC(4 bytes) stores a lot of information such as primitive instruction, source registers, destination registers etc * Value proportedly stored in RS1 * Value proportedly stored in RS2 * RD contains an output of a primitive instruction on inputs (RS1, RS2) Examples: * ADD(RS1, RS2) -> RD * MUL(RS1, RS2) -> RD, * LOAD(address) -> RV loads the value from the address and stores in rd and Memory checkers checks the values in registers PC OP RS1 RS2 RD to ensure they are correct. Ways to do memory checking: * Reordering of reads grouped by memory cell. This leads to a communication overhead. It applies a permutation check to ensure that is actually a reordering. * Hashing - very expensive in SNARKS * Twist and Shout memory checking arguments There are two types of memory: 1. prover memory 2. execution memory ### VM overhead **Fact: VM abstraction itself introduces a ton of overhead** Let us consider matrix multiplication algorithm. ``` for i = 1...n for j = 1..n for k = 1..n C[i, j] += C[i,k] + C[k, j] ``` * Applying SNARK to an arithmetic circuit that implements this algorithm introduces a minimum overhead. * In contract, ZKVM route has a significant overhead. It decodes every line of bytecode, increment program counter PC, performs memory checking etc --- ### Prover space Applying any ZKVM naively to 1M of Risc-V cycles will take gigabytes of prover space. Suppose, T = $2^{40}$ number of cycles in some program. To control prover space: 1. Split execution trace into $2^{20}$ shards, each shard size $2^{19}$ cycles 2. Prove each shard separately 3. Recursively aggregate proofs Challenge: the shards can not be proven completely Key Issue with this approach Prove that state of RAM at the end of shard i matches the state of RAM at the start of shard i+1 Justin Thaler's approach : 1. get rid of sharding 2. sumcheck SNARKs dont need recursion to have a prover space under control