# Machine Learning zk-VM
I recently [attended](https://devfolio.co/projects/machine-learning-zkvm-fdf6) the [ZK Hack](https://www.zkistanbul.com/) event in Istanbul and took the weekend to hack together a zk-VM specialized for machine learning workloads.
## Motivation
A [zk-SNARK](https://z.cash/learn/what-are-zk-snarks/) is a cryptographic proof of having computed a function correctly, without necessarily revealing all of its inputs. There are two main approaches to compute those: One is to bake your function into a fixed "circuit"; the other is to describe your program in an assembly language that a [zero-knowledge Virtual Machine (zk-VM)](https://www.cryptologie.net/article/564/what-are-zkvms-and-whats-the-difference-with-a-zkevm/) can interpret.
There is a trade-off between the two: while the circuit-based approach is more efficient, zk-VMs can be more flexible. Their verifier only needs to be deployed once and is completely independent of the program being run. Additionally, zk-VMs can jump around the program code as needed, which enables conditionals and loops, whereas circuits have a fixed computational graph.
When it comes to machine learning, [EZKL](https://github.com/zkonduit/ezkl) is a popular implementation of the circuit-based approach. It compiles a given machine learning model into a [Halo2](https://zcash.github.io/halo2/) circuit.
Why opt for the zk-VM approach instead? Here are a few reasons:
- As mentioned earlier, a zk-VM can come with a verifier for *all* programs, in this case, all machine learning models.
- Unlike the circuit-based approach, where the wiring of the circuit needs to be known to the verifier, a zk-VM can be built such that only a hash of the program is known to the verifier. This setup, seen in platforms like [Polygon Miden](https://polygon.technology/polygon-miden), allows for completely hiding the network architecture when applied to machine learning.
- Lastly, a machine learning optimized zk-VM could be used as a sub-machine in more general-purpose zk-VMs, like those emulating RISC-V or EVM. For example, a zk-Rollup could introduce a special opcode allowing users to run machine learning models efficiently inside a smart contract!
## The Approach
The main approach involves running EZKL, inspecting the compiled Halo2 circuit, and then applying a transformation that generates a zk-VM, an assembly program, and the prover-provided memory.
This process is summarized in this picture:

Now, let's delve into more detail.
### A Primer on EZKL
Consider a simplified example of how EZKL works in the case of a toy model:

EZKL follows the [Plonkish arithmetization](https://zcash.github.io/halo2/concepts/arithmetization.html) of Halo2, where `a`, `b`, and `c` are advice columns, and `mul`, `dot`, and `add` are (publicly known) selectors for the three gates used in this example.
Here's what happens row-by-row (the Python sketch after this list makes it concrete):
- **Row 1**: The `mul` gate is active, verifying that `c = a * b`. Here, `a` and `b` are set to the first two entries of the vectors in the dot product computation.
- **Row 2**: The `dot` gate is like the `mul` gate, but additionally adds the value of `c` from the *previous* row. At this point, `c` acts as an accumulator for the dot product.
- **Row 3**: The `dot` gate is applied again, with the value in column `c` now being the dot product result.
- **Row 4**: This value is *copied* to cell `a` in row 4. The `add` gate then verifies that `c = a + b`.
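To make these row-by-row checks concrete, here is a minimal Python sketch with made-up values; the length-3 vectors, the bias, and all concrete numbers are hypothetical, not taken from an actual EZKL trace:

```python
# Hypothetical trace of dot([2, 3, 5], [4, 1, 4]) + 10 = 31 + 10 = 41.
a = [2, 3, 5, 31]       # advice column a (row 4 reuses the dot-product result)
b = [4, 1, 4, 10]       # advice column b
c = [8, 11, 31, 41]     # advice column c (the accumulator, then the sum)
mul_sel = [1, 0, 0, 0]  # selector columns: which gate is active in each row
dot_sel = [0, 1, 1, 0]
add_sel = [0, 0, 0, 1]

for i in range(4):
    if mul_sel[i]:
        assert c[i] == a[i] * b[i]             # mul: c = a * b
    if dot_sel[i]:
        assert c[i] == a[i] * b[i] + c[i - 1]  # dot: accumulate previous c
    if add_sel[i]:
        assert c[i] == a[i] + b[i]             # add: c = a + b

# The copy constraint: the dot-product result (c, row 3) reappears as (a, row 4).
assert a[3] == c[2]
```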
While most Plonkish circuits will likely include the `mul` and `add` gates, the `dot` gate is specific to machine learning workloads, where dot products are ubiquitous.
In reality, EZKL has 10 basic arithmetic gates and 2 lookups (for rescaling activations and applying the ReLU activation function). The lookups might actually depend on the model and the quantization settings, but it's fair to say that these 12 gates cover a wide range of machine learning models.
### Turning a Plonkish Circuit into a zk-VM
To build our VM, we:
- Introduce an instruction for each gate in the Halo2 circuit.
- Set the model-specific program to the sequence of gates that are active in each row.
One challenge is validating copy constraints, as VMs can't easily access their own state from previous time steps. The solution is to add a read-only, prover-provided memory, which guarantees that repeated reads from the same memory address return the same value.
The VM assembly program is depicted here:

Each instruction receives several arguments, interpreted as memory addresses. It then looks up the values and asserts an instruction-specific property. For instance:
- `mul(a, b, r): r == a * b`
- `dot(a, b, m, r): r == a * b + m`
- `add(a, b, r): r == a + b`
There are one-to-one correspondences between (1) the gates in EZKL's circuit and the instruction set, and (2) the rows in EZKL's circuit and the rows in the compiled assembly program.
The copy constraint in the above example must hold if this program executes correctly because we're reading from memory address 8 twice. Similarly, the way the `dot` gate overlaps in the EZKL circuit is reflected in the way memory addresses are reused.
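To see how this plays out end-to-end, here is a hedged Python sketch of the VM semantics. The program below is a plausible rendering of the four-row example, assuming addresses are assigned in order; all concrete values are made up, but the double read of address 8 mirrors the description above.

```python
# Toy interpreter: each instruction reads its arguments from a write-once
# memory (a plain list here) and asserts its instruction-specific property.
def run(program, mem):
    for op, *args in program:
        if op == "mul":
            a, b, r = args
            assert mem[r] == mem[a] * mem[b]
        elif op == "dot":
            a, b, m, r = args
            assert mem[r] == mem[a] * mem[b] + mem[m]
        elif op == "add":
            a, b, r = args
            assert mem[r] == mem[a] + mem[b]

program = [
    ("mul", 0, 1, 2),     # c1 = x1 * w1
    ("dot", 3, 4, 2, 5),  # c2 = x2 * w2 + c1
    ("dot", 6, 7, 5, 8),  # c3 = x3 * w3 + c2  -> result sits at address 8
    ("add", 8, 9, 10),    # out = c3 + bias    -> address 8 is read again
]
mem = [2, 4, 8, 3, 1, 11, 5, 4, 31, 10, 41]  # prover-provided memory
run(program, mem)
```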
## Implementation
Implementing this requires:
1. A description of the model-agnostic zk-VM.
2. A program generating the assembly program from a machine learning model by running the EZKL pipeline and inspecting the compiled circuit.
3. A program extracting memory values from a specific EZKL instance.
### A zk-VM in 70 Lines of Code
Using [Powdr](https://www.powdr.org/), implementing a zk-VM is surprisingly easy! My hackathon version is available [here](https://github.com/Machine-Learning-zk-VM/ezkl/blob/main/src/powdr_template.asm). Currently, it implements only 5 of the 12 EZKL gates, but these are sufficient to prove a multi-layer CNN.
The machine's implementation doesn't have "normal" registers but only a program counter and [assignment registers](https://docs.powdr.org/asm/registers.html#assignment-registers), which are used for inputs and outputs of the instructions.
```
reg pc[@pc];
reg X1[<=];
reg X2[<=];
reg X3[<=];
reg Y[<=];
```
Our write-once memory is implemented as follows:
```
let ADDR = |i| i;
let mem;
```
`ADDR` is a [fixed column](https://docs.powdr.org/pil/fixed_columns.html) that just counts from 0 to whatever maximum time step our VM has. `mem` is a witness column that can be chosen freely by the prover.
[Instructions](https://docs.powdr.org/asm/instructions.html) in Powdr are conditional constraints. For example, the multiplication instruction is implemented as follows:
```
instr mult X1, X2, Y {
    {X1, a} in {ADDR, mem},
    {X2, b} in {ADDR, mem},
    {Y, res} in {ADDR, mem},
    res = a * b
}
```
This expresses that if the current instruction is a multiplication, the following constraints must hold:
- The tuple `(X1, a)` must be in the list of tuples obtained by "zipping" together the `ADDR` and `mem` columns. This performs a "read" operation on the memory (sketched in Python below): `X1` can be thought of as the address. Because the `ADDR` column has unique values, `a` must be equal to whatever value is in the specified memory cell.
- Similarly, `b` and `res` are read from memory in the next two constraints.
- Finally, the relationship `res = a * b` must hold between the values `a`, `b`, and `res`.
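In Python terms, such a memory read is nothing more than a membership check against the zipped columns. This sketch only models the semantics and is not actual Powdr code; the column contents are made up:

```python
# Minimal model of the lookup {X1, a} in {ADDR, mem}.
ADDR = list(range(8))           # fixed column: 0, 1, 2, ...
mem = [7, 0, 3, 5, 0, 0, 0, 0]  # witness column, chosen by the prover

def read(addr, claimed_value):
    # The pair must appear as a row when zipping the two columns.
    assert (addr, claimed_value) in set(zip(ADDR, mem))

read(2, 3)  # passes, because row 2 of (ADDR, mem) is (2, 3)
```

Because each address appears exactly once in `ADDR`, only a single value can pass this check per address, which is what makes repeated reads consistent.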
The ReLU instruction is implemented via a lookup constraint. In fixed columns, we precompute all possible input/output pairs for inputs in the range $[-2^{15}, 2^{15})$:
```
col fixed X(i) {i - 32768};
col fixed RELU_Y(i) {match i < 32768 {1 => 0, 0 => i - 32768}};
instr relu X1, Y {
    {X1, a} in {ADDR, mem},
    {Y, res} in {ADDR, mem},
    {a, res} in {X, RELU_Y}
}
```
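For intuition, here is the same table rebuilt in Python, assuming the fixed columns span $2^{16}$ rows:

```python
# Precompute all (input, output) pairs of ReLU for inputs in [-2**15, 2**15).
table = set()
for i in range(2**16):
    x = i - 2**15          # X(i) = i - 32768
    y = 0 if x < 0 else x  # RELU_Y(i)
    table.add((x, y))

# A claimed (input, output) pair is valid iff it appears in the table.
assert (-5, 0) in table and (7, 7) in table
assert (-5, -5) not in table
```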
### Extracting the Program & Memory Content
Rather than reverse-engineering the EZKL codebase, I inspected the circuit via Halo2's `MockProver`. Unfortunately, to access all the needed information, I had to fork Halo2.
The basic transformation process is:
- Identify the active selector in each row to determine the instruction.
- Map cells in the advice columns to memory addresses: We analyze the copy constraints to find out which cells are forced to be identical and then use the ID of the equivalence class as the memory address (see the sketch after this list).
- Record the values to determine the memory content for the specific input to the model used in the EZKL instance.
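The second step is essentially a union-find over cell coordinates. Here is a minimal sketch of the idea; the cell representation and function names are hypothetical, and the real code instead walks the copy constraints exposed by the forked `MockProver`:

```python
# Union cells that copy constraints force to be equal; each equivalence
# class then gets a single memory address.
parent = {}

def find(cell):
    parent.setdefault(cell, cell)
    while parent[cell] != cell:
        parent[cell] = parent[parent[cell]]  # path halving
        cell = parent[cell]
    return cell

def union(c1, c2):  # called once per copy constraint
    parent[find(c1)] = find(c2)

addresses = {}
def address(cell):
    # Assign a fresh address the first time an equivalence class is seen.
    return addresses.setdefault(find(cell), len(addresses))

# The copy constraint from the primer: (column c, row 3) == (column a, row 4),
# using 0-indexed rows here.
union(("c", 2), ("a", 3))
assert address(("c", 2)) == address(("a", 3))
```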
For those interested, the code is available [here](https://github.com/Machine-Learning-zk-VM/ezkl/blob/main/src/generate_vm.rs), although it's quite hacky.
## Preliminary Benchmarks
To compare performance with EZKL, I ran preliminary benchmarks on the [`4l_relu_conv_fc`](https://github.com/Machine-Learning-zk-VM/ezkl/blob/main/examples/onnx/4l_relu_conv_fc/gen.py) example. The instructions for reproduction are in the [README](https://github.com/Machine-Learning-zk-VM/ezkl/blob/main/README.md).
These are the benchmark results:
| Approach | Backend & Field | Proving time (mm:ss) |
|-------------|---------------------|--------------------|
| zk-VM | eStark + Goldilocks | 0:56 |
| zk-VM (opt) | eStark + Goldilocks | 0:27 |
| zk-VM | Halo2 + BN254 | 5:50 |
| zk-VM (opt) | Halo2 + BN254 | 4:53 |
| EZKL | Halo2 + BN254 | 0:13 |
Powdr supports two backends, resulting in different proof times. As expected, EZKL is the fastest option.
### Edit Nov 20: Hand-optimized constraint system
In the table above, I added a version of the VM where I [hand-optimized the constraint system](https://github.com/Machine-Learning-zk-VM/ezkl/blob/main/examples/onnx/4l_relu_conv_fc/vm_hand_optimized_opt.pil) as mentioned in the "Future Work" section. I made the following changes:
1. Reduce the number of lookups by having memory lookups like `{ main.X1, main.a } in { main.ADDR, main.mem }` only once, instead of once for every instruction that needs it.
2. Remove 8 witness columns: 4 were completely unused, and 4 were constrained to be equal to other columns.
I could have made the first change in the original Powdr assembly file. The second change could be automated by improving Powdr's built-in optimizer.
## Future Work
All of this is hackathon code and **should under no circumstances be used**. Some of the caveats of the current implementation are:
- The model parameters are currently just generic memory cells. It would make much more sense to hard-code their values in the program (which is model-specific anyway), so that the prover is bound to them in the same way as they are to the network architecture.
- The program is kind of "baked" into the VM; it is committed to in the key generation phase of the VM. This, of course, defeats the purpose of having a zk-VM in the first place!
- There are currently no public inputs and outputs.
So really, what's being proven right now is: "I ran this specific architecture using some parameters on some input and got some output." Not very useful! However, it's clear that all of these issues could be fixed.
To improve efficiency, several directions are possible:
- Inspect and optimize the low-level [PIL](https://docs.powdr.org/pil/index.html) code that the higher-level Powdr language compiles to. Perhaps some of these optimizations can be automated and built into Powdr in the future!
- Explore using permanent registers: The current implementation doesn't use any "normal" registers. In a single time step, an instruction loads all of its arguments and executes. We could instead split out the memory operations and run them in separate steps.
- Reduce the number of instructions: Looking at [EZKL's base instructions](https://github.com/zkonduit/ezkl/blob/8b07857fb51137cfedd94b18cb146b2b2d07c1cf/src/circuit/ops/base.rs#L38-L45), you can spot some redundancy. For example, the `Sub` gate could be implemented using a combination of `Add` and `Neg`. Reducing redundancy would lead to fewer columns, although it would increase the number of rows needed.
- Get rid of the program counter: One thing machine learning models have in common is that they don't actually have loops or conditionals. So really, the program counter and instruction-fetching mechanism are overkill.
## Conclusion
All in all, I'm very happy with the result of this hackathon project, but of course, there is a lot that remains to be done. I'd be curious to hear your thoughts!
For more information about the project, check out the [GitHub project](https://github.com/Machine-Learning-zk-VM/ezkl/), the [Devfolio page](https://devfolio.co/projects/machine-learning-zkvm-fdf6), and [this presentation](https://docs.google.com/presentation/d/11AauxQ1tfuM5U18hWUpqz-ph3awOmvonee1DjoijYF0/edit?usp=sharing)!