# Overflow vulnerability in Polygon's zkEVM Storage machine *by Georg Wiese* In October 2023, I discovered a vulnerability in [Polygon's zkEVM](https://polygon.technology/polygon-zkevm) which lets a malicious prover make false statements about the persistent storage in the zk-Rollup. Since then, Polygon has fixed the issue. This post sketches what the vulnerability was. ## Background A zero-knowledge Virtual Machine is a virtual machine that in addition to executing a given program also provides a cryptographic "proof" that convinces a verifier that the program was executed correctly. In this particular case, we are talking about a zk-VM that simulates the [Ethereum Virtual Machine](https://ethereum.org/en/developers/docs/evm/) (EVM). In an EVM execution, the program has access to a persistent key-value store, called the *storage*. In Polygon's zkEVM, the [Storage machine](https://docs.polygon.technology/zkEVM/architecture/zkprover/storage-state-machine/) implements [CRUD](https://en.wikipedia.org/wiki/Create,_read,_update_and_delete) operations (Create, Read, Update, Delete) on the storage, which is implemented as a sparse Merkle tree. For a concrete example of how this works, let's follow [Polygon's description of the READ operation](https://docs.polygon.technology/zkEVM/concepts/sparse-merkle-trees/basic-smt-ops/#the-read-operation): <!-- ![[fig10-key-not-set-eg](https://docs.polygon.technology/img/zkEVM/fig10-key-not-set-eg.png)](https://hackmd.io/_uploads/SyFadxo1A.png) --> ![https://docs.polygon.technology/img/zkEVM/fig5-b-mpt-kv-eg.png](https://hackmd.io/_uploads/SkHXSvn1A.png =400x) Here, the sparse Merkle tree contains two "proper" leaves, one with a key whose binary representation ends in `00` ($L_a$) and another whose key ends in `10` ($L_b$). For simplicity, let's assume that $K_a = 0$ and $K_b = 2$. The paths to these leaves are determined by the bit-representation of the keys (starting from the least significant bit). Because the two nodes have the same least significant bit (LSB), a "zero node" is created as the sibling of $B_{ab}$, which signifies that there is no element in the tree with a key whose LSB is `1`. A key is represented as a tuple of four field elements in the [Goldilocks](https://github.com/0xPolygonZero/plonky2/issues/1) prime field (to represent a ~256-Bit key), as explained [here](https://docs.polygon.technology/zkEVM/architecture/zkprover/storage-state-machine/construct-key-path/). However, this detail is not of much relevance to understanding the vulnerability; the reader can think of the key as being just one field element for simplicity. ## The Implementation This document focuses on the GET operation, which roughly works as follows: - If a node of the requested key is in the tree, the storage VM validates a prover-provided Merkle proof. The least significant bits of the key are used to traverse the Merkle tree until a leaf node is reached. Then, the remaining key (i.e., the unused bits) are included in the leaf hash. - If a node is not in the tree, the returned value should be 0. To prove that this is the case, the prover has to show one of the following: - The key bits lead to a "proper" leaf node, but with a remaining key that doesn't reconstruct to the requested key. - The key bits lead to a "zero node". The exact details of the GET procedure are specified in [zkAssembly](https://github.com/0xPolygonHermez/zkevm-storage-rom/blob/c1ef4f411f7c11047e7d5252e24245b707fe2206/zkasm/storage_sm_get.zkasm). To understand the vulnerability, the following lines are important: - [Line 26](https://github.com/0xPolygonHermez/zkevm-storage-rom/blob/c1ef4f411f7c11047e7d5252e24245b707fe2206/zkasm/storage_sm_get.zkasm#L26): The machine reads the remaining key `RKEY` as prover-provided input. - [Line 139](https://github.com/0xPolygonHermez/zkevm-storage-rom/blob/c1ef4f411f7c11047e7d5252e24245b707fe2206/zkasm/storage_sm_get.zkasm#L139): The machine reads a prover-provided bit value `RKEY_BIT` which specifies the next step in the Merkle path (from bottom to top). - Lines [151](https://github.com/0xPolygonHermez/zkevm-storage-rom/blob/c1ef4f411f7c11047e7d5252e24245b707fe2206/zkasm/storage_sm_get.zkasm#L151) and [164](https://github.com/0xPolygonHermez/zkevm-storage-rom/blob/c1ef4f411f7c11047e7d5252e24245b707fe2206/zkasm/storage_sm_get.zkasm#L164): The `CLIMB_RKEY` instruction appends the current `RKEY_BIT` to `RKEY`. - [Line 168](https://github.com/0xPolygonHermez/zkevm-storage-rom/blob/c1ef4f411f7c11047e7d5252e24245b707fe2206/zkasm/storage_sm_get.zkasm#L168): Ends the execution and returns to the caller. At this point, the requested key is supposed to be in the `RKEY` register. Finally, the storage machine is connected to the main machine via [the following permutation argument](https://github.com/0xPolygonHermez/zkevm-proverjs/blob/84ca95f90f1ff0551d724cca92767b1cc829c068/pil/main.pil#L825-L838): ``` sRD { SR0 + 2**32*SR1, SR2 + 2**32*SR3, SR4 + 2**32*SR5, SR6 + 2**32*SR7, sKey[0], sKey[1], sKey[2], sKey[3], op0, op1, op2, op3, op4, op5, op6, op7, incCounter } is Storage.iLatchGet { Storage.oldRoot0, Storage.oldRoot1, Storage.oldRoot2, Storage.oldRoot3, Storage.rkey0, Storage.rkey1, Storage.rkey2, Storage.rkey3, Storage.valueLow0, Storage.valueLow1, Storage.valueLow2, Storage.valueLow3, Storage.valueHigh0, Storage.valueHigh1, Storage.valueHigh2, Storage.valueHigh3, Storage.incCounter + 2 }; ``` For this permutation to hold, the requested key (stored in `sKey[]`) needs to be equal to `Storage.rkey{0,1,2,3}` at the *end* of the execution. This is the same as the `RKEY` register in the zkAssembly above. ## The Vulnerability The vulnerability is due to the `CLIMB_RKEY` instruction, which is implemented as follows (excerpts from [`storage.pil`](https://github.com/0xPolygonHermez/zkevm-proverjs/blob/84ca95f90f1ff0551d724cca92767b1cc829c068/pil/storage.pil)): ```rust // Line 161: If setRkeyBit is active, set rkeyBit to a prover-provided // value, else keep the current value rkeyBit' = setRkeyBit*(op0-rkeyBit) + rkeyBit; // Line 162: Constrain rkeyBit to be bit-valued at all times rkeyBit * (1 - rkeyBit) = 0; // Line 195: Set climbedKey0 to rkey0 * 2 + rkeyBit pol climbedKey0 = (level0*(rkey0*2 + rkeyBit - rkey0) + rkey0); // Line 151: // - If setRkey is active, set rkey0 to a prover-provided value // - If iClimbRkey is active, set rkey0 to climbedKey0 (defined above) // - Else: keep the current value rkey0' = setRkey*(op0-rkey0) + iClimbRkey*(climbedKey0-rkey0) + rkey0; ``` Ignoring details, `rkey0'` is essentially constrained to be equal to `rkey0 * 2 + rKeyBit`. But this expression can overflow, for two reasons: 1. The initial remaining key is a 4-tuple of completely unconstrained field elements. 2. Even if the remaining key was constrained to be the right amount of bits (e.g., should be representable by 63 bits if we are on the first level), the expression could still overflow because the Goldilocks prime ($p = 2^{64} - 2^{32} + 1$) is smaller than some 64-bit numbers. ## Exploit This vulnerability can be exploited by a malicious prover. We'll focus on maliciously proving that a node is not in the tree, when in fact it is. This way, a malicious prover can falsely claim that a storage slot's value is 0. For example, consider the GET algorithm to prove that a key `1` is not in the tree in the example above: An honest prover will set the remaining key to `0` and the path bit to `1`. Then, they can provide a Merkle proof that this leads to a zero node. However, if instead the remaining key was set to $(p - 1) / 2$, the key would be reconstructed as $(p - 1) / 2 \cdot 2 + 1 = p = 0$. As a result, simply by modifying a free input, the honest proof for key `1` can be changed to make a false claim about key `0`. ## Demo [This patch](https://gist.github.com/georgwiese/6bbbd2348a64e4013595bacb778f3f06) can be applied to [0xPolygonHermez/zkevm-proverjs@84ca95f90f](https://github.com/0xPolygonHermez/zkevm-proverjs/tree/84ca95f90f1ff0551d724cca92767b1cc829c068) to demonstrate the vulnerability. It adds a test that honestly sets `storage[0] = 1` and `storage[2] = 2` (as in the example above), but then maliciously reveals `storage[0]` to be `0`. The test can be run by executing `npm run test:storage`. ## Generalizing the attack The exploit described above can be generalized to read operations of any key, as long as there is *any* zero node in the tree (which will be the case in practice). In particular, let there be a zero node at the end of a path $(b_1, b_2, ..., b_n)$. Let $k_{zero}$ be defined as: $$k_{zero} = \sum_{i = 1}^n{2^i \cdot b_i}$$ To fake a proof for a key $k$, we can set the remaining key to: $$k_{rem} := (k - k_{zero}) \cdot (2^{-1})^n$$ where $2^{-1}$ is the multiplicative inverse of $2$ in the Goldilocks field and all calculations are computed in field arithmetic. This will make sure that $k_{rem} \cdot 2^n + k_{zero} = k$. As mentioned above, we haven't discussed the fact that keys in the storage machine actually [consist of 4 field elements](https://docs.polygon.technology/zkEVM/architecture/zkprover/storage-state-machine/construct-key-path/), which each contribute bits to the path. The attack generalizes to this setting (perhaps requiring that we have a zero node at a level $\ge 4$, so that we can overflow all 4 components of the key as needed). While it is possible to read any key as $0$, it at least isn't trivial to read it as a non-zero value, because that would require providing a Merkle proof of the existence of a node with that value *and* a particular remaining key (which contributes to the node's hash). Likely this would require also maliciously inserting a node, which might well be possible because of the same overflow vulnerability. ## Potential Impact Just the ability to falsely read a value of $0$ can already be exploited for a variety of smart contracts, for example those that: - use non-zero values to indicate that a proxy has been initialized or set - control token swap pools, since a malicious price of 0 could enable draining pools - implement tokens, since a user's balance could be maliciously proved to be 0 It seems very plausible that other CRUD operations are affected as well, although I have not looked into that in much detail. Note that Polygon's zkEVM proving is currently permissioned, and only provers can exploit this vulnerability. But, of course, the whole point of computing a validity proof in the first place is that we don't trust the prover. ## The Fix Since reporting the issue in October 2023, Polygon has fixed the issue and generously rewarded a bug bounty (classifying the issue as high severity). As I am not the author of the fix, I will not describe it in detail here. In short, the `climbedKey = rkey0 * 2 + rkeyBit` calculation has been replaced with a call to a new [ClimbKey](https://github.com/0xPolygonHermez/zkevm-proverjs/blob/c4a2ce7617cb34b2c119742c2adbcd11ac435ec4/pil/climb_key.pil) machine. As explained in [this comment](https://github.com/0xPolygonHermez/zkevm-proverjs/blob/c4a2ce7617cb34b2c119742c2adbcd11ac435ec4/pil/climb_key.pil#L9), any possible remaining key that *would* cause an overflow is excluded from a pre-computed lookup table. This way, the prover would not be able to create a proof if they set the remaining key maliciously. ## Acknowledgements I would like to thank [Leo Alt](https://twitter.com/leonardoalt) (my valued colleague at [powdr](https://www.powdr.org/)) for helping me navigate the reporting process and coming up with much of the "Potential Impact" section. Also, I would like to thank the team at Polygon for confirming the bug quickly and keeping me in the loop while fixing the problem!