[Note: this document is under review and might change in the coming days]
[Follow-up: Please look at Gottfried's document for a more formal approach to the topic]
This document explores an attempt to attack a particular branch in a Verkle Tree. By "attack" I mean making that branch have a depth higher than expected.
Attacking tree branches isn't a new idea; it's a known attack vector for any hash-based self-balancing tree (e.g., the current Merkle Patricia Tree (MPT)).
The Ethereum protocol plans to change from a Merkle Patricia Tree to a Verkle Tree (VKT), and this new tree design motivates re-exploring the attack vector.
Some important notes about this document:
This attack vector can have severe implications for Ethereum's security and stability, so any "final call" about its safety should ideally be reviewed/analyzed by experts in the field. This document presents a reasonable first attempt, and tries to invite (nerd-snipe?) experts into a more formal analysis if they think there's a serious risk to be addressed.
The rest of the document is organized in three main sections:
- "Context: Why is this attack vector a problem?": gives some intro to the reader about the discussed problem.
- "Attacker strategy and PoC": goes through what an attacker strategy might be, and shows some numbers on a developed proof of concept of that strategy.
- "Attack impact benchmarks": shows some benchmarks on how branches of different depths affect some relevant metrics for Ethereum clients and the protocol.

You should skip this section if you already know why making branches longer than average is a problem.
A branch with a higher depth than average breaks the design expectation of a balanced tree, which has some unwanted side effects:
Those are simply effects of branches with higher depth. The critical question is how costly it is for an attacker to generate an attack that would be meaningful (negatively) for the Ethereum protocol. For example, if it costs 100 billion dollars to affect Ethereum client performance by 1%, we can ignore the attack vector.
To make this document educational, let's review some points an attacker can consider for their attack. Note this describes a somewhat naive attacker; depending on their resources, they might attempt a more distributed and specialized attack.
The Verkle Tree design doesn't have the concept of extension nodes. This means the tree doesn't try to avoid long branches by compressing shared bytes in tree keys in the same branch (as with MPTs).
For example, let's start with an empty tree and insert the key 0x0000000000000000000000000000000000000000000000000000000000000001. Now imagine how the tree looks after each of the following insertions:

- 0x0100000000000000000000000000000000000000000000000000000000000001: we'd have two branches, each of depth 1.
- 0x0001000000000000000000000000000000000000000000000000000000000001: we'd have two branches, each of depth 2.
- 0x0000000000000000000100000000000000000000000000000000000000000001: we'd have two branches, each of depth 10.

In summary, the length in bytes of the shared prefix plus 1 will be the depth of the branch for both keys. From an attacker's perspective, this is convenient since we only need to insert one appropriately crafted key to increase the depth of the branch, rather than many, as would be required if extension nodes provided "prefix compression".
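The shared-prefix rule above can be sketched in a few lines of Python (a toy illustration, not client code):

```python
# Depth at which two keys end up in a tree with byte-indexed internal
# nodes and no extension nodes: shared byte prefix length, plus one.
def branch_depth(key_a: bytes, key_b: bytes) -> int:
    shared = 0
    for a, b in zip(key_a, key_b):
        if a != b:
            break
        shared += 1
    return shared + 1

k0 = bytes.fromhex("00" * 31 + "01")                      # original key
k1 = bytes.fromhex("01" + "00" * 30 + "01")               # no shared prefix
k2 = bytes.fromhex("0001" + "00" * 29 + "01")             # 1 shared byte
k3 = bytes.fromhex("00" * 9 + "01" + "00" * 21 + "01")    # 9 shared bytes
```

Running `branch_depth(k0, k1)`, `branch_depth(k0, k2)`, and `branch_depth(k0, k3)` reproduces the depths 1, 2, and 10 from the example above.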
This is also convenient since we only need to pay for a single SSTORE to produce the effect in a branch. Note that SSTORE costs might be negligible compared to the computational cost of finding the correct key to insert.
The attack cost is related to the computational cost of finding a tree key with a matching prefix length for the target branch to be attacked.
Let's go through some basic ideas that come to mind.
Trees usually use tree key hashing to generate a self-balanced tree automatically. The hashing function can have many shapes and forms, but if the hash is cryptographically secure, it shouldn't be biasable by the user (i.e., a potential attacker).
The tree key hashing used in Verkle Trees is called Pedersen hashing, and can be described as a multiscalar multiplication over a fixed basis.
Proving that this hashing function produces a uniform and unbiased result is outside the scope of this document. Out of curiosity, and as a double-check, I ran a large sampling months ago and verified that the results were uniform. The unbiasability can't be proven by statistical analysis alone, since it requires a more formal mathematical argument.
To conclude about this point for this document, we assume an attacker can't manipulate the hashing function inputs in any strategic way to gain an advantage compared to a random/linear scan.
As mentioned before, Verkle Trees don't use a mainstream cryptographic hash function with specialized CPU instructions. If that were the case, there might be no better option than "just use X CPU instruction", which is probably the best you can do compared with other strategies.
In the case of Verkle Trees, the tree key calculation is a multiscalar multiplication of a fixed basis of length 5. Everything narrows down to how we can perform this operation as fast as possible with minimal work.
The short description of the tree key calculation is:

treeKey = PointToHash(k*G_0 + address_low*G_1 + address_high*G_2 + treeIndex_low*G_3 + treeIndex_high*G_4)

Or more compactly as:

treeKey = PointToHash(sum_i s_i*G_i), with s = (k, address_low, address_high, treeIndex_low, treeIndex_high)

where:

- k is a fixed scalar.
- xxx_(low|high) refers to the top and bottom 16 bytes of an address and treeIndex.
- PointToHash(...) transforms the elliptic curve point result of the multiscalar multiplication into bytes (in which we want a prefix-length match with a target attacked key).

The above logic for tree key hashing is used in both:

- tree keys for account fields (derived from the address), and
- tree keys for storage slots (derived from the address and a treeIndex).
Thus, an attacker can either try to find an Ethereum address that generates the desired tree key prefix, or fix a contract address and brute-force the storage slot. The latter is better since we can fix a contract address that will do the SSTORE, and then make a contract call providing the "convenient" storage slot. This will perform the tree write and trigger the branch-depth attack.
So, everything comes down to doing a "scan" on the treeIndex, and working backwards to generate a proper storage slot (I'll mention later how to transform a treeIndex into (one possible) storageSlot).
Although the treeIndex maps to two 16-byte scalar values, we only need to do a single scalar multiplication per brute-force iteration, since the lower 16 bytes give us plenty of search space.
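To make the inputs concrete, here's a rough sketch of how the five scalars could be assembled. This follows my reading of the draft spec (EIP-6800); the byte order and the exact fixed-scalar value are assumptions, and the real basis points and PointToHash are omitted entirely:

```python
# Sketch (assumptions, not a reference implementation): assembling the five
# MSM scalars from an address and a treeIndex.
def tree_key_scalars(address: bytes, tree_index: int) -> list[int]:
    assert len(address) == 32
    index_bytes = tree_index.to_bytes(32, "little")
    k = 2 + 256 * 64  # assumed fixed scalar encoding the 64-byte input length
    return [
        k,
        int.from_bytes(address[:16], "little"),      # address_low
        int.from_bytes(address[16:], "little"),      # address_high
        int.from_bytes(index_bytes[:16], "little"),  # treeIndex_low
        int.from_bytes(index_bytes[16:], "little"),  # treeIndex_high
    ]
```

Note that scanning only treeIndex_low (the fourth scalar) leaves the other four scalars fixed, which is why a single scalar multiplication per iteration is enough.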
The central question then is: how can we make this scalar multiplication and point addition as fast as possible?
My take on this PoC attack leverages some code that we did for go-ipa, the cryptography library to be used by geth for VKTs:
This strategy is at least much less naive than a plain "double and add", a Pippenger-style algorithm, or similar. Still, there might be better strategies that allow us to do even less work per iteration.
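One property any implementation can exploit, regardless of the precomputation strategy, is that the MSM is linear in each scalar: scanning treeIndex_low linearly means consecutive candidates differ by a single group addition of G_3, not a fresh scalar multiplication. A toy sketch using integers mod P under addition as a stand-in group (not the real Banderwagon curve):

```python
# Toy stand-in group: integers mod P under addition replace curve points;
# the algebra of the MSM is the same.
P = 2**61 - 1
G = [3, 5, 7, 11, 13]  # stand-ins for the fixed basis G_0..G_4

def msm(scalars):
    return sum(s * g for s, g in zip(scalars, G)) % P

# Fixed scalars, scanning treeIndex_low = 0, 1, 2, ...
scalars = [2 + 256 * 64, 123, 456, 0, 789]
acc = msm(scalars)
for t in range(1, 1000):
    acc = (acc + G[3]) % P  # one "point addition" per candidate
# acc now equals the full MSM evaluated at treeIndex_low = 999
```

The same linearity holds on the real curve with point additions; whether a concrete implementation prefers this over windowed precomputed tables is an engineering trade-off.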
Gory details note: An implementation (such as the PoC I did) should consider that not every treeIndex value maps to a storage slot. In the tree key calculation, for most storage slots, we shift by a MAIN_STORAGE_OFFSET constant (256**31), which should be considered if we want to map the found treeIndex back to a storage slot. I'm mentioning this so that anyone taking a detailed look at the implementation will see that I fix some high-level bytes of treeIndex such that subtracting MAIN_STORAGE_OFFSET never underflows a uint256. If you're not reading at that level of detail, you can ignore this paragraph since it doesn't change anything about the important points.
I'll avoid getting into implementation details, but the PoC code is open, and the source can be found in a single commit here. There I added some extra comments if you're interested in more details. Still, it's not necessary to continue reading the document.
The attack code receives a contract address and a desired prefix, and uses the precomputed tables to find a lower set of 48 bits for a treeIndex that matches that prefix. Using the returned treeIndex, an attacker can generate a valid storage slot by doing storage_slot = treeIndex * 256 - 256**31 (I'm just reversing the storageSlot -> treeIndex formula, see here).
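The mapping in both directions can be sketched as follows. The reverse direction is the formula above (picking sub-index 0); the forward direction follows my reading of the MAIN_STORAGE_OFFSET rule, where 256 consecutive storage positions share one treeIndex:

```python
MAIN_STORAGE_OFFSET = 256**31

def tree_index_to_storage_slot(tree_index: int) -> int:
    # Reverses the storageSlot -> treeIndex mapping, choosing sub-index 0;
    # only valid when the subtraction doesn't underflow a uint256.
    slot = tree_index * 256 - MAIN_STORAGE_OFFSET
    assert 0 <= slot < 2**256, "treeIndex doesn't map to a main-storage slot"
    return slot

def storage_slot_to_tree_index(slot: int) -> int:
    # Forward mapping: shift by MAIN_STORAGE_OFFSET, then drop the
    # sub-index byte (256 positions per tree index).
    return (MAIN_STORAGE_OFFSET + slot) // 256
```

The assert is exactly the underflow condition mentioned in the gory-details note above: the PoC fixes high bytes of treeIndex so it never triggers.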
The program output has lines of this style:
This means that for some defined arbitrary contract address:
Finding a storage slot that would result in a tree key with some (arbitrarily) defined prefix of length A took B milliseconds, and the corresponding tree index to be used in the attack is C.
Before showing the results, it's worth noting some things:
Running logs:
Each extra depth level makes the search space 256 times bigger (i.e., one extra byte in the prefix), so the expected search time per extra depth should grow by a similar factor.
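As a back-of-envelope sketch:

```python
# Matching an n-byte prefix of a uniformly distributed tree key takes
# 256**n candidate keys on average, so each extra byte of depth
# multiplies the expected work (and time) by 256.
def expected_iterations(prefix_bytes: int) -> int:
    return 256 ** prefix_bytes

growth = expected_iterations(4) // expected_iterations(3)
```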
In the previous section, we explored how hard it is to generate these branches. The other side of the coin is, assuming we have branches of depth X, how much this affects the Ethereum protocol operation.
As mentioned earlier, the longer a branch is, the more commitments we need to update to propagate the change from the leaf node to the root node.
I've created a synthetic benchmark that creates a tree with all potential depths and measures how much time it takes to update all the required commitments to calculate the new root hash:
As expected, the time it takes is linear in the depth of the branch. On this particular CPU, it's around 6μs per extra depth level.
These numbers might be considered on the high-end spectrum since:
To have an idea on the lower end of hardware specs, I've run the same benchmark on a Rock5B:
Note that a Rock5B costs <200 USD (half the price of the CPU used in the previous run). It has an ARM64 architecture, which might not have all the same tricks as amd64 in gnark-crypto, and of course, it doesn't have ADX support. It isn't clear whether a Rock5B could handle both the 4844 and VKT new load, but we use it as a lower bound.
It goes without saying that a serious attacker wouldn't limit themselves to a 400 USD CPU to attack the Ethereum network.
In this PoC, I used precomputed tables focused on a single elliptic curve point. This is far from a naive approach, but there might be other, more clever ideas depending on how the search space is walked (e.g., maybe some math tricks with elliptic curves to save time).
This kind of attack is embarrassingly parallel. Partitioning the search space is very easy, and with little effort, the attacker could make more specialized attacks if they're also willing to use more RAM to have wider windows. If I had a CPU with four times more cores, all the times above would be cut by four, too. An attacker might have access to a botnet, supercomputers, or other specialized hardware to fine-tune the computation.
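A minimal sketch of how the search space could be split across workers; the 48-bit width matches the PoC's lower-bits search, and everything else is illustrative:

```python
# Split [0, 2**space_bits) into `workers` disjoint half-open ranges, one
# per worker; the last range absorbs any remainder.
def partition_search_space(space_bits: int, workers: int):
    total = 1 << space_bits
    chunk = total // workers
    return [
        (w * chunk, total if w == workers - 1 else (w + 1) * chunk)
        for w in range(workers)
    ]
```

Each worker then runs the same scan loop over its own range, which is why the speedup is essentially linear in core count.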
Another fact is that the framing of the problem I did here was attacking a single branch, but an attacker could try producing a match on many (heavily used/touched) branches in the tree. This can improve the efficiency of finding a match by a significant factor.
From another perspective, as explored in the Attack impact benchmarks section, scenarios of depth up to 15 take less than 100μs (on a high-end CPU) and 450μs (on a very low-end CPU) to recompute the root commitment. The mainnet tree will probably have a depth of 5 or 6, so the "additional overhead" of the attack is much less than the presented numbers. Regarding witness size impact, the logic is similar: a longer branch means the execution witness must provide one extra commitment (32 bytes) per extra depth level compared to the average (e.g., going from depth 5 to 14 means ~288 extra bytes).
The next steps I imagine to continue this discussion are:
- The execution witness is what a stateless client will receive as proof of all the pre-state needed to verify the block. This new construct is one of the main goals of Verkle Trees, since these witnesses will be much smaller than those coming from an MPT.