
Thanks to Guillaume and Vitalik for discussions and feedback

Introduction

This is a fresh look at Binary Trees in L1, motivated by recent interest, offline discussions, etc.

EIP-3102 still exists but might be stale regarding how it would fit more recent ideas introduced in Verkle, which also apply to Binary Trees. This can generate confusion when trying to compare Verkle vs. Binary fairly.

These notes might be useful for other core devs to catch up on the topic and engage in discussions more efficiently. This document is a personal summary, not a formal proposal; it may contain mistakes and leave room for improvement. Reach out if you have any further ideas or improvements!

Shape - Sparse vs Prefixed

There are two main options for a binary tree:

  • A Sparse Merkle Tree (SMT): a tree with a fixed (full) depth of 256 (assuming 256-bit keys).
  • A Prefixed Merkle Tree (PMT): a tree that includes extension nodes to avoid full depth.

Apart from other pros/cons, an SMT is attractive since the spec is very simple, which is nice for overall implementations, particularly circuits for validity proofs. The obvious tradeoff is having a bigger tree and requiring more hashing, which PMT tries to address.

Are SMTs a real option?

There are two research posts that try to mitigate some of the “inefficiencies”:

These aren’t new breakthroughs (~2019), but maybe not everyone knows about them, which is one of the motivations for consolidating/extending them in these notes. I recommend reading the original discussions, but I’ll provide my own summary for a more linear/compact explanation, plus visuals to enhance understanding. This doesn’t mean the ideas must be used; just consider them if they help.

Optimize storage

Given an SMT, if you choose a path leading to a non-zero leaf value, you will find that after some depth, you’re in a sub-tree with a single non-zero leaf.

*(image not available)*

If an implementation represents this sub-tree explicitly, it requires a lot of “dummy” internal nodes and a long path of H(empty, XXX) or H(XXX, empty) hashes, which is wasteful.

An easy implementation optimization in the EL db layer is to invent a “virtual” node at the root of that sub-tree, which represents a claim: “an empty sub-tree with value X at path Y”. This implementation optimization removes the inefficiency of many empty subtrees, so we don’t store them.

Note that at the protocol level, the tree is still an SMT. Still, in the implementation, we store nodes as if they were a PMT. Said differently, at least on this front, an implementation optimization achieves what you usually do with a spec optimization (PMT).
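To make the claim concrete, here is a minimal sketch (my own names, SHA-256 as a stand-in for the tree’s hash function): the db persists only the remaining path bits and the leaf value, and the sub-tree root can be recomputed on demand, so none of the dummy internal nodes need to be stored.

```python
import hashlib

EMPTY = b"\x00" * 32

def h(left, right):
    # Merkleization hash for internal nodes (SHA-256 as a stand-in).
    return hashlib.sha256(left + right).digest()

def subtree_root(path_bits, value):
    # Root of a sub-tree containing a single non-zero leaf `value` at the
    # position described by `path_bits` (top-down order). The db stores only
    # (path_bits, value); every sibling along the way is the empty hash, so
    # none of the "dummy" internal nodes are persisted.
    node = value
    for bit in reversed(path_bits):
        node = h(node, EMPTY) if bit == 0 else h(EMPTY, node)
    return node
```

At the protocol level nothing changes: the root computed here is exactly what the explicit SMT representation would produce.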

Override H(empty, empty) value

In an SMT, H(empty, empty) appears at many levels, where H is the hash function used to merkleize nodes. This H(empty, empty) calculation is cacheable, since the result can be precalculated once and reused at every level.

But if we define H(empty, empty) = 0, this is simplified and reduces implementation complexity.
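A sketch of this rule (SHA-256 as a stand-in hash): with H(empty, empty) = 0, an entirely empty subtree at any depth is just 32 zero bytes, so no hashing and no per-level cache is needed for empty regions.

```python
import hashlib

ZERO = b"\x00" * 32

def merkle_hash(left, right):
    # Rule: the hash of two empty children is defined to be zero, so an
    # empty subtree has a zero root at every level of the tree.
    if left == ZERO and right == ZERO:
        return ZERO
    return hashlib.sha256(left + right).digest()
```

The zero value propagates upward for free: hashing an empty subtree of any depth never touches the real hash function.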

What about the number of required hashes?

There’s still something quite heavy in an SMT: the amount of hashing we must do. Given a branch to a non-zero leaf node, we still need to do as many hash evaluations as the length of the branch, which for a 256-bit key would be many.

This is mitigated in the mentioned Ethresearch post. I believe the explanation there is quite clear, so I’ll provide a visual explanation that can help understand the idea:

*(image not available)*

Notes about the picture:

  • Each node is labeled with the rule number applied at that internal node, as explained in the Ethresearch post.
  • Red means calculating a “real” hash, i.e., the “slow path”. Other colors are “fast paths” (i.e., no hashing).

The TL;DR is lowering the impact of extra hashing due to empty subtrees. Vitalik also notes you can tune k to reach the desired tradeoff between amortizing calculated hashes and security. Instead of requiring 16x fewer hashes in the sparse part of the tree, you could require 18x (or more) fewer hashes if you accept further bits of security loss.

This idea adds some complexity to the spec, since there are extra rules for applying the hashing in the tree. Additionally (h/t to Lev for pointing this out), this also applies to in-circuit logic, where “branches” are more annoying than in out-of-circuit implementations. Whether this idea makes sense depends on how hard we might need to push on “saving hashes” for validity proofs.
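One plausible formulation of the rules, sketched below with my own names (this is a paraphrase of the Ethresearch idea, not its exact rule set): an empty subtree costs nothing, a subtree with a single non-zero leaf costs one hash of (key, value) regardless of its depth, and only a real branch pays for a two-child hash.

```python
import hashlib

ZERO = b"\x00" * 32

def h(a, b):
    return hashlib.sha256(a + b).digest()

def root(leaves, depth=0):
    # leaves: dict mapping a 256-bit integer key (the path) to a 32-byte value.
    # Rule 1: empty subtree -> zero, no hashing.
    if not leaves:
        return ZERO
    # Rule 2: single non-zero leaf -> one hash of (key, value), independent
    # of how deep the subtree is ("fast path").
    if len(leaves) == 1:
        (key, value), = leaves.items()
        return h(key.to_bytes(32, "big"), value)
    # Rule 3: a real branch -> hash the two children ("slow path").
    split = 1 << (255 - depth)
    left = {k: v for k, v in leaves.items() if not k & split}
    right = {k: v for k, v in leaves.items() if k & split}
    return h(root(left, depth + 1), root(right, depth + 1))
```

The number of slow-path hashes scales with the number of real branching points rather than with the 256-level depth, which is the whole point of the optimization.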

Note that Vitalik's very recent The Verge post includes further strategies to help on this front, e.g., using multidimensional gas to limit the number of state accesses. These ideas (and others) are orthogonal, so they could be stacked on each other.

State data encoding

This is where many of Verkle EIP's ideas would still apply to Binary Trees, making both share the same benefits. Since I’m not assuming you know about Verkle's main changes, I’ll explain from scratch.

Let's see how accounts are encoded into the tree:

*(image not available)*

The main idea is that you address data in the tree by stems, which are 248-bit keys that group 256 values, so each value is still referenced with a 256-bit key. The motivation is better data locality. e.g: if you need to access the nonce of an account, there’s a higher chance you will also access the balance or first bytecodes than some other random Ethereum state. If this grouping is designed properly, you can minimize the number of branches you need to prove for the data (and get other db efficiency gains).

The stem part of the path is shown in black nodes, i.e., internal nodes. The stem guides you through the internal nodes. After the stem, there are two types of sub-trees (explained soon), each representing 256 32-byte values. This is why these sub-trees have the “8-bit” clarification in the image above.
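The addressing scheme above can be sketched in a few lines: a full 256-bit tree key is a 31-byte (248-bit) stem concatenated with a 1-byte subindex selecting one of the 256 values in the group.

```python
def tree_key(stem, subindex):
    # stem: 31 bytes (248 bits) shared by a group of 256 values.
    # subindex: which of the 256 values inside the group (0..255).
    assert len(stem) == 31 and 0 <= subindex < 256
    return stem + bytes([subindex])

def split_key(key):
    # Inverse: a full 32-byte (256-bit) key is stem || subindex.
    return key[:31], key[31]
```

Two values sharing a stem differ only in the last 8 bits of their keys, so they live in the same bottom sub-tree and share the whole branch above it.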

Note 1: the number of stems in Verkle Trees is the same as in Binary Trees, which is why the main logic around the new gas costs is ~equivalent (i.e., EIP-4762’s main ideas would be kept).

Note 2: Although EIP-3102 intended to have a single tree, storage slots have their own tree “hanging” next to the account data, which makes the depth more unpredictable. The current proposal has a single tree that stays balanced, since storage/code chunks are spread over the tree (i.e., the same as Verkle).

There are two kinds of sub-trees, which I describe in two different dotted-square colors in the image. Note these are just semantics; the tree is a ~balanced binary tree.

Account header (green)

The account header is a stem that refers to 256 values, which include:

  • Account data: nonce, balance, code size, code hash, and some reserved (rsrv) slots.
  • The first 64 storage slot values (if the account is a contract).
  • The first 128 code chunks (if the account is a contract).

Reiterating the motivation, if you access account data, there’s a high chance you’ll access other account data; thus, we put them in the same stem (i.e. branch).
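The layout can be sketched as a subindex map inside the account-header stem. The concrete offsets below (nonce at 0, balance at 1, etc.) are illustrative assumptions on my part; only the grouping (account data, then 64 storage slots, then 128 code chunks) comes from the description above.

```python
# Illustrative subindex layout inside an account-header stem
# (the exact offsets are an assumption, not a finalized spec):
NONCE_LEAF = 0
BALANCE_LEAF = 1
CODE_SIZE_LEAF = 2
CODE_HASH_LEAF = 3
# subindices 4..63: reserved (rsrv) slots
HEADER_STORAGE_OFFSET = 64   # first 64 storage slots: subindices 64..127
CODE_OFFSET = 128            # first 128 code chunks: subindices 128..255

def header_subindex_for_slot(slot):
    # Storage slots 0..63 live in the account header stem.
    assert 0 <= slot < 64
    return HEADER_STORAGE_OFFSET + slot

def header_subindex_for_chunk(chunk):
    # Code chunks 0..127 live in the account header stem.
    assert 0 <= chunk < 128
    return CODE_OFFSET + chunk
```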

Other storage slots (blue)

An account can use many more storage slots than the first 64 ones, so we have to put them somewhere. There’s a map that would keep grouping 256 storage slots in a stem (i.e. now different from the account stem).
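A minimal sketch of that map, under the assumption that overflow slots are grouped 256 per stem by simple division (a real spec would also mix the account address into the stem derivation):

```python
STEM_SIZE = 256            # values per stem
HEADER_STORAGE_SLOTS = 64  # slots that live in the account header stem

def storage_group(slot):
    # For slots beyond the first 64, group 256 consecutive slots per stem.
    # Returns (group index, subindex within the group); the exact mapping
    # is illustrative, not a finalized spec.
    assert slot >= HEADER_STORAGE_SLOTS
    return slot // STEM_SIZE, slot % STEM_SIZE
```

Consecutive slots land in the same stem, so iterating over a contract’s storage touches far fewer branches than if slots were scattered across the tree.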

Other code chunks (blue)

Recall that in Verkle or Binary, we need to include the account’s code in the tree. This is done through chunkification, which is “just” chunking the code and storing it in 32-byte values.

As mentioned before, the first 128 code chunks are in the account stem, so it’s very easy/cheap to access them since you’ll need them to execute contracts. If the contract has more than 128 code chunks, those will be grouped in groups of size 256 and live in different stems in the binary tree.
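A sketch of chunkification and chunk placement (note: Verkle’s EIP-6800 actually uses 31 code bytes plus a push-data marker byte per chunk; a plain 32-byte split is enough here to show the grouping, and the group numbering is my own):

```python
CHUNK_SIZE = 32
HEADER_CODE_CHUNKS = 128   # chunks stored in the account header stem
CODE_OFFSET = 128          # their subindex range starts here
GROUP = 256                # chunks per overflow stem

def chunkify(code):
    # Pad to a multiple of 32 bytes and split into 32-byte values.
    pad = (-len(code)) % CHUNK_SIZE
    code = code + b"\x00" * pad
    return [code[i:i + CHUNK_SIZE] for i in range(0, len(code), CHUNK_SIZE)]

def chunk_location(i):
    # First 128 chunks sit in the account header stem; the rest are
    # grouped 256 per extra "code" stem.
    if i < HEADER_CODE_CHUNKS:
        return ("header", CODE_OFFSET + i)
    j = i - HEADER_CODE_CHUNKS
    return ("code-stem", j // GROUP, j % GROUP)
```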

Hash function

We’d like a hash function that:

  1. Is fast enough out-of-circuit on a CPU (i.e., for EL clients to execute).
  2. Has an efficient arithmetization under a proving system that is fast enough for worst-case blocks.
  3. Is secure.

The following hash functions seem to be the candidates:

  • Poseidon2
  • SHA256/Blake3
  • Ajtai

Each option requires proper evaluation against points 1, 2, and 3. Impressive progress is happening under different proving systems for some of these hash functions, and this is quite relevant for the decision. Which hash function is used shouldn’t impact (most of) the rest of the protocol changes required for Binary Trees (see next sub-section).

Note that if we end up applying the optimization explained in “What about the number of required hashes?”, this is the hash function referred to as the “base” one.

How do we encode 32-byte leaf values in finite fields?

This question makes sense if we use an arithmetic hash function such as Poseidon2, since the inputs of the hash function are finite field elements. The state tree stores 32-byte values. Depending on which hash function we choose, we might need to clarify how this would work under a specific field.

As an example, if we use a proving system that relies on the Mersenne31 finite field, elements are 31 bits, so we can’t store 256 bits in a single 31-bit value. The minimum number of required field elements would be nine. If this unaligned encoding is inconvenient, we can also do a 24-bit encoding in 11 finite field elements. Another thing that can influence this decision is the rate we need for hashing internal nodes, as a consequence of the number of squeezed field elements for the chosen security level. In any case, the general idea is the same.
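The 24-bit variant can be sketched directly: pack the 256-bit value into 24-bit limbs (24 × 11 = 264 ≥ 256), each of which comfortably fits below the Mersenne31 modulus. The little-endian limb order below is my own choice for illustration.

```python
M31 = (1 << 31) - 1  # Mersenne31 modulus

def encode_24bit(value32):
    # Encode a 32-byte value into 11 Mersenne31 field elements using
    # 24-bit limbs, so every limb is far below the 31-bit modulus.
    n = int.from_bytes(value32, "little")
    limbs = [(n >> (24 * i)) & 0xFFFFFF for i in range(11)]
    assert all(limb < M31 for limb in limbs)
    return limbs

def decode_24bit(limbs):
    # Inverse: reassemble the 32-byte value from its 24-bit limbs.
    n = sum(limb << (24 * i) for i, limb in enumerate(limbs))
    return n.to_bytes(32, "little")
```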

Tree serialization

Let’s analyze how the tree can be serialized and saved on disk. Since the tree's arity is two, it could be argued that it has a high overhead.

As mentioned before, we can abstractly imagine a binary tree as:

*(image not available)*

Two main parts of the tree:

  • Top-tree: the top part of the tree, whose internal nodes root subtrees containing more than one value.
  • Single-value-subtree: the bottom part of the tree, where each subtree contains only one non-zero value.

The diagram corresponds to an SMT since you end up with a subtree with a single key value after some levels of internal nodes.

In the case of an SMT, optimizing the serialization of the single-value subtree was already explained in a previous section. A PMT doesn’t need any “optimization” here, since this sub-tree is already compressed into a node type.

Regarding the Top-tree, we can note the following comment in EIP-3102:

The ethereum tree was originally hexary because this would reduce the number of database reads (eg. 6 instead of 24 in the above example). It is now understood that this reasoning was flawed, because nodes can still “pretend” that a binary tree is a hexary (or even 256-ary) tree at the database layer (eg. see https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751), and thereby get the best-of-both-worlds of having the low proof sizes of the tree being binary from a hash-structure perspective and at the same time a much more efficient representation in the database.

To help visualize the trick, we can now imagine the tree in this way:

*(image not available)*

This changes the perspective to looking at internal nodes as a 4-arity tree. Each colored triangle is a single entry in the database. For example, the first blue triangle is a single database entry holding the hash values of the Blue-Blue-Red-Blue nodes. With this single db-lookup, you can load the whole subtree, which stores an N-arity tree (in this case, 4).

To load the path Blue(10)Red(11)Green(11)Yellow(101101), we only do four disk lookups, compared to one lookup per bit with the naive serialization.
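The trick can be sketched as a key-mapping function. Assuming fixed-size groups of `group` bits per “triangle” (the colored groups in the figure vary in size, but the idea is the same), each database key is the binary-path prefix at a group boundary, so a path of depth d needs ceil(d / group) lookups.

```python
def db_keys_for_path(path_bits, group=8):
    # Map a binary path to the database keys that must be fetched when
    # internal nodes are grouped `group` bits at a time (2^group-ary
    # "triangles"). Each key is the path prefix at a group boundary.
    keys = []
    for i in range(0, len(path_bits), group):
        prefix = "".join(str(b) for b in path_bits[:i])
        keys.append(prefix)
    return keys
```

With 8-bit groups, a 32-bit path needs only 4 lookups instead of 32, at the cost of recomputing the intra-triangle hashes after each load.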

The only catch is that you need to do more hashing when loading the tree to memory to reconstruct the internal triangle node values, but this is a tunable knob that should account for the following:

  • Optimal number of bottom-level subtree (i.e., Blue-Blue-Red-Blue) nodes to fit in a disk page.
  • How fast the cryptographic hash function used in the tree is. This accounts for the overhead of hashing on the fly when loading the tree. Note that the trick described in the previous Hash function section helps with this.
  • How fast is a disk lookup.

Increasing the arity means less disk space (ROI decreases fast), fewer lookups, and more CPU work. Decreasing the arity means more disk lookups, more disk usage, and less CPU work. If we use 8 bits, this means 256-arity, which is analogous to Verkle Trees.

Extra optimization: Note that the reconstruction of these X-arity subtrees (i.e., hashing recalculation) is parallelizable along two dimensions: within each subtree level and between subtrees. Regarding the latter, you can overlap hashing recalculation with further db lookups for lower subtrees.

Note that everything described here is independent of the protocol spec. Each EL client can choose whatever tradeoff (e.g. small/big arity) makes sense for their architecture, underlying key-value database used, etc.

Important note: This is napkin math showing a tradeoff space between storage overhead and CPU workload. It would be useful to implement it and get real numbers; I’m just (re)explaining the overall idea.

Example case

Let's analyze the following case:

  • Hash function output is 32 bytes.
  • For storage optimizations, use 256-arity to apply the trick.
  • The tree has 2^32 non-zero key values; this should be ~4x more than expected for current mainnet data.

Let’s run some numbers:

  • The “top tree” has ~32 levels, and since we apply the 256-arity trick, this means we need 4 “triangles”.
  • Thus, to load a leaf value, we need 4+1 disk lookups:
    • The first four lookups each load 256*32=8KiB
    • The last lookup should be ~256*32 = 8KiB (i.e: prefix bits + 256 32-byte values).
  • If we need to reconstruct all the hashes in the path to this leaf node:
    • For an SMT: 4*(2^8-1) + (256-32) = 1244 hashes
      • If we use the trick described in the What about the number of required hashes? section, we can divide (256-32) by ~16 (or more depending on the decided collision security tradeoff).
    • For a PMT: 4*(2^8-1) = 1020 hashes
  • Recall that:
    • As mentioned in the Tree serialization section, many of these hashing are parallelizable.
    • These are numbers for a single leaf-value reconstruction. In real blocks, at least one “top triangle” is shared.
    • Despite parallelizability or top-triangle sharing, the amount of hashing could be significant, but the (out-of-circuit) performance hit depends on the chosen hash function and the arity-number trick.
  • The total tree size would be (2^8 + 2^16 + 2^24 + 2^32)*32 ~= 128GiB, half the size of the naive serialization (~2^33 nodes * 32 bytes = 256GiB). Note this is an estimation to know the order of magnitude.

This is an estimation; real-life implementations can be trickier, so take it with a grain of salt.
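The napkin math above can be reproduced in a few lines (same assumptions: 2^32 non-zero keys, 8-bit/256-arity grouping, 32-byte hashes; stored hashes only at group boundaries):

```python
# Reproduce the example-case numbers.
LEVELS = 32          # top-tree depth in bits (2^32 non-zero keys)
GROUP = 8            # bits per "triangle" (256-arity)
TRIANGLES = LEVELS // GROUP                    # 4 triangles down the path
DISK_LOOKUPS = TRIANGLES + 1                   # plus the final leaf group

# Hashes to rebuild one branch:
smt_hashes = TRIANGLES * (2**GROUP - 1) + (256 - LEVELS)   # SMT: full depth
pmt_hashes = TRIANGLES * (2**GROUP - 1)                    # PMT: no tail

# Tree size: hashes stored only at the group boundaries.
stored_nodes = sum(2 ** (GROUP * i) for i in range(1, TRIANGLES + 1))
size_gib = stored_nodes * 32 / 2**30           # ~128 GiB
naive_gib = (2**33) * 32 / 2**30               # ~256 GiB (all ~2^33 nodes)
```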

Appendix

This section describes some lateral topics regarding Binary Trees.

How much of the current Verkle progress is reusable for Binary Trees?

Here are the current EIPs that are related to Verkle:

  • EIP-6800 describes the switch to the new tree and how data is stored there.
  • EIP-4762: describes the gas cost changes, which now penalize big witnesses.
  • EIP-7748: describes how the state conversion would work.
  • EIP-7709: describes how BLOCKHASH would now use data saved in EIP-2935.

If we guide ourselves by what the mentioned EIPs change, we can say:

  • EIP-6800 would be partially reused, mainly regarding data encoding into the tree.
  • EIP-4762 for the gas model would be similar since the main goal is penalizing witness costs.
  • BLOCKHASH resolving from system contract storage (EIP-7709) doesn’t depend on the target tree — so it’s the same.
  • State conversion (EIP-7748) would be the same since nothing about the current Overlay Tree migration strategy depends on the shape of the target tree.
  • The pre-images distribution sub-problem of the state conversion is the same since those preimages depend on the current tree (MPT), not the target tree.
  • The Verkle cryptography stack, which includes Bandersnatch (+Banderwagon), Multiproof + IPA, and Pedersen Hashes, would be dropped. For binary trees, we’d be including validity proof of the pre-state (and post-state updates).
  • The current ~200 execution-spec-test test vectors we’ve been working on for Verkle could be adjusted with low effort. The same applies to state-conversion test vectors (which don’t exist yet).

Note that EIP-7736, proposing leaf-level state expiry, could also be applied to Binary Trees.

“Normal” State proofs

Note that for L1, the plan has always been to use a proving system to prove the pre-state for block execution. This won’t be explained here since it still isn’t clear whether STARKs, GKR, or another strategy will be used. Still, simple proofs can be useful, as mentioned in “Use cases of witnesses other than verifying blocks” in Vitalik’s article.

A way to visually see one option for how to do this in an SMT:

*(image not available)*
Given the leaf at the bottom, we can conceptually describe the proving path in four steps:

  • One EN(5, Right, H0), which allows you to construct Section 1. Note that the intermediate black nodes aren’t provided in the proof, since the validator can calculate them knowing that the path length is 5 and H0.
  • Three IN(A, B), which describe an internal node with A = Left | Right and B = Sibling Hash, allowing the calculation of an intermediate node coming from the left or right.

When proving more than one leaf, we could sort them by key and have a stream of EN+IN+EN+IN+IN; as more keys are introduced, many INs will be shared and thus not repeated.
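A sketch of how a verifier could consume these records (my own function names; SHA-256 as a stand-in hash; I assume the verifier derives the EN’s left/right turns from the key it already knows, which is why only the path length and H0 travel in the proof):

```python
import hashlib

EMPTY = b"\x00" * 32

def h(a, b):
    return hashlib.sha256(a + b).digest()

def apply_en(path_bits, h0):
    # EN: reconstruct the single-leaf subtree root. Only the depth and the
    # leaf hash are in the proof; the intermediate nodes are recomputed,
    # since every sibling along the way is the empty hash.
    node = h0
    for bit in reversed(path_bits):
        node = h(node, EMPTY) if bit == 0 else h(EMPTY, node)
    return node

def apply_in(node, side, sibling):
    # IN(side, sibling): one step up through a real internal node, with the
    # current node coming from the left or the right.
    return h(node, sibling) if side == "Left" else h(sibling, node)
```

Feeding a stream of one EN followed by the INs up to the root reproduces the root hash, which the verifier compares against the trusted state root.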

Proof of absence is similar, proving that the target key lives in a sub-tree with an empty root.