Guillaume Ballet
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
      • Invitee
    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Versions and GitHub Sync Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
Invitee
Publish Note

Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

Your note will be visible on your profile and discoverable by anyone.
Your note is now live.
This note is visible on your profile and discoverable online.
Everyone on the web can find and read all notes of this public team.
See published notes
Unpublish note
Please check the box to agree to the Community Guidelines.
View profile
Engagement control
Commenting
Permission
Disabled Forbidden Owners Signed-in users Everyone
Enable
Permission
  • Forbidden
  • Owners
  • Signed-in users
  • Everyone
Suggest edit
Permission
Disabled Forbidden Owners Signed-in users Everyone
Enable
Permission
  • Forbidden
  • Owners
  • Signed-in users
Emoji Reply
Enable
Import from Dropbox Google Drive Gist Clipboard
   owned this note    owned this note      
Published Linked with GitHub
1
Subscribed
  • Any changes
    Be notified of any changes
  • Mention me
    Be notified of mention me
  • Unsubscribe
Subscribe
# Binary trie format EIP draft ###### tags: binary EIP eip: title: Binary trie structure author: Guillaume Ballet (@gballet), Vitalik Buterin (@vbuterin) discussions-to: https://ethresear.ch/t/binary-trie-format/7621 status: Draft type: Standards Track category: Core created: 2020-09-01 requires: None [toc] ## Abstract This proposal presents a binary structure and merkelization rule for the account and storage tries, which are merged into a single "state" trie. RLP and most of the MPT's optimizations are dropped to simplify the design. ## Motivation The current design of the Merkle Patricia Trie (MPT) uses an hexary trie. Hexary Merkle trees are more shallow than their binary counterparts, which means less hashing. Over the course of the 5 years of Ethereum's existence, it has become apparent that disk accesses are a greater bottleneck than hashing. Clients are therefore moving away from a storage model in which all internal nodes are stored, in favor of a flat (key, value) storage model first used by turbo-geth, in which the intermediate nodes are recalculated only when needed. There is a push for making Ethereum easier to use in a stateless fashion. Binary tries make for smaller (~4x) proofs than hexary tries, making it the design of choice for a stateless-friendly Ethereum. For that same reason, the account and storage tries are merged in order to have a single proof for all changes. The MPT design is also rife with uncanny optimizations for size, that have a limited effect - at the cost of prohibitive complexity. For example, nesting for children whose RLP is less than 32 bytes saves an estimated 1MB of disk space. A paltry compared to the 300GB required by a fast sync at the time of this writing. These optimizations are a significant source of errors, and therefore a consensus-breaking risk. The reliance on RLP has also been criticized for its complexity, while the overhead of a general-purpose encoding scheme isn't warranted for the rigid structure of a Merkle trie. The desire to switch the storage model from an hexary trie to a binary trie provides an opportunity to adopt a simpler trie architecture that pushes optimizations from the protocol level down to that of the client implementors. ## Specification ### Conventions As defined in the yellow paper, $\mathbb{B}$ is the set of all possible byte sequences, and $\mathbb{B}_{32}$ is the set of all byte sequences of length 32. Accordingly, $\mathbb{B}_{l}$ is the set of all byte sequences of length $l$. |Code|Symbolic|Description| |:-:|:-:|-| |`u256(x)`|$\texttt{U256}(x)$|Big endian, 32-byte representation of number _x_| |`||`| $\|$ |Byte-wise concatenation operator| |`++`| $\texttt{++}$ | Bit-wise concatenation operator| |`0b0101`|$0101_2$|The binary string `0101`| |`hash()`|$\texttt{H}()$|The usual hashing function| |`empty_hash`|$H_{\varnothing}$|The empty hash: `hash("")`| |`length(x)`|$\lVert x \rVert$|The byte length of object `x`| |`d[a..b]`|$d_{a..b}$|The big-endian bit sequence taken from $d$, starting at bit index $a$, up to and including the bit at index $b$.| ### Notable changes from the hexary structure * Account and storage tries are merged, with key length between 32 and 64 bytes; * RLP is no longer used; * The "leaf marker" bit used in the hex prefix is also dropped. Leaves are identified as nodes with no children; * Serialized nodes are hashed, no matter how small the byte length of the serialized nodes. ### The trie #### Structure The trie structure is made up of _nodes_. A node $N \equiv \left(N_{l},N_{r},N_{p},N_{v}\right)$ has the following 4 components: * $N_{l}$ is the hash to the node's _left child_. If the node does not have a left child, then $N_{l}$ is the empty hash $H_{\varnothing}$; * $N_{r}$ is the hash to the node's _right child_. If the node does not have a right child, then $N_{r}$ is the empty hash $H_{\varnothing}$; * the optional $N_{p}$ is the node's _prefix_ : every key into the subtrie rooted at $N$ is prefixed by this bit string; * $N_{v}$ is the _value_ stored at this node. The value is **only present in leaf nodes**. Nodes with $H_\varnothing$ as both children are called _leaf nodes_, and the remaining nodes are known as _internal nodes_. #### Accessing account's balance, nonce, code, storage root and storage slots Assuming an account $A \equiv \left(A_b, A_n, A_c, A_s\right)$ at address $A_a$, the following elements can be found at the following addresses: * The account balance $A_b \in \mathbb{B}_{32}$ can be found at key $\texttt{H}(A_a)_{0..253} \space\texttt{++}\space 00_2$; * The account nonce $A_n \in \mathbb{B}_8$ can be found at key $\texttt{H}(A_a)_{0..253} \space\texttt{++}\space 01_2$; * The code $A_c \in \mathbb{B}$ can be found at key $\texttt{H}(A_a)_{0..253} \space\texttt{++}\space 10_2$; * The root of the storage trie $A_s$ can be found at key $\texttt{H}(A_a)_{0..253} \space\texttt{++}\space 11_2$ * The storage slot number $k \in \mathbb{B}_{32}$ can be found at key $\texttt{H}(A_a)_{0..253} \space\texttt{++}\space 11_2 \space\texttt{++}\space \texttt{H}(k)$; After [EIP-2926](https://eips.ethereum.org/EIPS/eip-2926) has been rolled out, $A_c$ will represent the root of the code merkelization tree. The key accessing code chunk $c$ is $\texttt{H}(A_a)_{0..253} \space\texttt{++}\space 10_2 \space\texttt{++}\space \texttt{U256}(c)$. ### Node merkelization rule Leaves and nodes that have no prefix are hashed according to the rule below: ``` internal_hash = hash(left_child_hash || right_child_hash) leaf_hash = hash(hash(key) || hash(leaf_value)) ``` If a prefix is present, the prefix length (as a 32-byte integer) is further concatenated with the output of the prefix-less rule, and hashed again: ``` internal_hash_with_prefix = hash(u256(prefix_length) || internal_hash) leaf_hash_with_prefix = hash(u256(prefix_length_u256 - 1) || leaf_hash) ``` ## Rationale ### Merging of the account and storage tries The trend in clients is to store the keys and values in a "flat" database. Having the key of any storage slot prefixed with the address key of the account it belongs to helps grouping all of an account's data on disk, as well as simplifying the witness structure. ### Prefixes and extension nodes The removal of extension nodes has been considered as a way to greatly reduce code complexity. The removal induces really prohibitive hashing costs (on the order of 10 seconds for a trie with only 1k leaves) and as a result they have been kept. An attempt to keep extension nodes for witness and not the merkelization rule can be found [here](https://notes.ethereum.org/m5VMkX8FRvi0Q_OOR7TF4A). It has been deemed a good trade-off to keep extension nodes, and to only get rid of complex methods like the hex prefix and children nesting. ### 2x32-byte inputs It has been requested to keep each node hash calculation as a function that takes two 256-bit integer as an input and outputs one 256-bit integer. This property is expected to play nice with circuit constructions and is therefore expected to greatly help with future zero-knowledge applications. ### Binary tries Binary tries have been chosen primarily because they reduce the witness size. In general, in an `N`-element tree with each element having `k` children, the average length of a branch is roughly `32 * (k-1) * log(N) / log(k)` plus a few percent for overhead. 32 is the length of a hash; the `k-1` refers to the fact that a Merkle proof needs to provide all `k-1` sister nodes at each level, and `log(N) / log(k)` is an approximation of the number of levels in the tree (eg. a tree where each node has 5 children, with 625 nodes total, would have depth 4, as `625 = 5**4` and so `log(625) / log(5) = 4`). For any `N`, the expression is minimized at `k = 2`. Here's a table of branch lengths for different `k` values assuming `N = 2**24`: | `k` (children per node) | Branch length (chunks) | Branch length (bytes) | | - | - | - | | 2 | 1 * 24 = 24 | 768 | | 4 | 3 * 12 = 36 | 1152 | | 8 | 7 * 8 = 56 | 1792 | | 16 | 15 * 6 = 90 | 2880 | Actual branch lengths will be slightly larger than this due to uneven distribution and overhead, but the pattern of `k=2` being by far the best remains. The ethereum tree was originally hexary because this would reduce the number of database reads (eg. 6 instead of 24 in the above example). It is now understood that this reasoning was flawed, because nodes can still "pretend" that a binary tree is a hexary (or even 256-ary) tree at the database layer (eg. see https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751), and thereby get the best-of-both-worlds of having the low proof sizes of the tree being binary from a hash-structure perspective and at the same time a much more efficient representation in the database. Additionally, binary trees are expected to be widely used in Eth2, so this path improves forward-compatibility and reduces long-run total complexity for the protocol. ### Key length instead of bit prefix In order to remove the complexity associated with byte manipulation, only the bit-length of the extension is used to merkelize a node with a prefix. ## Backwards compatibility A hard fork is required in order for blocks to have a trie root using a different structure. ## Test Cases This section is still incomplete and will be fleshed out as clients start implementing the support for binary tries. ### Hard fork transition test Input: an hexary trie with N accounts and associated storages. Output: a binary trie with the same N accounts and associated storages, as well as an extra account for the miner. ## Implementation This section is still incomplete and will be fleshed out as clients start implementing the support for binary tries. * As of [commit 0db87e187dc0bfb96046a47e3d6768c93a2e3331](https://github.com/gballet/multiproof-rs/commit/6d22b1aef9548581826b3c04b3e00d6cc709388c), [multiproof-rs](https://github.com/gballet/multiproof-rs) implements this merkelization rule in the `hash_m5()` function, found in file `src/binary_tree.rs`. * An implementation of this structure for go-ethereum is available [in this branch](https://github.com/gballet/go-ethereum/tree/binary-trie-m5). ## Security considerations Security issues could arise when performing the transition. In particular, a heavy conversion process could incentivize clients to wait the transition out. This could lead to a lowered network security at the time of the transition. A transition process has been proposed with [EIP-2584](https://github.com/ethereum/EIPs/blob/master/EIPS/eip-2584.md). ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).

Import from clipboard

Paste your markdown or webpage here...

Advanced permission required

Your current role can only read. Ask the system administrator to acquire write and comment permission.

This team is disabled

Sorry, this team is disabled. You can't edit this note.

This note is locked

Sorry, only owner can edit this note.

Reach the limit

Sorry, you've reached the max length this note can be.
Please reduce the content or divide it to more notes, thank you!

Import from Gist

Import from Snippet

or

Export to Snippet

Are you sure?

Do you really want to delete this note?
All users will lose their connection.

Create a note from template

Create a note from template

Oops...
This template has been removed or transferred.
Upgrade
All
  • All
  • Team
No template.

Create a template

Upgrade

Delete template

Do you really want to delete this template?
Turn this template into a regular note and keep its content, versions, and comments.

This page need refresh

You have an incompatible client version.
Refresh to update.
New version available!
See releases notes here
Refresh to enjoy new features.
Your user state has changed.
Refresh to load new user state.

Sign in

Forgot password

or

By clicking below, you agree to our terms of service.

Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
Wallet ( )
Connect another wallet

New to HackMD? Sign up

Help

  • English
  • 中文
  • Français
  • Deutsch
  • 日本語
  • Español
  • Català
  • Ελληνικά
  • Português
  • italiano
  • Türkçe
  • Русский
  • Nederlands
  • hrvatski jezik
  • język polski
  • Українська
  • हिन्दी
  • svenska
  • Esperanto
  • dansk

Documents

Help & Tutorial

How to use Book mode

Slide Example

API Docs

Edit in VSCode

Install browser extension

Contacts

Feedback

Discord

Send us email

Resources

Releases

Pricing

Blog

Policy

Terms

Privacy

Cheatsheet

Syntax Example Reference
# Header Header 基本排版
- Unordered List
  • Unordered List
1. Ordered List
  1. Ordered List
- [ ] Todo List
  • Todo List
> Blockquote
Blockquote
**Bold font** Bold font
*Italics font* Italics font
~~Strikethrough~~ Strikethrough
19^th^ 19th
H~2~O H2O
++Inserted text++ Inserted text
==Marked text== Marked text
[link text](https:// "title") Link
![image alt](https:// "title") Image
`Code` Code 在筆記中貼入程式碼
```javascript
var i = 0;
```
var i = 0;
:smile: :smile: Emoji list
{%youtube youtube_id %} Externals
$L^aT_eX$ LaTeX
:::info
This is a alert area.
:::

This is a alert area.

Versions and GitHub Sync
Get Full History Access

  • Edit version name
  • Delete

revision author avatar     named on  

More Less

Note content is identical to the latest version.
Compare
    Choose a version
    No search result
    Version not found
Sign in to link this note to GitHub
Learn more
This note is not linked with GitHub
 

Feedback

Submission failed, please try again

Thanks for your support.

On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

Please give us some advice and help us improve HackMD.

 

Thanks for your feedback

Remove version name

Do you want to remove this version name and description?

Transfer ownership

Transfer to
    Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

      Link with GitHub

      Please authorize HackMD on GitHub
      • Please sign in to GitHub and install the HackMD app on your GitHub repo.
      • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
      Learn more  Sign in to GitHub

      Push the note to GitHub Push to GitHub Pull a file from GitHub

        Authorize again
       

      Choose which file to push to

      Select repo
      Refresh Authorize more repos
      Select branch
      Select file
      Select branch
      Choose version(s) to push
      • Save a new version and push
      • Choose from existing versions
      Include title and tags
      Available push count

      Pull from GitHub

       
      File from GitHub
      File from HackMD

      GitHub Link Settings

      File linked

      Linked by
      File path
      Last synced branch
      Available push count

      Danger Zone

      Unlink
      You will no longer receive notification when GitHub file changes after unlink.

      Syncing

      Push failed

      Push successfully