Codex Storage
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Write
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Make a copy
    • Transfer ownership
    • Delete this note
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Help
Menu
Options
Engagement control Make a copy Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Write
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # 12/MERKLE-TREE | Name | Merkle Tree | | ---------- | ------------------------------------ | | **Status** | **draft** | | **Editor** | **Balázs Kőműves** bkomuves@status.im | | **Contributors** | **Giuliano Mega** giuliano@status.im | | | **Mohammed Alghazwi** mohalghazwi@status.im| ## Abstract This specification describes the Merkle tree construction adopted in Logos Storage. The construction relies on keyed hashing, injective encodings, and a simple padding approach to prevent a variety of collision attacks, and is designed to operate both on conventional hashing functions like SHA-2, and ZK-friendly functions like Poseidon2. ## Specification Merkle trees can be implemented in a variety of ways and, if done naively, can be also attacked in a variety of ways (e.g [BitCoinOps25](#references)). For this reason, this document focuses a _concrete implementation_ of a Merkle tree which helps avoid common pitfalls. We start by laying out some definitions in the [Definitions](#Definitions) section, followed by our [keyed hash construction](#Keyed-Hash-Construction), which guards the Merkle against certain padding and layer abuse attacks. We then discuss the role of injective encodings in the [Encoding and (de)Serialization](#Encoding-and-deSerialization) section, and their role in secure Merkle tree constructions when dealing with hashing functions that operate on finite field elements. Next, we provide some brief guidance on serialization in the [Serialization/Deserialization](#Serializing--Deserializing) section. Finally, we present an abstract interface for a Merkle tree module in the [Interface](#interface) section. ### Definitions A Merkle tree, built on a hash function `H`, produces a Merkle root of type `T` (the "Target type"). This is usually the same type as the output of the hash function (we will assume this below). Some examples: - SHA1: `T` is 160 bits - SHA256: `T` is 256 bits - Keccak (SHA3): `T` can be one of 224, 256, 384 or 512 bits - Poseidon: `T` is one or more finite field element(s) (depending on the field size) - Monolith: `T` is 4 Goldilocks field elements The hash function `H` can also have different types `S` ("Source type") of inputs. For example: - SHA1 / SHA256 / SHA3: `S` is an arbitrary sequence of bits[^1] - binary compression function: `S` is a pair of `T`-s - Poseidon: `S` is a sequence of finite field elements - Poseidon compression function: `S` is at most `t-k` field elements, where `k` field elements should be approximately 256 bits (in our case `t=3`, `k=1` for BN254 field; or `t=12`, `k=4` for the Goldilocks field; or `t=24`, `k=8` for a ~32 bit field) - as an alternative, the "Jive strategy" for binary compression (see [[Bouvier22](#references)]) can eliminate the "minus `k`" requirement (you can compress `t` into `t/2`) - A naive Merkle tree implementation could for example accept only a power-of-two sized sequence of `T` Notation: Let's denote a sequence of `T`-s by `[T]`; and an array of `T`-s of length `l` by `T[l]`. ### Keyed Hash Construction One of the main goals of a secure Merkle tree construction is to avoid collision attacks; i.e., situations in which an attacker is able to obtain a tree with the same Merkle root as a legitimate tree, but using data that is different from the data used to construct the tree originally. In that sense, we are interested in two different types of attacks here. #### 1. Layer abuse attacks. A Merkle tree requires, at its minimum, two elements to be constructed: 1. A hashing function `H : S -> T`; 2. a _binary compression function_ `C: T x T -> T`. In some hashing functions like SHA256, one could express `C(a, b)` as `H(F(a, b))`, where `F(a, b) = a || b` (Fig. 1(a)) and `||` denotes byte string concatenation. This creates a type of symmetry one can then exploit - since a Merkle tree root does not encode its depth, one could pretend that `F(a, b)` is actually data, and construct a shorter Merkle tree which evaluates to the same root as before (Fig. 1(b)); i.e., $T^{\prime}_3$ in the second tree is the same as $T_7$ in the original tree. ![Backup_of_merkletree](https://hackmd.io/_uploads/HJD3VleIZx.png) **Figure 1.** **(a)** original Merkle tree; **(b)** a different Merkle tree with the same Merkle root as the original; i.e., $T^{\prime}_3 = T_7$, constructed from a layer abuse attack. #### 2. Padding attacks. Merkle trees can be unbalanced as the number of leaves will often not be a power of two. A common technique is then to pad the tree in some way by adding "synthetic" missing children to internal nodes that do not have both of them (Fig. 2(a)). If done carelessly, this can potentially allow an attacker to replace the padding with real data that is equivalent to such padding (Fig. 2(b)), and again generate the same Merkle root from data that is different from what had been intended. ![](https://hackmd.io/_uploads/r1GxooJLZg.png) **Figure 2.** **(a)** Original padded Merkle tree, and **(b)** a Merkle tree with the same root as the original built from a padding attack. If this sounds hypothetical, early versions of Bitcoin padded their Merkle trees by using the existing child of a node as padding, and this led to security issues that had to be later addressed (e.g. see [[CVE-2012-2459](#references)]). #### Construction To prevent the problems above, we adopt a _keyed compression function_ `C: K x T x T -> T` which takes a _key_ in addition to the two target elements it needs to compress. Our key is composed by two bits, which encode: * whether a parent node is an _even_ or an _odd_ node; i.e., whether it has 2 or 1 children, respectively; * whether the node belongs to the bottom layer; i.e. the first layer of internal nodes above the leaves, or not. This information is converted to a number `0 <= key < 4` by the following algorithm: ```haskell data LayerFlag = BottomLayer -- ^ it's the bottom (initial, widest) layer | OtherLayer -- ^ it's not the bottom layer data NodeParity = EvenNode -- ^ it has 2 children | OddNode -- ^ it has 1 child -- | Key based on the node type: -- -- > bit0 := 1 if bottom layer, 0 otherwise -- > bit1 := 1 if odd, 0 if even -- nodeKey :: LayerFlag -> NodeParity -> Int nodeKey OtherLayer EvenNode = 0x00 nodeKey BottomLayer EvenNode = 0x01 nodeKey OtherLayer OddNode = 0x02 nodeKey BottomLayer OddNode = 0x03 ``` The number is then used to key the compression function, as follows: * for finite field sponge based hash functions, like Poseidon2 or Monolith, we apply the permutation function to `(x,y,key)` where the key (as field element) is placed at index 8 (0-based) of the sponge state (assuming state size of 12). Then we take the first component of the result (`T`); * for hash functions like SHA256 we simply prepend the key; i.e., we take `SHA256(key|x|y)` where `key` is encoded as bytes. This is in practice equivalent to having 4 different compression functions `C_key(x, y) = C(key, x, y)`. Note that the `BottomLayer` flag is set for all compression nodes whose children are leaves, but leaves themselves; i.e. the `H(S_i)`, are not keyed. Remark: Since standard SHA256 includes padding, adding a key at the beginning doesn't result in extra computation (it's always two internal hash calls). However, a faster (twice as fast) alternative would be to choose 4 different random-looking initialization vectors, and not do padding. This would be a non-standard SHA256 invocation. Finally, we proceed from the initial sequence in layers: take the previous sequence, apply the keyed compression function for each consecutive pairs `(x,y) : (T,T)` with the correct key based on whether this was the initial (bottom) layer, and whether it's a singleton "pair" `x : T`, in which case it's also padded with a zero to `(x,0)`. Note that if the input is a singleton list `[x]`, we still apply one layer, so in that case the root will be `compress[key=3](x,0)`. ### Encoding Our Merkle trees are created from arbitrary data, and arbitrary data means raw byte strings. Such byte strings are typically first split into equally-sized blocks[^2] of length $b$, which we refer to as _data blocks_[^3]. Each data block then undergoes hashing under `H`, and those become the tree's leaves (Fig. 3(a)). When the source type `S` for the hashing function is a byte string; e.g. as with SHA256, those strings can be taken to be the $S_i$ in Fig. 1 and hashed directly. For hashing functions which do not take byte strings as input - like Poseidon2, which operates on sequences of finite field elements - we must _encode_ our data blocks into the proper type first (Figure 3(b)). ![encoding](https://hackmd.io/_uploads/SJedGS-Ibe.png) **Figure 3.** **(a)** A byte string (raw data), split into data blocks which are also byte strings. **(b)** Data blocks after undergoing encoding. We define an _encoding function_ `E` to be a function which can do that for us. Formally speaking, `E` can encode an arbitrary-valued byte string of length $b$ into a value of the required type `S`; i.e., `E : byte[b] -> S`. Such an encoding function MUST always be _injective_; i.e., distinct byte strings MUST map into distinct values of the type `S`. To see why, suppose `E` were not injective. This means there are two byte strings `D_1` and `D_2` such that `E(D_1) = E(D_2) = S_1 = S_2`. We could then take the raw data used to build a Merkle tree $A$, replace all occurrences of `D_1` with `D_2`, and build an identical Merkle tree from distinct data - which is what we want to avoid. <!-- _Serialization_, on the other hand, is a reversible operation in which an _existing_ Merkle tree gets transformed into a byte string and back, and should be bijective (and thus injective) by construction. Note that the encoding and the serialization of a tree CAN coincide when `S` is a byte sequence like in SHA256, but not when `S` consists of prime field elements. Remark: We use the same encoding of sequence of bytes to sequence of field elements for the sponge hash construction, when applicable. --> The next section deals with injective encodings for the major types we use - including SHA256, BN254, Goldilocks. We then discuss padding in Sec. [Padding](#padding). #### Injective Encodings Encoding for SHA256 is trivial in that it already operates on byte strings, so there is no extra work to be done. For most of the other hashing functions we care about, however, `S` is a sequence of finite field elements, and we therefore need to perform encoding. In such situations, it is convenient to construct our injective encoding function `E` by chunking the input sequence of size $b$ into smaller pieces of size `M`, and then applying a _element encoding function_ `E_s` function to each chunk. This function can output either a single or multiple field elements per `M`-sized chunk (Listing 1). ```python= def E( raw: List[byte], E_s: List[byte] -> List[FieldElement], M: uint, ) -> List[FieldElement]: # requires len(raw) to be multiple of M chunks = len(raw) / M encoded = [] for i in range(0, chunks): encoded.extend(E_s(raw[i*M:(i+1)*M-1])) return encoded ``` **Listing 1.** Constructing `E` from `E_s`. <!-- For any `S = Source[H]`, fix a size `M` and an injective encoding function `E : byte[M] -> S`. For SHA256 etc, this will be standard encoding (big-endian; `M=32`). --> For the BN254 field, we can construct `E_s` by taking `M=31`, and having the 31 bytes interpreted as a _little-endian_ integer modulo `p`. This outputs one single field element per $31$ bytes. Poseidon2 or Monolith (over the Goldilocks field) take multiples of four Goldilocks field elements at a time. We have some choices: We can have `M=4*7=28`, as a single field element can encode 7 bytes but not 8. Or, if we want to be more efficient, we can still achieve `M=31` by storing 62 bits in each field element. For this to work, some convention needs to be chosen; our implementation is the following: ```haskell #define MASK 0x3fffffffffffffffULL // NOTE: we assume a little-endian architecture here void goldilocks_convert_31_bytes_to_4_field_elements(const uint8_t *ptr, uint64_t *felts) { const uint64_t *q0 = (const uint64_t*)(ptr ); const uint64_t *q7 = (const uint64_t*)(ptr+ 7); const uint64_t *q15 = (const uint64_t*)(ptr+15); const uint64_t *q23 = (const uint64_t*)(ptr+23); felts[0] = (q0 [0]) & MASK; felts[1] = ((q7 [0]) >> 6) | ((uint64_t)(ptr[15] & 0x0f) << 58); felts[2] = ((q15[0]) >> 4) | ((uint64_t)(ptr[23] & 0x03) << 60); felts[3] = ((q23[0]) >> 2); } ``` This simply chunks the 31 bytes = 248 bits into 62 bits chunks, and interprets them as little endian 62 bit integers. #### Padding The element encoding function `E_s` requires byte strings of length `M`, but the input's length; i.e., the size $b$ our data blocks, might not be a multiple of `M`. We might therefore need to _pad_ the blocks, and we need to do it in a way that avoids the usual trap: different input data resulting in the same padded representation. We adopt a `10*` padding strategy to pad raw data blocks to a multiple of `M` bytes; i.e., we _always_ add a `0x01` byte, and then as many `0x00` bytes as required for the length to be divisible by `M`. Assuming again a data block of length $b$ bytes, then the padded sequence will have `M*(floor(b/M)+1)` bytes. Note: The `10*` padding strategy is an invertible operation, which will ensure that there is no collision between sequences of different lengths; i.e., to unpad scan from the end, drop `0x00` bytes until the first `0x01`, drop that too; reject if no `0x01` found Once the data is padded, it can be chunked to pieces of `M` bytes (so there will be `floor(b/M)+1` chunks) and encoded by calling the element encoding function for each chunk. Remark: We don't pad the sequence of leaf hashes (`T`-s) when constructing the Merkle tree, as the tree construction ensures that different lengths will result in different root hashes. However, when **using the sponge construction,** we need to further pad the sequence of encoded field elements to be a multiple of the sponge rate; there again we apply the `10*` strategy, but there the `1` and `0` are finite field elements. #### Serializing / Deserializing When using SHA256 or similar, this is trivial (use the standard, big-endian encoding). When `T` consists of prime field elements, simply take the smallest number of bytes the field fits in (usually 256, 64 or 32 bits, that is 32, 8 or 4 bytes), and encode as a little-endian integer (mod the prime). This is trivial to invert. To serialize the actual tree, just add enough metadata that the size of each layer is known, then you can simply concatenate the layers, and serialize like as above. This metadata can be as small as the size of the initial layer, that is, a single integer. ### Reference implementations: - in Haskell: https://github.com/logos-storage/logos-storage-proofs-circuits/blob/master/reference/haskell/src/Poseidon2/Merkle.hs - in Nim: https://github.com/codex-storage/codex-storage-proofs-circuits/blob/master/reference/nim/proof_input/src/merkle.nim ## Interfaces The interface for a Merkle tree typically requires two types of APIs: - one which takes a sequence `S` of length `n` as input, and produces a Merkle tree of type `T`, and; - one which takes a sequence of bytes (or even bits, but in practice we probably only need bytes), and produces a Merkle tree of type `T`. We also typically want to be able to obtain proofs for existing data blocks, and check blocks along with their proofs against a root. The interface is described in Listing 2. We avoid going into actual algorithms here as those are already specified in the reference implementations. ```python class Encoder[S]: """ Encodes sequences of bytes to S. The encoding produced by `Encoder` MUST be injective. """ def encodeBytes(data: List[byte]) -> S: ... class MerkleProof[T]: """ A `MerkleProof` contains enough information to check a leaf against a tree root. The verifier MUST reconstruct node parity and layer keys deterministically from index and the `leaf_count`; these values MUST NOT be supplied externally. """ index: uint leaf_count: uint path: List[T] def create(data: List[S]) -> MerkleTree[T]: """Creates a `MerkleTree` from a sequence of data in the .""" ... def create( data: List[byte], block_size: uint, encoder: Encoder[S], ) -> MerkleTree[T]: """Creates a `MerkleTree` from raw data by splitting into blocks of `block_size` size and encoding them. """ return create( encoder.encodeBytes(data_block) for split_chunks(data_block, block_size) ) def get_proof( self: MerkleTree[T], i: uint, ) -> MerkleProof[T]: """Obtains a `MerkleProof` for the i-th leaf in the tree.""" ... def verify_proof( root: T, proof: MerkleProof[T], ) -> bool: """Verifies an existing `MerkleProof` against a tree's root. :return: `True` if the proof verifies, or `False` otherwise. """ ... ``` **Listing 2.** Abstract interface for a Merkle tree. ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). ## References [[Bouvier22](https://eprint.iacr.org/2022/840)]: Bouvier et al. "New Design Techniques for Efficient Arithmetization-Oriented Hash Functions: Anemoi Permutations and Jive Compression Mode", Cryptology ePrint Archive, 2022. [[BitCoinOps25](https://bitcoinops.org/en/topics/merkle-tree-vulnerabilities/)]: Merkle tree vulnerabilities. https://bitcoinops.org/en/topics/merkle-tree-vulnerabilities/ [[CVE-2012-2459](https://bitcointalk.org/?topic=102395)]: Block Merkle Calculation Exploit. https://bitcointalk.org/?topic=102395 [^1]: Some less-conforming implementation of these could take a sequence of bytes instead. [^2]: Data blocks might also need to be padded to a multiple of $b$, but raw data padding is beyond the scope of this document. [^3]: The standard block size for Logos storage is $65\,536$ bytes for SHA256 trees, and $2\,048$ bytes for Poseidon2 trees.

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully