Aard's Merkle trees in POD2 doc

WARNING: This document is no longer up-to-date. See the doc in the Github repo instead.

POD2 "outside circuit" has three data types: integers, strings, and Merkle roots. In-circuit, any integer, string, or Merkle root is represented by a 4-tuple of elements of a 64-bit field.

A Merkle root represents a Merkle tree. Here we specify the structure of Merkle trees.

A Merkle tree stores an unordered collection of (key, value) pairs. Keys in a Merkle tree must be distinct: the same key cannot be entered twice. (In this sense a Merkle tree exposes the same functionality as a Python dictionary.)

Each key or value can again be an integer, string, or a Merkle root. In-circuit, of course, the key and value are each a 4-tuple of field elements.

The branching rule

A Merkle tree is implemented as a binary tree.

The insertion path of any key is given by a deterministic rule: given key and a nonnegative integer depth, the rule determines that key belongs to either the left or right branch at depth depth.

The precise rule is an implementation detail; here is one possibility. In-circuit, key will be represented as a 4-tuple of field elements

key = (k_0, k_1, k_2, k_3).

Write the field elements in binary (in little-endian order):

k_0 = b_0 b_1 ... b_63
k_1 = b_64 b_65 ... b_127
....

At the root, key goes to the left subtree if b_0 = 0, otherwise the right subtree. At depth 1, key goes to the left subtree if b_1 = 0, otherwise the right subtree, and similarly for higher depth.

The tree structure

A Merkle tree with no entry at all is represented by the hash value
root = hash(0).
(For security, the hash function hash must output a 4-tuple of field elements.)

A Merkle tree with a single entry (key, value) is called a "leaf". It is represented by the hash value
root = hash((key, value, 1)).

A Merkle tree tree with more than one entry is required to have two subtrees, left and right. It is then represented by the hash value
root = hash((left_root, right_root, 2)).

(The role of the constants 1 and 2 is to prevent collisions between leaves and non-leaf Merkle roots. If the constants were omitted, a large Merkle tree could be dishonestly interpreted as a leaf, leading to security vulnerabilities.)

Example

Suppose we want to build a Merkle tree with three key-value pairs:

(0, "even")
(2, "even")
(4, "even").

The keys are integers, so the are represented in-circuit by themselves. In little-endian order, the first three bits of the keys are:

(0, "even"): 000
(2, "even"): 010
(4, "even"): 001.

The resulting tree looks like:

                root
              /      \
           L_root   R_root = hash(0)
          /      \
      LL_root   LR_root = hash((2, "even", 1))
      /    \
          LLR_root = hash((4, "even", 1))
LLL_root = hash((0, "even", 1)).

The intermediate roots are computed as hashes of their subroots:

LL_root = hash((LLL_root, LLR_root, 2)
L_root = hash((LL_root, LR_root, 2)
root = hash((L_root, R_root, 2).