The SimpleSerialize (SSZ) specification defines how different types should be serialized and represented as a merkle tree. There has recently been discussions around the complexity of defining a merkleization method for dynamically sized objects, and the Ethereum 2.0 researchers have decided to avoid the complexity by forcing all dynamic types to i) specify a maximum capacity and ii) always be represented in their merkle tree with that maximum capacity. Our goal is to illustrate the issues with dynamically sized types and how capping the capacity tackles the problem.
Before exploring proposed constructions, let's first review the attributes of an ideal proof format:
In order to calculate the merkle root of this data, we need to define a procedure for representing it in a merkle tree. We'll follow the procedure defined in the SSZ spec.
Suppose we have the following struct S
.
struct S {
x: u256,
// notice how there is no maximum capacity
y: List<u256>,
}
Using only the defined utility functions, we can generate the same merkle root as hash_tree_root
.
let s = S { x: 1_u256, y: list![2_u256, 3_u256] };
let x = merkleize(pack(s.x));
let y_data_root = merkleize(pack(s.y));
let y_root = mix_in_length(y_data_root, s.y.len());
let s_root = merkleize([x, y_root]);
assert_eq!(s_root, hash_tree_root(s));
The code's calculations are visually represented in the diagram below. Note that y_root
mixes in the true merkle root of y
and y
's current length.
s_root
/ \
x y_root
/ \
y_data_root 2
/\
2 3
Let's define a path
to access s.y[1]
and the generalized indices for each node in the path.
path = [y, 1]
generalized_indices = [1, 3, 6, 13]
1
/ \
2 3
/ \
6 7
/ \
12 13
To illustrate the issue with modifying dynamically sized types, let's append 4_u256
to y
.
s_root
/ \
x y_root
/ \
y_data_root 3
/ \
/\ /\
2 3 4 0
And again we'll determine the generalized indices for the path [y, 1]
.
path = [y, 1]
generalized_indices = [1, 3, 6, 12, 25]
1
/ \
2 3
/ \
6 7
/ \
12 13
/ \ / \
24 25 26 27
Although the path remained the same, the list of generalized indices changed. This will occur anytime the length of a dynamic object crosses a power of two.
Rather than allowing dynamically sized arrays to have an unbounded number of elements, we can give them a max length. Using this max length, and changing how dynamic types are Merkelized, we can avoid much of the implementation complexity mentioned above that results from having to change the location of elements in the tree. Thus, any list must specify the max number of elements in can include. This is currently the status quo.
struct S {
x: u256,
// 4 is the max capacity of y
y: List<u256, 4>,
}
In the above structure S
, the array y
has a max length of 4
. Let us now consider how this same structure would be merkelized.
let s = S { x: 1_u256, y: list![2_u256, 3_u256] };
let x = merkleize(pack(s.x));
let y_data_root = merkleize(
pack(s.y),
pad_for=(4 * (size_of::u256() / BYTES_PER_CHUNK))
);
let y_root = mix_in_length(y_data_root, s.y.len());
let s_root = merkleize([x, y_root]);
assert_eq!(s_root, hash_tree_root(s));
In the above merkle tree calculation, any dynamically sized array is padded to a balanced Merkle tree that has as many leaves as possible elements. The elements on the right side of the tree are the elements actually in the array. Thus, in the above, example, we get the tree.
s_root
/ \
x y_root
/ \
y_data_root 2
/ \
/\ /\
2 3 0 0
Let's define a path
to access s.y[1]
and the generalized indices for each node in the path.
path = [y, 1]
generalized_indices = [1, 3, 6, 12, 25]
1
/ \
2 3
/ \
6 7
/ \
12 13
/ \ / \
24 25 26 27
Appending 4
to this array simply changes the tree structure to:
s_root
/ \
x y_root
/ \
y_data_root 3
/ \
/\ /\
2 3 4 0
This operation does not change the generalized indicies of any element in the array. We note that it is an error to append an element to a dynamic array that is full.
2**64
), and so in practice this is not limiting.There are a few other proposed solutions to the issue of simplify the merkleization of dynamic types.
Suppose we have the following struct S
.
struct S {
x: u256,
// notice how there is no maximum capacity
y: List<u256>,
}
Using only the utility functions, we can generate the same merkle root as hash_tree_root
. We'll assume merkleize
organizes the chunks such that the right -> left node points to the first two nodes and the right -> right -> left points to the next four nodes and so forth, in powers of two.
let s = S { x: 1_u256, y: list![2_u256, 3_u256] };
let x = merkleize(pack(s.x));
let y_data_root = merkleize(pack(s.y));
let y_root = mix_in_length(y_data_root, s.y.len());
let s_root = merkleize([x, y_root]);
assert_eq!(s_root, hash_tree_root(s));
The code's calculations are visually represented in the diagram below.
s_root
/ \
x y_root
/ \
2 y_data_root
/
/\
2 3
Let's define a path
to access s.y[1]
and the generalized indices for each node in the path.
path = [y, 1]
generalized_indices = [1, 3, 7, 14, 28]
1
/ \
2 3
/ \
6 7
/ \
14 \
/\ 15
28 29
To illustrate the issue with modifying dynamically sized types, let's append 4_u256
to y
.
s_root
/ \
x y_root
/ \
3 / \
/\ \
2 3 \
/\
/ 0
/ \
/\ /\
4 0 0 0
And again we'll determine the generalized indices for the path [y, 1]
.
path = [y, 1]
generalized_indices = [1, 3, 7, 14, 28]
1
/ \
2 3
/ \
6 7
/ \
14 \
/\ \
28 29 \
15
/ \
30 31
/ \
60 61
/ \ / \
120 121 122 123
Despite appending a new element to the list, the generalized indices for the path
remained stable.
Once a construction with generalized indicies has been decided upon, proofs for individual leaves can be defined as an object.
MerklePartial
struct MerklePartial {
indices: List<uint64, 2**32>
chunks: List<Bytes32, 2**32>
}
We'll assume a tree structure with bounded dynamic types and use the example from earlier.
s_root
/ \
x y_root
/ \
y_data_root 3
/ \
/\ /\
2 3 4 0
In order to prove that y[1]
is a part of s_root
we'd need to provide the elements at indices [2, 7, 13, 24, 25]
.
1
/ \
[2] 3
/ \
6 [7]
/ \
12 [13]
/ \ / \
[24][25] 26 27
The other necessary nodes can be calculated from the initial set. If we need to prove y[0..2]
are in the tree, the proof looks exactly the same.
generalized_indices = [2, 7, 13, 24, 25]
1
/ \
[2] 3
/ \
6 [7]
/ \
12 [13]
/ \ / \
[24][25] 26 27
As the size of the tree expands, transmitting multiple merkle proofs at once will continue to be far more efficient than in isolation. The coresponding MerklePartial would be as follows:
MerklePartial {
indices: vec![2, 7, 13, 24, 25],
chunks: vec![x, y.len(), h(y[2], y[3]), y[0], y[1]]
}