or
or
By clicking below, you agree to our terms of service.
New to HackMD? Sign up
Syntax | Example | Reference | |
---|---|---|---|
# Header | Header | 基本排版 | |
- Unordered List |
|
||
1. Ordered List |
|
||
- [ ] Todo List |
|
||
> Blockquote | Blockquote |
||
**Bold font** | Bold font | ||
*Italics font* | Italics font | ||
~~Strikethrough~~ | |||
19^th^ | 19th | ||
H~2~O | H2O | ||
++Inserted text++ | Inserted text | ||
==Marked text== | Marked text | ||
[link text](https:// "title") | Link | ||
 | Image | ||
`Code` | Code |
在筆記中貼入程式碼 | |
```javascript var i = 0; ``` |
|
||
:smile: | ![]() |
Emoji list | |
{%youtube youtube_id %} | Externals | ||
$L^aT_eX$ | LaTeX | ||
:::info This is a alert area. ::: |
This is a alert area. |
On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?
Please give us some advice and help us improve HackMD.
Syncing
xxxxxxxxxx
\[ \newcommand{\postwo}[0]{\text{Poseidon2}} \]
Proofs in Codex
We start by taking apart the code that builds general Merkle trees, and then looking into the bits that are specialized for proof trees - namely the
SlotBuilder
and the code that builds fine-grained, per-block "cell" trees.Building Merkle Trees
Our Merkle trees use leaf domain separation to avoid "second preimage" (or, more aptly named, "node-as-leafs") attacks. The code snippet which computes the nodes at height \(k + 1\) from the nodes at height \(k\) is shown below. Trees are built from the bottom (height \(0\)) up (height \(=\) depth).
This starts with establishing the number of elements \(m\)[1] in the array at height \(k\), in lines \(11\)-\(13\). If we only have one element at level \(k\) and this is not the "bottom layer" (i.e., the set of nodes at height \(0\)), then we simply return
xs
as is. For a tree of depth \(d\), I claim that this will only happen when \(k = d\).To see why, let \(n_k\) be the number of nodes at height \(k\) in the tree; i.e. \(n_0\) is the number of leaf nodes. Now note that, at each call,
merkleTreeWorker
reduces the number of nodes to \(n_{k+1} = \lceil n_k / 2 \rceil\). It does that in a sort of a funny way from lines \(14\) to \(22\) by taking the division floor and summing one if that is not an exact division (i.e., it takes the ceiling), but that is what it does.This means that, in a bounded number of steps which is \(O\left(\log n_0\right)\), we will get to a point where \(n_j = 1\) for some \(j \geq 0\). Once that happens, we have two possibilities:
isBottomLayer
will be false. This will cause the node to be compressed with zero (line \(29\)), and lead to a second call tomerkleTreeWorker
containing a single node. That call will put us in case \((2)\) below.In both cases, the condition in line \(10\) is only triggered once we get to the tree's root, which by construction implies \(k = d\). \(_\blacksquare\)
Note that
merkleTreeWorker
will build a sequence of sequences, where each sequence corresponds to the nodes at a given height. The sequence at position \(0\), therefore, will contain the leaves, whereas the last sequence will contain a single node which is the root node.Building Proof Trees
As we have seen above, the Merkle tree construction relies on:
With proof trees, \(E\) is the set of valid \(\postwo\) hashes, and leaf nodes are the \(\postwo\) hashes taken over block cells, which are subdivisions of blocks, over an entire slot. We currently use \(2\text{kB}\) cells in \(64\text{kB}\) blocks, meaning \(32\) cells per block. For slots with \(j\) blocks, then, we will have \(32 \times j\) cells, or \(32 \times j\) leaf nodes in the proof tree.
Building a Full Proof Tree (Uploader)
The uploader; i.e., the node requesting storage, will need to build a full proof tree for its dataset.
The way this works is that uploaders build a subtree per slot, and then compose those into a larger tree which will ultimately yield the verification root – the \(\postwo\) hash of this tree-of-subtrees. This is all orchestrated within the
buildSlots
proc, listed below:At a high level, we can see the slot subtrees being built in lines \(13\) – \(17\), followed by the "tree-of-subtrees" in line \(19\) which uses the slot roots as leaf nodes.
Finally, we have some funny-looking piece of code in lines \(23\) – \(25\). The
if
statement in line \(23\) will be triggered if only if we haveverifyTree
set and thatverifyTree
has a computed root[2]. To understand where such a tree might have been set, we need to look intoSlotBuilder.new
:In essence, if
SlotBuilder
is instantiated with a verifiable manifest, it will build the top-level verification tree (the one that has the slot roots as leafs) and set it toverifiableTree
, as well as the slot roots intoslotRoots
. This will make it so thatbuildSlots
:buildSlot
for each slot;This is used by storage nodes when they attempt to fill a slot, as we will see later.
Slot and Block Subtrees
Slot subtrees, as we have seen before, are built by calls to
buildSlot
, listed next:This is somewhat straightforward; it:
Peeling another layer, we see that
buildSlotTree
(below) builds a tree from cell hashes. Those are not, however, hashes of individual cells as the code may lead you to believe.Instead, if we look at
getCellHashes
:We get a rather different story: it actually builds the subtrees for each individual block separately (lines \(217\) – \(230\)), leaning on the index strategy to figure out which blocks to fetch. Needless to say, misalignments in indexing strategies among clients, or among uploader and clients, will cause proofs to fail as roots won't match.
Merkle Proofs
Next, we look into the code which actually generates Merkle proofs. This is provided in the
getProof
proc, listed next.The main piece of
getProof
relies in the index of the leaf \(k\) for which we want to obtain a proof, the total number \(m\) of leaves, and the depth of the tree, \(d\).The loop in lines \(14\)–\(18\) then iterates from the top (root) to the bottom of the tree. To understand what that code does, we have to recall how a Merkle proof works. An example of a Merkle tree with a highlighted proof is provided in Figure 1.
Figure 1. A Merkle proof.
Suppose we wanted to get the Merkle proof for node \(010\). The idea here is that we traverse the path upwards from \(010\) and, for each node we encounter, we obtain its sibling.
The key thing to note here is that the index of the leaf already encodes the path to the leaf in the binary tree. If we want to know the immediate sibling for node \(010\), therefore, all we need to do is flip the least significant bit; i.e., the last part of the path, and we will get the index of the sibling (\(011\)), and that is exactly what line \(15\) does.
Since our trees are not balanced, it may well be that the sibling does not exist. Recall that when building the tree we were adding a "virtual" zero-valued sibling to every node without siblings, and so that is what we do again here: if the sibling is missing (i.e., the index does not exist), we add zero to the path (line \(16\)).
Finally, we update \(k\) so we move onto the next layer by dropping the last bit of the path (with a right shift of \(1\) byte in line \(17\)), and compute the number of nodes in the layer above with the following update rule. If we had \(m_k\) nodes at layer \(k\), then \(m_{k + 1} = \left\lfloor \frac{m_k + 1}{2} \right\rfloor\).
To understand why this works, note that \(m_k\) is either an even number, at which point we will get \(m_{k + 1} = m_k / 2\) from the formula above, or it is odd, at which point we will be getting \(m_{k + 1} = \frac{m_k + 1}{2}\) which accounts for the "virtual" zero-valued node that exists at layer \(k\).
Verification of Inclusion Proofs
Haven't reviewed this piece of code yet.
Single-Block Slots
The tree construction process detailed so far implies that, under certain circumstances, you may see some strange-looking trees being built. One case this is true is when parameters induce single-block slots. Assuming a block had \(8\) cells for simplicity (it currently has \(32\)), the slot tree would look something like what is in Figure 2.
- The image was uploaded to a note which you don't have access to
- The note which the image was originally uploaded to has been deleted
Learn More →Figure 2. Proof tree for a single-block slot.
As we can see, the slot-level tree in these cases will always be highly unbalanced. In particular, the first element of the proof path will be always zero-valued as the sibling node of the root of the slot tree will always be a padding node (i.e., zero-valued).
Data Sampling
The sampling algorithm extracts \(k_s\) Merkle proofs that run from an individual cell all the way to the dataset root. Merkle proofs are built in \(3\) stages (block level, slot level, dataset level) and must all match up, i.e.:
Another key aspect is the actual cell selection process, which relies on an entropy parameter received as an external output.
The entropy is a \(32\)-byte, padded array, which is constructed by first taking a \(32\)-byte random input, which should be a block hash. For slot index \(i \in \mathbb{N}\), we pick the hash of the \(i^{th}\) block before the current. In solidity:
We then flip the last byte of this array to zeroes using "double truncation": the array is truncated to
31
bytes, then back to32
. The reason for this is because the curve we use is defined over a \(254\)-bit prime field, meaning that the last portion of the last byte needs to be removed or the field element will not be in canonical form. This is done both inside of contracts, as part of verifying proofs:and at the client, as part of computing and submitting proofs:
at the client, this \(31\)-byte array gets converted into a \(254\)-bit field element which is interpreted as a Poseidon2 hash in line \(9\). The sampler then uses:
And generates a corresponding cell index as follows:
The interesting bit is in lines \(8\), which uses the entropy, the slot root, and the sample index \(j\) to compute yet another \(\postwo\) hash (using the sponge construction), and then extract the lowest \(\log_2(j)\) bits (which ensures a number \(\leq\)
numCells
) to use as random cell index for sample index \(j\).This is an awkward piece of code as it assumes can be different from zero. If could not be different from zero, then we could simply have
let m = xs.len
↩︎I do not know under which circumstances we can have a tree without a root set. I suspect the answer may be "never", and that the second check performed there is unnecessary. ↩︎