# Concise Encoding of Signed Merkle Tree Proofs
## Status
DRAFT
## Goals
- Define a format for a Merkle tree root signature with metadata. via COSE signature
- Define a format for an inclusion path. via COSE
- Define a format for the disclosure of a single leaf payload, aka signed Merkle tree proof.
- All formats should be as compact as possible.
## Terminology
### Leaf Bytes
TBD
### Merkle Tree
A Merkle tree is a tree where every leaf is labelled with the cryptographic hash of a sequence of bytes and every node that is not a leaf is labeled with the cryptographic hash of the labels of its child nodes.
### Merkle Tree Root
A Merkle tree root is the root node of a tree which represents the cryptographic hash that commits to all leaves in the tree.
### Merkle Tree Algorithm
A Merkle tree algorithm specifies how nodes in the tree must be hashed to compute the root node.
### Payload and Extra Data
A payload is data bound to in a Merkle tree leaf. The Merkle tree algorithm determines how a payload together with extra data is bound to a leaf. The simplest case is that the payload is the leaf itself without extra data.
### Inclusion Path
An inclusion path confirms that a value is a leaf of a Merkle tree known only by its root hash (and tree size, possibly).
### Signed Merkle Tree Proof
A signed Merkle tree proof is the combination of signed Merkle tree root hash, inclusion path, extra data, and payload.
## Data formats
BIG TODO: where will all the types end up in practice: principles, examlpes, etc & exhaustive list of options?
### Signed Merkle Tree Root
A Merkle tree root is signed with COSE_Sign1, creating a Signed Merkle Tree Root:
```c
SMTR = THIS.COSE.profile .and COSE_Sign1_Tagged
```
Protected header parameters:
- alg (label: 1): REQUIRED. Signature algorithm. Value type: int / tstr.
- tree alg (label: TBD): REQUIRED. Merkle tree algorithm. Value type: int / tstr.
- tree size (label: TBD): OPTIONAL. Merkle tree size as the number of leaves. Value type: uint.
A COSE profile of this specification may add further header parameters, for example to identify the signer.
Payload: Merkle tree root hash bytes according to tree alg (i.e., header params tell you what the alg id is here)
Note: The payload is just a byte string representing the Merkle tree root hash (and not some wrapper structure) so that it can be detached (see defintion of payload in https://www.rfc-editor.org/rfc/rfc9052#section-4.1) and easily re-computed from an inclusion path and leaf bytes. This allows to design other structures that force re-computation and prevent faulty implementations (forgetting to match a computed root with one embedded in a signature).
### Inclusion Path
If the tree size and leaf index is known, then a compact inclusion path (see, REF NEEDED, CT RFC?) variant can be used:
```c
IndexAwareInclusionPath = #6.1234([
leaf_index: int
hashes: [+ bstr]
])
```
Otherwise, the direction for each path step must be included (see, REF NEEDED, ask cabo?):
(bit vector: Bitmask'ish 0 -> right, 1 -> left, so no bit lables, they are "self-explaingin" in a way?)
```c
IndexUnawareInclusionPath = #6.1235([
hashes: [+ bstr]
left: uint ; bit vector
])
```
For some tree algorithms, like Quantum Ledger Data Base (QLDB) [FIXME, commercial refernce?], the direction is derived from the hashes themselves and both the index and direction can be left out in the path:
```c
; TODO: find a better name for this
UndirectionalInclusionPath = #6.1236([+ bstr])
```
```c
InclusionPath = IndexAwareInclusionPath / IndexUnawareInclusionPath / UndirectionalInclusionPath
```
Note: Including the tree size and leaf index may not be appropriate in certain privacy-focused applications as an attacker may be able to derive private information from them.
TODO: Should leaf index be part of inclusion path (IndexAwareInclusionPath) or outside?
TODO: Define root computation algorithm for each inclusion path type
TODO: [Do we need both inclusion path types? what properties does each type have?](https://github.com/ietf-scitt/cose-merkle-tree-proofs/issues/6)
TODO: Should the inclusion path be opaque (bstr) and fixed by the tree algorithm? It seems this is orthogonal and the choice of inclusion path type should be application-specific.
### Signed Merkle Tree Proof
A signed Merkle tree proof is a CBOR array containing a signed tree root, an inclusion path, extra data for the tree algorithm, and the payload.
```c
SignedMerkleTreeProof = [
signed_tree_root: bstr .cbor SMTR ; payload of COSE_Sign1_Tagged is detached
inclusion_path: bstr .cbor InclusionPath
extra_data: bstr / nil
payload: bstr
]
```
`extra_data` is an additional input to the tree algorithm and is used together with the payload to compute the leaf hash. A use case for this field is to implement blinding.
TODO: maybe rename `extra_data`
### Signed Merkle Tree Multiproof
TODO: define a multi-leaf variant of a signed Merkle tree proof like in:
- https://github.com/transmute-industries/merkle-proof
- https://transmute-industries.github.io/merkle-disclosure-proof-2021/
TODO: consider using sparse multiproofs, see https://medium.com/@jgm.orinoco/understanding-sparse-merkle-multiproofs-9b9f049e8f08 and https://arxiv.org/pdf/2002.07648.pdf
## Merkle Tree Algorithms
This document establishes a registry of Merkle tree algorithms with the following initial contents:
FIXME: this is an explorational table, what should go into -00?
Name | Label | Description
----------------|-------|------------
Reserved | 0 | (in COSA IANA zero is not registered, so we are mimicing it here)
CCF_SHA256 | 1 | CCF with SHA-256
RFC6962_SHA256 | 2 | RFC6962 with SHA-256
RFC6962_BL_SHA256 | 3 | RFC6962 with blinding and SHA-256
QLDB_SHA256 | 4 | QLDB with SHA-256
OZ_Keccak256 | 5 | Open Zeppelin with keccak256
Each tree algorithm defines how to compute the root node from a sequence of leaves each represented by payload and extra data. Extra data is algorithm-specific and should be considered opaque.
### CCF_SHA256
For n > 1 inputs, let k be the largest power of two smaller than n.
```c
MTH({d(0)}) = SHA-256(d(0))
MTH(D[n]) = SHA-256(MTH(D[0:k]) || MTH(D[k:n]))
```
where `d(0)` is computed as:
```c
d(0) = writeset_digest || SHA-256(commit_evidence) || SHA-256(payload)
```
with extra data defined as:
```c
ExtraData = bstr .cbor [
writeset_digest: bstr .size 32
commit_evidence: bstr
]
```
### RFC6962_SHA256
For n > 1 inputs, let k be the largest power of two smaller than n.
```c
MTH({d(0)}) = SHA-256(0x00 || d(0))
MTH(D[n]) = SHA-256(0x01 || MTH(D[0:k]) || MTH(D[k:n]))
```
where `d(0)` is the payload. This algorithm takes no extra data.
### RFC6962_BL_SHA256
For n > 1 inputs, let k be the largest power of two smaller than n.
```c
MTH({d(0)}) = SHA-256(0x00 || d(0))
MTH(D[n]) = SHA-256(0x01 || MTH(D[0:k]) || MTH(D[k:n]))
```
where `d(0)` is computed as:
```c
d(0) = nonce || payload
```
with extra data defined as:
```c
ExtraData = bstr .size 32 ; nonce
```
### QLDB_SHA256
For n > 1 inputs, let k be the largest power of two smaller than n.
```c
MTH({d(0)}) = SHA-256(d(0))
MTH(D[n]) = SHA-256(DOT(MTH(D[0:k]), MTH(D[k:n])))
DOT(H1, H2) = if H1 < H2 then H1 || H2 else H2 || H1
```
where `d(0)` is the payload. This algorithm takes no extra data.
### OZ_keccak256
For n > 1 inputs, let k be the largest power of two smaller than n.
```c
MTH({d(0)}) = keccak256(keccak256(d(0)))
MTH(D[n]) = MTH2(sorted([ MTH([d]) | d in D ]))
MTH2({h(0)}) = h(0)
MTH2(H[n]) = keccak256(DOT(MTH2(H[0:k]), MTH2(H[k:n])))
DOT(H1, H2) = if H1 < H2 then H1 || H2 else H2 || H1
```
where `d(0)` is the payload. This algorithm takes no extra data.