# Merkle Tree Interfaces
#### Background
When working with merkle trees and merkle proofs, one must often decide _how_ to store the tree in memory. The requirements of the application and its computing environment create opportunities and impose limits, so different implementations offer different tradeoffs.
As a result, there are many different implementations, or _backings_, of trees and proofs, each of which makes sense in its own context. A proof may be viewed as a specialized type of tree backing.
#### Motivation
Despite the variety of implementations, we want to unify the operations on trees by standardizing a core interface, which all backings can implement. Abstraction allows users of trees to safely ignore implementation details and avoid reimplementing low-level code.
Conformance to an interface also allows for tooling to easily swap out backings for convenience, debugging purposes, or use in alternate computing environments.
#### Invariants
- Every Root is 32 bytes
- Every Gindex (generalized index: the root is 1, and the node at gindex `i` has children `2i` and `2i + 1`) into the tree is stable
> Proto: stable gindices are a nice-to-have for looking up data, but lookups generally do not need them to be stable. Converting static paths to gindices (i.e. with no contextual information) is where stable gindices really are a prerequisite.
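The relationship between a static path and a gindex can be sketched as follows. This is a minimal illustration, not part of the proposal: it assumes a `Gindex` is a plain `number` (the note does not fix a representation), and the helper names `gindexToPath`, `pathToGindex` and `concatGindices` are hypothetical.

```typescript
// Assumption: a Gindex is a plain number (exact for depths up to 52).
type Gindex = number;
type Step = "left" | "right";

// Root-to-node path encoded by a gindex: the bits after the leading 1.
function gindexToPath(gindex: Gindex): Step[] {
  if (gindex < 1 || Math.floor(gindex) !== gindex) {
    throw new Error("invalid gindex: " + gindex);
  }
  const path: Step[] = [];
  for (let g = gindex; g > 1; g = Math.floor(g / 2)) {
    path.push(g % 2 === 0 ? "left" : "right");
  }
  return path.reverse(); // collected bottom-up, returned top-down
}

// Static path to gindex: start at the root (1), then 2g for left, 2g + 1 for right.
function pathToGindex(path: Step[]): Gindex {
  let g = 1;
  for (const step of path) {
    g = 2 * g + (step === "right" ? 1 : 0);
  }
  return g;
}

// Gindex of `child` (given relative to a subtree) when that subtree is rooted at `parent`.
function concatGindices(parent: Gindex, child: Gindex): Gindex {
  return pathToGindex(gindexToPath(parent).concat(gindexToPath(child)));
}

const pathToSix = gindexToPath(6); // ["right", "left"]
const remapped = concatGindices(2, 7); // 11
```

Because the conversion uses no information about the tree's contents, it only yields a usable index if gindices are stable.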
#### Proposed Interface 1 - opaque KV store
This interface can be viewed as a key-value store that maps an index (`Gindex`) to a data root (`Root`) and an index (`Gindex`) to a subtree (`TreeBacking`).
```typescript
/**
* A TreeBacking is a tree/proof stored according to some internal memory format
*/
interface TreeBacking {
/**
* Return a copy of the backing
*/
clone(): TreeBacking;
/**
* Get the bytearray value at a given index
*/
getRoot(index: Gindex): Root;
/**
* Set the bytearray value at a given index
*
* This will refresh any intermediate data
*
* If expand is true, expand the tree as necessary on navigation
* If expand is false, error on navigation to an invalid index
*/
setRoot(index: Gindex, root: Root, expand: boolean): void;
/**
* Get the subtree at a given index
*/
getSubtree(index: Gindex): TreeBacking;
/**
* Set the subtree at a given index
*
* The sub-backing will be remapped relative to index when it is attached
*
* This will refresh any intermediate data
*
* If expand is true, expand the tree as necessary on navigation
* If expand is false, error on navigation to an invalid index
*/
setSubtree(index: Gindex, backing: TreeBacking, expand: boolean): void;
}
/**
* Additional static methods (constructors)
*/
interface TreeBackingStatic {
/**
* Create a single-node tree whose root is `root`
*/
leafTree(root: Root): TreeBacking;
/**
* Return a backing with the root
* which equals the zero-hash at depth `depth`
*/
zeroTree(depth: number): TreeBacking;
/**
* Return a tree with leaves at depth `depth` which equal `bottom`
*/
subtreeFillToDepth(bottom: TreeBacking, depth: number): TreeBacking;
/**
* Return a tree with leaves at depth `depth`
* which equal `bottom` to length `length`
* All remaining leaves are set to zero-hashes
*/
subtreeFillToLength(bottom: TreeBacking, depth: number, length: number): TreeBacking;
/**
* Return a tree with leaves at depth `depth`
* corresponding to each backing in `backings`
* All remaining leaves are set to zero-hashes
*/
subtreeFillToContents(backings: TreeBacking[], depth: number): TreeBacking;
}
```
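To make the key-value view concrete, here is a minimal sketch of how a sub-backing's gindices could be remapped when it is attached or extracted. It is an illustration under stated assumptions, not an implementation of `TreeBacking`: `Gindex` is assumed to be a `number`, `Root` a 32-byte `Uint8Array`, and `MapBacking`, `put`, `attach`, `extract`, `remap`, `relativeTo` and `filledRoot` are hypothetical names. Refreshing intermediate hashes and the `expand` flag are deliberately left out.

```typescript
// Assumptions: Gindex is a number, Root is a 32-byte Uint8Array.
type Gindex = number;
type Root = Uint8Array;

function depthOf(g: Gindex): number {
  let d = 0;
  for (let x = g; x > 1; x = Math.floor(x / 2)) d++;
  return d;
}

// Gindex of `child` (relative to a subtree) once that subtree hangs at `parent`.
function remap(parent: Gindex, child: Gindex): Gindex {
  const width = Math.pow(2, depthOf(child));
  return parent * width + (child - width);
}

// Inverse of remap: gindex of `g` relative to `parent`, or null if `g` lies outside it.
function relativeTo(parent: Gindex, g: Gindex): Gindex | null {
  const k = depthOf(g) - depthOf(parent);
  if (k < 0) return null;
  const width = Math.pow(2, k);
  if (Math.floor(g / width) !== parent) return null;
  return g - parent * width + width;
}

// Hypothetical backing: a flat gindex -> root store, without any hashing.
class MapBacking {
  private roots: { [gindex: number]: Root } = {};

  getRoot(index: Gindex): Root {
    const root = this.roots[index];
    if (root === undefined) throw new Error("no root stored at gindex " + index);
    return root;
  }

  // Raw store; a real setRoot would also refresh the ancestors of `index`.
  put(index: Gindex, root: Root): void {
    if (root.length !== 32) throw new Error("root must be 32 bytes");
    this.roots[index] = root;
  }

  gindices(): Gindex[] {
    return Object.keys(this.roots).map(Number);
  }

  // Lookup half of getSubtree: copy everything under `index`, re-rooted at gindex 1.
  extract(index: Gindex): MapBacking {
    const sub = new MapBacking();
    for (const g of this.gindices()) {
      const rel = relativeTo(index, g);
      if (rel !== null) sub.put(rel, this.roots[g]);
    }
    return sub;
  }

  // Remapping half of setSubtree: drop stale entries under `index`, then copy `sub` in.
  attach(index: Gindex, sub: MapBacking): void {
    for (const g of this.gindices()) {
      if (relativeTo(index, g) !== null) delete this.roots[g];
    }
    for (const g of sub.gindices()) {
      this.roots[remap(index, g)] = sub.getRoot(g);
    }
  }
}

function filledRoot(firstByte: number): Root {
  const r = new Uint8Array(32);
  r[0] = firstByte;
  return r;
}

const hostBacking = new MapBacking();
const subBacking = new MapBacking();
subBacking.put(1, filledRoot(1));
subBacking.put(2, filledRoot(2));
subBacking.put(3, filledRoot(3));
hostBacking.attach(3, subBacking);
// subBacking's gindex 2 is now reachable as hostBacking's gindex 6
```

A flat store like this is simple to serialize, but every attach copies entries, which is one of the costs the reviewer comments below weigh against shared immutable nodes.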
> Proto: clone() -- not necessary, trees should be simple and immutable. Incl. not replacing subtrees even if they have the same root. The user referencing a subtree needs it to be stable.
> Proto: getRoot(gindex) -- I think we overdo it with gindices sometimes, and it is better to add a getLeft(), a getRight() and a getRoot() (without a gindex argument). The child root can then be found with getChild(gindex).getRoot(), where getChild can optionally be optimized but doesn't have to be (calling getLeft/getRight repeatedly on existing nodes is not too expensive). Simplifying navigation also helps a lot to keep new tree-node types simple to implement.
> Proto: getSubtree(gindex) -- I would call it getChild() or something, the getter does not say if it is a subtree, or just a single bottom node.
> Proto: setSubtree(...) -- Not a fan of mutability in the tree interface. It is very valuable to be able to make assumptions about shared immutable subtrees, especially for storage. I would replace this with a "rebind" method (make a new pair node, take the child nodes, replace one of them to effectively mutate, and return the new root node).
> Proto: TreeBackingStatic -- I see where the name comes from, but confusing with "static" in two other contexts: static gindices, and static (or fixed) SSZ positions in encoding.
> Proto: rootTree(root) -- This converts a bytes32 to a node? I generally like the name "node" much better than "tree", as it doesn't imply anything about the contents. And repeatedly writing "backing" also gets cumbersome.
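The navigation and "rebind" ideas from the comments above might look roughly like this. It is a sketch under assumptions: `Gindex` is taken to be a `number`, hashing is omitted entirely, and `SimpleNode`, `Leaf`, `Pair`, `getChild` and `setChild` are hypothetical names chosen for illustration.

```typescript
type Gindex = number; // assumption: plain number

// Hypothetical minimal node: navigation and rebinding only, no hashing.
interface SimpleNode {
  isLeaf(): boolean;
  getLeft(): SimpleNode;
  getRight(): SimpleNode;
  rebindLeft(n: SimpleNode): SimpleNode;
  rebindRight(n: SimpleNode): SimpleNode;
}

class Leaf implements SimpleNode {
  constructor(readonly label: string) {}
  isLeaf(): boolean { return true; }
  getLeft(): SimpleNode { throw new Error("leaf has no children"); }
  getRight(): SimpleNode { throw new Error("leaf has no children"); }
  rebindLeft(_n: SimpleNode): SimpleNode { throw new Error("leaf has no children"); }
  rebindRight(_n: SimpleNode): SimpleNode { throw new Error("leaf has no children"); }
}

class Pair implements SimpleNode {
  constructor(private readonly left: SimpleNode, private readonly right: SimpleNode) {}
  isLeaf(): boolean { return false; }
  getLeft(): SimpleNode { return this.left; }
  getRight(): SimpleNode { return this.right; }
  // Rebinding never touches `this`; it builds a new Pair that shares the other child.
  rebindLeft(n: SimpleNode): SimpleNode { return new Pair(n, this.right); }
  rebindRight(n: SimpleNode): SimpleNode { return new Pair(this.left, n); }
}

// Root-to-node turns for a gindex: true means "go right".
function pathBits(gindex: Gindex): boolean[] {
  const bits: boolean[] = [];
  for (let g = gindex; g > 1; g = Math.floor(g / 2)) bits.push(g % 2 === 1);
  return bits.reverse();
}

// getChild via repeated getLeft/getRight.
function getChild(rootNode: SimpleNode, gindex: Gindex): SimpleNode {
  let node = rootNode;
  for (const goRight of pathBits(gindex)) {
    node = goRight ? node.getRight() : node.getLeft();
  }
  return node;
}

// "Mutation" as rebinding: returns a new root, the old tree stays valid.
function setChild(rootNode: SimpleNode, gindex: Gindex, child: SimpleNode): SimpleNode {
  const bits = pathBits(gindex);
  const rebuild = (node: SimpleNode, i: number): SimpleNode => {
    if (i === bits.length) return child;
    return bits[i]
      ? node.rebindRight(rebuild(node.getRight(), i + 1))
      : node.rebindLeft(rebuild(node.getLeft(), i + 1));
  };
  return rebuild(rootNode, 0);
}

const leafA = new Leaf("a");
const leafB = new Leaf("b");
const leafC = new Leaf("c");
const oldTree = new Pair(new Pair(leafA, leafB), leafC); // a = gindex 4, b = 5, c = 3
const newTree = setChild(oldTree, 5, new Leaf("x"));
// getChild(oldTree, 5) is still leafB; getChild(newTree, 3) is the very same leafC object
```

Only the nodes on the path from the root to the replaced child are recreated, so anything holding a reference to `oldTree` or one of its subtrees keeps seeing stable data.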
#### Proposed Interface 2 - node-centric linked immutable datastructure
With an additional invariant of only operating on a _binary_ merkle tree, we can expose an interface to a tree as an immutable, linked datastructure.
A `Node` holds references to its two child `Node`s and is immutable once instantiated. The merkle root of a node is lazily computed and cached upon request.
Different 'backings' of trees may be implemented by specialized `Node` implementations.
A `View` of a tree is a mutable reference to a root `Node` that changes as the tree changes.
This interface has the benefit that different "backings" can be composed within one tree under a consistent data model, simply by linking `Node`s of different implementations.
```typescript
/**
* A Node is a lazily computed root, and optionally two children
*
* Once instantiated, a Node instance may not have its root or its children mutated directly.
* Updates happen as "rebindings", new Nodes being created with the proper linkages and/or data.
*/
interface Node {
/**
* Return the merkle root
*/
getRoot(): Root;
/**
* true if the node has no children
* false if the node has children
*/
isLeaf(): boolean;
/**
* Return the left child
*/
getLeft(): Node;
/**
* Return the right child
*/
getRight(): Node;
/**
* Create a new Node
* with n as the left child
* and this.getRight() as the right child
*/
rebindLeft(n: Node): Node;
/**
* Create a new Node
* with this.getLeft() as the left child
* and n as the right child
*/
rebindRight(n: Node): Node;
}
/**
* A View is a mutable reference to a Node
*
* It stores a reference to a root Node, which it uses for traversal.
* Any "mutations" are implemented as changes to the stored reference,
* usually by replacing it with a reference to a newly created "rebound" root Node.
*
* Garbage collection allows for Nodes to be purged when no longer used.
*/
interface View {
/**
* Return a new View with a reference to the same root Node
*/
clone(): View;
/**
* Get the node at the root of the tree
*/
getRootNode(): Node;
/**
* Set the node at the root of the tree
* (replace the whole tree)
*/
setRootNode(n: Node): void;
/**
* Get the node at the given index in the tree
*/
getNode(index: Gindex): Node;
/**
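* Set the node at the given index in the tree,
* creating new nodes rebound as needed
*
* The view's internal root Node is updated as a result.
*
* If expand is true, expand the tree as necessary on navigation
* If expand is false, error on navigation to an invalid index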
*/
setNode(index: Gindex, n: Node, expand: boolean): void;
}
/**
* Convenience functions act as specialized constructors for `Node`
*/
/**
* Return a Node with the root
* which equals the zero-hash at depth `depth`
*/
function zeroNode(depth: number): Node;
/**
* Return a Node with leaves at depth `depth` which equal `bottom`
*/
function subtreeFillToDepth(bottom: Node, depth: number): Node;
/**
* Return a Node with leaves at depth `depth`
* which equal `bottom` to length `length`
* All remaining leaves are set to zero-hashes
*/
function subtreeFillToLength(bottom: Node, depth: number, length: number): Node;
/**
* Return a Node with leaves at depth `depth`
* corresponding to each node in `nodes`
* All remaining leaves are set to zero-hashes
*/
function subtreeFillToContents(nodes: Node[], depth: number): Node;
```
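The lazy root caching and the zero-node constructor could be sketched as below. This is only an illustration under assumptions: the interface is named `MerkleNode` here to keep the sketch self-contained, the hash function is passed in explicitly (the interface above does not say how hashing is configured), and `toyHash` is a non-cryptographic stand-in that should be replaced by SHA-256 over the concatenated children in any real use.

```typescript
type Root = Uint8Array; // assumption: 32 bytes
type HashFn = (left: Root, right: Root) => Root;

// Toy stand-in for SHA-256(left ++ right). NOT cryptographic; it only keeps
// the sketch free of dependencies.
const toyHash: HashFn = (left, right) => {
  const out = new Uint8Array(32);
  for (let i = 0; i < 32; i++) {
    out[i] = (left[i] * 31 + right[(i + 1) % 32] * 17 + i + 1) & 0xff;
  }
  return out;
};

// Hypothetical mirror of the Node interface above.
interface MerkleNode {
  getRoot(): Root;
  isLeaf(): boolean;
  getLeft(): MerkleNode;
  getRight(): MerkleNode;
  rebindLeft(n: MerkleNode): MerkleNode;
  rebindRight(n: MerkleNode): MerkleNode;
}

class LeafNode implements MerkleNode {
  constructor(private readonly root: Root) {}
  getRoot(): Root { return this.root; }
  isLeaf(): boolean { return true; }
  getLeft(): MerkleNode { throw new Error("LeafNode has no children"); }
  getRight(): MerkleNode { throw new Error("LeafNode has no children"); }
  rebindLeft(_n: MerkleNode): MerkleNode { throw new Error("LeafNode has no children"); }
  rebindRight(_n: MerkleNode): MerkleNode { throw new Error("LeafNode has no children"); }
}

class PairNode implements MerkleNode {
  // The cache is the only field that ever changes; observable state is immutable.
  private cached: Root | null = null;
  constructor(
    private readonly left: MerkleNode,
    private readonly right: MerkleNode,
    private readonly hash: HashFn
  ) {}
  getRoot(): Root {
    if (this.cached === null) {
      this.cached = this.hash(this.left.getRoot(), this.right.getRoot());
    }
    return this.cached;
  }
  isLeaf(): boolean { return false; }
  getLeft(): MerkleNode { return this.left; }
  getRight(): MerkleNode { return this.right; }
  rebindLeft(n: MerkleNode): MerkleNode { return new PairNode(n, this.right, this.hash); }
  rebindRight(n: MerkleNode): MerkleNode { return new PairNode(this.left, n, this.hash); }
}

// Every leaf at depth `depth` equals `bottom`; both children of each level are
// the same object, so only `depth` new nodes are allocated.
function subtreeFillToDepth(bottom: MerkleNode, depth: number, hash: HashFn): MerkleNode {
  let node = bottom;
  for (let i = 0; i < depth; i++) node = new PairNode(node, node, hash);
  return node;
}

function zeroNode(depth: number, hash: HashFn): MerkleNode {
  return subtreeFillToDepth(new LeafNode(new Uint8Array(32)), depth, hash);
}

const zeroDepth3 = zeroNode(3, toyHash);
const sevenRoot = new Uint8Array(32);
sevenRoot[0] = 7;
const editedTree = zeroDepth3.rebindLeft(new LeafNode(sevenRoot));
// editedTree.getRight() is the same object as zeroDepth3.getRight(): shared, not copied
```

With this layout a depth-`d` zero tree costs `d` hash calls the first time its root is requested and none afterwards, and a rebound tree reuses every cached root that is not on the changed path.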
#### Questions
- What does the interface for conversion between backings look like?
- Is there a way of abstracting the conversion between backings, or is it a backing-by-backing concern? If so, it may include proof generation.
> Proto: I think the typing layer should be the only canonical way to convert between two backings. A hub/spoke model of conversion complexity is much better than a fully-connected one that converts between all of them. E.g. to make a serialized form of a typed tree, we can iterate over the elements as views, and recursively call serialize on those. This also enables typing to change the meaning of the conversion: a block header type may serialize the same tree differently than a full block.
- What does the interface for generating a proof/multiproof look like?
- Is a single proof backing standardized as a privileged backing? Related to above
> Proto: I think we can attach the proof/multiproof interface simply as a .verify() on a Node (or TreeBacking here) and getBranches(gindex...) + getLeafs(gindex...) on the static tree methods (which would just yield the appropriate nodes while navigating with getLeft/getRight efficiently).
- A generic interface for tooling, or one privileged backing with conversion from alternate backings into and out of it?
- A tooling implementation question: in which cases should tooling operate on the above interface, and in which cases should there be a privileged backing?
> Proto: I don't see new backings popping up too quickly, but being able to somehow register them with the type would be nice. Remerkleable currently has a `.get_backing` which just returns the tree. I see two options: Views are generic and can have optional traits like `TreeBacking` to provide a `.get_tree_node()`; or, without generics/mixins (Go...), you have to reverse the responsibility and ask some `TreeBackingType` to fetch the node from a view (which just holds some `interface{}` data that can be any backing).
> Proto: Also, see the latest definitions of `TypeDef`, `Node` and `View` in remerkleable. Nodes are so basic that pair/root/whatever nodes are all interchangeable, as they should be. Just provide the `get/rebind`-`left/right` methods and a check method, `is_leaf()`, to see whether a node has such children. Once checked, it's safe to access them. No surprise mutations, easy caching. `VirtualNode` lazy-loading from a source of tree nodes also helps, as does being able to call `getRoot` on everything, since things like headers can be interpreted from trees that are actually full blocks. If you need to prune the children of a tree, just call `summarize(gindex)`, which returns a rebind link that you call to get the pruned tree. This is the opposite of expansion/setting, where you get a regular link (one with a node input) that returns the new anchor node, which references the newly inserted child node in its new position.
> Avoiding mutations and unnecessary complexity in the caching layer is essential. Prysm uses some kind of messy hybrid and has all sorts of bugs, with copies/clones in places where there shouldn't be any, resulting in unreliable and large memory consumption. That is the exact opposite of what we are trying to achieve.
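For the proof question above, a single-leaf proof can be produced just by navigating with getLeft/getRight and collecting sibling roots. The following is a rough sketch rather than a proposed API: `ProofNode`, `leafOf`, `pairOf`, `getBranch`, `computeRoot` and `sameRoot` are hypothetical names, gindices are assumed to be plain numbers, and `toyHash` is a non-cryptographic placeholder for SHA-256.

```typescript
type Root = Uint8Array; // assumption: 32 bytes
type HashFn = (left: Root, right: Root) => Root;

// Toy stand-in for SHA-256(left ++ right). NOT cryptographic.
const toyHash: HashFn = (left, right) => {
  const out = new Uint8Array(32);
  for (let i = 0; i < 32; i++) {
    out[i] = (left[i] * 31 + right[(i + 1) % 32] * 17 + i + 1) & 0xff;
  }
  return out;
};

// Hypothetical read-only slice of the Node interface.
interface ProofNode {
  getRoot(): Root;
  getLeft(): ProofNode;
  getRight(): ProofNode;
}

function leafOf(root: Root): ProofNode {
  return {
    getRoot: () => root,
    getLeft: () => { throw new Error("leaf has no children"); },
    getRight: () => { throw new Error("leaf has no children"); },
  };
}

function pairOf(left: ProofNode, right: ProofNode, hash: HashFn): ProofNode {
  let cached: Root | null = null;
  return {
    getRoot: () => {
      if (cached === null) cached = hash(left.getRoot(), right.getRoot());
      return cached;
    },
    getLeft: () => left,
    getRight: () => right,
  };
}

// Sibling roots along the path to `gindex`, ordered bottom-up.
function getBranch(rootNode: ProofNode, gindex: number): Root[] {
  const bits: boolean[] = []; // true means "go right"
  for (let g = gindex; g > 1; g = Math.floor(g / 2)) bits.push(g % 2 === 1);
  bits.reverse();
  const branch: Root[] = [];
  let node = rootNode;
  for (const goRight of bits) {
    branch.push((goRight ? node.getLeft() : node.getRight()).getRoot());
    node = goRight ? node.getRight() : node.getLeft();
  }
  return branch.reverse();
}

// Recompute the root implied by a leaf, its branch and its gindex.
function computeRoot(leaf: Root, branch: Root[], gindex: number, hash: HashFn): Root {
  let acc = leaf;
  let g = gindex;
  for (const sibling of branch) {
    acc = g % 2 === 1 ? hash(sibling, acc) : hash(acc, sibling);
    g = Math.floor(g / 2);
  }
  if (g !== 1) throw new Error("branch length does not match gindex depth");
  return acc;
}

function sameRoot(a: Root, b: Root): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) return false;
  return true;
}

const sampleLeaves = [1, 2, 3, 4].map((b) => {
  const r = new Uint8Array(32);
  r[0] = b;
  return r;
});
const [n4, n5, n6, n7] = sampleLeaves.map((r) => leafOf(r));
const sampleTree = pairOf(pairOf(n4, n5, toyHash), pairOf(n6, n7, toyHash), toyHash);
const branchForSix = getBranch(sampleTree, 6); // [root at gindex 7, root at gindex 2]
const proofHolds = sameRoot(
  computeRoot(sampleLeaves[2], branchForSix, 6, toyHash),
  sampleTree.getRoot()
); // true
```

A multiproof version would take several gindices and deduplicate shared siblings; that part is not shown here.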