This document aims to showcase the differences between Gossamer's implementation of the state trie and Parity's implementation, and to discuss improvements that would let Gossamer address its existing challenges and issues.
Everything described below is based on the Polkadot spec.
Each Polkadot host implements a hash-table storage where keys are used to access each data entry.
Both keys and data can be of arbitrary size, but their length limits follow from the algorithm used to hash the keys and encode the values.
In order to ensure the integrity of the state of the system, the stored data needs to be re-arranged and hashed in a radix tree, which hereafter we refer to as the State Trie or just Trie. This rearrangement is necessary to be able to compute the Merkle hash of the whole or part of the state storage, consistently and efficiently at any given time.
The trie is used to compute the Merkle root of the state, whose purpose is to authenticate the validity of the state database.
Each key-value pair identifies a unique node in the trie.
Each node is encoded using the following structure:
\( \text{Node Header} \,\|\, \text{Partial Key} \,\|\, \text{Node Subvalue} \)
The node header has at least 1 byte that specifies the node variant and the partial key length.
The available node variants are:
\[ v = \left\lbrace \begin{array}{lll}
01 & \text{Leaf} & p_l = 2^6 \\
10 & \text{Branch node with } k_N \notin \mathcal{K} & p_l = 2^6 \\
11 & \text{Branch node with } k_N \in \mathcal{K} & p_l = 2^6 \\
001 & \text{Leaf containing a hashed subvalue} & p_l = 2^5 \\
0001 & \text{Branch containing a hashed subvalue} & p_l = 2^4 \\
0000\,0000 & \text{Empty} & p_l = 0 \\
0000\,0001 & \text{Reserved for compact encoding} & \\
\end{array} \right. \]
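As a concrete example of this layout, here is a hedged sketch of how a leaf header byte could be assembled for a partial key shorter than 63 nibbles (`leafHeader` is a made-up helper; the multi-byte length encoding used for longer keys is omitted):

```go
// leafHeader assembles a leaf node header byte: variant bits 01 in the
// two high bits, partial key length in the remaining 6 bits.
func leafHeader(partialKeyLength int) byte {
	if partialKeyLength >= 63 {
		// Longer keys set all 6 length bits and append extra length
		// bytes; that case is omitted in this sketch.
		panic("unhandled in this sketch")
	}
	return 0b01<<6 | byte(partialKeyLength)
}

// e.g. leafHeader(5) == 0b01_000101 == 0x45
```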
The partial key is the node's own key fragment, while the full key is the concatenation of the partial keys of all the nodes above it, interleaved with their child indexes.
E.g.: each node stores only its own partial key, while the full key of a child of the root is the concatenation of the root's partial key, the child's index, and the child's own partial key; if the root's partial key is the nibble `a` and its child at index `3` has partial key `bc`, that child's full key is `a3bc`.
Note: in the case of the root node, partial key == full key.
The node subvalue is the value of the node. If the node is a leaf, it contains only the SCALE-encoded stored value; if the node is a branch, its subvalue is the SCALE-encoded concatenation of three components: the children bitmap, the stored value (present when the variant is 11 or 0001), and the hashes of its children:

\[ sv_N := \left\lbrace \begin{array}{l}
\text{StoredValue}_{\text{SC}} \\
\text{Enc}_{\text{SC}}\left( \text{ChildrenBitmap}(N) \,\|\, \text{StoredValue}_{\text{SC}} \,\|\, \text{Enc}_{\text{SC}}(H(N_{C_1})), \ldots, \text{Enc}_{\text{SC}}(H(N_{C_n})) \right) \\
\end{array} \right. \]
Given the previous definitions, I'm going to detail how Parity implemented their storage and compare it with Gossamer.
https://github.com/paritytech/trie
They describe their implementation as:

> A generic implementation of the Base-16 Modified Merkle Tree ("Trie") data structure, provided under the Apache2 license.
And that's what it is: a DB-backed radix-16 Merkle trie that supports different hashing algorithms and configurations.
Some of its features, detailed below, include a generic hasher abstraction, a pluggable node codec, and a configurable trie layout.
Parity's trie defines a generic way to encode and decode nodes through a trait that each configuration implements (its NodeCodec, described later).
They defined a trait that describes an object that can hash a slice of bytes; it contains a single `hash(bytes)` method that returns an `Out` type.
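In Rust this is a trait; rendered here as a Go interface for consistency with the rest of this document (the 32-byte output size is an assumption, matching e.g. Blake2b-256):

```go
// Out is the fixed-size hash output; 32 bytes assumed in this sketch.
type Out = [32]byte

// Hasher mirrors the described trait: a single method that hashes a
// slice of bytes and returns the fixed-size output.
type Hasher interface {
	Hash(data []byte) Out
}
```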
Given the previous definitions, they created a trait to hold the trie configuration, making it possible to create any kind of radix-16 trie from it.
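A sketch of what such a configuration holder might look like in Go (field names are assumptions, loosely mirroring the role of Parity's TrieLayout trait):

```go
// TrieLayout bundles everything needed to instantiate a radix-16 trie.
type TrieLayout struct {
	UseExtensionNodes bool   // whether extension nodes are used
	MaxInlineValue    int    // values of this size or larger are stored hashed
	Hasher            Hasher // hash algorithm for nodes and values
}
```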
This trait is used to define a DB that will be used as the source of truth for the trie.
For this, and to be able to rebuild the trie structure, what is stored in this database are the encoded trie nodes.
Since V1 of the trie also supports storing values separately, these values are stored in this DB as well, whenever their size is greater than or equal to 32 bytes.
| Key | Value |
|---|---|
| Hash(encoded_node) | encoded_node |
| Hash(value) | value |
Note: to prevent collisions, we should prefix the keys with the node prefix:
key = node prefix + hash
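A minimal sketch of that key derivation (the constant `nodePrefix` is a stand-in; in Parity's implementation the prefix is derived from the node's position in the trie rather than being a fixed namespace):

```go
// nodePrefix is an illustrative stand-in for the real prefixing scheme.
var nodePrefix = []byte("trie-node:")

// dbKey builds the database key for a node: prefix + hash.
func dbKey(hash Out) []byte {
	key := make([]byte, 0, len(nodePrefix)+len(hash))
	key = append(key, nodePrefix...)
	return append(key, hash[:]...)
}
```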
So, to build the trie, it is only necessary to know the hash of the root node, hash(encoded_root); fetch that node from the DB, and then decode the rest of the nodes reachable from it.
Parity's TrieDB is a wrapper over a DB that allows creating a "virtual" trie, supporting the same operations as a conventional trie but using the DB as the source of truth.
Since reading or writing directly on the DB would generate a lot of overhead, this structure accumulates write operations and applies them only when the commit() method is executed; it also uses a cache to optimize read times for both nodes and values.
Given the structure above, it is easy to write an algorithm that looks a value up in the DB knowing the root node hash, hash(encoded_root_node).
If we have a cache, it is also easy to check, before each DB access, whether the node or value is already cached and return it from there; a minimal sketch follows.
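A hedged Go sketch of that lookup, assuming nodes are stored by hash in a flat map, keys are already converted to nibbles, and a `decodeNode` helper (omitted) parses the header, partial key, and subvalue described earlier; none of these names come from the real codebases:

```go
import "bytes"

// node is a simplified model for this sketch only.
type node struct {
	partialKey []byte   // this node's key nibbles
	value      []byte   // nil when the node stores no value
	children   [16]*Out // child node hashes; nil means no child
}

// decodeNode parses an encoded node; its implementation is omitted here.
func decodeNode(encoded []byte) *node { panic("omitted") }

// lookup walks from the root hash towards the node owning fullKey.
func lookup(db map[Out][]byte, root Out, fullKey []byte) ([]byte, bool) {
	key, hash := fullKey, root
	for {
		encoded, ok := db[hash]
		if !ok {
			return nil, false // node missing from the DB
		}
		n := decodeNode(encoded)
		if !bytes.HasPrefix(key, n.partialKey) {
			return nil, false // key diverges from the trie
		}
		key = key[len(n.partialKey):]
		if len(key) == 0 {
			return n.value, n.value != nil // node owning the full key
		}
		child := n.children[key[0]] // next nibble selects the child
		if child == nil {
			return nil, false
		}
		hash, key = *child, key[1:]
	}
}
```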
The type of trie used in Gossamer complies with the structure of a radix-16 Merkle trie, so it is compatible with the Substrate implementation. However, it has the disadvantage that the Gossamer trie lives 100% in memory.
So, essentially, a trie in Gossamer is defined by a root Node, where every node in the trie has the following (simplified) structure:
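Roughly like this (a simplified view; the `Dirty` field is an assumption suggested by the `WriteDirty()` method, and other fields are omitted):

```go
// Node: every node holds its partial key, its value, and direct
// pointers to fully materialised child nodes.
type Node struct {
	PartialKey    []byte
	StorageValue  []byte
	IsHashedValue bool    // whether StorageValue holds a hash of the value
	Children      []*Node // child nodes kept in memory, not hashes
	Dirty         bool    // assumption: tracks unpersisted changes
}
```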
This means that, unlike Parity, where the TrieDB struct acts as a wrapper over a DB, Gossamer keeps both the trie structure and its data loaded in memory at all times.
The only times it interacts with a database are when storing the entire trie using the WriteDirty() method, which encodes and stores the whole trie in the DB, and when calling the Load() method, which, given a db and a rootHash, reconstructs the entire trie by reading the data from the db and creating the nodes accordingly.
The main disadvantages of this approach are memory consumption and the CPU usage incurred when loading the trie, but also limited flexibility and extensibility.
Since the entire trie is always in memory, we must store not only the data necessary to maintain the trie structure (nodes, children) but also the values.
As a consequence, until a block is finalized we cannot prune nodes from the trie, so we normally must keep more than one trie in memory at the same time, which further increases memory consumption.
Since we must load the trie from our DB when starting a node using the last finalized block, it is necessary to go through the database executing the trie loading process, decoding the nodes, and loading them into memory.
This process is very slow, and as a consequence, if we want to start a node using a database snapshot, the startup time is proportional to the size of the snapshot.
This trie does not support a generic configuration; currently, the hash algorithm it uses is hardcoded in the trie implementation.
The same happens with the trie encoding and decoding algorithm: it can be extended with new functionality, but it is difficult to create a different implementation and use it for some tries and not for others.
Since replicating Parity's trie implementation would necessitate almost replacing our entire codebase with their logic, and considering that all these changes are interconnected, this would require creating a single, large PR encompassing all the modifications. This approach could demand a significant amount of time not only from the implementer but also from the reviewers. Moreover, after developing the library, we would need to modify other parts of our code to start utilizing it. Therefore, I propose reusing as many parts of our existing code as possible, such as the encode and decode functions for nodes, merkle proof generation, and verification, while replacing the necessary components to add lazy loading. For example: loading the trie from the database, modifying trie and node structures, and introducing new structures.
Furthermore, to streamline the review process and deliver value in a short period, I propose implementing these changes incrementally, following this plan:
In order to be ready for the further changes, I believe we should do a small refactor first:
- Move the trie code to /pkg/trie.
- Create a Trie interface with the minimum necessary methods for a trie, such as: get, insert, remove, contains, root (see the sketch after this list).
- Rename the current Trie struct to InMemoryTrie.
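A possible minimal interface (the method set comes from the list above; exact signatures are illustrative):

```go
// Hash is the hash output type; fixed to 32 bytes in this sketch.
type Hash = [32]byte

// Trie is the minimal behaviour shared by all trie implementations.
type Trie interface {
	Get(key []byte) []byte
	Insert(key, value []byte) error
	Remove(key []byte) error
	Contains(key []byte) bool
	Root() (Hash, error)
}
```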
To improve node handling, I propose replacing our current Node structure with one oriented towards enums, to differentiate node kinds more easily and more structurally.
Along with this, we would replace the current StorageValue []byte and IsHashedValue bool fields with a NodeValue enum used by the new node variants, as sketched below.
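Since Go has no native sum types, one common way to model the enum is a sealed interface; a sketch with illustrative names:

```go
// NodeValue replaces the StorageValue/IsHashedValue pair.
type NodeValue interface{ isNodeValue() }

// InlineValue stores the value directly in the node.
type InlineValue struct{ Data []byte }

// HashedValue stores only the value's hash; the value lives in the DB.
type HashedValue struct{ Hash Hash }

func (InlineValue) isNodeValue() {}
func (HashedValue) isNodeValue() {}

// Node is the enum of node variants.
type Node interface{ isNode() }

type Leaf struct {
	PartialKey []byte
	Value      NodeValue
}

type Branch struct {
	PartialKey []byte
	Value      NodeValue // nil when the branch stores no value
	Children   [16]Node  // still direct references at this stage
}

func (Leaf) isNode()   {}
func (Branch) isNode() {}
```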
In this phase, we would add lazy loading, so we need to introduce the concept of a TrieDB that implements the Trie interface and uses a db as its data source.
Consider the following naive implementation. In this case, what we would need to do is the following:
We must modify our node enum so that the children list of a branch no longer contains other nodes; since the encoding stores the SCALE-encoded hashes of the encoded child nodes, we can replace it with a list of node hashes instead, where Hash is the type of data returned by the hash algorithm we are using.
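The Branch variant from the earlier sketch would change along these lines:

```go
// Children now hold child node hashes; each child is fetched from the
// DB and decoded only when the traversal actually needs it.
type Branch struct {
	PartialKey []byte
	Value      NodeValue
	Children   [16]*Hash // nil entry means no child at that index
}
```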
We can create a Lookup structure responsible for traversing the trie looking for a node given its fullKey and returning the value, following the algorithm shown before.
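A possible shape (Database is an assumed minimal abstraction over our key-value store):

```go
// Database is the minimal read access the lookup needs.
type Database interface {
	Get(key []byte) ([]byte, error)
}

// Lookup traverses the trie stored in db, starting from rootHash.
type Lookup struct {
	db       Database
	rootHash Hash
}

// Value returns the value stored under fullKey, descending exactly as
// in the lookup sketch shown in the Parity section, but reading each
// node through l.db.Get.
func (l *Lookup) Value(fullKey []byte) ([]byte, error) {
	panic("sketch: fetch node by hash, decode, match partial key, descend")
}
```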
Given that our concept of a Trie no longer refers to a fully pre-built, in-memory trie but to the way we read, traverse, and store the keys and values of our DB, we need to introduce the concept of a TrieDB: a wrapper over a DB, based on a rootHash, that uses a trieLayout as its configuration holder, similar to Parity's implementation.
This structure will implement all the methods defined in the Trie interface introduced in Phase 0, so we can easily switch the version used in our code from InMemoryTrie to TrieDB, and all the tests should keep working.
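Putting the pieces together, the structure could start out like this (names mirror the prose above; TrieLayout is whatever configuration holder we settle on):

```go
// TrieDB is a view over a key-value DB rooted at rootHash.
type TrieDB struct {
	db       Database
	rootHash Hash
	layout   TrieLayout // configuration: hasher, codec, trie version, ...
}
```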
We can break down this implementation into different steps:
To start with the minimum required logic, the first step is to create a read-only TrieDB capable of lazily loading nodes from the DB and compliant with the Trie interface, but without a concrete implementation of the write methods (put, delete, etc.).
This is an easy way to introduce TrieDB without adding too much complexity.
Since the previous change reads and decodes nodes directly from the DB, it could be slow even on an SSD, so we have to introduce a caching mechanism to speed up read operations.
This should be a cache shared between different TrieDB instances.
We can easily add a cache as part of the TrieDB structure and use it where necessary: mainly in the Lookup structure, but potentially also in methods that need to know the trie structure to add, modify, or delete nodes.
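One possible interface for that shared cache (names are illustrative):

```go
// Cache is shared between TrieDB instances; entries are keyed by hash.
type Cache interface {
	GetNode(hash Hash) (Node, bool)
	SetNode(hash Hash, n Node)
	GetValue(hash Hash) ([]byte, bool)
	SetValue(hash Hash, value []byte)
}
```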
Now that the read operations are tested and improved, we can focus on write operations. We have to add the put and delete methods for key-values and child tries.
These writes will initially go directly to the DB; we are going to improve this later.
To avoid performing every write directly against the database each time something changes, we are going to introduce a mechanism that accumulates changes and commits them only when needed.
To keep complying with the Trie interface, we must trigger this from a method that is already defined in that interface; I propose using the same strategy as Substrate and executing the commit() inside the root() method, as sketched below.
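A rough sketch of the mechanism, assuming TrieDB gains a pending map of deferred writes (pending and commit are illustrative names):

```go
// Put records the change in memory instead of writing it to the DB.
func (t *TrieDB) Put(key, value []byte) error {
	if t.pending == nil {
		t.pending = make(map[string][]byte)
	}
	t.pending[string(key)] = value
	return nil
}

// Root flushes the accumulated changes, recomputes the root hash, and
// returns it, so callers of the Trie interface trigger the commit.
func (t *TrieDB) Root() (Hash, error) {
	if err := t.commit(); err != nil {
		return Hash{}, err
	}
	return t.rootHash, nil
}

// commit applies t.pending to the DB and updates t.rootHash (omitted).
func (t *TrieDB) commit() error { panic("omitted") }
```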
Note: this TrieDB is not responsible for handling state transactions; we should solve that in a different layer on top of this one. That layer will accumulate and apply temporary "change sets", and when a transaction commit is executed, the resulting change set will be applied through this TrieDB API.
To finish the migration, we are going to change every place where we currently use in-memory tries to use the new TrieDB with lazy loading, and verify that everything works correctly.
It would be great if we could measure the improvement in different aspects, such as memory consumption, CPU usage, and startup time.
We can change our TrieDB and our node structures to receive a generic type representing the hash algorithm's output type: TrieDB[H], NodeValue[H], and Node[H].
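With Go generics this could look like the following (H is constrained to comparable so it can key maps and caches; the earlier non-generic sketches would be replaced):

```go
// Generic versions, parameterised by the hash output type H.
type NodeValue[H comparable] interface{ isNodeValue() }

type HashedValue[H comparable] struct{ Hash H }

func (HashedValue[H]) isNodeValue() {}

type Branch[H comparable] struct {
	PartialKey []byte
	Value      NodeValue[H]
	Children   [16]*H
}

type TrieDB[H comparable] struct {
	db       Database
	rootHash H
}
```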
It would be great if we could also add a hasher interface so we can create different hashers for the keys in our database. This would allow us to easily change the hash algorithm in the future.
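For example, a generic counterpart of the hasher sketched in the Parity section (the Length method is an assumption, mirroring the fixed output size the trait exposes):

```go
// Hasher produces hashes of type H for DB keys and node hashes.
type Hasher[H comparable] interface {
	Hash(data []byte) H
	Length() int // size of H in bytes
}
```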
We can implement an interface similar to Parity's NodeCodec trait to be able to use different encoding and decoding strategies in the future.
Then we can add the generic Codec to our TrieLayout and use it in our TrieDB with a few changes in the codebase.
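A possible Go counterpart (the method set is an assumption, inspired by, not copied from, Parity's trait):

```go
// NodeCodec encapsulates one encoding/decoding strategy for trie nodes.
type NodeCodec[H comparable] interface {
	Encode(n Node) ([]byte, error)
	Decode(encoded []byte) (Node, error)
	HashedNullNode() H // hash of the encoded empty node
}
```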
With these changes, we should be able to support lazy loading while maintaining compatibility with the rest of our code, without needing to replace 100% of our implementation.
Furthermore, this approach helps us add value across different iterations and simplifies the coding and review process.