# Merkle Patricia Trie
[rust-implementation]()
**What is a Merkle Patricia Trie?**
At its core, a Merkle Patricia Trie is a specialized tree-like data structure that efficiently stores key-value pairs. It cleverly combines two concepts:
* **Merkle Tree:** This means the structure uses cryptographic hashes to ensure data integrity. Each node in the tree is identified by the hash of its contents. The `root_hash` of the entire trie serves as a unique fingerprint representing *all* the data stored within it. If even a single bit of data changes anywhere in the trie, the `root_hash` will change deterministically. This is crucial for systems like blockchains (e.g., Ethereum) where you need to verify state consistency efficiently.
* **Patricia Trie (Radix Trie):** This part optimizes storage and retrieval, especially when keys share common prefixes. Instead of storing entire keys at each node, paths are compressed. Keys are broken down into smaller units (in this implementation, "nibbles" - 4-bit values), and the structure branches based on these units. Nodes only store the *unique* parts of the paths.
**In essence:** It's a secure (Merkle) and space-efficient (Patricia) way to map keys to values, producing a single hash that represents the entire collection.
**Key Components in the Code:**
* **`Nibble` and `NibblePath`:** Keys are not stored as raw bytes directly in the node paths. They are first converted into a sequence of `Nibble`s (0-15) using `bytes_to_nibbles`. A `[6, 1, 6, 2]` `NibblePath` corresponds to the bytes `b"ab"`. This 4-bit unit is fundamental because it dictates the branching factor (16) in `Branch` nodes.
* **`Db` (Database):** This `Arc<RwLock<HashMap<H256, Vec<u8>>>>` represents the underlying storage. In a real MPT (like in Ethereum), this would be a persistent key-value database (like LevelDB or RocksDB). Here, it's an in-memory HashMap. It stores *serialized* node data (`Vec<u8>`) keyed by the node's hash (`H256`). The `Arc<RwLock>` makes it thread-safe for shared access.
* **`H256` (Hash):** Represents a 256-bit (32-byte) hash, calculated using `Keccak256` in this code (`calculate_hash_internal`). **Important Note:** This code uses `bincode` for serialization *before* hashing. Standard Ethereum MPTs use RLP (Recursive Length Prefix) encoding. Therefore, the hashes generated by this code **will not** match the hashes produced by an Ethereum client for the same data.
* **`MerklePatriciaTrie` Struct:** Holds the current `root_hash` (the entry point and overall state identifier) and a reference (`Db`) to the storage.
**Node Types (`Node` Enum):**
The structure of the trie is defined by these node types:
* **`Node::Empty`:** Represents a null or empty slot. This is the state of the trie before any data is inserted, and it's also used in `Branch` nodes where a particular nibble path doesn't lead anywhere. It has a predefined hash (`EMPTY_NODE_HASH`).
* **`Node::Leaf { path, value }`:** Represents the termination of a key's path.
* `path`: Contains the *remaining* nibbles of the key path specific to this leaf, after any shared prefixes handled by parent Extension or Branch nodes.
* `value`: The actual value associated with the full key.
* *Example:* If you insert `("do", "verb")` into an empty trie, the root might point directly to a `Leaf { path: [6, 4, 6, f], value: b"verb".to_vec() }`.
* **`Node::Extension { path, next_node_hash }`:** Acts as a shortcut for paths that don't branch.
* `path`: A shared sequence of nibbles common to all keys passing through this node.
* `next_node_hash`: The hash of the *next* node (which could be another Extension, a Branch, or a Leaf) where the path continues or terminates. It *doesn't* store a value itself.
* *Example:* If you only have keys like `("ether", "crypto")` and `("ethereum", "blockchain")`, you might have an `Extension { path: [6, 5, 7, 4, 6, 8], next_node_hash: HASH_OF_BRANCH }` representing the shared "ether" part, pointing to a `Branch` where " " and "eum" diverge.
* **`Node::Branch { children, value }`:** Represents a point where a path diverges into multiple possibilities (up to 16).
* `children`: An array of 16 `Option<H256>`. Each index (0-15) corresponds to a nibble. If `children[i]` is `Some(hash)`, it means there's a child node reachable by taking the nibble `i`, and `hash` is its identifier. `None` indicates no path exists for that nibble.
* `value`: An `Option<Vec<u8>>`. This allows a key to terminate *exactly* at this branch point.
* *Example (from `main`):* After inserting `("dog", "puppy")` and `("do", "verb")`:
* `do` -> `[6, 4, 6, f]`
* `dog` -> `[6, 4, 6, 7]`
* They share `[6, 4, 6]`. The trie might form a structure leading to a `Branch` node representing the divergence after `[6, 4, 6]`.
* This `Branch` might have `value: Some(b"verb".to_vec())` because "do" terminates here.
* Its `children[7]` (the nibble for 'g') would contain `Some(HASH_OF_DOG_LEAF)`. The `HASH_OF_DOG_LEAF` would point to a `Leaf { path: [], value: b"puppy".to_vec() }` (empty remaining path as 'g' was the last nibble).
**How Operations Work (`get` and `put`):**
* **`get(key)` (Retrieval):**
1. Convert the `key` into a `NibblePath`.
2. Start at the `root_hash`.
3. Recursively (`recursive_get`) fetch the node corresponding to the current hash from the `db`.
4. Based on the node type:
* `Empty`: Key not found -> `None`.
* `Leaf`: If the leaf's `path` matches the *remaining* `NibblePath`, return the `value`. Otherwise -> `None`.
* `Extension`: If the `NibblePath` starts with the extension's `path`, consume that part of the path and recurse using the `next_node_hash` and the rest of the `NibblePath`. Otherwise -> `None`.
* `Branch`:
* If the `NibblePath` is now empty, return the branch's optional `value`.
* Otherwise, take the *next* nibble from the `NibblePath`, look up the corresponding `child_hash` in the `children` array. If `Some(hash)`, recurse using that `hash` and the rest of the `NibblePath`. If `None` -> `None`.
* **`put(key, value)` (Insertion/Update):**
1. Convert `key` to `NibblePath`.
2. Start at the `root_hash`.
3. Recursively (`recursive_put`) traverse the trie similar to `get`.
4. When the correct position/node is reached, modify the trie structure. This is the complex part involving several cases:
* **Inserting into `Empty`:** Creates a new `Leaf` node.
* **Inserting into `Leaf`:**
* If keys match exactly: Update the value if different, creating a *new* Leaf node (with a new hash).
* If keys diverge: Replace the `Leaf` with a new `Branch` (and potentially an `Extension` if there was a common prefix). The original leaf data and the new data are placed appropriately within or below the new `Branch`.
* **Inserting into `Extension`:**
* If the path matches the extension fully: Recurse down to the `next_node_hash`.
* If the path diverges within the extension: Split the `Extension`. Create a new `Branch` at the divergence point, and potentially a new `Extension` leading to it.
* **Inserting into `Branch`:**
* If the path ends here: Update the `value` of the `Branch`.
* If the path continues: Recurse down the appropriate child path (creating it if it's `None`).
5. **Crucially:** Whenever a node is modified or created, it's serialized and stored (`store_node`), generating a *new hash*. This new hash is returned upwards.
6. Parent nodes (Branches/Extensions) update their `children` or `next_node_hash` with the *new* hash received from the recursive call. This process repeats up to the root.
7. The final hash returned by the top-level `recursive_put` call becomes the *new* `root_hash` of the entire trie.
**Example Walkthrough (from `main`):**
1. **`trie.put(b"dog", b"puppy")`**: `dog` -> `[6,4,6,f,6,7]`. Trie is likely empty. Creates a `Leaf` node, stores it, updates `root_hash`.
2. **`trie.put(b"doge", b"coin")`**: `doge` -> `[6,4,6,f,6,7,6,5]`.
* Traverses to the `Leaf` for "dog".
* Finds common prefix `[6,4,6,f,6,7]`.
* Replaces the `Leaf` node:
* A new `Branch` node is created.
* The `Branch` gets the value `"puppy"` (since "dog" terminates here).
* The `children[6]` slot (for the 'e' nibble) points to a *new* `Leaf { path: [5], value: "coin" }`.
* If the original "dog" Leaf wasn't the direct root, an `Extension` node might be created/updated to point to this new `Branch`.
* New nodes are stored, hashes propagate up, `root_hash` changes.
3. **`trie.put(b"do", b"verb")`**: `do` -> `[6,4,6,f]`.
* Traverses the trie. Let's assume it encounters an `Extension` node with path `[6,4,6,f,6,7]` (leading to the "dog"/"doge" structure).
* The path "do" (`[6,4,6,f]`) diverges *within* this extension path.
* The `Extension` is split:
* A new `Branch` node is created representing the divergence point after `[6,4,6,f]`.
* This `Branch` gets the value `Some("verb")` (since "do" terminates here).
* Its `children[6]` slot (for the next nibble 'g' in "dog") points to a *new* node (likely another Extension or Branch) representing the rest of the original "dog"/"doge" structure (e.g., `Extension { path: [7], next_node_hash: HASH_OF_DOG_BRANCH }`).
* If the original split Extension wasn't the root, a *new* `Extension` node might be created with path `[6,4,6,f]` pointing to the new `Branch`.
* Again, new nodes, new hashes, new `root_hash`.
This process of node creation, splitting, and hash propagation ensures the trie remains consistent and the `root_hash` accurately reflects its entire state after each modification.