## Generating state pre-images with Erigon
Unlike some other clients, Erigon stores the plain state (i.e. the pre-images of the hashed trie keys). The MPT root calculation takes place in a separate stage of block execution/processing.
So, once an Erigon node is synced, its database can be used to generate and export a file containing these pre-images sorted by their hashed keys. This order therefore coincides with the order of keys in the MPT.
**APPROACH 1**
While walking over all the entries in the table called PlainState, do:
- If the key (which is also the account address) has no storage slots, push it to the ETL collector as is, with `{key: hash(acc), val: acc}`
- If the key has storage slots, push each of the account's storage slots into the same ETL collector, with `{key: hash(acc) + hash(storage_slot_i), val: storage_slot_i}`
After the walk finishes, iterate over the ETL collector and write each `val` entry to the output file as is.
The ETL collector referenced above is a library used within Erigon that sorts entries by key, byte-wise (i.e. lexicographically). It can be replaced with any mechanism, in-memory or file-based, that can sort large datasets.
Here, we sort by the `key` (not the `val`) of the entries, so the output follows the order of the hashed keys of the MPT leaves.
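The walk-sort-emit loop above can be sketched in Python. This is a minimal, in-memory sketch, not Erigon's implementation: a Python list sorted by key stands in for the ETL collector, each PlainState entry is assumed to be either a bare 20-byte account address or an `(address, storage_slot)` pair, and the hypothetical helper `h` uses SHA3-256 as a stand-in for Ethereum's Keccak-256 (the two differ in padding, so real MPT keys need a proper Keccak-256).

```python
import hashlib
from typing import Iterable, List, Tuple, Union

# Assumption: SHA3-256 stands in for Ethereum's Keccak-256. They differ in
# padding, so swap in a real Keccak-256 to reproduce actual MPT keys.
def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

# Hypothetical PlainState entry model: a bare 20-byte account address,
# or an (address, 32-byte storage slot) pair for one storage entry.
Entry = Union[bytes, Tuple[bytes, bytes]]


def dump_preimages_single(plain_state: Iterable[Entry]) -> List[bytes]:
    """Approach 1: one collector keyed by hashed path; values are pre-images."""
    collector: List[Tuple[bytes, bytes]] = []  # stand-in for the ETL collector
    for entry in plain_state:
        if isinstance(entry, bytes):
            # Account entry: {key: hash(acc), val: acc}
            collector.append((h(entry), entry))
        else:
            # Storage entry: {key: hash(acc) + hash(slot), val: slot}
            acc, slot = entry
            collector.append((h(acc) + h(slot), slot))
    # Sort by key only (never by val), as the ETL collector does.
    collector.sort(key=lambda kv: kv[0])
    # Emit the values as is, in key order.
    return [val for _, val in collector]
```

Because `h(acc)` is a prefix of `h(acc) + h(slot)`, a plain byte-wise sort places each account directly before its own storage slots, which is what makes the output follow the MPT's key order. For mainnet-sized data, the in-memory `sort` would be replaced by an external, file-backed sort.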
Following Approach 1, I got a single file of around 40 GB containing all accounts and storage slots in order. However, this duplicates many storage slot pre-images, since many contracts use the same storage slots.
**APPROACH 2**
Instead of putting both accounts and storage in a single file, use two separate files, say `account_pre_images.dat` and `storage_pre_images.dat`.
Here, we use two ETL collectors: one for accounts and one for storage.
We then walk the PlainState table and:
- Push account address to `Acc_ETL_collector`
- Push corresponding storage slots to `Storage_ETL_Collector`
After this, iterate over each ETL collector and write the collected pre-images to the respective files.
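A similar sketch for Approach 2, again with a hypothetical helper `h` using SHA3-256 in place of Keccak-256, and each PlainState entry modelled as either an address or an `(address, slot)` pair. It additionally assumes that each collector is keyed by the hash of its own pre-image and that storage pre-images are de-duplicated by that hash; the size figures below suggest de-duplication, but that is an inference, not something stated by the implementation. Python dicts stand in for the two ETL collectors.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple, Union


def h(data: bytes) -> bytes:
    # Assumption: SHA3-256 stands in for Ethereum's Keccak-256.
    return hashlib.sha3_256(data).digest()


# Hypothetical PlainState entry model: address, or (address, slot).
Entry = Union[bytes, Tuple[bytes, bytes]]


def dump_preimages_split(
    plain_state: Iterable[Entry],
) -> Tuple[List[bytes], List[bytes]]:
    """Approach 2: separate account and storage collectors."""
    acc_collector: Dict[bytes, bytes] = {}      # hash(acc) -> acc
    storage_collector: Dict[bytes, bytes] = {}  # hash(slot) -> slot
    for entry in plain_state:
        if isinstance(entry, bytes):
            acc_collector[h(entry)] = entry
        else:
            _acc, slot = entry
            # Keying by hash(slot) alone collapses slots shared across
            # contracts into one entry (assumed de-duplication).
            storage_collector[h(slot)] = slot
    # Emit each collector's values in hashed-key order.
    accounts = [acc_collector[k] for k in sorted(acc_collector)]
    storage = [storage_collector[k] for k in sorted(storage_collector)]
    return accounts, storage
```

If the records are written as raw concatenated bytes, each file is fixed-width (20 bytes per account, 32 bytes per storage slot) and can be indexed by position, which a binary-search import relies on.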
Approach 2 brings the total size down as follows:
- `account_pre_images.dat`: 4.8 GB
- `storage_pre_images.dat`: 22.6 GB
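For comparison with the roughly 40 GB single file from Approach 1 (itself an approximate figure, so the ratio is approximate too):

```latex
4.8\,\text{GB} + 22.6\,\text{GB} = 27.4\,\text{GB}, \qquad
1 - \frac{27.4\,\text{GB}}{40\,\text{GB}} \approx 0.315
```

i.e. roughly a 30% reduction in total size.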
One small drawback of APPROACH 2 is that the files cannot be imported as is. For importing, the mechanism could be as follows:
Walk through `account_pre_images.dat` and, for each key (20 bytes), look up in the database whether the account has storage. If so, recover the pre-images of its storage keys either by:
- binary-searching `storage_pre_images.dat` for the pre-image of each key, using `hash(val_in_file)` as the comparator
OR
- looking it up in a database of pre-images built from the values in `storage_pre_images.dat`, with the `i`-th entry being `{key: hash(storage_pre_image_i), val: storage_pre_image_i}`
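Both lookup options can be sketched in Python. This is a minimal sketch assuming `storage_pre_images.dat` is a raw concatenation of 32-byte pre-images sorted by their hash, and that the database yields hashed storage keys for each account; `h`, `SLOT_SIZE`, `find_slot_preimage` and `build_preimage_index` are hypothetical names, and `h` uses SHA3-256 as a stand-in for Keccak-256.

```python
import hashlib
from typing import Dict, Optional

SLOT_SIZE = 32  # assumed: raw 32-byte pre-images, concatenated


def h(data: bytes) -> bytes:
    # Assumption: SHA3-256 stands in for Ethereum's Keccak-256.
    return hashlib.sha3_256(data).digest()


def find_slot_preimage(data: bytes, hashed_slot: bytes) -> Optional[bytes]:
    """Option 1: binary search, comparing against hash(val_in_file)."""
    lo, hi = 0, len(data) // SLOT_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        record = data[mid * SLOT_SIZE:(mid + 1) * SLOT_SIZE]
        digest = h(record)
        if digest == hashed_slot:
            return record
        if digest < hashed_slot:
            lo = mid + 1
        else:
            hi = mid
    return None  # not present in the file


def build_preimage_index(data: bytes) -> Dict[bytes, bytes]:
    """Option 2: {hash(storage_pre_image_i): storage_pre_image_i}."""
    index: Dict[bytes, bytes] = {}
    for off in range(0, len(data) // SLOT_SIZE * SLOT_SIZE, SLOT_SIZE):
        pre_image = data[off:off + SLOT_SIZE]
        index[h(pre_image)] = pre_image
    return index
```

For a file of tens of GB, `data` could be an `mmap.mmap` of the file opened read-only (slicing an mmap returns `bytes`), so each lookup hashes only O(log n) records; the index option trades a one-off build step and extra storage for constant-time lookups.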
## How To
1. Before you can generate the pre-images, you must have a node synced with Erigon-2 (any of the v2.60.x releases). Check out erigon.gitbook.io for chain-specific guides. Let's say your node is synced at `/home/coolUser/mainnet`.
This directory contains folders such as `chaindata` and `downloader`; we're interested in the `chaindata` folder, which holds the database of all accounts.
2. Check out the current development branch of the Erigon-2-based implementation and build the `verkle` binary:
```
git clone --branch verkle-kaustinen-2 https://github.com/erigontech/erigon
cd erigon
make verkle
```
3. Run the `verkle` command-line tool:
```
./build/bin/verkle --datadir /home/coolUser/mainnet/chaindata --action dump_preimages
```
That's it: you should now see four `.dat` files in the current directory, containing the pre-images as well as the hashed keys for accounts and storage.