## Generating state pre-images with Erigon

Unlike some other clients, Erigon stores the state in plain form, i.e. keyed by the pre-images of the keys rather than by their hashes. The MPT root calculation takes place in a different stage of block execution/processing. So, once an Erigon node is synced, its database can be used to generate and export a file containing these pre-images, sorted by their hashes. This order thus coincides with the order of keys in the MPT.

**APPROACH 1**

While walking over all the entries in the table called `PlainState`, do:

- If the key (which is also the account address) doesn't have any storage slots, push it to the ETL collector, as is, with `{key: hash(acc), val: acc}`
- If the key has storage slots, push the storage slots of the account to the same ETL collector, each with `{key: hash(acc) + hash(storage_slot_i), val: storage_slot_i}`

After exiting the loop, iterate over the ETL collector and write the `val` entries to the file, as is.

The ETL collector referenced above is a library used within Erigon that sorts entries by key in lexicographic order. It can be replaced with any mechanism (in-memory or file-based) that sorts large datasets. Here, we sort by the `key` (not the `val`) of the entries, so the end result is ordered according to the hashes within the leaves of the MPT.

Doing the above, I was able to get a file of around 40 GB containing all accounts and storage slots in order. Of course, this leads to quite a bit of duplication of storage slot pre-images, since many contracts use the same storage slots.

**APPROACH 2**

Instead of putting both accounts and storage in a single file, use two separate files, say `account_pre_images.dat` and `storage_pre_images.dat`. Here, we use two different ETL collectors: one for accounts, one for storage.
So, we loop through the `PlainState` table and:

- Push the account address to `Acc_ETL_collector`
- Push the corresponding storage slots to `Storage_ETL_Collector`

After this, iterate over each of the ETL collectors and write the entries to the respective files.

This approach brings down the overall size of the file(s) as follows:

- `account_pre_images.dat`: 4.8 GB
- `storage_pre_images.dat`: 22.6 GB

One small drawback of APPROACH 2 is that the files cannot be imported as is. For importing, the mechanism could be as follows: walk through `account_pre_images.dat` and, for each key (20 bytes), look up in the database whether the account has storage. If so, either:

- Find the pre-image of the key(s) in `storage_pre_images.dat` using a binary search with `hash(val_in_file)` as the comparator, OR
- Look it up in a database of pre-images created from the values in `storage_pre_images.dat`, with the `i`-th entry being `{key: hash(storage_pre_image_i), val: storage_pre_image_i}`

## How To

1. Before being able to generate the pre-images, you must have a synced node with Erigon-2 (any of the v2.60.x releases). Check out erigon.gitbook.io for chain-specific guides. Let's say your node is synced at the location `/home/coolUser/mainnet`. This directory will have folders like `chaindata`, `downloader`, etc. We're interested in the `chaindata` folder, which has the database of all accounts.

2. Check out the current development branch for the Erigon-2 based implementation and build the `verkle` artifact:

   ```
   git clone --branch verkle-kaustinen-2 https://github.com/erigontech/erigon
   cd erigon
   make verkle
   ```

3. Run the `verkle` command line tool:

   ```
   ./build/bin/verkle --datadir /home/coolUser/mainnet/chaindata --action dump_preimages
   ```

That's it. You should see four `.dat` files in the current directory, containing the pre-images as well as the hashed keys for accounts and storage.
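
As an aside, the sorting idea behind APPROACH 1 and APPROACH 2, and the binary-search lookup suggested for importing, can be sketched in a few lines of Python. This is only an illustration, not Erigon's code. The function names (`approach_1`, `approach_2`, `find_preimage`) are hypothetical, a plain in-memory sort stands in for the ETL collector, and SHA-256 stands in for Keccak-256 so that the sketch runs on the standard library alone. It also assumes that `+` in the key layout means byte concatenation and that identical storage slots collapse into a single entry in the storage file, which is my reading of why the second approach produces smaller files.

```python
import hashlib
from typing import Callable, List, Optional, Sequence, Tuple

HashFn = Callable[[bytes], bytes]
# Toy stand-in for the PlainState table: (account address, its storage slots).
PlainState = Sequence[Tuple[bytes, Sequence[bytes]]]


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: SHA-256 is only a stand-in so the sketch needs no extra packages.
    # Ethereum hashes keys with Keccak-256 (which is not hashlib.sha3_256).
    return hashlib.sha256(data).digest()


def approach_1(state: PlainState, h: HashFn = stand_in_hash) -> List[bytes]:
    """Single stream: sort entries by hashed key, emit only the pre-images."""
    entries = []
    for acc, slots in state:
        if not slots:
            entries.append((h(acc), acc))
        for slot in slots:
            # '+' from the text is read as byte concatenation (assumption).
            entries.append((h(acc) + h(slot), slot))
    entries.sort(key=lambda kv: kv[0])  # plays the role of the ETL collector
    return [val for _, val in entries]


def approach_2(state: PlainState, h: HashFn = stand_in_hash) -> Tuple[List[bytes], List[bytes]]:
    """Two streams: accounts and storage slots, each sorted by its own hash."""
    accounts = {acc for acc, _ in state}
    # ASSUMPTION: identical slots across contracts are stored once.
    slots = {slot for _, acc_slots in state for slot in acc_slots}
    return sorted(accounts, key=h), sorted(slots, key=h)


def find_preimage(sorted_preimages: Sequence[bytes], target_hash: bytes,
                  h: HashFn = stand_in_hash) -> Optional[bytes]:
    """Binary search over pre-images, comparing on hash(val_in_file)."""
    lo, hi = 0, len(sorted_preimages)
    while lo < hi:
        mid = (lo + hi) // 2
        if h(sorted_preimages[mid]) < target_hash:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_preimages) and h(sorted_preimages[lo]) == target_hash:
        return sorted_preimages[lo]
    return None
```

With a toy state of three accounts where two contracts share a slot, `approach_1` emits that slot twice, while `approach_2` stores it once and relies on `find_preimage` to recover it from its hash during import.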