Unlike some other clients, Erigon stores the state in plain form, i.e. keyed by the pre-images of the MPT keys. The MPT root calculation takes place in a different stage of block execution/processing.
So, once an Erigon node is synced, its database can be used to generate and export a file containing these pre-images in sorted order, such that the order coincides with the order of keys in the MPT.
APPROACH 1
While walking over all the entries in the table called PlainState, insert the following into an ETL collector:
for each account: {key: hash(acc), val: acc}
for each storage slot i of that account: {key: hash(acc) + hash(storage_slot_i), val: storage_slot_i}
After exiting the loop, iterate over the ETL collector and push the val entries to the file, as is.
The ETL collector referenced above is a library used within Erigon that sorts entries by key prefix, i.e. lexicographically. It can be replaced with any mechanism (in-memory or file-backed) that sorts large datasets.
Here, we sort by the key (not the val) of the entries, so the end result is ordered according to the hashes of the leaves in the MPT.
Doing the above, I was able to get a file of around 40 GB containing all accounts and storage slots in order. Of course, this leads to quite a bit of duplication of storage slot pre-images, since many contracts use the same storage slots.
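The steps of APPROACH 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Erigon's code: the input is modelled as hypothetical (address, slot) pairs where slot is None for an account entry, an in-memory list with sort() stands in for the ETL collector, and stand_in_hash uses SHA3-256 as a placeholder because Python's standard library does not ship Ethereum's Keccak-256 (the two produce different digests).

```python
import hashlib
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical entry shape: (address, slot), with slot=None for account entries.
Entry = Tuple[bytes, Optional[bytes]]


def stand_in_hash(data: bytes) -> bytes:
    # Placeholder only: Ethereum uses Keccak-256, which is NOT the same as
    # NIST SHA3-256. Swap in a real Keccak-256 for actual use.
    return hashlib.sha3_256(data).digest()


def ordered_preimages(
    plain_state: Iterable[Entry],
    hash_fn: Callable[[bytes], bytes] = stand_in_hash,
) -> Iterator[bytes]:
    """Yield pre-images (the val of each entry) ordered by their hashed key."""
    collector = []  # in-memory stand-in for the ETL collector
    for address, slot in plain_state:
        if slot is None:
            # account entry: {key: hash(acc), val: acc}
            collector.append((hash_fn(address), address))
        else:
            # storage entry: {key: hash(acc) + hash(slot), val: slot}
            collector.append((hash_fn(address) + hash_fn(slot), slot))
    # Sort by key only (not val); bytes compare lexicographically.
    collector.sort(key=lambda entry: entry[0])
    for _, val in collector:
        yield val
```

Because hash(acc) is a prefix of hash(acc) + hash(storage_slot_i), each account's pre-image is emitted immediately before the pre-images of its own storage slots.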
APPROACH 2
Instead of putting both accounts and storage in a single file, use two separate files, say account_pre_images.dat and storage_pre_images.dat.
Here, we use two different ETL collectors - one for accounts, one for storage.
So, we loop through the PlainState table and insert account entries into Acc_ETL_collector and storage entries into Storage_ETL_Collector.
After this, loop through each of the ETL collectors and push the keys to the respective files.
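A minimal sketch of the two-collector variant follows, with the same caveats as any illustration: the (address, slot) input shape is hypothetical, dictionaries stand in for the two ETL collectors, and stand_in_hash is a SHA3-256 placeholder for Keccak-256. It also assumes that the storage collector is keyed by hash(slot) alone, so that identical slot keys used by different contracts collapse into one entry; the text does not spell this out.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical entry shape: (address, slot), with slot=None for account entries.
Entry = Tuple[bytes, Optional[bytes]]


def stand_in_hash(data: bytes) -> bytes:
    # Placeholder only: Ethereum uses Keccak-256, not NIST SHA3-256.
    return hashlib.sha3_256(data).digest()


def split_preimages(
    plain_state: Iterable[Entry],
    hash_fn: Callable[[bytes], bytes] = stand_in_hash,
) -> Tuple[List[bytes], List[bytes]]:
    """Return (account pre-images, storage pre-images), each sorted by hash."""
    acc_collector: Dict[bytes, bytes] = {}      # stand-in for Acc_ETL_collector
    storage_collector: Dict[bytes, bytes] = {}  # stand-in for Storage_ETL_Collector
    for address, slot in plain_state:
        if slot is None:
            acc_collector[hash_fn(address)] = address
        else:
            # Assumption: keyed by hash(slot) only, so repeated slot keys
            # across contracts are stored once.
            storage_collector[hash_fn(slot)] = slot
    accounts = [acc_collector[k] for k in sorted(acc_collector)]
    slots = [storage_collector[k] for k in sorted(storage_collector)]
    return accounts, slots
```

Writing each returned list to its own file, entry after entry, would give the two .dat files described here.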
This approach brings down the overall size of the file(s) as follows:
account_pre_images.dat: 4.8 GB
storage_pre_images.dat: 22.6 GB
One small drawback of APPROACH 2 is that the file cannot be imported as is. For importing, the mechanism could be as follows:
Walk through account_pre_images.dat and, for each key (20 bytes), look up in the database whether it has storage. If so,
find in storage_pre_images.dat the pre-image of the key(s) using a binary search with hash(val_in_file) as the comparator,
treating storage_pre_images.dat as a sorted list with the i-th entry being {key: hash(storage_pre_image_i), val: storage_pre_image_i}.
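The binary-search lookup can be sketched as below. The sketch assumes the file is a flat sequence of fixed-width 32-byte pre-images already sorted by their hash; the actual on-disk layout of the .dat files is not described here, so record_len and find_preimage are hypothetical. As before, stand_in_hash is a SHA3-256 placeholder for Keccak-256.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Optional

SLOT_LEN = 32  # assumed fixed record width of a storage pre-image


def stand_in_hash(data: bytes) -> bytes:
    # Placeholder only: Ethereum uses Keccak-256, not NIST SHA3-256.
    return hashlib.sha3_256(data).digest()


def find_preimage(
    f: BinaryIO,
    hashed_key: bytes,
    record_len: int = SLOT_LEN,
    hash_fn: Callable[[bytes], bytes] = stand_in_hash,
) -> Optional[bytes]:
    """Binary-search a file of pre-images sorted by hash(pre-image)."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // record_len  # search window in record indices
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * record_len)
        candidate = f.read(record_len)
        candidate_hash = hash_fn(candidate)  # hash(val_in_file) is the comparator
        if candidate_hash == hashed_key:
            return candidate
        if candidate_hash < hashed_key:
            lo = mid + 1
        else:
            hi = mid
    return None  # no pre-image with that hash in the file
```

Each probe hashes one record, so a lookup costs on the order of log2(n) hash computations and seeks rather than a scan of the whole file.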
Before being able to generate the pre-images you must have a node synced with Erigon-2 (any of the v2.60.x releases). Check out erigon.gitbook.io for chain-specific guides. Let's say your node is synced at the location /home/coolUser/mainnet.
This directory will have folders like chaindata, downloader etc.; we're interested in the chaindata folder, which holds the database of all accounts.
Check out the current development branch for the Erigon-2 based implementation and build the verkle artifact:
git clone --branch verkle-kaustinen-2 https://github.com/erigontech/erigon
cd erigon
make verkle
./build/bin/verkle --datadir /home/coolUser/mainnet/chaindata --action dump_preimages
That's it. You should see four .dat files in the current directory containing the pre-images as well as the hashed keys for accounts and storage.