Generating state pre-images with Erigon

Unlike some other clients, Erigon stores the plain state, i.e. the pre-images of the keys used in the state trie (MPT). The MPT root calculation takes place in a separate stage of block execution/processing.

So, once an Erigon node is synced, its database can be used to generate and export a file containing these pre-images in sorted order. This order thus coincides with the order of keys in the MPT.

APPROACH 1
While walking over all the entries in the table called PlainState, do:

  • If the key (which is also the account address) doesn't have any storage slots, push it to the ETL collector, as is, with {key: hash(acc), val: acc}
  • If the key has storage slots, push the storage slots of the account in the same ETL collector, each with {key: hash(acc) + hash(storage_slot_i), val: storage_slot_i}

After exiting the loop, loop again through the ETL collector and push the val entries to file, as is.

The ETL collector referenced above is a library used within Erigon that sorts entries by key in byte-wise (lexicographic) order. It can be replaced with any mechanism (in-memory or file-based) that sorts large datasets.

Here, we sort by the key (not the val) of the entries, thus the end result is sorted according to the hashes within the leaves in the MPT.
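
A minimal in-memory sketch of the loop above, not Erigon's actual ETL code. Several things here are assumptions for illustration: each table entry is modeled as an (address, slot) pair with slot set to None for an account entry, the names h and ordered_preimages are hypothetical, and h uses SHA-256 only as a standard-library stand-in for the Keccak-256 hash used by the MPT.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash (assumption): SHA-256 from the stdlib.
    # The real MPT keys are Keccak-256 hashes.
    return hashlib.sha256(data).digest()


def ordered_preimages(entries):
    """Return the pre-images ordered by their hashed keys.

    entries: iterable of (address, slot); slot is None for an account entry.
    """
    collected = []  # plays the role of the ETL collector
    for address, slot in entries:
        if slot is None:
            # {key: hash(acc), val: acc}
            collected.append((h(address), address))
        else:
            # {key: hash(acc) + hash(storage_slot_i), val: storage_slot_i}
            collected.append((h(address) + h(slot), slot))
    # Sort by the key (not the val), byte-wise.
    collected.sort(key=lambda kv: kv[0])
    return [val for _, val in collected]


# Hypothetical sample data: one plain account, one contract with two slots.
entries = [
    (b"\x01" * 20, None),
    (b"\x02" * 20, None),
    (b"\x02" * 20, b"\x00" * 32),
    (b"\x02" * 20, b"\x01" * 32),
]
preimages = ordered_preimages(entries)
```

Because hash(acc) is a prefix of every hash(acc) + hash(storage_slot_i) key, byte-wise sorting places each account directly before its own storage slots.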

Doing the above, I was able to get a file size of around 40GB containing the whole ordered account and storage slots. Of course, this leads to quite a bit of duplication of storage slot pre-images, since quite a lot of contracts use the same storage slots.

APPROACH 2
Instead of putting both accounts and storage in a single file, use two separate files, say account_pre_images.dat and storage_pre_images.dat.
Here, we use two different ETL collectors: one for accounts, one for storage.
So, we loop through the PlainState table and:

  • Push account address to Acc_ETL_collector
  • Push corresponding storage slots to Storage_ETL_Collector

After this, loop through each of the ETL collectors and push the keys to the respective files.
This approach brings down the overall size of the file(s) as follows:

  • account_pre_images.dat: 4.8 GB
  • storage_pre_images.dat: 22.6 GB
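
A minimal sketch of the two-collector variant, again in memory and not Erigon's code. It assumes the storage collector is keyed by hash(storage_slot) alone, so a slot shared by many contracts is stored only once; that keying, the (address, slot) entry layout, the function names, and the use of SHA-256 as a stand-in for Keccak-256 are all assumptions for illustration.

```python
import hashlib
import io


def h(data: bytes) -> bytes:
    # Stand-in hash (assumption): SHA-256 instead of Keccak-256.
    return hashlib.sha256(data).digest()


def split_preimages(entries):
    """Collect account and storage pre-images separately, each sorted by hash."""
    accounts = {}  # stands in for Acc_ETL_collector
    storage = {}   # stands in for Storage_ETL_Collector
    for address, slot in entries:
        if slot is None:
            accounts[h(address)] = address
        else:
            # Keyed by the slot hash only, so repeated slots collapse to one entry.
            storage[h(slot)] = slot
    sorted_accounts = [accounts[k] for k in sorted(accounts)]
    sorted_storage = [storage[k] for k in sorted(storage)]
    return sorted_accounts, sorted_storage


def write_records(out, records):
    # Records are written back to back with no separator.
    for record in records:
        out.write(record)


# Hypothetical sample data: two contracts that both use slot 0.
A, B = b"\x0a" * 20, b"\x0b" * 20
S0, S1 = b"\x00" * 32, b"\x01" * 32
entries = [(A, None), (A, S0), (B, None), (B, S0), (B, S1)]

acc, sto = split_preimages(entries)
account_file, storage_file = io.BytesIO(), io.BytesIO()  # in place of the .dat files
write_records(account_file, acc)
write_records(storage_file, sto)
```

In this sample, three storage entries shrink to two records because slot 0 appears under both contracts.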

One small drawback of APPROACH 2 is that the file cannot be imported as is. For importing, the mechanism could be as follows:

Walk through account_pre_images.dat and, for each key (20 bytes), look up in the database whether it has storage. If so, either:

  • Look up the pre-image of the key(s) in storage_pre_images.dat using a binary search with hash(val_in_file) as the comparator
    OR
  • Look up in a database of pre-images created from the values in the file storage_pre_images.dat, with the i-th entry being {key: hash(storage_pre_image_i), val: storage_pre_image_i}
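
The binary-search option can be sketched as follows. This assumes the storage file is a plain concatenation of fixed-width 32-byte pre-images sorted by their hash; the real .dat layout may differ. The names find_preimage and RECORD_SIZE are hypothetical, and SHA-256 again stands in for Keccak-256.

```python
import hashlib

RECORD_SIZE = 32  # assumed width of one storage pre-image record


def h(data: bytes) -> bytes:
    # Stand-in hash (assumption): SHA-256 instead of Keccak-256.
    return hashlib.sha256(data).digest()


def find_preimage(blob: bytes, target_hash: bytes):
    """Binary search over records sorted by hash(record).

    Returns the record whose hash equals target_hash, or None if absent.
    """
    lo, hi = 0, len(blob) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        record = blob[mid * RECORD_SIZE:(mid + 1) * RECORD_SIZE]
        record_hash = h(record)  # hash(val_in_file) is the comparator
        if record_hash == target_hash:
            return record
        if record_hash < target_hash:
            lo = mid + 1
        else:
            hi = mid
    return None


# Hypothetical sample "file": 100 slots, ordered by their hash.
slots = [i.to_bytes(32, "big") for i in range(100)]
blob = b"".join(sorted(slots, key=h))

found = find_preimage(blob, h(slots[42]))
missing = find_preimage(blob, h(b"not a stored slot"))
```

Each probe hashes one record, so a lookup costs about log2(n) hash computations. For a file that does not fit in memory, the same slicing should work on a memory-mapped file instead of a bytes object.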

How To

  1. Before you can generate the pre-images, you need a node synced with Erigon-2 (any of the v2.60.x releases). Check out erigon.gitbook.io for chain-specific guides. Let's say your node is synced at the location /home/coolUser/mainnet
    This directory will have folders like chaindata, downloader etc.; we're interested in the chaindata folder, which has the database of all accounts.

  2. Check out the current development branch for the Erigon-2 based implementation and build the verkle artifact

git clone --branch verkle-kaustinen-2 https://github.com/erigontech/erigon
cd erigon
make verkle
  3. Run the verkle command line tool
./build/bin/verkle --datadir /home/coolUser/mainnet/chaindata --action dump_preimages 

That's it; you should see four .dat files in the current directory containing the pre-images as well as the hashed keys for accounts and storage.