
Analysis and considerations on state rental

Note: this piece was compiled in Dec 2023

The problem – State size growth

In EVM-compatible chains, “state refers to data such as an account’s (or smart contract’s) balance, nonce, a contract’s bytecode, or the data contained in a contract’s storage cells” (source). A blockchain’s state constantly grows as the number of accounts, smart contracts, and transactions increases. L2s scaling Ethereum will process orders of magnitude more transactions than the base layer, and thus their state is expected to grow faster. The bigger the state, the higher the computing, storage, and cost requirements for full nodes to process the chain. This may result in slower processing times, fewer nodes, and potential centralization. Although most protocols cap the size of blocks through the gas limit, the cap is not a solution: it only limits the rate of growth and does not address state growth itself.

Blockchain nodes use three types of resources to process transactions: computation, bandwidth, and storage space. For the first two, a one-time transaction fee is adequate, as these resources become available again to process the next block. However, if storage space is occupied permanently, a one-time fee is not sufficient to ensure economic balance and to compensate full nodes for the continuous use of their storage space. This leads to a tragedy of the commons: users who face no restrictions or limitations deplete the system’s shared resource, which in this case is the storage space of full nodes.

When analyzing the problem of state growth, the focus should be on its effect on node performance and decentralization rather than on pure physical storage costs, which are expected to decline over the coming years anyway. State growth impacts users with consumer-grade hardware more than specialized and resource-rich block producers.

A potential solution – State rental

The rapid growth in state size is mostly due to the lack of pricing for storage duration. Good engineering (optimizing clients etc.) can only partially make up for the missing economics.

A potential solution for the state bloat problem is introducing state rental, i.e. a recurring rental fee paid for as long as data is stored on the blockchain. This way the users of the blockchain will pay for the cost of state storage instead of the full nodes. The amount of rent to be paid should be proportional to the size of the data and the period the data is stored on the chain, regardless of the frequency of accessing the data.

A state rental mechanism should also include the removal of unused data from the state for which no rent is paid. This improves the efficiency and performance of the blockchain and reduces the cost of participation for full nodes as they can sync faster and process transactions more quickly and with fewer resources. To maintain a good user experience, users must also be able to re-activate their accounts or smart contracts if they want to.

For this reason, and above all to maintain the blockchain’s integrity and security as well as the replicability and verifiability of its state, it is essential to ensure that the entire state and transaction history is available at all times through different sources. This is needed so that any node can do a full sync and rebuild the current state from genesis, so that any previous state (or part of it) can be verified and reinstated, and so that historical data can be served to smart contracts.

Who pays the state rental fee?

  1. Account owners: in this case, the owner of the account or smart contract is paying the fees for the state rental. In the case of user accounts the ownership is more evident but not so much in the case of smart contracts as they can function by design without the intervention of an “owner”. For smart contracts (dApps) managed by businesses or DAOs, ownership is clear. For smart contracts functioning as public goods the person who initially deployed the contract could be considered the owner but monitoring the rent status creates more complexity for developers.
  2. Users initiating/signing transactions (consuming the data stored): for user accounts (like wallet accounts) the account owner and the signer of the transaction are the same. In the case of smart contracts, users call a function of the contract, and hence the owner and the user are different.

Due to unclear ownership, option #2 may be more fit for purpose, but it would also make sense to enable smart contract “owners” to pay the rental fee themselves instead of the users if they want to. From a user’s perspective, this could be a valuable differentiator when deciding which dApp to use, and for smart contract owners, it gives flexibility to manage their operating costs and user incentives. However, there is a third alternative as well:

  3. Rental fees built into the tokenomics of the protocol: full node operators could be incentivized by the protocol through newly minted native tokens proportionally distributed based on their participation in processing state changes and validating new states (similar to Nervos Network).

Rental fee payment process

(As mentioned above the approaches partially overlap because the user and the owner are the same in the case of user accounts like wallets.)

  1. Rent paid by account/smart contract owner:
    The following parameters should be added to the account data stored in the state: Rent_Paid_Till and Rent_Balance. Account/smart contract owners can top up their balance as needed using a Pay_Rent operation. Rent_Paid_Till is the block number up to which rent is covered. If an account’s Rent_Paid_Till has already passed, the Rent_Balance (if any) could be automatically charged to extend coverage for a pre-defined number of blocks (e.g. equal to 6 months or 1 year).

  2. Rent paid through user transactions:
    In this case, rent tracking and computation could be at the level of individual leaf-nodes in the state trie (the concept is partially from IOVLabs). When a transaction is initiated, a rental fee is calculated for all leaf-nodes touched. A timestamp could be added to each leaf-node containing the time of the last rental fee payment, which shows how long the fee has been outstanding. The computation of rent could be based on bytes used, while the duration of storage could be measured in time (days) or epochs. Thus, the unit of rental fee can be defined as gas per byte per day or epoch.

    • Considering a transfer of some DAI from Alice to Bob, the transaction touches the leaf-nodes of the state trie related to the following: Alice’s account, the DAI smart contract bytecode, the storage cells containing application parameters, the cells containing the mapping of addresses to token balances, and finally the storage cells that contain Alice’s and Bob’s DAI balances. The rental fee is calculated for all leaf-nodes touched as per the above rental fee definition.
    • A maximum could be defined for the rental fee gas collected per unit of duration. Once it is reached, no more fees are collected from users. This could be combined with a simple Pay_Rent function allowing anyone to pay the rental fee for certain periods of storage.
    • The maximum rental fee per transaction (and per leaf-node) should also be defined to avoid paying a disproportionately high rental fee when initiating a transaction touching many leaf-nodes that have long outstanding balances.
    • Timestamps can also be used to decide which part of the state to delete from the full nodes.
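The per-leaf rent calculation and the two caps described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the rate and cap values, the `Leaf` fields, and the choice to track a "paid until" day instead of a raw last-payment timestamp are all hypothetical and not part of any existing protocol.

```python
from dataclasses import dataclass

# Hypothetical parameters, chosen only for illustration.
RENT_GAS_PER_BYTE_DAY = 1   # rental fee unit: gas per byte per day
MAX_RENT_PER_LEAF = 5_000   # cap on rent collected from one leaf-node per transaction
MAX_RENT_PER_TX = 12_000    # cap on rent collected in one transaction


@dataclass
class Leaf:
    size_bytes: int      # bytes occupied by this leaf-node
    paid_until_day: int  # day up to which rent has been paid


def charge_rent(touched: list[Leaf], today: int) -> int:
    """Collect rent for all touched leaves and return the total gas charged.

    Rent is paid in whole days. When a cap applies, the leaf's paid-until
    day only advances by the days actually paid, so the remaining debt
    stays outstanding for later transactions.
    """
    total = 0
    for leaf in touched:
        days_due = max(0, today - leaf.paid_until_day)
        rate = leaf.size_bytes * RENT_GAS_PER_BYTE_DAY  # gas per day for this leaf
        if days_due == 0 or rate == 0:
            continue
        due = days_due * rate
        budget = min(due, MAX_RENT_PER_LEAF, MAX_RENT_PER_TX - total)
        days_paid = budget // rate
        leaf.paid_until_day += days_paid
        total += days_paid * rate
    return total
```

With these example values, a transaction touching three 100-byte leaves that are each 100 days in arrears pays the per-leaf cap on the first two leaves and only the remaining per-transaction budget on the third.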

    Questions to consider:

    • What happens to the rent due for transactions that get reverted?
    • Should the rental fee be consumed from the gas limit or be added on top of it?
    • How should rental fees be distributed? There can be multiple approaches to this:
      • based on the proportion of a full node's participation in processing state changes,
      • in proportion to the full nodes’ staked amount (in a PoS protocol),
      • based on the uptime of full nodes in a certain period (full nodes with equal uptime receive an equal amount of fees regardless of whether they were proposing blocks in that particular period).
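All three distribution approaches listed above reduce to a pro-rata split over some per-node weight (state changes processed, staked amount, or uptime). A minimal sketch, assuming integer fee units and a hypothetical `distribute` helper, could look like this:

```python
def distribute(fees: int, weights: dict[str, int]) -> dict[str, int]:
    """Split `fees` among nodes in proportion to their weights.

    Uses integer arithmetic so no fee unit is lost or created. The units
    left over after integer division go to the nodes with the largest
    weights, with ties broken by node name to keep the result deterministic.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one node must have a positive weight")
    shares = {node: fees * w // total_weight for node, w in weights.items()}
    remainder = fees - sum(shares.values())
    ranked = sorted(weights, key=lambda n: (-weights[n], n))
    for node in ranked[:remainder]:
        shares[node] += 1
    return shares
```

Passing staked amounts as weights gives the stake-based variant, while passing equal weights for nodes with equal uptime gives the uptime-based variant.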

Deleting an account/smart contract

After a pre-defined period (for instance, a number of blocks or epochs equal to 1 year) for which no rental fee was paid either by the account owner or by any user transaction, the state of the account/contract is considered expired, and the account/contract data is deleted from the state stored by full nodes.
The same deletion could also be initiated through the SELFDESTRUCT opcode at any point in time to remove unnecessary data from the state and to reduce state size. In this case, any outstanding Rent_Balance would be returned to the address that made the latest rental fee payment/top-up.
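The owner-paid rent fields (Rent_Paid_Till, Rent_Balance, Pay_Rent) and the expiry and refund rules described here can be sketched together as a small account lifecycle. This is an illustration under assumptions, not a specification: the parameter values, the `last_payer` field, and the choice to extend coverage from the old Rent_Paid_Till (so that back rent is charged) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical parameters, chosen only for illustration.
RENT_PER_BLOCK = 2        # rent units charged per block of coverage
EXTENSION_BLOCKS = 1_000  # pre-defined extension period, in blocks
EXPIRY_BLOCKS = 2_000     # unpaid blocks tolerated before deletion


@dataclass
class Account:
    rent_paid_till: int = 0  # block number up to which rent is covered
    rent_balance: int = 0    # prepaid rent not yet converted into coverage
    last_payer: str = ""     # address that made the latest top-up


def pay_rent(state: dict[str, Account], addr: str, payer: str, amount: int) -> None:
    """Top up an account's prepaid rent balance (the Pay_Rent operation)."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    account = state[addr]
    account.rent_balance += amount
    account.last_payer = payer


def settle_rent(account: Account, current_block: int) -> bool:
    """Convert balance into coverage while coverage has lapsed.

    Returns True if the account is covered at `current_block` afterwards.
    """
    cost = RENT_PER_BLOCK * EXTENSION_BLOCKS
    while account.rent_paid_till < current_block and account.rent_balance >= cost:
        account.rent_balance -= cost
        account.rent_paid_till += EXTENSION_BLOCKS
    return account.rent_paid_till >= current_block


def sweep_expired(state: dict[str, Account], current_block: int) -> list[str]:
    """Settle all accounts, then delete and return those unpaid for too long."""
    expired = []
    for addr in sorted(state):
        account = state[addr]
        settle_rent(account, current_block)
        if current_block - account.rent_paid_till > EXPIRY_BLOCKS:
            expired.append(addr)
    for addr in expired:
        del state[addr]
    return expired


def self_destruct(state: dict[str, Account], addr: str) -> tuple[str, int]:
    """Delete an account on request and return (refund address, refund amount)."""
    account = state.pop(addr)
    return account.last_payer, account.rent_balance
```

A real design would also have to record a commitment to the deleted data so that the account can be reinstated later, which this sketch leaves out.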

Reinstating an account/smart contract

It is very important to ensure deleted accounts can be restored whenever needed. There could be different ways to do it:

  1. The deletion could leave a “stub” as a commitment to the state of the account/contract at the time of deletion. Using that stub the previous state could be restored in another contract.
  2. Any account/contract could be reinstated by submitting a Merkle proof proving the state of the contract at the time of deletion.
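To illustrate the Merkle proof option, the sketch below builds a simple binary Merkle tree and checks an inclusion proof against its root. It is a deliberate simplification: Ethereum uses a Merkle-Patricia trie with Keccak-256, whereas this example assumes a plain binary tree with SHA-256, and all function names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels from the leaf hashes up to the root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaves
    levels = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        levels.append(level)
        level = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix marks inner nodes
                 for i in range(0, len(level), 2)]
    levels.append(level)  # the final level holds only the root
    return levels


def merkle_root(leaves: list[bytes]) -> bytes:
    return build_levels(leaves)[-1][0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Return (sibling hash, sibling_is_left) pairs from the leaf to the root."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a leaf and its proof and compare."""
    node = _h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root
```

In a reinstatement flow, the root would be the state commitment known to the network at the time of deletion, and `verify` would gate whether the submitted account data is accepted back into the state.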

In my view, any final solution should prioritize UX and user-friendliness above all, and make sure that ordinary, non-technical users can also easily reinstate their accounts.

Requirements for rebuilding the L2 state are very similar to those of L1 (source):

  1. Data retention policies should be agreed upon by all clients (if multiple)
  2. Public archives with full historical data should be available at all times
  3. Cryptographic proofs of the historical blocks (i.e. headers) should remain in the network: By retaining proofs of block ancestry, historical chain segments can be retrieved from arbitrary untrusted sources
    • Taking Ethereum as an example and considering that the size of a header is independent of the transactions included in the block, the growth rate due to keeping headers indefinitely is constant. According to a rough estimate by Peter Szilágyi (Geth), this means a storage growth of 1.164 GB per year, which may be an acceptable trade-off.

Based on the above:

  • A protocol-incentivized, decentralized network of archive nodes should be available as the ultimate source of truth to maintain network integrity and security, and to allow trustless verification when the state needs to be rebuilt from genesis.
  • Third-party data sources storing the entire state and transaction history are expected to emerge, using decentralized storage solutions (IPFS, or L2 storage networks similar to EthStorage for Ethereum, or other DA solutions). The archive nodes allow these data sources to be verified in a trustless way, for instance through regular data sampling, or through ZKPs attached to data provided by these third-party sources.

Alternative partial solution: Weak statelessness

Weak statelessness could also allow L2 validators not to store the full state anymore. Instead, the block builders would send the proposed block and include witnesses (proofs of validity/Merkle proofs) with the block, i.e. the relevant pieces of state that need to be looked at and updated by the validator. The assumption is that – similar to Ethereum – sequencers on L2s are likely to outsource block-building to specialized builders who are well-resourced.

For this, L2s need to store data in a format (e.g. Verkle tree on Ethereum) that allows builders to send the necessary witnesses to the sequencers easily. In the end, it is a partial solution and only shifts the burden from sequencers (L2 full nodes) to block builders. Trustless sources of transaction and state history still need to be available to allow verification of data received from builders.

Challenges and questions to consider:

  • How high should rental fees be? And should a separate storage gas price be defined, similar to data gas fees based on blob supply/demand? Coming up with a realistic fee and testing it on mainnet may be challenging. Test periods on a testnet may not provide meaningful insights into the real economic impact on users and other network participants.
  • Deletion of data might also require a transaction to be initiated by someone. Who should this be, and who should pay the transaction fee for it? Block proposers could include a list of accounts/smart contracts to be deleted in the block they propose, and the protocol could incentivize them for keeping the state compact.
  • How long an unpaid period should the protocol tolerate before initiating deletion? If it is too short, the number of reinstatements is expected to be high (potentially including multiple reinstatements of the same account state). If it is too long, the reduction in state size may be small, and the mechanism would solve the problem only to a limited extent.
  • DApps may consist of several smart contracts that are interdependent. Deletion of some smart contract data could break certain functions of dApps.
  • Token contracts (e.g. ERC-20) hold the data for all holder balances and account mappings. This makes it more complex to decide who should cover the rental fees.
  • What happens to the nonce when a user account is reinstated? Does it reset to 0? How do we prevent previous transactions from being “replayed”?
  • To prevent a given state from being revived twice, the protocol needs to track whether an account/contract was previously reinstated. If it was already deleted and reinstated once, then the Merkle proof for a second reinstatement should point to the state at the time of the second deletion.
  • New attack vectors, such as the risk of unauthorized state deletion, should be explored and mitigated.
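One way to approach the double-revival question above is to keep a single commitment per deleted account and consume it on reinstatement. The sketch below is only an assumption-laden illustration: `StubRegistry` is a hypothetical structure, and a plain SHA-256 hash of the serialized account state stands in for a Merkle proof.

```python
import hashlib


class StubRegistry:
    """Keeps one commitment ("stub") per currently deleted account."""

    def __init__(self) -> None:
        self._stubs: dict[str, bytes] = {}

    def delete(self, addr: str, state: bytes) -> None:
        """Record a commitment to the account state at deletion time."""
        self._stubs[addr] = hashlib.sha256(state).digest()

    def reinstate(self, addr: str, state: bytes) -> bool:
        """Accept `state` only if it matches the latest deletion stub.

        The stub is consumed on success, so the same state cannot be
        revived a second time.
        """
        if self._stubs.get(addr) != hashlib.sha256(state).digest():
            return False
        del self._stubs[addr]
        return True
```

Because the committed state includes the nonce in this sketch, a reinstated account would continue from its old nonce rather than resetting to 0, which is one possible way to keep earlier transactions from being replayed.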

References: