Note: this piece was compiled in Dec 2023
In EVM-compatible chains, “state refers to data such as an account’s (or smart contract’s) balance, nonce, a contract’s bytecode, or the data contained in a contract’s storage cells” (source). A blockchain’s state grows constantly as the number of accounts, smart contracts, and transactions increases. L2s scaling Ethereum will process orders of magnitude more transactions than the base layer, so their state is expected to grow faster. The larger the state, the higher the computing, storage, and cost requirements for full nodes to process the chain. This may result in slower processing times, fewer nodes, and potential centralization. On most protocols the gas limit caps the size of blocks, but this is not a solution: it only limits the rate of growth and does not address the problem of state growth itself.
Blockchain nodes use three types of resources to process transactions: computation, bandwidth, and storage space. For the first two, a one-time transaction fee is fine, as these resources become available again to process the next block. Storage space, however, is occupied permanently, so a one-time fee neither ensures economic balance nor adequately compensates full nodes for the space they must keep providing. This leads to a tragedy of the commons, where individual users facing no restrictions or limitations deplete a system’s shared resource, which in this case is the storage space of full nodes.
When analyzing the problem of state growth, the focus should be on its effect on node performance and decentralization rather than on pure physical storage costs, which are expected to decline anyway over the coming years. State growth affects users with consumer-grade hardware more than specialized and resource-rich block producers.
The rapid growth in state size is mostly due to the lack of pricing for storage duration, and good engineering (optimizing clients etc.) can only partially make up for the missing economics.
A potential solution for the state bloat problem is introducing state rental, i.e. a rental fee paid for as long as data remains stored on the blockchain. This way, the users of the blockchain, rather than the full nodes, bear the cost of state storage. The amount of rent to be paid should be proportional to the size of the data and the period the data is stored on the chain, regardless of how frequently the data is accessed.
A state rental mechanism should also include the removal of unused data from the state for which no rent is paid. This improves the efficiency and performance of the blockchain and reduces the cost of participation for full nodes as they can sync faster and process transactions more quickly and with fewer resources. To maintain a good user experience, users must also be able to re-activate their accounts or smart contracts if they want to.
For this reason, and above all to maintain the blockchain’s integrity and security, as well as the replicability and verifiability of its state, it is essential that the entire state and transaction history remain available at all times through different sources. This ensures that any node can do a full sync and rebuild the current state from genesis, that any previous state (or part of it) can be verified and reinstated, and that historical data can be served to smart contracts.
Due to unclear ownership, option #2 may be more fit for purpose, but it would also make sense to enable smart contract “owners” to pay the rental fee themselves instead of the users if they want to. From the user’s perspective, this could be a valuable differentiator when deciding which dApp to use, and for smart contract owners, it gives flexibility to manage their operating costs and user incentives. However, there is a third alternative as well:
(As mentioned above, the approaches partially overlap because the user and the owner are the same in the case of user accounts like wallets.)
Rent paid by account/smart contract owner:
The following parameters should be added to the account data stored in the state: Rent_Paid_Till and Rent_Balance. Account/smart contract owners can top up their balance as needed using a Pay_Rent operation. Rent_Paid_Till is the block number up to which rent is covered. If an account’s Rent_Paid_Till has already passed, the Rent_Balance (if any) could be charged automatically to extend coverage for a pre-defined number of blocks (e.g. equal to 6 months or 1 year).
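The bookkeeping described above could look roughly like the following Python sketch. Only Rent_Paid_Till, Rent_Balance, and Pay_Rent come from the text; the constants, the flat per-period price, and the choice to start new coverage at the current block (without charging back rent for the lapsed gap) are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical parameters, chosen only for illustration.
BLOCKS_PER_PERIOD = 1_314_000  # roughly 6 months, assuming 12-second blocks
RENT_PER_PERIOD = 1_000        # rent units charged per coverage period


@dataclass
class Account:
    rent_paid_till: int = 0  # Rent_Paid_Till: last block number covered by rent
    rent_balance: int = 0    # Rent_Balance: prepaid funds reserved for rent


def pay_rent(account: Account, amount: int) -> None:
    """Pay_Rent: top up the prepaid rent balance."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    account.rent_balance += amount


def extend_coverage(account: Account, current_block: int) -> bool:
    """Charge Rent_Balance for one period if coverage has lapsed.

    Returns True if the account is covered at current_block afterwards.
    """
    if account.rent_paid_till >= current_block:
        return True  # still covered, nothing to charge
    if account.rent_balance < RENT_PER_PERIOD:
        return False  # cannot extend; the account moves towards expiry
    account.rent_balance -= RENT_PER_PERIOD
    # Assumption: new coverage starts at the current block.
    account.rent_paid_till = current_block + BLOCKS_PER_PERIOD
    return True
```

In this sketch a node would call extend_coverage whenever it touches an account; an account that returns False keeps its data until the expiry rule applies.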
Rent paid through user transactions:
In this case, rent tracking and computation could happen at the level of individual leaf nodes in the state trie (the concept is partially from IOVLabs). When a transaction is initiated, a rental fee is calculated for all leaf nodes it touches. A timestamp could be added to each leaf node recording the time of the last rental fee payment, which shows how long rent has been outstanding. The computation of rent could be based on bytes used, while the duration of storage could be measured in time (days) or epochs. Thus, the unit of the rental fee can be defined as gas per byte per day or epoch.
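As a rough illustration of the per-leaf approach, the sketch below charges rent in gas per byte per day for every leaf a transaction touches. The Leaf type, the rate, and the rule of billing only whole days (carrying the partial day over) are hypothetical choices, not part of any existing protocol.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
GAS_PER_BYTE_PER_DAY = 2  # hypothetical rate


@dataclass
class Leaf:
    value: bytes         # data stored in the leaf node
    last_rent_paid: int  # Unix timestamp of the last rent payment


def charge_rent(touched: list[Leaf], now: int) -> int:
    """Return the total rent (in gas) owed for the touched leaves.

    Rent per leaf = bytes stored * whole days outstanding * rate.
    Each leaf's timestamp advances by the days billed, so a partial
    day stays outstanding until the next transaction touches it.
    """
    total = 0
    for leaf in touched:
        days = max(0, now - leaf.last_rent_paid) // SECONDS_PER_DAY
        if days == 0:
            continue
        total += len(leaf.value) * days * GAS_PER_BYTE_PER_DAY
        leaf.last_rent_paid += days * SECONDS_PER_DAY
    return total
```

For example, a 32-byte leaf left unpaid for five days would cost 32 × 5 × 2 = 320 gas at this assumed rate, added to the transaction's regular fee.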
Questions to consider:
After a pre-defined period (for instance, a number of blocks or epochs equal to 1 year) for which no rental fee was paid either by the account owner or by any user transaction, the state of the account/contract is considered expired, and the account/contract data is deleted from the state stored by full nodes.
The same deletion action could also be initiated by the SELFDESTRUCT opcode at any point in time to remove unnecessary data from the state and reduce state size. In this case, any outstanding Rent_Balance would be returned to the address that made the latest rental fee payment/top-up.
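Both deletion paths can be sketched in a few lines of Python. The state is modelled as a plain dictionary, the expiry window is counted from the end of rent coverage, and the last_payer field and refunds ledger are hypothetical helpers; none of this reflects an actual client implementation.

```python
from dataclasses import dataclass
from typing import Optional

EXPIRY_BLOCKS = 2_628_000  # hypothetical: about 1 year, assuming 12-second blocks


@dataclass
class Account:
    rent_paid_till: int               # last block covered by rent
    rent_balance: int = 0             # unspent prepaid rent
    last_payer: Optional[str] = None  # address of the latest top-up


def is_expired(account: Account, current_block: int) -> bool:
    """Expired once more than EXPIRY_BLOCKS have passed without rent coverage."""
    return current_block - account.rent_paid_till > EXPIRY_BLOCKS


def prune_expired(state: dict[str, Account], current_block: int) -> list[str]:
    """Delete expired accounts from the live state and return their addresses."""
    expired = [addr for addr, acct in state.items() if is_expired(acct, current_block)]
    for addr in expired:
        del state[addr]
    return expired


def self_destruct(state: dict[str, Account], address: str, refunds: dict[str, int]) -> None:
    """Voluntary deletion: remove the account and refund unused rent to the last payer."""
    acct = state.pop(address)
    if acct.rent_balance and acct.last_payer is not None:
        refunds[acct.last_payer] = refunds.get(acct.last_payer, 0) + acct.rent_balance
```

The addresses returned by prune_expired are what an archival source would need to retain so that the deleted data can be restored later.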
It is very important to ensure deleted accounts can be restored whenever needed. There could be different ways to do it:
In my view, any final solution should prioritize UX above all and make sure that ordinary, non-technical users can easily reinstate their accounts.
Based on the above:
Weak statelessness could also allow L2 validators to stop storing the full state. Instead, the block builders would send the proposed block together with witnesses (proofs of validity/Merkle proofs), i.e. the relevant pieces of state that need to be looked at and updated by the validator. The assumption is that, similar to Ethereum, sequencers on L2s are likely to outsource block-building to specialized builders who are well-resourced.
For this, L2s need to store data in a format (e.g. Verkle tree on Ethereum) that allows builders to send the necessary witnesses to the sequencers easily. In the end, it is a partial solution and only shifts the burden from sequencers (L2 full nodes) to block builders. Trustless sources of transaction and state history still need to be available to allow verification of data received from builders.
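To make the witness idea concrete, here is a minimal sketch of how a builder could produce a Merkle proof for one piece of state and how a node holding only the state root could check it. It uses a plain binary Merkle tree with SHA-256 purely as a simplification; Ethereum's actual state uses a Merkle Patricia trie with Keccak-256, and Verkle trees rely on a different commitment scheme. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Builder side: hash the leaves and build every tree level up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd level: duplicate the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_witness(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect sibling hashes from leaf to root; the flag is True if the sibling is on the right."""
    witness = []
    for level in levels[:-1]:
        sibling = index ^ 1
        witness.append((level[sibling], sibling > index))
        index //= 2
    return witness


def verify_witness(leaf: bytes, witness: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Validator side: recompute the root from the leaf and its witness only."""
    node = h(leaf)
    for sibling, sibling_is_right in witness:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


leaves = [b"account-a", b"account-b", b"account-c"]
levels = build_levels(leaves)
root = levels[-1][0]  # the only value the validator must store
witness = make_witness(levels, 2)
print(verify_witness(b"account-c", witness, root))
```

The validator never sees the other leaves; it needs only the root and a witness whose size grows with the depth of the tree, which is why the data format matters for how cheaply builders can ship these proofs.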