# Ethereum Block Building Workload ## Background The Ethereum Virtual Machine (EVM) is a transacton based state machine. The state of the system started with some genesis and was incrementally updated through executing transactions. The world state, $σ$, can be thought of as all state used in the virtual machine such as accounts, balances, and more. Transactions The state transition function can be expressed as: $$ (1) \sigma_{t+1} \equiv \Upsilon(\sigma_{t}, T)$$ $$ (2) \sigma_{t+1} \equiv \Pi(\sigma_t, B) $$ $$ (3) B \equiv (...,(T_0, T_1, ...), ...) $$ $$ (4) \Pi(\sigma, B) \equiv \Upsilon(\Upsilon(\sigma, T_0), T_1) $$ Where $Υ$ is the Ethereum state transition function. In Ethereum, $Υ$, together with $σ$ are considerably more powerful than any existing comparable system; $Υ$ allows components to carry out arbitrary computation, while $σ$ allows components to store arbitrary state between transactions. Transactions are collated into blocks; blocks are chained together using a cryptographic hash as a means of reference. Where $B$ is this block, which includes a series of transactions amongst some other components and $\Pi$ is the block-level state-transition function for transactions. This is the basis of the blockchain paradigm, a model that forms the backbone of not only Ethereum, but all decentralised consensus-based transaction systems to date. ### Data Structures An Ethereum block $B$ is defined as a tuple of four components: $$B \equiv (B_H, B_T, B_U, B_W)$$ where: * $B_H$ (block header): A collection of metadata and accumulators that capture the state transitions and properties of the Ethereum state machine * $B_T$ (transactions): The ordered list of transactions included in this block * $B_U$ (ommers): A deprecated field from pre-Paris hard fork that contained headers of blocks sharing a grandparent with the current block * $B_W$ (withdrawals): A collection of validator withdrawals from the consensus layer, introduced in the Shanghai hard fork The block header $B_H$ is formally defined as: $$B_H \equiv (H_p, H_o, H_c, H_r, H_t, H_e, H_b, H_d, H_i, H_l, H_g, H_s, H_x, H_a, H_n, H_f, H_w)$$ where each component is defined as: * $H_p$ (parentHash): Keccak 256-bit hash of the parent block's header * $H_o$ (ommersHash): Deprecated 256-bit hash field, now constant at KEC(RLP(())) * $H_c$ (beneficiary): 160-bit address receiving this block's priority fees * $H_r$ (stateRoot): Keccak 256-bit hash of the state trie's root node after executing all transactions and withdrawals * $H_t$ (transactionsRoot): Keccak 256-bit hash of the trie's root node containing all block transactions * $H_e$ (receiptsRoot): Keccak 256-bit hash of the trie's root node containing all transaction receipts * $H_b$ (logsBloom): Bloom filter of indexable data (addresses and topics) from all transaction receipt logs * $H_d$ (difficulty): Deprecated scalar field, now constant at 0 * $H_i$ (number): Scalar count of ancestor blocks (0 for genesis block) * $H_l$ (gasLimit): Scalar maximum gas expenditure allowed for this block * $H_g$ (gasUsed): Scalar total gas consumed by all transactions in this block * $H_s$ (timestamp): Scalar Unix timestamp at block creation * $H_x$ (extraData): Arbitrary byte array of block-relevant data, maximum 32 bytes * $H_a$ (prevRandao): Latest RANDAO mix from the previous block's post-beacon state * $H_n$ (nonce): Deprecated 64-bit field, now constant at 0x0000000000000000 * $H_f$ (baseFeePerGas): Scalar wei amount burned per unit of gas consumed * $H_w$ (withdrawalsRoot): Keccak 256-bit hash of the trie's root node containing all consensus layer withdrawals ### Block Validity A block is valid if and only if: $\text{Valid}(B) \iff \text{EmptyOmmers}(B) \land \text{ConsistentHeader}(B)$ A block header $B_H$ is considered consistent if and only if all required fields match their derived values after block execution. ${ConsistentHeader}(B)$ returns true if the following conditions hold simultaneously: 1. $$B_U \equiv () \text{ } \land$$ 2. $$H_r \equiv \text{TRIE}(\text{LS}(\Pi(\sigma, B))) \text{ } \land$$ 3. $$H_t \equiv \text{TRIE}(\{ \forall i < \|B_T\|, i \in \mathbb{N} : p_T(i, B_T[i]) \}) \text{ } \land$$ 4. $$H_e \equiv \text{TRIE}(\{ \forall i < \|B_R\|, i \in \mathbb{N} : p_R(i, B_R[i]) \}) \text{ } \land$$ 5. $$H_w \equiv \text{TRIE}(\{ \forall i < \|B_W\|, i \in \mathbb{N} : p_W(i, B_W[i]) \}) \text{ } \land $$ 6. $$H_b \equiv \bigvee_{r \in B_R} r_b$$ Each coniditon is explained as follows: 1. Empty ommers field 2. State Root Consistency: $H_r \equiv \text{TRIE}(\text{LS}(\Pi(\sigma, B)))$ * Where $\sigma$ is the base state * $\Pi(\sigma, B)$ represents state after executing all transactions and withdrawals * $\text{LS}$ converts state to trie entries * Must match final post-execution state root 2. Transaction Root Consistency: $H_t \equiv \text{TRIE}(\{ \forall i < \|B_T\|, i \in \mathbb{N} : p_T(i, B_T[i]) \})$ * Each transaction is RLP encoded with its index * Must match root of transaction trie * Includes special handling for EIP-2718 transactions 3. Receipts Root Consistency: $H_e \equiv \text{TRIE}(\{ \forall i < \|B_R\|, i \in \mathbb{N} : p_R(i, B_R[i]) \})$ * Each receipt is RLP encoded with its index * Must match root of receipt trie * Generated from actual transaction execution results 4. Withdrawals Root Consistency: $H_w \equiv \text{TRIE}(\{ \forall i < \|B_W\|, i \in \mathbb{N} : p_W(i, B_W[i]) \})$ * Each withdrawal is RLP encoded with its index * Must match root of withdrawals trie * Reflects consensus layer withdrawals 5. Logs Bloom Consistency: $H_b \equiv \bigvee_{r \in B_R} r_b$ * Bloom filter must match union of all receipt logs * Combines log entries from all transaction receipts * Includes logger addresses and log topics ## State Root Consistency The state root must satisfy: $\text{TRIE}(\text{LS}(\sigma)) = P(B_H)H_r$ where: * $\text{TRIE}(\text{LS}(\sigma))$ is the Merkle Patricia tree root hash of state $\sigma$ * $P(B_H)$ is the parent block * Values are RLP encoded * $\sigma$ is the state after executing all transactions and withdrawals ### Execution Order Dependencies For consistency to hold: 1. All transactions must be executed in order: $B_T[0] \rightarrow B_T[1] \rightarrow ... \rightarrow B_T[n-1]$ 2. All withdrawals must be processed after transactions: $B_W[0] \rightarrow B_W[1] \rightarrow ... \rightarrow B_W[m-1]$ 3. State root ($H_r$) must reflect final state after both transactions and withdrawals 4. Receipt root ($H_e$) must include results from all transaction executions 5. Logs bloom ($H_b$) must include all logs from all receipts ## Validation Process $\text{ValidateHeader}(B) :=$ 1. Execute all transactions in order 2. Process all withdrawals in order 3. Compute final state root 4. Compute transaction trie root 5. Compute receipt trie root 6. Compute withdrawals trie root 7. Compute logs bloom filter 8. Verify all computed values match header fields A header is consistent if and only if all computed values exactly match their corresponding header fields. ### Computation #### Merkle Patricia Tries There are four Modified Merkle Patricia Tries (MPT) used in maintenance of the Ethereum World State $σ$: - State Trie - Transaction Trie - Receipt Trie - Withdrawal Trie #### Transaction Execution --- **Symbol Key:** * $\sigma_t$ = state at time $t$ * $\Pi$ = block-level state transition function * $B$ = block * $T_n$ = transaction $n$ * $\Upsilon$ = transaction-level state transition function Any transactions executed first pass the initial tests of intrinsic validity. (as of October 22nd) 1. The transaction is well-formed RLP, with no additional trailing bytes; 2. the transaction signature is valid; 3. the transaction nonce is valid (equivalent to the sender account’s current nonce); 4. the sender account has no contract code deployed (see EIP-3607 by Feist et al. [2021]); 5. the gas limit is no smaller than the intrinsic gas, g0, used by the transaction; 6. the sender account balance contains at least the cost, v0, required in up-front payment; 7. the maxFeePerGas, Tm, in the case of type 2 transactions, or gasPrice, Tp, in the case of type 0 and type 1 transactions, is greater than or equal to the block’s base fee, Hf; and 8. for type 2 transactions, maxPriorityFeePerGas, Tf , must be no larger than maxFeePerGas, Tm. The process of finalising a block involves three stages: 1. executing withdrawals; 2. validate transactions; 3. verify state. ## Time Complexity $Block\ Creation\ Complexity = O(n \cdot k \cdot h + n \cdot s + n \log n)$ where: * $n$ = number of transactions * $k$ = accounts touched per transaction * $h$ = height of state trie $\approx \log(A)$ * $s$ = signature verification cost (constant) * $A$ = total number of accounts in state **Component Breakdown** 1. State Updates: $O(n \cdot k \cdot \log(A))$ * $n$ transactions * $k$ accounts per tx * $\log(A)$ trie depth 2. Signature Verifications: $O(n)$ * One verification per tx * Constant time per verification 3. Merkle Tree: $O(n \log n)$ * Binary tree construction * $n$ leaf nodes $Memory = O(n \cdot k)$ ## Instruction Level Analysis ## Empirical Values For typical values: * When $n = 100$ transactions * $k = 2$ accounts per tx * $A = 2^{24}$ accounts in state * $h = \log(A) = 24$ - average transaction size = 200 bytes The dominant term becomes: $O(100 \cdot 2 \cdot 24) = O(4800)$ for state updates # Block Header Size | Field | Symbol | Size in Bits | Size in Bytes | Notes | |-------|--------|--------------|---------------|-------| | parentHash | $H_p$ | 256 | 32 | Keccak hash | | ommersHash | $H_o$ | 256 | 32 | Keccak hash | | beneficiary | $H_c$ | 160 | 20 | Address | | stateRoot | $H_r$ | 256 | 32 | Keccak hash | | transactionsRoot | $H_t$ | 256 | 32 | Keccak hash | | receiptsRoot | $H_e$ | 256 | 32 | Keccak hash | | logsBloom | $H_b$ | 2048 | 256 | Bloom filter | | difficulty | $H_d$ | 256 | 32 | Zero value | | number | $H_i$ | 256 | 32 | Block number | | gasLimit | $H_l$ | 256 | 32 | Gas limit | | gasUsed | $H_g$ | 256 | 32 | Gas used | | timestamp | $H_s$ | 256 | 32 | Unix timestamp | | extraData | $H_x$ | ≤256 | ≤32 | Variable, max 32 | | prevRandao | $H_a$ | 256 | 32 | RANDAO value | | nonce | $H_n$ | 64 | 8 | Zero value | | baseFeePerGas | $H_f$ | 256 | 32 | Base fee | | withdrawalsRoot | $H_w$ | 256 | 32 | Keccak hash | ## Total Size ``` Total Fixed Bits = 5600 bits Total Fixed Bytes = 700 bytes ``` This breaks down as: - 13 fields of 256 bits (32 bytes) each = 3328 bits (416 bytes) - 1 field of 2048 bits (256 bytes) = 2048 bits (256 bytes) - 1 field of 160 bits (20 bytes) = 160 bits (20 bytes) - 1 field of 64 bits (8 bytes) = 64 bits (8 bytes) - 1 variable field of up to 256 bits (32 bytes) Maximum total size including variable field: ``` Maximum Total Bits = 5600 bits Maximum Total Bytes = 700 bytes ``` In practice, the actual size might be slightly less due to: 1. extraData often being smaller than 32 bytes 2. RLP encoding overhead not included in this calculation 3. Some number fields potentially using fewer bytes when encoded