Roberto Bayardo
May 1, 2023
EIP-4844 introduces a new blob transaction type for which SSZ encoding was chosen for its network & block representations in place of RLP. The choice of SSZ acts on a longstanding desire to migrate the Ethereum execution layer (EL) to this more modern and featureful encoding format, which is expected to yield multiple benefits over time. In the short term, once there are enough SSZ transaction types to cover all transaction use cases, new L2s/zk-EVMs can avoid implementing legacy RLP transaction support entirely, eliminating significant technical debt.
Transactions in the execution layer have two perpetual identifiers:
sig_hash, a hash of the unsigned transaction data which the signature is based on, and
tx_hash, the signed transaction's hash, used as a unique identifier to refer to the transaction.
We currently lack consensus on how to compute sig_hash and tx_hash for SSZ-encoded transactions. This document attempts to summarize the current state, the various other options being considered, and the concerns surrounding each one.
For RLP Ethereum transaction types, the sig_hash is computed as:
keccak([tx_type_byte] + rlp_encode(tx))
And its tx_hash as:
keccak([tx_type_byte] + rlp_encode(signed_tx))
EIP-4844 (as of May 1, 2023) defines sig_hash and tx_hash for a blob transaction as follows:
sig_hash:
keccak([0x03] + ssz.serialize(tx.message))
tx_hash:
keccak([0x03] + ssz.serialize(tx))
This approach borrows heavily from the existing RLP transaction signature scheme, providing the following advantages:
This simplicity however is countered by the following drawbacks:
EIP-6493 proposes an SSZ transaction signature scheme consistent with how SSZ objects are signed in the Ethereum consensus layer, involving idiomatic usage of the hash_tree_root combined with a domain that prevents collisions with signatures from other domains (see compute_signing_root).
This approach resolves the concerns with the current state, though not without new drawbacks. The main concern seems to be:
Update: As of May 3 2023, EIP-6493 has been updated and now largely resembles option 2 below.
We can imagine a new idiomatic SSZ signature scheme for transactions that more narrowly attempts to address the specific concerns of the current state without bringing in a full CL-inspired signature scheme. For example, consider:
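class TxSigningData(Container):
    tx_type: Bytes1
    object_root: Root

def sig_hash(tx: BlobTransaction) -> Root:
    # Wrap the message root together with the tx-type byte, then hash the wrapper.
    return hash_tree_root(TxSigningData(0x03, hash_tree_root(tx.message)))

def tx_hash(tx: BlobTransaction) -> Root:
    # The transaction id is the root of the full signed transaction.
    return hash_tree_root(tx)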
Because the top-level hash function here is no longer keccak, the probability of collision with any keccak-based hash is negligible. We preserve usage of the tx-type byte as a sort of execution layer transaction domain prefix for preventing collisions over future SSZ transaction types. Cross-chain replays are prevented by the fact that blob-tx type has chain_id as its first element.
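The sig_hash computation above can be sketched in plain Python to show which bytes are actually hashed. This is a minimal sketch, not the EIP's reference code: it assumes standard SSZ merkleization (SHA-256, 32-byte chunks), the helper names are hypothetical, and `message_root` is a placeholder standing in for `hash_tree_root(tx.message)`.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bytes1_root(value: bytes) -> bytes:
    """hash_tree_root of a Bytes1: the byte right-padded with zeros to one 32-byte chunk."""
    assert len(value) == 1
    return value + b"\x00" * 31


def tx_signing_data_root(tx_type: int, object_root: bytes) -> bytes:
    """hash_tree_root of the two-field TxSigningData container (hypothetical helper)."""
    assert len(object_root) == 32
    # A container's root is the Merkle root of its field roots. Two fields
    # give two leaves, so the root is one SHA-256 over their concatenation.
    return sha256(bytes1_root(bytes([tx_type])) + object_root)


# Placeholder for hash_tree_root(tx.message); not a real transaction root.
message_root = sha256(b"example blob transaction message")
sig_hash = tx_signing_data_root(0x03, message_root)
print(sig_hash.hex())
```

Because the type byte occupies its own leaf, changing only `tx_type` yields a different root for the same message, which is the domain-separation property the paragraph above relies on.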
If we imagine a future where we move entirely to SSZ type transactions, this approach allows removing all legacy RLP cruft pertaining to transaction encoding/decoding. While this may not benefit Ethereum mainnet any time soon, since we expect old transactions to remain interpretable forever, it could benefit, say, L2s which are bootstrapped at a point where SSZ transactions provide all the necessary functionality.
Another option would be to retain use of SSZ for network & block representations of a blob transaction, but to serialize the blob transaction's contents to RLP purely for the purpose of signing & tx id computation.
The obvious drawback is having to continue to rely on legacy RLP serialization (but not deserialization) in order to validate the signature of an SSZ transaction type. If we imagine a future where we deprecate older transactions and move entirely to SSZ-based types, this approach prevents us from removing legacy RLP cruft pertaining to transaction encoding/decoding. While this may not affect Ethereum mainnet any time soon, since we expect old transactions to be forever interpretable, it limits the benefit noted in our introduction around post-SSZ bootstrapped L2s being able to avoid significant RLP-related technical debt.
Right now chain_id, whose purpose is (only? predominantly?) to prevent tx replay across chains, is within the blob tx itself. Should this field instead be moved to the "signing envelope"? As an example, consider the current state. If we removed chain_id from the blob tx, we'd instead want to sign:
keccak([0x03] + uint_to_bytes(uint32(chain_id)) + ssz.serialize(tx.message))
Note that if all other chains were to adopt this new style signing prefix, there would be replay protection across chains regardless of the tx-format chosen. (The scheme would still be vulnerable to bad actors who fork without introducing a new chain_id however.)
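To make the prefixed preimage concrete, here is a small sketch of the bytes that would be fed to keccak under this variant. It assumes `uint_to_bytes` is the little-endian fixed-width encoding used by SSZ, and `signing_preimage` is a hypothetical helper name. The hashing step itself is left out because keccak-256 is not in Python's standard library (`hashlib.sha3_256` uses different padding and produces different digests).

```python
def uint_to_bytes(value: int, byte_length: int) -> bytes:
    """Little-endian fixed-width encoding, as SSZ uses for uintN values."""
    # Raises OverflowError if the value does not fit in byte_length bytes.
    return value.to_bytes(byte_length, "little")


def signing_preimage(tx_type: int, chain_id: int, serialized_message: bytes) -> bytes:
    """Bytes to be hashed: tx-type byte, then uint32 chain_id, then the SSZ message."""
    return bytes([tx_type]) + uint_to_bytes(chain_id, 4) + serialized_message


# serialized_message is a placeholder for ssz.serialize(tx.message).
serialized_message = b"\xaa\xbb"
mainnet = signing_preimage(0x03, 1, serialized_message)
other_chain = signing_preimage(0x03, 10, serialized_message)
print(mainnet.hex())
print(other_chain.hex())
```

The same message produces different preimages on different chains, so a signature over one cannot be replayed on the other. Note that a uint32 prefix cannot represent chain ids of 2**32 or larger; the encoding call raises in that case.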
For option 2 we'd extend TxSigningData as:
class TxSigningData(Container):
    tx_type: Bytes1
    chain_id: Bytes32
    object_root: Root
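With three fields the container no longer fits in two leaves, so its root involves zero-padding to four leaves. The following sketch assumes standard SSZ merkleization with SHA-256; the function names are hypothetical, and encoding chain_id into the Bytes32 field as a little-endian integer is an assumption, since the text does not specify it.

```python
import hashlib
from typing import List

ZERO_CHUNK = b"\x00" * 32


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkleize(chunks: List[bytes]) -> bytes:
    """Merkle root over 32-byte chunks, zero-padding the leaf count to a power of two."""
    size = 1
    while size < len(chunks):
        size *= 2
    layer = list(chunks) + [ZERO_CHUNK] * (size - len(chunks))
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


def tx_signing_data_root(tx_type: int, chain_id: int, object_root: bytes) -> bytes:
    """hash_tree_root of the three-field TxSigningData (hypothetical helper)."""
    assert len(object_root) == 32
    leaves = [
        bytes([tx_type]) + b"\x00" * 31,   # Bytes1, right-padded to one chunk
        chain_id.to_bytes(32, "little"),   # Bytes32; little-endian is an assumption
        object_root,                       # Root is already a 32-byte chunk
    ]
    return merkleize(leaves)


# Placeholder for hash_tree_root(tx.message); not a real transaction root.
message_root = sha256(b"example blob transaction message")
root_chain_1 = tx_signing_data_root(0x03, 1, message_root)
root_chain_5 = tx_signing_data_root(0x03, 5, message_root)
print(root_chain_1.hex())
print(root_chain_5.hex())
```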
Potential concerns:
The signed blob tx still has a signature composed of sub-elements v, r, and s, which are (de)serialized individually even though their only use is to be re-compacted into the 65-byte array expected by the signature validation / address recovery function. This split has been abused to some extent to encode chain_id into older transaction types by hacking it into the "V" value's higher-order bits, but that is no longer relevant for newer tx types where chain_id is encoded explicitly (whether within the tx itself or within the signing envelope).
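For reference, the chain_id packing mentioned above is the one defined by EIP-155, where v = chain_id * 2 + 35 + y_parity for replay-protected legacy transactions, and v is 27 or 28 otherwise. A short sketch of undoing that packing, with `split_legacy_v` as a hypothetical helper name:

```python
from typing import Optional, Tuple


def split_legacy_v(v: int) -> Tuple[Optional[int], int]:
    """Return (chain_id, y_parity) from a legacy transaction's v value."""
    if v in (27, 28):
        # Pre-EIP-155 signature: no chain_id is encoded.
        return None, v - 27
    if v < 35:
        raise ValueError(f"invalid legacy v value: {v}")
    # EIP-155: v = chain_id * 2 + 35 + y_parity
    return (v - 35) // 2, (v - 35) % 2


print(split_legacy_v(27))   # (None, 0)
print(split_legacy_v(37))   # (1, 0)
print(split_legacy_v(38))   # (1, 1)
```

Newer typed transactions carry chain_id as its own field, so the signature only needs the one-bit y_parity.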
We could instead treat the signature as an opaque 65-byte array with respect to its SSZ encoding. The implementation could then choose a representation that precisely matches the expectations of the signature verification / address recovery API.
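A minimal sketch of what the opaque representation might look like, assuming the r || s || recovery-id layout with big-endian 32-byte r and s. That layout is an assumption for illustration; the byte order and the position of the recovery id should be checked against whichever verification API is in use. The function names are hypothetical.

```python
from typing import Tuple


def pack_signature(r: int, s: int, y_parity: int) -> bytes:
    """Pack signature components into a single 65-byte array (assumed r || s || v layout)."""
    if y_parity not in (0, 1):
        raise ValueError("y_parity must be 0 or 1")
    return r.to_bytes(32, "big") + s.to_bytes(32, "big") + bytes([y_parity])


def unpack_signature(sig: bytes) -> Tuple[int, int, int]:
    """Inverse of pack_signature, for code that still needs the components."""
    if len(sig) != 65:
        raise ValueError("signature must be exactly 65 bytes")
    r = int.from_bytes(sig[:32], "big")
    s = int.from_bytes(sig[32:64], "big")
    return r, s, sig[64]


# Arbitrary illustrative values, not a valid secp256k1 signature.
sig = pack_signature(r=0x1234, s=0xABCD, y_parity=1)
print(len(sig), sig.hex())
```

With the signature stored this way, SSZ would see a single fixed-size byte vector, and no per-field conversion is needed before calling the recovery function.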