
Runtime for recompiled YUL contracts

Recompiled YUL contracts are mostly compatible with the existing pallet-contracts. Here are some thoughts on what's missing or could be optimized further.

Required additions

Keccak256 code hashes

The EVM uses keccak256 for code hashes, so we need:

  • An API to get the keccak256 hash of a code blob, to support EXTCODEHASH.
  • The deploy function to accept keccak256 hashes.

Using blake2 or any hash function other than keccak256 does not work for the recompiler, because it would break the semantics of existing contracts.
The hash of a contract is not always just opaque bytes. For example, a contract may compare a code hash against the "empty code hash" (keccak256('')) to figure out whether an address is a code account. Or it might expect a specific code hash at a specific address. Both break if we use a different hash function.
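To make this concrete, here is a minimal, dependency-free Keccak-256 sketch in Rust. It follows the structure of the compact public-domain reference implementations, and a real runtime would use an audited implementation instead. Ethereum's keccak256 uses the original Keccak padding byte 0x01, not the 0x06 of NIST SHA3-256, so even a "SHA3" function would produce different hashes. The `keccak256` and `to_hex` names are illustrative.

```rust
// Minimal Keccak-256 as used by Ethereum (original Keccak padding 0x01,
// NOT NIST SHA3-256 padding 0x06). Illustrative sketch, not audited.

const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
// Rotation offsets and lane permutation for the combined rho + pi step.
const ROTC: [u32; 24] = [1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44];
const PILN: [usize; 24] = [10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1];

/// The Keccak-f[1600] permutation over 25 lanes of 64 bits (lane index = x + 5y).
fn keccak_f(st: &mut [u64; 25]) {
    for round in 0..24 {
        // theta: xor each lane with the parities of two neighbouring columns
        let mut bc = [0u64; 5];
        for i in 0..5 {
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        }
        for i in 0..5 {
            let t = bc[(i + 4) % 5] ^ bc[(i + 1) % 5].rotate_left(1);
            for j in (0..25).step_by(5) {
                st[j + i] ^= t;
            }
        }
        // rho + pi: rotate lanes and move them to their new positions
        let mut t = st[1];
        for i in 0..24 {
            let j = PILN[i];
            let tmp = st[j];
            st[j] = t.rotate_left(ROTC[i]);
            t = tmp;
        }
        // chi: non-linear mixing within each row
        for j in (0..25).step_by(5) {
            let mut row = [0u64; 5];
            row.copy_from_slice(&st[j..j + 5]);
            for i in 0..5 {
                st[j + i] ^= (!row[(i + 1) % 5]) & row[(i + 2) % 5];
            }
        }
        // iota: break symmetry with the round constant
        st[0] ^= RC[round];
    }
}

/// keccak256 over arbitrary bytes (rate = 136 bytes for a 256-bit output).
fn keccak256(data: &[u8]) -> [u8; 32] {
    const RATE: usize = 136;
    // Keccak padding: 0x01, zeros, final byte |= 0x80 (may coincide with 0x01).
    let mut padded = data.to_vec();
    padded.push(0x01);
    while padded.len() % RATE != 0 {
        padded.push(0);
    }
    let last = padded.len() - 1;
    padded[last] |= 0x80;

    let mut st = [0u64; 25];
    for block in padded.chunks(RATE) {
        for (i, lane) in block.chunks(8).enumerate() {
            let mut b = [0u8; 8];
            b.copy_from_slice(lane);
            st[i] ^= u64::from_le_bytes(b);
        }
        keccak_f(&mut st);
    }
    // Squeeze: the first 4 lanes, little-endian, form the 32-byte digest.
    let mut out = [0u8; 32];
    for i in 0..4 {
        out[i * 8..i * 8 + 8].copy_from_slice(&st[i].to_le_bytes());
    }
    out
}

/// Lowercase hex encoding, for displaying and comparing hashes.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

For example, `to_hex(&keccak256(b""))` yields `c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470`, the empty code hash that contracts hard-code. A blake2-based runtime would report a different value for the same empty code.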

STATICCALL

In a STATICCALL context, the runtime must fail if the contract calls any state-modifying function. This restriction propagates down the call stack: every call made from within a static call is static as well (EIP-214).
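Here is a minimal sketch of how the runtime could track this. It assumes a hypothetical `CallStack` of frames, each carrying an `is_static` flag. The flag is inherited, so a frame is static if it was entered via STATICCALL or if any ancestor was. `ensure_mutable` (also hypothetical) stands in for the guard that every state-modifying host function would run first. Per EIP-214 that covers SSTORE, LOG0-LOG4, CREATE/CREATE2, SELFDESTRUCT and CALL with a non-zero value, and EIP-1153 adds TSTORE.

```rust
/// Hypothetical error a host function returns when a static frame tries to modify state.
#[derive(Debug, PartialEq)]
enum HostError {
    StateChangeDuringStaticCall,
}

/// One entry of a hypothetical runtime call stack.
struct Frame {
    is_static: bool,
}

struct CallStack {
    frames: Vec<Frame>,
}

impl CallStack {
    /// The top-level frame of a transaction is never static.
    fn new() -> Self {
        CallStack { frames: vec![Frame { is_static: false }] }
    }

    /// Enter a sub-call. Once any ancestor is static, every descendant is static too.
    fn push(&mut self, via_staticcall: bool) {
        let parent_static = self.frames.last().map_or(false, |f| f.is_static);
        self.frames.push(Frame { is_static: parent_static || via_staticcall });
    }

    /// Leave the current sub-call.
    fn pop(&mut self) {
        self.frames.pop();
    }

    /// Guard run at the start of every state-modifying host function.
    fn ensure_mutable(&self) -> Result<(), HostError> {
        match self.frames.last() {
            Some(f) if f.is_static => Err(HostError::StateChangeDuringStaticCall),
            _ => Ok(()),
        }
    }
}
```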

CALL

Address vs. code hash

Solidity specifies callees by address, not by code hash. Instead of making the contract do the lookup, the runtime should do it.

Balance transfers

The EVM transfers balance via calls to EOAs (externally owned accounts). So instead of failing calls to accounts without code, the runtime should transfer the balance in that case.
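Both CALL points fit into one entry point. The sketch below assumes a hypothetical, heavily simplified account state: `code_of` and `balance_of` maps, with balances as `u128` instead of i256. The contract passes the 20-byte address, the runtime resolves the code hash itself, and a call to an account without code becomes a plain balance transfer. Rolling back the transfer if the callee's execution later reverts is left out.

```rust
use std::collections::HashMap;

type Address = [u8; 20];
type CodeHash = [u8; 32];

/// What the runtime does with a CALL after resolving the callee.
#[derive(Debug, PartialEq)]
enum CallOutcome {
    /// The callee has code: execute it (execution itself is out of scope here).
    Execute(CodeHash),
    /// The callee has no code (an EOA): the call is just a balance transfer.
    Transferred,
}

#[derive(Debug, PartialEq)]
enum CallError {
    InsufficientBalance,
}

/// Hypothetical, simplified account state.
struct State {
    code_of: HashMap<Address, CodeHash>,
    balance_of: HashMap<Address, u128>,
}

impl State {
    fn transfer(&mut self, from: Address, to: Address, value: u128) -> Result<(), CallError> {
        let from_balance = self.balance_of.get(&from).copied().unwrap_or(0);
        if from_balance < value {
            return Err(CallError::InsufficientBalance);
        }
        self.balance_of.insert(from, from_balance - value);
        *self.balance_of.entry(to).or_insert(0) += value;
        Ok(())
    }

    /// CALL as seen by the runtime: the contract only supplies the address.
    /// The runtime looks up the code hash and treats code-less accounts as transfers.
    fn call(&mut self, caller: Address, callee: Address, value: u128) -> Result<CallOutcome, CallError> {
        self.transfer(caller, callee, value)?;
        match self.code_of.get(&callee) {
            Some(hash) => Ok(CallOutcome::Execute(*hash)),
            None => Ok(CallOutcome::Transferred),
        }
    }
}
```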

RETURNDATA{COPY,SIZE}

The output of the last call should be kept around, because contracts can request it via RETURNDATASIZE and RETURNDATACOPY.
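Here is a sketch of the bookkeeping. It assumes a hypothetical per-frame `last_return_data` buffer that the runtime replaces after every sub-call. Following EIP-211, a RETURNDATACOPY that reads past the end of the buffer is an error (an exceptional halt on EVM). It is not zero-padded like CALLDATACOPY.

```rust
/// Out-of-bounds RETURNDATACOPY (an exceptional halt on EVM per EIP-211).
#[derive(Debug, PartialEq)]
struct ReturnDataOutOfBounds;

/// Hypothetical per-frame state: the output of the most recent sub-call.
#[derive(Default)]
struct Frame {
    last_return_data: Vec<u8>,
}

impl Frame {
    /// Called by the runtime whenever a sub-call finishes; replaces the previous buffer.
    fn on_subcall_returned(&mut self, output: Vec<u8>) {
        self.last_return_data = output;
    }

    /// RETURNDATASIZE
    fn returndatasize(&self) -> u32 {
        self.last_return_data.len() as u32
    }

    /// RETURNDATACOPY into `dest`, reading `dest.len()` bytes starting at `offset`.
    fn returndatacopy(&self, dest: &mut [u8], offset: usize) -> Result<(), ReturnDataOutOfBounds> {
        let end = offset.checked_add(dest.len()).ok_or(ReturnDataOutOfBounds)?;
        let src = self.last_return_data.get(offset..end).ok_or(ReturnDataOutOfBounds)?;
        dest.copy_from_slice(src);
        Ok(())
    }
}
```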

TLOAD / TSTORE

Transient ("temporary") contract storage that is always discarded at the end of the transaction. Writes made by a call frame that reverts are rolled back as well.

https://eips.ethereum.org/EIPS/eip-1153#specification

It must be implemented so that it is cheaper than ordinary SSTORE / SLOAD on contract storage, as solc uses these opcodes for optimizations. It must also exactly match the semantics on EVM; otherwise it can introduce security risks.
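One possible design, sketched under the assumption that transient storage lives entirely in memory for the duration of a transaction. The hypothetical `TransientStorage` type is keyed by contract address and 32-byte slot. Avoiding persistent storage I/O is what would make it cheaper than SSTORE/SLOAD. The journal provides the EIP-1153 behavior that a reverting frame rolls back its TSTOREs, and `end_transaction` clears everything.

```rust
use std::collections::HashMap;

type Address = [u8; 20];
type Word = [u8; 32];

/// Hypothetical in-memory transient storage for one transaction.
#[derive(Default)]
struct TransientStorage {
    slots: HashMap<(Address, Word), Word>,
    /// (slot, previous value) for every TSTORE, in execution order.
    journal: Vec<((Address, Word), Option<Word>)>,
}

impl TransientStorage {
    /// TLOAD: unset slots read as zero.
    fn tload(&self, contract: Address, key: Word) -> Word {
        self.slots.get(&(contract, key)).copied().unwrap_or([0u8; 32])
    }

    /// TSTORE: remember the previous value so a revert can restore it.
    fn tstore(&mut self, contract: Address, key: Word, value: Word) {
        let prev = self.slots.insert((contract, key), value);
        self.journal.push(((contract, key), prev));
    }

    /// Taken when a call frame starts.
    fn checkpoint(&self) -> usize {
        self.journal.len()
    }

    /// The frame reverted: undo every TSTORE made since `checkpoint`, newest first.
    fn revert_to(&mut self, checkpoint: usize) {
        while self.journal.len() > checkpoint {
            let (slot, prev) = self.journal.pop().unwrap();
            match prev {
                Some(v) => {
                    self.slots.insert(slot, v);
                }
                None => {
                    self.slots.remove(&slot);
                }
            }
        }
    }

    /// Transient storage never outlives the transaction.
    fn end_transaction(&mut self) {
        self.slots.clear();
        self.journal.clear();
    }
}
```

A frame that returns successfully simply keeps its journal entries, so an enclosing frame that reverts later can still undo them.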

BLOBHASH / BLOBBASEFEE

New opcodes related to sharding on ETH. The idea of proto-danksharding is to provide more data in the short term (blob data that is not kept long term, as storing it permanently for all the rollups would be too expensive). It is questionable whether and how we can support that. However, ETH folks always get creative with whatever new functionality they put into the EVM and abuse it for something else. I expect this to be the case here too, so we might want to support it anyway if we can. IMO not the highest priority, though.

https://github.com/ethereum/EIPs/blob/master/EIPS/eip-4844.md
https://eips.ethereum.org/EIPS/eip-4844#gas-accounting
https://www.eip4844.com/

CHAINID

Returns a number identifying the chain. Ideally, ours doesn't clash with any existing chain ID.

BLOCKHASH

BLOCKHASH(blockNumber) returns the hash of block blockNumber. It is only valid for the 256 most recent blocks, excluding the current one; for any other block number it returns zero.
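The range check is easy to get off by one, so here it is spelled out. `lookup` is a hypothetical stand-in for however the runtime retrieves stored block hashes. The runtime would also need to keep at least the last 256 of them around.

```rust
/// BLOCKHASH semantics: only the 256 most recent blocks (excluding the current
/// one) are available; anything else yields the zero hash.
fn blockhash(requested: u64, current: u64, lookup: impl Fn(u64) -> [u8; 32]) -> [u8; 32] {
    if requested < current && current - requested <= 256 {
        lookup(requested)
    } else {
        [0u8; 32]
    }
}
```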

GASLIMIT

Returns the block's gas limit.

CODESIZE / EXTCODESIZE

Returns the size of the currently executing code blob (CODESIZE) or of the code at the specified address (EXTCODESIZE), analogous to the existing code_hash / ext_code_hash.

We can return a u32 here: the code size cannot exceed it anyway, and zero-extending it into an i256 is much cheaper than allocating and loading from stack space.
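Here is a small illustration of the zero extension. It assumes the 256-bit value is materialized either as a big-endian 32-byte word (as in EVM memory and the ABI) or as four little-endian u64 limbs in registers. Which representation the recompiler actually uses is an assumption here. In both cases the u32 only has to land in the lowest-order position, with everything else zeroed. The function names are hypothetical.

```rust
/// Big-endian 32-byte word (EVM memory / ABI layout): the u32 goes into the last 4 bytes.
fn u32_to_word(x: u32) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[28..].copy_from_slice(&x.to_be_bytes());
    word
}

/// Four u64 limbs, least significant limb first: just a widening move plus three zeros.
fn u32_to_limbs(x: u32) -> [u64; 4] {
    [u64::from(x), 0, 0, 0]
}
```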

INVALID

The INVALID opcode (0xFE) reverts but also consumes all remaining gas. This could perhaps be implemented via the return flags.
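Here is a sketch of the return-flags idea. `FLAG_REVERT` mirrors the existing revert flag. `FLAG_CONSUME_ALL_GAS` is a hypothetical new bit that the runtime would interpret as "revert and charge all remaining gas". The constant names, bit positions and the `finish_call` helper are assumptions, not the pallet's actual API.

```rust
/// Mirrors the existing "revert" return flag (bit position assumed).
const FLAG_REVERT: u32 = 1 << 0;
/// Hypothetical new flag: revert AND consume all remaining gas (EVM INVALID).
const FLAG_CONSUME_ALL_GAS: u32 = 1 << 1;

#[derive(Debug, PartialEq)]
struct Outcome {
    reverted: bool,
    gas_left: u64,
}

/// How the runtime could interpret the flags when a contract returns.
fn finish_call(flags: u32, gas_left: u64) -> Outcome {
    // Consuming all gas implies a revert, so either bit reverts the call.
    let reverted = (flags & (FLAG_REVERT | FLAG_CONSUME_ALL_GAS)) != 0;
    let gas_left = if (flags & FLAG_CONSUME_ALL_GAS) != 0 { 0 } else { gas_left };
    Outcome { reverted, gas_left }
}
```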

CREATE/CREATE2

The runtime should use the same address derivation as on EVM. Contract code might assume this:
https://github.com/Uniswap/v2-periphery/blob/0335e8f7e1bd1e8d8329fd300aea2ef2f36dd19f/contracts/libraries/UniswapV2Library.sol#L18

Additionally, fn instantiate in the contracts pallet has many parameters that don't matter for the EVM. So we could have simpler create/create2 API methods that behave exactly like on Ethereum and take the same parameters:

fn create2(
    code_hash_ptr: u32,  // keccak256 hash of the contract code
    value_ptr: u32,      // ptr to the i256 balance to be transferred
    input_data_ptr: u32, // ptr to the constructor calldata
    input_data_len: u32, // length of the constructor calldata
    address_ptr: u32,    // output buffer (20 bytes) = keccak256(0xff + sender_address + salt + keccak256(initialisation_code))[12:]
    salt_ptr: u32,       // ptr to the 32-byte salt
) -> Result<()>

On failure, CREATE2 writes the zero address. CREATE works analogously.
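Here is the address derivation from the `address_ptr` comment, sketched as a standalone function. The hash function is passed in as a parameter so the sketch stays dependency-free; in practice it must be real Keccak-256. `code_hash` is the keccak256 of the (init) code, as in EIP-1014. The function name and signature are illustrative only.

```rust
type Address = [u8; 20];

/// CREATE2 address: keccak256(0xff ++ sender ++ salt ++ code_hash)[12..].
/// `keccak256` is injected to keep the sketch free of dependencies.
fn create2_address(
    keccak256: impl Fn(&[u8]) -> [u8; 32],
    sender: &Address,
    salt: &[u8; 32],
    code_hash: &[u8; 32],
) -> Address {
    // 1 + 20 + 32 + 32 = 85-byte preimage
    let mut preimage = Vec::with_capacity(85);
    preimage.push(0xff);
    preimage.extend_from_slice(sender);
    preimage.extend_from_slice(salt);
    preimage.extend_from_slice(code_hash);
    let hash = keccak256(&preimage);
    // Keep the last 20 bytes of the 32-byte hash.
    let mut address = [0u8; 20];
    address.copy_from_slice(&hash[12..]);
    address
}
```

Because this layout is fixed, libraries like the linked UniswapV2Library can compute addresses without any call. Any deviation in the runtime would break them.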

This would also benefit code size: with only 6 parameters, no arguments need to be spilled.

Another thing to note is that we will likely just ignore the output of the constructor. On EVM, the constructor output is the runtime code to be deployed. We, however, assume the code is already on-chain and execute the constructor in the context of the new instance, discarding any output.

BALANCE

Currently, the balance seal API returns the balance of the executing account, whereas the EVM's BALANCE takes the account (address) as a parameter.

Unclear

It's best to check what frontier/moonbeam do for those. Off the top of my head I'm not sure what they do, but if frontier can emulate them, we can surely find a solution too. Worst case, we don't support some of them and just emit a compiler error, at the cost of sacrificing compatibility.

Potential Optimizations

Calldata and callvalue

On EVM, CALLDATALOAD(i) -> calldata[i:i+32] loads a single 32-byte word from calldata at offset i (bytes past the end of calldata read as zero). CALLDATACOPY(destOffset, offset, size) is essentially a memcpy: offset is the offset from the start of calldata, and destOffset the offset into the EVM linear heap memory. CALLDATASIZE() -> size returns the size of the calldata in bytes. CALLVALUE() -> value returns the balance transferred with this call.
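For reference, here are the calldata semantics as a sketch. CALLDATALOAD always yields a full 32-byte word, and CALLDATACOPY zero-fills whenever the requested range extends past the end of calldata. The function names mirror the opcodes but are hypothetical. Memory expansion is assumed to have happened already, so the sketch panics if `memory` is too small.

```rust
/// CALLDATALOAD(i): the 32-byte word starting at byte offset i, zero-padded past the end.
fn calldataload(calldata: &[u8], offset: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    if offset < calldata.len() {
        let available = &calldata[offset..];
        let n = available.len().min(32);
        word[..n].copy_from_slice(&available[..n]);
    }
    word
}

/// CALLDATACOPY(destOffset, offset, size): memcpy from calldata into the heap,
/// writing zeros for bytes past the end of calldata.
fn calldatacopy(memory: &mut [u8], dest_offset: usize, calldata: &[u8], offset: usize, size: usize) {
    let dest = &mut memory[dest_offset..dest_offset + size];
    for (i, byte) in dest.iter_mut().enumerate() {
        *byte = offset
            .checked_add(i)
            .and_then(|j| calldata.get(j))
            .copied()
            .unwrap_or(0);
    }
}
```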

Ideas discussed so far:

  • The selector check requires the calldata size, so we could provide it at a fixed memory location, in a register, or on the stack at the start of execution, sparing a host API call in virtually every case.
  • Callvalue is used regularly, as the contract reverts if a non-payable function is called with value.
  • The runtime could provide calldata[0] (the first calldata word) at a fixed memory location. All contracts are expected to do at least a CALLDATALOAD(0), because this is required for the selector check. This could spare calling into seal_input in cases where the code doesn't use CALLDATACOPY at all, as the compiler can optimize CALLDATALOAD(0) away if the offset 0 is static (which it always is during the selector check).
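Here is a sketch of what the selector check and the non-payable guard could look like if the runtime pre-populated a hypothetical `Prefetched` struct (calldata size, first calldata word, call value) at a fixed location before execution. The selector is `CALLDATALOAD(0) >> 224`, which equals the first four bytes of the big-endian word. The layout and all names are assumptions.

```rust
/// Hypothetical data the runtime writes to a fixed location before the contract starts.
struct Prefetched {
    calldata_size: u32,
    /// CALLDATALOAD(0): the first 32 bytes of calldata, zero-padded.
    calldata_word0: [u8; 32],
    /// CALLVALUE, simplified to u128 for this sketch.
    call_value: u128,
}

/// Selector as solc computes it: only if calldatasize >= 4, take the top 4 bytes
/// of CALLDATALOAD(0). `None` means dispatch to the fallback/receive function.
fn selector(p: &Prefetched) -> Option<u32> {
    if p.calldata_size < 4 {
        return None;
    }
    let w = &p.calldata_word0;
    Some(u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
}

/// Guard for non-payable functions: revert if any value was sent.
fn non_payable_guard(p: &Prefetched) -> Result<(), &'static str> {
    if p.call_value != 0 {
        Err("non-payable function called with value")
    } else {
        Ok(())
    }
}
```

Neither check needs a host call here, which is the saving the list above is after.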

Immutables

On ETH, the deploy code can insert immutables into the runtime code. We can't do that, so they need to be stored somewhere. My naive approach would be to just store them in regular contract storage under a 4-byte index key (ETH storage keys are always 32 bytes, so this can never collide), but that penalizes runtime performance.

  • We could access them lazily and keep them on the stack, so the penalty is only significant for the first access.
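Here is a sketch of the lazy approach. It assumes a hypothetical storage with variable-length keys, so a 4-byte index key cannot collide with a 32-byte slot, plus a per-call cache. A `Vec` stands in for the stack slots mentioned above. The little-endian key encoding and all names are assumptions. Only the first access to each immutable hits storage.

```rust
use std::collections::HashMap;

/// Hypothetical contract storage with variable-length keys; counts reads for illustration.
struct Storage {
    map: HashMap<Vec<u8>, [u8; 32]>,
    reads: usize,
}

impl Storage {
    fn read(&mut self, key: &[u8]) -> [u8; 32] {
        self.reads += 1;
        self.map.get(key).copied().unwrap_or([0u8; 32])
    }
}

/// Per-call cache: each immutable is loaded from storage at most once.
struct Immutables {
    cache: Vec<Option<[u8; 32]>>,
}

impl Immutables {
    fn new(count: usize) -> Self {
        Immutables { cache: vec![None; count] }
    }

    fn get(&mut self, storage: &mut Storage, index: u32) -> [u8; 32] {
        let slot = &mut self.cache[index as usize];
        if let Some(v) = *slot {
            return v; // cheap: already loaded during this call
        }
        // First access: 4-byte key, which can never collide with a 32-byte EVM slot.
        let v = storage.read(&index.to_le_bytes());
        *slot = Some(v);
        v
    }
}
```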