---
eip: 4573
title: Procedures for the EVM
description: Introduces support for EVM Procedures.
status: Draft
type: Standards Track
category: Core
author: Greg Colvin (@gcolvin), Greg Colvin <greg@colvin.org>
discussions-to: https://ethereum-magicians.org/t/eip-4573-named-procedures-for-evm-code-sections/7776
created: 2021-12-16
requires: 2315, 3540, 3670, 3779, 4200
---

## Abstract

Five EVM instructions are introduced to define, call, and return from named EVM _procedures_ and access their _call frames_ in memory - `ENTERPROC`, `LEAVEPROC`, `CALLPROC`, `RETURNPROC`, and `FRAMEADDRESS`.

## Motivation

Currently, Ethereum bytecode has no syntactic structure, and _subroutines_ have no defined interfaces.

We propose to add _procedures_ -- delimited blocks of code that can be entered only by calling into them via defined interfaces.

Also, the EVM currently has no automatic management of memory for _procedures_. So we also propose to automatically reserve call frames on an in-memory stack.

Constraints on the use of _procedures_ must be validated at contract initialization time to maintain the safety properties of [EIP-3779](./eip-3779.md): Valid programs will not halt with an exception unless they run out of gas or recursively overflow stack.

### Prior Art

The terminology is not well-defined, but we will follow Intel in calling the low level concept _subroutines_ and the higher level concept _procedures_. The distinction is that _subroutines_ are little more than a jump that knows where it came from, whereas procedures have a defined interface and manage memory as a stack. This EIP defines _procedures_ in terms of the _subroutines_ of [EIP-2315](./eip-2315.md).

## Specification

### Instructions

#### ENTERPROC (0x??)
n_inputs: uint16, n_outputs: uint16, n_locals: uint16

```
PC += <length of immediates>
```

Marks the entry point to a procedure

* taking `n_inputs` arguments from the `data stack`,
* returning `n_outputs` values on the `data stack`, and
* reserving `n_locals` words of data in memory on the `frame stack`.

Procedures can only be entered via a `CALLPROC` to their entry point.

#### LEAVEPROC (0x??)

```
FP = frame_stack.pop()
RETURNSUB
```

> Pop the `frame stack` and return to the calling procedure as if via `RETURNSUB`.

Marks the end of a procedure. Every `ENTERPROC` requires a closing `LEAVEPROC`.

*Note: Attempts to jump into a procedure (including its `LEAVEPROC`) from outside of the procedure, or to jump or step to `ENTERPROC` at all, must be prevented at validation time. `CALLPROC` is the only valid way to enter a procedure.*

#### CALLPROC (0x??)

dest_section: uint16, dest_proc: uint16

```
frame_stack.push(FP)
FP -= n_locals * 32
JUMPSUB <offset of section> + <offset of procedure>
```

> Allocate a *stack frame* and transfer control as if via a `JUMPSUB` to the Nth (N=*dest_proc*) _procedure_ in the Mth (M=*dest_section*) _section_ of the code. _Section 0_ is the current code section; any other code sections are indexed starting at _1_.

*Note: It must be shown at validation time that the procedure is defined and that the required `n_inputs` words are available on the `data stack`.*

#### RETURNPROC (0x??)

```
FP = frame_stack.pop()
RETURNSUB
```

> Pop the `frame stack` and return control to the calling procedure as if via `RETURNSUB`.

*Note: It must be shown at validation time that the promised `n_outputs` words are available on the `data stack`.*

#### FRAMEADDRESS (0x??)

offset: int16

```
PUSH2 FP + offset
```

> Push the address `FP + offset` onto the data stack.

Call frame data is addressed at an immediate `offset` relative to `FP`.
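To make the frame-pointer bookkeeping concrete, the following Python sketch models only `FP` and the `frame stack`, following the prose descriptions above (a call saves `FP` and reserves `n_locals` words below it; a return pops the `frame stack`). It is an illustration under stated assumptions, not a client implementation: the class `FrameModel`, its method names, and the helper `as_word` are hypothetical, and control flow, the `data stack`, and gas are not modeled.

```python
WORD = 32                 # bytes per EVM word
MASK = (1 << 256) - 1     # EVM words are 256 bits wide


class FrameModel:
    """Toy model of FP and the frame stack (hypothetical names, illustration only)."""

    def __init__(self):
        self.fp = 0            # FP starts at 0
        self.frame_stack = []  # saved frame pointers of the callers

    def callproc(self, n_locals):
        # CALLPROC: save the caller's FP, then reserve n_locals words below it.
        self.frame_stack.append(self.fp)
        self.fp -= n_locals * WORD

    def returnproc(self):
        # RETURNPROC / LEAVEPROC: restore the caller's FP by popping the frame stack.
        self.fp = self.frame_stack.pop()

    def frameaddress(self, offset):
        # FRAMEADDRESS: the signed address FP + offset.
        return self.fp + offset


def as_word(address):
    # View a signed address as an unsigned 256-bit two's-complement word.
    return address & MASK
```

In this model, a call to a procedure with two local words moves `FP` to -64, `frameaddress(32)` then yields -32, and `as_word(-32)` is `2**256 - 32`, which is the same address seen as an unsigned EVM word.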
Typical usage includes storing data on a call frame

```
PUSH 0xdada
FRAMEADDRESS 32
MSTORE
```

and loading data from a call frame

```
FRAMEADDRESS 32
MLOAD
```

### Memory Costs

Presently, `MSTORE` is defined as

```
memory[stack[0]...stack[0]+31] = stack[1]
memory_size = max(memory_size,floor((stack[0]+32)÷32))
```

* where `memory_size` is the number of active words of memory above _0_.

We propose to treat memory addresses as signed, so the formula needs to be

```
memory[stack[0]...stack[0]+31] = stack[1]
if (stack[0]+32)÷32 < 0
   negative_memory_size = max(negative_memory_size,floor((stack[0]+32)÷32))
else
   positive_memory_size = max(positive_memory_size,floor((stack[0]+32)÷32))
memory_size = positive_memory_size + negative_memory_size
```

* where `negative_memory_size` is the number of active words of memory below _0_ and
* where `positive_memory_size` is the number of active words of memory at or above _0_.

### Call Frame Stack

These instructions make use of a `frame stack` to allocate and free frames of local data for _procedures_ in memory. Frame memory begins at address 0 in memory and grows downwards, towards more negative addresses. A frame is allocated for each procedure when it is called, and freed when it returns.

Memory can be addressed relative to the frame pointer `FP` or by absolute address. `FP` starts at 0, moves downward towards more negative addresses to point to the frame for each `CALLPROC`, and moves upward towards less negative addresses to point to the previous frame for the corresponding `RETURNPROC`.

Equivalently, in the EVM's twos-complement arithmetic, `FP` moves from the highest address down, as is common in many calling conventions.

For example, after an initial `CALLPROC` to a procedure needing two words of data the `frame stack` might look like this

```
     0-> ........
         ........
    FP->
```

Then, after a further `CALLPROC` to a procedure needing three words of data the `frame stack` would look like this

```
     0-> ........
         ........
   -64-> ........
         ........
         ........
    FP->
```

After a `RETURNPROC` from that procedure the `frame stack` would look like this

```
     0-> ........
         ........
    FP-> ........
         ........
         ........
```

and after a final `RETURNPROC`, like this

```
    FP-> ........
         ........
         ........
         ........
         ........
```

## Rationale

There is actually not much new here. It amounts to [EIP-615](./eip-615.md), refined and refactored into bite-sized pieces, along lines common to other machines.

This proposal uses the [EIP-2315](./eip-2315.md) return stack to manage calls and returns, and steals ideas from [EIP-3336](./eip-3336.md), [EIP-3337](./eip-3337.md), and [EIP-4200](./eip-4200.md). `ENTERPROC` corresponds to `BEGINSUB` from EIP-615. Like EIP-615 it uses a frame stack to track call-frame addresses with `FP` as _procedures_ are entered and left, but like EIP-3336 and EIP-3337 it moves call frames from the data stack to memory.

Aliasing call frames with ordinary memory supports addressing call-frame data with ordinary stores and loads. This is generally useful, especially for languages like C that provide pointers to variables on the stack.

The design model here is the _subroutines_ and _procedures_ of the Intel x86 architecture.

* `JUMPSUB` and `RETURNSUB` (from [EIP-2315](./eip-2315.md)) -- like `CALL` and `RET` -- jump to and return from _subroutines_.
* `ENTERPROC` -- like `ENTER` -- sets up the stack frame for a _procedure_.
* `CALLPROC` amounts to a `JUMPSUB` to an `ENTERPROC`.
* `RETURNPROC` amounts to an early `LEAVEPROC`.
* `LEAVEPROC` -- like `LEAVE` -- takes down the stack frame for a _procedure_. It then executes a `RETURNSUB`.

## Backwards Compatibility

This proposal adds new EVM opcodes. It doesn't remove or change the semantics of any existing opcodes, so there should be no backwards compatibility issues.

## Security

Safe use of these constructs must be checked completely at validation time -- per EIP-3779 -- so there should be no security issues at runtime.

`ENTERPROC` and `LEAVEPROC` must follow the same safety rules as for `JUMPSUB` and `RETURNSUB` in EIP-2315. In addition, the following constraints must be validated:

* Every `ENTERPROC` must be followed by a `LEAVEPROC` to delimit the bodies of _procedures_.
* There can be no nested _procedures_.
* There can be no jump into the body of a procedure (including its `LEAVEPROC`) from outside of that body.
* There can be no jump or step to `ENTERPROC` at all -- only `CALLPROC`.
* The specified `n_inputs` and `n_outputs` must be on the stack.

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).