---
eip: 4750
title: EOF - Modules and Procedures
description: Individual sections for modules with `CALLPROC` and `RETURNPROC` instructions
author: Andrei Maiboroda (@gumb0), Alex Beregszaszi (@axic), Paweł Bylica (@chfast)
discussions-to: https://ethereum-magicians.org/4750-eof-functions/8195
status: Draft
type: Standards Track
category: Core
created: 2022-01-10
requires: 2315, 3540, 3670, 3779, 4200
---

## Abstract

Introduce the ability to have several code sections in EOF-formatted ([EIP-3540](./eip-3540.md)) bytecode, each one representing a separate module. Each code section has a corresponding type section which specifies one or more procedural entry points to that module. Two new opcodes, `CALLPROC` and `RETURNPROC`, are introduced to call and return from these procedures.

## Motivation

Currently in the EVM everything is a dynamic jump. Languages like Solidity generate most jumps in a static manner (i.e. the destination is pushed to the stack right before the jump, `PUSHn .. JUMP`). Unfortunately, most EVM interpreters cannot take advantage of this, because of the added requirement of validation/analysis. This also restricts them from making optimisations and potentially reducing the cost of jumps.

Dynamic jumps also impede the validation of the safety properties promised by [EIP-3779: Safer Control Flow for the EVM](./eip-3779.md) for programs whose jumps are all static: that no valid program will encounter an exceptional halting condition except via lack of gas or recursive stack overflow.

This EIP aims to remove the need for dynamic jumps, as it offers the most important feature they are used for: calling into and returning from procedures. While it removes the need, it does not disallow those instructions.

[EIP-4200: Static relative jumps](./eip-4200.md) introduces static jump instructions, which remove the need for *most* dynamic jump use cases, but not everything can be solved with them.
[EIP-2315: Simple Subroutines for the EVM](./eip-2315.md) provides for calling into and returning from subroutines, but only the bare mechanism is provided: no structure is imposed on the code.

This proposal goes further by providing a typed, modular structure. Modules are mapped to multiple code sections, and procedures are mapped to typed entry points in each code section. It aims to improve analysis opportunities by encoding the number of inputs and outputs as the type of each procedure, and by isolating the stack of each procedure (i.e. a procedure cannot read the stack of the caller/callee).

## Specification

### EOF Code and Type Sections

We propose to allow for multiple `EOF` *code* sections and corresponding *type* sections. Code sections after the first can be entered only via a `CALLPROC` to one of the entry point procedures specified in the corresponding type section, and can be left only via `RETURNPROC`. `CALLPROC` and `RETURNPROC` are in turn defined in terms of `JUMPSUB` and `RETURNSUB` from [EIP-2315](./eip-2315.md).

### EOF container changes

1. The requirement of [EIP-3540](./eip-3540.md) "Exactly one code section MUST be present." is relaxed to "At least one code section MUST be present.", i.e. multiple code sections (`kind = 1`) are allowed.
2. The total number of code sections MUST NOT exceed 1024.
3. All code sections MUST precede the data section, if a data section is present.
4. A new section with `kind = 3` is introduced, called the *type section*.
5. Exactly one type section MUST be present for each code section.
6. The type sections MUST directly precede all code sections.
7. The type section contains a sequence of triples:
    * the first uint8 encodes the number of inputs,
    * the second uint8 encodes the number of outputs, and
    * the final uint16 encodes the offset of the entry point, relative to the beginning of the corresponding code section.
    * *Note: Because both counts are encoded as uint8, a procedure can have at most 255 stack items as inputs and at most 255 as outputs.*
8. The first code section MUST have 1 entry point at offset 0 with 0 inputs and 0 outputs.

To summarize, a well-formed EOF bytecode will have the following format:

```
bytecode := format, magic, version, (type_section_header)+, (code_section_header)+, [data_section_header], 0, (type_section_contents)+, (code_section_contents)+, [data_section_contents]

type_section_header := 3, number_of_code_sections * 4 # section kind and size
type_section_contents := 0, 0, code_section_1_inputs, code_section_1_outputs, code_section_2_inputs, code_section_2_outputs, ..., code_section_n_inputs, code_section_n_outputs, code_section_entry_offset
```

### New instructions

We introduce two new instructions:

1. **`CALLPROC`** (`0x5e`)
2. **`RETURNPROC`** (`0x5f`)

If the code is legacy bytecode, both of these instructions result in an *exceptional halt*. (*Note: This means no change to behaviour.*)

#### CALLPROC (0x5e) dest_section: uint8, dest_proc: uint8

```
JUMPSUB <offset of section> + <offset of procedure within section>
```

> Transfer control as if via `JUMPSUB` to the offset of the Nth (N=*dest_proc*) _procedure_ in the Mth (M=*dest_section*) _section_ of the code. _Section 0_ is the current code section; any other code sections are indexed starting at _1_.

*Note: That the procedure is defined and the required `n_inputs` words are available on the `data stack` must be shown at validation time.*

#### RETURNPROC (0x5f)

```
RETURNSUB
```

> Return control to the calling procedure as if via `RETURNSUB`.

*Note: That the promised `n_outputs` words are available on the `data stack` must be shown at validation time.*

### Execution

1. Execution starts at the first byte of the first code section, and PC is set to 0.
2. The return stack is initialized to contain one item: `(code_section_index = 0, offset = 0, stack_height = 0)`
3. Destinations of jumps are allowed only inside the current code section.
`JUMP`, `JUMPI`, `RJUMP`, and `RJUMPI` result in an invalid program when the destination is outside of the current section bounds.

*Note: That jumps are valid must be shown at validation time.*

#### Implications on the JUMPDEST analysis

- Analysis is done separately for each section, i.e. the output of the entire analysis is `number_of_code_sections` lists of possible jump destinations.
- Analysis is extended to consider the 2 bytes directly following `CALLPROC` to be invalid jump destinations.

## Rationale

Each code section is a module with defined procedural interfaces, identified with address-independent indexes. Given that all jumps within a section are relative, and cannot jump out of a section, this allows external tools to compose contracts out of pre-compiled code sections.

### `RETURNPROC` in the top frame ends execution vs exceptionally halts

Alternative logic for executing `RETURNPROC` in the top frame could be to exceptionally halt execution, because there is arguably no caller for the starting procedure. This would mean that the return stack is initialized as empty, and `RETURNPROC` exceptionally aborts when the return stack is empty.

We have decided in favor of always having at least one item in the return stack, because it avoids a special case for an empty stack in the interpreter loop's stack underflow check. We keep the stack underflow.

## Backwards Compatibility

This change poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed; therefore, there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).

The new execution state and multi-section control flow pose no risk to backwards compatibility, because they are a generalization of executing a single code section. Executing existing contracts (both legacy and EOF1) has no user-observable changes.
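
As a non-normative illustration of the type section layout described under "EOF container changes", the following Python sketch decodes a type section body into its triples and checks the rule for the first code section. The names `ProcType`, `parse_type_section`, and `validate_first_section` are hypothetical, and the big-endian encoding of the uint16 offset is an assumption carried over from the size fields of EIP-3540; this proposal does not state the byte order itself.

```python
import struct
from typing import List, NamedTuple


class ProcType(NamedTuple):
    """One triple of a type section (hypothetical helper type)."""
    inputs: int        # number of stack inputs, uint8
    outputs: int       # number of stack outputs, uint8
    entry_offset: int  # entry point offset within the code section, uint16


ENTRY_SIZE = 4  # 1 byte inputs + 1 byte outputs + 2 bytes offset


def parse_type_section(data: bytes) -> List[ProcType]:
    """Decode a type section body into a list of ProcType triples."""
    if len(data) == 0 or len(data) % ENTRY_SIZE != 0:
        raise ValueError("type section size must be a non-zero multiple of 4")
    # ASSUMPTION: the uint16 offset is big-endian, like EIP-3540 size fields.
    return [
        ProcType(*struct.unpack_from(">BBH", data, pos))
        for pos in range(0, len(data), ENTRY_SIZE)
    ]


def validate_first_section(procs: List[ProcType]) -> None:
    """Check that the first code section has exactly 1 entry point
    at offset 0 with 0 inputs and 0 outputs."""
    if procs != [ProcType(inputs=0, outputs=0, entry_offset=0)]:
        raise ValueError("first code section must have a single (0, 0, 0) entry")


# Example: a procedure taking 2 inputs, returning 1 output, entry at offset 16.
print(parse_type_section(bytes([0x02, 0x01, 0x00, 0x10])))
# prints [ProcType(inputs=2, outputs=1, entry_offset=16)]
```

A validator along these lines would run once at deploy time, so that `CALLPROC` can rely on `n_inputs` and `n_outputs` without re-checking them during execution.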
## Security Considerations

TBA

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).