MEMORY OBJECTS IN YUL
=====================

First step:

- We implement the following global Yul functions using plain ``mload`` and ``mstore`` and the free memory pointer (see the sketch further below):
  - ``mobj_allocate(size) -> mobj`` - allocates ``size`` bytes by reading from and updating the free memory pointer.
  - ``mobj_allocate_unbounded() -> mobj_unbounded`` - returns the current position of the free memory pointer.
  - ``mobj_finalize(mobj_unbounded, size) -> mobj`` - makes a previously unbounded memory object persistent, i.e. writes ``add(mobj_unbounded, size)`` (rounded up) to the free memory pointer.
  - ``mobj_discard(mobj_unbounded)`` - discards the previously allocated memory object (a no-op, only there for later static analysis - we could also skip it and just detect this, but I'd prefer having it explicitly).
  - ``mobj_load(mobj, offset) -> value`` - loads from a memory object. Initially without bounds checks.
  - ``mobj_store(mobj, offset, value)`` - stores to a memory object. Initially without bounds checks.
- "Memory objects" can initially just be of type ``u256`` and simply store the offset into memory [this will change below].
- We replace all memory-related functions in the ABI code and the Yul IR code with these functions.
- For now we keep passing memory objects to builtins like ``keccak`` as they are, i.e. just ``keccak(mobj, size)``.

Thus far this shouldn't change anything significantly - hopefully the optimizer will inline all of this and reduce it to pretty much what we have right now already.

Next step: enforcing and moving to builtins (all of this is still up for discussion):

- The previously defined functions will become builtins in a new, initially internal, Yul dialect. The memory objects will become a proper builtin type.
- Remove ``mload`` and ``mstore`` from that dialect (I think we should just do that here - but we can argue about it).
- We encode the length into the memory object itself (masking & shifting).
- We add an additional builtin for creating memory sub-objects/slices. Something like:
  - ``mobj_view(mobj, offset, size) -> mobj_view`` or
  - ``mobj_slice(mobj, offset, size) -> mobj_slice``
  - Needs more thought: we will need those on unbounded memory objects. Are there issues with that? Bounds checks in this case?
- We consider adding bounds checks in ``mobj_load`` and ``mobj_store``.
- Builtin functions that used to take a memory offset will be changed to accept a memory object only: ``keccak(mpos, size)`` will become ``keccak(mobj)``. The size encoded in the object will be used as the size.
  - Should those functions get different names?
- We perform the following steps in static analysis:
  - Whenever a function uses ``mobj_allocate_unbounded()``, all its code paths must lead to ``mobj_finalize`` or ``mobj_discard`` on that same memory object. (In another step we may also allow returning unbounded memory objects instead.)
  - We define the property "allocates unbounded memory". It is set for ``mobj_allocate_unbounded`` and recursively inherited by all functions calling it.
  - We define the property "allocates memory". It is set for ``mobj_allocate_unbounded`` and ``mobj_allocate`` and recursively inherited by all functions calling it.
  - No function calling a function that "allocates unbounded memory" is allowed to call another function marked "allocates memory" before it has ``mobj_finalize``d or ``mobj_discard``ed that same memory object. [Note: in the future we may or may not look into relaxing this for scratch allocations.]

At this point we have to introduce optimizer steps based on the memory objects.
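For illustration, a minimal sketch of the first-step helpers could look like the following. This assumes the conventional free memory pointer slot at ``0x40``, memory objects represented as plain offsets, and all sizes being rounded up to multiples of 32 bytes; the helper ``round_up_to_mul_of_32`` is only introduced for this sketch.

```yul
function round_up_to_mul_of_32(value) -> result {
    result := and(add(value, 31), not(31))
}
function mobj_allocate(size) -> mobj {
    // read the free memory pointer and bump it by the (rounded) size
    mobj := mload(0x40)
    mstore(0x40, add(mobj, round_up_to_mul_of_32(size)))
}
function mobj_allocate_unbounded() -> mobj_unbounded {
    // just the current position of the free memory pointer
    mobj_unbounded := mload(0x40)
}
function mobj_finalize(mobj_unbounded, size) -> mobj {
    // make the unbounded object persistent by moving the free memory pointer past it
    mobj := mobj_unbounded
    mstore(0x40, add(mobj_unbounded, round_up_to_mul_of_32(size)))
}
function mobj_discard(mobj_unbounded) {
    // no-op; only a marker for the static analysis described above
}
function mobj_load(mobj, offset) -> value {
    value := mload(add(mobj, offset))
}
function mobj_store(mobj, offset, value) {
    mstore(add(mobj, offset), value)
}
```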
Open questions:

- Do we want an intermediate step between the two defined above? That would, for example, be encoding the length into the objects and replacing builtins with defined functions like ``keccak_mobj``.
- How to deal with the scratch space? Is it a global memory object? Or do we have ``function get_scratch_space() -> mobj``? Or even ``get_scratch_space(<size>) -> mobj``? ``get_scratch_space_<size>()``? We should make sure this can be nicely optimized away in the end. Note: always
- We could just have ``mobj_allocate_unbounded()`` be ``mobj_allocate(0)`` - or we could check the size in ``mobj_allocate`` to be non-zero. I'd expect the size encoded into an unbounded memory object to be zero. It could also be the maximal length we allow, though.
- How to encode length/offset in the memory object? Where to split? 128-bit/128-bit?
- How to expose this to inline assembly?

Eliminating Stack Height Issues
===============================

Mark every Yul variable that is not reachable on the stack. This should be reachability *after* stack optimization, although that might be a bit tricky to figure out...

Determine the maximum number of unreachable variables over all functions (MAX_UNREACHABLE). This determines the size of the memory frames (it is a compile-time constant).

In case MAX_UNREACHABLE > 0, i.e. there are any unreachable variables at all, reserve a fixed offset in memory for the "memory frame pointer" (call it FRAME_POINTER, e.g. 0x80) and another one for the "free memory frame stack" (call it FREE_FRAME_STACK, e.g. 0xA0).

For each function that has unreachable variables do the following:

- Add ``enterMemoryFrame()`` to the beginning of the function (definition below).
- Add ``exitMemoryFrame()`` to every point of return of the function (i.e. at the end of the body and immediately before any ``leave`` statement).
- Assign an index to each variable that has an unreachable reference, i.e. its position in the memory frame.
- Replace every assignment to the variable (including the potential one at the declaration) with ``setMemVar(id, value)`` (definition below).
  - For function arguments insert ``setMemVar(assignedId, arg)`` immediately after ``enterMemoryFrame()`` instead.
- Replace every reference to the variable with ``getMemVar(id)``.

```yul
function enterMemoryFrame() {
    let frame := mload(FREE_FRAME_STACK)
    switch frame
    case 0 {
        // no re-usable memory chunk; allocate new chunk
        frame := allocate(add(mul(MAX_UNREACHABLE, 32), 32))
        // We need 32 bytes for each unreachable variable and an
        // additional 32 bytes for storing the previous frame pointer.
    }
    default {
        // we found a re-usable memory chunk
        mstore(FREE_FRAME_STACK, mload(frame))
        // Note: mload(frame) contains a pointer to the next free memory chunk (or 0)
    }
    // Save the current frame pointer to the beginning of the memory frame.
    mstore(frame, mload(FRAME_POINTER))
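    // Resulting frame layout (illustration; see setMemVar/getMemVar below):
    //   frame + 0x00             the previous frame pointer
    //   frame + 0x20 * (id + 1)  the slot of the unreachable variable with index id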
    // Replace the frame pointer.
    mstore(FRAME_POINTER, frame)
}
function exitMemoryFrame() {
    let frame := mload(FRAME_POINTER)
    // restore the previous frame pointer
    let previous_frame_pointer := mload(frame)
    mstore(FRAME_POINTER, previous_frame_pointer)
    // "free" our memory frame:
    let next_free_frame := mload(FREE_FRAME_STACK) // might be 0
    // store the "next" pointer at the beginning of the frame
    mstore(frame, next_free_frame)
    // store the frame in the free frame stack
    mstore(FREE_FRAME_STACK, frame)
}
function setMemVar(id, value) {
    let frame := mload(FRAME_POINTER)
    frame := add(frame, 0x20) // skip the stored old frame pointer
    mstore(add(frame, mul(id, 0x20)), value)
    // Can be optimized by adding and subtracting 0x20 in enter/exit
    // and pre-multiplying with 0x20 at the call site.
}
function getMemVar(id) -> value {
    let frame := mload(FRAME_POINTER)
    frame := add(frame, 0x20) // skip the stored old frame pointer
    value := mload(add(frame, mul(id, 0x20)))
}
```

Note: ``setMemVar`` and ``getMemVar`` can be inlined to single expressions. I think we could get away with directly inserting optimized versions of these without having to run the optimizer again afterwards (which might introduce new stack issues).

Example of the transformation: The following errors out with ``Variable a1 is 1 slot(s) too deep inside the stack.``:

```
function f() {
    let a1 := calldataload(1)
    let a2 := calldataload(2)
    let a3 := calldataload(3)
    let a4 := calldataload(4)
    let a5 := calldataload(5)
    let a6 := calldataload(6)
    let a7 := calldataload(7)
    let a8 := calldataload(8)
    let a9 := calldataload(9)
    let a10 := calldataload(10)
    let a11 := calldataload(11)
    let a12 := calldataload(12)
    let a13 := calldataload(13)
    let a14 := calldataload(14)
    let a15 := calldataload(15)
    let a16 := calldataload(16)
    let a17 := calldataload(17)
    sstore(0, a1)
    sstore(0, a2)
    sstore(0, a3)
    sstore(0, a4)
    sstore(0, a5)
    sstore(0, a6)
    sstore(0, a7)
    sstore(0, a8)
    sstore(0, a9)
    sstore(0, a10)
    sstore(0, a11)
    sstore(0, a12)
    sstore(0, a13)
    sstore(0, a14)
    sstore(0, a15)
    sstore(0, a16)
    sstore(0, a17)
}
```

``a1`` will be marked unreachable and this will be transformed to:

```
function f() {
    enterMemoryFrame()
    /* let a1 := */ setMemVar(0, calldataload(1))
    let a2 := calldataload(2)
    let a3 := calldataload(3)
    let a4 := calldataload(4)
    let a5 := calldataload(5)
    let a6 := calldataload(6)
    let a7 := calldataload(7)
    let a8 := calldataload(8)
    let a9 := calldataload(9)
    let a10 := calldataload(10)
    let a11 := calldataload(11)
    let a12 := calldataload(12)
    let a13 := calldataload(13)
    let a14 := calldataload(14)
    let a15 := calldataload(15)
    let a16 := calldataload(16)
    let a17 := calldataload(17)
    sstore(0, /* a1 */ getMemVar(0))
    sstore(0, a2)
    sstore(0, a3)
    sstore(0, a4)
    sstore(0, a5)
    sstore(0, a6)
    sstore(0, a7)
    sstore(0, a8)
    sstore(0, a9)
    sstore(0, a10)
    sstore(0, a11)
    sstore(0, a12)
    sstore(0, a13)
    sstore(0, a14)
    sstore(0, a15)
    sstore(0, a16)
    sstore(0, a17)
    exitMemoryFrame()
}
```

Potential Improvements
----------------------

If the call chain to a function from the external entry is always known (e.g. for external functions themselves)