MEMORY OBJECTS IN YUL
=====================
First step:
- We implement the following global Yul functions using plain ``mload`` and ``mstore`` and the free memory pointer:
- ``mobj_allocate(size) -> mobj`` - allocates ``size`` bytes by reading from and updating the free memory pointer
- ``mobj_allocate_unbounded() -> mobj_unbounded`` - returns the current position of the free memory pointer
- ``mobj_finalize(mobj_unbounded, size) -> mobj`` - makes a previously unbounded memory object persistent, i.e. writes ``add(mobj_unbounded, size)`` (rounded up) to the free memory pointer
- ``mobj_discard(mobj_unbounded)`` - discards the previously allocated memory object (a no-op, only there for the static analysis described later - we could also skip it and detect this implicitly, but I'd prefer having it explicitly.)
- ``mobj_load(mobj, offset) -> value`` - load from memory object. Initially without bounds checks.
- ``mobj_store(mobj, offset, value)`` - store to memory object. Initially without bounds checks.
- "memory objects" can initially just be of type ``u256`` and just store the offset into memory [this will change below]
- We replace all memory-related functions in the ABI code and the Yul IR code with these functions.
- For now we keep passing memory objects to builtins like ``keccak256`` as they are, i.e. just ``keccak256(mobj, size)``.
Thus far this shouldn't change anything significantly - hopefully the optimizer will inline all of this and reduce it to pretty much what we have right now. A sketch of these helpers is given below.
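A minimal sketch of these helpers in plain Yul, assuming the free memory pointer lives at ``0x40`` (the standard Solidity memory layout) and that allocation sizes are rounded up to multiples of 32 bytes:
```yul
// Sketch only: memory objects are plain u256 memory offsets at this stage.
function mobj_allocate(size) -> mobj {
    mobj := mload(0x40)
    mstore(0x40, add(mobj, and(add(size, 31), not(31))))
}
function mobj_allocate_unbounded() -> mobj_unbounded {
    mobj_unbounded := mload(0x40)
}
function mobj_finalize(mobj_unbounded, size) -> mobj {
    mobj := mobj_unbounded
    mstore(0x40, add(mobj, and(add(size, 31), not(31))))
}
function mobj_discard(mobj_unbounded) {
    // no-op; only a marker for the static analysis described later
}
function mobj_load(mobj, offset) -> value {
    value := mload(add(mobj, offset))
}
function mobj_store(mobj, offset, value) {
    mstore(add(mobj, offset), value)
}
```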
Next step: enforcing and moving to builtins (all of this is still up for discussion):
- The previously defined functions will become builtins in a new, initially internal, Yul dialect. The memory objects will become a proper builtin type.
- Remove ``mload`` and ``mstore`` from that dialect (I think we should just do that here - but we can argue about it)
- We encode the length into the memory object itself (masking & shifting); see the sketch after this list.
- We add an additional builtin for creating memory-sub-objects/slices. Something like:
- ``mobj_view(mobj, offset, size) -> mobj_view`` or
- ``mobj_slice(mobj, offset, size) -> mobj_slice``
- Needs more thought: we will need those on unbounded memory objects. Are there issues with that? Bounds checks in this case?
- We consider adding bounds checks in ``mobj_load`` and ``mobj_store``.
- Builtin functions that used to take a memory offset will be changed to accept a memory object only: ``keccak256(mpos, size)`` will become ``keccak256(mobj)``. The size encoded in the object will be used as the size.
- Should those functions get different names?
- We perform the following steps in static analysis:
- Whenever a function uses ``mobj_allocate_unbounded()``, all its code paths must lead to ``mobj_finalize`` or ``mobj_discard`` on that same memory object. (In another step we may also allow returning unbounded memory objects instead).
- We define the property "allocates unbounded memory". It is set for ``mobj_allocate_unbounded`` and recursively inherited by all functions calling it.
- We define the property "allocates memory". It is set for ``mobj_allocate_unbounded`` and ``mobj_allocate`` and recursively inherited by all functions calling it.
- After calling a function that "allocates unbounded memory", a function is not allowed to call another function marked "allocates memory" before it has ``mobj_finalize``d or ``mobj_discard``ed that same memory object. [Note: in the future we may or may not look into relaxing this for scratch allocations]
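A sketch of one possible encoding, assuming the 128-bit/128-bit split raised in the open questions below (length in the upper half, offset in the lower half); the helper names ``mobj_pack``, ``mobj_offset`` and ``mobj_length`` as well as the exact failure behaviour are illustrative only, not part of the proposal:
```yul
function mobj_pack(offset, length) -> mobj {
    mobj := or(shl(128, length), offset)
}
function mobj_offset(mobj) -> offset {
    offset := and(mobj, sub(shl(128, 1), 1))
}
function mobj_length(mobj) -> length {
    length := shr(128, mobj)
}
// A slice/view into an existing object, bounds-checked on creation.
function mobj_view(mobj, offset, size) -> result {
    if gt(add(offset, size), mobj_length(mobj)) { revert(0, 0) }
    result := mobj_pack(add(mobj_offset(mobj), offset), size)
}
// Bounds-checked load (replaces the unchecked version from the first sketch).
function mobj_load(mobj, offset) -> value {
    if gt(add(offset, 32), mobj_length(mobj)) { revert(0, 0) }
    value := mload(add(mobj_offset(mobj), offset))
}
```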
At this point we have to introduce optimizer steps based on the memory objects.
Open questions:
- Do we want an intermediate step between the two steps defined above? For example, encoding the length into the objects and replacing builtins with ordinary defined functions such as ``keccak_mobj`` (a sketch follows after this list)?
- How to deal with the scratch space? Is it a global memory object? Or do we have ``function get_scratch_space() -> mobj``? Or even ``get_scratch_space(<size>) -> mobj``? ``get_scratch_space_<size>()``? We should make sure this can be nicely optimized away in the end. Note: always
- We could just have ``mobj_allocate_unbounded()`` be ``mobj_allocate(0)`` - or we could check the size in ``mobj_allocate`` to be non-zero. I'd expect the size encoded into an unbounded memory object to be zero. Could also be the maximal length we allow, though.
- How to encode length/offset in the memory object? Where to split? 128-bit/128-bit?
- How to expose to inline assembly?
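As an illustration of such an intermediate step, ``keccak_mobj`` could be an ordinary defined function wrapping the existing builtin (using the illustrative encoding helpers sketched above):
```yul
function keccak_mobj(mobj) -> hash {
    hash := keccak256(mobj_offset(mobj), mobj_length(mobj))
}
```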
Eliminating Stack Height Issues
===============================
Mark every Yul variable that's not reachable on the stack. This should be reachability *after* stack optimization, although that might be a bit tricky to figure out...
Determine the maximum number of unreachable variables over all functions (MAX_UNREACHABLE). This determines the size of the memory frames (it is a compile-time constant).
In case MAX_UNREACHABLE > 0, i.e. there are any unreachable variables at all, reserve a fixed offset in memory for the "memory frame pointer" (call it FRAME_POINTER, e.g. 0x80) and another one for the "free memory frame stack" (call it FREE_FRAME_STACK, e.g. 0xA0).
For each function that has unreachable variables do the following:
- Add ``enterMemoryFrame()`` to the beginning of the function (definition below).
- Add ``exitMemoryFrame()`` to every point of return of the function (i.e. at the end of the body and immediately before any ``leave`` statement).
- Assign an index to each variable that has an unreachable reference, i.e. its position in the memory frame.
- Replace every assignment to the variable (including the potential one at the declaration) with ``setMemVar(id, value)`` (definition below).
- For function arguments insert ``setMemVar(assignedId, arg)`` immediately after ``enterMemoryFrame()`` instead.
- Replace every reference to the variable with ``getMemVar(id)``.
```yul
function enterMemoryFrame()
{
    let frame := mload(FREE_FRAME_STACK)
    switch frame
    case 0 {
        // No re-usable memory chunk; allocate a new one.
        // We need 32 bytes for each unreachable variable and an
        // additional 32 bytes for storing the previous frame pointer.
        frame := allocate(add(mul(MAX_UNREACHABLE, 32), 32))
    }
    default {
        // We found a re-usable memory chunk.
        // Note: mload(frame) contains a pointer to the next free memory chunk (or 0).
        mstore(FREE_FRAME_STACK, mload(frame))
    }
    // Save the current frame pointer to the beginning of the memory frame.
    mstore(frame, mload(FRAME_POINTER))
    // Replace the frame pointer.
    mstore(FRAME_POINTER, frame)
}
function exitMemoryFrame()
{
    let frame := mload(FRAME_POINTER)
    // Restore the previous frame pointer.
    let previous_frame_pointer := mload(frame)
    mstore(FRAME_POINTER, previous_frame_pointer)
    // "Free" our memory frame:
    let next_free_frame := mload(FREE_FRAME_STACK) // might be 0
    // store the "next" pointer at the beginning of the frame
    mstore(frame, next_free_frame)
    // and push the frame onto the free frame stack.
    mstore(FREE_FRAME_STACK, frame)
}
function setMemVar(id, value)
{
    let frame := mload(FRAME_POINTER)
    frame := add(frame, 0x20) // skip the stored old frame pointer
    mstore(add(frame, mul(id, 0x20)), value)
    // Can be optimized by adding and subtracting 0x20 in enter/exit
    // and pre-multiplying with 0x20 at the call site.
}
function getMemVar(id) -> value
{
    let frame := mload(FRAME_POINTER)
    frame := add(frame, 0x20) // skip the stored old frame pointer
    value := mload(add(frame, mul(id, 0x20)))
}
```
Note: ``setMemVar`` and ``getMemVar`` can be inlined to single expressions. I think we could get away with directly inserting optimized versions of these without having to optimize again afterwards (which might introduce new stack issues).
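For illustration, with a compile-time-constant ``id`` the inlined forms are single expressions (here instantiated for ``id = 0``, as in the example below):
```yul
// General form: slot id lives at add(mload(FRAME_POINTER), mul(add(id, 1), 0x20)).
mstore(add(mload(FRAME_POINTER), 0x20), calldataload(1)) // setMemVar(0, calldataload(1))
sstore(0, mload(add(mload(FRAME_POINTER), 0x20)))        // sstore(0, getMemVar(0))
```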
Example of the transformation:
The following errors out with ``Variable a1 is 1 slot(s) too deep inside the stack.``:
```yul
function f() {
let a1 := calldataload(1)
let a2 := calldataload(2)
let a3 := calldataload(3)
let a4 := calldataload(4)
let a5 := calldataload(5)
let a6 := calldataload(6)
let a7 := calldataload(7)
let a8 := calldataload(8)
let a9 := calldataload(9)
let a10 := calldataload(10)
let a11 := calldataload(11)
let a12 := calldataload(12)
let a13 := calldataload(13)
let a14 := calldataload(14)
let a15 := calldataload(15)
let a16 := calldataload(16)
let a17 := calldataload(17)
sstore(0, a1)
sstore(0, a2)
sstore(0, a3)
sstore(0, a4)
sstore(0, a5)
sstore(0, a6)
sstore(0, a7)
sstore(0, a8)
sstore(0, a9)
sstore(0, a10)
sstore(0, a11)
sstore(0, a12)
sstore(0, a13)
sstore(0, a14)
sstore(0, a15)
sstore(0, a16)
sstore(0, a17)
}
```
``a1`` will be marked unreachable and this will be transformed to:
```yul
function f() {
enterMemoryFrame()
/* let a1 := */ setMemVar(0, calldataload(1))
let a2 := calldataload(2)
let a3 := calldataload(3)
let a4 := calldataload(4)
let a5 := calldataload(5)
let a6 := calldataload(6)
let a7 := calldataload(7)
let a8 := calldataload(8)
let a9 := calldataload(9)
let a10 := calldataload(10)
let a11 := calldataload(11)
let a12 := calldataload(12)
let a13 := calldataload(13)
let a14 := calldataload(14)
let a15 := calldataload(15)
let a16 := calldataload(16)
let a17 := calldataload(17)
sstore(0, /* a1 */ getMemVar(0))
sstore(0, a2)
sstore(0, a3)
sstore(0, a4)
sstore(0, a5)
sstore(0, a6)
sstore(0, a7)
sstore(0, a8)
sstore(0, a9)
sstore(0, a10)
sstore(0, a11)
sstore(0, a12)
sstore(0, a13)
sstore(0, a14)
sstore(0, a15)
sstore(0, a16)
sstore(0, a17)
exitMemoryFrame()
}
```
Potential Improvements
----------------------
If the call chain to a function from the external entry is always known (e.g. for external functions themselves)