Try   HackMD

Franck Cassez, Head of Research at Mantle. — all views expressed are my own.


Formal and executable semantics of Yul in Dafny

In this note I present a formal semantics of Yul in the verification-friendly language Dafny. The Dafny verification engine can be used to reason about Yul code and to provide some (machine-checkable) guarantees about the code. The formal semantics and examples are available in the repository Yul in Dafny.

What is Yul?

Yul is a programming language that can be used as an intermediate representation (IR) for EVM bytecode:

  • it is (relatively) easy to translate Yul into EVM bytecode; for instance the Solidity compiler supports the compilation from Yul to EVM bytecode,
  • Yul is a structured programming language and sits at a higher-level than EVM bytecode: it does not provide the notion of stack[1] or gas[2], and is much easier to read/understand. As a result it can be an intermediate target for compilers that ultimately generate EVM bytecode.

Optionally Yul code can be optimised before the final compilation stage to EVM bytecode. Note that the use of IR is common in compiler infrastructures: for instance, LLVM is used as a backend for several compilers (C, C++, Haskell, Rust, ).

What caught my

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
in the Yul documentation is one of the motivations for designing Yul:

  1. Control flow should be easy to understand to help in manual inspection, formal verification and optimisation.

Very happy to see formal verification mentioned there

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
.

What does the Yul formal verification landscape look like?

I looked up for projects on formal verification for Yul and could only find a handful:[3]

The previous contributions share some common features:

  • the focus seems to be on defining a formal semantics for Yul. This is understandable as the section Formal Specification in the Yul documentation does not really provide a rigorous and formal semantics of the language.
  • the above mentioned formalisations are all deep embeddings of the semantics of Yul: first the syntax of the language is defined, and then the semantics (operational, denotational) is defined.
  • as a result, the formal semantics proposed above have not been leveraged to perform formal verification of Yul code. I'll be happy to be wrong and would love to see some examples of Yul formal verification that I have possibly missed.
  • the projects are not updated often or not maintained any more.

A shallow embedding of Yul in Dafny

Recently, I have initiated and contributed to the Dafny-EVM project and we wrote about it in [1]. As Yul has a lot in common with the EVM bytecode, I thought it would be nice to re-use some of the libraries developed in the Dafny-EVM, and the automated reasoning capabilities provided by Dafny.

The first step to formally verifying Yul code is to have a formal semantics of Yul. The second is to have a formal semantics that can be used to automatically reason about the code. If the semantics of a Yul program is a Dafny program, then this second requirement is met.

It turns out that providing the semantics of Yul in Dafny can be done using a shallow embedding of Yul into (the host language) Dafny.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Yul has been designed to be be flexible in the sense that primitive functions or types can be customised. A customisation is a Yul dialect. At the moment there is only one official Yul dialect that uses EVM opcodes as primitive functions and a single type u256 (unsigned integers over 256 bits).

In other words, Yul itself (non customised) is a sort of template that defines the supported control structures like assignment :=, if, forloop, switch, function declarations etc but does not prescribe any primitive types or functions. These control structures are common to many languages and the semantics (in terms of control flow) of an if or switch or loop is usually the same in different languages, Yul, Dafny.

To provide the semantics of Yul in Dafny, it suffices to write the semantics of the primitive functions, and then we can re-use the control structure of Dafny to provide the semantics of the control flow constructs likeif, loop, function etc. This is a shallow embedding of Yul in Dafny as it re-uses the Dafny control structures and the scoping rules to define the semantics of Yul.

I have written the semantics of Yul primitive functions in a Dafny library. This library re-uses the core components of the Dafny-EVM and provides the functional semantics of the Yul EVM Dialect.

Here is the semantics of the Yul mstore given by the Dafny MStore function:

/** * Memory store. Store a u256 into memory. * * @param address The start address. * @param value Value to store. * @param s The state before store operation. * @returns s' such that s'.memory = s.memory except for * [address..address + 31] where value is stored. * * @note Memory is a word-addressable array of bytes. A u256 * is stored into 32 bytes from address to address + 31. */ function MStore(address: u256, value: u256, s: Executing): (s': State) ensures s'.EXECUTING? ensures s'.MemSize() > address as nat + 31 ensures s'.yul.context == s.yul.context ensures s'.yul.world == s.yul.world { // Make sure memory is large enough. var m' := Memory.ExpandMem(s.yul.memory, address as nat, 32); EXECUTING(s.yul.( memory := Memory.WriteUint256(m', address as nat, value))) }

It is a function from a Yul (Executing) state (and two u256 parameters) to a Yul state State. The Dafny datatype Executing (line 13) captures a normal state (non-error, non-return) of the Yul machine. The state represents the current memory and storage states, and has also the context of the current transaction, like the calldataload. For instance for result of MStore returns a new state s' because it updates the memory component of the current state.

A simple verification example

As an illustration of how this works and how to formally verify Yul code, here is an example of a Yul function computing the maximum of two u256 values. It could be for instance provided by a library.

The Yul source code is:

object "Runtime" { code { function max(x, y) -> result { result := x if lt(x, y) { result := y } } // Use the code to compute max of 8, 3 let x := 8 let y := 3 let z := max(x, y) mstore(0x40, z) // return the result to caller return(0x40, 32) } }

The primitive operators (in the EVM Dialect) are: lt (lower-than), mstore (memory storage) and return (return result in a data field to the caller.) The Yul Dafny library provides the semantics for these operators: Lt (Dafny semantics here), MStore (Dafny semantics here) and Return (Dafny semantics here).

The rest of the semantics of the Yul code above (the control structures, variables' scopes) is provided by translating the Yul control structures into Dafny like so:

// Verification helper /** A ghost (verification-only) function. Not part of the translation Yul to Dafny. */ ghost function maxu256(x: u256, y: u256): (r: u256) ensures r == x || r == y ensures r >= x && r >= y { if x <= y then y else x } // Translated code from Yul to Dafny /** Translation of the of the Yul code of max in Dafny. */ method Max(x: u256, y: u256) returns (result: u256) ensures result == maxu256(x, y) { result := x; if Lt(x, y) { result := y; } } /** Translation of Yul code of main in Dafny. */ method Main(s: Executing) returns (s': State) ensures s'.RETURNS? ensures ByteUtils.ReadUint256(s'.data, 0) == 8 { var x := 8; // let var y := 3; // let var z := Max(x, y); // funcall var s2 := MStore(0x40, z, s); // memory store return Return(0x40, 32, s2); // return result }

And we now have:

  1. a formal and executable semantics of Yul (executable via Yul
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    Dafny
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    execute Dafny code).
  2. the ability to reason about the Yul code, the semantics of which is given in Dafny: in the example above, Dafny verifies at compile-time that the post-conditions, ensures at lines 2627, hold for every input state s, and ensures at line 16 hold for every inputs x and y.

The complete description of max code is available here.

Next steps

This is work in progress and there are a couple of things to improve, for instance:

  • implement a translator from Yul to Dafny,
  • add more Yul primitive functions to the Dafny Yul library,
  • use the verification technique described above on more complex examples.

If you are interested in this kind of work, the Yul in Dafny repository already contains more examples and in particular:

  • A simplified verified version of an ERC-20 token in Yul,
  • a proof of equivalence of Yul code and the corresponding EVM bytecode as generated by solc the Solidity compiler.

References

[1] Cassez, F., Fuller, J., Ghale, M.K., Pearce, D.J., Quiles, H.M.A. (2023). Formal and Executable Semantics of the Ethereum Virtual Machine in Dafny. In: Chechik, M., Katoen, JP., Leucker, M. (eds) Formal Methods. FM 2023. Lecture Notes in Computer Science, vol 14000. Springer, Cham. DOI


  1. For some reasons unknown to me there is still a pop(x) instruction in Yul. The documentation reads "discard value x".

    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    ↩︎

  2. Yul has a gas() instruction and also a pc() instruction that do not have a meaning in Yul but only in EVM bytecode. ↩︎

  3. If I missed an important project, feel free to let me know. ↩︎