Try โ€‚โ€‰HackMD

How ๐ŸVyper Compiles Into Bytecode

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More โ†’

Special thanks go to Charles Cooper, fubuloubu, banteg, and Tanguy Rocher for feedback and review.

gm ๐Ÿsnekooors, as you all well know, Vyper is a contract-oriented Pythonic programming language that is targeting the Ethereum Virtual Machine (EVM). But do you guys know about the inner workings of the Vyper compiler itself? This article attempts to shed some light on how the Vyper compiler itself works and delves into the different layers of the compilation phases. Or, in other words, we examine what happens beneath the surface when you have a Vyper contract, e.g. Foo.vy:

# @version ^0.3.9

@external
@pure
def foo(d: decimal) -> decimal:
    # Yes Vyper has a built-in square root function :)
    return sqrt(d)

and invoke vyper Foo.vy, which outputs:

0x61013261001161000039610132610000f36003361161000c5761011d565b5f3560e01c34610121576367321d24811861011b5760243610610121576004358060140b81186101215760405260205f60405112610121575f606052604051610058575f606052610117565b6040516404a817c8006402540be4008202058060140b811861012157905064012a05f20081018060140b81186101215790506060526040516080525f610100905b8060a052608051606051186100ad57610114565b606051608052604051606051801561012157806402540be4008302058060140b811861012157905090506060518082018060140b811861012157905090506404a817c8006402540be4008202058060140b8118610121579050606052600101818118610099575b50505b6060f35b505b5f5ffd5b5f80fda165767970657283000309000b

Enjoy the ride ๐Ÿซก!

The Vyper Compilation Pipeline

In a nutshell, the Vyper compiler translates the Vyper programming language into the EVM bytecode consisting of (currently) 144 opcodes (including PUSH0). The compilation itself can be broken down into 10 distinct steps (see also the Python module phases.py for a code-level summary):

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More โ†’

1) Convert Vyper Source Code Into AST

In the first step, the Vyper source code is converted into an abstract syntax tree (AST). The full Python module vyper.ast can be found here. This conversion takes place in several consecutive steps:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More โ†’

  1. First, the source code is pre-parsed, i.e. the code is reformatted from a Vyper source string to a reusable Python source string. Examples are validating the @version pragma against the current compiler version or translating the keywords enum, event, interface, or struct into the Python keyword class. See pre_parser.py for the full code logic.
  2. A Python AST is generated from the modified source code of step 1. This is done via the function parse_to_ast that parses a Vyper source string and generates basic Vyper AST nodes.
  3. For the purpose of error reporting, the Python AST is annotated with token location information. The Python module annotation.py contains the AnnotatingVisitor class, used to annotate and modify the Python AST prior to converting it to a Vyper AST.
  4. Eventually, the modified Python AST nodes are converted to Vyper AST nodes via nodes.py.

2) Validate Literal Vyper AST Nodes

Next, the literal nodes in the AST are (lightly) validated via validate_literal_nodes in validation.py. Examples that will throw at this stage are:

# @version ^0.3.9

foo: constant(address) = 0x6b175474e89094c44da98b954eedeac495271d0F

throws with:

Error compiling: Foo.vy
vyper.exceptions.InvalidLiteral: Could not determine type for literal value '0x6b175474e89094c44da98b954eedeac495271d0F'
  contract "Foo.vy:3", line 3:25
       2
  ---> 3 foo: constant(address) = 0x6b175474e89094c44da98b954eedeac495271d0F
  --------------------------------^
       4

The reason why the validation fails here is due to the incorrect checksum of the address foo.

or

# @version ^0.3.9

foo: constant(bytes4) = 0x12_34_56

throws with:

Error compiling: Foo.vy
vyper.exceptions.InvalidLiteral: Underscores not allowed in hex literals
  contract "Foo.vy:5", line 5:24
       4
  ---> 5 foo: constant(bytes4) = 0x12_34_56
  -------------------------------^
       6

or

# @version ^0.3.9

@external
def foo():
    bar: decimal = 3.123456789123456789

throws with:

Error compiling: Foo.vy
vyper.exceptions.InvalidLiteral: Vyper supports a maximum of ten decimal points
  contract "Foo.vy:5", function "foo", line 5:19
       4 def foo():
  ---> 5     bar: decimal = 3.123456789123456789
  --------------------------^
       6

3) Perform Constant Folding

Now constant folding takes place. Constant folding is a generic compiler optimisation technique that eliminates run-time computations with compile-time computations. Thus, the fold function in folding.py replaces constant values in the AST with their values. Furthermore, constant expressions are also evaluated accordingly. Examples are:

  • 3+2 becomes 5 (arithmetic operations),
  • ["foo", "bar"][1] becomes "bar" (references to literal arrays),
  • min(1,2) becomes 1 (built-in functions applied to literals).

4) Conduct Semantic Validation of Program

Next, the semantics of the program are validated via validate_semantics in __init__.py. The structure and the types of the program are checked, and type annotations are added to the AST. A full discussion of how the semantic validation is carried out is beyond the scope of this article, but it can be broadly divided into 4 steps:

  1. First, each (module-level) statement (e.g. variable declarations, interface definitions, or functions) are validated and added to the global namespace accordingly. See the Python class ModuleAnalyzer for further insights.
  2. Second, the compiler performs certain additional checks, such as function selector collisions (see the function validate_unique_method_ids) or circular references (see the function _find_cyclic_call).
  3. Third, the compiler validates each function consecutively. See the Python class FunctionNodeVisitor for further insights.
  4. Fourth, after the contract has been successfully validated, the compiler annotates the expression nodes of the AST with the appropriate types for later code generation. See the Python classes StatementAnnotationVisitor and ExpressionAnnotationVisitor, which are eventually called to annotate the expression nodes of the AST.

5) Generate Getter Functions and Remove Unused Statements

In the next step, for all public variables getter functions are generated via generate_public_variable_getters and any unused statements are removed from the AST via remove_unused_statements. These two functions are bundled into the Python module expansion.py via the expand_annotated_ast function.

6) Allocate Data Positions of Storage and immutable Variables

Now it's time to determine the data positions in storage and bytecode for all storage and immutable variables. The compiler achieves this via data_positions.py.

7) Translate Vyper AST Into VyperIR

The Vyper AST is turned into a lower-level intermediate representation language (IR) called VyperIR via the function generate_ir_for_module in the Python codegen module module.py. VyperIR is used as a high-level assembly. The grammar of VyperIR is (s_expr), where s_expr is one of:

s_expr :=
  INT_LITERAL |
  IDENTIFIER |
  EVM_OPCODE *s_expr |
  PSEUDO_OPCODE *s_expr |
  "with" IDENTIFIER s_expr s_expr |
  "set" IDENTIFIER s_expr |
  "seq" *s_expr |
  "if" s_expr s_expr [s_expr] |
  "repeat" s_expr s_expr s_expr s_expr s_expr

I would like to point out that a new IR design is in the works, called venomIR. For the original discussion see here and the main PR here.

8) Optimise VyperIR

The time has arrived for various optimisations via optimizer.py of the VyperIR representation. It's important to note that further constant folding also takes place at this stage (a typical example is the simplification of safemath expressions), see e.g. the function _optimize_binop.

9) Translate Optimised VyperIR Into EVM Assembly

The VyperIR representation is turned into EVM assembly via calling compile_to_assembly in the IR Python module compile_ir.py.

10) Translate EVM Assembly Into Bytecode

Eventually, the EVM assembly is turned into bytecode via assembly_to_evm.

Let me emphasise that the semantic analysis and codegen part is the most complex part of the compilation and a full delve into it is beyond the scope of this article. We leave these deep-dives as an opportunity for future articles.

Compilation Example

Let's go through the different steps sequentially with an example Foo.vy:

# @version ^0.3.9

SOME_CONSTANT: public(constant(bytes4)) = 0x1122AABB

counter: uint256

@external
def foo(x: uint256) -> uint256:
    self.counter += 1
    # Yes Vyper has a built-in integer square root function :)
    return isqrt(x + self.counter)

1) Output the AST via vyper -f ast Foo.vy

{
    "ast": {
        "ast_type": "Module",
        "body": [
            {
                "ast_type": "VariableDecl",
                "annotation": {
                    "ast_type": "Name",
                    "id": "bytes4"
                },
                "is_constant": true,
                "is_immutable": false,
                "is_public": true,
                "is_transient": false,
                "target": {
                    "ast_type": "Name",
                    "id": "SOME_CONSTANT"
                },
                "value": {
                    "ast_type": "Hex",
                    "value": "0x1122AABB"
                }
            },
            {
                "ast_type": "VariableDecl",
                "annotation": {
                    "ast_type": "Name",
                    "id": "uint256"
                },
                "is_constant": false,
                "is_immutable": false,
                "is_public": false,
                "is_transient": false,
                "target": {
                    "ast_type": "Name",
                    "id": "counter"
                },
                "value": null
            },
            {
                "ast_type": "FunctionDef",
                "args": {
                    "ast_type": "arguments",
                    "args": [
                        {
                            "ast_type": "arg",
                            "annotation": {
                                "ast_type": "Name",
                                "id": "uint256"
                            },
                            "arg": "x"
                        }
                    ],
                    "default": null,
                    "defaults": []
                },
                "body": [
                    {
                        "ast_type": "AugAssign",
                        "op": {
                            "ast_type": "Add"
                        },
                        "target": {
                            "ast_type": "Attribute",
                            "attr": "counter",
                            "value": {
                                "ast_type": "Name",
                                "id": "self"
                            }
                        },
                        "value": {
                            "ast_type": "Int",
                            "value": 1
                        }
                    },
                    {
                        "ast_type": "Return",
                        "value": {
                            "ast_type": "Call",
                            "args": [
                                {
                                    "ast_type": "BinOp",
                                    "left": {
                                        "ast_type": "Name",
                                        "id": "x"
                                    },
                                    "op": {
                                        "ast_type": "Add"
                                    },
                                    "right": {
                                        "ast_type": "Attribute",
                                        "attr": "counter",
                                        "value": {
                                            "ast_type": "Name",
                                            "id": "self"
                                        }
                                    }
                                }
                            ],
                            "func": {
                                "ast_type": "Name",
                                "id": "isqrt"
                            },
                            "keyword": null,
                            "keywords": []
                        }
                    }
                ],
                "decorator_list": [
                    {
                        "ast_type": "Name",
                        "id": "external"
                    }
                ],
                "doc_string": null,
                "name": "foo",
                "returns": {
                    "ast_type": "Name",
                    "id": "uint256"
                }
            }
        ],
        "doc_string": null,
        "name": "Foo.vy"
    },
    "contract_name": "Foo.vy"
}

A note on the side, I prettified the original ast output with the following bash script (h/t Tanguy Rocher):

vyper -f ast Foo.vy | jq -S | jq 'walk(if type == "object" then with_entries(select(.key | test("end_lineno|lineno|col_offset|end_col_offset|pos|src|node_id") | not)) else . end) |  def reorder(order): . as $in | if type == "object" then reduce (keys_unsorted|order)[] as $key( {}; . + { ($key):  ($in[$key] | reorder(order)) }) elif type == "array" then map( reorder(order) ) else . end; def neworder: "ast_type" as $s | index($s) as $i | if $i==null then . else [$s] + .[:$i] + .[$i+1:] end ; reorder(neworder)';

2) Output the Storage Layout via vyper -f layout Foo.vy

{
    "storage_layout": {
        "counter": {
            "type": "uint256",
            "slot": 0
        }
    },
    "code_layout": {}
}

3) Output the VyperIR Representation via vyper -f ir_runtime Foo.vy

[seq,
  [if, [le, calldatasize, 3], [goto, fallback]],
  [with,
    _calldata_method_id,
    [shr, 224, [calldataload, 0]],
    [seq,
      [assert, [iszero, callvalue]],
      # Line 3
      [if,
        [iszero, [xor, _calldata_method_id, 252460339 <0xf0c3d33: SOME_CONSTANT()>]],
        [seq,
          [goto, external_SOME_CONSTANT____common],
          [seq,
            [label,
              external_SOME_CONSTANT____common,
              var_list,
              [seq,
                [exit_to,
                  _sym_external_SOME_CONSTANT____cleanup,
                  64,
                  /* abi_encode (bytes4) */
                  [seq,
                    [mstore,
                      64,
                      7750569564506974822565848349486848977417197030367740731122796394362592821248 <0x1122AABB>],
                    32]],
                pass]],
            [label,
              external_SOME_CONSTANT____cleanup,
              [var_list, ret_ofst, ret_len],
              [return, ret_ofst, ret_len]]]]],
      # Line 8
      [if,
        [iszero, [xor, _calldata_method_id, 801029432 <0x2fbebd38: foo(uint256)>]],
        [seq,
          [assert, [ge, calldatasize, 36]],
          [goto, external_foo__uint256__common],
          [seq,
            [label,
              external_foo__uint256__common,
              var_list,
              [seq,
                # Line 9
                /* self.counter += 1 */
                [seq,
                  [unique_symbol, sstore_2],
                  [sstore,
                    0 <self.counter>,
                    /* self.counter += 1 */
                    [with,
                      x,
                      [sload, 0 <self.counter>],
                      [with, ans, [add, x, 1 <1>], [seq, [assert, [ge, ans, x]], ans]]]]],
                # Line 8
                [exit_to,
                  _sym_external_foo__uint256__cleanup,
                  64,
                  /* abi_encode (uint256) */
                  [seq,
                    [mstore,
                      64,
                      /* isqrt(x + self.counter) */
                      [with,
                        x,
                        /* x + self.counter */
                        [with,
                          x,
                          [calldataload, 4 <x (4+0)>],
                          [with,
                            y,
                            [sload, 0 <self.counter>],
                            [with, ans, [add, x, y], [seq, [assert, [ge, ans, x]], ans]]]],
                        [with,
                          y,
                          x,
                          [with,
                            z,
                            181,
                            [seq,
                              [if,
                                [ge, y, 87112285931760246646623899502532662132736],
                                [seq, [set, y, [shr, 128, y]], [set, z, [shl, 64, z]]]],
                              [if,
                                [ge, y, 4722366482869645213696],
                                [seq, [set, y, [shr, 64, y]], [set, z, [shl, 32, z]]]],
                              [if,
                                [ge, y, 1099511627776],
                                [seq, [set, y, [shr, 32, y]], [set, z, [shl, 16, z]]]],
                              [if, [ge, y, 16777216], [seq, [set, y, [shr, 16, y]], [set, z, [shl, 8, z]]]],
                              [set, z, [shr, 18, [mul, z, [add, y, 65536]]]],
                              [set, z, [shr, 1, [add, [div, x, z], z]]],
                              [set, z, [shr, 1, [add, [div, x, z], z]]],
                              [set, z, [shr, 1, [add, [div, x, z], z]]],
                              [set, z, [shr, 1, [add, [div, x, z], z]]],
                              [set, z, [shr, 1, [add, [div, x, z], z]]],
                              [set, z, [shr, 1, [add, [div, x, z], z]]],
                              [set, z, [shr, 1, [add, [div, x, z], z]]],
                              [with, t, [div, x, z], [select, [lt, z, t], z, t]]]]]]],
                    32]],
                pass]],
            [label,
              external_foo__uint256__cleanup,
              [var_list, ret_ofst, ret_len],
              [return, ret_ofst, ret_len]]]]]]],
  [goto, fallback],
  [label, fallback, var_list, /* Default function */ [revert, 0, 0]]]

For the contract creation code IR, you can simply invoke vyper -f ir Foo.vy.

4) Output the EVM Assembly via vyper -f asm Foo.vy

_sym_subcode_size _sym_runtime_begin2 _mem_deploy_start CODECOPY _OFST _sym_subcode_size 0 _mem_deploy_start RETURN _sym_runtime_begin2 BLANK { _DEPLOY_MEM_OFST_0 PUSH1 0x03 CALLDATASIZE GT _sym_join3 JUMPI _sym_fallback JUMP _sym_join3 JUMPDEST PUSH0 CALLDATALOAD PUSH1 0xe0 SHR CALLVALUE _sym_revert1 JUMPI PUSH4 0x0f0c3d33 DUP2 XOR _sym_join4 JUMPI PUSH32 0x1122aabb00000000000000000000000000000000000000000000000000000000 PUSH1 0x40 MSTORE PUSH1 0x20 PUSH1 0x40 RETURN _sym_join4 JUMPDEST PUSH4 0x2fbebd38 DUP2 XOR _sym_join5 JUMPI PUSH1 0x24 CALLDATASIZE LT _sym_revert1 JUMPI PUSH0 SLOAD PUSH1 0x01 DUP2 ADD DUP2 DUP2 LT _sym_revert1 JUMPI SWAP1 POP PUSH0 SSTORE PUSH1 0x04 CALLDATALOAD PUSH0 SLOAD DUP1 DUP3 ADD DUP3 DUP2 LT _sym_revert1 JUMPI SWAP1 POP SWAP1 POP DUP1 PUSH1 0xb5 PUSH18 0x010000000000000000000000000000000000 DUP3 LT _sym_join6 JUMPI DUP2 PUSH1 0x80 SHR SWAP2 POP DUP1 PUSH1 0x40 SHL SWAP1 POP _sym_join6 JUMPDEST PUSH10 0x01000000000000000000 DUP3 LT _sym_join7 JUMPI DUP2 PUSH1 0x40 SHR SWAP2 POP DUP1 PUSH1 0x20 SHL SWAP1 POP _sym_join7 JUMPDEST PUSH6 0x010000000000 DUP3 LT _sym_join8 JUMPI DUP2 PUSH1 0x20 SHR SWAP2 POP DUP1 PUSH1 0x10 SHL SWAP1 POP _sym_join8 JUMPDEST PUSH4 0x01000000 DUP3 LT _sym_join9 JUMPI DUP2 PUSH1 0x10 SHR SWAP2 POP DUP1 PUSH1 0x08 SHL SWAP1 POP _sym_join9 JUMPDEST PUSH3 0x010000 DUP3 ADD DUP2 MUL PUSH1 0x12 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP4 DIV DUP1 DUP3 DUP2 XOR DUP3 DUP5 LT MUL XOR SWAP1 POP SWAP1 POP SWAP1 POP SWAP1 POP PUSH1 0x40 MSTORE PUSH1 0x20 PUSH1 0x40 RETURN _sym_join5 JUMPDEST POP _sym_fallback JUMPDEST PUSH0 PUSH0 REVERT _sym_revert1 JUMPDEST PUSH0 DUP1 REVERT }

5) Output the Final Contract Creation Bytecode via vyper -f bytecode Foo.vy

0x61018261001161000039610182610000f36003361161000c5761016d565b5f3560e01c3461017157630f0c3d33811861004b577f1122aabb0000000000000000000000000000000000000000000000000000000060405260206040f35b632fbebd38811861016b5760243610610171575f54600181018181106101715790505f556004355f5480820182811061017157905090508060b57101000000000000000000000000000000000082106100ab578160801c91508060401b90505b690100000000000000000082106100c9578160401c91508060201b90505b6501000000000082106100e3578160201c91508060101b90505b630100000082106100fb578160101c91508060081b90505b620100008201810260121c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808304808281188284100218905090509050905060405260206040f35b505b5f5ffd5b5f80fda165767970657283000309000b

6) Output the Final Runtime Bytecode via vyper -f bytecode_runtime Foo.vy

0x6003361161000c5761016d565b5f3560e01c3461017157630f0c3d33811861004b577f1122aabb0000000000000000000000000000000000000000000000000000000060405260206040f35b632fbebd38811861016b5760243610610171575f54600181018181106101715790505f556004355f5480820182811061017157905090508060b57101000000000000000000000000000000000082106100ab578160801c91508060401b90505b690100000000000000000082106100c9578160401c91508060201b90505b6501000000000082106100e3578160201c91508060101b90505b630100000082106100fb578160101c91508060081b90505b620100008201810260121c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808304808281188284100218905090509050905060405260206040f35b505b5f5ffd5b5f80fda165767970657283000309000b

Ok, you've come a long way, but you made it. Congratulations ๐ŸŽ‰!

Wrap-up

Ok, fair, I do understand that this is a lot to digest, but what would life be without challenges ๐Ÿค“. I still hope this was an insightful read! The best way to see if you have grasped everything is to go to Vyper's main repository (now ๐Ÿ˜‰) and tackle an existing issue or ask King Charles for advice on where you can best contribute. Furthermore, if you can just help others understand how this complex machine works, you would already be adding a lot of value. See you deep down in the issues and PRs ๐Ÿซก!

Further References