Special thanks go to Charles Cooper, fubuloubu, banteg, and Tanguy Rocher for feedback and review.
gm ๐snekooors, as you all well know, Vyper is a contract-oriented Pythonic programming language that is targeting the Ethereum Virtual Machine (EVM). But do you guys know about the inner workings of the Vyper compiler itself? This article attempts to shed some light on how the Vyper compiler itself works and delves into the different layers of the compilation phases. Or, in other words, we examine what happens beneath the surface when you have a Vyper contract, e.g. Foo.vy
:
# @version ^0.3.9
@external
@pure
def foo(d: decimal) -> decimal:
# Yes Vyper has a built-in square root function :)
return sqrt(d)
and invoke vyper Foo.vy
, which outputs:
0x61013261001161000039610132610000f36003361161000c5761011d565b5f3560e01c34610121576367321d24811861011b5760243610610121576004358060140b81186101215760405260205f60405112610121575f606052604051610058575f606052610117565b6040516404a817c8006402540be4008202058060140b811861012157905064012a05f20081018060140b81186101215790506060526040516080525f610100905b8060a052608051606051186100ad57610114565b606051608052604051606051801561012157806402540be4008302058060140b811861012157905090506060518082018060140b811861012157905090506404a817c8006402540be4008202058060140b8118610121579050606052600101818118610099575b50505b6060f35b505b5f5ffd5b5f80fda165767970657283000309000b
Enjoy the ride ๐ซก!
In a nutshell, the Vyper compiler translates the Vyper programming language into the EVM bytecode consisting of (currently) 144 opcodes (including PUSH0
). The compilation itself can be broken down into 10 distinct steps (see also the Python module phases.py
for a code-level summary):
In the first step, the Vyper source code is converted into an abstract syntax tree (AST). The full Python module vyper.ast
can be found here. This conversion takes place in several consecutive steps:
@version
pragma against the current compiler version or translating the keywords enum
, event
, interface
, or struct
into the Python keyword class
. See pre_parser.py
for the full code logic.parse_to_ast
that parses a Vyper source string and generates basic Vyper AST nodes.annotation.py
contains the AnnotatingVisitor
class, used to annotate and modify the Python AST prior to converting it to a Vyper AST.nodes.py
.Next, the literal nodes in the AST are (lightly) validated via validate_literal_nodes
in validation.py
. Examples that will throw at this stage are:
# @version ^0.3.9
foo: constant(address) = 0x6b175474e89094c44da98b954eedeac495271d0F
throws with:
Error compiling: Foo.vy
vyper.exceptions.InvalidLiteral: Could not determine type for literal value '0x6b175474e89094c44da98b954eedeac495271d0F'
contract "Foo.vy:3", line 3:25
2
---> 3 foo: constant(address) = 0x6b175474e89094c44da98b954eedeac495271d0F
--------------------------------^
4
The reason why the validation fails here is due to the incorrect checksum of the address
foo
.
or
# @version ^0.3.9
foo: constant(bytes4) = 0x12_34_56
throws with:
Error compiling: Foo.vy
vyper.exceptions.InvalidLiteral: Underscores not allowed in hex literals
contract "Foo.vy:5", line 5:24
4
---> 5 foo: constant(bytes4) = 0x12_34_56
-------------------------------^
6
or
# @version ^0.3.9
@external
def foo():
bar: decimal = 3.123456789123456789
throws with:
Error compiling: Foo.vy
vyper.exceptions.InvalidLiteral: Vyper supports a maximum of ten decimal points
contract "Foo.vy:5", function "foo", line 5:19
4 def foo():
---> 5 bar: decimal = 3.123456789123456789
--------------------------^
6
Now constant folding takes place. Constant folding is a generic compiler optimisation technique that eliminates run-time computations with compile-time computations. Thus, the fold
function in folding.py
replaces constant values in the AST with their values. Furthermore, constant expressions are also evaluated accordingly. Examples are:
3+2
becomes 5
(arithmetic operations),["foo", "bar"][1]
becomes "bar"
(references to literal arrays),min(1,2)
becomes 1
(built-in functions applied to literals).Next, the semantics of the program are validated via validate_semantics
in __init__.py
. The structure and the types of the program are checked, and type annotations are added to the AST. A full discussion of how the semantic validation is carried out is beyond the scope of this article, but it can be broadly divided into 4 steps:
ModuleAnalyzer
for further insights.validate_unique_method_ids
) or circular references (see the function _find_cyclic_call
).FunctionNodeVisitor
for further insights.StatementAnnotationVisitor
and ExpressionAnnotationVisitor
, which are eventually called to annotate the expression nodes of the AST.In the next step, for all public variables getter functions are generated via generate_public_variable_getters
and any unused statements are removed from the AST via remove_unused_statements
. These two functions are bundled into the Python module expansion.py
via the expand_annotated_ast
function.
immutable
VariablesNow it's time to determine the data positions in storage and bytecode for all storage and immutable
variables. The compiler achieves this via data_positions.py
.
VyperIR
The Vyper AST is turned into a lower-level intermediate representation language (IR) called VyperIR
via the function generate_ir_for_module
in the Python codegen module module.py
. VyperIR
is used as a high-level assembly. The grammar of VyperIR
is (s_expr)
, where s_expr
is one of:
s_expr :=
INT_LITERAL |
IDENTIFIER |
EVM_OPCODE *s_expr |
PSEUDO_OPCODE *s_expr |
"with" IDENTIFIER s_expr s_expr |
"set" IDENTIFIER s_expr |
"seq" *s_expr |
"if" s_expr s_expr [s_expr] |
"repeat" s_expr s_expr s_expr s_expr s_expr
I would like to point out that a new IR design is in the works, called venomIR
. For the original discussion see here and the main PR here.
VyperIR
The time has arrived for various optimisations via optimizer.py
of the VyperIR
representation. It's important to note that further constant folding also takes place at this stage (a typical example is the simplification of safemath
expressions), see e.g. the function _optimize_binop
.
VyperIR
Into EVM AssemblyThe VyperIR
representation is turned into EVM assembly via calling compile_to_assembly
in the IR Python module compile_ir.py
.
Eventually, the EVM assembly is turned into bytecode via assembly_to_evm
.
Let me emphasise that the semantic analysis and codegen part is the most complex part of the compilation and a full delve into it is beyond the scope of this article. We leave these deep-dives as an opportunity for future articles.
Let's go through the different steps sequentially with an example Foo.vy
:
# @version ^0.3.9
SOME_CONSTANT: public(constant(bytes4)) = 0x1122AABB
counter: uint256
@external
def foo(x: uint256) -> uint256:
self.counter += 1
# Yes Vyper has a built-in integer square root function :)
return isqrt(x + self.counter)
vyper -f ast Foo.vy
{
"ast": {
"ast_type": "Module",
"body": [
{
"ast_type": "VariableDecl",
"annotation": {
"ast_type": "Name",
"id": "bytes4"
},
"is_constant": true,
"is_immutable": false,
"is_public": true,
"is_transient": false,
"target": {
"ast_type": "Name",
"id": "SOME_CONSTANT"
},
"value": {
"ast_type": "Hex",
"value": "0x1122AABB"
}
},
{
"ast_type": "VariableDecl",
"annotation": {
"ast_type": "Name",
"id": "uint256"
},
"is_constant": false,
"is_immutable": false,
"is_public": false,
"is_transient": false,
"target": {
"ast_type": "Name",
"id": "counter"
},
"value": null
},
{
"ast_type": "FunctionDef",
"args": {
"ast_type": "arguments",
"args": [
{
"ast_type": "arg",
"annotation": {
"ast_type": "Name",
"id": "uint256"
},
"arg": "x"
}
],
"default": null,
"defaults": []
},
"body": [
{
"ast_type": "AugAssign",
"op": {
"ast_type": "Add"
},
"target": {
"ast_type": "Attribute",
"attr": "counter",
"value": {
"ast_type": "Name",
"id": "self"
}
},
"value": {
"ast_type": "Int",
"value": 1
}
},
{
"ast_type": "Return",
"value": {
"ast_type": "Call",
"args": [
{
"ast_type": "BinOp",
"left": {
"ast_type": "Name",
"id": "x"
},
"op": {
"ast_type": "Add"
},
"right": {
"ast_type": "Attribute",
"attr": "counter",
"value": {
"ast_type": "Name",
"id": "self"
}
}
}
],
"func": {
"ast_type": "Name",
"id": "isqrt"
},
"keyword": null,
"keywords": []
}
}
],
"decorator_list": [
{
"ast_type": "Name",
"id": "external"
}
],
"doc_string": null,
"name": "foo",
"returns": {
"ast_type": "Name",
"id": "uint256"
}
}
],
"doc_string": null,
"name": "Foo.vy"
},
"contract_name": "Foo.vy"
}
A note on the side, I prettified the original ast
output with the following bash script (h/t Tanguy Rocher):
vyper -f ast Foo.vy | jq -S | jq 'walk(if type == "object" then with_entries(select(.key | test("end_lineno|lineno|col_offset|end_col_offset|pos|src|node_id") | not)) else . end) | def reorder(order): . as $in | if type == "object" then reduce (keys_unsorted|order)[] as $key( {}; . + { ($key): ($in[$key] | reorder(order)) }) elif type == "array" then map( reorder(order) ) else . end; def neworder: "ast_type" as $s | index($s) as $i | if $i==null then . else [$s] + .[:$i] + .[$i+1:] end ; reorder(neworder)';
vyper -f layout Foo.vy
{
"storage_layout": {
"counter": {
"type": "uint256",
"slot": 0
}
},
"code_layout": {}
}
VyperIR
Representation via vyper -f ir_runtime Foo.vy
[seq,
[if, [le, calldatasize, 3], [goto, fallback]],
[with,
_calldata_method_id,
[shr, 224, [calldataload, 0]],
[seq,
[assert, [iszero, callvalue]],
# Line 3
[if,
[iszero, [xor, _calldata_method_id, 252460339 <0xf0c3d33: SOME_CONSTANT()>]],
[seq,
[goto, external_SOME_CONSTANT____common],
[seq,
[label,
external_SOME_CONSTANT____common,
var_list,
[seq,
[exit_to,
_sym_external_SOME_CONSTANT____cleanup,
64,
/* abi_encode (bytes4) */
[seq,
[mstore,
64,
7750569564506974822565848349486848977417197030367740731122796394362592821248 <0x1122AABB>],
32]],
pass]],
[label,
external_SOME_CONSTANT____cleanup,
[var_list, ret_ofst, ret_len],
[return, ret_ofst, ret_len]]]]],
# Line 8
[if,
[iszero, [xor, _calldata_method_id, 801029432 <0x2fbebd38: foo(uint256)>]],
[seq,
[assert, [ge, calldatasize, 36]],
[goto, external_foo__uint256__common],
[seq,
[label,
external_foo__uint256__common,
var_list,
[seq,
# Line 9
/* self.counter += 1 */
[seq,
[unique_symbol, sstore_2],
[sstore,
0 <self.counter>,
/* self.counter += 1 */
[with,
x,
[sload, 0 <self.counter>],
[with, ans, [add, x, 1 <1>], [seq, [assert, [ge, ans, x]], ans]]]]],
# Line 8
[exit_to,
_sym_external_foo__uint256__cleanup,
64,
/* abi_encode (uint256) */
[seq,
[mstore,
64,
/* isqrt(x + self.counter) */
[with,
x,
/* x + self.counter */
[with,
x,
[calldataload, 4 <x (4+0)>],
[with,
y,
[sload, 0 <self.counter>],
[with, ans, [add, x, y], [seq, [assert, [ge, ans, x]], ans]]]],
[with,
y,
x,
[with,
z,
181,
[seq,
[if,
[ge, y, 87112285931760246646623899502532662132736],
[seq, [set, y, [shr, 128, y]], [set, z, [shl, 64, z]]]],
[if,
[ge, y, 4722366482869645213696],
[seq, [set, y, [shr, 64, y]], [set, z, [shl, 32, z]]]],
[if,
[ge, y, 1099511627776],
[seq, [set, y, [shr, 32, y]], [set, z, [shl, 16, z]]]],
[if, [ge, y, 16777216], [seq, [set, y, [shr, 16, y]], [set, z, [shl, 8, z]]]],
[set, z, [shr, 18, [mul, z, [add, y, 65536]]]],
[set, z, [shr, 1, [add, [div, x, z], z]]],
[set, z, [shr, 1, [add, [div, x, z], z]]],
[set, z, [shr, 1, [add, [div, x, z], z]]],
[set, z, [shr, 1, [add, [div, x, z], z]]],
[set, z, [shr, 1, [add, [div, x, z], z]]],
[set, z, [shr, 1, [add, [div, x, z], z]]],
[set, z, [shr, 1, [add, [div, x, z], z]]],
[with, t, [div, x, z], [select, [lt, z, t], z, t]]]]]]],
32]],
pass]],
[label,
external_foo__uint256__cleanup,
[var_list, ret_ofst, ret_len],
[return, ret_ofst, ret_len]]]]]]],
[goto, fallback],
[label, fallback, var_list, /* Default function */ [revert, 0, 0]]]
For the contract creation code IR, you can simply invoke
vyper -f ir Foo.vy
.
vyper -f asm Foo.vy
_sym_subcode_size _sym_runtime_begin2 _mem_deploy_start CODECOPY _OFST _sym_subcode_size 0 _mem_deploy_start RETURN _sym_runtime_begin2 BLANK { _DEPLOY_MEM_OFST_0 PUSH1 0x03 CALLDATASIZE GT _sym_join3 JUMPI _sym_fallback JUMP _sym_join3 JUMPDEST PUSH0 CALLDATALOAD PUSH1 0xe0 SHR CALLVALUE _sym_revert1 JUMPI PUSH4 0x0f0c3d33 DUP2 XOR _sym_join4 JUMPI PUSH32 0x1122aabb00000000000000000000000000000000000000000000000000000000 PUSH1 0x40 MSTORE PUSH1 0x20 PUSH1 0x40 RETURN _sym_join4 JUMPDEST PUSH4 0x2fbebd38 DUP2 XOR _sym_join5 JUMPI PUSH1 0x24 CALLDATASIZE LT _sym_revert1 JUMPI PUSH0 SLOAD PUSH1 0x01 DUP2 ADD DUP2 DUP2 LT _sym_revert1 JUMPI SWAP1 POP PUSH0 SSTORE PUSH1 0x04 CALLDATALOAD PUSH0 SLOAD DUP1 DUP3 ADD DUP3 DUP2 LT _sym_revert1 JUMPI SWAP1 POP SWAP1 POP DUP1 PUSH1 0xb5 PUSH18 0x010000000000000000000000000000000000 DUP3 LT _sym_join6 JUMPI DUP2 PUSH1 0x80 SHR SWAP2 POP DUP1 PUSH1 0x40 SHL SWAP1 POP _sym_join6 JUMPDEST PUSH10 0x01000000000000000000 DUP3 LT _sym_join7 JUMPI DUP2 PUSH1 0x40 SHR SWAP2 POP DUP1 PUSH1 0x20 SHL SWAP1 POP _sym_join7 JUMPDEST PUSH6 0x010000000000 DUP3 LT _sym_join8 JUMPI DUP2 PUSH1 0x20 SHR SWAP2 POP DUP1 PUSH1 0x10 SHL SWAP1 POP _sym_join8 JUMPDEST PUSH4 0x01000000 DUP3 LT _sym_join9 JUMPI DUP2 PUSH1 0x10 SHR SWAP2 POP DUP1 PUSH1 0x08 SHL SWAP1 POP _sym_join9 JUMPDEST PUSH3 0x010000 DUP3 ADD DUP2 MUL PUSH1 0x12 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP2 DUP5 DIV ADD PUSH1 0x01 SHR SWAP1 POP DUP1 DUP4 DIV DUP1 DUP3 DUP2 XOR DUP3 DUP5 LT MUL XOR SWAP1 POP SWAP1 POP SWAP1 POP SWAP1 POP PUSH1 0x40 MSTORE PUSH1 0x20 PUSH1 0x40 RETURN _sym_join5 JUMPDEST POP _sym_fallback JUMPDEST PUSH0 PUSH0 REVERT _sym_revert1 JUMPDEST PUSH0 DUP1 REVERT }
vyper -f bytecode Foo.vy
0x61018261001161000039610182610000f36003361161000c5761016d565b5f3560e01c3461017157630f0c3d33811861004b577f1122aabb0000000000000000000000000000000000000000000000000000000060405260206040f35b632fbebd38811861016b5760243610610171575f54600181018181106101715790505f556004355f5480820182811061017157905090508060b57101000000000000000000000000000000000082106100ab578160801c91508060401b90505b690100000000000000000082106100c9578160401c91508060201b90505b6501000000000082106100e3578160201c91508060101b90505b630100000082106100fb578160101c91508060081b90505b620100008201810260121c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808304808281188284100218905090509050905060405260206040f35b505b5f5ffd5b5f80fda165767970657283000309000b
vyper -f bytecode_runtime Foo.vy
0x6003361161000c5761016d565b5f3560e01c3461017157630f0c3d33811861004b577f1122aabb0000000000000000000000000000000000000000000000000000000060405260206040f35b632fbebd38811861016b5760243610610171575f54600181018181106101715790505f556004355f5480820182811061017157905090508060b57101000000000000000000000000000000000082106100ab578160801c91508060401b90505b690100000000000000000082106100c9578160401c91508060201b90505b6501000000000082106100e3578160201c91508060101b90505b630100000082106100fb578160101c91508060081b90505b620100008201810260121c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808184040160011c9050808304808281188284100218905090509050905060405260206040f35b505b5f5ffd5b5f80fda165767970657283000309000b
Ok, you've come a long way, but you made it. Congratulations ๐!
Ok, fair, I do understand that this is a lot to digest, but what would life be without challenges ๐ค. I still hope this was an insightful read! The best way to see if you have grasped everything is to go to Vyper's main repository (now ๐) and tackle an existing issue or ask King Charles for advice on where you can best contribute. Furthermore, if you can just help others understand how this complex machine works, you would already be adding a lot of value. See you deep down in the issues and PRs ๐ซก!