This article discusses how EVM compilers generate code for function dispatch and the associated risks, with a focus on access controls.
Compilers define one of the fundamental access control mechanisms in smart contracts: which functions are accessible and how. For this mechanism to be sound, two properties should hold:
- Each private function should be executable only via a call path whose root is an external function.
- Each path leading to the execution of the private function should meet the semantic constraints of the source code.
### Bird's-eye view of function dispatch
In languages like Vyper or Solidity, the dispatch is implicitly set up, i.e., it's abstracted away by the compiler and created without instructions from the source code.
Contracts are like servers that need to be turned on for each incoming input (an external transaction). The loader (the EVM runtime) loads the contract, which starts executing at an implicit main function.
The main function is responsible for parsing the input from calldata and executing the functionality that corresponds to the user's commands encoded in the calldata.
The commands have a specific structure defined by the contract ABI specification, as implemented by Solidity's ABI coder v2 (all the major EVM compilers employ this encoding). The encoding defines how to encode a request to execute one of a contract's functions, i.e., how to specify which function to execute and how to encode its inputs.
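To make the layout concrete, here is a minimal sketch of how a call to `transfer(address,uint256)` could be encoded by hand. It assumes only static argument types (dynamic types such as `bytes` or arrays are encoded via offsets and are not covered here), and the helper names `encode_address`, `encode_uint256` and `encode_transfer_call` are made up for this illustration. The 4-byte prefix `a9059cbb` is the widely published selector of `transfer(address,uint256)`.

```python
# Selector of transfer(address,uint256); hardcoded here for illustration.
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def encode_address(address_hex: str) -> bytes:
    """Left-pad a 20-byte address to a 32-byte ABI word."""
    if address_hex.startswith("0x"):
        address_hex = address_hex[2:]
    raw = bytes.fromhex(address_hex)
    if len(raw) != 20:
        raise ValueError("an address must be exactly 20 bytes")
    return raw.rjust(32, b"\x00")


def encode_uint256(value: int) -> bytes:
    """Encode an unsigned integer as a 32-byte big-endian ABI word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit into uint256")
    return value.to_bytes(32, "big")


def encode_transfer_call(to: str, amount: int) -> bytes:
    """Calldata = 4-byte selector followed by one 32-byte word per argument."""
    return TRANSFER_SELECTOR + encode_address(to) + encode_uint256(amount)


# Hypothetical recipient address (0x1111...11) and amount.
calldata = encode_transfer_call("0x" + "11" * 20, 1000)
print(calldata.hex())
```

The resulting calldata is 68 bytes long: 4 bytes that tell the contract which function to run, followed by two 32-byte words carrying the arguments.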
A function is selected by its function selector, which is defined as the first 4 bytes of the keccak256 hash of the function signature (example: `keccak256("transfer(address,uint256)")[0:4]`).
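The selector computation can be sketched in plain Python. Note that `hashlib.sha3_256` is the NIST-standardized SHA-3, which pads its input differently from the original Keccak-256 used by the EVM, so the sketch below carries its own compact Keccak-256. This is an illustrative, unoptimized implementation (the names `keccak256` and `function_selector` are chosen for this example), not the code any particular compiler uses.

```python
MASK64 = (1 << 64) - 1
RATE = 136  # bytes absorbed per permutation call for Keccak-256 (1088 bits)


def _rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane to the left."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane matrix."""
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in the round constant, generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (padding byte 0x01), as used by the EVM."""
    padded = bytearray(data) + bytes(RATE - len(data) % RATE)
    padded[len(data)] ^= 0x01  # first padding bit
    padded[-1] ^= 0x80         # last padding bit
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), RATE):
        block = padded[offset:offset + RATE]
        for i in range(RATE // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    # The 32-byte digest is the first four lanes of the state.
    return b"".join(lanes[i][0].to_bytes(8, "little") for i in range(4))


def function_selector(signature: str) -> bytes:
    """First 4 bytes of the hash of the canonical signature
    (no spaces, no parameter names, canonical types such as uint256)."""
    return keccak256(signature.encode("ascii"))[:4]


print(function_selector("transfer(address,uint256)").hex())
```

The printed value can be compared against the widely published selector of `transfer(address,uint256)`, which is `a9059cbb`.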
Once the first 4 bytes of calldata are loaded, we have to check whether the contract contains a corresponding function. *How is this done?*
### The dispatch code aka implicit main
During semantic analysis, the compiler determines each function's visibility (private/external). Then, during the code generation phase, it takes the set of external functions and computes their function selectors. With these selectors in hand, it can construct the main function, which uses them for dispatch.
The main function will generally do the following:
1. extract the function selector from calldata
2. compare the selector to the set of selectors of the given contract's external functions (e.g., using an `if` chain):
```python
selector = calldata[0:4]
if selector == selector_foo:
    ...  # additional checks
    foo()
elif selector == selector_bar:
    ...  # additional checks
    bar()
elif fallback_defined:
    ...
    fallback()
else:
    abort()  # abort execution
```
3. if the selector matches, some additional checks are performed (more on this later), and the corresponding function is called (a jump to its label is performed)
4. if none of the selectors match, then we either call the fallback function or abort the execution
The additional checks will generally include the following:
- payability check - if the function doesn't accept ether, then we assert that `msg.value == 0`
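The steps above, including the payability check, can be modelled in a few lines of runnable Python. This is a simplified sketch under several assumptions: it uses a dictionary lookup instead of an `if` chain, it treats calldata shorter than 4 bytes like an unknown selector, and the names `Revert`, `ExternalFunction`, `dispatch` and the placeholder selectors `aaaaaaaa`/`bbbbbbbb` are invented for illustration. Real compilers emit EVM jumps, not Python calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class Revert(Exception):
    """Models aborting the execution."""


@dataclass
class ExternalFunction:
    body: Callable[[bytes], str]  # receives the calldata after the selector
    payable: bool = False


def dispatch(
    table: Dict[bytes, ExternalFunction],
    calldata: bytes,
    msg_value: int,
    fallback: Optional[Callable[[bytes], str]] = None,
) -> str:
    """Implicit main: select a function by selector, check, then call it."""
    # Step 1 and 2: extract the selector and look it up.
    entry = table.get(calldata[:4]) if len(calldata) >= 4 else None
    if entry is not None:
        # Step 3: additional checks before jumping to the function body.
        if not entry.payable and msg_value != 0:
            raise Revert("non-payable function received ether")
        return entry.body(calldata[4:])
    # Step 4: no selector matched.
    if fallback is not None:
        return fallback(calldata)
    raise Revert("unknown selector and no fallback defined")


# Hypothetical contract: foo is non-payable, bar is payable.
TABLE = {
    bytes.fromhex("aaaaaaaa"): ExternalFunction(lambda args: "foo"),
    bytes.fromhex("bbbbbbbb"): ExternalFunction(lambda args: "bar", payable=True),
}

print(dispatch(TABLE, bytes.fromhex("bbbbbbbb"), msg_value=5))
```

Only functions present in `TABLE` are reachable from the outside, which is why the contents of the real dispatch table matter so much for access control.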
### What can go wrong?
How does this relate to access controls? Let's consider what could go wrong:
- What if the compiler incorrectly infers the visibility during the semantic phase?
- What if the compiler inserts a private function into the dispatch mechanism?
- What if the compiler doesn’t insert an `abort` at the end of the selector search and the code continues executing?
- What if the payability check gets omitted?
- What if the dispatch mechanism is too gas/size intensive?
- What if the compiler computes the selector incorrectly in certain cases?
- What if there's a hash collision between two selectors?
If the compiler gets any of these wrong, there can be serious consequences. What if a private function is exposed? Private functions assume that they can't be called from the top level and thus don't implement the corresponding access controls.
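Some of these failure modes can be caught by a simple sanity check. The sketch below assumes two hypothetical inputs: `external`, a map from the signatures the source declares as external to their selectors, and `dispatched`, the set of selectors actually present in the compiled dispatch code (how to extract them from bytecode is out of scope here). The function name `check_dispatch` and the placeholder selector values are made up for illustration.

```python
from typing import Dict, List, Set


def check_dispatch(external: Dict[str, bytes], dispatched: Set[bytes]) -> List[str]:
    """Compare the declared external interface with the dispatch table."""
    problems: List[str] = []
    seen: Dict[bytes, str] = {}
    for signature, selector in external.items():
        # Two external functions must never share a selector.
        if selector in seen:
            problems.append(
                f"collision: {seen[selector]} and {signature} share {selector.hex()}"
            )
        else:
            seen[selector] = signature
        # Every external function must be reachable through the dispatch.
        if selector not in dispatched:
            problems.append(f"missing: {signature} is external but not dispatched")
    # Anything else in the dispatch may be an exposed private function.
    for selector in sorted(dispatched - set(external.values())):
        problems.append(
            f"unexpected: {selector.hex()} is dispatched but not declared external"
        )
    return problems


# Placeholder selectors, not real hashes.
external = {"foo()": bytes.fromhex("aaaaaaaa"), "bar()": bytes.fromhex("bbbbbbbb")}
dispatched = {bytes.fromhex("aaaaaaaa"), bytes.fromhex("cccccccc")}
for problem in check_dispatch(external, dispatched):
    print(problem)
```

A check like this covers only the shape of the dispatch table; it says nothing about whether the code behind each selector behaves as the source intends.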
Assume that all of these points are implemented correctly. Does this mean that the access controls are implemented correctly? No. In general, compilers can have almost arbitrary miscompilations (i.e., they can produce bytecode that has a different meaning than the original source). To ensure that the access controls are implemented correctly, we would have to consider all possible paths through the code and ensure that none of them breaks the source code semantics.
### Conclusion
We have shown how function dispatch is constructed in EVM compilers and discussed the associated risks. During contract development, we often take many things for granted. A bug in a calling convention or function dispatch can have catastrophic consequences for the whole ecosystem. Fund your compiler teams appropriately and add some sanity-check tests to your codebase!