# Mythril ## Detection Module being studied: Integer (Over/Under flow) The codefile can be found [here](https://github.com/ConsenSys/mythril/blob/develop/mythril/analysis/module/modules/integer.py). 1. Inside of the class `OverUnderflowAnnotation`, we have an __init__ function. Now what is an __init__ function? > The __init__ function is called everytime an object is created from a class. The __init__ method let's the class determine the object's attributes and serves no other purpose. It is only used within classes. 2. The `self` parameter is a reference to the current instance of the class, and is used to access variables that belongs to the class. 4. `from math import log2, ceil` => Importing log2 and ceiling functionality from the `math` module from Python. 5. `from typing import cast, List, Set` => i. With casting(`typing.cast`) we can force the type checker to treat a variable as a given type. The main case to reach for cast() are when the type hints for a module are either missing, incomplete, or incorrect. ii. `List` and `Set` are imported from the `typing` module so that we can specify the type of content we want/expect (since Python is a loosely typed language). 6. **`from mythril.analysis` import `solver`** => See the section of `Solver: Mythril.Analysis` 7. **`from mythril.analysis.issue_annotation` import `IssueAnnotation`** => See the section of `IssueAnnotation: Mythril.analysis`. 8. **`from mythril.analysis.report` import `Issue`** => See the secion of `Issue: Mythril.Analysis` 8. **`from mythril.analysis.swc_data` import `INTEGER_OVERFLOW_AND_UNDERFLOW`** => See the section of `SWC_DATA: Mythril.Analysis`. 10. **`from mythril.exceptions` import `UnsatError`** => See the secion of `UnsatError: Mythril.exceptions` 11. **`from mythril.laser.ethereum.state.global_state` import `GlobalState`** => See the section of `GlobalState: Mythril.Laser.Ethereum.State` 12. **`from mythril.laser.ethereum.state.annotation` import `StateAnnotation`** => See the section of `StateAnnotation: Mythril.Laser.Ethereum.State` 13. **`from mythril.analysis.module.base` import `DetectionModule`, `EntryPoint`** => See the section of `DetectionModule & EntryPoint: Mythril.Analysis.Module.Base` 14. **`from copy` import `copy`** => Now what about `copy` and `deepcopy` in Python? > In Python, when we use the = operator, It only creates a new variable that shares the reference of the original object. In order to create “real copies” or “clones” of these objects, we can use the copy module in Python. 15. You can do either shallow copy by using `copy.copy()` or make a deep copy by using `copy.deepcopy()`. ![](https://i.imgur.com/cBrSQ7h.png) ![](https://i.imgur.com/irGon2k.png) 16. `from mythril.laser.smt import ( BVAddNoOverflow, BVSubNoUnderflow, BVMulNoOverflow, BitVec, If, symbol_factory, Not, Expression, Bool, And, )` 17. Regarding the import in point 16, BitVec, Expression etc. are all different files or part of same files, where relevant helpers have been imported from the `z3` solver package in order to construct correct first order logic problems for the SMT checker and also handle the outputs. --------- ## Solver: Mythril.Analysis 1. **`mythril.analysis`** is a module. How? Well, the file `mythril/__init.py__` is empty and we know that: A folder can be marked as a package, by adding an empty __init__.py file. This is what enables calls like: `from mythril.analysis import solver` 2. `Solver.py` => Contains *analysis module* helpers to help solve *path constraints*. 3. So, this is a classic SMT solver named z3. You can think of a SMT solver like a black box that takes in a bunch of constraints in form of first order logic and returns a satisfying argument. 4. Here's a very basic `Boolean satisfiability problem`: ![](https://i.imgur.com/DFJ0Gbx.png) 5. So, basically, by solving we mean that if we give this `z3` solver an equation, it will try and give us some values that will satisfy that equation. 6. The `z3 solver` works on the following theories => empty theory, linear arithmetic, nonlinear arithmetic, bitvectors, arrays, datatypes, quantifiers, strings 7. From what I've noticed, there is a heavy usage of `bitvectors` in the codebase. Which makes us ask, what is a bit vector? > In the context of computing, a vector is a linear sequence of numeric values that are stored contiguously in memory. A bit vector is a vector in which each element is a bit (so its value is either 0 or 1). 8. So, essentially a bit-vector is just a vector of bits, then what makes it any complex? > On most systems, the smallest addressable unit of memory is an 8-bit byte. Therefore, it is not possible to read and write the bits in a bit vector individually at the machine code level. In order to read an individual bit from a bit vector, we need to read the byte it’s contained in, then mask it out. <= This is handled by the *z3 solver* 9. The main function inside of this `solver` module is called `get_transaction_sequence`. What this function does is **generate concrete transaction sequence**. i. This function takes in two parameters: a) Global State b) constraints ii. The `constraints` are used to create the transaction sequence for the `global state`. --------- ## IssueAnnotation: Mythril.Analysis 1. **`mythril.analysis`** is a module. How? Well, the file `mythril/__init.py__` is empty and we know that: A folder can be marked as a package, by adding an empty __init__.py file. This is what enables calls like: `from mythril.analysis.issue_annotation import IssueAnnotation`. 2. **IssueAnnotation** is a class with an input parameter called `StateAnnotation`. 3. This class is used to *Issue Annotation to propogate issues*. 4. The `__init__` method of this class, takes in 3 parameters: i. `Conditions`: A list of independent conditions [a, b, c, ...]. Each of these have to be independently be satisfied ii. `issue`: The issue of the annotation iii. `detector`: The detection module 5. By default, this annotation will persist over `calls` and in the `world state`. 6. In Python, property() is a built-in function that creates and returns a property object. The syntax of this function is:`property(fget=None, fset=None, fdel=None, doc=None)` Here, i. fget is function to get value of the attribute ii. fset is function to set value of the attribute iii.fdel is function to delete the attribute iv. doc is a string (like a comment) 7. Mentioned point 6 because: ``` @property def persist_over_calls(self) -> bool: return True``` 8. The @property tag is basically a Pythonic way to replace setters, getters and deleters for a property in a class. [Here](https://www.programiz.com/python-programming/property) is a good explanation. --------- ## Issue: Mythril.Analysis 1. **`mythril.analysis`** is a module. How? Well, the file `mythril/__init.py__` is empty and we know that: A folder can be marked as a package, by adding an empty __init__.py file. This is what enables calls like: `from mythril.analysis.report import Issue` 2. This `Report` module is a module that provides classes which make up an issue report. It has namely two classes: i. Class Issue ii. Class Report 3. We will focus on Class Issue right now, since that is the one being imported into our Integer file. ### Class Issue 1. This class is for the **Representation of an issue and its location.** 2. The `__init__` method of this class, takes in 12 parameters. Some important ones are: i. title: title of the issue ii. contract: contract where issue was found iii. function_name: name of function where issue was found iv. swc_id: corresponding swc id of the issue v. transaction_sequence: The sequence of transactions. vi. address (*unclear*): The address of the issue 3. This class has functions to return the `transaction sequence` with or without the **pre-generated block data**. 4. The function `add_block_data` has been included in the class with a @staticmethod decorator. That makes me wonder what is a `staticmethod` now? Well.... > Static methods in Python are extremely similar to python class level methods, the difference being that a static method is bound to a class rather than the objects for that class. This means that **a static method can be called without an object for that class.** 5. Now, onto the `add_block_data` function. What does it do? Well, it adds sane block data to a `transaction_sequence`. 6. The next important function in this class is `resolve_function_names` and it helps resolve function names for each step. Here **step** basically means each of the transaction in the `transaction_sequence`. > **Note:** This `transaction_sequence` seem to be quite a big deal, but we are yet to uncover what the fuck it actually is and how far and wide is it being used. (Maybe it is being generated during the symbolic execution to determine all plausible paths a contract could take? Maybe. But I am not sure yet.) 7. The next and the last important function in this class is called the function `add_code_info`. This function adds some additional info into a given contract (contract is passed to it as a parameter). If all goes well, the following information is added: i. code ii. fileName iii. source_mapping iv. line number --------- ## SWC_DATA: Mythril.Analysis 1. **`mythril.analysis`** is a module. How? Well, the file `mythril/__init.py__` is empty and we know that: A folder can be marked as a package, by adding an empty __init__.py file. This is what enables calls like: `from mythril.analysis.swc_data import INTEGER_OVERFLOW_AND_UNDERFLOW`. 2. This is an extremely straight forward module and maps the SWC IDs to their registry equivalents. It's kind of like a file of `key-value` pairs. 3. Here is an example of what is present in this file: > DEFAULT_FUNCTION_VISIBILITY = "100" INTEGER_OVERFLOW_AND_UNDERFLOW = "101" OUTDATED_COMPILER_VERSION = "102" FLOATING_PRAGMA = "103" UNCHECKED_RET_VAL = "104" UNPROTECTED_ETHER_WITHDRAWAL = "105" 4. The same thing in a different format, so that it can be assigned to `title` as we saw earlier in the `Issue` class > SWC_TO_TITLE = { "100": "Function Default Visibility", "101": "Integer Overflow and Underflow", "102": "Outdated Compiler Version", "103": "Floating Pragma", "104": "Unchecked Call Return Value", "105": "Unprotected Ether Withdrawal", . . . } --------------- ## UnsatError: Mythril.exceptions 1. **`mythril.exception`** is a module again, because the `Mythril` directory has a `__init__.py` file. Although it's not empty, but ig it serves the same purpose. 2. This is again an extremely straight forward module and it simply **contains general exceptions** classes used by Mythril. 3. A few examples from this module would be as follows: > class IllegalArgumentError(ValueError): """The argument used does not exist""" pass class UnsatError(MythrilBaseException): """A Mythril exception denoting the unsatisfiability of a series of constraints.""" pass class SolverTimeOutException(UnsatError): """A Mythril exception denoting the unsatisfiability of a series of constraints.""" pass 4. By the way, we have imported the `UnsatError` which is the 2nd error in our example list above. --------------- ## GlobalState: Mythril.Laser.Ethereum.State 1. **`mythril.laser.ethereum.state`** is a module. How? Well, the file `mythril/laser/ethereum/state/__init.py__` is empty and we know that: A folder can be marked as a package, by adding an empty __init__.py file. This is what enables calls like: `from mythril.laser.ethereum.state.global_state import GlobalState`. 2. Ok, so this module contains a representation of the **global execution state**. 3. Now, this module has a class called `GlobalState` that represents the current global state. 4. Remember that the __init__ function in a class represents what properties the objects of that class would have. So, the `__init__` function of this class has the following params: i. world_state ii. environment (of type Environmet(will explore later what this exactly is)) iii. node (of type Node(will explore later what this exactly is)) iv. machine_state (of type MachineState(will explore later what this exactly means)) v. transaction_stack vi. last_return_data vii. annotations 5. A function `add_annotations` is used to add annotations to the global state. 6. There exists a function to copy the global state. Pretty standard. 7. The function `get_current_instruction` returns the current instruction for this Global State. `instructions = self.environment.code.instruction_list` 8. The function `current_transaction` returns the current transaction for this Global State. 9. The function `annotate` is used to annotate (or append to the exisiting annotation) the current global state and if `persist_to_world_state` is set to true, then also annotate the world state. 10. Then there exist function to return all annotations or annotations of a specific type. --------------- ## StateAnnotation: Mythril.Laser.Ethereum.State 1. **`mythril.laser.ethereum.state`** is a module. How? Well, the file `mythril/laser/ethereum/state/__init.py__` is empty and we know that: A folder can be marked as a package, by adding an empty __init__.py file. This is what enables calls like: `from mythril.laser.ethereum.state.annotation import StateAnnotation`. 2. This module **includes classes used for annotating trace information** and contains several classes including the `StateAnnotation` class, which is the one we are interested in because it is the one being imported. 3. The `StateAnnotation` class is used to persist information over traces. This allows modules to reason about traces without the need to traverse the state space themselves. 4. It has three properties that are by default set to False (and 1): i. persist_to_world_state ii. persist_over_calls iii. search_importance (used to estimate the priority of a particular annotated state) --------------- ## DetectionModule & EntryPoint: Mythril.Analysis.Module.Base 1. **`mythril.analysis.module.base`** is a module. How? Well, the file `mythril/analysis/module/__init.py__` is empty and we know that: A folder can be marked as a package, by adding an empty __init__.py file. This is what enables calls like: `from mythril.analysis.module.base import DetectionModule, EntryPoint`. 2. This module contains two classes `DetectionModule` and `EntryPoint` and we will talk about them both as both of them are being imported. 3. Primarily, this module includes a definition of the DetectionModule interface. DetectionModules implement different analysis rules to find weaknesses and vulnerabilities. ### Class Entrypoint 1. This class simply contains two enums used to signify the entry_point of detection modules. That's it 2. POST = 1 CALLBACK = 2 ### Class DetectionModule 1. This is the base detection module. 2. All custom-built detection modules must inherit from this class (Remember the user-input custom-built module Joran talked about? ). 3. There are several class properties that expose information about the detection modules. Let's see a few i. name: The name of the detection module ii. swc_id: The SWC ID associated with the weakness that the module detects iii. description: A description of the detection module, and what it detects iv. entry_point: Mythril can run callback style detection modules, or modules that search the statespace. > POST entry points severely slow down the analysis, try to always use callback style modules v. pre_hooks: A list of instructions to hook the laser vm for (pre execution of the instruction) vi. post_hooks: A list of instructions to hook the laser vm for (post execution of the instruction) 4. It has a function `reset_module` that is used to reset the storage of this module. 5. There's a function called `update_cache` that is used to update the module's cache with the passed in `issues`. 6. Finally, the function `execute` is called by Mythril and is the entry point for execution of the detection module. 7. The `execute` function takes in `target` as a parameter and returns a list of all vulnerabilites it found. `target` is the target of the analysis, either a global state (callback) or the entire statespace (post) --------------- ## Class OverUnderflowAnnotation 1. This is a Symbol Annotation used if a BitVector can overflow. 2. The init function in a class represents what properties the objects of that class would have values or be intialized. So, the __init__ function of this class affects these 3 properties: i. overflowing_state: The `Global State` that is overflowing ii. operator: The `operator` causing the overflow iii. constraint: Set of constraint. This is in the form which is solvable by the SMT Solver. --------------- ## Class OverUnderflowStateAnnotation(StateAnnotation) 1. This State Annotation is used if an overflow is both possible and also used in the annotated path 2. A property of this class as expected is: `overflowing_state_annotations`. --------------- ## class IntegerArithmetics(DetectionModule): 1. This is the main module which detects integer over and under flows. 2. As seen in the `DetectionModule` base module, this module has to inherit the base module and it is doing so. Yayy!! 3. The philosophy of this particular detection module is as follows: > For every SUB instruction, check if there's a possible state where op1 > op0. For every ADD, MUL instruction, check if there's a possible state where op1 + op0 > 2^32 - 1" 4. This module has a function called `resetModule` that can reset the module as seen in the `DetectionModule` earlier. 5. Function `_execute` is responsible for executing this module. It takes in a globalState as the state to analyse and returns with the found issues. 6. The above `_execute` function employs functions that help it to carry out the logic laid out in the philosophy in point 3. For example: i. _get_args => gets the arguments ii. _handle_add => checks for overflow and annotates if required iii. _handle_mul => checks for overflow and annotates if required iv. _handle_sub => checks for underflow and annotates if required v. _handle_exp => handles exponentiation instructions 7. Similarly for propagation of the annotation, we have function to handle things like: i. _handle_jumpi ii. _handle_sstore iii. _handle_call iv. _handle_return (Adds all the annotations into the state which correspond to the locations in the memory returned by RETURN opcode.) v. _handle_transaction_end ## Ok wait, what the fuck is .... ### 1. The difference between global state and world state? A global state is global in the sense a variable is global inside of a smart contract. And as for world state, let's try and think what is the meaning of the property `persist_to_world_state` (this is ofc related to an annotation's persistance). The following line should clear your doubt about the `world state`: > If you want annotations to persist through different user initiated message call transactions then this(`persist_to_world_state` property) should be enabled. ### 2. Laser? LASER is a symbolic virtual machine (SVM) that runs Ethereum smart contracts. It accurately models most features of the Ethereum virtual machine including inter-contracts calls. [See here for the official repo](https://github.com/muellerberndt/laser-ethereum) [See here for the mythril.laser documentation](https://mythril-classic.readthedocs.io/en/master/mythril.laser.ethereum.html) --------------- ## File: asm.py 1. `pathlib` module in Python provides various classes representing file system paths with semantics appropriate for different operating systems. This module comes under Python’s standard utility modules. 2. `import re` => Python has a module named re to work with RegEx. -----------