# Interpreter tutorial 10-01-2022 ## Team: Cas Serrarens, 344110 Elitsa Marinova, 498358 ## Introduction In this tutorial, we will create an interpreted programming language. After following it, you will have a simple language, a deeper understanding of the theory behind it, and the possibility to expand it further! ## What is an interpreter? The short answer is: An interpreter converts source code to machine code to be understood by the computers. Compilers work just the same. Computer programs are written on high level languages. These languages can be understood by humans. To make it clear, they contain words and phrases from languages. However, computers cannot understand these high level languages. They only understand binary systems knows as machine code. To start with, a computer program is usually written in high level language described as a source code. These source codes must be converted into machine language and here comes the role of compilers and interpreters. In the following table you can find the differences between a compiler and an interpreter: | Interpreter | Compiler | | -------- | -------- | | Interpreter translates just one statement of the program at a time into machine code. | Compiler scans the entire program and translates the whole of it into machine code at once. | |An interpreter takes very less time to analyze the source code. However, the overall time to execute the process is much slower.|A compiler takes a lot of time to analyze the source code. However, the overall time taken to execute the process is much faster.| |An interpreter does not generate an intermediary code. Hence, an interpreter is highly efficient in terms of its memory.|A compiler always generates an intermediary object code. It will need further linking. Hence more memory is needed.| |Keeps translating the program continuously till the first error is confronted. If any error is spotted, it stops working and hence debugging becomes easy.|A compiler generates the error message only after it scans the complete program and hence debugging is relatively harder while working with a compiler.| |Interpreters are used by programming languages like Ruby and Python for example.|Compliers are used by programming languages like C and C++ for example.| ## Setup To start with this tutorial you will need a couple things set up on your computer. First you will need to **install VS Code(Microsoft Visual Code)**. You can **download this** by clicking on the following link: https://code.visualstudio.com/Download In VS Code **go to extensions**: ![](https://i.imgur.com/dAPpVTv.png) You will need the **python** extension, since we are going to use python for the creation of our own language. By the way, python is also an interpreter. Just go to the extension tab on the left and type "python" in the search box. **Get the first one.** ![](https://i.imgur.com/NZWWbKZ.png) Now that we have everything set up, we can continue with the actual work. ## Beginning the fun part In this part, you will learn about the different components needed to create an interpreter. First there would be a short theory introduction, to get you to understand everything better, then there would be examples and explanations of code. **At the end of this markdown document there will be a link where you can download the whole code.** ### Tokens A token is the minimal element of the language syntax, like an integer (not a digit, but a group of them), a name (not a letter but a group of them), or a symbol (like the mathematical operations). For example, the number '5' to the computer, is just a character. To translate this into machine code, we have to give this data a token. A token exists of a type and a value. The number '5' for example, has the type: 'Integer' and the value: '5'. Every character in a piece of text has a token. For our code we have to create a couple of tokens: ``` # (TT stands for Token Type) TT_INT = 'INT' TT_FLOAT = 'FLOAT' TT_PLUS = 'PLUS' TT_MINUS = 'MINUS' TT_MUL = 'MUL' TT_DIV = 'DIV' TT_LPAREN = 'LPAREN' TT_RPAREN = 'RPAREN' ``` Next we have to create a class for the tokens. Every token that we come across, should be printed to the console. In this way, we can see the type of the token and its value. ``` class Token: def __init__(self, type_, value=None): self.type = type_ self.value = value def __repr__(self): if self.value: return f'{self.type}:{self.value}' return f'{self.type}' ``` ### Lexer After creating the tokens, it is time to create a lexer. What is a lexer? It is used to recognize "words" that make up language elements. It is an iterator and takes a source code and keywords as input. We will start by creating the Lexer class and it's first function which is initialize and will process the text but will also keep track of the position of the character. ![](https://i.imgur.com/V1nEs4Z.png) Next to our advance method which will just go to the next character in the text: ``` def advance(self): self.pos.advance(self.current_char) self.current_char = self.text[self.pos.idx] if self.pos.idx < len(self.text) else None ``` We need to also call it in the __init__ method. #### Make_tokens Next thing we need to do is a method to make tokens. We need to loop through all of them to check each character and append their types. This is done in such way: ``` def make_tokens(self): tokens = [] while self.current_char != None: if self.current_char in ' \t': self.advance() elif self.current_char in DIGITS: tokens.append(self.make_number()) elif self.current_char == '+': tokens.append(Token(TT_PLUS)) self.advance() elif self.current_char == '-': tokens.append(Token(TT_MINUS)) self.advance() elif self.current_char == '*': tokens.append(Token(TT_MUL)) self.advance() elif self.current_char == '/': tokens.append(Token(TT_DIV)) self.advance() elif self.current_char == '(': tokens.append(Token(TT_LPAREN)) self.advance() elif self.current_char == ')': tokens.append(Token(TT_RPAREN)) self.advance() else: pos_start = self.pos.copy() char = self.current_char self.advance() return [], IllegalCharError(pos_start, self.pos, "'" + char + "'") return tokens, None ``` #### Make_number / Digits We can see that in the beginning we check if the character is a number with the function make_number which we also need to define. `def make_number(self): num_str = '' dot_count = 0 while self.current_char != None and self.current_char in DIGITS + '.': if self.current_char == '.': if dot_count == 1: break dot_count += 1 num_str += '.' else: num_str += self.current_char self.advance() if dot_count == 0: return Token(TT_INT, int(num_str)) else: return Token(TT_FLOAT, float(num_str))` #### Errors Next it is needed to return an error. For that we need a custom error class which would look like this: ``` class Error: def __init__(self, pos_start, pos_end, error_name, details): self.pos_start = pos_start self.pos_end = pos_end self.error_name = error_name self.details = details def as_string(self): result = f'{self.error_name}: {self.details}\n' result += f'File {self.pos_start.fn}, line {self.pos_start.ln + 1}' return result class IllegalCharError(Error): def __init__(self, pos_start, pos_end, details): super().__init__(pos_start, pos_end, 'Illegal Character', details) ``` #### Run and shell.py The run function is needed to get the text and then run through it. ``` def run(fn, text): lexer = Lexer(fn, text) tokens, error = lexer.make_tokens() return tokens, error ``` #### Position The position class will keep track of the line and column number, and current index. ``` class Position: def __init__(self, idx, ln, col, fn, ftxt): self.idx = idx self.ln = ln self.col = col self.fn = fn self.ftxt = ftxt def advance(self, current_char): self.idx += 1 self.col += 1 if current_char == '\n': self.ln += 1 self.col = 0 return self def copy(self): return Position(self.idx, self.ln, self.col, self.fn, self.ftxt) ``` The reasoning for this is so we can be able to see from which line the error came from which would be useful for when we input bigger files. ### Parser/AST #### What is a parser? A parser is a software component that takes input data (frequently text) and builds a data structure – often some kind of parse tree, abstract syntax tree or other hierarchical structure, giving a structural representation of the input while checking for correct syntax. For example: Let's say we have the following expression 1 + 2 * 3. ![](https://i.imgur.com/uTdseOn.png) The syntax tree that is created shows us that the first node (number 1) should be added to the binary operation MUL (multiply) node, which is a multiplication of 2 other nodes. It also shows the correct order of which the expression should be executed. In this case, the numbers 2 and 3 are multiplied first, then added to 1 to get the result: 7. #### Grammar rules In order to let the program know how our language works, we have to create a grammar for the program. Take the next example: ![](https://i.imgur.com/GUc8Rmy.png) In our grammar, 24 * 2 should be done first, and the result should be added to 8. All of these rules are given a name. This is very important in our program. As you can see, the whole thing is called an expression. The expression can consist of multiple terms and operations. In this case there is 1 operation and 2 terms. Inside these terms are factors. These factors can be integers or floats for example. In our code we will add a txt file, where we define our grammar. #### Nodes In order to make our tree in our grammar, we need to define some nodes. These nodes will be number nodes, or binary operation nodes. Below is a snapshot of that part of the code: ![](https://i.imgur.com/HpYdWiT.png) #### Parser Now we can start working on our parser class. The parser will have to take in the current index just like the lexer and will have to able to advance. Next we will have to create rules for every part in our program. We will have to define factor, expression and term. All of these with their explanations can be found in the code. After we defined all of our rules, we create a small library which places arrows (^) underneath characters when there is an error. This library is called 'string_with_arrows.py' and can also be found in the code. The error is added to the error part like this: ![](https://i.imgur.com/KeRjC8j.png) ### Interpreter Now it is time to build our interpreter which will visit all of the nodes and their child nodes. First we need to determine whether it is a number or a binary operation. For every node we visit, we will visit the left and right node and determine what it is. If there is no operation we can say there is no solution. ``` class Interpreter: def visit(self, node, context): method_name = f'visit_{type(node).__name__}' method = getattr(self, method_name, self.no_visit_method) return method(node, context) def no_visit_method(self, node, context): raise Exception(f'No visit_{type(node).__name__} method defined') ################################### def visit_NumberNode(self, node, context): return RTResult().success( Number(node.tok.value).set_context(context).set_pos(node.pos_start, node.pos_end) ) def visit_VarAccessNode(self, node, context): res = RTResult() var_name = node.var_name_tok.value value = context.symbol_table.get(var_name) if not value: return res.failure(RTError( node.pos_start, node.pos_end, f"'{var_name}' is not defined", context )) value = value.copy().set_pos(node.pos_start, node.pos_end) return res.success(value) def visit_VarAssignNode(self, node, context): res = RTResult() var_name = node.var_name_tok.value value = res.register(self.visit(node.value_node, context)) if res.error: return res context.symbol_table.set(var_name, value) return res.success(value) def visit_BinOpNode(self, node, context): res = RTResult() left = res.register(self.visit(node.left_node, context)) if res.error: return res right = res.register(self.visit(node.right_node, context)) if res.error: return res if node.op_tok.type == TT_PLUS: result, error = left.added_to(right) elif node.op_tok.type == TT_MINUS: result, error = left.subbed_by(right) elif node.op_tok.type == TT_MUL: result, error = left.multed_by(right) elif node.op_tok.type == TT_DIV: result, error = left.dived_by(right) elif node.op_tok.type == TT_POW: result, error = left.powed_by(right) if error: return res.failure(error) else: return res.success(result.set_pos(node.pos_start, node.pos_end)) def visit_UnaryOpNode(self, node, context): res = RTResult() number = res.register(self.visit(node.node, context)) if res.error: return res error = None if node.op_tok.type == TT_MINUS: number, error = number.multed_by(Number(-1)) if error: return res.failure(error) else: return res.success(number.set_pos(node.pos_start, node.pos_end)) ``` #### Outcome We will then create a value class to display the answer of the whole expression and display it in the console. We will also be adding an extra check if the division is not done by 0. If it is, it will display an error. ``` class Number: def __init__(self, value): self.value = value self.set_pos() self.set_context() def set_pos(self, pos_start=None, pos_end=None): self.pos_start = pos_start self.pos_end = pos_end return self def set_context(self, context=None): self.context = context return self def added_to(self, other): if isinstance(other, Number): return Number(self.value + other.value).set_context(self.context), None def subbed_by(self, other): if isinstance(other, Number): return Number(self.value - other.value).set_context(self.context), None def multed_by(self, other): if isinstance(other, Number): return Number(self.value * other.value).set_context(self.context), None def dived_by(self, other): if isinstance(other, Number): if other.value == 0: return None, RTError( other.pos_start, other.pos_end, 'Division by zero', self.context ) return Number(self.value / other.value).set_context(self.context), None def powed_by(self, other): if isinstance(other, Number): return Number(self.value ** other.value).set_context(self.context), None def copy(self): copy = Number(self.value) copy.set_pos(self.pos_start, self.pos_end) copy.set_context(self.context) return copy def __repr__(self): return str(self.value) ``` ## The final code This brings us to our final code. Every class/subject can be found in the link below: [**Repository Link**](https://github.com/Caulli/APCSubject2)