--- tags: programming languages --- # A Calculator: The Parser (part of a course on programming languages) We have seen how to write an interpreter for arithmetic expressions operating on abstract syntax. We have also seen how to specify the language of arithmetic (concrete syntax) using a context-free grammar and how to use the grammar to parse by hand arithmetic expressions. What is missing is a parser that translates concrete into abstract syntax automatically. Instead of learning how to implement a parser ourselves, we will learn how to use a parser generator. Before we start, recall our running example of a context-free grammar: ``` Exp -> Exp '+' Exp1 Exp1 -> Exp1 '*' Exp2 Exp2 -> Integer Exp -> Exp1 Exp1 -> Exp2 Exp2 -> '(' Exp ')' ``` ## The parser generator BNFC A parser generator takes as input a context-free grammar and produces as output a parser. We will use the parser generator BNFC. In fact, BNFC comes with its own language, LBNF, which is a domain specific language in which one can write context-free grammars. The grammar above looks in BNFC as follows. Plus. Exp ::= Exp "+" Exp1 ; Times. Exp1 ::= Exp1 "*" Exp2 ; Num. Exp2 ::= Integer ; coercions Exp 2 ; **Activity:** Discuss what features BNFC adds to the bare-bones context-free grammars we have seen above. We will postpone a more detailed study of BNFC to the course on Compiler Construction where we have to build more complicated grammars. But if you want to learn more now look at [bnfc.digitalgrammars.com](http://bnfc.digitalgrammars.com/) and this entry on [coercions](https://bnfc.readthedocs.io/en/latest/lbnf.html#coercions) in the LBNF-documentation. [^LBNF2] [^LBNF2]: (LBNF for "Labelled BNF" is the name of the language in which one writes the grammar that is then converted to a parser). ## Generating a Parser from a Context-Free Grammar This section refers to the files in the git repo - `src/Calculator` - `Calculator.hs` - `Interpreter.hs` - `numbers.cf` I want to show that, using a parser generator, it is easy to extend the calculator we wrote for abstract syntax to more familiar concrete syntax. You should install [BNFC](http://bnfc.digitalgrammars.com/tutorial/bnfc-tutorial.html) and replay the following on your own machine.[^bnfc] I collected some additional [BNFC Installation Instructions](https://hackmd.io/@alexhkurz/Sk2cL_vyT). [^bnfc]: You need to [install Haskell](https://hackmd.io/@alexhkurz/Hk86XnCzD) first. Download the directory Calculator from the git repo. It should contain 3 files Calculator.hs Interpreter.hs numbers.cf Have a look at the files. `Interpreter.hs` is essentially the interpreter from the last session. We will come to `Calculator.hs` below. `numbers.cf` contains the context-free grammar in BNFC-format. We use the grammar to generate a parser in two steps. First run bnfc -m -haskell numbers.cf Which files does `bnfc` generate? The most important one for us is `AbsNumbers.hs`. Open it in an editor. What do you find? Second, we compile all these files to generate the actual parser: make The parser is now available as `TestNumbers`. Running [^echo] echo "1+2*3" | ./TestNumbers [^echo]: `echo` copy its input to "standard output". The pipe `|` connects standard output of the program to the left with the standard input of the program to the right. should give you as output the abstract syntax tree from the previous session: ``` Parse Successful! [Abstract Syntax] Plus (Num 1) (Times (Num 2) (Num 3)) [Linearized tree] 1 + 2 * 3 ``` **Summary:** The following sequence of commands will be important for future assignments: bnfc -m -haskell numbers.cf make echo "1+2*3" | ./TestNumbers ## Putting Everything Together Recall that we have the files Calculator.hs Interpreter.hs numbers.cf Putting everything together requires some glue codef. `Calculator.hs` essentially calls `eval` which is the interpreter from last session and is defined in `Interpreter.hs`. If you run ghc Calculator.hs you should get a file `Calculator` which you can run on any input in the language we defined: echo "1+2*3" | ./Calculator 7 **Remark:** This calculator uses Haskell's native numbers. Instead, we could have used the same grammar and abstract syntax to interpret arithmetic in our own virtual machine. ## <font>Homework</font> (Mandatory) - Install bnfc - Verify that you can replay the lecture on your computer We will need this for Assignment 2. - Read the rest of the notes below. - Read [Lecture 2: Abstract and Concrete Syntax](https://www.cse.chalmers.se/edu/year/2011/course/TIN321/lectures/proglang-02.html) from Aarne Ranta's *Programming Languages Course*. ## <font color=red>Homework (for the Report)</font> Using the BNFC grammar ``` Plus. Exp ::= Exp "+" Exp1 ; Times. Exp1 ::= Exp1 "*" Exp2 ; Num. Exp2 ::= Integer ; coercions Exp 2 ; ``` write out the abstract syntax trees for the following strings: [^ast] - `2+1` - `1+2*3` - `1+(2*3)` - `(1+2)*3` - `1+2*3+4*5+6` Is the abstract syntax tree of `1+2+3` identical to the one of `(1+2)+3` or the one of `1+(2+3)`? <!-- ## Adding Features to the Calculator - **See [Assignment 1](https://github.com/alexhkurz/programming-languages-2020/tree/master/assignments.md):** If you allow yourself the use of the built-in Haskell arithmetic, then the program above can easily be adapted to a fully fledged calculator. See the directory [Calculator3](https://github.com/alexhkurz/programming-languages-2020/tree/master/src/Calculator3) for how the interpreter looks like if you use Haskell's numbers instead of our own successor numbers. - **Optional Project:** Hook it up with a nice user interface. This could be a nice project to write about in your **blog/tutorial**. --> ## Stocktaking What did we learn? One reason why compiler construction is difficult is that there are so many different languages involved, on so many different levels. Already in our small example, we have encountered a source language, BNF, abstract syntax and a host language: - Our source language was a language for arithmetic expressions given by a context-free grammar. - This context-free grammar was described, see `numbers.cf`, in the (domain specific) programming language `BNFC`. `BNFC` comes with the tool `bnfc` (written in Haskell, btw) that generates a parser from the grammar. - The parser translates programs in the source language into abstract syntax, which can be seen as yet another language. Abstract sytax can be defined in many ways. In our example, it is given in Haskell by data Exp = Num Int | Plus Exp Exp | Times Exp Exp which is generated automatically[^AbsNumbers] by `bnfc`. - The abstract syntax is then interpreted, see `Interpreter.hs`, in the *host language*. In our case the interpreter `eval` is written in Haskell. **Remark:** Conceptually, the interpretation of abstract syntax can be broken into two steps. We could say that the functions `add` and `mult` define a *virtual machine*. This is very close to what happens in Java and Python, where the source code is compiled to virtual machine code, which is then interpreted. ## References [BNFC Installation Instructions](https://hackmd.io/@alexhkurz/Sk2cL_vyT) General background is available in the Wikipedia articles on [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) and [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar). Also consult the article on the [Chomsky hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy#The_hierarchy), in particular the table that aligns grammars with automata. For an introduction to the parser generator `bnfc` see the [BNFC tutorial](http://bnfc.digitalgrammars.com/tutorial/bnfc-tutorial.html). The language in which the input for `bnfc` is written is called LBNF (labelled BNF) and documented in the [LBNF reference](https://bnfc.readthedocs.io/en/latest/lbnf.html). Gregor Ulm in [Writing an Interpreter with BNFC and Haskell](https://gregorulm.com/writing-an-interpreter-with-bnfc-and-haskell/) is based on the same theory and software as our approach. (Thanks to Tally Holcombe for the reference.) [^ast]: Recall that in a previous exercise you already constructed the *concrete syntax trees* for these expressions using the CFG ``` Exp -> Exp '+' Exp1 Exp1 -> Exp1 '*' Exp2 Exp2 -> Integer Exp -> Exp1 Exp1 -> Exp2 Exp2 -> '(' Exp ')' ``` The BNFC-grammar ``` Plus. Exp ::= Exp "+" Exp1 ; Times. Exp1 ::= Exp1 "*" Exp2 ; Num. Exp2 ::= Integer ; coercions Exp 2 ; ``` is essentially the same as the CFG, but names the first three rules of the CFG. These names will be the parent nodes in the AST. In more detail, to obtain the AST from the CST - eliminate from the AST all rules that do not have a name and - label the parent nodes with the names of the corresponding rule and - erase concrete syntax such as `+`, `*`, parentheses, etc For example, the CST ``` Exp / | \ Exp + Exp1 | | Exp1 Exp2 | | Exp2 Integer | | Integer 2 | 1 ``` becomes the AST ``` Plus / \ Num Num | | 1 2 [^AbsNumbers]: If the name of the grammar is `numbers.cf` then the file that contains the algebraic data type for the abstract syntax is `AbsNumbers.hs`.