---
tags: programming languages
---
# A Calculator in Haskell (Concrete Syntax) (2020)
In the [previous session](https://hackmd.io/@alexhkurz/SyxKCkR6U) we found that the ability of Haskell to do
- pattern matching and
- recursion on algebraic data types
makes it easy to implement a simple calculator that operates on abstract syntax. (We will see later in the course and then next semester in Compiler Construction, that this scales up to real world programming languages.)
We also found that abstract syntax is not convenient for human readers. Instead of having to work through
Plus (Num 1) (Times (Num 2) (Num 3))
we generally prefer
1 + 2 * 3
So can we get the best of both worlds? Elegant and concise programs working on abstract synatx but reading input from humans in concrete syntax?
Sure. We only need to find a way to automatically translate concrete syntax into abstract syntax.
The bad news is that this is more difficult than it should be. The good news is that this is a problem that was solved quite some time ago: The transformation of concrete syntax into abstract syntax is known as **parsing**. And the theory (and practice) of parsing is well understood.[^parsing]
## A Short Introduction to Parsing
Looking at parsing from the point of view of ordinary (1-dimensional) syntax, parsing is about putting the parentheses in the right place. For example,
1+2*3
is turned into
1+(2*3)
But this is still a linear expression that is not easy to process automatically. In particular, matching opening and closing parentheses may be arbitrarily far apart.
**Activity:** We can find similar examples in natural language. Use the idea of "putting the parentheses into the right place" to explain how the following sentences are ambigous.
Novak Djokovic, five times a champion here, failed to reach the concluding Sunday of the ATP World Tour Finals for a record eighth time [...].
**Optional Activity:** Write a calculator (operating on ordinary concrete syntax) in your favourite language without using a library for parsing. This is an excellent exercise to learn to appreciate the subtleties of parsing.
Better for processing by a machine is to bring the input string into a tree format such as
Plus
/ \
Num Times
| / \
1 Num Num
| |
2 3
The aims of this session are the following.
- Define the concrete syntax of a calculator using a context free grammar.
- Write a context-free grammar in BNFC.
- Use `bnfc` to generate a parser that translates concrete arithmetic expressions to abstract ones.
- Link up the parser with the calculator from the previous session to obtain a calculator written in Haskell and operating on concrete syntax.
This session is meant to be rather introductory. We will dig deeper into these concepts later. In particular, next semester, we will see that all of these concepts scale up to write a compiler for a programming language such as C++.
## A simple context-free grammar
A context-free grammar is a set of rules such as the following. (I skip the rules for `Integer`s.)
Exp -> Exp '+' Exp1
Exp -> Exp1
Exp1 -> Exp1 '*' Exp2
Exp1 -> Exp2
Exp2 -> Integer
Exp2 -> '(' Exp ')'
The purpose of a **context-free grammar** is to define a language, that is, a set of strings (aka words). In our case, we want to define arithmetic expressions. A particular string is in the language if it can be derived from the **start symbol**, which is `Exp` in the example. The symbols that will appear in the strings are those that are enclosed in single quotes, that is,
+
*
(
)
The other symbols such as `Exp`, `Exp1`, etc ... control which strings can be derived. We will not need a formal definition of such derivations for now, so I only give an example of a derivation. Fell free to ask me for more details.
(In the following I will drop the single quotes for readability.)
Exp ->
Exp1 ->
Exp1 * Exp1 ->
Exp2 * Exp1 ->
Exp2 * Exp2 ->
Integer * Exp2 ->
Integer * Integer ->
2 * Integer ->
2 * 1
The same string can have many different derivations as we are free to choose the order in which we apply the rules. On the other hand, for the grammar above, any two different derivations of the same string correspond to the same tree.
## Homework (until Thursday, extended to Tuesday)
**Exercise:** Study the derivation above. How many other deriviations can you make just by switching the order in which you apply the rules?
**Exercise:** In the lecture I drew the deriviation above as a tree. How many different derivation trees can you make?
The next exercise is the most important one. You need to do enough exercises of this kind until you feel comfortable parsing small examples by hand.
**Exercise:** Write out the derivation trees (also called **parse trees** or **concrete syntax trees**) for the following strings:
- `2+1`
- `1+2*3`
- `1+(2*3)`
- `(1+2)*3`
- `1+2*3+4*5+6`
**Exercise:** Why do the following strings not have parse trees (given the context-free grammar above)?
- `2-1`
- `1.0+2`
- `6/3`
- `8 mod 6`
**Exercise:** Can you change the grammar, so that the strings in the previous exercise become parsable?
---
this is how far we got in the Tuesday lecture ... and the Thursday lecture was largely a review ... we continue below on Tuesday:
---
## The parser generator BNFC
A parser generator takes as input a context-free grammar and produces as output a parser. We will use the parser generator BNFC. In fact, BNFC is also a language to define context-free grammars. The grammar above looks in BNFC as follows.
Plus. Exp ::= Exp "+" Exp1 ;
Times. Exp1 ::= Exp1 "*" Exp2 ;
Num. Exp2 ::= Integer ;
coercions Exp 2 ;
**Activity:** Discuss what features BNFC adds to the bare-bones context-free grammars we have seen above.
We will postpone a more detailed study of BNFC to the course on Compiler Construction where we have to build more complicated grammars. But if you want to learn more now look at [bnfc.digitalgrammars.com](http://bnfc.digitalgrammars.com/) and this entry on [coercions](https://bnfc.readthedocs.io/en/latest/lbnf.html#coercions) in the LBNF-documentation. [^LBNF2]
## Precedence levels
The grammar above has one subtle feature: The grammar makes sure that the *only* way to parse
1+2*3
is to mean $1 + (2\cdot 3)$ and not as $(1+2)\cdot 3$.
Watch [this video](https://www.youtube.com/watch?v=jf1xhZSpCvg) for an explanation of how to do this.
## Homework (until Thursday)
- With the simplified grammar without precedence levels
Exp -> Exp '+' Exp
Exp -> Exp '*' Exp
Exp -> Integer
how many parse trees can you find for the following expressions?
1+2+3
1*2*3*4
- Answer the question above using instead the grammar
Exp -> Exp '+' Exp1
Exp -> Exp1
Exp1 -> Exp1 '*' Exp2
Exp1 -> Exp2
Exp2 -> Integer
---
this is how far we got on Tuesday
---
## Generating a Parser from a Context-Free Grammar
I want to show that, using a parser generator, it is easy to extend the calculator we wrote for abstract syntax to more familiar concrete syntax.
You should install [BNFC](http://bnfc.digitalgrammars.com/tutorial/bnfc-tutorial.html) and replay the following on your own machine.[^bnfc] I collected some instructions [here](https://github.com/alexhkurz/programming-languages-2020/blob/master/BNFC-installation.md).
Download the directory [`Calculator`](https://github.com/alexhkurz/programming-languages-2020/tree/master/src/Calculator2). It should contain 3 files
Calculator.hs
Interpreter.hs
numbers.cf
Have a look at the files. `Interpreter.hs` is essentially the interpreter from the last session. We will come to `Calculator.hs` below. `numbers.cf` contains the context-free grammar in BNFC-format.
We use the grammar to generate a parser in two steps. First run
bnfc -m -haskell numbers.cf
Which files does `bnfc` generate? The most important one for us is `AbsNumbers.hs`. Open and it an editor. What do you find?
Second, we compile all these files to generate the actual parser:
make
The parser is now available as `TestNumbers`. Running [^echo]
echo "1+2*3" | ./TestNumbers
should give you as output the abstract syntax tree from the previous session:
Parse Successful!
[Abstract Syntax]
Plus (Num 1) (Times (Num 2) (Num 3))
[Linearized tree]
1 + 2 * 3
**Summary:** The following sequence of commands will be important for future assignments:
bnfc -m -haskell numbers.cf
make
echo "1+2*3" | ./TestNumbers
## Putting Everything Together
Putting everything together requires some glue code which is available for you to download as [`Calculator.hs`](https://github.com/alexhkurz/programming-languages-2020/tree/master/src/Calculator2/Calculator.hs). `Calculator.hs` essentially calls `eval` which is the interpreter from last session and is defined in [`Interpreter.hs`](https://github.com/alexhkurz/programming-languages-2020/tree/master/src/Calculator/Interpreter.hs) If you run
ghc Calculator.hs
you should get a file `Calculator` which you can run on any input in the language we defined:
echo "1+2*3" | ./Calculator
S (S (S (S (S (S (S O))))))
## Homework
- Install bnfc
- Verify that you can replay the lecture on your computer
Now you should be ready to do Task 2 of Part 2 of Assignment 1.
- Read the rest of the notes below.
## Adding Features to the Calculator
- **See [Assignment 1](https://github.com/alexhkurz/programming-languages-2020/tree/master/assignments.md):** If you allow yourself the use of the built-in Haskell arithmetic, then the program above can easily be adapted to a fully fledged calculator. See the directory [Calculator3](https://github.com/alexhkurz/programming-languages-2020/tree/master/src/Calculator3) for how the interpreter looks like if you use Haskell's numbers instead of our own successor numbers.
- **Optional Project:** Hook it up with a nice user interface. This could be a nice project to write about in your **blog/tutorial**.
## Stocktaking
What did we learn? One reason why compiler construction is difficult is that there are so many different languages involved, on so many different levels. Already in our small example, we have encountered a source language, BNF, abstract syntax and a host language:
- Our source language was a language for arithmetic expressions given by a context-free grammar.
- This context-free grammar was [described](https://github.com/alexhkurz/programming-languages-2020/blob/master/src/Calculator2/numbers.cf) in the (domain specific) programming language `BNFC`.[^LBNF] `BNFC` comes with the tool `bnfc` (written in Haskell, btw) that generates a parser from the grammar.
- The parser translates programs in the source language into abstract syntax, which can be seen as yet another language. Abstract sytax can be defined in many ways. In our example, it is given in Haskell by
data Exp = Num Int | Plus Exp Exp | Times Exp Exp
which is generated automatically[^AbsNumber] by `bnfc`.
- The abstract syntax is then [interpreted](https://github.com/alexhkurz/programming-languages-2020/blob/master/src/Calculator2/Interpreter.hs) in the *host language*. In our case the interpreter `eval` is written in Haskell.
**Remark:** Conceptually, the interpretation of abstract syntax can be broken into two steps. We could say that the functions `add` and `mult` define a *virtual machine*. This is very close to what happens in Java and Python, where the source code is compiled to virtual machine code, which is then interpreted.
**Remark:** Each language defines a level of abstraction that hides detail. We could complicate the picture sketched above to reveal even more languages. The Haskell runtime system is written in C and C--. Haskell is compiled to LLVM. LLVM is compiled to the target architecture on your computer, which in turn is implemented in microcode. (I should admit that with this remark I am crossing the limits of my knowledge ... while it would be very interesting to dive into more details, I am glad that we do not need to do this for this course.)
## References
General background is available in the Wikipedia articles on [BNF](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form) and [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar). Also consult the article on the [Chomsky hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy), in particular the table that aligns grammars with automata.
For an introduction to the parser generator `bnfc` see the [BNFC tutorial](http://bnfc.digitalgrammars.com/tutorial/bnfc-tutorial.html).
The language in which the input for `bnfc` is written is called LBNF (labelled BNF) and documented in the [LBNF reference](https://bnfc.readthedocs.io/en/latest/lbnf.html).
## Recommended Reading
The literature on parsing is vast. We will learn much more about parsing in the course on Compiler Construction. Here, I limit myself to a few early references that continue to have a lasting influence. (It is always worth to look at the classics, something we don't do enough in science.)
- Joachim Lambek: [The Mathematics of Sentence Structure](https://www.cs.cmu.edu/~fp/courses/15816-f16/misc/Lambek58.pdf). 1958.
Read sections 1-4 up to the end of page 158. Terms worth learning here are: sentence, nonsentence, syntactic type, leakage, decision problem, primitive type, context, compound type. Section 4 is important because it explains the idea of *parsing-as-deduction*.
While Lambek's work continues to be influential, and is based on many of the same ideas, his formalism is different from context-free grammars, which were introduced on the first two pages of Chapter 4 of
- Noam Chomsky: [Syntactic Structures](https://www.academia.edu/4073170/Noam_Chomsky_Syntactic_Structure). 1957.
and are still an excellent introduction. Also the general background reviewed in Chapter 2 is worth reading today (as is much else in this book). For the historical significance also see its introduction and [Wikipedia on Syntactic Structures](https://en.wikipedia.org/wiki/Syntactic_Structures).
[^parsing]: If you read articles and books about compilers written before approx 1979, be aware that there was a time where parsing was THE problem of compiler construction. See this excellent [timeline](https://jeffreykegler.github.io/personal/timeline_v3). I found particular revealing the entries from "Language as of 1965" to "The Parsing Problem as of 1979".
[^bnfc]: You need to [install Haskell](https://hackmd.io/@alexhkurz/Hk86XnCzD) first.
[^echo]: `echo` copy its input to "standard output". The pipe `|` connects standard output of the program to the left with the standard input of the program to the right.
[^AbsNumber]: If you run BNFC, there will be a file `AbsNumber.hs` ... have a look yourself.
[^LBNF]: More precisely, the language in which the context-free grammars are written is called [LBNF](https://bnfc.readthedocs.io/en/latest/lbnf.html), for labelled BNF.
[^LBNF2]: (LBNF for "Labelled BNF" is the name of the language in which one writes the grammar that is then converted to a parser).