---
tags: programming languages
---
(part of a course on programming languages)
# Syntax of Lambda Calculus
**Prerequisites:** Context-free grammars, concrete syntax trees, abstract syntax trees, how to generate a parser using `bnfc`.
**Learning Outcomes:** After having worked through the exercises and homework, students will be able to
- check whether a string is a lambda calculus program,
- construct the abstract syntax tree of a lambda calculus program.
*Remark:* A program is by definition syntactically correct (otherwise it is not a program in the first place). But we sometimes say "syntactically correct program" or "legal program" or "well-formed program" for emphasis.
**Videos:**
- [Syntax of Lambda Calculus](https://youtu.be/D0kH1BpNr14)
- [Parsing Lambda Calculus Expressions 1](https://youtu.be/eYstx7uuE6c)
- [Parsing Lambda Calculus Expressions 2](https://youtu.be/yls1NEUlzZA)
---
When you see the $\lambda$-calculus for the first time, it will look weird. Watch the [video preview](https://youtu.be/D0kH1BpNr14) to see that you already know some of the basic ideas.
---
We will use what we learned about the calculator in order to define our first programming language, the ***$\lambda$-calculus*** or ***lambda calculus***.
**Discussion:** A calculator shares many features with a programming language. What is missing from a calculator?
## Overview
Lambda calculus has only three programming constructs:
- **Abstraction:** If $e$ is a program (also called an expression in lambda calculus), possibly containing a variable $x$, then
$$\lambda x.e$$ is the program (or function), which has as a formal parameter $x$. This is called abstraction, because the program $\lambda x.e$ does not depend on $x$ anymore, $x$ is abstracted away. We will explain this in more detail below.
- **Application:** If $e_1$ and $e_2$ are programs then $$e_1\, e_2$$ is the program which applies the function $e_1$ to the argument $e_2$.
- **Variables:** The basic programs are just the variables.
## The Abstract Syntax of Lambda Calculus
<!--
In textbooks and articles, one often finds the following
**Definition 1 (abstract syntax):** The abstract syntax of the $\lambda$-calculus is the language defined by the grammar
$$ e ::= \lambda x.e \mid e\, e \mid x$$
where $x$ ranges over an infinite set, the elements of which are called variables.
**Activity:**
- Compare the three bullet points above with $\ e ::= \lambda x.e \mid e e \mid x\ .$ Both have the same meaning. What are the advantages of the shorter notation?
- Recall from our discussion of the calculator that abstract syntax is a syntax for trees. Build (produce, generate) some abstract syntax trees from the grammar.
-->
**Remark:** We will comment on the three clauses in turn.
- (Function definition.) $\lambda x.e$ is a function. If we compare this with a C-like language
int plus_one(int x) { return x+1 }
we note the following differences:
- No types, that is, `plus_one(x){ return x+1 }`
- No need for `return`, we just return the last value as in `plus_one(x){x+1}`
- No functions names which leaves us with `(x){x+1}` which we write as $\lambda x. x+1$, or, in ASCII, as `\x. x+1`
We will see later what to do about the `+ 1`.
Function definition is called abstraction since in the same way as the definitions
int plus_one(int x) { return x+1 }
and
int plus_one(int y) { return y+1 }
denote the same function, so do
$\lambda x.x+1$ and $\lambda y.y+1$
denote the same function. More generally,
$\lambda x.e$ and $\lambda y.e'$
denote the same lambda term if $e'$ arises from $e$ by substituting $y$ for $x$. In other words, the meaning of $\lambda x.e$ does not depend on $x$, the name $x$ has been *abstracted*.
- (Function Application.) In a C-like language we would write function application as
plus_one(2)
whereas now we write, as in Haskell,
plus_one 2
or, rather,
(\x. x+1) 2
in other words, we juxtapose (write next to each other) the function and its argument.
- (Variables.) Variables are mere names. The only operation available on names will be testing whether two names are equal. In particular, there will be no assignment of values to variables. (The variables of lambda calculus are not references to memory.)
<!--
**Remark:** The grammar $\ e ::= \lambda x.e \mid e\, e \mid x$ defines the syntax of lambda expressions as abstract syntax trees. We add parenthesis if we want to disambiguate a string as in $$(\lambda x. (\lambda y. ((x y) z))) (((a b) c)d)$$
If you want to be sure that a string is not ambiguous, you can modify the grammar as follows
$$\ e ::= (\lambda x.e) \mid (e\, e) \mid x$$
**Exercise:** Write $(\lambda x. (\lambda y. ((x y) z))) (((a b) c)d)$ as an abstract syntax tree.
-->
## Rules for Dropping Parentheses
To recap the difference between abstract and concrete syntax, while abstract syntax is syntax for trees, concrete syntax is syntax for strings. When we linearize trees to strings we get many parentheses, as for example in
$$(((((\lambda a.(\lambda b.(\lambda c.(\lambda d. (((a + b) + c)+d)))))\, 1)\,2)\,3)\, 4)$$
instead of the much more readable [^meaning]
$$(\lambda a.\lambda b.\lambda c.\lambda d. a + b + c+d)\, 1\,2\,3\, 4$$
So we want rules for dropping parenthesis. [^lambdaxyz]
*Remark:* If you already got used to the rules for dropping parentheses in Haskell what comes next will look familiar.
- **Application associates to the left**. This means that, for example
|short|all parentheses |
|---|---|
|$a\,b\,c$| $((a\,b)c)$|
|$a\,b\,c\,d$|$(((a\,b)c)d)$|
- **Abstraction** is essentially a unary operation, so we can drop parentheses without creating ambiguities. For example,
|short|all parentheses |
|---|---|
|$\lambda a.\lambda b. \lambda c. e$|$(\lambda a.(\lambda b. (\lambda c. e)))$|
|$\lambda a.\lambda b. \lambda c. \lambda d.e$|$(\lambda a.(\lambda b. (\lambda c.(\lambda d. e))))$|
- **Application has higher precedence than abstraction**. For example,
|short|longer | all parentheses
|---|---|---|
|$\lambda a.\lambda b. a\, b$|$\lambda a.\lambda b. (a\, b)$ | ($\lambda a.(\lambda b. (a\, b)))$
|$\lambda a.\lambda b. \lambda c. a\,b\,c$|$\lambda a.\lambda b. \lambda c. (a\,b\,c)$|$(\lambda a.(\lambda b. (\lambda c.((a\,b)c))))$|
We sometimes say that the scope of a $\lambda$ extends as far to the right as possible.
**Exercise:** Use the convention on parentheses described in the previous remark in order to eliminate as many parentheses as possible from $(\lambda x. (\lambda y. (x y z))) ((a b) c)$ without changing the abstract syntax tree denoted by the expression.
## Parsing Concrete Syntax
In the previous section we have seen how to go from abstract syntax to concrete syntax by adding parentheses and rules for dropping parentheses. In this section we will see how to go from concrete syntax to abstract syntax. Parsing concrete syntax can be understood as putting all parentheses back into the expression, that is, as transforming strings into trees.
<!--
At this point, you should recall our use of BNFC for the calculator. We are going to apply the same methodology to $\lambda$-calculus instead of to arithmetic expressions.
In BNFC, we formalise $\quad\ e ::= \lambda x.e \mid e\, e \mid x\quad$ as follows. The main difference is that each rule gets a name and goes on a separate line. This then looks as follows.
-->
**Definition 2 (concrete syntax):** The lambda calculus is the language defined by the folloing BNFC grammar
Abs. Exp ::= "\\" Ident "." Exp ;
App. Exp ::= Exp Exp1 ;
Var. Exp1 ::= Ident ;
coercions Exp 1 ;
**Remark:** Most of the notation has been explained before. Some comments:
- Since $\lambda$ is not an ASCII character, we write `\` instead.
- Accordingly, we would have liked to write, in the first line, `Exp ::= "\" Id "." Exp`. But `\` is a special character that needs to be escaped, so writing `\\` in the BNFC grammar allows us to write `\` in the programs.
- The reason for writing `Exp ::= Exp Exp1` instead of the simpler `Exp ::= Exp Exp` is the same why we wrote `Exp ::= Exp "+" Exp1` instead of `Exp ::= Exp "+" Exp` when defining a grammar for arithmetic expressions. It captures that application has a higher precedence than abstraction.
## <font color=red>Homework for the report</font>
1. (recommended, helps you for the next item and needed later anyway) The purpose of this item is to apply what you learned about the calculator to lambda calculus, namely to take a grammar and generate a parser. Specifically, using `bnfc` and the grammar of lambda-calculus above create a parser for lambda-calculus expressions. For more on `bnfc`, see the lecture [A Calculator: The Parser](https://hackmd.io/@alexhkurz/BkqOWbgMF) and links therein. [^bnfc]
[^bnfc]: The basic workflow is
```
bnfc -m -haskell grammar.cf
make
echo "string" | ./TestGrammar
```
where `string` is the string you want to parese and `grammar.cf` is the name of the file that contains your grammar.
2. (for the report) Based on the grammar above, write out (possibly with help from your bnfc parser) the abstract syntax trees (in 2-dimensional notation, that is, not linearized) for the following expressions
x
x x
x y
x y z
\ x.x
\ x.x x
(\ x . (\ y . x y)) (\ x.x) z
(\ x . \ y . x y z) a b c
If you did item 1, also copy-paste the linearized abstract syntax tree you get from the bnfc-generated parser.
## Additional Exercises
- (IMPORTANT) This is an exercise in reading lambda-calculus expressions. It is important as we will spend a lot of time with lambda calculus from now on.
Which of the following expressions e1 and e2 have the same abstract syntax tree? (You can use your parser again.)
| e1 | e2 |
|:-- |:--- |
| \x.y | \ x . y |
| x y | (x y) |
| x y | ((x y)) |
| x y z | (x y) z |
| x y z | xyz |
| x y z | x (y z) |
| \x.y z | \x.(y z) |
| \x.y z | (\x.y) z |
| (\x.x) y z | ((\x.x) y) z |
| (\x. x) y z | (\x. x) (y z) |
| \x.\y.x y | \x.(\y.x y) |
| \x.\y.x y | (\x.\y.x) y |
| \ x. x | \ y. y |
**Remark:** On my system `echo "(\x.x)a"` does not work in the `zsh`. So I either write `echo "(\ x.x)a"` or use `bash`. If I parse a file, there is no difference between `\x` and `\ x`.
**Exercise:** Show that the following are not lambda calculus expressions (according the BNFC grammar above)
\ . x
\ x y. x -- this could be a legal abbreviation
x \ y
What error message would you suggest a parser could return to the programmer?
**Exercise:** Are the following lambda caclulus programs? If yes, write down the abstract syntax tree.
x
(x)
x \ y . z
x \y . z
x \y.z
\\ x . z
x \ y \ z. z
\x.\y. x y z
**Exercise:** Put in all parentheses in the right places in the following lambda-expressions. (You can check your answers using the abstract syntax produced by your `bnfc`- generated parser and then linearize the tree by hand, putting in the parentheses.)
a b c \ x . x
a b \ x . a x
\ a . \ b . \ c . a b c x
\ x . y b \ x . a x
\ x . y \ x . y \ x . a x
a b c \ x . \ y . a b c \ x . x a b c \ y. a b c
## Optional Exercises
**Exercise:** Change the grammar so that application is now written as `x @ y` or `x * y` or some other notation you like. Parse some sample lambda terms in your own concrete syntax.
## Videos
- [Parsing Lambda Calculus Expressions 1](https://youtu.be/eYstx7uuE6c)
- [Parsing Lambda Calculus Expressions 2](https://youtu.be/yls1NEUlzZA)
<!--
- Parse some programs in the [LambdaNat language](https://github.com/alexhkurz/programming-languages-2019/tree/master/Lab1-Lambda-Calculus/LambdaNat0).
- Solve the exercises of this lecture with the help of the tool as explained [here](https://github.com/alexhkurz/programming-languages-2019/blob/master/Lambda-Calculus/LambdaNat/grammar/README.md).
- Compare the abstract syntax trees produced by the parser with the parsing you have done by hand in the exercises above.
- Most interesting is perhaps to check, using the tool, whether e1 and e2 as in the table
| e1| e2|
|:--:|:---:|
|\x.y|\ x . y |
|( x y ) | (x y)|
|( x y ) | (xy) |
|( x y ) z | x y z |
|x (y z) | x y z |
| x | (x) |
| x | ((x)) |
| \ x.x | \ y. y|
have the same abstract syntax trees.
-->
[^meaning]: We will learn in the next lecture that the meaning of this expression will be $1+2+3+4=10$.
[^lambdaxyz]: A further common abbreviation is $$(\lambda\, a\,b\,c\,d. a + b + c+d)\, 1\,2\,3\, 4$$