The aim of these short notes is to provide just enough background to get started with Grammatical Framework (GF). We will see more on GF next week.
language โ- a set of strings
concrete syntax โ- a string of characters or tokens such as 1+2*3
characters vs tokens โ- the string 12+34*56
8 characters but only 5 tokens
lexing โ- dividing a string of characters into a list of tokens
abstract syntax tree โ- a machine readable encoding of the abstract structure of a sentence
Example: An abstract syntax tree of 12+34*56
is Plus 12 (Times 34 56)
and an abstract syntax tree of (12+34)*56
is Times (Plus 12 34) 56)
.
Exercise: Convert between the one dimensional notation of abstract syntax trees in the two dimensional notation.
parsing โ- creating abstract syntax tree from concrete syntax
abstract grammar โ- definition of the abstract syntax trees of a language
concrete grammar โ- defines the string of an abstract syntax tree
These slides provide a short overview that contains all the essential ideas we need.
A context-free grammar is a set of rules such as the following. (I skip the rules for Integer
s such as Integer -> '1'
, etc)
โโโโExp -> Exp '+' Exp
โExp -> Exp '*' Exp
โExp -> Integer
โExp -> '(' Exp ')'
Exp
is the start symbol, terminal symbols are in single quotes.
Exercise 1: Find all derivations of 1+2*3
. Write them out in linear form and as a tree.
Exercise 2: Find all derivations of (1+2)*3
.
Now consider the following modification of the grammar.
โโโโExp -> Exp '+' Exp1
โโโโExp -> Exp1
โExp1 -> Exp1 '*' Exp2
โExp1 -> Exp2
โExp2 -> Integer
โExp2 -> '(' Exp ')'
Exercise 3: Find all derivations of 1+2*3
.
Using the two grammars above, how many derivations are there of 1+2+3+4
?
A famous example of a context free grammar is the one that allows one to derive all strings of parentheses so that every (
matches exactly one )
to the right. For example
โโโโโ ((()))
โโโโโ ()()()
โโโโโ ((()(()))(()))(())
are legal strings in the language of balanced parentheses.
Find a grammar for this language (two rules are enough).
For the grammar above is there a string that has two different parse trees?
Bonus question: Is it the case that every string with the following two properties is balanced? (i) the number of (
equals the number of )
and (ii) all prefixes have at least as many (
as )