---
tags: 298 NLP, grammar, copyright haub and kurz
---
# Introduction to Grammar
The aim of these short notes is to provide just enough background to get started with [Grammatical Framework](http://www.grammaticalframework.org/) (GF). We will see more on GF [next week](https://hackmd.io/@alexhkurz/HJ9rNyz0K).
## Terms of the Trade
### General Terms
language --- a set of strings
concrete syntax --- a string of characters or tokens such as `1+2*3`
characters vs tokens --- the string `12+34*56` 8 characters but only 5 tokens
lexing --- dividing a string of characters into a list of tokens
abstract syntax tree --- a machine readable encoding of the abstract structure of a sentence
**Example:** An abstract syntax tree of `12+34*56` is `Plus 12 (Times 34 56)` and an abstract syntax tree of `(12+34)*56` is `Times (Plus 12 34) 56)`.
**Exercise:** Convert between the one dimensional notation of abstract syntax trees in the two dimensional notation.
parsing --- creating abstract syntax tree from concrete syntax
### Grammatical Framework
abstract grammar --- definition of the abstract syntax trees of a language
concrete grammar --- defines the string of an abstract syntax tree
## Context Free Grammars
These [slides](https://people.cs.clemson.edu/~goddard/texts/theoryOfComputation/6a.pdf) provide a short overview that contains all the essential ideas we need.
A context-free grammar is a set of rules such as the following. (I skip the rules for `Integer`s such as `Integer -> '1'`, etc)
Exp -> Exp '+' Exp
Exp -> Exp '*' Exp
Exp -> Integer
Exp -> '(' Exp ')'
`Exp` is the start symbol, terminal symbols are in single quotes.
**Exercise 1:** Find all derivations of `1+2*3`. Write them out in linear form and as a tree.
**Exercise 2:** Find all derivations of `(1+2)*3`.
Now consider the following modification of the grammar.
Exp -> Exp '+' Exp1
Exp -> Exp1
Exp1 -> Exp1 '*' Exp2
Exp1 -> Exp2
Exp2 -> Integer
Exp2 -> '(' Exp ')'
**Exercise 3:** Find all derivations of `1+2*3`.
## Homework
- Using the two grammars above, how many derivations are there of `1+2+3+4`?
- A famous example of a context free grammar is the one that allows one to derive all strings of parentheses so that every `(` matches exactly one `)` to the right. For example
((()))
()()()
((()(()))(()))(())
are legal strings in the language of balanced parentheses.
Find a grammar for this language (two rules are enough).
- For the grammar above is there a string that has two different parse trees?
- Bonus question: Is it the case that every string with the following two properties is balanced? (i) the number of `(` equals the number of `)` and (ii) all prefixes have at least as many `(` as `)`