--- tags: 298 NLP, grammar, copyright haub and kurz --- # Introduction to Grammar The aim of these short notes is to provide just enough background to get started with [Grammatical Framework](http://www.grammaticalframework.org/) (GF). We will see more on GF [next week](https://hackmd.io/@alexhkurz/HJ9rNyz0K). ## Terms of the Trade ### General Terms language --- a set of strings concrete syntax --- a string of characters or tokens such as `1+2*3` characters vs tokens --- the string `12+34*56` 8 characters but only 5 tokens lexing --- dividing a string of characters into a list of tokens abstract syntax tree --- a machine readable encoding of the abstract structure of a sentence **Example:** An abstract syntax tree of `12+34*56` is `Plus 12 (Times 34 56)` and an abstract syntax tree of `(12+34)*56` is `Times (Plus 12 34) 56)`. **Exercise:** Convert between the one dimensional notation of abstract syntax trees in the two dimensional notation. parsing --- creating abstract syntax tree from concrete syntax ### Grammatical Framework abstract grammar --- definition of the abstract syntax trees of a language concrete grammar --- defines the string of an abstract syntax tree ## Context Free Grammars These [slides](https://people.cs.clemson.edu/~goddard/texts/theoryOfComputation/6a.pdf) provide a short overview that contains all the essential ideas we need. A context-free grammar is a set of rules such as the following. (I skip the rules for `Integer`s such as `Integer -> '1'`, etc) Exp -> Exp '+' Exp Exp -> Exp '*' Exp Exp -> Integer Exp -> '(' Exp ')' `Exp` is the start symbol, terminal symbols are in single quotes. **Exercise 1:** Find all derivations of `1+2*3`. Write them out in linear form and as a tree. **Exercise 2:** Find all derivations of `(1+2)*3`. Now consider the following modification of the grammar. Exp -> Exp '+' Exp1 Exp -> Exp1 Exp1 -> Exp1 '*' Exp2 Exp1 -> Exp2 Exp2 -> Integer Exp2 -> '(' Exp ')' **Exercise 3:** Find all derivations of `1+2*3`. ## Homework - Using the two grammars above, how many derivations are there of `1+2+3+4`? - A famous example of a context free grammar is the one that allows one to derive all strings of parentheses so that every `(` matches exactly one `)` to the right. For example ((())) ()()() ((()(()))(()))(()) are legal strings in the language of balanced parentheses. Find a grammar for this language (two rules are enough). - For the grammar above is there a string that has two different parse trees? - Bonus question: Is it the case that every string with the following two properties is balanced? (i) the number of `(` equals the number of `)` and (ii) all prefixes have at least as many `(` as `)`