Introduction to Grammar

The aim of these short notes is to provide just enough background to get started with Grammatical Framework (GF). We will see more on GF next week.

Terms of the Trade

General Terms

language โ€“- a set of strings

concrete syntax โ€“- a string of characters or tokens such as 1+2*3

characters vs tokens โ€“- the string 12+34*56 8 characters but only 5 tokens

lexing โ€“- dividing a string of characters into a list of tokens

abstract syntax tree โ€“- a machine readable encoding of the abstract structure of a sentence

Example: An abstract syntax tree of 12+34*56 is Plus 12 (Times 34 56) and an abstract syntax tree of (12+34)*56 is Times (Plus 12 34) 56).

Exercise: Convert between the one dimensional notation of abstract syntax trees in the two dimensional notation.

parsing โ€“- creating abstract syntax tree from concrete syntax

Grammatical Framework

abstract grammar โ€“- definition of the abstract syntax trees of a language

concrete grammar โ€“- defines the string of an abstract syntax tree

Context Free Grammars

These slides provide a short overview that contains all the essential ideas we need.

A context-free grammar is a set of rules such as the following. (I skip the rules for Integers such as Integer -> '1', etc)

โ€‹โ€‹โ€‹โ€‹Exp -> Exp '+' Exp                                
โ€‹Exp -> Exp '*' Exp                            
โ€‹Exp -> Integer                                         
โ€‹Exp -> '(' Exp ')'      

Exp is the start symbol, terminal symbols are in single quotes.

Exercise 1: Find all derivations of 1+2*3. Write them out in linear form and as a tree.

Exercise 2: Find all derivations of (1+2)*3.

Now consider the following modification of the grammar.

โ€‹โ€‹โ€‹โ€‹Exp -> Exp '+' Exp1                                
โ€‹โ€‹โ€‹โ€‹Exp -> Exp1                                        
โ€‹Exp1 -> Exp1 '*' Exp2                              
โ€‹Exp1 -> Exp2                                       
โ€‹Exp2 -> Integer                                         
โ€‹Exp2 -> '(' Exp ')'                                

Exercise 3: Find all derivations of 1+2*3.

Homework

  • Using the two grammars above, how many derivations are there of 1+2+3+4?

  • A famous example of a context free grammar is the one that allows one to derive all strings of parentheses so that every ( matches exactly one ) to the right. For example

    โ€‹โ€‹โ€‹โ€‹โ€‹  ((()))
    โ€‹โ€‹โ€‹โ€‹โ€‹  ()()()
    โ€‹โ€‹โ€‹โ€‹โ€‹  ((()(()))(()))(())
    

    are legal strings in the language of balanced parentheses.

    Find a grammar for this language (two rules are enough).

  • For the grammar above is there a string that has two different parse trees?

  • Bonus question: Is it the case that every string with the following two properties is balanced? (i) the number of ( equals the number of ) and (ii) all prefixes have at least as many ( as )