# The LALR(1) parser generated by BNFC/Happy
## Introduction
In our introductions to [Shift-Reduce Parsing](https://hackmd.io/@alexhkurz/rk5PsF2EI) and also to [Shift/reduce and reduce/reduce conflicts](https://hackmd.io/@alexhkurz/SJx6T5R48), we learned how to obtain a `.info` file for a BNFC/Happy generated parser and we used it to explain how shift-reduce parsing works.
Even if the grammar is not ambiguous and has no shift/reduce or reduce/reduce conflicts, while doing the shift-reduce parsing "by hand", we may still have to guess whether to shift or reduce and to backtrack in case we would run into a cul-de-sac.
LR, or LALR(1), parsers remove this non-determinacy and make the parsing fully deterministic. This is much more efficient than an algorithm based on backtracking. In fact, LALR(1) parsing has a time-complexity linear in the size of the input program.
Here we look in detail at the LALR(1) parser generated by BNFC/ Happy for the same grammar we used previously as our running example.
EAdd. Exp ::= Exp "+" Exp1 ;
EMul. Exp1 ::= Exp1 "*" Integer ;
EInt. Exp1 ::= Integer ;
coercions Exp 1 ;
## Required Reading
- Section 3.7 and 3.8 of [Implementing Programming Languages](http://www.cse.chalmers.se/edu/year/2012/course/DAT150/lectures/plt-book.pdf).
- The wikipedia article [LR parsing](https://en.wikipedia.org/wiki/LR_parser#Parse_table_for_the_example_grammar) is also useful.
## LALR(1) parsing with a Happy parser
We use [ParCalc.info](https://github.com/alexhkurz/compiler-construction-2022/blob/master/Sources/Calculator/ParCalc.info) to parse `1+2*3`. The parser is a finite state machine (DFA) equipped with a stack. As we can see from `ParCalc.info` there are 8 rules and 15 states.
*To read the following you also need to read simultaneously `ParCalc.info`. Put both next to each other. The following text is commentary to my reading of `ParCalc.info`.*
A parser is a pushdown automaton, that is, a DFA that a also maintains a stack as external memory. A stack is a memory that can only be accessed via the operations push and pop.
To keep track of where the pushdown automaton is in its computation we consider its `configuration` that is its stack together with the remaining input. We start at the configuration
`(0 , 1+2*3)`
saying that we start in `State 0` of `ParCalc.info`, which tells us to do
L_integ shift, and enter state 3
that is, to push state 3 on the stack and continue in the configuration
`(0 L_integ 3 , +2*3)`
where the first component records that we went from state 0 to state 3, the latter being the current state. State 3 is named
Integer -> L_integ . (rule 2)
because the role of state 3 in the parser is to apply rule 2 of the grammar. To know what computation state 3 performs we look at
'+' reduce using rule 2
because the look-ahead is `+`. (The look-ahead is the first element of the input `+2*3` in the current configuration). To reduce means to pop the current state from the stack and to continue in the configuration
`(0 Integer , +2*3).`
So we are back to state 0 which tells us
Integer goto state 4
an since `Integer` is on the top of the stack we go to configuration
`(0 Integer 4, +2*3)`
State 4 is named
Exp1 -> Integer . (rule 6)
and tells us
'+' reduce using rule 6
so we go back to State 0 and continue with configuration
`(0 Exp1, +2*3)` which tells us
Exp1 goto state 8
hence we go to
`(0 Exp1 8, +2*3)`
Exp -> Exp1 . (rule 4)
Exp1 -> Exp1 . '*' Integer (rule 5)
'+' reduce using rule 4
then
`(0 Exp , +2*3)` which tells us
Exp goto state 7
## Summary
To summarise, we have so far completed the following part of the parse
|Configuration | action |
|--- |--- |
|`(0 , 1+2*3)` | shift, enter 3 |
|`(0 L_integ 3 , +2*3)` | reduce 2 |
|`(0 Integer , +2*3)` | goto 4 |
|`(0 Integer 4, +2*3)` | reduce 6 |
|`(0 Exp1 , +2*3)` | goto 8 |
|`(0 Exp1 8 , +2*3)` | reduce 4|
|`(0 Exp , +2*3)` | goto 7|
In other words we constructed the part of the parse that consists of
Exp
|
Exp1
|
Integer
|
L_integ
|
1
## Homework
- Complete the table above.
Self check: You need 16 more steps and the last one is
|Configuration | action |
|--- |--- |
|`(0 Exp 7 , <>)` | accept |
- Extract the parse tree from the completed table.
<!--
# LALR(1) parsing of `1+2*3`
|Configuration | action |
|--- |--- |
|`(0 , 1+2*3)` | shift, enter 3 |
|`(0 L_integ 3 , +2*3)` | reduce 2 |
|`(0 Integer , +2*3)` | goto 4 |
|`(0 Integer 4, +2*3).` | reduce 6 |
|`(0 Exp1 , +2*3)` | goto 8 |
|`(0 Exp1 8 , +2*3)` | reduce 4|
|`(0 Exp , +2*3)` | goto 7|
|`(0 Exp 7, +2*3)` | shift, enter 10|
|`(0 Exp 7 + 10 , 2*3)` | shift, enter 3|
|`(0 Exp 7 + 10 L_integ 3, *3)` | reduce 2|
|`(0 Exp 7 + 10 Integer, *3)` | goto 4|
|`(0 Exp 7 + 10 Integer 4, *3)` | reduce 6|
|`(0 Exp 7 + 10 Integer , *3)` | goto 13|
|`(0 Exp 7 + 10 Integer 13 , 3)` | shift, enter 9|
|`(0 Exp 7 + 10 Integer 13 * 9, 3)` | shift, enter 3|
|`(0 Exp 7 + 10 Integer 13 * 9 L_integ, <>)` | shift, enter 3|
|`(0 Exp 7 + 10 Integer 13 * 9 L_integ 3, <>)` | reduce 2|
|`(0 Exp 7 + 10 Integer 13 * 9 Integer, <>)` | goto 14|
|`(0 Exp 7 + 10 Integer 13 * 9 Integer 14, <>)` | reduce 5|
|`(0 Exp 7 + 10 Exp1 , <>)` | goto 13|
|`(0 Exp 7 + 10 Exp1 13 , <>)` | reduce 3|
|`(0 Exp , <>)` | goto 7 |
|`(0 Exp 7 , <>)` | accept |
**(0 Exp 7 , +2*3)**
'+' shift, and enter state 10
**(0 Exp 7 + 10, 2*3)**
L_integ shift, and enter state 3
(0 Exp 7 + 10 L_integ 3, *3)
Integer -> L_integ . (rule 2)
'*' reduce using rule 2
**(0 Exp 7 + 10 Integer , *3)**,
Integer goto state 4
**(0 Exp 7 + 10 Integer 4 , *3)**,
Exp1 -> Integer . (rule 6)
'*' reduce using rule 6
**(0 Exp 7 + 10 Exp1 , *3)**,
Exp1 goto state 13
**(0 Exp 7 + 10 Exp1 13, *3)**,
Exp -> Exp '+' Exp1 . (rule 3)
Exp1 -> Exp1 . '*' Integer (rule 5)
'*' shift, and enter state 9
**(0 Exp 7 + 10 Exp1 13 * 9, 3)**,
L_integ shift, and enter state 3
**(0 Exp 7 + 10 Exp1 13 * 9 L_integ 3, )**,
maybe I should write insert here the end-of-file character, which I had dropped so far
**(0 Exp 7 + 10 Exp1 13 * 9 L_integ 3, %eof)**
Integer -> L_integ . (rule 2)
%eof reduce using rule 2
**(0 Exp 7 + 10 Exp1 13 * 9 Integer , %eof)**
Exp1 -> Exp1 '*' . Integer (rule 5)
Integer goto state 14
**(0 Exp 7 + 10 Exp1 13 * 9 Integer 14 , %eof)**
Exp1 -> Exp1 '*' Integer . (rule 5)
%eof reduce using rule 5
**(0 Exp 7 + 10 Exp1 , %eof)**
Exp -> Exp '+' . Exp1 (rule 3)
Exp1 goto state 13
**(0 Exp 7 + 10 Exp1 13 , %eof)**
Exp -> Exp '+' Exp1 . (rule 3)
Exp1 -> Exp1 . '*' Integer (rule 5)
%eof reduce using rule 3
**(0 Exp , %eof)**
Exp goto state 7
**(0 Exp 7, %eof)**
%start_pExp -> Exp . (rule 0)
Exp -> Exp . '+' Exp1 (rule 3)
%eof accept
-->
## Further reading
From [Grune, Jacobs: Parsing Techniques](https://www.dickgrune.com/Books/PTAPG_1st_Edition/BookBody.pdf) (highly recommended):
- p.25 explains notions such as sentence, sentential forms, production steps, production rules.
- p. 73 on bottom-up parsing explains that the ***handle*** is the segment of the sentential form (= the relevant top of the stack minus the states) that needs to be reduced next by matching it against the right-hand side of the rule that was the one last applied to produce the sentential form. Finding the correct handle efficiently is one thing that makes parsing difficult. One may have to trade off expressivity against efficiency. The compromise that is widely considered to be the best is called LALR(1) and is used by parser generators such as Yacc, Bison, Happy, Cup.
- p. 144-145 features the basic idea of shift-reduce parsers.
- p. 184-185 gives a summary of deterministic bottom-up parsing.
- p. 213 ff Section 9.6 contains an introduction to LALR(1) parsing with some interesting historical remarks.
- [Conflict Tips](https://www.haskell.org/happy/doc/html/sec-conflict-tips.html#sec-lalr) from the Happy documentation.