The LALR(1) parser generated by BNFC/Happy

# The LALR(1) parser generated by BNFC/Happy ## Introduction In our introductions to [Shift-Reduce Parsing](https://hackmd.io/@alexhkurz/rk5PsF2EI) and also to [Shift/reduce and reduce/reduce conflicts](https://hackmd.io/@alexhkurz/SJx6T5R48), we learned how to obtain a `.info` file for a BNFC/Happy generated parser and we used it to explain how shift-reduce parsing works. Even if the grammar is not ambiguous and has no shift/reduce or reduce/reduce conflicts, while doing the shift-reduce parsing "by hand", we may still have to guess whether to shift or reduce and to backtrack in case we would run into a cul-de-sac. LR, or LALR(1), parsers remove this non-determinacy and make the parsing fully deterministic. This is much more efficient than an algorithm based on backtracking. In fact, LALR(1) parsing has a time-complexity linear in the size of the input program. Here we look in detail at the LALR(1) parser generated by BNFC/ Happy for the same grammar we used previously as our running example. EAdd. Exp ::= Exp "+" Exp1 ; EMul. Exp1 ::= Exp1 "*" Integer ; EInt. Exp1 ::= Integer ; coercions Exp 1 ; ## Required Reading - Section 3.7 and 3.8 of [Implementing Programming Languages](http://www.cse.chalmers.se/edu/year/2012/course/DAT150/lectures/plt-book.pdf). - The wikipedia article [LR parsing](https://en.wikipedia.org/wiki/LR_parser#Parse_table_for_the_example_grammar) is also useful. ## LALR(1) parsing with a Happy parser We use [ParCalc.info](https://github.com/alexhkurz/compiler-construction-2022/blob/master/Sources/Calculator/ParCalc.info) to parse `1+2*3`. The parser is a finite state machine (DFA) equipped with a stack. As we can see from `ParCalc.info` there are 8 rules and 15 states. *To read the following you also need to read simultaneously `ParCalc.info`. Put both next to each other. The following text is commentary to my reading of `ParCalc.info`.* A parser is a pushdown automaton, that is, a DFA that a also maintains a stack as external memory. A stack is a memory that can only be accessed via the operations push and pop. To keep track of where the pushdown automaton is in its computation we consider its `configuration` that is its stack together with the remaining input. We start at the configuration `(0 , 1+2*3)` saying that we start in `State 0` of `ParCalc.info`, which tells us to do L_integ shift, and enter state 3 that is, to push state 3 on the stack and continue in the configuration `(0 L_integ 3 , +2*3)` where the first component records that we went from state 0 to state 3, the latter being the current state. State 3 is named Integer -> L_integ . (rule 2) because the role of state 3 in the parser is to apply rule 2 of the grammar. To know what computation state 3 performs we look at '+' reduce using rule 2 because the look-ahead is `+`. (The look-ahead is the first element of the input `+2*3` in the current configuration). To reduce means to pop the current state from the stack and to continue in the configuration `(0 Integer , +2*3).` So we are back to state 0 which tells us Integer goto state 4 an since `Integer` is on the top of the stack we go to configuration `(0 Integer 4, +2*3)` State 4 is named Exp1 -> Integer . (rule 6) and tells us '+' reduce using rule 6 so we go back to State 0 and continue with configuration `(0 Exp1, +2*3)` which tells us Exp1 goto state 8 hence we go to `(0 Exp1 8, +2*3)` Exp -> Exp1 . (rule 4) Exp1 -> Exp1 . '*' Integer (rule 5) '+' reduce using rule 4 then `(0 Exp , +2*3)` which tells us Exp goto state 7 ## Summary To summarise, we have so far completed the following part of the parse |Configuration | action | |--- |--- | |`(0 , 1+2*3)` | shift, enter 3 | |`(0 L_integ 3 , +2*3)` | reduce 2 | |`(0 Integer , +2*3)` | goto 4 | |`(0 Integer 4, +2*3)` | reduce 6 | |`(0 Exp1 , +2*3)` | goto 8 | |`(0 Exp1 8 , +2*3)` | reduce 4| |`(0 Exp , +2*3)` | goto 7| In other words we constructed the part of the parse that consists of Exp | Exp1 | Integer | L_integ | 1 ## Homework - Complete the table above. Self check: You need 16 more steps and the last one is |Configuration | action | |--- |--- | |`(0 Exp 7 , <>)` | accept | - Extract the parse tree from the completed table.  ## Further reading From [Grune, Jacobs: Parsing Techniques](https://www.dickgrune.com/Books/PTAPG_1st_Edition/BookBody.pdf) (highly recommended): - p.25 explains notions such as sentence, sentential forms, production steps, production rules. - p. 73 on bottom-up parsing explains that the ***handle*** is the segment of the sentential form (= the relevant top of the stack minus the states) that needs to be reduced next by matching it against the right-hand side of the rule that was the one last applied to produce the sentential form. Finding the correct handle efficiently is one thing that makes parsing difficult. One may have to trade off expressivity against efficiency. The compromise that is widely considered to be the best is called LALR(1) and is used by parser generators such as Yacc, Bison, Happy, Cup. - p. 144-145 features the basic idea of shift-reduce parsers. - p. 184-185 gives a summary of deterministic bottom-up parsing. - p. 213 ff Section 9.6 contains an introduction to LALR(1) parsing with some interesting historical remarks. - [Conflict Tips](https://www.haskell.org/happy/doc/html/sec-conflict-tips.html#sec-lalr) from the Happy documentation.