# Chapter 12: Constituent Parsing

## 12.1 Introduction

## 12.2 Bottom-Up Parsing

Words &rArr; structure

### 12.2.1 The Shift–Reduce Algorithm

**Shift**: move one word from the word list onto the stack

**Reduce**: replace stacked elements according to a grammar rule

*e.g., the &rArr; det; det, noun &rArr; np; np, vp &rArr; s*

Repeat **Shift** and **Reduce** until s (the root) is on the stack.

![](https://i.imgur.com/gy7gLq9.png)

**(Test.1)**

### 12.2.2 Implementing Shift–Reduce Parsing in Prolog

### 12.2.3 Differences Between Bottom-Up and Top-Down Parsing

||pros|cons|
|---|---|---|
|Top-down|- more intuitive<br>- can handle null constituents like *det --> []*|- loops on left-recursive rules|
|Bottom-up|- can handle left-recursive rules|- requires additional code<br>- not as natural|

**(Test.2)**

## 12.3 Chart Parsing

Chart (tabular) parsing: a technique that keeps the parser from repeating the same analysis.

### 12.3.1 Backtracking and Efficiency

### 12.3.2 Structure of a Chart

The chart records local structures like the one below:

![](https://i.imgur.com/mJmwDeM.png)

nodes: &#9312;, &#9313;, ...

arcs: &rarr;, ...

### 12.3.3 The Active Chart

**dotted rule**: a rule with a dot (&bull;) inserted in its right-hand side, marking how far the rule has been parsed so far.
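A dotted rule can be represented as a small data structure. The sketch below is an assumed Python representation (the book's parsers are written in Prolog); the class and method names are illustrative, not from the book:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DottedRule:
    """A grammar rule plus a dot position in its right-hand side."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int  # number of right-hand-side symbols parsed so far

    def is_complete(self) -> bool:
        # the dot has reached the end of the right-hand side
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        # the symbol the parser still expects, or None if the rule is done
        return None if self.is_complete() else self.rhs[self.dot]

# np --> det . noun : the parser has seen a det and still expects a noun
r = DottedRule("np", ("det", "noun"), 1)
assert r.next_symbol() == "noun"
# np --> det noun . : completely parsed
assert DottedRule("np", ("det", "noun"), 2).is_complete()
```

Advancing the dot by one position each time a constituent is recognized is all the bookkeeping a chart parser needs per rule.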
**inactive arc**: a completely parsed rule

*e.g., np --> det noun &bull;*

**active arc**: a rule for which the parser still needs more words from the input

*e.g., np --> det &bull; noun, np --> &bull; det noun*

**(Test.3)**

How to build an active chart:

+ make an arc according to *Arcs*
+ write a rule on the arc according to *Rules*

![](https://i.imgur.com/20RQIcP.png)

**(Test.4)**

### 12.3.4 Modules of an Earley Parser

Input sentence: *The meal of the day*

**predictor**: selects all the rules that can extend active arcs

> start --> &bull; np
> np --> &bull; det noun
> np --> &bull; det adj noun
> np --> &bull; np pp

**scanner**: inserts the next word from the input into the chart in the form *pos --> word*, if the POS to the right of a dot matches the word

> det --> the &bull;

**completer**: uses the new constituents (inserted by the scanner) to advance the dot of the active arcs expecting them

> np --> det &bull; noun
> np --> det &bull; adj noun

The three modules run in a cycle: predictor → scanner → completer → predictor → ... The predictor proposes rules top-down, the scanner reads the next input word, and the completer advances the dots of the active arcs over completed constituents; the cycle repeats until the whole input has been consumed.

**(Test.5)**

### 12.3.5 The Earley Algorithm in Prolog

### 12.3.6 The Earley Parser to Handle Left-Recursive Rules and Empty Symbols

Before creating a new arc, the predictor predicate checks that it is not already present in the chart, so infinite execution is avoided.

> If the arcs below are already present:
> np --> &bull; np pp
> np --> &bull; det noun
> np --> &bull; det adj noun
> then np --> &bull; np pp predicts nothing more.

**(Test.6)**

## 12.4 Probabilistic Parsing of Context-Free Grammars

**probabilistic parsing**: a parser that tries the more frequent rules first

## 12.5 A Description of PCFGs

**Probabilistic Context-Free Grammar (PCFG)**: a context-free grammar in which each rule is assigned a probability

**(Test.7)**

The probability of sentence S having parse tree T is the product of the probabilities of the rules used to produce the tree:

$P(T,S)=\prod_{\text{rule}(i)\ \text{producing}\ T}P(\text{rule}(i))$

$P(S)=\sum_{T\ \text{producing}\ S}P(T,S)$

(the $P(S)$ formula may not be exact...)
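The two formulas above can be checked numerically. A minimal Python sketch with made-up rule probabilities (the grammar and the figures are illustrative, not the book's):

```python
from math import prod

# Hypothetical rule probabilities (illustrative only)
prob = {
    "s --> np vp": 1.0,
    "np --> det noun": 0.4,
    "vp --> verb np": 0.7,
    "det --> the": 0.6,
    "noun --> waiter": 0.1,
    "noun --> meal": 0.1,
    "verb --> brought": 0.2,
}

def tree_probability(rules_used):
    """P(T, S): product of the probabilities of the rules producing T."""
    return prod(prob[r] for r in rules_used)

# the rules used by one parse tree of "the waiter brought the meal"
t1 = ["s --> np vp",
      "np --> det noun", "det --> the", "noun --> waiter",
      "vp --> verb np", "verb --> brought",
      "np --> det noun", "det --> the", "noun --> meal"]
p_t1 = tree_probability(t1)

# P(S) sums P(T, S) over all parse trees of S; here the sentence
# is unambiguous, so the sum has a single term
p_s = sum(tree_probability(t) for t in [t1])
```

Note that a rule used twice in the tree (here *np --> det noun*) contributes its probability twice to the product.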
**(Test.8)**

### 12.5.1 The Bottom-Up Chart

In contrast to the Earley parser, the CYK algorithm stores completely parsed constituents in the chart.

**(Test.9)**

### 12.5.2 The Cocke–Younger–Kasami Algorithm in Prolog

### 12.5.3 Adding Probabilities to the CYK Parser

## 12.6 Evaluation of Constituent Parsers

### 12.6.1 Metrics

### 12.6.2 Performance of PCFG Parsing

## 12.7 Improving Probabilistic Context-Free Grammars

$T_1$ has a higher probability than $T_2$ and thus corresponds to the most likely parse tree. We split NP into two symbols: NP^S, corresponding to a subject, and NP^VP, corresponding to an object.

s --> np, vp.
np --> det, n.
vp --> v, np.

&dArr;

s --> np_s, vp_s.
np_s --> det, n.
np_vp --> det, n.
vp_s --> v, np_vp.

**(Test.10)**

## 12.8 Lexicalized PCFG: Charniak’s Parser

## 12.9 Further Reading

## Mini Test (2018)

1. Using the grammar below, parse "the waiter brought the meal" with a shift–reduce parser and write the successive states of the stack in the table format below. At each reduce step, also write the number of the rule used.
2. Summarize the advantages and disadvantages of top-down depth-first parsing and bottom-up depth-first parsing.
3. Regarding chart parsing, explain dotted rules, active arcs, and inactive arcs.
4. The table below shows an intermediate state of parsing "Bring the meal" with chart parsing. Draw it as a diagram.
5. Explain the roles of the predictor, scanner, and completer of the Earley parser (a top-down chart parser). In addition, explain how they cooperate to make the chart parser work.
6. The Earley parser proceeds top-down; explain why it nevertheless does not fall into an infinite loop on left-recursive rules.
7. Explain the difference between a context-free grammar and a probabilistic context-free grammar.
8. In a probabilistic context-free grammar, explain how to compute the generation probability of a tree and the generation probability of a sentence.
9. Explain the difference between the Earley parser and the CYK parser.
10. Explain why syntactic-tree disambiguation does not work well even with a probabilistic context-free grammar. What measures are there to improve its performance?

## Mini Test (2019)

1. What is a drawback of transforming left-recursive rules to adopt top-down parsing?
2. The figure on the right shows the order in which words and phrases are processed in top-down parsing. Draw a similar figure for bottom-up parsing.
3. Complete the following table representing how a shift–reduce parser parses the sentence "The waiter brought the meal."
by using the grammar rules (1) ~ (9). Indicate the rule number at each reduce action.
4. Summarize the advantages and disadvantages of top-down and bottom-up parsing.
5. Draw all inactive arcs after parsing "Bring the meal" as a vp using the grammar rules in Q3.
6. Explain the active arcs and inactive arcs of chart parsing.
7. Give the three components of the Earley parser (a top-down chart parser), and briefly describe the function of each component.
8. Why does the Earley parser not fall into an infinite loop, even though it is basically a top-down parser?
9. How can we calculate the probability of a parse tree generated by a PCFG? What about the probability of a sentence?
10. Calculate the constituent recall, constituent precision, and crossing brackets for the following example.
Gold: ((bring_v) ((the_det meal_n)_np (of_p (the_det day_n)_np)_pp)_np)
Out : (((bring_v) (the_det meal_n)_np)_vp (of_p (the_det day_n)_np)_pp)_vp
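Question 10 of the 2019 test uses the standard PARSEVAL-style metrics from Sect. 12.6.1. A minimal Python sketch of how they are computed over labeled bracket spans (the example spans below are made up for illustration, not the ones from the question):

```python
def parseval(gold, out):
    """Constituent recall, precision, and crossing brackets,
    over sets of (label, start, end) spans."""
    correct = gold & out
    recall = len(correct) / len(gold)
    precision = len(correct) / len(out)

    def crosses(a, b):
        # two spans cross when they overlap without nesting
        (s1, e1), (s2, e2) = a, b
        return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1

    # number of output brackets that cross at least one gold bracket
    crossing = sum(
        any(crosses((s, e), (gs, ge)) for _, gs, ge in gold)
        for _, s, e in out
    )
    return recall, precision, crossing

# hypothetical spans over token positions 0..5
gold = {("np", 1, 2), ("pp", 2, 5), ("np", 3, 5), ("np", 1, 5)}
out  = {("np", 1, 2), ("pp", 2, 5), ("np", 3, 5),
        ("vp", 0, 2), ("vp", 0, 5)}
r, p, c = parseval(gold, out)
# 3 of the 4 gold brackets are found (recall 0.75), 3 of the 5 output
# brackets are correct (precision 0.6), and ("vp", 0, 2) crosses
# ("np", 1, 5), giving 1 crossing bracket
```

Whether the label is compared (labeled vs. unlabeled bracketing) depends on the evaluation convention; the sketch compares labels.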