Complier DA SA BI

# Complier 期末 DA SA BI [TOC] ## L4: Syntax Analysis ### 3 Classes of Parsers for Grammars * Universal parser * 通用，但很慢 * Specific parser * Top-down * From top to bottom. * EX: LL * Bottom-up * From bottom to top. * 功能比 Universal parser 差，但夠強了 * EX: LR ### Error Handling * Lexcial errors * Syntactic errors * Semantic errors * Logical errors ### Error recovery strategies * Panic-mode: 持續丟棄輸入Token * Phrase-level recovery: 以預測的 good phrases 詞句替換 bad phrases, 可能會錯誤預測 * Error productions: 以額外文法檢測錯誤 * Global correction: 以最小的修改量復原 ### Context-Free Grammar: Formal Definition * Terminals: 組成字串的基本符號 * Nonterminals: 表達字串的集合 * Start symbol: 第一個出現的Nonterminal * Productions: Terminals & Nonterminals 組合成字串的規則 ### Derivation * nonterminals 產出 token 的過程 ![image](https://hackmd.io/_uploads/ByVKRuvXC.png) ### Sentential Form, Sentence, Language * Sentential Form: 在一個 grammar 中，由 start symbol 產出的字串，可包含Terminals和Nonterminals * Sentence: 沒有Nonterminals的Sentential Form * Language: Sentence的集合 > Example: > ``` > S -> AB > A -> aaA | ε > B -> Bb | ε > ``` > 在這種情況下，aaABb是一個Sentential Form，因為它可以從開始S推導出來。然而，aaaabb是一個Sentence，因為它只包含終結符。 ### Mission of a Parser * Parser使用production來產生目標"source code"的過程 ### Leftmost and Rightmost Derivations * If 𝛼⇒𝛽 is a step in which the leftmost nonterminal in 𝛼 is replaced, we write ![image](https://hackmd.io/_uploads/BkYlaOvmR.png) ![image](https://hackmd.io/_uploads/H1gH3dPQ0.png) ### Ambiguities Ambiguities: 有多種方式解釋同一文法 ### Disambiguating rules * 以"丟棄"不需要的語法來解決ambiguity的問題 > ![image](https://hackmd.io/_uploads/Hk5d-KD7C.png) > 如果一個 “if” 有一個 “else”，那麼沒有 “else” 的 “if” 不能作為其子節點。這就是所謂的 “avoid” 規則 ### Why Use Regexes to Define the Lexical Syntax of a Language? * Modularity: 將語言的語法結構分為 lexical 和 nonlexical 部分，便可將編譯器前端化為兩個可管理的部件 * Sufficiency: 語言的 lexical rules 太簡單以至於不需要CFG * Simplicity(for human): Regex更簡潔，更容易理解。 * Efficiency(for computer): 基於Regex的scanner比基於CFG的還快 ### 練習 ![image](https://hackmd.io/_uploads/H1qNfjd7R.png) 1. ::: spoiler ![image](https://hackmd.io/_uploads/BJx7ujdXR.png) ::: 2. ::: spoiler ![image](https://hackmd.io/_uploads/Ska4dsdmA.png) ::: 3. ::: spoiler ![image](https://hackmd.io/_uploads/rJjaYsd7A.png) ::: ### Eliminating Ambiguity * Idea: 先讓內部的、更靠近的if-then match else match ### Elimination of Left Recursion 假設我們有以下語法： S → Aa ∣ b A → Ac ∣ Sd ∣ ϵ 首先，消除間接左遞歸：替換 A → Sd： A → (Aa∣b)d ∣ ϵ 展開後： A → Aad ∣ bd ∣ ϵ 接著，對A應用直接左遞歸的消除方法：原始規則： A → Ac ∣ Aad ∣ bd ∣ ϵ 消除左遞歸後： A → bdA′ ∣ ϵA′ A′→ cA' | adA′ ∣ ϵ 並將原來第一行的規則加入: 第一行: S → Aa ∣ b 變成: S → Aa ∣ b A → bdA′ ∣ A′ A′→ cA' | adA′ ∣ ϵ **這樣就成功地消除了左遞歸。** ### Left Factoring * 使用時機：不曉得要用哪一個 production rule 展開 nonterminal A * 不要一開始就在不同的規則間做抉擇 * 把一樣的看完，看到不一樣的時候，再做決定 * 解決方法：把最長的相同部分的留下，不一樣的部分用另一個符號取代 > Example: > ![image](https://hackmd.io/_uploads/HySIPlGNA.png) ### Non-context-free Language Constructs p43, p44 => #不懂 ### Top-Down Parsing * left-most derivations * derivations 從 start symbol 開始 * 從文法中挑最左邊的 nonterminal 下手 > Example: > ![image](https://hackmd.io/_uploads/BJH2-bGNC.png) ### Different Parsers * Recursive-Descent Parsing * 從 root 一直往下做 parsing * 若發現 terminal 和欲產生的目標句子不符，做 backtrack，選其他條規則 * 缺點：會猜錯、會倒退 (backtrack) * 若是有left recursion的grammar問題，會造成Recursion-Desent Parsing有infinite loop，故須先解決left recursion * Predictive Parsing * 同樣是 recursive-descent parsing 但不會倒退 * 利用查看下一個字元，可確定要走哪條規則 ### First & Follow Set ![image](https://hackmd.io/_uploads/HJ0svDEN0.png) ![image](https://hackmd.io/_uploads/rJohvw44R.png) ### LL(1) ![image](https://hackmd.io/_uploads/BJ15tPNE0.png) #### Predictive parsing table 遍歷每一個產生式 A → α： * 對於每一個 a 屬於 FIRST(A)，將 A → α 加入表項 M[A, a]。 * 如果 ε 屬於 FIRST(α)，對於每一個 b 屬於 FOLLOW(A)，將 A → ε 加入表項 M[A, b]。 **如果存在衝突（即 M[A, a] 已經有一個產生式且當前產生式不同），則文法不是 LL(1)。**