--- title: Adv Compilers(10/13)-W5#5 tags: 111-1, AdvCompiler --- [TOC] <style> .blue { color: blue; } .red { color: red; } .green { color: green; } .purple { color: purple; } </style> ###### reference: [Compiler Design - Syntax Analysis_TutorialPoint](https://www.tutorialspoint.com/compiler_design/compiler_design_syntax_analysis.htm) ![](https://i.imgur.com/WDsC0Y5.png) ##### 10/13(W5) # Syntax - 口語稱 parsing - the second phase of a compiler - Regular expressions cannot check balancing tokens, such as parenthesis. - Regular expressions cannot check balancing tokens, such as parenthesis. - **every Regular Grammar is also context-free**![](https://i.imgur.com/7rgBbC1.png) $\therefore$ CFG is a helpful tool in describing the syntax of programming languages. ## Context-Free Grammar(CFG) - A context-free grammar has four components: 1. **(V)**: A set of non-terminals. - Sets of strings help define the language generated by the grammar. 2. **(Σ)**: terminal symbols, A set of tokens. - The basic symbols from which strings are formed. 3. **(P)**: A set of productions. - Terminals and non-terminals can be combined to form strings. - Each production consists of a **non-terminal**, called the *left side of the production*, **an arrow**, and **a sequence of tokens and/or non- terminals**, called the *right side of the production*. 4. **(S)**: Start symbol. - One of the non-terminals is designated for where the production begins. ### v.s. Context Sensitive - 要回去看 context 的前後 token 才能決定要不要使用某條 grammer。 ## Syntax Analyzers(=parser) - Take the input from a lexical analyzer in the form of token streams. - Accomplishes two tasks 1. parsing the code, looking for errors 2. generating a parse tree as the output of the phase. ## Derivation(=a sequence of production rules) - In order to **get the input string** during parsing, we take two decisions for some sentential form of input: 1. Deciding the **non-terminal** which is to be replaced. - Options to determine non-terminal which is to be replaced. (1) Left-most Derivation input string 中有兩個以上的 non-terminal 時,選**最左方**(符合 production rules)的 non-terminal 推導為 token strings。 (2) Right-most Derivation input string 中有兩個以上的 non-terminal 時,選**最右方**(符合 production rules)的 non-terminal 推導為 token strings。 2. Deciding the **production rule,** by which, the non-terminal will be replaced. ## Parse tree - All leaf nodes are terminals. - All interior nodes are non-terminals. - In-order(照順序) traversal(拜/巡訪) gives **original input string**. 按照文法推導,將 leaf nodes of tree 從左至右依序讀出,即 original input string。 ### Ambiguous - Have **more than one parse tree** (left or right derivation) for at least one string. $\rightarrow$ 當一個文法產生不只一種 parse tree 時,此文法為 Ambiguous。 - No method can detect and remove ambiguity automatically - However, it can be **removed** by either ***re-writing** the whole grammar without ambiguity*, or by *setting and following **associativity** and **precedence** constraints(限制)*. ### _associativity(結合性) - If the operation is **left-associative**, then the operand will *be taken by the left operator* ,or if the operation is **right-associative**, the *right operator will take the operand*. - <span class="red">**相同的 operator 同時出現時,哪個先算。**</span> - Operations such as **Addition**, **Multiplication**, **Subtraction**, and **Division** are **left associative**. - example (id op id) op id $\rightarrow$ (id + id) + id - Operations like **Exponentiation** are **right associative**. - example id op (id op id) $\rightarrow$ id ^ (id ^ id) ### _precedence - <span class="red">**不同 operator 同時出現,哪個 operator 先進行運算。**</span> (定義時的優先度) ###### 如何寫一個程式,讓程式做 parsing ## Parsing way ![](https://i.imgur.com/NjWn7yy.png) ### Top-down Parsing ### _Predictive Parsing - a recursive descent parser **with no backtracking or backup**. - the choice of the rule to be expanded is made upon the next terminal symbol. - PROPERTIES: - The most used hand-writing algorithm - Recursive descent - One function for each non-terminal(每個 non-terminal 寫一個 function) - One clause for each production(每個 production 用一個 case-switch處理) - 若符合條件的文法,就可以 implement it;有任何狀況不符合規定(能同時符合兩個選項/不能給單一選項),就不能稱 predictive parsing ###### ref:[Predictive Parser in Compiler Design](https://www.geeksforgeeks.org/predictive-parser-in-compiler-design/) ### Bottom-up Parsing ## Eliminating Left Recursion - Left Recursion - If it has a non-terminal A such that there is a derivation A → A $\alpha$ for some string $\alpha$. - arrow 右邊的第一個 token 就是 arrow 左方的 non-terminal 時,會無限呼叫自己。 - **Eliminating** it. - technique 1: ![](https://i.imgur.com/EreWulq.png) - technique 2: ![](https://i.imgur.com/f8DlqMy.png) - (在 left derivation 將 left recursion(左遞迴)改為 right recursion(右遞迴)) ![](https://i.imgur.com/FwIoWKp.png) ## Left Factoring - factoring: 把同類的字提出來 - 無法從 arrow 右邊的第一個 token 判別出要使用哪個 production 時的改寫方式。 ## LL(1) - <span class="red">**No** *ambiguous* or *left-recursive grammar* can be LL(1).</span>