---
title: Adv Compilers(10/13)-W5#5
tags: 111-1, AdvCompiler
---
[TOC]
<style>
.blue {
color: blue;
}
.red {
color: red;
}
.green {
color: green;
}
.purple {
color: purple;
}
</style>
###### reference: [Compiler Design - Syntax Analysis_TutorialPoint](https://www.tutorialspoint.com/compiler_design/compiler_design_syntax_analysis.htm)

##### 10/13(W5)
# Syntax
- 口語稱 parsing
- the second phase of a compiler
- Regular expressions cannot check balancing tokens, such as parenthesis.
- Regular expressions cannot check balancing tokens, such as parenthesis.
- **every Regular Grammar is also context-free**
$\therefore$ CFG is a helpful tool in describing the syntax of programming languages.
## Context-Free Grammar(CFG)
- A context-free grammar has four components:
1. **(V)**: A set of non-terminals.
- Sets of strings help define the language generated by the grammar.
2. **(Σ)**: terminal symbols, A set of tokens.
- The basic symbols from which strings are formed.
3. **(P)**: A set of productions.
- Terminals and non-terminals can be combined to form strings.
- Each production consists of a **non-terminal**, called the *left side of the production*, **an arrow**, and **a sequence of tokens and/or non- terminals**, called the *right side of the production*.
4. **(S)**: Start symbol.
- One of the non-terminals is designated for where the production begins.
### v.s. Context Sensitive
- 要回去看 context 的前後 token 才能決定要不要使用某條 grammer。
## Syntax Analyzers(=parser)
- Take the input from a lexical analyzer in the form of token streams.
- Accomplishes two tasks
1. parsing the code, looking for errors
2. generating a parse tree as the output of the phase.
## Derivation(=a sequence of production rules)
- In order to **get the input string** during parsing, we take two decisions for some sentential form of input:
1. Deciding the **non-terminal** which is to be replaced.
- Options to determine non-terminal which is to be replaced.
(1) Left-most Derivation
input string 中有兩個以上的 non-terminal 時,選**最左方**(符合 production rules)的 non-terminal 推導為 token strings。
(2) Right-most Derivation
input string 中有兩個以上的 non-terminal 時,選**最右方**(符合 production rules)的 non-terminal 推導為 token strings。
2. Deciding the **production rule,** by which, the non-terminal will be replaced.
## Parse tree
- All leaf nodes are terminals.
- All interior nodes are non-terminals.
- In-order(照順序) traversal(拜/巡訪) gives **original input string**.
按照文法推導,將 leaf nodes of tree 從左至右依序讀出,即 original input string。
### Ambiguous
- Have **more than one parse tree** (left or right derivation) for at least one string.
$\rightarrow$ 當一個文法產生不只一種 parse tree 時,此文法為 Ambiguous。
- No method can detect and remove ambiguity automatically
- However, it can be **removed** by either ***re-writing** the whole grammar without ambiguity*, or by *setting and following **associativity** and **precedence** constraints(限制)*.
### _associativity(結合性)
- If the operation is **left-associative**, then the operand will *be taken by the left operator* ,or if the operation is **right-associative**, the *right operator will take the operand*.
- <span class="red">**相同的 operator 同時出現時,哪個先算。**</span>
- Operations such as **Addition**, **Multiplication**, **Subtraction**, and **Division** are **left associative**.
- example
(id op id) op id $\rightarrow$ (id + id) + id
- Operations like **Exponentiation** are **right associative**.
- example
id op (id op id) $\rightarrow$ id ^ (id ^ id)
### _precedence
- <span class="red">**不同 operator 同時出現,哪個 operator 先進行運算。**</span>
(定義時的優先度)
###### 如何寫一個程式,讓程式做 parsing
## Parsing way

### Top-down Parsing
### _Predictive Parsing
- a recursive descent parser **with no backtracking or backup**.
- the choice of the rule to be expanded is made upon the next terminal symbol.
- PROPERTIES:
- The most used hand-writing algorithm
- Recursive descent
- One function for each non-terminal(每個 non-terminal 寫一個 function)
- One clause for each production(每個 production 用一個 case-switch處理)
- 若符合條件的文法,就可以 implement it;有任何狀況不符合規定(能同時符合兩個選項/不能給單一選項),就不能稱 predictive parsing
###### ref:[Predictive Parser in Compiler Design](https://www.geeksforgeeks.org/predictive-parser-in-compiler-design/)
### Bottom-up Parsing
## Eliminating Left Recursion
- Left Recursion
- If it has a non-terminal A such that there is a derivation
A → A $\alpha$ for some string $\alpha$.
- arrow 右邊的第一個 token 就是 arrow 左方的 non-terminal 時,會無限呼叫自己。
- **Eliminating** it.
- technique 1:

- technique 2:

- (在 left derivation 將 left recursion(左遞迴)改為 right recursion(右遞迴))

## Left Factoring
- factoring: 把同類的字提出來
- 無法從 arrow 右邊的第一個 token 判別出要使用哪個 production 時的改寫方式。
## LL(1)
- <span class="red">**No** *ambiguous* or *left-recursive grammar* can be LL(1).</span>