【Compiler】 Ch1. Introduction

什麼是編譯器？

編譯器是一個program，可以將來源語言(source language)轉換為目標語言(target language)。
- 很多人會將compiler和IDE搞混，IDE全名為Integrated Development Environment，是工程師開發用的整合環境。IDE除了包含compiler之外，通常還有文字編輯、除錯工具等功能。
- 例如：VisualStudio 是一個IDE，gcc 是一個編譯器。
編譯器範例：
- C/C++、Java的編譯器
- 文字格式化 - Tex、LaTex
- Silicon compilers(硬體編譯器)
- Query interpreters - SQL compilers
- 組譯器
- 瀏覽器

Analysis-Synthesis Model

編譯的流程可以分成以下兩塊：
- Analysis，分析，又稱front end
  - 將原程式拆成不同的零件
  - 得到中間碼 intermediate representation (IR)
- Synthesis，合成，又稱back end
  - 利用IR組合出目標程式
  - (選做)最佳化
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
本課程礙於時間，只著重在Analysis的部分。

Analysis

有些工具、功能就是利用Analysis的功能：
- 結構編輯器
  - 有些程式編輯器會有自動完成code的輔助功能，例如打入for就幫你產生好後面的括號等，就是利用Analysis的功能。
- 美觀顯示
  - 分析程式來完成有高亮功能的文字編輯器。
- 靜態檢查
  - 在你還沒編譯/執行程式前，有些工具就可以幫你早出單字拼錯、型別錯誤等問題。
- 直譯器
  - 不產生出完整的目標程式，直接分析完來源程式就執行。

Analysis分為三個面向
- Linear Analysis (Lexical Analysis)
  - 掃描字元並將他們分組，形成tokens
- Hierarchical Analysis (Syntax Analysis)
  - 將tokens組合成語法
- Semantic Analysis
  - 辨識語法錯誤和型別問題
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →

Lexical Analysis

將字元們轉成tokens。
- Tokens是語法中的最小單位。
舉例，現在有一行程式為position = initial + rate * 60
- 經過scan，將其轉為<id, position> = <id, initial> + <id, rate> * 60
- 其中<id, position> = <id, initial> + <id, rate> * 60都各是一個token

Syntax Analysis

將tokens分組，形成為語法。
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
在階層式結構中，常用遞迴關係去表達一個程式，例如：
- 所有identifier都是一個expression
- 所有number都是一個expression
- 如果expression1和expression2都是expressions，則 expression1 op expression2和(expression1)也是expression

Semantic Analysis

檢查語意錯誤。
產生型別的資訊，也可以用於做隱含型別轉換等功能。
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

上圖為整個編譯的流程圖，由上而下得到目標程式。
- 中間每個環節都需與symbol table溝通
- 中間每個環節都需注意錯誤偵測

Symbol-Table Management

編譯過程中我們必須建立symbol table去存放各種identifier的資訊
- identifier的名稱
- identifier的種類(變數、常數、函式等)
- identifier的型別(int, float, string)
- identifier的可視範圍、生命週期
在lexical analysis偵測到identifier後放入table，在syntax analysis and semantic analysis填入屬性。

Error Detection and Reporting

以下為編譯時常見的錯誤。
Lexical: 出現看不懂、未定義的字元或是字元無法組成合法的token。
- 例如：int @123
Syntax: token無法組成合法語句。
- 例如：int if = 0
Semantic: 包含很多錯誤，最常見為型別錯誤。
- 例如：int a = b + c，但b屬於陣列

tags: `compiler` `note`