# 編譯器 Compiler ###### tags: `courses meow` `CSIE-3` `Compiler` `2021 Fall` :::success :::spoiler click to open TOC [TOC] ::: ::: :::info ### :link: Links - [上課連結](https://meet.google.com/dzf-jskc-bzd) - [eeclass](https://ncueeclass.ncu.edu.tw/course/bulletin/9342) - [9 grid](http://ninegrids.csie.ncu.edu.tw/) - [online judge](http://domjudge.csie.ncu.edu.tw/) ::: ## Introduction ```graphviz digraph { label="Figure 1.1: A user's view of a compiler" rankdir = LR proLan [label="Programming Language"] com [label="Compiler"] macLan [label="Machine Language"] proLan -> com -> macLan } ``` ### Front-end and Back-end - Front-end - turn source code into **intermediate representation (IR)** - platform indeqndent (Mac, Linux, Windows...) - Back-end - generate the **target code (assembly code)** - platform dependent (Mac, Linux, Windows...) ```graphviz digraph { source [label="Source\ncode"] ir [label="Intermediate\nrepresentation"] target [label="target\ncode"] rankdir = LR source -> ir ir -> target source -> target [style=dashed] } ``` ### Types of machine code - Pure Machine Code - without operating system or library routines - Augmented Machine Code - *Augmented* by - `syscall` / `win32API`... - other - Virtual Machine Code - useing **virtual instruction set** ![](https://i.imgur.com/RyIDBsE.jpg) - e.g. - Linux `/usr/bin/java hello-world.jar` - Windows `C://programx86//java hello-wrold.jar` - java -> JVM interpret ### Bootstrapping We have C compiler on X86 and it was written by C language How can we generate a compiler on ARM? ![](https://i.imgur.com/K6WmOvr.png) > `cc.c` => c compiler source code > `cc-x86-modified.exe` => an executable c compiler which backend has modified (to produce ARM instruction set) > `cc-arm.exe` => an executable c compiler in ARM ```graphviz digraph { source [label="cc.c"] cc_x86_arm_exe [label="cc-x86-modified.exe"] cc_arm_exe [label="cc-arm.exe"] rankdir = LR source -> cc_x86_arm_exe -> cc_arm_exe } ``` ![](https://i.imgur.com/SjYjJYa.png) #### Our understanding of the graph Goal: we want a **L-compiler** written in **L**, but we do not have **L-compiler**(but we have K-compiler) -> So we use **K** to write a compiler for **L** -> first **L-compiler** get!! -> use **first L-compiler** to compile **L-compiler** written in **L** -> **L-compiler** written in **L** get!! -> discarded the **first L-compiler** > and ignore the VM part. idk what it means... :::info - 假設我今天有一個可以編譯J語言的編譯器J_compiler - 目標:要寫一個R語言,而且我希望這個R語言是自託管(有專門編譯R語言的編譯器且由R語言寫成)而非代託管(有專門編譯R語言的編譯器但並非由R語言寫成,可能由L語言寫成...) - 步驟: - 第一步:首先使用J語言寫出一個代託管的R語言編譯器 -> R_compiler.j - 第二步:將R_compiler.j放入J語言編譯器中得到執行檔 -> R_compiler.exe - 第三部:用R語言寫出一個R語言的自託管編譯器 -> R_compiler.r - 第四部:將R_compiler.r放入R_compiler.exe中得到用R語言寫成的R語言編譯器(完成) <!-- 如果我沒有理解錯的話應該大概是這樣? 像這樣用不同語言產生新的編譯環境似乎又叫Cross Compiler(?我沒有很確定哈哈 --> ::: ```graphviz digraph { node [shape="record"] L_com_K [label="L-compiler.K"] L_com_K_exe [label="L-compiler.K.exe"] L_com_K -> L_com_K_exe [label="by K-compiler"] L_com_L [label="L-compiler.L"] L_com_L_exe [label="L-compiler.L.exe"] L_com_L -> L_com_L_exe [label="by L-compiler.K.exe"] } ``` ![](https://i.imgur.com/qTFTS7u.png =x200) ### Compiler vs. Interpreter - compiler - 特點:Compiler會根據不同的系統架構,直接將程式碼轉換成電腦語言然後再轉換成執行檔。 - 優點: - 因為是轉換成電腦語言,所以電腦在處理電腦語言的時候速度會較快。 - 缺點: - 編譯過程中需要對應不同的系統環境進行轉換,因此跨平台能力低(0?)。 - Interpreter - 特點:直譯器會先將程式碼轉換成特定中間語言(OS independent),然後要執行時再逐行轉換並執行。 - 優點:由於事先轉換成中間語言,因此大部分系統都可以共用此中間語言,跨平台能力高。 - 缺點:電腦需要先轉換成中間語言,且執行時才一行一行轉譯,需要耗費較多時間。 ### Different types of interpreter - interpret one line and than execute one line - > LISP, BASIC - translate source code into some efficient **intermediate representation** and immediately execute this - > e.g. Perl, Python, MATLAB, and Ruby - compiler make a pre-compiled code -> interpreter run them (compiler is part of interpreter) - > e.g. UCSD, Pascal ### Organization of a Compiler ![](https://i.imgur.com/2Q7u8SE.png) Scanner, Parser -> frontend others -> backend ### Scanner analysis of the source program ![scanner explanation](https://i.imgur.com/QmKDNQP.png) ### Parser builds an abstract syntax tree (AST) ![](https://i.imgur.com/1hBWrdl.png) ### Type checker(semantic analyzer) - checks and decorates the **semantics** of each AST code - checks type **correctness**, **reachability** and **termination**, and **exception handling** > :warning: disclaimer: no guarantee on correctness[name=蛋蕉] :::info -> source code ``` int a = 1 + 2 - 3 * 4 - 5 ``` -> Parser ``` (assign (a) (sub (sub (add (1) (2)) (mul (3) (4))) (5)) ``` > Hey! this is a tree! -> typechecker ``` (assign (int:a) (int:sub (int:sub (int:add (int:1) (int:2)) (int:mul (int:3) (int:4))) (int:5))* ``` > Hey! I'm tree ::: :::warning reachability & termination reachability: x correctness: ```c= // test.c int a; a = 5.2; ``` ![](https://i.imgur.com/UT3WxUU.png) ::: ### Translator - If an AST node is semantically correct then ```graphviz digraph { source [label="AST"] IR_code [label="IR code"] rankdir = LR source -> IR_code [label="Tranlator"] } ``` ### Optimizer - Improves IR code’s performance - Reomve redundant code ### Code generator - Mapping IR code or AST into target code --- # chapter 2 ## The grammar for ac Assumed we have a simple language called ac :::info `i a` declare a integer variable a `f a` declare a float variable a `a = 1` assign 1 (int num) to variable a ::: ### grammar example > 1. a program can be split into declarations and statements > 2. declarations can be split into many single declaration > 3. a declaration can be split into `intdcl id` or `dloatdcl id` > 4. ... ``` 1 Prog -> Dcls Stmts $ 2 Dcls -> Dcl Dcls | \lambda 4. Dcl -> flaotdcl id | intdcl id 6. Stmts -> Stmt Stmts | \lambda 9. Stms -> id assign Val Expr | print id 10 Expr -> plus Val Expr | minus Val Expr 13. Val -> id | inum | fnum ``` - $\lambda$ (\lambda): empty string - Prog: Program - Dcls: declarations - dcl: single declaration - Stmts: Statements - stmt: single statement - Expr: Expression - Val: value - inum: integer numbers - fnum: floating numbers ### terminal - terminal - $\lambda$ - intdcl, floatdcl - inum, fnum - id - assign - print - ... - non-terminal - stmt - dcl - val - expr - ... ### decent recursive way > not all grammar can implemented by recursive way > many grammar cannot be written as recursive decent parser just write a function for every non-terminal grammar. ## AST ``` # an example of ac code f b i a a = 5 b = a + 3.2 p b ``` ```graphviz digraph { node [shape="record"] p [label="Program"] fb [label="{floatdcl | b}"] p -> fb p -> ia ia [label="{intdcl | a}"] ass1 [label="assign"] ida [label="{id | a}"] i5 [label="{inum | 5}"] p -> ass1 ass1 -> ida ass1 -> i5 ass2 [label="assign"] idb [label="{id | b}"] plus [label="plus"] ida2 [label="{id | a}"] f3d2 [label="{fnum | 3.2}"] p -> ass2 ass2 -> idb ass2 -> plus plus -> ida2 plus -> f3d2 pb [label="{print | b}"] p -> pb } ``` ### traversal AST #### create symbol table ```graphviz digraph { node [shape="record"] p [label="Program"] fb [label="{floatdcl | b}"] p -> fb [color="red" label="create symbol table"] p -> ia [color="red" label=""] symbol_b[label="register *b* as a float\n in symbol table" color="red"] fb -> symbol_b symbol_a[label="register *a* as a int\n in symbol table" color="red"] ia -> symbol_a ia [label="{intdcl | a}"] ass1 [label="assign"] ida [label="{id | a}"] i5 [label="{inum | 5}"] p -> ass1 ass1 -> ida ass1 -> i5 ass2 [label="assign"] idb [label="{id | b}"] plus [label="plus"] ida2 [label="{id | a}"] f3d2 [label="{fnum | 3.2}"] p -> ass2 ass2 -> idb ass2 -> plus plus -> ida2 plus -> f3d2 pb [label="{print | b}"] p -> pb } ``` #### type ```graphviz digraph { label="after type checking" node [shape="record"] p [label="Program"] fb [label="{floatdcl | b}"] p -> fb p -> ia ia [label="{intdcl | a}"] ass1 [label="{assign | integer}" color="red"] ida [label="{id | a}"] i5 [label="{inum | 5}"] p -> ass1 ass1 -> ida ass1 -> i5 ass2 [label="{assign | float}" color="red"] idb [label="{id | b}"] plus [label="{plus | float}" color="red"] int2float [label="{int2float | float}" color="red"] ida2 [label="{id | a}"] f3d2 [label="{fnum | 3.2}"] p -> ass2 ass2 -> idb ass2 -> plus plus -> int2float int2float -> ida2 plus -> f3d2 pb [label="{print | b}"] p -> pb } ``` ### code generation # Chapter 3 ## Regular expression [JavaScript 的 rex](https://developer.mozilla.org/zh-TW/docs/Web/JavaScript/Guide/Regular_Expressions) :::danger ### 注意 regular expression 有很多不同版本,每個版本的定義會有點不同 ::: ### some boring definition bla bla bla ### LEX https://www.computerhope.com/unix/ulex.htm --- # Chapter 4 > 我不是很確定我聽到了什麼 > 同感 ## The definition of a grammar xxx ## Properties of grammars - nullable - useless - if it can never procudct a terminal - ambiguous - more than one parse-tree ## First() & Follow() > 希望我理解的沒錯 ### First set :::info FIRST($\alpha$) > the set of terminals that begin strings derived from $\alpha$ ::: - 第一個不可被替換的terminal們 - EX: - S -> ABC - A -> a|$\lambda$ - B -> b|$\lambda$ - C -> c|$\lambda$ - First(S) = {a,b,c} - 因為A有可能返回 $\lambda$ ,所以當 A 為 $\lambda$ 時第一個就是 b,所以要把 B 考慮進去,同理C。 - 因為不知道怎麼打Landa所以打^當作NULL