C.A.Lee
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    --- robots: index, follow tags: NCTU, CS, 共筆 description: 交大資工課程學習筆記 lang: zh-tw dir: ltr breaks: true disqus: calee GA: UA-100433652-1 --- 編譯器設計概論 -- 楊武 ====== [TOC] ## Syllabus - http://people.cs.nctu.edu.tw/~wuuyang/homepage/Lecture/lecture.compiler.html - 書: [C.N. Fischer, R.K. Cytron, and R.A.Y. J. LeBlanc, Jr., Crafting a Compiler, Addison-Wesley, 2010.](https://www.google.com.tw/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjsutvMjKHWAhUCopQKHZdyCw8QFggmMAA&url=http%3A%2F%2Fbank.engzenon.com%2Fdownload%2F560e7301-482c-43fd-9f80-16a9c0feb99b%2FCrafting_a_Compiler_by_Fischer_Cytron_and_LeBlanc.pdf&usg=AFQjCNGF2HT8dpEPo4SZcy1srmIaOta82A) - 成績 - 期中期末 30-40% - Project(分四次 lab) 20-40% - Scanner - Parser - Abstract syntax trees, Symbol table - Code generator ``` program abc; var i, j : integer; begin i := 1 j := i + 2; end; ``` ## Course ### Ch1 Introduction - flow: C code -C compiler-> Assembly -Assembler-> Binary - Course Structure 1. Introduction 2. A Simple Compiler 3. Scanning - Theory and Practice 4. Grammars and Parsing 5. Top-Down Parsing 6. Bottom-Up Parsing 7. Syntax-Directed Translation 8. Additional reading: Double dispatch 9. Symbol Tables and Declaration Processing 10. Additional reading: Java exception 11. Semantic Analysis 12. Intermediate Representations 13. Code Generation for a Virtual Machine 14. Additional reading: Java bytecode generation 15. Run-Time Support 16. Target Code Generation (Register Allocation) 17. Additional reading: BURS instruction selection 18. Program Optimization (Control Flow Analysis, Data Flow Analysis, Static Single Assignment Form) - Overview - ![](https://i.imgur.com/yYv3anl.png =350x) - source code is portable: compiler 讓程式不需要 base on 硬體(CPU...) - 利用不同的 compiler 編譯到不同的硬體上 - 高階語言比較容易懂 - compiler 使的程式設計更快速 - What Do Compilers Do? - 產生 machine code - pure machine code - augmented machine code - hardware + OS + language-specific support - ex. I/O, math functions, storage allocation - virtual machine code - 需要 VM - ex. JVM, LLVM, Pascal P-code, MISP U-code - Format - assembly code format QQ - 對小機器很方便 ( CPU mem 很小之類的) - relocatble binary format - 產生 machine code,但是有些地方填空白,為了可以把動態的位址 or 其他東西填入 - 可以把別人的程式 link 進來 - load-and-go - compile 產生的 code 只放在 mem 裡,不會存入硬體 - ex. 測試程式時 - Interpreters(直譯器) - 讀一行,作一行 - <--> compile 是把程式全部讀入,全部編譯,全部產生 machine code - 可是現在的 compile 與 interpreters 會混在一起,都可能讀一行作一行,都可能一個 session 一起編譯 - Ex: python interpreter - 好處: - 執行時的可修改性 - dynamically-typed - 更好除錯 (每一行都是斷點) - 因機器而異 (machine independence) - 有些語言在發展程式時,用 interpreter,在 production 時用 compiler - JIT compiler / Runtime compiler - Just In Time Compiler QQ - program 經過 compile 之後變成中介碼 ( IR ),中介碼一邊執行一邊 compile (????????????) - ex. JS - Syntax and Semantics - 根據 programming language menu 設計 - Syntax - context-free syntax - context-sensitive syntax - Semantics(語意) - ex. 變數應該要被 defined,type 正確 - ex. attribute grammars `E ::= E + T` - run-time semantics - operational semantics - 非常麻煩 (?) - axiomatic semantics - 用來作證明 - denotational semantics - 比較像 high level language - ex. NYU’s Ada-Ed system - Formal semantics - 目前的自然語言容易被誤解,不同人的解讀會不一樣 - 有了統一格式後,大家都要照著這個格式去解讀 - Organization of a Compiler ![](https://i.imgur.com/5SucgcK.png =350x) 1. analysis of the source code (分析) 2. synthesis of the target code (組合) - 分段工作 ![](https://i.imgur.com/L29277w.png =150x) - compiler 內部 flow ![](https://i.imgur.com/Hm0qyrn.png =250x) - [symbol table](https://zh.wikipedia.org/wiki/%E7%AC%A6%E5%8F%B7%E8%A1%A8) - scanner: - 把空白, comment, ... 的東西踢掉 - 把 code 分組 - [#PRAGMA](http://ccckmit.wikidot.com/cp:pragma) (給 compiler 一個指引) - 現在多用 regex 作規範 (lex) - ![](https://i.imgur.com/8xQ8SJc.png =100x) - parser: - 透過 menu 的規則,把 token 合併 or 分析成不同的結構(while, for, if...) - 根據 grammar 將 code 改成 token - yacc/llgen: 將 grammar 改成 parser 的工具 - sematic routine: 檢查 menu 規定,產生 IR - 可能符合 grammar 的樣子,可是結構不對 - ex. 重複宣告, 繼承關係 - 檢查完後,產生 IR - 整理成 attribute grammars(AG) - optimizer - 有時候 optimize 會在 linker 才作 (看到大架構後才作優化) - code genterator - 產生 machine code - 每種 CPU 的 register 都不一樣 => 作法不太一樣 - 經過 optimizer 後,就比較難 debug 惹 - 種類 - one-pass compiler - 一口氣就做出來了(沒有 IR) - retargetable compiler - 把可重用的抽出來,下次有機會時可以把他拿出來重用 - e.g. gcc - Symbol tables - 存一些資訊 - 用什麼資料結構也有影響 - ex. ![](https://i.imgur.com/zDukKPe.png =200x) - Compiler Writing Tool - Programming Language & Compiler - Compiler 是一個被動的東西,需要根據 language 的 menu 來作 - BUT compiler 也會影響到程式語言 -> language 的設計要讓 compiler 好寫好讀,不然難維護,之後就會慢慢式微了 - Computer architecture & Compiler - 原來 compiler compile 出來的東西要 architecture instructure 有才能用 - CPU 有很多 instructure 是 compiler 不會用的 -> 拔掉 ya - 設計 instructure 需要問 compiler 會不會用 & 要教別人用 - Compiler 種類 - diagnostic - 簡單 - 穩定 - 重點在不能出錯 => 可能出錯的都抓出來 - optimizing - 最佳化 - 盡量做出最快的(compile 出來後的程式要跑得愈快愈好) - 一直在裡面改,測試會不會比較好 - retargetable - compile 完的程式可移到其他平台上做 - separate - 將檔案切開來 compile,在用 linker 連起來 - 如果之後有改 code,只要將改到的部分被切出來的地方重新 compile 就可以了,不用全部重 compile - close - 在 A 平台上,幫忙 B 平台 compile - ex. OS, 嵌入式 - integrated programming environments - parallelizing and distributed - virtual machines and cloud computing ![](https://i.imgur.com/9tKK1x4.png) - bootstrapping and cross-compiler - 用自己的語言,產生自己的 compiler - ![](https://i.imgur.com/1vYKFOI.png =404x) - ![](https://i.imgur.com/4iZV6cu.png =404x) ## Ch2 A Simple Compiler - A simple compiler 1. An informal definition of the ac language 2. Formal definition of ac 3. Phases of a simple compiler 4. Scanning 5. Parsing 6. Abstract syntax tree 7. Semantic analysis 8. Code generation - 將 ac 語言轉換為 dc 語言 - ac: - the adding calculator language - types: only integers and floats. - keywords: f (float), i (integer), and p (print), if, begin, while repeat, ... - variables: 23 個 a~z - Formal Define `ac` - Token - Token 是 characters 的序列 - Grammar - 用 context-free grammar 定義出 syntax - ex. `:=` 是 Pascal 的 assign token - 在 := 左邊的是 nonterminal => 代表可以繼續解析 - 在 := 右邊的 => 有可能是 nonterminal,也有可能是 terminal - terminal 的東西其實就是 scanner 傳回來的 token - grammar ex. - ![](https://i.imgur.com/ceFyy6M.png =450x) - derivation - 解析 - ex. ![](https://i.imgur.com/QY5tuOd.png =550x) - derivation 也可以化成 tree - ![](https://i.imgur.com/t9dhLpc.png =350x) - parsing 就是到過來的 derivation - syntax tree - complex syntax tree - 比較複雜 - - abstract syntax tree - compiler 都在這個上面作 => 比較簡單 - ![](https://i.imgur.com/m8aRphG.png =660x) - Scanning - 把 token 抽取出來 - 把 token 們切開 - 有些 token 的解析不只要讀一個 char,可能要讀更多 char - i.g. ++ - 同一個字串,可能有多種拆法 - ex. 12345 or 123, 45 or 12, 34, 5 or ... - 一般來說,都是找合法最長的 - Parsing - 把每一個 parsing procedure 寫出來 - 幾乎不用大腦 ˊˇˋ - Syntactic analysis - 像是 int + char => parser 檢查不到 - private, public - 需要寫 syntactic routing 作這些檢查 - ex. Java: x.y.z => CFG 無法區分這個東西 - 問題 1. syntactic analysis 不夠強大到足以檢查所有語言 => 有時候是程式 run 起來後在自己檢查 2. abstract symbal tree 好像很麻煩 => 如果不做了化,在軟體工程來看,開發起來會更麻煩 - Semantic analysis - Symbol tables - 動態維護 - 把 symbol 存起來,至於要怎麼存,就是實作上的課題了 - ex. abstract symbol table tree - Type checking - 檢查 abstract symbol tree 上的每一個 node - 如果這個 operation 需要檢查 type,則會作特別檢查 ex. int + string (X) - 如果有自動轉型了話,其實只是自動幫你加上一個轉型的 function call - Code generation - mathine code - 要先講是哪一顆 CPU - 舉 JVM 為例: - 好吧,我也看不懂 - 怪怪的 instruction... ## Ch3 Scanner - Overview - function input 其實就是一個 scanner - Scanner 是將 string 分成一塊一塊的 token - Scanner 需要了解程式語言設計本身的細節 - Scanner 的規則要一樣 - 用 regex 來定規則相當不錯 - 用 regex 就不用寫 scanner 了,這套規則自己就是程式本身了 - ya 真好 - Regex - xxx - Finite automata - Practical Considerations - Reserved Words - Use regular expressions => 非常複雜 - 依序檢查 keyword (tree 比對) - Symbol Table - string space - NFA - DFA - scanner 1. NFA -> DFA 2. miniman DFA - lenth of RE - 把括號去掉後的剩下字串長度 ## Ch4 Grammar - Context-free grammars - `$` 代表結束 - 所有 grammars 產生的 sentence 所形成的集合 => language - 名詞 - sentence: 全部都是 terminal - sentential form: 沒有做完的 sentence(生成 sentence 中,還留有某些 non-terminal 還沒作) - phrase - simple phrase - handle:最左邊的 simple phrase - parser - leftmost / rightmost - 從最左邊/右邊開始做的 parser - grammars - ambiguous grammars - 無法產生唯一的 pase tree - Σ∗ is the set of all finite-length strings composed of terminals - Context-free grammars - Context-sensitive grammars - Chomsky’s hierarchy - type 0 ~ 4 - type 0 grammars: no restrictions, α → β. - type 1 grammars: context-sensitive grammars, αAβ → αδβ. (parse 會很久) - type 2 grammars: context-free grammars, A → α. (CSG↑ 的特例) - type 3 grammars: regular grammars, A → aB. - useless nonterminals - 存在會造成永遠無法終止的 nonterminal symbol - ex. B→bB - ambiguous grammars - 可能會造成不只一個 paser tree - ex. `E→E−E` && `E → ID` - 無法判斷是否是 ambiguous - => 可是可以符合某些條件必定不是 ambigous - grammar equality - 是否定義同一個 language - 無解 - Extended BNF ⇒ BNF - 多了 [] 與 {},讓寫 grammar 比較方便 - Parsers and Recognizers - Recognizer: 判斷 input 是否符合 grammar - Parsers: 除了判斷 input 之外,還要畫出 parsing tree - top-down, predictive, pre-order, lm, LL - bottom-up, post-order, rm(reverse), LR - 不管是 LL or LR parser,都會往前多看一個 char (也只要多看一個) - grammar 分析 - Nullable nonterminals - 哪些 nonterminal 可以變成 $\lambda$ - 作法: - 產生 termal 的 nonterminal 是 nullable - 將這個 nonterminal 加入 nullable - 重複確認 nonterminal 是否是 nullable - 若是,加入 nullable - 重複檢查動作 - first sets and follow sets - ![](https://i.imgur.com/8kfjSbm.png =380x) - E => P(E) => f(E): 所以 f 會在 E 的 first set 裡 - terminal 的 first set 是自己 - 找 first: (a = X0X1X2...) - first(a) = first(X0) - 如果 first(X0) 可以是 $\lambda$ - first(a) = first(X0) + first(X1) - 如果 a 沒有直接變成 $\lambda$,而是 first(Xn) 有 $\lambda$,這些後來被加進來的 $\lambda$ 都不要算 - 以此類推... - follow set 的算法 - follow set 代表這個 state 結束後,下一個可能會遇到的 symbol - 聽不太懂 QQ ## Ch5 LL Parser (Top-Down Parsing) - top-down parsers - Recursive descent parser - Table-driven LL parser - top-down and depth-first, leftmost, 1-token lookahead - parser generator: 幫助生成 parser 的工具 - grammar => parser generator => parser - LL(k) Grammar - 應該走哪一條 - try-and-error - lookahead - 當有多條路時 - 先建一個 predict table,主要是用來紀錄這個 state 下一個會遇到的 symbol 可能是哪些 - 所以需要統計 next state 的 first symbol - 在用 lookahead 對照這個 predict table - predict table - ![](https://i.imgur.com/VN8LLJS.png =100x) - ![](https://i.imgur.com/5bO0xWP.png =350x) - 如果 predict table 找到的 first symbol 不只一個 => 不是 LL(1) grammar,lookahead 只找一個不夠 - LL(k) 代表往前看 k 個 - Recursive descent LL(1) Parser - Recursive 作 - Table Driven LL(1) Parser - Loop + Stack 作 - 建一個二維 table,其中一個維度是 symbol,另一個是 state - ![](https://i.imgur.com/tztW7Y4.png =350x) (數字代表第幾條 rule) - 拿到 lookahead 之後就可以直接查表 - 如果 parse table 理某一格有兩個或更多個 rule => 出現 conflict => 不能用 LL(1) parser 作 - 所謂 confilct 就是在一個格子理,吃到兩個 rule,這代表查表來到這裡時,往前只看一個 char ,會有兩條路可以走,這時候需要繼續往下看,才能確定要選哪一條路 => LL(n) 才做得到 - 好處 - generic driver:基本上只要是 LL(1) 就可以用,只有在 grammar 改了時候,才要把 table 修改 - 更快 (non-recursive) - implement - ![](https://i.imgur.com/Yd2wmr7.png =160x) - ![](https://i.imgur.com/LRlAZgg.png =250x) - stack 是用來存在這個 rule 下,我們會先走第一個 state,可是之後的 state 在處理完這個 state 之後還是要回來處理,stack 就是用來存後面的 state => 避免使用 recursive - toupble shutting - Common Prefix - *可能*可以用 Common Prefix 處理掉,讓 LL(n) -> LL(1) - 在這樣的條件下: ![](https://i.imgur.com/fNIX5TM.png =350x) - 有些 grammar 可能真的不能處理,真的是 LL(N) - 插入新的 state 處理掉 conflict 的狀況 - ![](https://i.imgur.com/AI30V4Q.png =150x) - 觀察 S 會有 confilct - 用插入 U 來解決 - Left Recursion - 出現 A -> Aa 的狀況 - 因為會出現 A ⇒ Aα ⇒ Aαα ⇒ . . . 的狀況 - 一樣用插入新 state 解決 - ![](https://i.imgur.com/d8xhOXV.png =160x) - Error Recovery - 碰到 error 就停下來 => 不 friendly - 希望盡可能找到多一點 error - 遇到 parse error 時,刪掉目前的 symbol,看看是否可以做下去 ### Buttom-Up Parser - LR(k) 通常 k == 1 - ![](https://i.imgur.com/n8cC2zK.png =250x) `_(┐「ε:)_` - ![](https://i.imgur.com/7ByrFRH.png =250x) - 三問題 - 何時作 shift? 何時 reduce? - handle 應該 reduce 成哪一種 nonterminal? - handle 有多長?

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully