Compiler : Front End

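> https://people.cs.nctu.edu.tw/~ypyou/courses/Compiler-f18/
> [color=pink]

## Lexical Analyzer
> scanner

* Enters lexemes (e.g. identifiers) into the symbol table
* Token vs. lexeme
    * token: the token class, e.g. `KW` (keyword)
    * lexeme: the actual characters matched, e.g. `int`

### What is the difference between keyword and reserved word?
https://stackoverflow.com/questions/1078908/what-is-the-difference-between-keyword-and-reserved-word

### lex
> RE -> NFA -> DFA -> optimized DFA -> C program

> the longest-earliest match
> Take the longest match first; if two rules match the same length, the rule listed earlier wins.

```shell
lex template.l      # generates lex.yy.c
gcc lex.yy.c -lfl   # libfl supplies a default main() and yywrap()
./a.out
```

http://blog.yo-ga.space/2017/04/26/what-is-lex/

:::warning
`\s` doesn't work in lex patterns because it's a Perl extension; use an explicit character class such as `[ \t\n]` instead.
:::

### coding log
Most tokens (keywords, identifiers, integers, floats, scientific notation) can be handled with a regex alone, but strings need start states because escape characters have to be processed. (You could also post-process the matched string instead, but that makes less use of lex's features.)

<hr>

First, declare a start state with `%x string`. On a double quote, enter this state; on `\\[\\"]`, handle the escape; on any other character (`.`), append it to the buffer as usual.
Note that because matching is ==longest-earliest==, an escape sequence matches the longer `\\[\\"]` rule rather than `.`.
The main difference between `%x` and `%s`: rules without an angle-bracket prefix (such as `<string>`) are active in INITIAL and in every inclusive (`%s`) state, but not in an exclusive (`%x`) state. So inside a `%x` state only the rules explicitly prefixed with it apply, whereas a `%s` state also treats the unprefixed rules as global.

<hr>

Multi-line comments also get their own state. When a comment ends, the scanner normally returns to the previous state (INITIAL or pragma), but if the comment contained a newline it always returns to INITIAL, because of cases like this:
```c
#pragma source on /* lalala
lalala */ wow
```
If the scanner went back to the pragma state, `wow` would not be recognized as an identifier and would be reported as an error instead (a pragma may only be followed by `{whitespace}*{commentBegin}`).
The reason for returning to the previous state at all is that a pragma may be followed by several comments.

<hr>

For the pragma itself, use `strtok` to take the second token (`token|source|statistic`) and the last token (`on|off`), then set the corresponding flag that controls output.

## Syntax Analyzer
> parser

* sentence structure
* parse tree
    * top down
    * bottom up
* context-free grammar
    * BNF
    * start symbol
    * sentential form
    * sentence
* PDA (pushdown automaton)
    * e.g. balanced braces like `{{{{{{{}}}}}}}`, which need a stack to match

## Semantic Analyzer
* type checking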