Compiler HW - HackMD

# Compiler HW ###### tags: `compiler` # Useful resource * 康乃爾大學 CS : Introduction of Compiler * [課程](http://www.cs.cornell.edu/courses/cs412/2007sp/) * [講義 - 網址後面加上 lectures](http://www.cs.cornell.edu/courses/cs412/2007sp/lectures/lec12.pdf) * Iowa State University * [scope and symbol table](http://web.cs.iastate.edu/~weile/cs440.540/5.SemanticAnalysis.scope.pdf) * symbol table detaild * [link](http://www.uio.no/studier/emner/matnat/ifi/INF5110/v17/slides/06-symboltables-handout.pdf) # hw1 ## youtube https://www.youtube.com/watch?v=54bo1qaHAfk https://www.youtube.com/watch?v=GpxEGhv7C5A&list=PLYLkmd7paNvjXxjgBhaFT_j9u-Dk2TQJc&index=4 https://www.youtube.com/watch?v=rqKtqPplQeQ ## 紀錄 ### 作業要求 * [resource](http://dinosaur.compilertools.net/) * 後面的作業會用到前面作業的內容 -> well-structured and easily changed * micor go * 沒有分號 * if 和 for 沒有 () * 沒有 while * lexical definition * token 兩個分類 * 會傳給 parser 的 token * 被 scanner discard 的 token (recognized but not passed to the parser) * symbol table 存 id * 有效的 insertion 和 retrieval operation -> 用 hash table * 必須要有以下 function * `create_symbol()` : 建立 symbol table, 宣告一個資料結構並給其一個 memory space 來存 variable * `insert_symbol()` : insert s 到 symbol table 的 new entry * `lookup_symbol()` : 回傳 string s 的 entry index，找不到 s 就回傳 nil * `dump_symbol()` : dump all entry of the symbol table * ![](https://i.imgur.com/99uPgSi.png =300x) * 功能 * 在每一行print出被辨識到的 token，且捨棄空白... 和 undefined character sets * token: delimiter, operator, keyword, identifier, constant * 計算 program 的 line 數量 * 實做 symbol table 的 function (上面四個) * create 和 insert 必須要 print 出訊息來檢查， * 當到了 end of file 時，要列出所有 symbol table 的內容 (dump) * 捨棄 C/C++ type comment * print 出內容，然後在 parsing 時捨去 (不給 parser) * 計算 comment 的 line 數量 * 一行算 +1 * 出現在一般的 code 後面也算一行，獨立的 comment 也算一行 (or多行) * syntax error check * 檢查 undefined variables * 檢查 redefined variables * 寫 readme ## 上課 * `lex.l` : language definition * `c compiler` -> gcc * a.out : output token * `yyparse()` 變成 main program 去 call `yylex()` * part 2 : rule match * RE, Action * 3 : symbol table, and some functions * yytext : lex and yacc shared environment variable * used to communicate * extra function may not be used (just look) * 23 , action * 24 , optional 括號 * 空白 tab 換行過濾掉印出 ; * 27 begin -> condition 動作 * e.g. what 開頭字串 * state AA BB CC 都要先定義 * begin 切到另一個 mode 繼續做 ... ? * 怎麼做做了什麼 syntax error detect # hw2 yacc ## youtube https://www.youtube.com/watch?v=yTXCPGAD3SQ&t=316s https://www.youtube.com/watch?v=__-wUHG2rfM ## description * design grammar * check semantic correctness ## requirements * grammar for **variable declarations** * symbol table function * create, insert, lookup, dump * handle assignment operators * =, += ... * arithmetic operations and consider **brackets and precedence** * modulo (%) 不會有 floating-point * **Type checking** * grammar for **`print and println`** function and display argument content * detect **semantic error** and **display the error message** * **error type and line number** * error 發生 variable 的 value 已不重要 ### advanced * grammar for `if ... else if ... else` * implement **scoping check function** * scope : a set of statements `{}` * variable is accessible in the block and all the inner block, but not outside the block * different inner block in same scope block can't see each other * README ## assignment * grammar based on the given **Lex** code and create a parser using **Yacc** * do some simple **checking of semantic correctness** ### Yacc definition * token that should be accepted by Yacc. * build code to **analyze these tokens** and **check syntax validity based on the grammar rules** * 3 tasks 1. define **tokens and types** * tokens * define it in both Lex and Yacc code (make sure **consistency**) * Lex forward the occurrence of the token to Yacc * tips (Yacc) : define token * declare : `%token` * 沒有定義在 rule 的且沒有宣告成 token，預設為 nonterminal * types * refers to one of the predefined data types **integer** and **float32** [?]() * tip : define type * [define a type for ***yylval*** using `%union` by myself]() * `%union { int i_val; }` : [means ***yylval** is able to be accessed via the **int** type]() * [define type ***for*** token usgin `%type` and give type name] * `%type<i_val> I_CONST` : [means the token **I_CONST** has the **int** type] 2. design grammar and implement actions * Grammae * convert CFG to Yacc rules ```clike A -> B1 B2 B3 A -> C1 C2 C3 A -> D1 D2 D3 --- A : B1 B2 B3 | C1 C2 C3 | D1 D2 D3 ; ``` * cosider **precedence of operators** * based on **ANSI C grammar rule** * Actions * **C statements** * 當 parser 辨識到 production rule for the input stream 時執行 * I/O, call function, update the program states(start condition) * rule 之間可以放 action ```clike constant : I_CONST { printf("type %s value %d", "int", $1); } | F_CONST { printf("type %s value %f", "float32", $1); } ``` 3. handle semantic errors * 三種 * **divide by zero** * undeclared variable (operate on undeclared) * re-define variable (print要?) * 當遇到 error 時，要顯示 error msg * 包含 type of the semantic error(3種)，**line number** ### Symbol table * 在 Yacc 建symbol table * create * insert entry * look up entry * dump (include value) * **assign the value to the entry** ### What should parser do * parser 要有基本功能，要完成 bonus，**scanner** 也需要可以提供 advanced feature * basic * grammar * variable declaration(syntax) * variants of the assignment operators * `=, +=, -=, *=, /=, %=` * arithmetic opearions * 包含 **brackets and precedence** * 著記: 越後面的推導 precedence 越高 * `%` 不能用在 floating point * `print` , `println` * display the contents of the arguments for the function call(印出來) * symbol table function * 至少四個 function (要多加 assign?) * entry value 要正確 * detect semantic error and display the error msg * error type + line number * advanced * grammar * `if ... else if ... else` * zero or more occurence of the `else if` * **scoping check function in parser** * correctly handle the scope of the variable ### Yacc template * 問題 * yylineno * %union * %token (token without return) * %token <i_val> I_CONST (token with return, specify type) * %type <f_val> stat (nonterminal with return) * 這裡因為沒有定義的 symbol 就是 nonterminal, 所以才要另外宣告 nonterminal with return * %start * `$$` * `$1` ## Spec for micro-go * parser 應該要 implement 這些 syntax 和 operator 等等 * 此外，**scanner 和 parser** 不用 handle **undefined behavior** ### Rule * EBNF ```clike | alternation () grouping [] option (0 or 1 times) {} repetition (0 to n times) ``` ```clike decimal_digit = "0" ... "9" unicode_letter = /* a Unicode code point classified as "Letter" */ unicode_char = /* an arbitrary Unicode code point except newline */ letter = unicode_letter | "_" ``` * identifiers : `letter { letter | decimal_digit}` * 要有 letter 開頭 * keywords : `else`, `if`, `for`, `var` * 不能用在 id * predeclared identifiers : `print`, `println` * comments * `// xxx \n` : // 開始 \n 結束 * `/* xxx */` : /* 開始 * / 結束 * operator * arithmetic * `+`, `-`, `*`, `/`, `%`, `++`, `--` * relational * `<`, `>`, `<=`, `>=`, `==`, `!=` * assignment * `=`, `+=`, `-=`, `*=`, `/=`, `%=` * ligical * `&&`, `||` * literal * integer * `decimal_literal = ("1" ... "9") { decimal_degit }` * 要除了 0 以外的數字開頭 * floating-point ```clike decimals = decimal_digit{ decimal_digit } float_literal = decimals "." decimals ``` * 是否可處理? 00.5 * stinrg * `string_literal = "\"" { unicode_char } "\""` * char 的 sequence * type * `Type = int | float32` * int = 32bit (signed) * float32 = 32bit * Blocks * 包含換行 * declarations + statements * 要在大括弧中 ```clike Block = "{" StatementList "}" StatementList = { Statement } ``` * 有 implicit block * universe block * `if`, `for` statement 被當作是在自己的 implicit block? * block nest and influence scoping * block 巢狀以及影響 scope * Declarations and Scope * 變數宣告，可能多個一起宣告?有default value? * 我覺得這個 grammar 有錯: 沒有 id ```clike Declaration = "var" VarSpec //(應該要有 id) VarSpec = Type [ "=" Expression ] ``` * Expression * Arithmetic * apply to numeric value * `+`, `-`, `*`, `/` : int and float32 * `%` : only integer * result : the same type as the first operand * Comparison * `==`, `!=`, `<`, `<=`, `>`, `>=` * result : untyped boolean value * Logical * apply to boolean value * `&&`, `||` * right operand is evaluated conditionally * `p&&q` : if p then q else false * `p||q` : if p then true else q * result : boolean * Summary ```clike Expression = Operand | Expression binary_op Expression Operand = Literal | identifier | "(" Expression ")" Literal = int_literal | float_literal | string_literal binary_op = "||" | "&&" | re_op | add_op | mul_op re_op = "==" | "!=" | "<" | "<=" | ">" | ">=" add_op = "+" | "-" mul_op = "*" | "/" | "%" ``` * Statement * Expression statements * signle id, arithmetic, comparison and logical operations ```clike ExpressionStmt = Expression ``` * example: `x+y-z*10+100/5 >= y || y -z` * IncDec statements * operand 必須是 addressable * 應該是 id++ 和 id-- 吧為什麼是 expression? (雖然有包起來id) ```clike IncDecStmt = Expression ( "++" | "--") ``` * `x++ equals x+=1` * Assignments statements * LHS operand 必須是 addessable ```clike Assignment = Expression assign_op Expression assign_op = "=" | "+=" | "-=" | "*=" | "/=" | "%=" ``` * For statements * 要注意 block 可能換行可能沒換行 * repeated execution of a block ```clike ForStmt = "for" Expression Block ``` * If statments * 注意是否有 else 或是多個 else if ```clike IfStmt = "if" Expression Block [ "else" (IfStmt | Block) ] ``` * Print statements ```clike PrintStmt = "print" "(" string_literal ")" PrintlnStmt = "println" "(" string_literal ")" ``` * Summary ```clike Statement = ExpressionStmt | IncDecStmt | Assignment | ForStmt | IfStmt | PrintStmt | PrintlnStmt ``` * Operator Precedence | Category | Precedence | Operator | | -------- | ---------- | -------- | | postfix | 6 | ++ -- | | multiplication | 5 | * / % | | addition | 4 | + - | | comparison | 3 | == != < <= > >= | | logical AND | 2 | && | | logical OR | 1 | \|\| | ## implement * 參考 ANSI C 整體架構 * - [ ] [lex regular expression](http://dinosaur.compilertools.net/lex/index.html) * 助教的 lex 我看不懂 * `" \ [ ] ^ - ? . * + | ( ) $ / { } % < >` * `"` : quote : 圈起來之後就變成 text character * 除了 blank, tab, newline 還有上面一堆，其他都是 text character * `\` : 用來做 escape 也可放在 `[]` class 裡面 * `[]` : character class : match 到裡面其中一個(假如只有寫一次而且沒有 operator 的話) * `^` : 1. 在 `[]` class 裡面就是除了 class 裡面的其他都 match, 2. 放在 rule 最前面，用來當作只處理 start of line * `-` : 放在 class 裡面當作 range，想要在 class 裡面match `-` 的話，要把 `-` 放在最前面或最後面 * `?` : optional : 前面的東西可有或可無 * `.` : match 所有 character 除了 `\n` = [^\n] * `*` : zero or more * `+` : one or more * `|` : alternation * `()` : group * `$` : 放在 rule 的最尾端，代表只會 match end of a line 的東西，(代表後面馬上接著換行) * `ab/\n` = `ab$` * `/` : `ab/cd` : 代表 match 後面接著 cd 的 ab * `{}` : 1. definition expansion (enclose a name) : 找前面自己有定義的, 2. repetition (enclose numbers) `a{1,5}` : match 出現 1 到 5 次的 a * `<>` : start condition * `%` : initial % is special, being the separator for Lex source segments? * http://westes.github.io/flex/manual/Patterns.html * http://dinosaur.compilertools.net/lex/index.html ## Doc ### 1 Basic specifications ```clike declarations %% rules %% programs%% ``` * rules * `A : BODY ;` * 大小寫不同 * BODY 可包含 token or nonterminal * 假如有 nonterminal symbol match empty string * `empty : ;` * 使用 token 要宣告在 **declaration** * `%token name1 name2 ...` * 沒有宣告的都是 nonterminal，且這個 nontermianl 至少要出現一次在 rule 的 LHS * start symbol * 定義 : `%start symbol` * represents grammar rule 所描述的最大, 最 general 的 structure * endmarker? * The end of the input to the parser is signaled by a special token, called the endmarker. If the tokens up to, but not including, the endmarker form a structure which matches the start symbol, the parser function returns to its caller after the endmarker is seen; it **accepts** the input. If the endmarker is seen in any other context, it is an error * **end-of-file** or **end-of-record** * **Lex** 要回傳 endmarker ### 2. actions * action 可能回傳 value, 或是可能得到之前 action 回傳的 value * 此外，lexical analyzer 可以回傳 token 的 value * 在 action 之間和 parser 之間做溝通 * `$` : signal to Yacc in this *context* * **to return a value**: * action set `$$` to some value * `{ $$ = 1; }` : action does nothing but return the value 1 * 為了得到先前的 action 或是 lexical analyzer 的 return value * use : `$1, $2` * return the value (returned by the components of the right side of a rule, reading from the left to right) * `A : B C D ;` * `$2` : the value returned by C * `$3` : the value returned by D * rule 應該要 retrun value * `expr : '(' expr ')' { $$ = $2 ;}` * 這種 rule 通常都是括弧中的 expr 當作 return value * default: **the value of the rule** is the value of the first element in it (`$1`) * Yacc 允許在 rule 中間就可以放 action * 代表不一定要完全 parse 一個 rule 就可執行一定的 action * **但這個 action 預設會回傳值** (set `$$`) * 這個 action 可以 access 其左邊的東西的 return value * 在這個 action 右邊的可以 access 這個 action 的 return value ```clike A : B { $$ = 1;} C { x = $2; y=$3;} ; ``` * 不是放在最後的 action，其實是被 Yacc 所 handle * 藉由另外加入一個 nonterminal，match empty string 來完成 ```clike $ ACT : /*empty*/ { $$ = 1; } ; A : B $ACT C { x=$2; y=$3;} ; ``` * 在很多應用，output 不是被 action 直接完成，而是 data structure(eg. parse tree) * `$$ = node()...` * 可以另外定義 variable 到 **declarations section** * scope 是 global 的 * **action** 和 **lexical analyzer** 都可以知道 * 在目前的例子 value 都是 integer，其他 tpye 會在 Section 10 討論? ### 3 Lexical Analysis * 要有 lexical analyzer 來讀 input 和 communicate tokens to the parser (with value, if desired) * interger-valued function called `yylex` * retrun : intger (the token number) * 假如有一個 value 和該 token 有關，應該把它 assign 給 external variable **yylval** * Lex 和 Yacc 要有共同的 token 定義 (可以讓 Yacc 或是 user 決定，但是都是用 C 的 `#define` 來達成) * token 定義是一個 integer number * **預設** 為 character set 的數字，其他則為 257 開始往上加 * **自己定義** 在 declarations section 然後在 token name 或是 literal 後面直接加上 nonegative integer * **number 都一定要不一樣** * `0` 或是負值應該要保留給 endmarker 不能自己定義 * token 的名字不能跟 C 和 parser(yy) 的名字撞到 ### 4 How the parser work * 不一定是必要的，但是可以對於 **error recovery** 和 **ambiguities** 比較有可理解的改善方法 ### 5 ambiguity * **shift/reduce conflice** , **reduce/reduce conflict** * 有 choice * Yacc 內部 disambiguating rules 1. In a shift/reduce conflict, the default is to do the shift. 2. In a reduce/reduce conflict, the default is to reduce by the earlier grammar rule (in the input sequence). * 有 if 的 ambiguous 範例 ### 6 Precedence * precedence, associativity of the binary operator ## Bison * semantic value : 除了 token 還可以另外知道 value 或是 id 的真正名字 * semantic action : * `yyparse` parser function + lexical analyzer(`yylex`) + error-reporting function(`yyerror`) * 流程 * run bison on the grammar to produce the parser * compile the code output by bison, as well as any other source files * link the **object file** to produce the finished product * form of Bison grammar file ```clike %{ C declarations %} Bison declarations %% Grammar rules %% Additional C code ``` * **C declaration** : defin **types and variable** used in the action, and **`#include`** * **Bison declarations** : declare names of the **terminal** and **nonterminal** symbols * also describe **operator precedence** and the **data types of semantic value of various symbols** * **Grammar rule** : define how to construct each **nonterminal** symbol from its part * **Additional C code** : 其他 function 或是 `yylex` ，或是 action 會 call 的 subroutine 也放這裡 ## Bison Grammar Files ### Symbols : terminal and nonterminal * **terminal symbol** (**token type**) : a class of syntactically equivalent tokens * **nonterminal symbol** : a class of syntactically equivalent groupings * by convention, is should be all lower case * token type * **named token type** : upper case * Bison decalred : `%token` * 變成一個 C macro in the parser file * 所以可以用這個 name 來代表 code(數字) * **character token type** * `+` 代表 + * 除非需要指定 semantic value data type、associativity、precedence，否則不用宣告 * 在 `yylex` 中直接使用 ASCII * **literal string token** * 除非需要指定 semantic value data type、associativity、precedence，否則不用宣告 * `yytname` ? * ">=" (至少一個 character) * do not work in Yacc * 編譯 Yacc 的時候要用 `-d` 產生 definition，這樣才可以讓另外一個檔案的 Lex 知道 token 和 yylval 的定義 * `error` 是保留的 terminal symbol，用在 **error recovery** ### Syntax of Grammar Rules ```clike result : components ... ``` * result : nonterminal * whitespace 只是用來分隔 symbol，不會有影響 * action : `{C statements}` (follows the components usually) * **multiple rule** : ` ... | ...` * `components` can be empty * means `result` can match the **empty string** * example : comma-separated sequence of zero or moer *exp* groupings ```clike expseq: /*empty*/ | expseq1 ; expseq1: exp | expseq1 ',' exp ; ``` ### Recursive Rules * **recursive** : **result** nonterminal appears also on its RHS * **left recursion** , **right recursion** * leftmost symbol in RHS or rightmost symbol in RHS * always use **left recursion** * it can parse a sequence if any number of elements with bounded stack space, but right recursion may not * **indirect** or **mutual** recursion * **mutually-recursive** nonterminal : each refers to each other ### Defining Language Semantics * grammar only determine the **syntax** , the **semantic** are determined by the semantic values * 和不同的 **token** 或是 **groupings** 相關的 semantic value * 且當 groupings 被辨識到即 take **action** * e.g. grouping : `x + y` #### Data Types of Semantics Values * default 是 int，可指定其他 type : `YYSTYPE` as a macro #### More Than One Value Type * **different data types for different kinds of tokens and groupings** * `int` for integer, `char*` for string constant, **identifier may need a pointer to an entry in the symbol table** * 要用超過一個 data type for semantic value，必須要做以下兩件事 1. 指定 type 的集合，`%union` 2. 選擇一個 type for each symbol (terminal or nonterminal) for which semantic values are used * token: `%token` , grouping: `%tpye` 在 Bison declaration #### Actions * syntactic rule + C code * 大部分的 action **用來計算 semantic value for the grouping** * action 可以放在 rule 的任何位置，但經常只使用一個，且放在最後 * **action can refer to the semantic values of the components matched by the rule** * `$n` : the value of the *n* th component * `$$` : the value for the grouping * default for a rule * `$$ = $1` * the default rule is valid only if the tow data types **match** * empty rule 必須有 action，除非 rule 的 value 沒關係 * `$0` or `$-1...` 可以使用，用來 refer 現在的 rule 的 **before stack** #### Data Types of Values in Actions 1. 假如只有單一個 type, `$$` and `$n` 都會是這個 type 2. 假如用 `%union` 宣告 type，則要在使用 `$$` and `$n` 時指定 type * `$<type>n` 的方式來指定 ```clike $union { int itype; double dtype; } $<itype>1 $<dtype>2 ``` #### Actions in Mid-Rule * 只能 access 前面的 component, 因為後面的還沒 parse * 在後面的 action 要 access 時，也要把這個 mid-action 算進 symbol 裡面(component) * mid-rule 也可以有 semantic value : set `$$` * 因為沒有 symbol 來設定 data type，所以都要使用 `$<...>` 來決定 type * 要 set 整個 rule 的 semantic value 只能藉由最尾端的 action，mid-action 無法做到 (`$$` 用來設定自己的 value) * 會有一些問題 * Bison 是使用新的 nonterminal 來代替 mid-rule ### Bison Declarations * **Bison declarations** section 用來定義 **grammar 的 symbol** 以及 **data types of semantic values** * **所有 token type name 都要被宣告** (除了 single-character literal token (`+`, `*`)) * 預設第一個 rule 會是 start symbol，但也可以自己指定 start symbol #### Token Type Names * basic way to declare **token type name** (terminal) * `%token name` * Bison 會 convert 成 `#define ...` directive in the parser，且 **yylex** 可以使用 `name` 來代表這個 **token types code** (假如是同個檔案，不是的話要多另外處理) * alternatively : 設定 **precedence** * `%left` * `%right` * `%nonassoc` * 當 **stack type** 是 **union** ，必須要特別擴充 token 的宣告 (不管哪一種都是) ```clike %union { // define stack type double val; symrec *tptr; } %token <val> NUM // 定義 token NUM 和他的 type ``` * 可以在定義 token 的最後面放入相關的 **literal string token** * `%token arrow "=>"` ```clike %token <operator> OR "||" %token <operator> LE 134 "<=" %left AND "&&" ``` * 當指定 **literal string** 和 **token name** * 可以交互使用他們 (在 further declarations or **the grammar rule**) * `yylex` 可以用 **token name** 或是 **literal string** 來得到 **token type code number** #### Operator Precedence * 和 `%token` 宣告方法一模一樣，也是用來宣告 token * `%left` , `%right` * left-associative or right-associative * `%nonassoc` : x op y op z : syntax error * 所有 token 在同一個 precedence declaration 有相等的 precedence，且根據 **associativity** 來 nest * **當兩個 token 在不同的 precedence declarations associate** ， <font color="green">***越後面被宣告的就有越高的 precedence 且會被先 group***</font> * The precedence of an operator determines how it nests with other operators. All the tokens declared in a single precedence declaration have equal precedence and nest together according to their associativity. When two tokens declared in different precedence declarations associate, the one declared later has the higher precedence and is grouped first. #### The collection of value Types ## Parser C-Language Interface * semantic value 應該放在 yylval * default 是 int type * 要用其他 type 的話要用 `%union` declaration : yylval 換成 union type * 所以再存 yylval 的時候要用正確的 member ## implement 問題 - [ ] id 處理底限 _ ? - [ ] 多個 var 一起宣告？ - [ ] logic `!` 要做什麼處理？應該要算 true / false - [ ] block 單純把會出現的東西包起來，不用特別指定 block 裡面要額外處理什麼 - [ ] 沒有 `for` 的測資 - [ ] 不用處理 `&&` and `||` 的結果 # HW3 Jasmin ## resource [Instruction](http://jasmin.sourceforge.net/instructions.html) [JVM and Jasmin tutorial](http://saksagan.ceng.metu.edu.tr/courses/ceng444/link/f3jasmintutorial.html) ## 思考步驟 * 寫檔用一個 function 抱起來 * 一個 global 的 file pointer 讓大家自由存取 * 決定 * 要用一個 function 包全部 (參數給 string) 還是多個 function 包不同類別 (例如宣告運算 assgin 等等的) * [ ] 看 assembly code 架構決定 * 要知道需要 gen 的所有 assembly code 怎麼寫 * 還是需要 symbol table 在做中間的 check 因為前兩次作業的要求都有例如 undefined redefined divided by zero 和 modulo * 應該不難邏輯之前都有了只要寫檔寫成正確的格式就好 * 本來 execution 的地方就變成 gen code 的程式碼就好了？因為完全不用執行檢查錯誤的地方也是正常執行只要讓整體 grammar 可以繼續跑下去就好這才是我最剛開始想的不用在 Yacc 裡面執行才對 * 是否要確定一下 local 設定成 10 是不是 general 的或是會不會造成什麼問題 * 今天至少要把環境建立好把助教的程式跟我自己的結合 * 可能還要考慮一下 advanced 怎麼寫 * 官網應該有 * scope 可能要用不同層的 variable 來實現，有可能在 Yacc 刻意控制產生的 local variable ## description * 插入 Jasmin assembly instructions 到 Flex/Bison * 從 `test.go` 產生出 `test.j` * 再把 `test.j` 給 **Jasmin** 產生 `test.class` * 再把 `test.class` 給 **JVM** (`java`) 跑 ### 評分 * `.j` 和 **JVM** 結果 * flex/bison 需要 print 出 error msg ### requirement * 用 **local variable** 來處理 **variable declaration** * **要初始成 0** ，有 assign 則不用處理 (1 grammar) * arithmetic operation for integer and float32 * `+` `-` `*` `/` `%` `++` `--` `+=` `-=` `*=` `/=` `%=` * `++` , `--` 是 postfix (1 要注意) * 當 `%` 跟 `float32` 的 **constant/vairable** 做操作時，都應該視為 **invalid action** (1 即不gen code 但要 print) * `print` 、 `println` function * 不考慮 `print(x++)` 和 `println(x++)` * `if ... else if ... else ...` * 不用考慮 scoping (1 不做 advanced 的 scope 的話這個 y 會直接被放在這個 function 或是 main 的 scope 裡面！？應該是，那做了 scope 呢？) * ![](https://i.imgur.com/LL2XUi6.png) * **當 error 發生在 parsing 時** 1. 要 print 出所有 error msg (1 和 assignment 2 一樣) 2. 不要 gen Java assembly code ! (`.j` file) (1 意思是不要 gen 這一段 code, 還是程式結束？應該是不要 gen 這一段 code, 因為之前作業本來就可以簡單做到程式結束！) * 所以當 error 發生時不 gen code ### Advanced * `for statement` * `scoping` for **JVM** * 想在上層用控制 code 的方式來做，刻意改變可以使用的變數 -> 就有 scoping 的效果 * 測資應該不會太多 variable ，感覺行得通 (1 ㄏㄏ) * `user defined function` * 這是瞎會？ ### resource 整理官網 * [x] [about Jasmin](http://jasmin.sourceforge.net/about.html) * [x] [Jasmin home page](http://jasmin.sourceforge.net/) * [ ] [Jasmin user guide](http://jasmin.sourceforge.net/guide.html) * [ ] instruction * [ ] [Jasmin instruction](http://jasmin.sourceforge.net/instructions.html) * [ ] [官網也有範例和解釋](https://cs.au.dk/~mis/dOvs/jvmspec/ref-Java.html) * method * [ ] [stack overflow invoke method 討論](https://stackoverflow.com/questions/24722634/jasmin-invoke-a-method-using-arguments) * 好像只要打一句 `invoke` 什麼的就好 * tutorial * [ ] [JVM / Jasmin 一些程式的 mapping](http://saksagan.ceng.metu.edu.tr/courses/ceng444/link/f3jasmintutorial.html) * [ ] [實際的程式跟 Jasmin 程式的例子很棒應該可以直接寫完作業](http://www.cs.sjsu.edu/~pearce/modules/lectures/co/jvm/jasmin/demos/demos.html) * [ ] [slide 整理，看到 Jasmin 設置的 keyword 或許可以來這裡找](http://www.ist.tugraz.at/_attach/Publish/Cb14/Jasmin.pdf) ### 流程 - [ ] 嘗試快速理解 Jasmin 以及需要的基本 instruction - [ ] 建立好助教的code 跟我的 code 的聯繫 - [ ] 之後把我作業的雛型用出來 - e.g. API / flow - [ ] 開始做基本的功能 (要加上 type checking) - [ ] 可以修剪改一些 grammar - [ ] 以及刪除掉不用在 parsing 做的那些運算 - [ ] symbol table 仍然需要 - [ ] 做進階功能 - [ ] scoping : 控制 variable - [ ] for statement : 要可以操控 stack 的變數？或是有一個 local 直接拿來用 -> 應該不是因為 operation 都需要經過 stack - [ ] 放在最後 - [ ] user define function : 要知道怎麼設 entrance , 怎麼 call function 和 parse ## 一些筆記 ### user guide * package name : `java/lang/String` 而不是 `java.lang.String` * 可以做 method, field, subroutine, exception handler 等等，且 Jasmin syntax 好讀且緊湊 * `.class` directive 可以設定 output 路徑 `-d` * `-g` 可以加入 line number * `.line` 會被 ignore #### Statements (newline-separated) * directive, instruction 可以有 argument ，放在同一行，用`space` 分開 * directive * meta-level information * **directive name + zero or more arguments** * `.` ``` .catch .class .end .field .implements .interface .limit .line .method .source .super .throws .var ``` * example ``` .limit stack 10 .method public myMethod()V .class Foo ``` * instruction * **instruction name + zero or more arguments** (空白和換行分隔) * [instruction](http://jasmin.sourceforge.net/instructions.html) * label * **Label_name`:`** (換行分隔) * example * `Foo:` , `Label:` * name 不能用數字開頭，且不能包含特殊符號 `= : . " -` * 也不能跟 instruction name 相同 * `#_1:` 可用 * label 只能放在 method defintion 裡面，且 **label 是 local to 這個 method** #### The Jasmin Tokenizer * 會把讀進來的東西拆成 token，以 whitespace characters (spaces, tabs, newlines) 來切 * 包含 :::info directive names instruction names labels comments type descriptor names class names numbers and quoted strings etc. ::: * comment * `;` 開頭，且 `;` 前面一定要有空白，否則被當作一般的 token 對待 * Numbers and Strings * 還不支援科學符號和 exponent format，以及 character 和 octal * 可以接受 `1, 123, .25, 0.03, 0xA` 但不接受 `1e-10, 'a', '\u123'` * 不是所有的 backslash escape sequence 都支援，但有支援 `\n` , `\t` * Class Names * `java/lang/String` : Javaa class file format convention * Type Descriptors * type information is also written as they aapear in class files (不太懂) * 在 class file 指定 type : decriptor * `[Ljava/lang/Thread;` is an array of Threads, etc. * `Ljava/lang/String` : string type * `I` : integer * Method * 分成三個部份 (single token) * ![](https://i.imgur.com/wrEHUkU.png) * **method** : `println` * **class** : `java.io.PrintStram` * **type descriptor for the method** : `(Ljava/lang/String;)V` * takes a String and returs no result * `V` 應該是 void * example * ![](https://i.imgur.com/Y9rnNPC.png) * Fields * 分成兩個 token : **name and class of the field** + **its descriptor** * example: * `getstatic mypackage/MyClass/my_font Ljava/lang/Font;` * mypackage/MyClass 裡面的 field `my_font` ，且 `my_font` 的 type 是 `Ljava/lang/Font` : a Font object #### FILES * information on the class * the name of the class * the name of the source file that the class originated from * the name of the superclass * etc. * 一般來說，Jasmin file 從以下三個 directive 開始 ``` .source <source-file> .class <access-spec> <class-name> .super <class-name> ; for example .source MyClass.j .class public MyClass .super java/lang/Object ``` * `.source` : optional * specify the value of the **SoruceFile** attribute for the class file * `foo.src` 而不是 `/home/user/foo.src` : 不能有 pathname * `.class` , `.super` * name of this class and its superclass * `<class-name>` : full name of the class, 包含 package name `foo/baz/MyClass` * `<access-spec>` : access permission (zero or more) * `public`, `final`, `super`, `interface`, `abstract` * `.interface` : 和 `.class` 一樣 syntax * 定義 Java 的 interface * `.interface public foo` * `.implements` * 可以指定這個 class 去 implement 什麼 interface 了 #### Field Definitions * 前面定義 header ，現在定義 field (member) * `field <access-spec> <field-name> <descriptor> [ = <value> ]` * `<value>` : for final fields * 最後定義， method ``` .method <access-spec> <method-spec> <statements> .end method ``` * `<access-spec>` * `<method-spce>` : name and type descriptor of the method * `<statements>` : code * Method 定義不能 nested * `return` 要自己決定要不要加入 ``` .method foo()V return ; must give a return statement .end method ``` * **Method directives** * 以下的 directives 只能放在 method 定義裡 * `.limit stack <integer>` * 設定 method 需要的最大 size stack * `.limit locals <integer>` * 設定 method 需要的 local variable 數量 * `.line <integer>` * 用來 debug ，到時候會顯示錯誤的行數 ``` .method foo()V .line 5 bipush 10 i_store_2 .line6 ... ``` * `.var <var-number> is <name> <descriptor> from <label1> to <label2>` * 用來定義 local variable number 的**name**, **type descriptor**, **scope** * 可用來 debug ，因為可以 print 出精確的變數名稱，而不是 local variable number ``` .method foo()V .limit locals 1 ; variable 0 as an "int Count;" , scope is the code between Label1 and Label2 .var 0 is Count I from Label1 to Label2 Label1: bipush 10 istore_0 Label2: return .end method ``` * `.throws <classname>` * `.catch <calssname> from <label1> to <label2> using <label3>` * Abstract Methods * body 不要放 code 就是了 * 裡面還是可以放 `.throws`