2016q3 Homework5 (jit-compiler)

# 2016q3 Homework5 (jit-compiler) contributed by <`heathcliffYang`> <`janetwei`> [Github](https://github.com/janetwei/jit-construct) ## 目標 >>中英文字間請以空白隔開 >>[color=red][name=課程助教] >>好的 ## Code - [ ] README.md a Just In Time (JIT) compiler for the brainfuck language. **JIT compilation:** 意思是在 run time 時編譯，由兩種編譯方法組合(AOT & interpretation) **Cross compiler :** 可以產生針對另外一個平台 (platform) 的執行檔，而不是現在正在運行的平台。常用於 multiple platforms > compiler 與 platform 的關係是什麼? > platform 是指 CPU 架構嗎? > 在 compiler 查對照的組語的時候嗎? > compiler 與 platform 的相依性多大? **ABI :** (Application Binary Interface for the ARM Architercture)==NOT YET== - [ ] Makefile ==CC 好像沒有被定義?== > 目前認為是因為此次作業使用不同 compiler 測試結果,因此在 Makefile 裡就沒有定義 CC ``` run-jit-x64: jit-x64 ./jit-x64 progs/hello.b && objdump -D -b binary \ -mi386 -Mx86-64 /tmp/jitcode ``` **objdump :** 顯示 object file 的資訊 `-D` : Display the assembler mnemonics for the machine instructions from objfile. `-b` : 標示 object file 的 object-code format [參考資料](https://sourceware.org/binutils/docs/binutils/objdump.html) **.dasc :** DynASM has two parts: the assembler/preprocessor, written in Lua, and the the linker/encoder, written in C. dynasm.lua is the preprocessor. **.dasl files** refer to Lua/ASM files **.dasc files** refer to C/ASM files. [參考資料](https://luapower.com/dynasm#examples) **arm-linux-gnueabihf- :** gcc for armhf architecture, `-v`可以看編譯器更詳細資訊 `-mfloat-abi` 有三種值 soft : 不用fpu進行浮點運算，而是用軟體算==待查詢浮點運算的方法== softfp : armel architecture hard : armhf architecture, 用 fpu 計算，傳參數也用 fpu 的 float register 傳，省去了轉換，但是中斷的 overhead 高 [參考資料](http://fanli7.net/a/caozuoxitong/Linux/20140810/525891.html) **QEMU :** 模擬處理器 (virtural processor?), 具有高速與跨平台的特性，分為 User mode & System mode User mode : ==QEMU能啟動那些為不同中央處理器編譯的Linux程序。 Ex: Wine== System mode : 能模擬整個電腦系統，從 CPU 到周邊 **Wine :** 讓在 x86 上的 Unix family OS 可以跑 Microsoft Windows 的程式 > ==CPU & platform 的關係== **Lua :** [參考資料](https://zh.wikipedia.org/wiki/Lua) - [ ] interpreter - [ ] compiler-x86 - [ ] compiler-x64 - [ ] compiler-arm **.b :** brainfuck code ## DynASM **DynASM** is a preprocessor and tiny runtime library for creating assemblers and JIT compilers in C orC++. * DynASM is a pre-processing assembler. * DynASM converts mixed C/Assembler source to plain C code. * It is written in Lua. * There is no machine-dependency for the pre-processor itself. **.dasc**:檔案中包含 C 和 assembly ,藉由 Lua 可以轉成 C code **`|`**: The code of this line should be pre-processed. 可包含 assembly language instructions 或 directives. **`dasm_put()`**:copies machine code snippets from the action list and merges them with the arguments [參考資料](http://luajit.org/dynasm_features.html) ## Optimization * Origin 的數據: ``` bench jit-x86 progs/awib.b GOOD 119.2ms progs/mandelbrot.b GOOD 4064.4ms progs/hanoi.b GOOD 10087.5ms ``` >> 兩個版本? >> [color=red][name=課程助教] >> 已經修改過,謝謝助教 1. Contraction 此方法是將多餘運算整併，例如 +++++ ，其實就是+=5，此改善方法雖然直覺簡單，但實質效益也不差，而實作方法可在遇到+ - > < 時即檢查是否為連續出現。 ```C case '+': count=continuous_count(p); | add byte [PTR],count p+=(count-1); break; ``` ![](https://i.imgur.com/qQdq5sg.png) ```C case '-': count_dec=continuous_count(p); | sub byte [PTR],count_dec p+=(count_dec-1); break; ``` 此為做完`+` `-`運算後所優化的結果 ![](https://i.imgur.com/mcPVWbz.png) 因為 hanoi.b 裡面大部份都是 contraction 的`+` `-`運算,所以優化有明顯的改變將連續的`+` `-` `>` `<`統計後再一次運算所優化的結果 ![](https://i.imgur.com/XarueOP.png) 2. check loops * clear loop:遇到[-],將迴圈歸零,將PTR的值為0 ```C int clear_loop(char *p) { if( *(p+1)=='-' && *(p+2)==']' ) return 1; else return 0; } ``` 先判別是否為[-],若是則 return 1,其餘 return 0 ```C case '[': if (top == limit) err("Nesting too deep."); if(clear_loop(p)){ p+=2; | mov byte [PTR], 0 break; } // Each loop gets two pclabels: at the beginning and end. // We store pclabel offsets in a stack to link the loop // begin and end together. else { maxpc += 2; *top++ = maxpc; dasm_growpc(&state, maxpc); | cmp byte [PTR], 0 | je =>(maxpc-2) |=>(maxpc-1): break; } ``` ![](https://i.imgur.com/3GAlO3M.png) 之後參考老師的 code 將 clear loop & copy loop & multipile loop 整合 ```C int result = 0; int check_loops(char *p,int *index,int *mult) { int res,offset = 0,_index = 0; result = 0; if (*(p+1) != '-') return -1; p += 2; while (*p != ']') { if (*p == '[' || *p == '-' || *p == '.' || *p == ',') return -2; res = continuous_count(p); if (*p == '>') offset += res; else if (*p == '<') offset -= res; else if (*p == '+') { for(int i = 0;i < _index;i++) { if(index[i] == offset) { mult[i] += res; goto L1; } } index[_index] = offset; mult[_index] = res; _index++; } L1: p += res; } if (offset != 0) return -1; result += *p; return _index; } ``` 判斷`[-`完後如果後面接的是`*p == '[' || *p == '-' ||*p == '.' || *p == ','`則會忽略前面的`[-`,因此為了跟 return -1 的做區分,所以 return -2 ```C else if(count == -2) { for(int j = *p-2;j < *p ; j++) { switch(*p) { case '[': if (top == limit) err("Nesting too deep."); maxpc += 2; *top++ = maxpc; dasm_growpc(&state, maxpc); | cmp byte [PTR], 0 | je =>(maxpc-2) |=>(maxpc-1): break; case ']': if (top == pcstack) err("Unmatched ']'"); top--; | cmp byte [PTR], 0 | jne =>(*top-1) |=>(*top-2): break; } } } ``` return -2 的時候,前面兩個的判別跟原本的 code 一樣 ```C count_scan = scan_loop(p, brainfuck_ptr, brainfuck_size); if (count_scan > 0) { | add PTR,count_scan break; } else if (count_scan < 0) { count_scan = 0-count_scan; | sub PTR,count_scan break; }*/ count = check_loops(p , index , mult); if (count == 0) { p+=2; | mov byte [PTR],0 break; } else if(count > 0) { for(int i = 1; i < count; i++) { if(mult[i] ==1) { | add PTR,index[i] | mov byte [PTR],*brainfuck_ptr } else { int multi_result=(*brainfuck_ptr)*mult[i]; | add PTR,index[i] | add byte [PTR],multi_result } } *brainfuck_ptr += result; } ``` >> check_loops 的優化尚未完成 ``` Unmatched ']' progs/awib.b bad output: expected 3b4f9a78ec3ee32e05969e108916a4affa0c2bba got da39a3ee5e6b4b0d3255bfef95601890afd80709 Unmatched ']' progs/mandelbrot.b bad output: expected b77a017f811831f0b74e0d69c08b78e620dbda2b got da39a3ee5e6b4b0d3255bfef95601890afd80709 Unmatched ']' progs/hanoi.b bad output: expected 32cdfe329039ce63531dcd4b340df269d4fd8f7f got da39a3ee5e6b4b0d3255bfef95601890afd80709 ``` 3. scan loops 在實作scan的過程中，發現一個問題，其實我們缺乏一個目標程式裡的pointer，只有compiler在讀程式的時候的pointer 更詳細的討論，之所以original的版本只有compiler讀程式的pointer也不會出問題的原因是，逐個字元翻譯不需要知道目標程式的目的和狀態 ; 然而如果要優化，就要讓compiler理解目標程式到底要做什麼，所謂的理解就是實際執行一次，逐步記錄目標程式的執行情況，剛好這次的brainfuck要紀錄的程式執行情況簡單很多，就是一串用ASCII code表示的array ```c int scan_loop(char *p, char *brainfuck_ptr, int brainfuck_size) { char *ptr = p; int right = 0, count_scan = 0; ptr++; if (*ptr == '>') right++; while (*ptr != ']') { if (*ptr != '>' || *ptr != '<') { if (*ptr != '\n') return 0; ptr++; } ptr++; } p = ptr; // at ']' /* scan right */ if ( right != 0) { count_scan = (memchr(brainfuck_ptr, 0, brainfuck_size) - brainfuck_ptr); brainfuck_ptr += count_scan; return count_scan; } /* scan left */ count_scan = (memrchr(brainfuck_ptr, 0, brainfuck_size) - brainfuck_ptr); brainfuck_ptr += count_scan; return count_scan; } ``` switch判斷部份 `'['` , p已經在`scan_loop()`裡面改完 ``` count_scan = scan_loop(p, brainfuck_ptr, brainfuck_size); if (count_scan > 0) { | add PTR,count_scan break; } else if (count_scan < 0) { count_scan = 0-count_scan; | sub PTR,count_scan break; } ``` 4. 移除換行符號 `$ make bench-jit-x86 | grep GOOD | rev | cut -c 3- >> result_time.csv` ## 背景知識 **just-in-time (JIT) compilation:** 動態編譯在程式執行時將程式碼轉換成 machine code ,並同時可以進行優化 - [ ]執行程式時分析程式碼,找出熱點,避免 compile 到極少執行的程式碼 - [ ]將程式精簡化,優化程式 ### brainfuck 由八種運算符構成: `>` : ++ptr `<` : --ptr `+` : ++*ptr `-` : --*ptr `.` : putchar(*ptr) `,` : *ptr = getchar() -> 從鍵盤讀取一個字符 `[` : while (*ptr){ `]` : } 1. 以byte為單位 2. 被初始化為零的variable 3. 一個指向該variable的pointer 4. 兩個stream用於input & output 5. 當前位置清零 : [-] **sed :** (stream editor)，分析STDIN的資料，再輸出到STDOUT `-f filename` 可以執行 filename 內的 sed 動作 [參考資料](http://dywang.csie.cyut.edu.tw/dywang/linuxProgram/node36.html) * [brainfuck optimization strategies](http://calmerthanyouare.org/2015/01/07/optimizing-brainfuck.html) 1. define an intermediary representation (IR) which an optimizer can operate ``` BF IR C + Add mem[p]++; - Sub mem[p]--; > Right p++; < Left p--; . Out putchar(mem[p]); , In mem[p] = getchar(); [ Open while(mem[p]) { ] Add } ``` 2. contraction ``` IR C Add(x) mem[p] += x; Sub(x) mem[p] -= x; Right(x) p += x; Left(x) p -= x; Out putchar(mem[p]); In mem[p] = getchar(); Open while(mem[p]) { Add } ``` 3. Clear loop: `[-]` ---> `Clear` ``` IR C Add mem[p]++; Sub mem[p]--; Right p++; Left p--; Out putchar(mem[p]); In mem[p] = getchar(); Open while(mem[p]) { Add } Clear mem[p] = 0; ``` 4. Copy loop: `[->+>+<<]` ---> `Copy(x)` ``` IR C Add(x) mem[p] += x; Sub(x) mem[p] -= x; Right(x) p += x; Left(x) p -= x; Out putchar(mem[p]); In mem[p] = getchar(); Open while(mem[p]) { Add } Clear mem[p] = 0; Copy(x) mem[p+x] += mem[p]; ``` 5. Multiplication loops 6. Scan loops : efficiently searching a memory area for occurrences of a particular byte **`void *memchr(const void *s, int c, size_t n);`** : scan n bytes大小的memory area **`void *memrchr(const void *s, int c, size_t n);`** : backward from end return a pointer to the matching byte or NULL ==for the first instance of c 是什麼意思? c 應該是key element?!== ``` ScanLeft p -= (long)((void *)(mem + p) - memrchr(mem, 0, p + 1)); ScanRight p += (long)(memchr(mem + p, 0, sizeof(mem)) - (void *)(mem + p)); ``` 7. Operation offsets : ``` Add(x, off) mem[p+off] += x; Sub(x, off) mem[p+off] -= x; Out(off) putchar(mem[p+off]); In(off) mem[p+off] = getchar(); Clear(off) mem[p+off] = 0; Mul(x, y, off) mem[p+x+off] += mem[p+off] * y; ``` [ASCII](https://zh.wikipedia.org/wiki/ASCII) [Reading note](https://hackmd.io/EwVgHAnAjAphBmBaYBjARgQ0QFhAdgDZEI8ATKRFAZgzQhoAZJgCg===?both)

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.