2016q3 Team20_HＷ5 (jit-compiler)

contributed by <eeuserp>, <SaragCheng>

組員

簡伯丞
程鈺涵
施雨妏#

作業要求

A09: jit-compiler

在 github 上 fork jit-construct
學習設計 JIT 編譯器
運用計算機組織的背景知識，進行深入效能評估和提出改善方案
- 提出改善內建 benchmark suite 效能的機制，並且要能夠充分解釋行為，需要一併透過 gnuplot 自動產生效能分析圖表
延展程式語言，提供 concurrency 能力
- 對 Brainfuck 程式語言進行擴充，使其能夠支援 concurrency / parallelism。需要有程式語言的規格描述並且驗證。

參考資料

基礎知識

JIT (Just In Time) compilation

與執行前編譯好執行檔不一樣，JIT 是一邊執行程式一邊編譯，可以直接將程式碼轉成機械碼或是其他形式執行。常常使用在 dynamic programming language。

Interpreter

Compiler

將 brainfuck 編譯成組語

複習組語
cmpb $0, (%r12) takes a byte that r12 point to and compare with 0
if (b < 0) a++;
可更換為沒有分支的版本:
a -= b >> 31;

Brainfuck

Rules:( char *p 指向記憶體區塊)

compiling brainfuck to java in brainfuck

Awib is a brainfuck compiler written in brainfuck
- 前端(frontends) : 接受 source code 作為 input，產生 IR(intermediate representation) 作為 output
- 後端(backends) : 接受 IR 作為 input ，產生可執行檔作為 putput
- IR(intermediate representation) : 是一種資料結構，可將輸入的資料建構為一個電腦程式，也可以將一部份或是所有輸出的程式反推回輸入資料。這意味著 IR 將會保留一些輸入資料的資訊，同時擁有更進一步註釋或是快速查詢的功能。
- Awib compiler 的 frontends 和 backends 之間會有一些 Optimizer
Representing large integers
- 一個 cell 只有 8 bit ，這樣一來 loop 最多不就只能執行敘述 255 次 !? 那使用兩個 cell 一共 16 bit 好了，這樣充其量也只能執行65535 次啊。
- 解決辦法 : 一樣使用兩個 cell，但用法不同。以遞增為例，當 least significant cell 為 255 又要再加上去時， most significant cell ++ ， least significant cell 設為 0。
```
function increment(hi, lo):
if lo < 255:
    return (hi, lo + 1)
return (hi + 1, 0)
```
Storing the stack
- no random memory access in brainfuck
  - 不需要 stack top variable
  - 必須自己設計 stack segment of the memory area
- 必須確保 stack 能夠提供多層巢狀迴圈做計算
- 在記憶體中存放 stack 的一段連續記憶體位址稱為 stack segment ，存放 IR 的一段連續記憶體位址稱為 IR segment。在這兩個 segment 旁邊的位址寫入0 以防迴圈動到這兩個區塊，如下圖。

brainfuck optimization strategies

最佳化策略:

Contraction
Clear loops
Copy loops
Multiplication loops
Operation offsets
Scan loops

未優化版本

輸入make bench-jit-x64
測試結果：

Executing Brainf*ck benchmark suite. Be patient.

progs/awib.b             GOOD	116.0ms
progs/mandelbrot.b       GOOD	4213.6ms
progs/hanoi.b            GOOD	10782.6ms

優化版本

Clear Loop

修改 jit-x64.dasc 檔



if(*(p+1) == '-'&&*(p+2) == ']'){
	|	mov byte [PTR], 0
}

執行結果

Executing Brainf*ck benchmark suite. Be patient.

progs/awib.b             GOOD	91.5ms
progs/mandelbrot.b       GOOD	4149.0ms
progs/hanoi.b            GOOD	4204.0ms

觀察執行結果可以發現跟為優化版本相比只有 hanoi.b 大幅降低執行時間（執行速度提升約 2.56 倍），由此可見 hanoi.b 程式碼裡有大量(44%)的[-]敘述，而 awib.b 只有極少量(14%)的[-]敘述

Contraction

依照作業提示修改 jit-x64.dasc 檔，如下：










int continuous_count(char *p)
{
    char *ptr = p;
    int count = 0;
    while (*ptr == *p) {
        count++;
        ptr++;
    }
    return count;
}

利用add sub 指令修改 for loop 中 > < + - 4 個 case ，指令用法如下

add <reg>,<con>  
add <mem>,<con>

example:

add BYTE PTR [var], 10 	// add 10 to the single byte stored at memory address var
add eax, 10 	// EAX ← EAX + 10

執行結果：

Executing Brainf*ck benchmark suite. Be patient.

progs/awib.b             GOOD	56.4ms
progs/mandelbrot.b       GOOD	1551.1ms
progs/hanoi.b            GOOD	5372.3ms

觀察執行結果可以發現跟未優化版本相比 3 個程式的執行時間接大幅降低許多， awib.b 執行速度提升約 1.6 倍， mandelbrot.b 執行速度提升約 2.71 倍， hanoi.b 執行速度提升約 2.00 倍。

if(count == 1)這個分支可以拿掉嗎？應該會更快？SarahCheng

Clear loops & Copy loops & Multiplication loops

依照作業提示修改 jit-x64.dasc 檔，如下：






















int check_loops(char *p,int *index,int *mult)
{
    int res,offset = 0,_index = 0;
    if (*(p+1) != '-') return -1;  //不合這三種 loop
    p += 2;
    while (*p != ']') {            //not end of loop
        if (*p == '[' || *p == '-' ||
            *p == '.' || *p == ',')
            return -1;&nbsp;        //不合這三種 loop
        res = continuous_count(p);
        if (*p == '>') offset += res;
        else if (*p == '<') offset -= res;
        else if (*p == '+') {
            index[_index] = offset;
            mult[_index] = res;
            _index++;
        }
        p += res;
   }
   if (offset != 0) return -1;//不合法的loop，沒有回到第一個p
   return _index;  //_index==0:clear; else:copy,multi;  
}

以上這段程式碼將 copy 視為 Multiplication 中 * 1的狀況
index[ ] 儲存的是目的位址相較現在位址的位移量
mult[ ] 儲存的是要乘多少( + 的個數)
這個程式碼的精妙之處在於可以隨意以任何位址作目的位址，而且不局限於參考資料中的兩次

修改 for loop 中 [ case :
- 注意register的使用(ref)
- - x64 架構的 CPU 是屬於 64 位元，包含了 16 個 64 位元的通用暫存器 ( general-purpose registers )，後面的 R8~R15，是新增的；而前面的八個暫存器，RAX、RBX、RCX、RDX、RBP、RSP、RSI、RDI，是把原有的 32 位元加以擴充而成。
  - 雖然是在 64 位元系統中，但是還是可以使用 32、16、8 位元的暫存器。如上圖所示，EAX、EBX、ECX、EDX、ESI、EDI、EBP、ESP 等 32 位元的暫存器仍然可以使用；而新增加的 32 位元暫存器名為 R8D~R15D，暫存器名結尾的『D』是指雙字組 ( DWORD )。16 位元的暫存器也有十六個，分別是舊的 AX、BX、CX、DX、DI、SI、BP、SP 與新增的 R8W~R15W，這裏的『W』，顯然就是字組 ( WORD ) 之意。可用的 8 位元暫存器也有十六個，分別是舊有的 AL、BL、CL、DL 與新增的 SIL、DIL、BPL、SPL、R8BR15B，這裏的『B』是位元組 ( BYTE ) 的意思，而『L』是指低位元組之意

_index=check_loops(p,idx,mult);

length = sizeof(index)/sizeof(int);
if(_index==0){
    |    mov byte [PTR], 0
}
else{
    while(i<length){
	|    push r8
	|    push r9
	|    movzx r8, byte [PTR]
	|    imul r8, mult[i]
	|    mov r9, PTR
	|    add r9,idx[i]
	|    add byte [r9],r8b
	|    
	|    pop r9
	|    pop r8
	i++;
	}
}

註 : 不知道為什麼加上clike=整個程式碼區塊就會走鐘

執行結果:

Executing Brainf*ck benchmark suite. Be patient.

progs/awib.b             GOOD	47.4ms
progs/mandelbrot.b       GOOD	1516.5ms
progs/hanoi.b            GOOD	178.5ms

相較於位優化版本 awib.b 的執行速度提升了約2.45倍， mandelbrot.b的執行速度提升了約2.78倍， hanoi.b 的執行速度提升了約60.40倍。觀察可以得知 Clear loops & Copy loops & Multiplication loops 的處理對 hanoi.b 有非常巨大的幫助，因為光是clear loops 和 copy loops就佔了整個程式碼的80%。對mandelbrot.b沒什麼影響。
length_idx: 128 length_index: 0, so it can work because it didn't run the loop

Q1.does our ptr move forward by 'count'?
Q2.take away the code mov byte [PTR],0 , the error still exist
WHAT'S THE PROBLEM ABOUT WHILE LOOP? SarahCheng
try this:

while(x==1){
    //Do something
}

The same loop in assembler:

        jmp loop1   ; Jump to condition first
cloop1  nop         ; Execute the content of the loop
loop1   cmp ax,1    ; Check the condition
        je cloop1   ; Jump to content of the loop if met

deal with the variable 'length', not sure does it work or not

register int *length asm ("r10");

variable to reg
那這樣還要push r10嗎?

一行一行測試結果：
- failed output after add this: | add byte [PTR],r8b

output:

progs/awib.b             bad output: expected 3b4f9a78ec3ee32e05969e108916a4affa0c2bba got 93898477b77b662771e12579cd4d1b7ad1771d1d
���	������������
progs/mandelbrot.b       bad output: expected b77a017f811831f0b74e0d69c08b78e620dbda2b got da39a3ee5e6b4b0d3255bfef95601890afd80709

可以參考 Team6 / github

問題

對專案 make 產生以下錯誤訊息

/usr/lib/gcc-cross/arm-linux-gnueabihf/5/../../../../arm-linux-gnueabihf/bin/ld: cannot find crt1.o: No such file or directory
/usr/lib/gcc-cross/arm-linux-gnueabihf/5/../../../../arm-linux-gnueabihf/bin/ld: cannot find crti.o: No such file or directory
collect2: error: ld returned 1 exit status
Makefile:64: recipe for target 'jit-arm' failed
make: *** [jit-arm] Error 1

有照著github上的指示安裝套件了 eeuserp

試著用 sudo apt-get install gcc-multilib 來安裝必要的套件 jserv
可以參考 https://hackmd.io/s/ryDY1CPyl
施雨妏

組語不夠熟，debug:

error: bad operand size in "imul rb,i?":| imul ah, mult[i]

maybe: ah x86的 8 bits register，但是乘法必須是32 bits的register SarahCheng

error: mixed operand size in "mov rq,xb":| mov rax, byte [PTR]
SarahCheng

#include <windows.h>
there is a func: ARRAYSIZE()
SarahCheng
ref

error: array size missing in ‘index’int index[];
missing terminating ' character [-Werror] case '.:
printf("PTR:%s, index[0]:%d",p,index[0]);不知道為什麼是這結果

參考資料

好用!!eeuserp