---
tags: 進階電腦系統理論與實作
---

# Advanced Computer Systems Theory and Implementation Project: jitboy

> [2020q3 Project: Game Boy Emulator + JIT Compiler](https://hackmd.io/@sysprog/rysijItcP)
> [GitHub](https://github.com/RinHizakura/jitboy)

## About Game Boy

[Game Boy hardware design and principles of operation](https://hackmd.io/@RinHizakura/BJ6HoW29v)

## About DynASM

> Before starting, download and install [LuaJIT](http://luajit.org/install.html).
> Reference:
> * [Hello, JIT World: The Joy of Simple JITs](https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-simple-jits.html)
> * [The Unofficial DynASM Documentation](http://corsix.github.io/dynasm-doc/tutorial.html)

DynASM is a dynamic assembler written for the [LuaJIT project](http://luajit.org/). It consists of two parts: a preprocessor that converts a mixed C/assembly source file (`.dasc`) into plain C code, and a small runtime library that handles the work that has to be deferred until runtime.

![](https://i.imgur.com/HoyLASQ.png =500x)

### Hello, DynASM World!

To get a first feel for DynASM, start with a simple program. The goal is modest: dynamically generate a piece of machine code that stores a user-specified value in the `eax` register, that is:

```
mov eax, <user's value>
ret
```

The `.dasc` program that achieves this is shown below:

```c=
#include "dynasm-driver.h"

// DynASM directives.
|.arch x64
|.actionlist actions

// This define affects "|" DynASM lines.  "Dst" must
// resolve to a dasm_State** that points to a dasm_State*.
#define Dst &state

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "Usage: jit2 <integer>\n");
    return 1;
  }

  int num = atoi(argv[1]);
  dasm_State *state;
  initjit(&state, actions);

  | mov eax, num
  | ret

  // Link the code and write it to executable memory.
  int (*fptr)() = jitcode(&state);

  // Call the JIT-ted function.
  int ret = fptr();
  assert(num == ret);

  // Free the machine code.
  free_jitcode(fptr);

  return ret;
}
```

* Statements that begin with a pipe (`|`) are handled by the DynASM preprocessor; they are usually assembly instructions or [directives](https://en.wikipedia.org/wiki/Directive_(programming))
* Line 4 declares that the generated machine code targets the x64 architecture
* `Dst` must be defined as a pointer to a `dasm_State*`; here it is `&state`, the address of the `dasm_State *state` declared on line 18

```cpp
void initjit(dasm_State **state, const void *actionlist)
{
    dasm_init(state, 1);
    dasm_setup(state, actionlist);
}
```

* `initjit`
    * `dasm_init` initializes the `dasm_State` structure `state`; the argument `1` means that only one code section is used
    * `actions`, declared with `.actionlist`, is passed to `dasm_setup`; the DynASM preprocessor emits an array with the same name `actions`, which the later `dasm_put` calls operate on
* Lines 21 and 22 are the code we want to generate; each instruction is in fact appended to `state`

```cpp
void *jitcode(dasm_State **state)
{
    size_t size;
    int dasm_status = dasm_link(state, &size);
    assert(dasm_status == DASM_S_OK);

    char *mem = mmap(NULL, size + sizeof(size_t), PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE, -1, 0);
    assert(mem != MAP_FAILED);

    *(size_t*)mem = size;
    void *ret = mem + sizeof(size_t);

    dasm_encode(state, ret);
    dasm_free(state);

    int success = mprotect(mem, size, PROT_EXEC | PROT_READ);
    assert(success == 0);

    return ret;
}
```

* `jitcode`
    * `dasm_link` computes the size of the generated machine code
    * Based on that size, [`mmap`](https://man7.org/linux/man-pages/man2/mmap.2.html) creates a readable and writable anonymous mapping in the virtual address space
        * An extra `size_t` is mapped at the front to store the size of the generated machine code
    * `dasm_encode` then writes the machine code into the reserved space
    * Afterwards, `dasm_free` releases the memory that `dasm_init` allocated
    * [`mprotect`](https://man7.org/linux/man-pages/man2/mprotect.2.html) makes the machine code readable and executable but not writable, so that it cannot be overwritten by accident

:::danger
:question: What puzzles me here is why the length passed to `mprotect` is `size` rather than `size + sizeof(size_t)`, which is the length originally passed to `mmap`. I first assumed it was a mistake in the example, but `jitcalc` is written the same way, so I am not sure whether I have misunderstood the logic of the program.
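:::

> The `jitcalc` code reuses the example above. Note that the address passed to mprotect must be page-aligned; you can compare with `src/emit.dasc` in jitboy.
> :notes: jserv

* On line 28 we obtain the return value of the code generated at runtime, which is the value in `eax`. If the generated code is correct, it should equal the input argument (satisfying `assert(num == ret)`)
* `free_jitcode` removes the mapping with [`munmap`](https://man7.org/linux/man-pages/man2/munmap.2.html)

You can run `./jit2 99; echo $?` to check that the correct return value is produced.

:::info
For the complete code, see the GitHub repo [jitdemo](https://github.com/RinHizakura/jitdemo) (adapted from [Hello, JIT World: The Joy of Simple JITs](https://blog.reverberate.org/2012/12/hello-jit-world-joy-of-simple-jits.html); only the `jit2` part has been confirmed to run correctly).

See the references for how to extend the use of DynASM to write a JIT compiler for Brainf*ck.
:::

### jitcalc

Next, we look at more complex uses of DynASM through a real application, building up some knowledge of x86 instructions along the way. The example is the repo provided by the instructor: [sysprog21/jitcalc](https://github.com/sysprog21/jitcalc). Reading `jitcalc` from its main function, we see `add_func` register functions by filling the global `FUNCTABLE`, of type `struct function`, with a function name and a function pointer. The key step that follows is the call to `compile`, which builds the JIT code. Starting from `compile`, the sections below examine how the input math expression is parsed and how the corresponding machine code is built.

:::info
:bell: The main purpose of this section is to understand how DynASM is applied and how x64 instructions are used. It does not dwell on how the parser works, but it explains the program logic far enough to give a rough picture of how jitcalc operates.
:::

#### `compile`

```cpp
void *compile(const char *src)
{
    /* initialize dynasm dstate */
    dasm_State *state = NULL;
    dasm_init(&state, 1);
    dasm_setup(&state, actions);

    struct compiler comp = {
        .dstate = state,
        .tag = 0,
        .tag_cap = 0,
    };

/* Hack */
#undef Dst
#define Dst (&(comp.dstate))

    /* initialize lexer */
    if (tk_init(&(comp.tk), src) < 0)
        return NULL;

    | push rbx

    /* start parsing */
    if (expr(&comp, REG_EAX))
        return NULL;

    | pop rbx
    | ret

    state = comp.dstate;

    /* get the jited code */
    return jitcode(&state);
}
```

* The overall flow is very close to the example in the *Hello, DynASM World!* section: `dasm_init` initializes the `dasm_State` structure `state`, `dasm_setup` registers the `actions` array declared with `.actionlist`, `Dst` is defined to point to a `dasm_State*`, and so on
* `tk_init` initializes the `struct tokenizer` from `src` (the expression to be parsed)
* Because `ebx` is used by the code generated from `expr` onward, and `rbx` is a callee-saved register under the System V ABI, the generated code first executes `push rbx` to keep the caller's value on the stack and restores it with `pop rbx` after the expression has been evaluated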
> ![](https://i.imgur.com/9rUHVMb.png)
> *Layout of the x86 general-purpose registers*

* Finally, as in *Hello, DynASM World!*, `jitcode` is called to turn `state` into runtime code

#### `expr`

* Starting from `expr`, the functions `logic` -> `comparison` -> `term` -> `factor` -> `unary` -> `atomic` are called in turn to process the expression
    * A function later in the chain may call `expr` again; think of this as part of the tokens of a large expression forming a smaller sub-expression
    * This order is tied to operator precedence. To see the effect, try
        1. replacing `term` in `comparison` with `factor`
        2. replacing the two `unary` calls in `factor` with `term`
        3. replacing the two `factor` calls in `term` with `unary`

        and observe how the computed results change
* In addition, `expr` handles the ternary operator (`?:`) and generates the machine code needed for its branch

#### `logic`

```cpp
int logic(struct compiler *comp, int reg)
{
    if (comparison(comp, REG_EBX))
        return -1;
    int op;
    int tag = -1;
    ...
```

* If `comparison(comp, REG_EBX)` returns 0, **`ebx`** now holds the intermediate value computed by the higher-precedence operators of a sub-expression

```cpp
    do {
        if (comp->tk.tk != TK_AND && comp->tk.tk != TK_OR) {
            break;
        }
        op = comp->tk.tk;
        tk_move(&(comp->tk));
        if (tag < 0) {
            check_compiler_tag(comp);
            tag = comp->tag; /* get the tag for current jump position */
            ++comp->tag;
        }
        | xor eax, eax
        | cmp ebx, 0
        if (op == TK_AND) {
            | je => tag
        } else {
            | setne al
            | jne => tag
        }
        if (comparison(comp, REG_EBX))
            return -1;
    } while (1);
```

* If the sub-expression is followed by `&&` (`TK_AND`) or `||` (`TK_OR`), the corresponding code must be generated
* The code for `&&` / `||` needs a branch, and the label the branch targets is created dynamically, so the first iteration of the loop initializes `tag`
    * `check_compiler_tag` checks whether enough labels are currently available
    * If not, [`dasm_growpc`](http://corsix.github.io/dynasm-doc/reference.html#dasm_growpc) increases the number of labels usable in the generated code
* `xor eax, eax` clears `eax`
* `cmp ebx, 0` tests the result in `ebx`
* For the `&&` operator, if `ebx` is zero then `je => tag` jumps to `tag`, so the following sub-expression does not need to be executed (its code is still generated)
* For the `||` operator, if `ebx` is non-zero then `setne al` sets `al` to 1 and `jne => tag` jumps to `tag`, so the following sub-expression does not need to be executed
* `comparison(comp, REG_EBX)` recursively generates the code for the following sub-expression

```cpp
    /* normalize value */
    if (tag >= 0) {
        |=> tag:
        | cmp eax, 0
        | setne al
        if (reg == REG_EBX) {
            | movzx ebx, al
        } else {
            | movzx eax, al
        }
    } else {
        /* we do not have a actual logic combinator, so no
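         * need to normalize the value. */
        if (reg == REG_EAX) {
            | mov eax, ebx
        }
    }
    return 0;
}
```

* `if (tag >= 0)` handles the case where an `&&` or `||` was actually parsed:
    * When execution reaches `tag`, `eax` is tested against zero and `al` is set accordingly
    * The result is then moved according to the target register
* The `else` branch handles the case where no `&&` or `||` was encountered; the value only needs to be moved according to the target register

#### `comparison` / `term` / `factor`

The implementations of these three functions are broadly similar. To avoid repetitive explanation, `term` is used as the example here, and only the instruction-level details in which the three differ are discussed.

```cpp
/* term */
int term(struct compiler *comp, int reg)
{
    if (factor(comp, REG_EAX))
        return -1;
    do {
        int op;
        if (comp->tk.tk != TK_ADD && comp->tk.tk != TK_SUB) {
            if (reg == REG_EBX) {
                | mov ebx, eax
            }
            break;
        }
        op = comp->tk.tk;
        tk_move(&(comp->tk));
        | push rax
        if (factor(comp, REG_EBX))
            return -1;
        | pop rax
        if (op == TK_ADD) {
            | add eax, ebx
        } else {
            | sub eax, ebx
        }
    } while (1);
    return 0;
}
```

* Taking `term` as the example, if `factor(comp, REG_EAX)` returns 0, `eax` now holds the intermediate value computed by the higher-precedence operators of a sub-expression
* Inside the `do...while` loop:
    * Tokens that are a plus sign `TK_ADD` or a minus sign `TK_SUB` are handled
    * For `TK_ADD` or `TK_SUB`, `factor(comp, REG_EBX)` is called first to evaluate the higher-precedence operation and leave its result in `ebx`
        * The value in `rax` is needed afterwards, but the code generated by `factor` also uses `eax`, so `rax` is pushed onto the stack and popped back once that code has run
    * Finally, `eax` and `ebx` are combined with the instruction that matches `TK_ADD` or `TK_SUB`
* `factor` and `comparison` have largely the same structure; the other instruction details can be understood with the same logic and operator precedence in mind:
    * `comparison`:
        * The `setl` / `setle` family of instructions reads the flags produced by `cmp` and sets a byte register to 0 or 1, from which the result in `eax` is derived. For details, see how the [unary](#unary) section handles the `TK_NOT` token
    * `factor`:
        * [`cdq`](https://www.felixcloutier.com/x86/cwd:cdq:cqo) sign-extends `eax` into the 64-bit pair `edx` : `eax`. This is needed because a 32-bit signed division takes its dividend from `edx` : `eax` and divides it by the 32-bit divisor (`ebx`); the quotient ends up in `eax` and the remainder in `edx`

#### `unary`

```cpp
/* unary */
int unary(struct compiler *comp, int reg)
{
    /* generate code for unary is kind of tricky since the generation order
     * is not the order of how code is written. Here we need to save all the
     * unary operator until we get the operand, then we can do the generation.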
     */
    int op[32];
    int sz = 0;
    do {
        if (comp->tk.tk == TK_NOT || comp->tk.tk == TK_SUB ||
            comp->tk.tk == TK_ADD) {
            if (sz == 32) {
                fprintf(stderr, "You have 32 consecutive unary operators!");
                return -1;
            }
            op[sz++] = comp->tk.tk;
        } else {
            break;
        }
        tk_move(&(comp->tk));
    } while (1);
    ...
```

* When parsing, `unary` first collects the unary operators in the buffer `op` and keeps parsing until it reaches the operand

```cpp
    /* now we need to call atomic function */
    /* we do not have to save rax because until now we have not used it. */
    if (atomic(comp, REG_EAX))
        return -1;
    /* now we know EAX must contain our baby */
    for (int i = 0; i < sz; ++i) {
        switch (op[i]) {
        case TK_SUB:
            | neg eax
        case TK_ADD : /* positive sign is unimplemented */
            break;
        case TK_NOT:
            | cmp eax, 0
            | sete al
            | movsx eax, al
            break;
        default:
            assert(0);
        }
    }
    if (reg == REG_EBX) {
        | mov ebx, eax
    }
    return 0;
}
```

* `atomic` is called to process the operand. Assuming the expression is valid, `eax` holds the value of the operand after the machine code generated by `atomic` has run
    * `rax` does not need to be saved before calling `atomic`, because the code generated by `atomic` and the functions it calls is the first machine code that depends on `rax`; nothing earlier has left a live value there
* The for loop then generates machine code according to the kind of unary operator
    * `TK_SUB`: [`neg`](https://www.felixcloutier.com/x86/neg)
    * `TK_ADD`: the operand (`eax`) is left unchanged
    * `TK_NOT`: if `eax` is 0, `eax` is set to 1; otherwise it is set to 0
        * [`movsx`](https://www.felixcloutier.com/x86/movsx:movsxd)
        * [`sete`](https://web.itu.edu.tr/kesgin/mul06/intel/instr/sete_setz.html)
* `if (reg == REG_EBX) {| mov ebx, eax}`: if the target register is `ebx`, the value computed so far lives in `eax` and has to be copied to `ebx`

#### `atomic`

Dispatches on the kind of the token currently being parsed:

```cpp
/* number value is always returned in EAX register */
int atomic(struct compiler *comp, int reg)
{
    /* lex next token from the input */
    switch (comp->tk.tk) {
    case TK_LPAR:
        tk_move(&(comp->tk));
        if (expr(comp, reg))
            return -1;
        if (comp->tk.tk != TK_RPAR) {
            fprintf(stderr, "The sub-expression is not properly closed!");
            return -1;
        }
        tk_move(&(comp->tk));
        return 0;
    ...
```

* Left parenthesis `TK_LPAR`: `tk_move` advances to the next token and `expr` is called to continue parsing. For a valid expression, `expr` parses all the way to the matching right parenthesis (`TK_RPAR`) and then returns

```cpp
    case TK_VARIABLE: {
        const char *var = add_string(comp->tk.val.symbol);
        if (!var) {
            fprintf(stderr, "too much symbol name!");
            return -1;
        }
        tk_move(&(comp->tk));
        if (comp->tk.tk == TK_LPAR) {
            /* it is a function call */
            return func_call(comp, reg, var);
        }
        /* generate call stub */
        | mov rdi, var
        | callp &lookup
        if (reg == REG_EBX) {
            | mov ebx, eax
        }
        break;
    }
```

* Variable `TK_VARIABLE`: `add_string` adds the variable name to `STRTABLE`, and the token following the variable is checked. If it is a left parenthesis, the name is a function name and `func_call` generates the corresponding machine code; otherwise a call to `lookup` is emitted with the name passed in `rdi`
    * User-defined variables are not supported yet

```cpp
    case TK_NUMBER: {
        int num = comp->tk.val.num;
        if (reg == REG_EAX) {
            | mov eax, dword num
        } else {
            | mov ebx, dword num
        }
        tk_move(&(comp->tk));
        break;
    }
    default:
        return -1;
    }
    return 0;
}
```

* Number `TK_NUMBER`: the value is extracted and stored with a `mov` instruction into the target register given by the argument
* If none of the above matches, -1 is returned

#### `func_call`

```cpp
/* function call */
int func_call(struct compiler *comp, int reg, const char *fn)
{
    int cnt = 0; /* function argument count */
    assert(comp->tk.tk == TK_LPAR);
    void *addr = func_lookup(fn); /* function address */
    if (!addr) {
        fprintf(stderr, "no such function:%s!", fn);
        return -1;
    }
    tk_move(&(comp->tk));
    ...
```

* When a function call is parsed, `func_lookup` maps the function name to the address of that function
* `tk_move` then advances past the left parenthesis so that the arguments of the call can be parsed

```cpp
    while (comp->tk.tk != TK_RPAR) {
        /* until now we have not used EAX/RAX */
        if (expr(comp, REG_EAX))
            return -1;
        /* push the value into corresponding register for call */
        switch (cnt) {
        case 0:
            | pusharg1 eax
            break;
        case 1:
            | pusharg2 eax
            break;
        case 2:
            | pusharg3 eax
            break;
        case 3:
            | pusharg4 eax
            break;
        case 4:
            | pusharg5 eax
            break;
        case 5:
            | pusharg6 eax
            break;
        default:
            assert(0);
        }
        ++cnt;
        if (cnt == 6) {
            fprintf(stderr, "only 6 arguments is allowed to call a function!");
            return -1;
        }
        /* check comma or not */
        if (comp->tk.tk != TK_COMMA)
            break;
        tk_move(&(comp->tk));
    }
```

* The loop simply keeps calling `expr` to evaluate each argument. `expr` leaves the argument in `eax`, so `cnt` tells us which argument this is, and the value is moved into the matching register
    * For example, `pusharg1 eax` expands to `mov edi, eax`
    * jitcalc allows at most 6 arguments, corresponding to the 6 registers `edi`, `esi`, `edx`, `ecx`, `r8d`, `r9d`, because these are the registers that the System V ABI for x86-64 designates for passing integer arguments

:::info
According to page 21, **3.2.3 Parameter Passing**, of the [System V Application Binary Interface AMD64 Architecture Processor Supplement](https://www.uclibc.org/docs/psABI-x86_64.pdf):
> If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.
:::

```cpp
    /* Expect one ")" */
    if (comp->tk.tk != TK_RPAR) {
        fprintf(stderr, "The function is not properly closed!");
        return -1;
    }
    /* Emit call */
    | callp addr
    /* Now move result accordingly */
    if (reg == REG_EBX) {
        | mov ebx, eax
    }
    tk_move(&(comp->tk));
    return 0;
}
```

* After leaving the loop, a right parenthesis `TK_RPAR` should match
* At this point `callp addr` is emitted. The macro is defined in the code below: `rbx` holds the address of the function to call, which is then executed with `call rbx`

```cpp
/* We use rbx since it is not the 6 registers that is used to pass parameter
 * and also not use as return value, so it is safe to do this before we call
 * into another function.
 */
|.macro callp, addr
| push rbx
| mov rbx, (uintptr_t) addr
| call rbx
| pop rbx
|.endmacro
```

## How the Program Works

With a working understanding of the Game Boy architecture and of how DynASM operates, we now read the [sysprog21/jitboy](https://github.com/sysprog21/jitboy) code. To avoid cluttering the page with too much copied code, the explanations below refer directly to line numbers, so they can be read side by side with the original project (the commit at the time of writing is `0466847`).

### `init_vm`

When the emulator starts, the contents of `gb_vm` are initialized to match the way a real Game Boy powers up.

```c=14
    if (!gb_memory_init(&vm->memory, filename))
        return false;

    dump_header_info(&vm->memory);
```

* Assume for now that `filename` is not NULL. The file is then opened and `vm->memory->fd` is set to the corresponding file descriptor

```cpp
/* initialize memory layout and map file filename */
bool gb_memory_init(gb_memory *mem, const char *filename)
{
    ...
    else {
        mem->fd = open(filename, O_RDONLY);
        if (mem->fd < 0) {
            LOG_ERROR("Could not open file! (%i)\n", errno);
            return false;
        }
        mem->mem = mmap((void *) 0x1000000, 0x8000, PROT_READ, MAP_PRIVATE,
                        mem->fd, 0);
        if (mem->mem == MAP_FAILED) {
            LOG_ERROR("Map failed! (%i)\n", errno);
            return false;
        }
        if (mmap(mem->mem + 0x8000, 0x8000, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1,
                 0) == MAP_FAILED) {
            LOG_ERROR("Allocating memory failed! (%i)\n", errno);
            return false;
        }
    }
```

* What `gb_memory_init` does:
    * `vm->memory->mem` is obtained with [mmap](https://man7.org/linux/man-pages/man2/mmap.2.html), which maps the `fd` from offset 0 to a contiguous 0x8000-byte range starting at the requested address (`0x1000000`) and marks it readable
        * Comparing with the Game Boy memory map, this range is the ROM area, so it is not writable
    * The remaining `0x8000` - `0xFFFF` range also has to be mapped, but as both readable and writable
    * Related variables such as the filename and the ROM/RAM bank indices in use are initialized as well
    * `mem->mbc = mem->mem[0x0147]` works because offset 0x147 of the cartridge indicates the MBC type
* `dump_header_info` parses the cartridge header and prints the information it contains
* The structure members are initialized; the hardware-related variables follow [Pan Docs: Power Up Sequence](https://gbdev.io/pandocs/#power-up-sequence)

> The character 「[匣](http://dict.revised.moe.edu.tw/cgi-bin/cbdic/gsweb.cgi?o=dcbdic&searchid=W00000006902)」 (as in 卡匣, "cartridge") means "a small box for storing objects". Its only reading is ==ㄒㄧㄚˊ==; it is commonly misread as **ㄐㄧㄚˊ**. Example: 「墨水匣」, a printer ink cartridge

```cpp=87
    if (init_io) {
        /* init lcd */
        if (!init_window(&vm->lcd))
            return false;
        ...
```

```cpp
bool init_window(gb_lcd *lcd)
{
    SDL_Init(SDL_INIT_VIDEO);
    lcd->win = SDL_CreateWindow("jitboy", SDL_WINDOWPOS_UNDEFINED,
                                SDL_WINDOWPOS_UNDEFINED, 160 * 3, 144 * 3,
                                SDL_WINDOW_OPENGL);
    if (!lcd->win) {
        LOG_ERROR("Window could not be created! SDL_Error: %s\n",
                  SDL_GetError());
        return false;
    }
    lcd->vblank_mutex = SDL_CreateMutex();
    lcd->vblank_cond = SDL_CreateCond();
    lcd->exit = false;
    lcd->thread = SDL_CreateThread(render_thread_function, "Render Thread",
                                   (void *) lcd);
    return true;
}
```

* What `init_window` does:
    * [`SDL_Init`](https://wiki.libsdl.org/SDL_Init) initializes the SDL library; `SDL_INIT_VIDEO` requests initialization of the video subsystem
    * [`SDL_CreateWindow`](https://wiki.libsdl.org/SDL_CreateWindow) creates the `SDL_Window` object that serves as the drawing canvas
    * [`SDL_CreateMutex`](https://wiki.libsdl.org/SDL_CreateMutex) creates an `SDL_mutex` for synchronization, locked and unlocked with [`SDL_LockMutex`](https://wiki.libsdl.org/SDL_LockMutex) and [`SDL_UnlockMutex`](https://wiki.libsdl.org/SDL_UnlockMutex)
    * [`SDL_CreateCond`](https://wiki.libsdl.org/SDL_CreateCond) creates an `SDL_cond`. With [`SDL_CondWait`](https://wiki.libsdl.org/SDL_CondWait) a thread can release the mutex and wait until it is signaled, after which it resumes and re-locks the mutex
    * [`SDL_CreateThread`](https://wiki.libsdl.org/SDL_CreateThread) creates an `SDL_Thread`; the thread runs `render_thread_function`, and `lcd` is the argument passed to that function

### `run_vm`

The structure of `run_vm` is fairly complex, so for this part I first traced the execution of `run_vm` with gdb while jitboy was running `mario.gb`, to get an idea of what the function does. On the first launch `vm->state.pc` is initialized to 0x100, so the following branch is taken:

```cpp=110
    /* compile next block / get cached block */
    if (vm->state.pc < 0x4000) { /* first block */
        if (vm->compiled_blocks[0][vm->state.pc].exec_count == 0) {
            if (!compile(&vm->compiled_blocks[0][vm->state.pc], &vm->memory,
                         vm->state.pc, vm->opt_level))
                goto compile_error;
        }
        LOG_DEBUG("execute function @%#x (count %i)\n", vm->state.pc,
                  vm->compiled_blocks[0][vm->state.pc].exec_count);
        vm->compiled_blocks[0][vm->state.pc].exec_count++;
        vm->state.pc =
            vm->compiled_blocks[0][vm->state.pc].func(&vm->state);
        LOG_DEBUG("finished\n");
    }
```

* On the first run, because each block's `exec_count`
is initially 0, execution enters the `if` on line 113 and `compile` is called
    * Once a block has been compiled, `compile` does not need to be called again to translate it
* Line 119 increments `exec_count`
* Line 120 then starts executing the translated block from its `f_start` position

:::info
:question: Open question: which part of the generated machine code produces the returned `vm->state.pc`?

Answer: every block ends at an unconditional jump-type instruction, so in the normal case the address after the jump is returned, for example by the `return` that `inst_jp` can generate
:::

```cpp=170
    do {
        if (vm->state.inst_count >= vm->state.next_update) {
            /* check interrupts */
            update_ioregs(&vm->state);

            if (vm->memory.mem[0xff44] == 144) {
                if (vm->draw_frame) {
                    unsigned time = SDL_GetTicks();
                    if (!SDL_TICKS_PASSED(time, vm->next_frame_time)) {
                        SDL_Delay(vm->next_frame_time - time);
                    }

                    vm->time_busy += time - vm->last_time;
                    vm->last_time = SDL_GetTicks();

                    if (++(vm->frame_cnt) == 60) {
                        vm->frame_cnt = 0;
                        float load = (vm->time_busy) / (60 * 17.0);
                        char title[15];
                        sprintf(title, "load: %.2f", load);
                        SDL_SetWindowTitle(vm->lcd.win, title);
                        vm->time_busy = 0;
                    }

                    vm->next_frame_time += 17; /* 17ms until next frame */
                    SDL_CondBroadcast(vm->lcd.vblank_cond);
                    vm->draw_frame = false;
                }
            } else {
                vm->draw_frame = true;
            }
            ...
```

* Execution enters the `do...while()` loop. In the first iteration, at line 171, `vm->state.inst_count = 5` (NOP: 1 + JP: 4) and `vm->state.next_update = 0`
    * Compare with the [Gameboy CPU (LR35902) instruction set](https://www.pastraiser.com/cpu/gameboy/gameboy_opcodes.html), but note that the cycles in the program are machine cycles while the link lists clock cycles, so the listed values must be divided by 4
* `update_ioregs` updates the relevant MMIO registers according to the accumulated `inst_count`
* Line 175 checks whether LCDY (`0xff44`) == 144. If so, and `vm->draw_frame` is true, the display is about to enter V-Blank mode and the Game Boy screen emulated through SDL must be updated:
    * Line 177: [`SDL_GetTicks()`](https://wiki.libsdl.org/SDL_GetTicks) reads the current counter
    * Line 178: [`SDL_TICKS_PASSED`](https://wiki.libsdl.org/SDL_TICKS_PASSED) ensures that frames are at least 17 ms apart; otherwise [`SDL_Delay`](https://wiki.libsdl.org/SDL_Delay) makes up the difference
    * The branch at line 185 computes the frame-update load and displays it in the window title (details omitted for now)

```cpp=202
        uint16_t interrupt_addr = start_interrupt(&vm->state);
        if (interrupt_addr) {
            LOG_DEBUG("interrupt from %i to %i\n", vm->state.pc,
                      interrupt_addr);

            /* end halt mode */
            if (vm->state.halt == 1)
                vm->state.halt = 0;

            /* save PC to stack */
            vm->state._sp -= 2;
            *(uint16_t *) (&vm->state.mem->mem[vm->state._sp]) = vm->state.pc;

            // jump to interrupt address
            vm->state.pc = interrupt_addr;
        }
        ...
```

* `start_interrupt` determines whether an interrupt is enabled, sets `state->trap_reason`, and returns the address of the ISR
* Line 208: halt mode ends when an interrupt occurs
* Line 212: the stack pointer moves by 2 bytes, the current pc is saved on the stack, and pc is set to the ISR

```cpp=218
        if (vm->state.halt == WAIT_STAT3 &&
            (vm->memory.mem[0xff41] & 0x3) == 0x3) {
            vm->state.halt = 0; /* end wait for stat mode 3 */
        }

        if (vm->state.halt == WAIT_LY &&
            vm->memory.mem[0xff44] == vm->state.halt_arg) {
            vm->state.halt = 0; /* end wait for ly */
        }
```

:::danger
:question: Open question: what are these special halt states?
:::

```cpp=227
            vm->state.next_update = next_update_time(&vm->state);
        }

        if (vm->state.halt != 0) {
            vm->state.inst_count =
                (vm->state.inst_count < vm->state.next_update ?
                     vm->state.next_update : vm->state.inst_count + 16);
        }
    } while (vm->state.halt != 0);
```

:::danger
:question: Open question: how `next_update_time` and `inst_count` are updated
:::

### `compile`

```cpp
/* compiles block starting at start_address to gb_block */
bool compile(gb_block *block,
             gb_memory *mem,
             uint16_t start_address,
             int opt_level)
{
    LOG_DEBUG("compile new block @%#x\n", start_address);

    GList *instructions = NULL;

    uint16_t i = start_address;
    for (;;) {
        gbz80_inst *inst = g_new(gbz80_inst, 1);
        uint8_t opcode = mem->mem[i];
        if (opcode != 0xcb) {
            *inst = inst_table[opcode];
        } else {
            opcode = mem->mem[i + 1];
            *inst = cb_table[opcode];
        }
        inst->args = mem->mem + i;
        inst->address = i;
        i += inst->bytes;

        if (inst->opcode == ERROR) {
            LOG_ERROR("Invalid Opcode! (%#x)\n", opcode);
            return false;
        }

        LOG_DEBUG("inst: %i @%#x\n", inst->opcode, inst->address);
        instructions = g_list_prepend(instructions, inst);

        if (inst->flags & INST_FLAG_ENDS_BLOCK)
            break;
    }
```

* `compile` translates the code in `mem` that starts at `start_address` and stores the result in `block`
* [`g_new`](https://developer.gnome.org/glib/unstable/glib-Memory-Allocation.html#g-new) allocates the space needed for `inst`

> About [Glib](https://en.wikipedia.org/wiki/GLib):
> * [Glib 就是懶．資料處理好手 - GList 雙向鏈結(Doubly-Linked)](https://fred-zone.blogspot.com/2009/01/glib-glist-doubly-linked.html)

* The for loop examines the opcode byte to determine which instruction is in the ROM, handling both ordinary opcodes and the 2-byte `PREFIX CB` family
* Since each instruction records how many bytes it occupies, the loop can decode the instructions and their arguments one by one, and [`g_list_prepend`](https://developer.gnome.org/glib/stable/glib-Doubly-Linked-Lists.html#g-list-prepend) chains the `gbz80_inst` objects together in a doubly-linked list
* As described in [jitboy/README.md](https://github.com/sysprog21/jitboy/blob/master/README.md), unconditional jump-type instructions carry the `INST_FLAG_ENDS_BLOCK` flag, which ends the loop and completes the instructions of one block

> Every unconditional jump (JP, CALL, RST, RET, RETI), as well as EI (Enable Interrupts) terminate a block.

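
The block-scanning idea described above can be tried in isolation. The following is a minimal sketch rather than jitboy's actual code: `toy_inst`, `toy_lookup`, `toy_scan_block` and `toy_rom` are hypothetical names introduced only for illustration, and the four-entry opcode switch is a drastically reduced stand-in for `inst_table` / `cb_table`. It assumes only what the text states: each instruction knows its length in bytes, `0xcb` introduces a 2-byte instruction, and an instruction flagged as block-ending (here only `JP a16`) stops the scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; only the "ends block" bit matters here. */
enum { TOY_FLAG_NONE = 0x00, TOY_FLAG_ENDS_BLOCK = 0x10 };

typedef struct {
    uint8_t bytes; /* total length, including opcode and CB prefix */
    uint8_t flags;
    uint8_t valid; /* 0 marks an opcode this sketch does not know */
} toy_inst;

/* Reduced stand-in for a full opcode table lookup. */
static toy_inst toy_lookup(const uint8_t *mem, uint16_t addr)
{
    switch (mem[addr]) {
    case 0x00: return (toy_inst){1, TOY_FLAG_NONE, 1};       /* NOP */
    case 0x3e: return (toy_inst){2, TOY_FLAG_NONE, 1};       /* LD A, d8 */
    case 0xc3: return (toy_inst){3, TOY_FLAG_ENDS_BLOCK, 1}; /* JP a16 */
    case 0xcb: return (toy_inst){2, TOY_FLAG_NONE, 1};       /* PREFIX CB */
    default:   return (toy_inst){0, TOY_FLAG_NONE, 0};
    }
}

/* Scan one block starting at `start`. Returns the number of instructions
 * in the block and stores the address of its last byte in *end_address.
 * Returns -1 on an unknown opcode or when the block runs past `len`. */
static int toy_scan_block(const uint8_t *mem, size_t len, uint16_t start,
                          uint16_t *end_address)
{
    assert(mem && end_address);
    int count = 0;
    uint16_t i = start;
    for (;;) {
        if (i >= len)
            return -1;
        toy_inst inst = toy_lookup(mem, i);
        if (!inst.valid || (size_t) i + inst.bytes > len)
            return -1;
        i += inst.bytes;
        count++;
        if (inst.flags & TOY_FLAG_ENDS_BLOCK)
            break;
    }
    *end_address = (uint16_t) (i - 1);
    return count;
}

/* NOP; LD A,0x12; CB 37 (SWAP A); JP 0x0150; NOP (start of the next block) */
static const uint8_t toy_rom[] = {0x00, 0x3e, 0x12, 0xcb, 0x37,
                                  0xc3, 0x50, 0x01, 0x00};
```

Calling `toy_scan_block(toy_rom, sizeof(toy_rom), 0, &end)` walks four instructions and stops at the `JP`, leaving the trailing `NOP` for the next block. Unlike the sketch, the real `compile` also records each decoded instruction in a list so that it can be optimized and emitted afterwards.
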
```cpp
    instructions = g_list_reverse(instructions);

    if (!optimize_block(&instructions, opt_level))
        return false;

    if (!optimize_cc(instructions))
        return false;

    bool result = emit(block, instructions);

    g_list_free_full(instructions, g_free);

    return result;
}
```

* [`g_list_reverse`](https://developer.gnome.org/glib/stable/glib-Doubly-Linked-Lists.html#g-list-reverse) reverses the doubly-linked list; because the instructions were added with `g_list_prepend`, this restores program order
* `optimize_block` applies the optimization strategies mentioned in the original project (skipped for now)
* `optimize_cc`: purpose not yet worked out
* `emit` translates the gbZ80 instructions of the block into x64 instructions at runtime

### `emit`

`emit` is the key part that converts gbZ80 instructions into x86 instructions. It is written in [emit.dasc](https://github.com/sysprog21/jitboy/blob/master/src/emit.dasc) and goes through DynASM preprocessing so that code can be generated at runtime.

```cpp=989
bool emit(gb_block *block, GList *inst)
{
    dasm_State *d;
    uint32_t npc = 4;
    uint64_t cycles = 0;
    uint16_t end_address = 0;

    |.section code
    dasm_init(&d, DASM_MAXSECTION);

    |.globals lbl_
    void *labels[lbl__MAX];
    dasm_setupglobal(&d, labels, lbl__MAX);

    |.actionlist gb_actions
    dasm_setup(&d, gb_actions);

    dasm_growpc(&d, npc);

    dasm_State **Dst = &d;
    |.code
    |->f_start:
    | prologue
```

* `dasm_init` initializes the `dasm_State *` object
* [`.globals lbl_`](http://corsix.github.io/dynasm-doc/reference.html#_globals) is turned by the DynASM preprocessor into an enum whose members all carry the `lbl_` prefix. It contains an always-present `lbl__MAX`, and every label written with the `->label` syntax is mapped to a value of the enum, after which [`dasm_setupglobal`](http://corsix.github.io/dynasm-doc/reference.html#dasm_setupglobal) can perform the initialization
    * See this explanation: [link](http://corsix.github.io/dynasm-doc/tutorial.html#dasm_setupglobal)
* `dasm_setup` initializes `gb_actions`
* `dasm_growpc` sets the number of dynamic (`=>n`) labels that may be used; `npc` is 4 here. Note that `->f_start:` is a global label and `1:`, `2:`, `3:` are local labels, which DynASM tracks separately from the dynamic labels
* For line 1008, see this explanation: [link](http://corsix.github.io/dynasm-doc/tutorial.html#prologue)
* Defining `dasm_State **Dst = &d` lets the DynASM preprocessor translate the code generation that follows into `dasm_put(Dst, ...)` calls
* `.code` corresponds to `.section code`, so all machine code generated afterwards goes into this section
* `->f_start:` makes the address of the label available to the generated machine code so that it can be converted into a function pointer
* [`prologue`](https://github.com/sysprog21/jitboy/blob/master/src/dasm_macros.inc#L109) is a macro. Executing the machine code of a block can be thought of as executing a function call, so some preparation is needed when a block is entered:

```
|.macro prologue
    | push rbx
    | push rsp
    | push rbp
    | push r12
    | push r13
    | push r14
    | push r15
```

1. Back up registers on the stack

:::info
:question: Why is it enough to push only these registers, and why do these have to be pushed?

:paperclip: According to page 16, **3.2.1 Registers and the Stack Frame**, of the [System V Application Binary Interface AMD64 Architecture Processor Supplement](https://www.uclibc.org/docs/psABI-x86_64.pdf):
> *Registers %rbp, %rbx and %r12 through %r15 “belong” to the calling function and the called function is required to preserve their values. In other words, a called function must preserve these registers’ values for its caller.*

Compare with the table below:
![](https://i.imgur.com/ORO8zar.png)
:::

:::info
After building with `make clean debug` and running `jitboy`, `/tmp/jitcode?` files are produced. Then run
```shell
$ objdump -D -b binary -mi386 -Mx86-64 /tmp/jitcode?
```
to inspect the generated code, paying attention to the requirements of the x86-64 ABI. Some of the `pushfq` instructions are unnecessary and can be removed.
:notes: jserv
:::

```
    | mov aState, rArg1
    | mov xA, 0
    | mov A, state->a
    | mov xB, 0
    | mov B, state->b
    | mov xC, 0
    | mov C, state->c
    | mov xD, 0
    | mov D, state->d
    | mov xE, 0
    | mov E, state->e
    | mov xH, 0
    | mov H, state->h
    | mov xL, 0
    | mov L, state->l
    | mov xSP, 0
    | mov SP, state->_sp
    | mov tmp1, state->mem
    | mov aMem, [tmp1 + offsetof(gb_memory, mem)]
    | mov tmp2, state->flags
    | push tmp2
|.endmacro
```

2-1. `state` is defined by [`|.type state, gb_state, aState`](http://corsix.github.io/dynasm-doc/reference.html#_type), which makes it possible to write `state->??` as shorthand for `[aState + offsetof(gb_state,??)]`

2-2. `rArg1(rdi)` is the first argument of the function. It should be the address of `state`, the `gb_state` structure that records the state of the Game Boy (the block is executed by a statement such as `vm->compiled_blocks[n][vm->state.pc].func(&vm->state);`), so it is loaded into `aState`

| Game Boy | x86-64   | comment |
|----------|----------|---------|
| A        | r0 (rax) | accumulator |
| F        | -        | generated dynamically from the `FLAGS` register |
| B        | r1 (rcx) | |
| C        | r2 (rdx) | |
| D        | r3 (rbx) | |
| E        | r13      | |
| H        | r5 (rbp) | |
| L        | r6 (rsi) | |
| SP       | r7 (rdi) | |
| PC       | -        | not necessary |
| -        | r8       | base address of Game Boy address space |
| -        | r9       | address of struct `gb_state` |
| -        | r10      | temporary register |
| -        | r11      | temporary register |
| -        | r12      | temporary register |
| -        | r4 (rsp) | host stack pointer |

2-3. `state` records the current state of the Game Boy (general-purpose registers, stack pointer, and so on), so the x86_64 registers must be updated according to the mapping above
* The registers a through l and SP are loaded into the corresponding x86-64 registers
* `aMem` (r8), the base address of the Game Boy address space, must be loaded as well

:::danger
```cpp
enum {
    INST_FLAG_NONE = 0x00,
    INST_FLAG_PRESERVE_CC = 0x01,
    INST_FLAG_PERS_WRITE = 0x02,
    INST_FLAG_USES_CC = 0x04,
    INST_FLAG_AFFECTS_CC = 0x08,
    INST_FLAG_ENDS_BLOCK = 0x10,
    INST_FLAG_SAVE_CC = 0x20,
    INST_FLAG_RESTORE_CC = 0x40
} flags;
```
:question: Open question: the purpose of `flags` in `gb_state` and why it needs to be pushed onto the stack
:::

```cpp=1013
    for (; inst; inst = inst->next) {
        end_address = DATA(inst)->address + DATA(inst)->bytes - 1;

        if (DATA(inst)->flags & INST_FLAG_RESTORE_CC) {
            | popfq
            | pushfq
        }

        switch (DATA(inst)->opcode) {
        ...
```

```cpp=1190
        ...
        }

        if (DATA(inst)->flags & INST_FLAG_SAVE_CC) {
            | pop tmp1
            | pushfq
        }
    }
```

Once the registers are mapped, the loop can process the instructions in the linked list one by one:
* `DATA(inst)` extracts the `gbz80_inst *` structure from the linked list, and `DATA(inst)->address + DATA(inst)->bytes - 1` computes `end_address`, the last address covered by the instruction
    * As the loop keeps updating it, `end_address` ends up as the end address of the last instruction, which is also the end address of the block

:::danger
:question: Open question: how lines 1016 and 1193 treat the CPU state when the flags contain `INST_FLAG_RESTORE_CC` and `INST_FLAG_SAVE_CC`
:::

* The details of how each gbz80 instruction is translated into x86 instructions are skipped for now

```cpp=1199
    | add qword state->inst_count, cycles
    | return -1

    size_t sz;
    if (dasm_link(&d, &sz) != 0) {
        LOG_ERROR("dasm_link failed\n");
        goto exit_fail;
    }

    void *buf = mmap(0, sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (!buf) {
        LOG_ERROR("could not allocate memory for JIT compilation\n");
        goto exit_fail;
    }

    if (dasm_encode(&d, buf) != 0) {
        LOG_ERROR("dynasm_encode failed\n");
        goto exit_fail;
    }

    if (mprotect(buf, sz, PROT_READ | PROT_EXEC) != 0) {
        LOG_ERROR("could not make compiled function executable\n");
        goto exit_fail;
    }
```

* After the whole block has been translated, `state->inst_count` must be updated by adding the total number of cycles these instructions take
* `return` marks the end of a block: it stores the state of the mapped x86_64 registers back into the `gb_state` structure and restores the registers that were backed up on the stack at the start
* The rest of the flow resembles the very first example: produce the machine code and set its read/write/execute permissions
* As mentioned earlier, every block ends at an unconditional jump-type instruction, so in the normal case the return happens through that instruction. If execution reaches `return -1`, an unexpected error has presumably occurred

```cpp=1224
    block->func = labels[lbl_f_start];
    block->mem = buf;
    block->size = sz;
    block->end_address = end_address;
    block->exec_count = 0;

    dasm_free(&d);
```

* `block->func` is assigned `labels[lbl_f_start]`, the address of the `f_start` label
* `block->mem` is the generated executable machine code
* The code size of the block and its `end_address` are recorded as well
* `exec_count` is 0 when the block is first translated. When the same block is reached again it does not need to be translated a second time, and the accumulated `exec_count` tells how many times the block has been executed

### `update_ioregs`

jitboy emulates the Game Boy timer by means of a counter (`inst_count`), and `update_ioregs` performs the corresponding updates based on this emulated timer.

```cpp=35
    int clock[] = {256, 4, 16, 64};
    int cl = clock[mem[0xff07] & 0x3];
```

* `0xff07` is the timer control register. Masking its low two bits selects the configured timer frequency, expressed as
the number of machine cycles between two timer updates

```cpp=38
    if (state->inst_count > state->tima_count + cl) {
        state->tima_count = state->inst_count;
        if (mem[0xff07] & 0x4) {
            mem[0xff05]++;
            if (mem[0xff05] == 0) {
                mem[0xff05] = mem[0xff06];
                // timer interrupt selected
                mem[0xff0f] |= 0x04;
            }
        }
    }
```

* `state->inst_count > state->tima_count + cl` means that, after `run_vm` has finished the current block, enough machine cycles have accumulated for the timer register (`0xff05`) to need an update
* `(mem[0xff07] & 0x4) != 0` means the timer is enabled
* `mem[0xff05] == 0` indicates an overflow; the timer is then reloaded from the timer modulo register (`0xff06`) and the timer (`TIMA`) interrupt flag is raised

:::danger
:question: Open question: with this logic, it seems that each newly executed block increments the timer by at most 1 (`mem[0xff05]++`). Is it not possible that the execution of one block should advance the timer by more than that?
:::

```cpp=50
    /* div-register 0xff04 */
    if (state->inst_count > state->div_count + cl) {
        state->div_count = state->inst_count;
        mem[0xff04]++;
    }
```

:::danger
:question: Why does the divider follow the configured timer frequency instead of being checked against 64 machine cycles?
:::

> The code here is unfinished.
> :notes: jserv

```cpp=56
    /* reset the coincidence flag */
    mem[0xff41] &= ~0x04;
```

* First clear the LCD Status coincidence flag

```cpp=59
    // ly-register 0xff44
    if (state->inst_count > state->ly_count + 114) {
        state->ly_count = state->inst_count;

        if (mem[0xff44] < 144)
            update_line(mem);

        mem[0xff44]++;
        mem[0xff44] %= 153;

        if (mem[0xff45] == mem[0xff44]) {
            /* Set the coincidence flag */
            mem[0xff41] |= 0x04;

            /* Coincidence interrupt selected */
            if (mem[0xff41] & 0x40)
                mem[0xff0f] |= 0x02; /* stat interrupt occurs */
        }

        /* if-register 0xff0f */
        if (mem[0xff44] == 144) {
            /* VBLANK interrupt is pending */
            mem[0xff0f] |= 0x01;

            /* mode 1 interrupt selected */
            if (mem[0xff41] & 0x10 && (mem[0xff41] & 0x03) != 1)
                mem[0xff0f] |= 0x02; /* stat interrupt occurs */

            /* LCDC Stat mode 1 */
            mem[0xff41] &= ~0x03;
            mem[0xff41] |= 0x01;
        }
    }
```

![](https://i.imgur.com/n84cpEB.png)
> Image source: [The Ultimate Game Boy Talk](https://media.ccc.de/v/33c3-8029-the_ultimate_game_boy_talk)

![](https://i.imgur.com/lvj9wOG.png)
> [gbdev: FF41 - STAT - LCDC Status
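(R/W)](https://gbdev.gg8.se/wiki/articles/Video_Display#FF41_-_STAT_-_LCDC_Status_.28R.2FW.29)

* Each scanline takes 114 machine cycles (P.S. the figure says clocks because the original talk uses a different definition)
* When `mem[0xff44]` < 144, `update_line` is called to update that scanline of the picture
* Line 66 updates LCDY (`0xff44`)
* Line 69: if LCDY compare = LCDY, the LCD Status coincidence flag is raised
    * The LCDC interrupt has several possible causes, one of which is LCDY compare = LCDY. Line 74 checks whether the LYC=LY coincidence interrupt is enabled and, if so, raises the `LCDC` interrupt flag
* Line 79: when `mem[0xff44] == 144` the display switches to V-Blank, and line 81 raises the flag that generates the `VBlank` interrupt
    * Line 84: `mem[0xff41] & 0x10` means the Mode 1 V-Blank interrupt is enabled, in which case the `LCDC` interrupt flag is raised (note that this is not the `VBlank` interrupt)
    * Line 87 sets the mode in LCD status `0xff41` to 1 (V-Blank mode)

:::danger
:question: Open question: as with the earlier question, `mem[0xff44]++;` seems to mean that the execution of one block can advance at most one scanline.

To summarize the issue: as far as I currently understand the code, outside halt mode the `do...while()` loop runs only a single iteration. This would mean that each block execution calls `update_ioregs()` at most once, updating `tima-register(0xff05)`, `div-register(0xff04)` and `ly-register(0xff44)` by a single `++`, because the counters in the if statements are always updated in the form `state->??_count = state->inst_count;` (I have not verified this rigorously; tracing the execution is needed to settle it)
:::

> The function `update_ioregs` calls `update_line`, which is defined in `lcd.c`. Read it together with the implementation of `render_back`: the picture update depends on LY and SCROLL Y, and the range that can be updated is determined mainly by the valid sprites.
> :notes: jserv

:::danger
:question: Open question: is `%= 153` related to the update needing clocks for a total of only 144 + 10 = 154 lines? If so, why is it not `%= 154`?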
:::

> The code is clearly wrong; pull requests are welcome.
> [gameboy-emu/gameboy.h](https://github.com/sysprog21/gameboy-emu/blob/master/gameboy.h) defines `LCD_VERT_LINES`. You can study the related code and then improve `jitboy`.
> :notes: jserv

> :+1: OK, I will try to fix this part completely once I have checked it against the documentation and the code

```cpp=93
    if (mem[0xff44] < 144) {
        /* if not VBLANK */
        if (state->inst_count - state->ly_count < 20) {
            /* mode 2 interrupt selected */
            if (mem[0xff41] & 0x20 && (mem[0xff41] & 0x03) != 2)
                mem[0xff0f] |= 0x02; /* stat interrupt occurs */

            /* LCDC Stat mode 2 */
            mem[0xff41] &= ~0x03;
            mem[0xff41] |= 0x02;
        } else if (state->inst_count - state->ly_count < 63) {
            /* LCDC Stat mode 3 */
            mem[0xff41] &= ~0x03;
            mem[0xff41] |= 0x03;
        } else {
            /* mode 0 interrupt selected */
            if (mem[0xff41] & 0x08 && (mem[0xff41] & 0x03) != 0)
                mem[0xff0f] |= 0x02; /* stat interrupt occurs */

            /* LCDC Stat mode 0 */
            mem[0xff41] &= ~0x03;
        }
    }
}
```

* When `mem[0xff44]` < 144:
    * Line 95: fewer than 20 cycles into the line means Mode 2 (OAM search); if the Mode 2 OAM interrupt is enabled, raise it
    * Line 103: fewer than 63 cycles means pixel transfer is in progress
    * Otherwise it is Mode 0 (H-Blank); if the Mode 0 H-Blank interrupt is enabled, raise it

### `render_back`

Describing this code in prose can get confusing :laughing:, so let me first summarize what the function does:

In short, the Game Boy holds a 256 x 256 pixel picture, of which only 160 x 144 pixels are actually displayed. We therefore need to work out which tiles fall inside that range (and not necessarily exactly 160 x 144 / (8 x 8) = 20 x 18 tiles,
because Scroll X/Y moves in units of pixels) and then map them onto the SDL buffer we use to emulate the screen. The seemingly complicated parameters exist to serve this logic.

```cpp=3
static void render_back(uint32_t *buf, uint8_t *addr_sp)
{
    uint32_t pal_grey[] = {0xffffff, 0xaaaaaa, 0x555555, 0x000000};

    /* point to tile map */
    uint8_t *ptr_map = addr_sp;
    if (addr_sp[0xff40] & 0x8)
        ptr_map += 0x9c00;
    else
        ptr_map += 0x9800;
```

* `pal_grey` runs from white (`0xffffff`) at index 0 to black (`0x000000`) at index 3
* `ptr_map` must point at the tile map; bit 3 (counting the LSB as bit 0) of LCD Control (`0xFF40`) tells whether the background uses the low or the high tile map

```cpp=14
    /* Current line + SCROLL Y */
    uint8_t y = addr_sp[0xff44] + addr_sp[0xff42];
    /* SCROLL X */
    int j = addr_sp[0xff43];
    uint8_t x1 = j >> 3;

    /* Advance to row in tile map */
    ptr_map += ((y >> 3) << 5) & 0x3ff;
```

* From LCD Y (`0xff44`) and Scroll Y (`0xff42`), compute the y coordinate of the current scanline of the 160 x 144 visible area within the whole 256 x 256 pixel picture in VRAM
* Each tile is 8 x 8 pixels, so line 18 computes the tile map index offset in the x direction as `(x / 8)`
* Since the whole picture is 256 x 256 pixels, line 21 computes the tile map index offset in the y direction as `(y / 8) * (256 / 8)` (`& 0x3ff` wraps the index back to the start when it exceeds 1023, because there are 32 x 32 tiles in total)

```cpp=23
    int i = addr_sp[0xff44] * 160;  // 0;
    j &= 7;
    uint8_t x = 8 - j;
    uint8_t shift_factor = ((uint8_t)(~j)) % 8;
```

* `i` is the index into the buffer that SDL needs. The buffer is 160 x 144, so `* 160` moves to the row to be updated
* `x` is the number of pixels (1 to 8) of the first tile that actually have to be drawn given the pixel offset; this is needed because the display works in pixels, not tiles
* `shift_factor` is simply `x - 1`

```cpp=27
    for (; x < 168; x += 8) {
        uint16_t tile_num = ptr_map[x1++ & 0x1f];
        if (!(addr_sp[0xff40] & 0x10))
            tile_num = 256 + (signed char) tile_num;
        /* point to tile.
         * Each tile is 8 * 8 * 2 = 128 bits = 16 bytes
         */
        uint8_t *ptr_data = addr_sp + 0x8000 + (tile_num << 4);
        /* point to row in tile depending on LY and SCROLL Y.
         * Each row is 8 * 2 = 16 bits = 2 bytes
         */
        ptr_data += (y & 7) << 1;
        for (; j < 8 && (x + j) < 168; shift_factor--, j++) {
            uint8_t indx = ((ptr_data[0] >> shift_factor) & 1) |
                           ((((ptr_data[1] >> shift_factor)) & 1) << 1);

            /* if bit 0 in LCDC is not set, screen is blank */
            buf[i] = (addr_sp[0xff40] & 0x01) ?
                         pal_grey[(addr_sp[0xff47] >> (indx << 1)) & 3] :
                         (unsigned) -1;
            i++;
        }
        j = 0;
        shift_factor = 7;
    }
```

* The computed tile map index yields `tile_num` (`& 0x1f` wraps the index back to the start when it exceeds 31, because a row holds 32 tiles)
* Line 29 uses bit 4 (counting the LSB as bit 0) of LCD Control (`0xFF40`) to tell whether the tiles are stored in the low bank or the high bank
    * When addressing the low bank the tile index is 0~255, while for the high bank it is -128~127, so `tile_num` has to be corrected by the difference between the two
* Line 34 then obtains the tile's address from `tile_num` (each tile is 16 bytes, hence `<< 4`)
* Line 38 computes the offset in the y direction within the tile
    * At this point `ptr_data` points at the bytes holding the target pixel (and the pixels that follow, since every 2 bytes encode 8 pixels)
* The for loop at line 39 maps the pixels of these bytes that should be drawn to the palette and writes them into the SDL buffer

:::warning
The code above is somewhat messy. By comparison, `__gb_draw_line` in [gameboy.h](https://github.com/sysprog21/gameboy-emu/blob/master/gameboy.h) is much cleaner; perhaps this can be rewritten or refactored later.
:notes: jserv
:::

> :+1: I will try to improve this part after comparing and understanding the code!

```cpp=52
    if (addr_sp[0xff40] & 0x20) {
        uint8_t wx = addr_sp[0xff4b] - 7;
        uint8_t wy = addr_sp[0xff4a];
        y = addr_sp[0xff44];  // current line to update

        uint8_t *tile_map_ptr = addr_sp +
                                ((addr_sp[0xff40] & 0x08) ? 0x9800 : 0x9c00) +
                                (y - wy) / 8 * 32;
        uint8_t *tile_data_ptr =
            addr_sp + ((addr_sp[0xff40] & 0x10) ? 0x8000 : 0x9000);
        i = y * 160;
```

* `addr_sp[0xff40] & 0x20` means the window is enabled, so the window has to be drawn
* `y` is the line to update; as before, `* 160` moves to the position `i` of that line in the SDL buffer
* Line 57 computes the window's tile map index in the y direction and moves `tile_map_ptr` to that position in the tile map
    * `0xff4a` is Window Y Position while `0xff4b` is Window X Position minus 7, so the calculation needs some care

:::info
I think this is a mistake? Following the Game Boy development documentation, line 57 should read

```cpp
uint8_t *tile_map_ptr = addr_sp + ((addr_sp[0xff40] & 0x40) ? 0x9c00:0x9800) + (y - wy) / 8 * 32;
```

([gameboy.h line 1199](https://github.com/sysprog21/gameboy-emu/blob/master/gameboy.h#L1199) uses similar logic)
:::

> Confirmed. Please send pull request.

* `tile_data_ptr` is adjusted depending on whether the tiles live in the low bank or the high bank

```cpp=64
        for (x = 0; x < 160; ++x) {
            if (x < wx || y < wy)
                continue;

            uint8_t *tile = tile_data_ptr +
                            16 * ((addr_sp[0xff40] & 0x10) ?
                                  tile_map_ptr[(x - wx) / 8] :
                                  (int8_t) tile_map_ptr[(x - wx) / 8]);
            tile += (y - wy) % 8 * 2;

            int col = ((*tile >> (7 - (x - wx) % 8)) & 1) +
                      (((*(tile + 1) >> (7 - (x - wx) % 8)) & 1) << 1);

            buf[i + x] = pal_grey[(addr_sp[0xff47] >> (col << 1)) & 3];
        }
    }
```

* The for loop updates the pixels of the scanline one by one
* The update logic is close to the background's, but the code is far more concise; perhaps the background part could be refactored along the same lines (?)
* Note that lines 70 and 71 differ in whether the encoding represents 0 ~ 255 or -128 ~ 127

```cpp=81
    /* TODO: prioritize sprite */
    if ((addr_sp[0xff40] & 0x02) == 0)
        return;
```

* If sprite display is not enabled, return

```cpp=84
    bool sprite_8x16_mode = (bool) addr_sp[0xff40] & 0x04;
    y = addr_sp[0xff44];

    for (int sprite = 0; sprite < 40; ++sprite) {
        int sposy = addr_sp[0xfe00 + 4 * sprite] - 16;
        int sposx = addr_sp[0xfe01 + 4 * sprite] - 8;
        /* TODO: support 8x16 sprites */
        uint8_t flags = addr_sp[0xfe03 + 4 * sprite];
        uint8_t tile_idx =
            sprite_8x16_mode
                ? ((flags & 0x40) ? addr_sp[0xfe02 + 4 * sprite] | 0x01
                                  : addr_sp[0xfe02 + 4 * sprite] & ~0x01)
                : addr_sp[0xfe02 + 4 * sprite];
        uint8_t obp = ((flags & 0x10) ? addr_sp[0xff49] : addr_sp[0xff48]);
```

* Line 84 uses LCDC to decide whether sprites are 8x8 or 8x16 (note that the `(bool)` cast binds tighter than `&`, so as written the expression always evaluates to 0)
* From `0xFE00` to `0xFE9F`, every 4 bytes describe one sprite; the for loop handles the sprites in OAM one by one
    * The first byte is the y position and the second byte is the x position
    * The -16 and -8 compensate for the offset of the stored coordinates, which lets an object be shown only partially on screen
* Line 91: `flags` holds the sprite's attributes
* Line 92 extracts the sprite's tile index
* Line 97 selects the palette according to the attributes

:::warning
:warning: The 8 x 16 tile handling may need fixing, so it is not explained for now
:::

```cpp=99
        if (sposy > y - 8 && sposy <= y) {
            /* sprite is displayed in a line */
            for (x = 0; x < 8; ++x) {
                int px_x = ((flags & 0x20) ? 7 - x : x) + sposx;
                int px_y = ((flags & 0x40) ?
                            7 - y + sposy : y - sposy);

                int col =
                    ((addr_sp[0x8000 + 16 * tile_idx + 2 * px_y] >> (7 - x)) & 1) +
                    (((addr_sp[0x8001 + 16 * tile_idx + 2 * px_y] >> (7 - x)) & 1)
                     << 1);

                /* skip transparent pixels and columns outside the screen */
                if (col != 0 && px_x >= 0 && px_x < 160) {
                    if (!(flags & 0x80) || buf[y * 160 + px_x] == pal_grey[0])
                        buf[y * 160 + px_x] = pal_grey[obp >> (col << 1) & 3];
                }
            }
        }
```

* If the sprite's y position lies between scanline - 8 and the scanline being updated, part of its tile has to be drawn on the LCD
* The for loop handles the pixels in order; `px_x` and `px_y` give the correct pixel position depending on whether the sprite is flipped
* `col` is the palette index of that pixel
* Line 112 checks that the pixel value is not 0 (pixel = 0 is reserved for transparency) and that `px_x` lies within the displayable range
* Line 113 updates the buffer when the sprite has priority over the background, or when the background is color 0 (white)

## Fixing the code

### Rewriting the background part of `render_back`

In the original code the background part is hard to read. As noted earlier, the window is updated in much the same way as the background; the main difference is only how LCD X/Y is combined with SCROLL X/Y or WINDOW X/Y to form the index. I therefore rewrote the background part following the structure of the window code.

```cpp
    uint8_t x;
    uint8_t y = addr_sp[0xff44];

    if (addr_sp[0xff40] & 0x01) {
        uint8_t scy = addr_sp[0xff42];
        uint8_t scx = addr_sp[0xff43];
        uint8_t *tile_map_ptr = addr_sp +
                                ((addr_sp[0xff40] & 0x8) ? 0x9c00 : 0x9800) +
                                ((y + scy) % 256) / 8 * 32;
        uint8_t *tile_data_ptr =
            addr_sp + ((addr_sp[0xff40] & 0x10) ? 0x8000 : 0x9000);
        int i = y * 160;

        for (x = 0; x < 160; ++x) {
            uint8_t *tile = tile_data_ptr +
                            16 * ((addr_sp[0xff40] & 0x10) ?
                                  tile_map_ptr[(((x + scx) % 256) / 8)] :
                                  (int8_t) tile_map_ptr[(((x + scx) % 256) / 8) & 0x1f]);
            tile += (((y + scy) % 256) % 8 * 2);

            int col = ((*tile >> ((7 - (((x + scx) % 256)) % 8))) & 1) +
                      (((*(tile + 1) >> ((7 - (((x + scx) % 256)) % 8))) & 1) << 1);

            buf[i + x] = pal_grey[(addr_sp[0xff47] >> (col << 1)) & 3];
        }
    }
```

:::info
Note: `lcd_render_current_line` in [koenk/gbc/lcd.c](https://github.com/koenk/gbc/blob/master/lcd.c#L103) has a structure that I personally find easier to follow, and I originally based my rewrite on it. I later decided that refactoring the code to resemble the window part was already readable enough and also shorter, so I dropped the koenk/gbc structure and kept avoidable code changes to a minimum.
:::

### Rewriting the sprite part of `render_back`

This change completes the implementation so that it matches Game Boy behavior: support for 8x16 mode, at most 10 objects drawn per scanline, priority decided by the x coordinate, and so on. I used Wario Land II, a ROM that uses 8x16 sprites, to observe the effect. The results of the change are shown below:

| Before                               | After                                |
|:------------------------------------ | ------------------------------------ |
| ![](https://i.imgur.com/k9P4B6y.png) | ![](https://i.imgur.com/MChWaNh.png) |
| ![](https://i.imgur.com/JcwSdzD.gif) | ![](https://i.imgur.com/BYUobDd.gif) |
| ![](https://i.imgur.com/8MVuPH2.gif) | ![](https://i.imgur.com/ZAfPF5L.gif) |

Explanation of the modified code:

```cpp
struct __attribute__((__packed__)) OAMentry {
    uint8_t y;
    uint8_t x;
    uint8_t tile;
    uint8_t flags;
};
```

* First define `OAMentry`, which makes it convenient to keep the OAM-related information

```cpp
    struct OAMentry *objs[10];
    int num_objs = 0;
    uint8_t obj_tile_height = addr_sp[0xff40] & 0x04 ?
                                  16 : 8;

    for (int i = 0; i < 40; i++) {
        struct OAMentry *obj = (struct OAMentry *) (addr_sp + 0xfe00 + 4 * i);
        if (obj->y > 0 && obj->y < 160) {
            if (y >= obj->y - 16 && y < obj->y - 16 + obj_tile_height) {
                uint8_t pos = num_objs;
                while (pos > 0 && objs[pos - 1]->x <= obj->x) {
                    if (pos < 10) {
                        objs[pos] = objs[pos - 1];
                    }
                    pos--;
                }
                objs[pos] = obj;
                num_objs++;
            }
        }
        if (num_objs >= 10)
            break;
    }
```

* First traverse the whole OAM and add the first 10 sprites that fall on the current line to `objs`
    * Sprites with lower priority are kept at lower indices
    * Note the condition `if (obj->y > 0 && obj->y < 160)`: a sprite whose `oam_x` is 0 or at least 168 should in theory not be drawn on screen, but it is still counted as an object

> This part of the implementation corresponds to the description of sprite handling in [Pan Docs - Sprite Priorities and Conflicts](https://gbdev.io/pandocs/#sprite-priorities-and-conflicts):
>
> * To keep unused sprites from affecting onscreen sprites, set their Y coordinate to Y = 0 or Y >= 160 (144 + 16) (Note : Y <= 8 also works if sprite size is set to 8x8). Just setting the X coordinate to X = 0 or X >= 168 (160 + 8) on a sprite will hide it, but it will still affect other sprites sharing the same lines.
>
> * During each scanline's OAM scan, the LCD controller compares LY to each sprite's Y position to find the 10 sprites on that line that appear first in OAM. It discards the rest, allowing only 10 sprites to be displayed on any one line
>
> * When these 10 sprites overlap, the highest priority one will appear above all others, etc. (Thus, no Z-fighting.) In CGB mode, the first sprite in OAM ($FE00-$FE03) has the highest priority, and so on. In Non-CGB mode, the smaller the X coordinate, the higher the priority. The tie breaker (same X coordinates) is the same priority as in CGB mode.

```cpp
    for (int sprite = 0; sprite < num_objs; ++sprite) {
        int sposy = objs[sprite]->y - 16;
        int sposx = objs[sprite]->x - 8;

        uint8_t flags = objs[sprite]->flags;
        uint8_t tile_idx =
            (obj_tile_height == 16)
                ? ((flags & 0x40) ? objs[sprite]->tile | 0x01
                                  : objs[sprite]->tile & ~0x01)
                : objs[sprite]->tile;
        uint8_t obp = ((flags & 0x10) ?
                           addr_sp[0xff49] : addr_sp[0xff48]);

        if (sposy > y - obj_tile_height && sposy <= y) {
            /* sprite is displayed in a line */
            for (x = 0; x < 8; ++x) {
                int px_x = ((flags & 0x20) ? 7 - x : x) + sposx;
                int px_y = ((flags & 0x40) ? obj_tile_height - 1 - y + sposy
                                           : y - sposy);

                int col =
                    ((addr_sp[0x8000 + 16 * tile_idx + 2 * px_y] >> (7 - x)) & 1) +
                    (((addr_sp[0x8001 + 16 * tile_idx + 2 * px_y] >> (7 - x)) & 1)
                     << 1);

                if (col != 0 && px_x >= 0 && px_x < 160) {
                    if (!(flags & 0x80) || buf[y * 160 + px_x] == pal_grey[0])
                        buf[y * 160 + px_x] = pal_grey[obp >> (col << 1) & 3];
                }
            }
        }
    }
```

* Next, the sprites collected in `objs` are drawn one by one
* The overall idea is the same as explained earlier. What is worth noting is how 8x16 mode is implemented: in 8 x 16 mode the lowest bit of the tile index is ignored; the first (even-numbered) tile forms the top half of the object and the second (odd-numbered) tile forms the bottom half

### Updating the divider

In `update_ioregs`, the update of the divider (`ff04`) was left unfinished in the original code. According to [gbdev-Timer and Divider Registers](https://gbdev.gg8.se/wiki/articles/Timer_and_Divider_Registers), it is incremented at 16384 Hz, the same rate as when bits 1-0 of timer control (`ff07`) are set to `11`, so line 51 is changed to:

```cpp=50
    /* div-register 0xff04 */
    if (state->inst_count > state->div_count + 64) {
        state->div_count = state->inst_count;
        mem[0xff04]++;
    }
```

### Updating LCDY

In `update_ioregs`, the LCDY update appears to be wrong. According to [gbdev-LCD Position and Scrolling](https://gbdev.gg8.se/wiki/articles/Video_Display#LCD_Position_and_Scrolling), LCDY takes values from 0 to 153, and values 144 to 153 are the V-Blank period, so line 67 is corrected:

```cpp=59
    // ly-register 0xff44
    if (state->inst_count > state->ly_count + 114) {
        state->ly_count = state->inst_count;

        if (mem[0xff44] < 144)
            update_line(mem);

        mem[0xff44]++;
        mem[0xff44] %= 154;
```

### Fixing the wrongly scrolled window

As the gif examples above show, odd little black dots appear in the clouds. The next gif shows the problem more clearly: the bottom of the coin icon in the upper window is moved along with the scrolling background!
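Before looking at the gif, the constants used in the divider and LCDY fixes above can be sanity-checked with a little arithmetic. The sketch below is not jitboy code: the function names are hypothetical, and it assumes the usual DMG figures (a 4194304 Hz clock, 4 clock cycles per machine cycle, 114 machine cycles per scanline, 154 lines per frame).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed DMG constants, taken from the public Game Boy docs, not from jitboy. */
#define CPU_CLOCK_HZ 4194304u /* clock cycles per second */
#define CLOCKS_PER_MCYCLE 4u  /* one machine cycle is 4 clock cycles */
#define MCYCLES_PER_LINE 114u /* machine cycles per scanline */
#define LCD_LINES 154u        /* 144 visible lines + 10 V-Blank lines */

/* Machine cycles between two increments of a counter ticking at freq_hz. */
uint32_t mcycles_per_tick(uint32_t freq_hz)
{
    assert(freq_hz != 0);
    return CPU_CLOCK_HZ / CLOCKS_PER_MCYCLE / freq_hz;
}

/* Next value of LY: it counts 0..153 and then wraps back to 0. */
uint8_t next_ly(uint8_t ly)
{
    return (uint8_t) ((ly + 1u) % LCD_LINES);
}

/* Machine cycles in one full frame, V-Blank included. */
uint32_t mcycles_per_frame(void)
{
    return MCYCLES_PER_LINE * LCD_LINES;
}
```

`mcycles_per_tick(16384)` gives the 64 used in the divider fix. With `% 153` instead of `% 154`, `(152 + 1) % 153` is already 0, so LY would never reach 153.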
![](https://i.imgur.com/Naux5L0.gif =350x)

From this we can infer that the LCDY update and the screen update were out of step, so the order of the two is swapped as follows. The gif after the code shows the result:

```cpp=59
    // ly-register 0xff44
    if (state->inst_count > state->ly_count + 114) {
        state->ly_count = state->inst_count;

        mem[0xff44]++;
        mem[0xff44] %= 154;

        if (mem[0xff44] < 144)
            update_line(mem);
```

![](https://i.imgur.com/mVirBo5.gif =350x)

## Make Some noise!

Following the instructor's other project [gameboy-emu](https://github.com/sysprog21/gameboy-emu), I tried to make jitboy produce sound as well.

### `audio_init`

The SDL audio subsystem is initialized the same way as at line 605 of [gameboy-emu/main.c](https://github.com/sysprog21/gameboy-emu/blob/master/main.c#L605). Note that gameboy-emu emulates `audio_mem` with a global array, whereas jitboy already has an emulated memory region (`gb_vm` -> `gb_memory` -> `mem`) on which the CPU's memory operations take effect directly. To make merging the two easier, this change takes the shortcut of making `audio_mem` reference the corresponding location in `mem`.

A closer look at [gameboy-emu/apu.c: `audio_init`](https://github.com/sysprog21/gameboy-emu/blob/master/main.c#L605) shows that some of the initial values are related to what [Pan Docs: Power Up Sequence](https://gbdev.io/pandocs/#power-up-sequence) describes, and those are in fact already initialized in `init_vm`. I have no idea yet where the other initial values come from. So far, leaving the registers that Power Up Sequence does not mention uninitialized seems to cause no problems (?); this needs more careful thought.

```cpp
void audio_init(gb_memory *mem)
{
    SDL_InitSubSystem(SDL_INIT_AUDIO);

    SDL_AudioDeviceID dev;
    SDL_AudioSpec want, have;

    want.freq = AUDIO_SAMPLE_RATE;
    want.format = AUDIO_F32SYS;
    want.channels = 2;
    want.samples = AUDIO_SAMPLES;
    want.callback = audio_callback;
    want.userdata = NULL;

    printf("Audio driver: %s\n", SDL_GetAudioDeviceName(0, 0));

    if ((dev = SDL_OpenAudioDevice(NULL, 0, &want, &have, 0)) == 0) {
        printf("SDL could not open audio device: %s\n", SDL_GetError());
        exit(EXIT_FAILURE);
    }

    /* reference */
    audio_mem = mem->mem + 0xFF10;

    /* Initialize channels and samples */
    memset(chans, 0, sizeof(chans));
    chans[0].val = chans[1].val = -1;

    SDL_PauseAudioDevice(dev, 0);
}
```

* [`SDL_InitSubSystem`](https://wiki.libsdl.org/SDL_InitSubSystem) enables SDL's audio support
* [`SDL_OpenAudioDevice`](https://wiki.libsdl.org/SDL_OpenAudioDevice) takes the [`SDL_AudioSpec`](https://wiki.libsdl.org/SDL_AudioSpec) structure
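`want`, which describes the requested settings; the returned `have` holds the settings actually obtained (see the linked documentation for details)
    * `freq` is the number of samples sent to the device per second
    * `format` set to `AUDIO_F32SYS` means samples are stored as 32-bit floating point in native endianness
    * `channels` is the number of output channels
    * `samples` is the size of the audio buffer in sample frames
    * `callback` is the function called whenever the device needs new samples; at that point we should update the audio frame buffer accordingly
        * The documentation notes that the audio callback usually runs in a separate thread, so [`SDL_LockAudioDevice`](https://wiki.libsdl.org/SDL_LockAudioDevice) may be needed to avoid race conditions. This has to be thought through and changed later
* `audio_mem` is a global variable that references the emulated memory for later use; remember to initialize the `chans` structure as well
* Once everything is ready to play sound, call `SDL_PauseAudioDevice(dev, 0);` and the callback function will start being invoked repeatedly

### `audio_callback`

This part follows [gameboy-emu/apu.c: `audio_callback`](https://github.com/sysprog21/gameboy-emu/blob/master/apu.c#L329). `channel_update` is similar to [`audio_write`](https://github.com/sysprog21/gameboy-emu/blob/master/apu.c#L414) in gameboy-emu, but it does not write the value back to its original location; it only updates `chans` from the register and its value.

```cpp
/* SDL2 style audio callback function */
void audio_callback(void *userdata, uint8_t *restrict stream, int len)
{
    float *samples = (float *) stream;

    /* Appease unused variable warning. */
    (void) userdata;

    memset(stream, 0, len);

    for (uint_fast8_t i = 0; 0xFF10 + i < 0xFF40; ++i)
        channel_update(0xFF10 + i, audio_mem[i]);

    update_square(samples, 0);
    update_square(samples, 1);
    update_wave(samples);
    update_noise(samples);
}
```

* When the audio callback is invoked, it calls `channel_update` to adjust `chans` according to the sound-related MMIO registers in memory, then uses `update_***` to fill the audio buffer from `chans`
* We can expect that while `audio_callback` is running, those sound-related MMIO registers may still be changing, which creates a race condition. Could this be one of the reasons for the poor sound quality?

### Fixing the noise

In the earlier change, `chans` was updated all at once inside `audio_callback`, rather than being adjusted whenever an MMIO register in the audio range is written. This is clearly the main cause of the poor sound quality! Fixing it is quite easy: stop calling `channel_update` in `audio_callback`, and instead add an else-if branch to `gb_memory_write` in memory.c:

```cpp
    ...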
    else if (addr >= 0xff10 && addr <= 0xff3f) {
        /* audio update */
        LOG_DEBUG("Memory write to %#" PRIx64 ", value is %#" PRIx64 "\n",
                  addr, value);
        lock_audio_dev();
        channel_update(addr, value);
        mem[addr] = value;
        unlock_audio_dev();
    }
    ...
```

As mentioned earlier, `audio_callback` runs in a separate thread, so proper synchronization is needed. Thread Sanitizer can be used to find the places that produce data races so they can be fixed. One fairly obvious race is between writes to memory in the main thread and reads of memory in `audio_callback`. The solution is simple: wrap the places where the audio memory is written in a pair of `SDL_LockAudioDevice()` and `SDL_UnlockAudioDevice()` calls (this is what `lock_audio_dev()` and `unlock_audio_dev()` in the code above do internally). This keeps the audio thread from processing sound while the main thread is updating memory.

Done! The game music now sounds much better :musical_note: ! (Note that the synchronization above is required to play complete, clear sound)

:::warning
Looking forward to a new pull request!
:notes: jserv
:::

#### Open questions

- [ ] A data race produced by a thread named "PulseHotplug"?

## Problems parsing the ROM header

While modifying the code, I noticed that `dump_header_info` seems to differ from the documentation. Compared with the description in [gbdev - The Cartridge Header](https://gbdev.gg8.se/wiki/articles/The_Cartridge_Header), the mappings for ROM size and RAM size differ somewhat from the original computation. The original formulas and the actual mappings listed in the documentation are:

* `0x148`: `rom size = 32 << mem->mem[0x148]`

![](https://i.imgur.com/a0S8iQI.png)

* `0x149`: `ram size = mem->mem[0x149] > 0 ? 1 << (mem->mem[0x149] * 2 - 1) : 0`

![](https://i.imgur.com/cmMPLH7.png)

The original formulas cannot cover every mapping (some entries follow special rules), so they are adjusted as follows:

```cpp
void dump_header_info(gb_memory *mem)
{
    printf("ROM information about file %s:\n", mem->filename);
    printf("+ Title: %s\n", mem->mem + 0x134);
    printf("+ Manufacturer: %s\n", mem->mem + 0x13f);
    printf("+ Cartridge type: %#2x\n", mem->mem[0x147]);

    /* 0x52..0x54 are the special 1.1 / 1.2 / 1.5 MiB entries; comparing
     * against 0x52 keeps the shift below from going negative for 0x08 */
    int rom_size = mem->mem[0x148] >= 0x52
                       ? 1024 + (1 << (7 + (mem->mem[0x148] - 0x52)))
                       : 32 << mem->mem[0x148];
    printf("+ ROM size: %i KiB\n", rom_size);

    int ram_size =
        mem->mem[0x149] > 0
            ? (mem->mem[0x149] < 5 ? 1 << (mem->mem[0x149] * 2 - 1) : 64)
            : 0;
    printf("+ RAM size: %i KiB\n", ram_size);
    printf("\n");

    /* recording header info for some later usage */
    mem->max_ram_banks_num = ram_size == 2 ?
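                                 1 : ram_size / 8;
}
```

#### Open questions

- [ ] Given the maximum ROM and RAM sizes in the tables above, does the maximum number of ROM / RAM banks the program currently allows fall short for some cartridges? (The current parameters are `MAX_ROM_BANKS = 256` and `MAX_RAM_BANKS = 4`)

## Loading saves

~~How are you supposed to play Pokémon without saving~~

In short, emulating save and load means that when the game performs a save, the emulator's emulated RAM banks are written to a file, and when the game is started again they are loaded back (the cartridge header records the number of RAM banks). Depending on the MBC mode, these banks are mapped at `0xA000 - 0xBFFF`, so the game can load its save by accessing this memory range.

How these banks are operated under the different MBC modes is already handled by a series of cases in `gb_memory_write`. So if that implementation is correct, loading the save file into the emulated RAM banks should be all that is needed to support loading.

To verify this idea I implemented loading before saving: I load a save file produced by Pokémon Red on the [koenk/gbc](https://github.com/koenk/gbc) project. If it behaves as expected, saving can be implemented along the same lines with some confidence.

```cpp
bool read_battery(char *savfile, gb_memory *mem)
{
    bool ret = true;
    FILE *fp;

    fp = fopen(savfile, "rb");
    if (!fp) {
        LOG_ERROR("Fail to open file %s\n", savfile);
        return false;
    }

    /* get file size and allocate size accordingly */
    fseek(fp, 0, SEEK_END);
    size_t sz = ftell(fp) * sizeof(uint8_t);
    uint8_t *buf = malloc(sz);
    if (buf == NULL) {
        LOG_ERROR("Fail to allocate memory for file %s\n", savfile);
        ret = false;
        goto read_battery_end;
    }
    rewind(fp);

    size_t buf_size = fread(buf, sizeof(uint8_t), sz, fp);
    size_t ramsize = mem->max_ram_banks_num * 0x2000;
    if (buf_size != ramsize) {
        LOG_ERROR("Size mismatch between savfile and cartridge RAM\n");
        ret = false;
        goto read_battery_end;
    }

    memcpy(mem->ram_banks, buf, buf_size);

read_battery_end:
    fclose(fp);
    free(buf);
    return ret;
}
```

The program assumes the save file name (`savfile`) is `ROM name + "sav"` and updates `mem->ram_banks` after a series of checks. As the screenshot shows, once the save file loads successfully the continue option appears and the game can resume from the previous save!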
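The load path described above boils down to "accept the file only if its size matches the cartridge RAM". Here is a minimal, self-contained sketch of that round trip. `save_ram` and `load_ram` are hypothetical helpers written for illustration, not jitboy functions, and the sketch uses a plain byte buffer in place of `mem->ram_banks`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RAM_BANK_SIZE 0x2000 /* one external RAM bank, mapped at 0xA000-0xBFFF */

/* Hypothetical helper: write `size` bytes of cartridge RAM to `path`. */
bool save_ram(const char *path, const uint8_t *ram, size_t size)
{
    assert(path != NULL && ram != NULL);
    FILE *fp = fopen(path, "wb");
    if (!fp)
        return false;
    bool ok = fwrite(ram, 1, size, fp) == size;
    /* fclose flushes buffered data, so its result matters too */
    return fclose(fp) == 0 && ok;
}

/* Hypothetical helper: reject a file whose size is not exactly `size`
 * before anything is read into `ram`. */
bool load_ram(const char *path, uint8_t *ram, size_t size)
{
    assert(path != NULL && ram != NULL);
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return false;
    bool ok = false;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long len = ftell(fp);
        if (len >= 0 && (size_t) len == size && fseek(fp, 0, SEEK_SET) == 0)
            ok = fread(ram, 1, size, fp) == size;
    }
    fclose(fp);
    return ok;
}
```

Checking the size before reading avoids the temporary buffer used in `read_battery`, at the cost of one extra seek. Whether that trade-off suits jitboy is a judgment call.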
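![](https://i.imgur.com/8PhcZh8.png =300x)

:::danger
Returning to the open question in the previous chapter: Pokémon Red's RAM size happens to equal the 32 KiB total of `MAX_RAM_BANKS = 4`, so there is no problem. But if I understand correctly, a game with a larger RAM size would break the current design
:::

## Saving

Having confirmed that the idea behind loading works, I went on to implement saving.

```cpp
bool write_battery(char *savfile, gb_memory *mem)
{
    gb_memory_ram_flush_back(mem);

    bool ret = true;
    size_t ramsize = mem->max_ram_banks_num * 0x2000;
    FILE *fp = NULL;
    uint8_t *buf = malloc(ramsize);
    if (buf == NULL) {
        LOG_ERROR("Fail to allocate memory for file %s\n", savfile);
        ret = false;
        goto write_battery_end;
    }

    memcpy(buf, mem->ram_banks, ramsize);

    fp = fopen(savfile, "wb");
    if (!fp) {
        LOG_ERROR("Failed to open file %s\n", savfile);
        ret = false;
        goto write_battery_end;
    }

    fwrite(buf, sizeof(uint8_t), ramsize, fp);

write_battery_end:
    /* fp is still NULL when malloc or fopen failed */
    if (fp)
        fclose(fp);
    free(buf);
    return ret;
}
```

This is roughly the reverse of loading. The one thing to watch is that the emulated memory from 0xa000 to 0xbfff holds a copy of a bank in `ram_banks` (see `gb_memory_change_ram_bank`), so `gb_memory_ram_flush_back` must be called first to write that region back into the corresponding bank of `ram_banks` before the contents of `ram_banks` are written to the file.

As for when `write_battery` runs, I followed the point at which [Sameboy/main.c](https://github.com/LIJI32/SameBoy/blob/7fc59b5cf4e27c05c239ce1d5c1ac083bf58e832/SDL/main.c#L442) calls `GB_save_battery`, which is basically when the emulator exits normally. The save is therefore lost if the program terminates unexpectedly (for example when the emulator is stopped with the `kill` command; testing on SameBoy confirms the same behavior). Checking the save on every frame update, as `emu_step` in [gbc/emu.c](https://github.com/koenk/gbc/blob/44e4998776604f7dffd85a12294561f1f27e31a4/emu.c#L135) does, avoids this situation effectively. However, since the emulator is closed normally most of the time, that approach may cause needlessly frequent IO, so I chose the former.

## Trying to integrate [GBIT](https://github.com/koenk/gbit)

Because the emulator's architecture is rather unusual, several issues have to be considered and changed accordingly in order to integrate GBIT.

### Step CPU

In jitboy, the `compile` function in `gbz80.c` translates the ROM in units of "blocks" made of several instructions (each of which must end in a jmp / call family instruction), not in units of single "instructions". But the test program has to check the implementation instruction by instruction. To satisfy GBIT's assumption, each `mycpu_step` callback therefore bypasses `compile`, generates a single instruction on its own, places it in a block and executes it. For the complete code see [RinHizakura/jitboy: test_cpu.c](https://github.com/RinHizakura/jitboy/blob/gbit/test_cpu.c#L161)

This change is in fact similar to the `compile` function in `gbz80.c`, with a few details adjusted:

* Since this is a test program, there is no need, as there is when running a real cartridge, to cache the generated x86 instructions and use the pc as the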
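index; reusing `vm->compiled_blocks[1][0]` for storage is enough
* To accommodate gbit, our block does not necessarily end in jmp / call. As a result the value the block returns is not necessarily the correct next pc, so the pc has to be computed separately to pass the tests...
    * ~~OK, I know it is ugly~~
* Since the jit code is not cached and a single block is reused over and over, `free_block(block)` must be called to avoid leaks

### Status flag

Apart from the subtract flag, jitboy does not emulate the Game Boy's other status flags. Its approach is: when a flag is needed, use the corresponding x86 status flag directly. However, the original GBIT program needs to initialize the status flags and to check that the flags are correct after an instruction has executed, so I devised a few tricks here.

```cpp
/* 0xfc is an invalid opcode in Game Boy. I borrow it for status flag reset */
[0xfc] = {SET_F, NONE, NONE, 0, 0, 0, 0, 0, INST_FLAG_AFFECTS_CC},
/* 0xfd is an invalid opcode in Game Boy. I borrow it for status flag load_back */
[0xfd] = {LD_F, NONE, NONE, 0, 0, 0, 0, 0, INST_FLAG_USES_CC},
```

Two opcodes that the Game Boy does not use are disguised as instructions that set and read the host architecture's status flags. The first member must be set to a value that lets `emit` dispatch to the right function (`SET_F` or `LD_F`), and `INST_FLAG_USES_CC` or `INST_FLAG_AFFECTS_CC` must be set according to whether the instruction uses or affects flags. The remaining members are unused and can be set arbitrarily.

Let us look at the setting part first. At the start of `mycpu_step`, a pseudo-instruction that sets the flags is generated

```cpp
static int mycpu_step(void)
{
    ...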
    /* run the fake instruction for flag reset */
    gbz80_inst *inst_0 = g_new(gbz80_inst, 1);
    *inst_0 = inst_table[0xfc];
    inst_0->args = flag_args;
    inst_0->address = -1;
    instructions = g_list_prepend(instructions, inst_0);
```

In the `mycpu_set_state` stage, `flag_args` is set to the host-architecture status flag layout that corresponds to what GBIT requests, so the jit code generated for it is:

```cpp
static bool inst_set_flag(dasm_State **Dst, gbz80_inst *inst, uint64_t *cycles)
{
    | push rax
    | mov ah, inst->args[0]
    | sahf
    | pop rax
    return true;
}
```

Then comes the reading part

```cpp
static void load_flag(void)
{
    gb_block *block = &vm->compiled_blocks[2][0];
    GList *instructions = NULL;

    /* run the fake instruction for flag load */
    gbz80_inst *inst_last = g_new(gbz80_inst, 1);
    *inst_last = inst_table[0xfd];
    inst_last->args = NULL;
    inst_last->address = -1;
    instructions = g_list_prepend(instructions, inst_last);

    instructions = g_list_reverse(instructions);
    if (!optimize_cc(instructions)) {
        exit(1);
    }

    bool result = emit(block, instructions);
    g_list_free_full(instructions, g_free);
    if (result == false) {
        LOG_ERROR("Fail to compile instruction\n");
        exit(1);
    }

    block->func(&vm->state);
    free_block(block);
}

static int mycpu_step(void)
{
    ...
    load_flag();
    return inst->cycles;
}
```

After the instruction under test has run, an extra pseudo-instruction that reads out the status flags is executed. Note that this pseudo-instruction cannot be placed in the same block as the instruction under test, because if the tested instruction `return`s midway, the flag-reading instruction would never be reached. The jit code generated for it is shown next; it uses the macro `restore_flag` to route the flag state back into the variable GBIT uses to check flags.

```cpp
static bool inst_load_flag(dasm_State **Dst, gbz80_inst *inst, uint64_t *cycles)
{
    | push rax
    | lahf
    | restore_flag r0
    | pop rax
    return true;
}
```

:::warning
This implementation assumes that GBIT steps at most one instruction each time it checks the state; if it does not, things may go wrong!
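:::

### Writing memory

To comply with the way GBIT checks memory operations, `gb_memory_write` has to call `mymmu_write(addr, value)` in addition.

However, making only this change turns out to be insufficient. Instructions such as the INC / DEC family may also operate on memory, but they do not go through `gb_memory_write`; they apply the x86 inc / dec instruction directly to the target location. As a result gbit may report errors for correct inc / dec instructions. The implementation of the Game Boy INC / DEC instructions therefore has to be tweaked: after INC / DEC has computed its result, the result is written back to the same location again through the DynASM macro `write_byte`.

### Reading memory

```cpp
static void mycpu_set_state(struct state *state)
{
    memset(vm->memory.mem, 0xaa, 0x10000);
    memcpy(vm->memory.mem, instruction_mem, instruction_mem_size);
    ......
```

Some instructions read memory, but jitboy reads directly from jit code without calling any extra function, so the reads cannot simply be routed through `mymmu_read` to GBIT's mock mmu. Instead, every time the state is reset, jitboy copies the contents of `instruction_mem` into its own memory to keep the tests correct.

:::warning
GBIT notes that the contents of `*instruction_mem` should not be copied into the emulator's memory; the pointer should be copied instead, because the contents change on every step. Assuming that `mycpu_set_state` is called after every step, when `*instruction_mem` changes, updating `vm->memory.mem` by copying inside `mycpu_set_state` is inefficient but should still be correct. In addition, when I tried pointing `vm->memory.mem` directly at `*instruction_mem`, I ran into errors I could not explain. So copying is used for now; I will try to improve this once I find the cause or discover a problem with this approach.
:::

:::danger
Ideally the original Game Boy source files would not be touched at all, and only the test program would be adjusted to run GBIT, which would reduce the coupling between the emulator itself and GBIT. For example, memory could be compared before and after `mycpu_step` to detect writes, but that approach cannot detect a write of the same value the location already held. So, to keep the tests as complete as possible (rather than skipping tests because jitboy's architecture is hard to integrate), the original Game Boy code currently has to be modified to integrate GBIT. Perhaps a better alternative exists; this needs more thought.
:::

## Testing with GBIT and fixing bugs

After modifying the GBIT program as described and building it as a shared object, the Makefile is changed slightly to put the two together (see [RinHizakura/jitboy: gbit](https://github.com/RinHizakura/jitboy/tree/gbit) for the complete integration). I am currently going through the test results one by one to determine whether each failure is an integration error in the tests or a genuine implementation bug in jitboy, and trying to fix them.

### Problem (1) :negative_squared_cross_mark:

The instruction `INC (HL)` with opcode 0x34 has a problem: with `H = 0xff` and `L = 0xff` a segmentation fault occurs:

```
==27779== Invalid read of size 8
==27779==    at 0xDF760A8: ???
==27779==    by 0xFEFF: ???
==27779==    by 0xFFFE: ???
==27779==    by 0x5E1303F: ???
==27779==    by 0xFFFFFF: ???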
==27779==  Address 0x100ffff is in a rw- anonymous segment
==27779==
==27779==
==27779== Process terminating with default action of signal 11 (SIGSEGV)
==27779==  Access not within mapped region at address 0x1010000
==27779==    at 0xDF760A8: ???
==27779==    by 0xFEFF: ???
==27779==    by 0xFFFE: ???
==27779==    by 0x5E1303F: ???
==27779==    by 0xFFFFFF: ???
```

It looks like an access to the unmapped address 0x1010000? Compare this with the arguments of the mmap call in the GBIT test

```cpp
mem->mem = mmap((void *) 0x1000000, 0x10000, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
```

Logically, though, 0x1010000 should never be touched? Consider the following case

```cpp
#include <stdint.h>
#include <sys/mman.h>
#include <stdio.h>

int main()
{
    uint8_t *mem = mmap((void *) 0x1000000, 0x10000, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    /* 8-byte access that starts at the last mapped byte */
    uint64_t *tmp = (uint64_t *) &mem[0xffff];
    *tmp += 1;
    return 0;
}
```

This case produces a similar error under valgrind. Consider whether the two errors share a cause:

1. Under DynASM, could `| inc byte [aMem + 0xffff]` first access the 64-bit `[aMem + 0xffff]` and only then convert it to a byte, producing this error?
2. In other words, could INC / DEC on memory at the mmap boundary be problematic? But on the Game Boy, are INC / DEC ever actually applied to memory in this range (for example Interrupt enable at `0xFFFF`)?
    * If not, deliberately enlarging the mmap range slightly: `mem->mem = mmap((void *) 0x1000000, 0x10008, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);` may be a reasonable way to pass GBIT
    * If so, should the emulator's own mmap range also be enlarged slightly to avoid this error?

### Problem (2) :negative_squared_cross_mark:

One thing I am still unsure about here is whether a ROM would ever use INC / DEC to write to addresses below 0x8000 (MBC-related operations)?
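This question arose because tracing and testing Pokémon Gold revealed the following problem. The instruction flow, in brief:

* First, `CP` compares reg A with an immediate
    * Tracing shows that reg A is 0xac and the immediate is 0x2 at this point
* The next instruction is the x86 `jnc` that corresponds to `case CC_C` of `JP/CALL`, that is, jump to the given label if carry is not set
    * Since 0xac > 0x2, carry is not set, so the jump is taken
* The instruction executed after the jump is `DEC (HL)`. HL is 0x13d at this point, so it attempts to write to a location that is mmapped as non-writable, causing a segmentation fault

If ROMs really do operate on addresses below 0x8000 through INC / DEC, extra logic may be needed to avoid this situation (though judging from the documentation this scenario does not seem to exist?). If the cause is something else, there are several places where things might have gone wrong, and they need careful thought

### Problem (3) :negative_squared_cross_mark:

The `LD (nn),SP` instruction at 0x08 (the `LD16` type in jitboy) writes the stack pointer's value to memory, but here it does `| mov word [aMem + addr], SP;` directly instead of writing through `write_byte` as the `LD` type does. Is that reasonable? (In other words, are the addresses written by `LD16` always ones that really should be written straight into memory, or may they also be addresses that need extra handling, such as sound or MBC-related ones?)

* Assuming it is reasonable, passing GBIT still requires writing into GBIT's mock mmu, similar to the way the INC / DEC instructions were handled earlier

### Problem (4) :heavy_check_mark:

`ADD HL,BC` (0x09): according to [Gameboy CPU (LR35902) instruction set](https://www.pastraiser.com/cpu/gameboy/gameboy_opcodes.html), `f_subtract` must be reset to 0

### Problem (5) :heavy_check_mark:

We use the host architecture's flags to emulate the Game Boy's flags. An apparent benefit is that the corresponding x86 instructions update the host flags directly, so no extra code has to be written to adjust flags. In practice this does not quite hold, because two seemingly corresponding instructions may follow different rules for updating flags.

For example, jitboy maps both `RLCA` (0x07) and `RLC n` (the first few cb-prefixed instructions) to `ROL` on x86. However, according to [x86 Instruction Set Reference: RCL/RCR/ROL/ROR](https://c9x.me/x86/html/file_module_x86_id_273.html), `ROL` does not affect the zero and adjust flags, whereas the Game Boy expects the half-carry (adjust) flag to be set to 0, and the zero flag to be either set to 0 (`RLCA`) or set from the resulting value at the destination (`RLC n`)!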
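The flag rules just described can be written down as a small reference model in plain C. This is only a sketch for checking expectations, not the DynASM code used in jitboy: `gb_rlc` and its `is_rlca` parameter are hypothetical names, and the `FLAG_*` masks assume the usual layout of the Game Boy F register (Z, N, H, C in bits 7 to 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits of the Game Boy F register (assumed standard layout). */
#define FLAG_Z 0x80
#define FLAG_N 0x40
#define FLAG_H 0x20
#define FLAG_C 0x10

/* Rotate left circular. Returns the rotated value and stores the new F in *f.
 * is_rlca != 0 selects RLCA semantics (Z always cleared); otherwise the
 * cb-prefixed RLC semantics apply (Z set when the result is zero). */
uint8_t gb_rlc(uint8_t value, int is_rlca, uint8_t *f)
{
    assert(f != NULL);
    uint8_t carry = value >> 7; /* bit 7 moves into bit 0 and into C */
    uint8_t result = (uint8_t) ((value << 1) | carry);
    uint8_t flags = carry ? FLAG_C : 0; /* N and H are always cleared */
    if (!is_rlca && result == 0)
        flags |= FLAG_Z;
    *f = flags;
    return result;
}
```

A model like this makes the difference from x86 `ROL` explicit: every flag is recomputed on each call, while `ROL` leaves the host's zero and adjust flags at whatever they were before.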
![](https://i.imgur.com/TLaeivb.png)

![](https://i.imgur.com/T1W6DwU.png)

For the detailed changes see my [GitHub (gbit branch)](https://github.com/RinHizakura/jitboy/tree/gbit). Other rotate instructions such as `RRCA`, `RLA` and `RRA` have similar problems and can be fixed in a similar way.

### Problem (6) :heavy_check_mark:

![](https://i.imgur.com/hncnKJV.png)

Continuing from the previous problem, `ADD HL, nn` does not affect the zero flag on the Game Boy. In addition, the way half-carry is updated is inconsistent between the Game Boy and jitboy's implementation. On x86 we want the [Adjust flag](https://en.wikipedia.org/wiki/Adjust_flag) to stand in for the half-carry flag, and it is updated as follows:

> The Auxiliary flag is set (to 1) if during an "add" operation there is a carry from the low nibble (lowest four bits) to the high nibble (upper four bits), or a borrow from the high nibble to the low nibble, in the low-order 8-bit portion, during a subtraction.

However, jitboy performs a 16-bit addition with a single instruction, while the Game Boy splits it into two 8-bit additions. Take 0x000f + 0x0001 as an example. jitboy does a single add; because the lowest 4 bits 0xf + 0x1 produce a carry, the adjust flag is set. On the Game Boy the low bytes 0x0f + 0x01 are added first, and any carry out of the low byte is then added into the high bytes 0x00 + 0x00; the flags reflect the high-byte addition, so the half-carry flag is not set. Put simply, on the Game Boy the half-carry of a 16-bit operation is determined by a carry out of bit 11, not out of bit 3.

> See also [Game Boy: Half-carry flag and 16-bit instructions](https://stackoverflow.com/questions/57958631/game-boy-half-carry-flag-and-16-bit-instructions-especially-opcode-0xe8)

For the complete changes see [RinHizakura/jitboy: emit.dasc](https://github.com/RinHizakura/jitboy/blob/master/src/emit.dasc#L566). The assembly code prioritizes correctness, so it could probably be made prettier, but it currently runs and passes on GBIT.

### Problem (7) :heavy_check_mark:

Fixing DAA:

* [GameBoy - Help With DAA instruction](https://forums.nesdev.com/viewtopic.php?t=15944)
* [Z80 DAA instruction](https://stackoverflow.com/questions/8119577/z80-daa-instruction)
* [SameBoy: daa](https://github.com/LIJI32/SameBoy/blob/c496797fce8169764e0c2d3ae15d367272589df7/Core/sm83_cpu.c#L667)

### Problem (8) :heavy_check_mark:

* Fix CPL, CCF, SCF
* Fix AND
* Fix ADD SP
* Fix BIT

### Problem (9) :negative_squared_cross_mark:

RET, POP

In gbit's tests the stack pointer can equal 0xFFFF, in which case popping 2 bytes goes wrong, because the index is not wrapped around when it is incremented (in other words, when sp is 0xFFFF, popping 2 bytes
should access 0xFFFF and 0x0000, but the original jitboy design accesses 0xFFFF and 0x10000, which causes the error). The code can be changed to pass GBIT, but the real question is whether the stack pointer of a real Game Boy would ever be at 0xFFFF. Is it necessary to change the code for this?

### END

At this point all instructions pass! Hooray!

![](https://i.imgur.com/HYYHokA.png)

![](https://i.imgur.com/iy3V18c.png)

## Reference

### Gameboy

* [Gameboy Development Wiki](https://gbdev.gg8.se/wiki/articles/Main_Page)
* [Gameboy CPU (LR35902) instruction set](https://www.pastraiser.com/cpu/gameboy/gameboy_opcodes.html)
* [awesome-gbdev](https://github.com/gbdev/awesome-gbdev)
* [Pan Docs](https://gbdev.io/pandocs/)
* [The Cycle-Accurate Game Boy Docs](https://github.com/AntonioND/giibiiadvance/blob/master/docs/TCAGBD.pdf)
* [GameBoy CPU Manual](http://marc.rawer.de/Gameboy/Docs/GBCPUman.pdf)
* [Game Boy: Complete Technical Reference](https://gekkio.fi/files/gb-docs/gbctr.pdf)

### DynASM

* [The Unofficial DynASM Documentation](http://corsix.github.io/dynasm-doc/reference.html)
* [JIT Compiler](https://github.com/ultraleap/Element/blob/main/LMNT/doc/JIT.md)

### x86_64

* [x86 and amd64 instruction reference](https://www.felixcloutier.com/x86/)
* [Intel Instruction Set pages](https://web.itu.edu.tr/kesgin/mul06/intel/index.html)
* [System V Application Binary Interface AMD64 Architecture Processor Supplement](https://www.uclibc.org/docs/psABI-x86_64.pdf)
* [System V ABI - OSDev Wiki](https://wiki.osdev.org/System_V_ABI)

### SDL

* [SDL Wiki](https://wiki.libsdl.org/FrontPage)

### GLib

* [GLib: Memory Allocation](https://developer.gnome.org/glib/stable/glib-Memory-Allocation.html)
* [GLib: Doubly-Linked Lists](https://developer.gnome.org/glib/stable/glib-Doubly-Linked-Lists.html)