2025q1 Homework3

# 2025q1 Homework 3 contributed by < `Brianpan` > {%hackmd NrmQUGbRQWemgwPfhzXj6g %} ## 目標 - [x] 縮減使用者和核心層級的通訊成本，並允許並行的對奕 - [x] 引入若干 coroutine (注意：不得使用 POSIX Thread，採取教材提到的實作方式)，分別對應到 AI1, AI2 (具備玩井字遊戲的人工智慧程式，採取不同的演算法，可指定運作於使用者/核心層級) 和鍵盤事件處理，意味著使用者可連續觀看多場「電腦 vs. 電腦」的對弈，當按下 Ctrl-P 時暫停或回復棋盤畫面更新 (但運算仍持續)、當按下 Ctrl-Q - [ ] 在「電腦 vs. 電腦」的對弈模式，比照前述 load average 的計算方式，規範針對下棋 AI 的 load 機制，並運用定點數來統計不同 AI 實作機制對系統負載的使用率，整合到畫面輸出，開發過程中應有對應的數學分析和羅列實作考量因素。 - [x] 對弈的過程中，要在螢幕顯示當下的時間 (含秒數)，並持續更新。 - [x] 原本 kxo 棋盤繪製在核心模式，你應該修改程式碼，使畫面呈現的部分全部在使用者層級，且善用 bitops，降低核心和使用者層級之間的通訊成本，應給予相關量化 - [ ] 自 jserv/ttt 移植 reinforcement learning (RL) 到 kxo 核心模組，使其得以動態切換。 ### 整理筆記 - coroutine: https://hackmd.io/@brianpan21/coroutine - 定點數與強化學習: https://hackmd.io/@brianpan21/fixed_points - device driver: https://hackmd.io/@brianpan21/drivers ## 縮減使用者和核心層級的通訊成本，並允許並行的對奕 [commit](https://github.com/Brianpan/kxo/commit/04dbf3f1e32114603927999b709d55f2aff9a07e) 為了縮減使用者和核心層級的通訊成本,必須減少draw_buffer使用量原始版本中是在main.c裡面畫好所有的棋盤, 主要流程如下： - timer_handler 透過timer每100ms執行一次 - timer handler中會排程game_tasklet - game_tasklet中會在條件attr_obj.display == '1'下排程workqueue drawboard_work - 在workqueue drawboard_work中把draw_buffer送至kfifo 在kxo_read的驅動裝置 - kxo_read被使用者層級的程式呼叫,透過kfifo_to_user傳送kfifo內資料到使用者層級的buf ```c! kfifo_to_user(&rx_fifo, buf, count, &read) ``` :::danger 考慮到棋盤的對稱性，縮小範圍 ::: 因此採取的實作方式是ttt中的簡易湊雜函式三種狀態：沒有, O, X 棋盤位置0是$3^{15}$, 1是$3^{14}$以此類推 :::danger 注意書寫規範：中文和英文字元之間要有空白字元 ::: 這樣使用32 bytes就可以傳送整個棋盤狀態 ```c! int table_to_hash(char *table) { int ret = 0; for (int i = 0; i < N_GRIDS; i++) { ret *= 3; if (table[i] == ' ') { ret += 0; } else if (table[i] == 'O') { ret += 1; } else if (table[i] == 'X') { ret += 2; } } return ret; } char *hash_to_table(int hash) { char *table = malloc(sizeof(char) * N_GRIDS); if (!table) return NULL; for (int i = N_GRIDS - 1; i >= 0; i--) { table[i] = " OX"[hash % 3]; hash /= 3; } return table; } ``` 這個實作可以用位元運算加速 [commit](https://github.com/Brianpan/kxo/commit/d9ec72163f5f6a953e9d17b0c56efab3e6149604) 和上面乘以三的差別只在改成位元左移兩位和使用&3取模數來還原棋盤 #### 允許並行的對奕理解是引入更多棋盤,在workqueue排程的時候必須區別哪個棋盤在給AI下棋要考慮多個使用者同時存取的情況,不同使用者不分享棋盤資訊或是有另一個考量情況：所有使用者在加入同時都在觀看棋局這樣會是一個單一生產者, 多個消費者問題 (像是apache kafka之類的messaging queue) **待實現** ## 引入若干 coroutine實現棋盤過程 [commit](https://github.com/Brianpan/kxo/commit/ddf002cb34cd0a121f0abeea8a5cce6592968398), [commit](https://github.com/Brianpan/kxo/commit/bf641a2b4393e292634062742cc5b34e1db87a5d) ~~這裡的實作是透過修改 kxo_attr 增加一個成員game_no表示進行多少場遊戲需要列印此外透過一個u64大小的整數去存取一場比賽的結果的方式最後ox-user去存取XO_DEVICE_ATTR_FILE時讀取結果~~ 這個實作發現有個bug,就是暫停過後會喪失部份對弈數據故採用參考同學的實作ioctl: [commit](https://github.com/Brianpan/kxo/commit/81a927fcd814abf01fb3f749af133cc0ee28c5c2) 這裡詳細的實作包含 - 經由circ_buf來將每一局的結果記錄 - ioctl把共有多少對弈局數和對弈結果傳到使用者端程式 - 使用者端透過print_game_moves, print_moves將結果印出 ### ioctl 多註冊一個kxo_ioctl函數 ```c= static const struct file_operations kxo_fops = { #if LINUX_VERSION_CODE < KERNEL_VERSION(6, 4, 0) .owner = THIS_MODULE, #endif .read = kxo_read, .llseek = no_llseek, .open = kxo_open, .release = kxo_release, .unlocked_ioctl = kxo_ioctl, }; ``` kxo_ioctl實作 ```c= static long kxo_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) { int ret; switch (cmd & 1) { case IOCTL_READ_SIZE: int game_no; write_lock(&attr_obj.lock); game_no = attr_obj.game_no; attr_obj.game_no = 0; write_unlock(&attr_obj.lock); return game_no; break; case IOCTL_READ_LIST: u64 game_move = fast_buf_get(); ret = copy_to_user((void*) arg, &game_move, sizeof(u64)); break; default: ret = -ENOTTY; } return ret; } ``` ### 對弈結果存取存取對弈結果我們使用u64的理由是剛好可以把4x4的棋盤結果放滿位元0-3是第一動位元4-7是第二動最後60-63位元有兩種可能 - 這局棋總共多少步 - 當走到只剩一格時是實際下的位置 :::danger 要考慮到核心模組在下棋過程中出現錯誤的狀況 ::: 因此我們要考慮0, 15兩個特殊情況 - 因為步數小於15一定保證至少兩個0會出現,所以60-63位元一定是代表真實的步數 - 15的情況要判斷前15格是不是有下到格子15 ```c= static void print_moves(uint64_t moves) { // left most 4 bits might be number of moves int move_no = (moves >> 60) & 0x0F; bool has_zero = false, has_fifteen = false; for (int m = 0; m < move_no; m++) { int step = (moves >> (m << 2)) & 0x0F; if (step == 0) has_zero = true; if (step == 15) has_fifteen = true; printf("%c%d", 'A' + GET_COL(step), 1 + GET_ROW(step)); if (m != move_no - 1) printf(" -> "); } // special case for whether 15 is included in first 15 moves // if not included, print 15 if (move_no == (N_GRIDS - 1) && !has_fifteen) { printf(" -> "); printf("%c%c", 'A' + GET_COL(15), '1' + GET_ROW(15)); } else { int step = (moves >> (move_no << 2)) & 0x0F; // has 2 zero means move_no is correct if (step == 0 && has_zero) return; int next_step = (moves >> ((move_no + 1) << 2)) & 0x0F; if (next_step == 0 && step == 0) return; // this case of full board case for (int i = move_no; i < N_GRIDS; i++) { int step = moves >> (i << 2) & 0x0F; printf(" -> "); printf("%c%c", 'A' + GET_COL(step), '1' + GET_ROW(step)); } } } ``` ## 對弈的過程中，要在螢幕顯示當下的時間 (含秒數)，並持續更新。 [commit](https://github.com/Brianpan/kxo/commit/52bb92ab9c3d1af2a9eb9d54dfe00a94ab32fb42) 這裡透過timer和SIGALRM<s>實現</s>列印時間同時設定好印出時間的是第一行讓寫入的空間不會被select和alrm_handler同時存取 ```c static void draw_time() { time_t now = time(NULL); struct tm *tm_now = localtime(&now); strftime(time_buf, sizeof(time_buf), "%Y-%m-%d %H:%M:%S", tm_now); printf("\x1b[0;0H"); printf("\x1b[0m"); printf("\x1b[1;37m"); printf("\x1b[40m"); printf(" %s ", time_buf); printf("\x1b[0m"); printf("\x1b[0;0H"); printf("\x1b[0m"); printf("\x1b[1;37m"); printf("\x1b[40m\n"); } static void alrm_handler(int signum, siginfo_t *info, void *ucontext) { printf("\033[1;1H\033[2K"); /* ASCII escape code to move to first line */ draw_time(); } static void timer_init() { struct sigaction sa = { .sa_handler = (void (*)(int)) alrm_handler, .sa_flags = SA_SIGINFO, }; sigfillset(&sa.sa_mask); sigaction(SIGALRM, &sa, NULL); } ``` ## 原本 kxo 棋盤繪製在核心模式，你應該修改程式碼，使畫面呈現的部分全部在使用者層級，且善用 bitops，降低核心和使用者層級之間的通訊成本，應給予相關量化 [commit](https://github.com/Brianpan/kxo/commit/04dbf3f1e32114603927999b709d55f2aff9a07e) 5/1號筆記 -> 考慮是否可以縮減成一個位元組去存取棋盤狀態 ## 程式效能評估 ### 考量因子 - cache (page cache) - IPI (Inter Processor Interrupt) - Page Faults - Interrupts - Rescheduling (指定<s>核心</s>) - CPU Feature (Intel hyperthreading) :::danger 注意書寫規範：中文和英文字元之間要有空白字元注意用語，區分「核心」(kernel) 和「核」(core) :::