--- tags: linux2023 --- # 2023q1 Homework3 (fibdrv) ## 開發環境 ```shell $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz CPU family: 6 Model: 158 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 Stepping: 9 CPU max MHz: 3800.0000 CPU min MHz: 800.0000 BogoMIPS: 5599.85 ``` 作業要求: [L04: fibdrv](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-a) >基本觀念研讀筆記在[這邊](https://hackmd.io/@chiangkd/fibdrv-note) ## 自我檢查清單及作業要求 **自我檢查清單** - [ ] 研讀上述 [Linux 效能分析的提示](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c) 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 - [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 clz / ctz 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] 學習指出針對大數運算的加速運算和縮減記憶體操作成本的舉措,紀錄你的認知和疑惑 - [ ] 注意到 fibdrv.c 存在著 DEFINE_MUTEX, mutex_trylock, mutex_init, mutex_unlock, mutex_destroy 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/Pthreads) 的程式碼來確認。 - 搭配閱讀〈[並行和多執行緒程式設計](https://hackmd.io/@sysprog/concurrency/https%3A%2F%2Fhackmd.io%2F%40sysprog%2FS1AMIFt0D)〉 ## 為什麼不該在虛擬機器中進行實驗 >[KVM](https://hackmd.io/@RinHizakura/SJpFJ0mfF#What-is-KVM)、[Hypervisor](https://en.wikipedia.org/wiki/Hypervisor)、[What Is a Hypervisor? Types, Use Cases, and Career Opportunities](https://www.coursera.org/articles/what-is-hypervisor) ### Hypervisor 又稱 Virtual machine monitor (VMM) ,用以建立與執行虛擬機器 (VM) 的軟、韌或硬體,可分為兩種形式如下圖 ![](https://i.imgur.com/42hyO5B.png) - **Type 1: native or bare-metal hypervisors** - 直接運行於硬體上 (也就是需要硬體 support),Hypervisor 和 CPU 之間不存在額外的 OS layer - **優點:** - 直接存取記憶體 - 高效、安全、穩定 - **缺點:** - 需要一台獨立於主機的硬體來負責管理虛擬機的環境以及資源 - **TYPE 2: hosted hypervisors** - 運行於 OS 上,就像是平常使用到的應用程式一樣,所以必須在額外透過硬體的 OS 才能訪問記憶體 - **優點:** - 安裝快速,易於使用 (user-friendly) - **缺點:** - 效率差,因其是透過 OS 間接訪問硬體資源 ### Discussion 本次作業一大重點為效能分析,而間接記憶體存取 (type-2 hypervisor) 會大幅的影響效能,這也是為何不該使用虛擬機器進行本實驗。 ~~但 type-1 的 hypervisor 可以直接存取記憶體,或許本次作業的效能分析容許使用 type-1 hypervisor ? 
例如透過 [Hyper-V](https://en.wikipedia.org/wiki/Hyper-V) 建立 Linux 環境進行?~~
- `perf` 會去呼叫 PMU,在共用硬體的情況下時間量測沒有參考價值,記憶體存取只是其中一個問題

## 時間測量
>參照 [核心模式的時間測量](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c)、[Linux 核心設計: Timer 及其管理機制](https://hackmd.io/@sysprog/linux-timer)

如果只在 user space 做測量,量到的時間會包含進出 kernel 的成本,無法單獨測量 kernel 內部所花的時間,而進出 kernel 另有其他開銷 (例如 [mode switch](https://www.ibm.com/docs/en/aix/7.1?topic=performance-mode-switching))

參考 [The high-resolution timer API](https://lwn.net/Articles/167897/) 引入 `ktime_t`
>[commit 218dfe3](https://github.com/chiangkd/fibdrv/commit/218dfe30187ca82ea12a8bf39015d28b60ca11be)

```c
static ssize_t fib_read(struct file *file,
                        char *buf,
                        size_t size,
                        loff_t *offset)
{
    // return (ssize_t) fib_sequence(*offset);
    fib_kt = ktime_get();
    /* multiple method under test */
    ssize_t ret = fib_sequence(*offset);  // different method implementation
    fib_kt = ktime_sub(ktime_get(), fib_kt);
    return ret;
}
```

![](https://i.imgur.com/t21CzDM.png)

由圖可見 fast doubling 的 iteration 版本比直觀實作的 iteration 版本快上許多,但 recursion 版本卻比原版慢。

## 排除干擾因素
>參照 [Linux 效能分析的提示](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c#-Linux-%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E6%8F%90%E7%A4%BA)、[KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU)、[研讀筆記](https://hackmd.io/@chiangkd/fibdrv-note#%E6%A0%B8%E5%BF%83%E6%A8%A1%E5%BC%8F%E7%9A%84%E6%99%82%E9%96%93%E6%B8%AC%E9%87%8F)

```shell
$ cat /proc/cpuinfo | grep "^processor" | wc -l
8
```

修改 `/etc/default/grub` 使編號 `0` 的 CPU (用 system monitor 看的話是 CPU 1) 不被排程器干擾

```shell
$ taskset -cp 1
pid 1's current affinity list: 1-7
```

![](https://i.imgur.com/FVVP02R.png)
- 可以看到 CPU 1 維持 0.0 %

**將程式固定在特定的 CPU 中執行**

```shell
sudo taskset -c 1 ./measure
```

比對一下設定 processor affinity 前後差別 (以 normal iteration 和 fast doubling iteration 舉例)

在關閉 [SMT (Hyper-Threading)](https://en.wikipedia.org/wiki/Simultaneous_multithreading) 時,發現一關閉 SMT,CPU 5 至 CPU 8 就會停止運作,檢查了一下規格 ([Intel® Core™ i7-7700HQ](https://www.intel.com.tw/content/www/tw/zh/products/sku/97185/intel-core-i77700hq-processor-6m-cache-up-to-3-80-ghz/specifications.html)),發現**核心數量為 4、執行緒數量為 8**
- 原來 system monitor 上顯示的 CPU 數量是執行緒的數量

**未指定 CPU、未[排除干擾效能分析的因素](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c#%E6%8E%92%E9%99%A4%E5%B9%B2%E6%93%BE%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E5%9B%A0%E7%B4%A0)**
![](https://i.imgur.com/2X9gFCO.png)

**指定 CPU、[排除干擾效能分析的因素](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c#%E6%8E%92%E9%99%A4%E5%B9%B2%E6%93%BE%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E5%9B%A0%E7%B4%A0)**
![](https://i.imgur.com/yf7DJzB.png)
- 看不太出什麼差異
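除了以 `taskset` 在啟動時指定 CPU,也可以讓量測程式在執行初期自行設定 processor affinity。以下是一份假設性的 user space 草稿 (非原專案程式碼,CPU 編號僅為示意,需對應實際被隔離的 CPU):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set); /* 假設綁定 CPU 1,效果等同 taskset -c 1 */

    /* 第一個參數為 0 代表設定呼叫者 (本行程) 自身的 affinity */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return EXIT_FAILURE;
    }

    /* 此後的量測迴圈都只會在被綁定的 CPU 上執行 */
    printf("affinity set, start measuring...\n");
    return 0;
}
```

如此即可把綁定 CPU 的動作收進量測程式本身,跑實驗時就不必每次都在命令列前面加上 `taskset`。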
## 加速 Fibonacci 運算
### Fast doubling
>[commit 05fd783](https://github.com/chiangkd/fibdrv/commit/05fd783131a76bb0a8d716a53986ee36a28bb1e2)

參考[作業說明](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d#%E5%8A%A0%E9%80%9F-Fibonacci-%E9%81%8B%E7%AE%97)以及[硬體加速章節](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-a#%E8%80%83%E6%85%AE%E5%88%B0%E7%A1%AC%E9%AB%94%E5%8A%A0%E9%80%9F-Fn-%E7%9A%84%E6%89%8B%E6%B3%95),將 fast doubling 手法引入

$$
\begin{split}
F(2k) &= F(k)[2F(k+1) - F(k)] \\
F(2k+1) &= F(k+1)^2 + F(k)^2
\end{split}
$$

以計算 $F(11)=89$ 為例

```graphviz
strict digraph G {
    1[label="F(11)"]
    2[label="F(5)"]
    3[label="F(6)"]
    4[label="F(2)"]
    5[label="F(3)"]
    6[label=" " ,color=white]
    7[label=" " ,color=white]
    8[label=" " ,color=white]
    9[label=" " ,color=white]
    10[label="F(1)", style=filled]
    11[label="F(2)", style=filled]
    {rank = same; 2;3;}
    {rank = same; 4;5;}
    {rank = same; 10;11;}
    1 -> {2, 3}
    2 -> {4, 5}
    2 -> 3 [style=dashed; arrowhead=vee]
    5 -> 3 [style=dashed; arrowhead=vee]
    3 -> {6, 7} [color=white]
    4 -> {8, 9} [color=white]
    5 -> {10, 11}
}
```

每一次的迭代都會需要 $F(k)$ 以及 $F(k+1)$ 的值,其樹的深度 (不含 root) 為 $\lfloor \log_2 k \rfloor$,$\lfloor \log_2 11 \rfloor = 3$ 也就代表了樹的深度 (不含 root)。觀察 `11` 轉換為 2 進制為 `0b1011`,其介於 `0b1000=8` 以及 `0b10000=16` 之間,向下取整對應到 `0b1000`,可透過型態長度減掉 leading zero 的數量再減 1 取得,也就是向下取整後 `0b1000` 的 trailing zero 數量,可對應到作業說明提供的範例中計算 `count` 的方式

```c
uint8_t count = 63 - __builtin_clzll(target);
```

## 引入 Bignum
- [研讀筆記](https://hackmd.io/@chiangkd/fibdrv-note#%E5%88%9D%E6%AD%A5%E6%94%AF%E6%8F%B4%E5%A4%A7%E6%95%B8%E9%81%8B%E7%AE%97)

引入[作業說明](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d#%E5%88%9D%E6%AD%A5%E6%94%AF%E6%8F%B4%E5%A4%A7%E6%95%B8%E9%81%8B%E7%AE%97)、[實作程式碼](https://github.com/KYG-yaya573142/fibdrv/blob/master/bn_kernel.c)中的 `bn` 結構體,使其能夠計算 $F(92)$ 之後的數值
- [commit 6ede812](https://github.com/chiangkd/fibdrv/commit/6ede812eb43eeada88f9f8ae23b1724935e47dfa)

為了驗證引入後是否能夠正確計算 Fibonacci number,必須修改原專案中的測試程式 ([筆記](https://hackmd.io/LSfQaad1SPqozp3E0dNjwg?view#Makefile))

[commit 9c8ea72](https://github.com/chiangkd/fibdrv/commit/9c8ea72dc9e0dcb99067d2b88db792ccc7cfb0b9) 包含以下修改
- 修改 `client.c` 以及 `fibdrv.c` 以計算至 $Fib(1000)$ 的值
- 修改 `expected.txt` 供 `Makefile` 中的 `diff` 命令比對時使用
- 修改 `verify.py` 使其驗證至第 1000 項
- 修改 `pre-commit.hook`: 新增 `--inline-suppr` 讓程式可以 inline suppressions (在 `client.c` 中 `sz` 接收 `copy_to_user` 回傳的未複製位元組數但未使用,後續應新增檢查條件)

```shell
$ make check
Passed [-]
```

且 `verify.py` 沒有顯示任何訊息 (代表通過)

```shell
$ scripts/verify.py
<nothing>
```

## 效能分析及改善大數運算
>[筆記](https://hackmd.io/@chiangkd/fibdrv-note#%E6%A0%B8%E5%BF%83%E6%A8%A1%E5%BC%8F%E7%9A%84%E6%99%82%E9%96%93%E6%B8%AC%E9%87%8F)

在 [measure.c](https://github.com/chiangkd/fibdrv/blob/bignum/src/measure.c) 使用 `clock_gettime` 量測 user-level 所花的時間,再透過 user-level 所花的時間減去 kernel-level 所花的時間 (`fibdrv` 中使用 `ktime`) 來計算 `copy_to_user` 所耗費的時間

### bignum (iteration method)
![](https://i.imgur.com/jyxt8ih.png)
- 量測計算到 $Fib(1000)$ 可以發現圖中有突然的凸起發生

透過以下指令將 [Hyper-Threading](https://en.wikipedia.org/wiki/Simultaneous_multithreading) 關閉後可以明顯改善

```shell
sudo sh -c "echo off > /sys/devices/system/cpu/smt/control"
```

![](https://i.imgur.com/c1W5X5K.png)

在此我指定只有 CPU 0 可以執行我的程式,但在 Hyper-Threading 開啟的情況下 CPU 0 其實有兩個執行緒在同時運作 (這也是 intel 想要做到的,在 CPU 內部僅複製必要資源,讓兩個執行緒可以同時執行,以此在一個單位時間內處理兩個執行緒的工作以類比真正的雙核心),故猜測未關閉 HT 的凸起可能是因為其 CPU 內部資源被另一條執行緒給搶用了

如果我一邊播影片且不關閉 HT,測量會更明顯,如下圖
![](https://i.imgur.com/dlJtWnh.png)

在計算 $Fib(100)$ 時可以比較明顯的看出階梯狀的情況 (待分析)
![](https://i.imgur.com/CGKR2lS.png)

### bignum (fast-doubling within iteration)
![](https://i.imgur.com/TUuhJ1Q.png)

與正常 iteration 版本比較
![](https://i.imgur.com/KYEUnBP.png)
- 可以看到大約在 $Fib(500)$ 之前 iteration 版本是比較快的,而之後 fast doubling 比較快 (待分析)

### 改寫 `bn_fib_fdoubling`
>[Reference](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d#%E5%88%9D%E6%AD%A5%E6%94%AF%E6%8F%B4%E5%A4%A7%E6%95%B8%E9%81%8B%E7%AE%97)

參考資料提及的方案在直接引入時遇到一些 Bug,解決過程放在[筆記](https://hackmd.io/@chiangkd/fibdrv-note#%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-1-%E6%94%B9%E5%AF%AB-bn_fib_fdoubling)中

```c
static void mode_select(void)
{
    switch (FIBMODE) {
    case 1:
        fib_method = &bn_fib;
        break;
    case 2:
        fib_method = &bn_fib_fdoubling;
        break;
    case 3:
        fib_method = &bn_fib_fdoubling_v1;
        break;
    default:
        break;
    }
}
```

新增改善後函式,可透過編譯時定義巨集來指定演算法,以本節為例

```shell
make check FIBMODE=3
```
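由於現在可以用 `FIBMODE` 在編譯期切換不同演算法,各版本輸出的交叉比對就更重要。下面是一份假設性的 user space fast doubling 參考實作 (非核心模組程式碼,僅適用於 `uint64_t` 不會溢位的範圍,約 $F(93)$ 以內),直接套用前述 $F(2k)$、$F(2k+1)$ 公式與 `__builtin_clz`:

```c
#include <stdint.h>
#include <stdio.h>

/* fast doubling 參考實作:
 * F(2k)   = F(k) * [2F(k+1) - F(k)]
 * F(2k+1) = F(k)^2 + F(k+1)^2
 * 僅適用於 uint64_t 不溢位的範圍 (約 n <= 93) */
static uint64_t fib_fast_doubling(unsigned int n)
{
    if (n == 0)
        return 0;
    uint64_t a = 0, b = 1; /* a = F(k), b = F(k+1),初始 k = 0 */
    /* 從 n 的最高位元開始逐位處理 */
    for (unsigned int i = 1U << (31 - __builtin_clz(n)); i; i >>= 1) {
        uint64_t t1 = a * (2 * b - a); /* F(2k) */
        uint64_t t2 = a * a + b * b;   /* F(2k+1) */
        a = t1;
        b = t2;
        if (n & i) { /* 該位元為 1:k -> 2k + 1 */
            t1 = b;
            b = a + b;
            a = t1;
        }
    }
    return a;
}

int main(void)
{
    for (unsigned int i = 0; i <= 93; i++)
        printf("F(%u) = %llu\n", i, (unsigned long long) fib_fast_doubling(i));
    return 0;
}
```

把這份輸出與 `client` 讀到的前幾項互相比對,可以在不動到 `expected.txt` 的情況下快速檢查小範圍的正確性。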
### 引入 [memory pool](https://en.wikipedia.org/wiki/Memory_pool)
>[commit 4448082](https://github.com/chiangkd/fibdrv/commit/44480821ac0c7dfd8ead96327686b56155928a20)

在 `bn` 中定義新的 element 用以管理 `bn` 結構體的可用大小,以此來減少 `bn_resize` 呼叫 `krealloc` 的次數,其中以 k 為單位配置更大的空間可以透過修改巨集 `ALLOC_CHUNK_SIZE` 達成。

### 善用 [64 位元微處理器特性](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d#%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-4-%E5%96%84%E7%94%A8-64-%E4%BD%8D%E5%85%83%E5%BE%AE%E8%99%95%E7%90%86%E5%99%A8%E7%89%B9%E6%80%A7)
>[commit 1740327](https://github.com/chiangkd/fibdrv/commit/174032747f6b8129429d90b4e81d8a9f8667a352)

引入根據處理器位元數調整的結構體

```c
typedef struct _bn {
    bn_data *number;
    bn_data size;
    bn_data capacity; /* total allocated length, size <= capacity */
    int sign;
} bn;
```

其中在 64 位元處理器下 `bn_data` 定義為 `uint64_t`,而相乘與進位時需要用到雙倍寬度的空間,所以使用 `gcc` 支援的 `__int128` 來定義 `bn_data_tmp`,並根據不同架構定義 `#define DIGITS` 的長度 (`32` or `64`) 來配合大數運算。

而在 `bn_do_sub` 中原本的 `carry` 作為借位使用 `long long int` 定義,需為 128 位元有號數,故改為 `__int128`。

大數運算的函式中也需根據資料型態的改變做調整,否則一個不小心 overflow,或是進位條件不滿足導致 `bn` 沒有正確 `resize`,都有可能導致計算錯誤、運算中直接被 `kill` 掉,甚至直接當機。
>重開機了無數次...

測試計算到 $Fib(10000)$ 的效能
![](https://i.imgur.com/SrdG6tB.png)
- 可以看到 user level 的花費時間明顯比 kernel level 多得多,代表著 `kernel_to_user` 的成本很大 (包含 `copy_to_user` 隨著傳輸資料增加所耗費的時間)

把 `kernel level` 的值抓出來看
![](https://i.imgur.com/jJ2kb4X.png)

### 改善 `bn_mult`
>[Reference](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d#%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-5-%E6%94%B9%E5%AF%AB-bn_add-%E5%92%8C-bn_mult)

參考資料中提到,`bn_mult` 原本為依序將兩個陣列相乘,再將結果疊加至輸出變數,也就是在原版本中,將兩個陣列相乘、放入 `carry`,再將其作為引數呼叫 `bn_mult_add` 更新 `c->number` 的值;而改善版本定義一個新的 function `_mult_partial`,將上述的兩個步驟縮減為一次做完。

![](https://i.imgur.com/lpxYC0o.png)

### inline assembly
>[Reference](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d#%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-6-%E5%85%A7%E5%B5%8C%E7%B5%84%E5%90%88%E8%AA%9E%E8%A8%80)

在 `x86-64` 的架構下 `rax` 和 `rdx` 是 64 bits 的暫存器,而 `mulq` 的作用是將兩個 64 位元無號整數相乘:`mulq` 只接受一個運算元,會將該運算元與 `rax` 中的值相乘,執行後 `rdx` 暫存器會儲存高 64 位元的結果,而 `rax` 暫存器會儲存低 64 位元的結果。

![](https://i.imgur.com/cowUd6I.png)

### Fast doubling 過程中也去查找 hashtable
>[使用 hashtable 紀錄以計算過的值](https://hackmd.io/@chiangkd/2023spring-fibdrv#%E4%BD%BF%E7%94%A8-hashtable-%E7%B4%80%E9%8C%84%E4%BB%A5%E8%A8%88%E7%AE%97%E9%81%8E%E7%9A%84%E5%80%BC)

查看目前為止最優化版本的 fast-doubling

```c=
void bn_fib_fdoubling(bn *dest, unsigned int n)
{
    ...
    for (unsigned int i = 1U << (30 - __builtin_clz(n)); i; i >>= 1) {
        /* F(2k-1) = F(k)^2 + F(k-1)^2 */
        /* F(2k) = F(k) * [ 2 * F(k-1) + F(k) ] */
        bn_lshift_2(f1, 1, k1);  // k1 = 2 * F(k-1)
        bn_add(k1, f2, k1);      // k1 = 2 * F(k-1) + F(k)
        bn_mult(k1, f2, k2);     // k2 = k1 * f2 = F(2k)
        bn_mult(f2, f2, k1);     // k1 = F(k)^2
        bn_swap(f2, k2);         // f2 <-> k2, f2 = F(2k) now
        bn_mult(f1, f1, k2);     // k2 = F(k-1)^2
        bn_add(k2, k1, f1);      // f1 = k1 + k2 = F(2k-1) now
        if (n & i) {
            bn_swap(f1, f2);     // f1 = F(2k), f2 = F(2k-1)
            bn_add(f1, f2, f2);  // f2 = F(2k+1)
        }
    }
    ...
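    /*
     * 補充註解 (非原程式碼,為依下文分析所做的推測):迴圈起始索引
     * 1U << (30 - __builtin_clz(n)) 取的是 n 的次高位元,因為最高位元
     * 已由初始狀態 (k = 1,推測即 f2 = F(1)、f1 = F(0)) 涵蓋,
     * 故迴圈共執行 floor(log2(n)) 次,與前文 63 - __builtin_clzll(target)
     * 的推導一致。
     */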
} ``` `line 7` to `line13` 負責 fast-doubling 的計算,for-loop 結束後會將使用者 request 的 $Fib(k)$ 存放於 `f2` 指向的地址,以計算 $Fib(10)$ 為例,這個 for loop 一共會跑三次, - 第一圈 k=1, $F(2) =1$, $F(1)=1$ - 第二圈 k=2, $f(4) =3$, $F(3)=2$,此時因 `n=10 (0'b1010)`, `i=2 (0'b10)` - 此時結果為 `f1 = F(2k)=3`,且 、 `f2=F(2k+1)` = $Fib(5)=5$ - 第三圈在第二圈的時候已經計算出 `f2` = $Fib(5)$,並且會在 `line 9` 時計算出 $Fib(10)$ 在 `line9` 以及其相關運算中是為了找到 `F(2k)`,在 `line13` 以及相關運算是為了找到 `F(2k-1)` ,在 `line16` 是為了找到 $F(2k+1)$ ,在計算前先進行查找 hash table 的動作,並且若不是從 table 中找到的,則將計算後的值放入 hashtable 中。 實作一個 resursion 版本的 fast-doubling 計算,並在函式內新增判斷是否存在於 hashtable 的函式,若不存在,**進行計算並插入 hashtable** ```c static void bn_fib_fdoubling_vhash(bn *dest, unsigned int n) { bn_resize(dest, 1); if (n <= 2) { // Fib(0) = 0, Fib(1) = 1 dest->number[0] = !!n; return; } bn *f1 = NULL; bn *f2 = NULL; unsigned int key1 = n >> 1; unsigned int key2 = (n >> 1) + 1; if(is_in_ht(key1)) { /* F(2k) = F(k) * [ 2 * F(k-1) + F(k) ] */ f2 = hlist_entry(htable[key1].first, hdata_node, list)->data; // f2 = F(k) printk(KERN_DEBUG "f2: find offset %d value = %lld at p %p\n", (key1), f2->number[0], f2); } else { f2 = bn_alloc(1); t1_node = kcalloc(1, sizeof(hdata_node), GFP_KERNEL); if (t1_node == NULL) printk("kcalloc failed \n"); bn_fib_fdoubling_vhash(f2, key1); printk(KERN_DEBUG "f2: no find offset %d, push value %lld to hashtable %p \n", key1, f2->number[0], f2); t1_node->data = f2; INIT_HLIST_NODE(&t1_node->list); hlist_add_head(&t1_node->list, &htable[key1]); // add to hash table } if(is_in_ht(key2)) { /* F(2k-1) = F(k)^2 + F(k-1)^2 */ f1 = hlist_entry(htable[key2].first, hdata_node, list)->data; // f1 = F(k + 1) printk(KERN_DEBUG "f1: find offset %d value = %lld at p %p\n", key2, f1->number[0], f1); } else { f1 = bn_alloc(1); // f1 = F(k-1) t2_node = kcalloc(1, sizeof(hdata_node), GFP_KERNEL); if (t2_node == NULL) printk("kcalloc failed \n"); bn_fib_fdoubling_vhash(f1, key2); printk(KERN_DEBUG "f1: no find offset %d, push value %lld to hashtable %p\n", key2, f1->number[0], f1); t2_node->data = f1; INIT_HLIST_NODE(&t2_node->list); hlist_add_head(&t2_node->list, &htable[key2]); // add to hash table } /* here, f2 = F(k) and f1 = F(k + 1) */ bn *k1 = bn_alloc(1); bn *k2 = bn_alloc(1); /* F(2k) = F(k)[2F(k+1) - F(k)] */ bn_lshift_2(f1, 1, k1); // k1 = 2 * F(k+1) bn_sub(k1, f2, k1); // k1 = 2 * F(k+1) - F(k) bn_mult(k1, f2, k2); // k2 = F(2k) = F(k)[2F(k+1) - F(k)] /* F(2k+1) = F(k+1)^2 + F(k)^2 */ bn_mult(f2, f2, k1); // k1 = F(k)^2 bn_swap(dest, k2); // dest = F(2k) bn_mult(f1, f1, k2); // k2 = F(k+1)^2 bn_add(k2, k1, f1); // f1 = k1 + k2 = F(2k+1); if(n & 1) { bn_swap(f1, dest); // dest = F(2k+1) } // bn_free(f1); bn_free(k1); bn_free(k2); } ``` 但在驗證計算 $Fib(10)$ 卻出現了預期外的結果 ```shell= [11200.151182] f2: no find offset 2, push value 1 to hashtable 00000000c40f0c32 [11200.151188] f2: no find offset 1, push value 1 to hashtable 000000009a908a8e [11200.151191] f1: find offset 2 value = 1 at p 00000000c40f0c32 [11200.151195] f1: no find offset 3, push value 2 to hashtable 0000000012bc4a60 [11200.151197] f2: no find offset 5, push value 5 to hashtable 00000000d0b44af8 [11200.151199] f2: find offset 3 value = 3 at p 0000000012bc4a60 [11200.151201] f2: find offset 2 value = 1 at p 00000000c40f0c32 [11200.151202] f1: find offset 3 value = 3 at p 0000000012bc4a60 [11200.151204] f1: no find offset 4, push value 5 to hashtable 000000002a329d4d [11200.151206] f1: no find offset 6, push value 0 to hashtable 000000001a32073d ``` 首先,遞迴運作的順序正確,但是在 `line 4` 第一次將 $Fib(3)$ 的值 `2` 放入地址 `12bc4a60` 時在 `line 6` 被檢測到存在於 hashtable 
中,從中取值時竟然與放入的值不同,這點導致計算出來的結果錯誤。若將本段程式碼有關 hahs table 插入的部份註解掉可以通過 $Fib(1000)$ 的 `make check` 檢測,推論在插入數值時並沒有按照預期的流程插入。 ==on going== ## Git Hooks and GitHub Action 首先在專案中透過 `install-git-hooks` 建立[軟連結](https://en.wikipedia.org/wiki/Symbolic_link) (會連結到隱藏資料夾中的 `.git/hooks` 下的檔案),如此一來修改 `scripts` 裡面有連結的檔案就會像是在直接操作在 `.git/hooks` 中的目標。 - [Git Internals](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain#ch10-git-internals) / [Git Hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks#_git_hooks) 之前一直沒發現在 push 上去之後 github action 會 run fail ,GitHub Action 測試的內容寫在 `main.yml` 中,以本次作業來說, push 上去後執行 `make` 以及 `make check` - [#3](https://github.com/chiangkd/fibdrv/actions/runs/4461437635/jobs/7835264976) 中為沒有修改 `expected.txt` 以至於在測試時回報錯誤 - [#9](https://github.com/chiangkd/fibdrv/actions/runs/4521312193/jobs/7962988542) 中存在定義了函式卻未使用 - [#18](https://github.com/chiangkd/fibdrv/actions/runs/4644810084/jobs/8220296673) 中 `k_free()` implicit function declaration 是因為沒有 include 到 `slab` - [Linux 核心設計: 記憶體管理](https://hackmd.io/@sysprog/linux-memory#Linux-%E6%A0%B8%E5%BF%83%E8%A8%AD%E8%A8%88-%E8%A8%98%E6%86%B6%E9%AB%94%E7%AE%A1%E7%90%86) 修正完成後在 [#21](https://github.com/chiangkd/fibdrv/actions/runs/4645258496) 通過 GitHub Action 自動測試的檢查項目 ## 觀察 Linux 核心模組若沒用到 mutex,會發生什麼問題 >[Note](https://hackmd.io/@chiangkd/fibdrv-note#%E8%A7%80%E5%AF%9F-Linux-%E6%A0%B8%E5%BF%83%E6%A8%A1%E7%B5%84%E8%8B%A5%E6%B2%92%E7%94%A8%E5%88%B0-mutex%EF%BC%8C%E5%88%B0%E5%BA%95%E6%9C%83%E7%99%BC%E7%94%9F%E4%BB%80%E9%BA%BC%E5%95%8F%E9%A1%8C) 將原先 `client.c` 用另外的 function 包裝起來供 `pthread_create()` 使用,在程式運行時第一個執行緒會負責運行 `main()`,故在 `main()` 執行負責執行緒的 create and join。 >[commit fd26bb0](https://github.com/chiangkd/fibdrv/commit/fd26bb0d630ef488b82667a664a54905096690c7) 運行中為了查看執行緒 ID,可以使用 `pthread_self()` 獲得 POSIX 執行緒 ID,也可以使用 `syscall(__NR_gettid)` 系統呼叫獲得執行緒 ID,但這兩個值並不會一樣, POSIX thread ID 由 thread 來實作,而 `gettid` 回傳的執行緒 ID 則由 kernel 分配數字 >[What is the difference between pthread_self() and gettid()?](https://stackoverflow.com/questions/6372102/what-is-the-difference-between-pthread-self-and-gettid-which-one-should-i-u) 也可以使用封裝後的 `gettid()` 函式,但要記得 `#define __USE_GNU` 或 `#_GNU_SOURCE` ,因為 `gettid` 不是標準的 POSIX 函式,而是 GNU C library 提供的 extension 建立兩個 thread,並由 `main` function 負責 create 和 join,同時使用 `pthread_self` 觀察其輸出。 ```shell Create thread ID = 139658358945344 TID = 139658358945344, Reading from /dev/fibonacci at offset 0, returned the sequence 0. TID = 139658358945344, Reading from /dev/fibonacci at offset 1, returned the sequence 1. TID = 139658358945344, Reading from /dev/fibonacci at offset 2, returned the sequence 1. Create thread ID = 139658350552640 TID = 139658358945344, Reading from /dev/fibonacci at offset 3, returned the sequence 2. thread ID - 139658350552640 Failed to open character device TID = 139658358945344, Reading from /dev/fibonacci at offset 4, returned the sequence 3. TID = 139658358945344, Reading from /dev/fibonacci at offset 4, returned the sequence 3. ``` - 可以看到 main function 共建立兩個 thread,而第一個進入 `fibdrv` 的執行緒 `139658358945344` 拿到 mutex lock,導致在執行緒 `139658350552640` 進入時在 `fib_open` 這邊會透過 `mutex_trylock` 回傳 `EBUSY` (lock contension) 使其在測試程式呼叫 `exit(1)` 並結束,也可以用 `sudo dmesg` 查看訊息。 > [mutex_trylock](https://archive.kernel.org/oldlinux/htmldocs/kernel-locking/API-mutex-trylock.html): Try to acquire the mutex atomically. Returns 1 if the mutex has been acquired successfully, and 0 on contention. 
- 這邊要注意的是 kernel 中的 `mutex_trylock` 回傳值正數代表成功 (鎖成功取的),而 0 代表失敗 - `pthread_mutex_trylock` 0 的代表成功,非 0 代表失敗 這裡使用 `mutex_trylock` 而非 `mutex_lock` ,差別在於普通的 `lock` 操作該執行緒會一直等到鎖被釋放,若為 `try_lock` ,該執行緒就會**立即**返回一個失敗的值,可以避免等待,但是 `trylock` 的缺點在於當鎖被其他執行緒持有時,如果一直去嘗試獲取所,會增加 CPU 資源的消耗,但在專案中一旦` trylock` 偵測到鎖被持有時就會返回 `EBUSY` 並使得測試程式 `exit`。 嘗試把 `exit(1)` 註解掉並測試 $Fib(10)$ ```shell Create thread ID = 140648573302336 Create thread ID = 140648564909632 TID = 140648573302336, Reading from /dev/fibonacci at offset 0, returned the sequence 0. TID = 140648573302336, Reading from /dev/fibonacci at offset 1, returned the sequence 1. thread ID - 140648564909632 Failed to open character device TID = 140648564909632, Reading from /dev/fibonacci at offset 0, returned the sequence . TID = 140648573302336, Reading from /dev/fibonacci at offset 2, returned the sequence 1. TID = 140648573302336, Reading from /dev/fibonacci at offset 3, returned the sequence 2. TID = 140648564909632, Reading from /dev/fibonacci at offset 1, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 2, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 3, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 4, returned the sequence . TID = 140648573302336, Reading from /dev/fibonacci at offset 4, returned the sequence 3. TID = 140648564909632, Reading from /dev/fibonacci at offset 5, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 6, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 7, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 8, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 9, returned the sequence . TID = 140648564909632, Reading from /dev/fibonacci at offset 10, returned the sequence . TID = 140648573302336, Reading from /dev/fibonacci at offset 5, returned the sequence 5. TID = 140648573302336, Reading from /dev/fibonacci at offset 6, returned the sequence 8. TID = 140648573302336, Reading from /dev/fibonacci at offset 7, returned the sequence 13. TID = 140648573302336, Reading from /dev/fibonacci at offset 8, returned the sequence 21. TID = 140648573302336, Reading from /dev/fibonacci at offset 9, returned the sequence 34. TID = 140648573302336, Reading from /dev/fibonacci at offset 10, returned the sequence 55. join! thread ID = 140648573302336 join! thread ID = 140648564909632 ``` - 同樣是兩個執行緒,不過在測試程式這邊沒有在偵測到已經被上鎖時直接 `exit`,可以看到的確只有第一個獲取 mutex lock 的 執行緒 `140648573302336` 才有正確的返回值,而執行緒 `140648564909632` 為空。 - 也可以看到因為是使用 `trylock` ,所以在發現鎖被持有時不會一直等待,兩個執行緒是交替執行 將 `mutex_trylock` 替換成 `mutex_lock` ```diff static int fib_open(struct inode *inode, struct file *file) { - if (!mutex_trylock(&fib_mutex)) { - printk(KERN_ALERT "fibdrv is in use"); - return -EBUSY; - } + mutex_lock(&fib_mutex); return 0; } ``` 輸出為 ```shell Create thread ID = 140546622355008 TID = 140546622355008, Reading from /dev/fibonacci at offset 0, returned the sequence 0. TID = 140546622355008, Reading from /dev/fibonacci at offset 1, returned the sequence 1. TID = 140546622355008, Reading from /dev/fibonacci at offset 2, returned the sequence 1. TID = 140546622355008, Reading from /dev/fibonacci at offset 3, returned the sequence 2. Create thread ID = 140546613962304 TID = 140546622355008, Reading from /dev/fibonacci at offset 4, returned the sequence 3. 
TID = 140546622355008, Reading from /dev/fibonacci at offset 5, returned the sequence 5. TID = 140546622355008, Reading from /dev/fibonacci at offset 6, returned the sequence 8. TID = 140546622355008, Reading from /dev/fibonacci at offset 7, returned the sequence 13. TID = 140546622355008, Reading from /dev/fibonacci at offset 8, returned the sequence 21. TID = 140546622355008, Reading from /dev/fibonacci at offset 9, returned the sequence 34. TID = 140546622355008, Reading from /dev/fibonacci at offset 10, returned the sequence 55. TID = 140546613962304, Reading from /dev/fibonacci at offset 0, returned the sequence 0. TID = 140546613962304, Reading from /dev/fibonacci at offset 1, returned the sequence 1. TID = 140546613962304, Reading from /dev/fibonacci at offset 2, returned the sequence 1. Join thread ID = 140546622355008 TID = 140546613962304, Reading from /dev/fibonacci at offset 3, returned the sequence 2. TID = 140546613962304, Reading from /dev/fibonacci at offset 4, returned the sequence 3. TID = 140546613962304, Reading from /dev/fibonacci at offset 5, returned the sequence 5. TID = 140546613962304, Reading from /dev/fibonacci at offset 6, returned the sequence 8. TID = 140546613962304, Reading from /dev/fibonacci at offset 7, returned the sequence 13. TID = 140546613962304, Reading from /dev/fibonacci at offset 8, returned the sequence 21. TID = 140546613962304, Reading from /dev/fibonacci at offset 9, returned the sequence 34. TID = 140546613962304, Reading from /dev/fibonacci at offset 10, returned the sequence 55. Join thread ID = 140546613962304 ``` - 可以看到雖然執行緒 `140546613962304` 在 `140546622355008` 運行時創建,但是因前者拿著鎖,故後者只能等待鎖的釋放,直到前者運算完 $Fib(10)$ 後將鎖釋放才換後者執行。 直接將 `fibdrv.c` 中的 `try_lock` 以及 `unlock` 註解掉後兩個執行緒都可以正常運作,不會發生 lock contension ```shell Create thread ID = 140630458103360 TID = 140630458103360, Reading from /dev/fibonacci at offset 0, returned the sequence 0. TID = 140630458103360, Reading from /dev/fibonacci at offset 1, returned the sequence 1. TID = 140630458103360, Reading from /dev/fibonacci at offset 2, returned the sequence 1. Create thread ID = 140630449710656 TID = 140630458103360, Reading from /dev/fibonacci at offset 3, returned the sequence 2. TID = 140630458103360, Reading from /dev/fibonacci at offset 4, returned the sequence 3. TID = 140630458103360, Reading from /dev/fibonacci at offset 5, returned the sequence 5. TID = 140630458103360, Reading from /dev/fibonacci at offset 6, returned the sequence 8. TID = 140630458103360, Reading from /dev/fibonacci at offset 7, returned the sequence 13. TID = 140630449710656, Reading from /dev/fibonacci at offset 0, returned the sequence 0. TID = 140630449710656, Reading from /dev/fibonacci at offset 1, returned the sequence 1. TID = 140630458103360, Reading from /dev/fibonacci at offset 8, returned the sequence 21. TID = 140630458103360, Reading from /dev/fibonacci at offset 9, returned the sequence 34. TID = 140630458103360, Reading from /dev/fibonacci at offset 10, returned the sequence 55. TID = 140630449710656, Reading from /dev/fibonacci at offset 2, returned the sequence 1. TID = 140630449710656, Reading from /dev/fibonacci at offset 3, returned the sequence 2. TID = 140630449710656, Reading from /dev/fibonacci at offset 4, returned the sequence 3. TID = 140630449710656, Reading from /dev/fibonacci at offset 5, returned the sequence 5. TID = 140630449710656, Reading from /dev/fibonacci at offset 6, returned the sequence 8. 
TID = 140630449710656, Reading from /dev/fibonacci at offset 7, returned the sequence 13. TID = 140630449710656, Reading from /dev/fibonacci at offset 8, returned the sequence 21. Join thread ID = 140630458103360 TID = 140630449710656, Reading from /dev/fibonacci at offset 9, returned the sequence 34. TID = 140630449710656, Reading from /dev/fibonacci at offset 10, returned the sequence 55. Join thread ID = 140630449710656 ``` ## 使用 hashtable 紀錄以計算過的值 >[Linux 核心的 hash table 實作](https://hackmd.io/@sysprog/linux-hashtable) 預期引入 [Linux 核心的 hash table 實作](https://hackmd.io/@sysprog/linux-hashtable),並以 $Fib(n)$ 中的 n 作為 key,紀錄以儲存過的值 ### 初步引入 hashtable (single thread) >[commit #0bcdb63](https://github.com/chiangkd/fibdrv/commit/0bcdb636a138d7fa92e17ddcddc40137f6c8fa78) 引入 `hlist` 系列結構體存取已經計算過的值 ```c struct hlist_node { struct hlist_node *next, **pprev; }; ``` ```graphviz digraph G { rankdir=LR; node[shape=record]; map [label="hlist_head.first |<ht0> |<ht1> |<ht2> |<ht3> |<ht4> |<ht5> |<ht7> |<ht8> "]; node[shape=none] null1 [label=NULL] null2 [label=NULL] subgraph cluster_1 { style=filled; color=lightgrey; node [shape=record]; hn1 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 1" } /* subgraph cluster_2 { style=filled; color=lightgrey; node [shape=record]; hn2 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 2" } */ subgraph cluster_3 { style=filled; color=lightgrey; node [shape=record]; hn3 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 3" } map:ht1 -> hn1 hn1:next -> NULL // hn2:next -> null1 // hn2:prev:s -> hn1:next:s map:ht5 -> hn3 hn3:next -> null2 hn1:prev:s -> map:ht1 hn3:prev:s -> map:ht5 } ``` 另外,新增自定義結構 `hdata_node` 嵌入 `hlist_node` 並包含一個指向大數 `bn` 結構體指標,用以儲存 value 值。 ```c typedef struct _hdata_node { bn *data; struct hlist_node list; } hdata_node; ``` ```graphviz digraph G { rankdir=LR; node[shape=record]; map [label="hlist_head.first |<ht0> |<ht1> |<ht2> |<ht3> |<ht4> |<ht5> |<ht7> |<ht8> "]; node[shape=none] null1 [label=NULL] null2 [label=NULL] subgraph cluster_A { subgraph cluster_1 { style=filled; color=lightgrey; node [shape=record]; hn1 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 1" } subgraph cluster_bn { style=filled; color=yellow; node [shape=record] bn1 [label="{number|size|sign}"] label=bn } label="hdata_node" } subgraph cluster_3 { style=filled; color=lightgrey; node [shape=record]; hn3 [label="hlist_node | {<prev>pprev | <next>next}"]; label="hash_key 3" } map:ht1 -> hn1 hn1:next -> null1 // hn2:next -> null1 // hn2:prev:s -> hn1:next:s map:ht5 -> hn3 hn3:next -> null2 hn1:prev:s -> map:ht1 hn3:prev:s -> map:ht5 } ``` 直接將 $Fib(k)$ 中所要計算的第 $k$ 個 Fibonnaci number 作為 hashtable 的 key,value 則指向 `bn` 結構體 ```c static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { char *p = NULL; size_t len = 0, left = 0; bn *fib = NULL; int key = (int) *offset; if (is_in_ht(offset)) { printk(KERN_DEBUG "find offset = %d\n", key); fib = hlist_entry(htable[key].first, hdata_node, list)->data; } else { fib = bn_alloc(1); dnode = kcalloc(1, sizeof(hdata_node), GFP_KERNEL); if (dnode == NULL) printk("kcalloc failed \n"); dnode->data = bn_alloc(1); mode_select(); fib_method(fib, *offset); INIT_HLIST_NODE(&dnode->list); bn_cpy(dnode->data, fib); hlist_add_head(&dnode->list, &htable[key]); // add to hash table } p = bn_to_string(fib); len = strlen(p) + 1; left = copy_to_user(buf, p, len); bn_free(fib); kfree(p); return left; } ``` 呼叫 `fib_read()` 時先判斷 hashtable 該 key 值是否有值,若有值則直接從 hashtable 
中取用。由於設計上直接將 $k$ 作為 key 值,不會發生 collision,因此可以直接取用串列的第一個節點;若沒有設計足夠大的 hashtable 就會發生 collision,需要走訪該 key 值對應到的 list 的節點找出是否存在。

`is_in_ht()`
```c
static int is_in_ht(loff_t *offset)
{
    int key = (int) *(offset);
    if (hlist_empty(&htable[key])) {
        printk(KERN_DEBUG "No find in hash table\n");
        return 0; /* no in hash table */
    }
    return 1;
}
```
- `hlist_empty` 檢查該 key 值對應到的 list 是否有值。

執行 `client`,測試程式為先計算 $Fib(0)$ 至 $Fib(1000)$,接著再反著計算回來,使用 `printk` 測試是否有正確運行

```shell
No find in hash table
No find in hash table
No find in hash table
...
No find in hash table
find offset = 1000
find offset = 999
...
find offset = 0
```

### 記憶體釋放
>[lkmpg 4.3 The __init and __exit Macros](https://sysprog21.github.io/lkmpg/#the-init-and-exit-macros)
>The __exit macro causes the omission of the function when the module is built into the kernel, and like __init , has no effect for loadable modules. Again, if you consider when the cleanup function runs, this makes complete sense; built-in drivers do not need a cleanup function, while loadable modules do.

在使用 `rmmod` 卸載 `fibdrv` 模組時會呼叫帶有 `__exit` macro 的函式,kernel 會將這個函式放入 read-only 的 `__exit` section 中
- [linux/init.h](https://github.com/torvalds/linux/blob/master/include/linux/init.h)

```c
#define __exit __section(".exit.text") __exitused __cold notrace
```
- `__section(".exit.text")`: 告訴編譯器使用 `.exit.text` section
- `__exitused`: 告訴編譯器確保不會將看似未被使用的 `__exit` 函式從 executable file 中刪除
- `__cold`: 告訴編譯器將 `__exit` 函式標記為不常用的函式,以進行更好的優化
- `notrace`: 不要在函式執行期間 trace

故需在帶有 `__exit` macro 的 `exit_fib_dev` 進行記憶體釋放

走訪 hashtable 並釋放有 allocate 的記憶體:在 `fib_read` 中若在 hashtable 中找不到對應的值,會分配記憶體給 `hdata_node` 結構體以及其元素之一的 `bn` 以儲存大數,故新增一個 function 來處理記憶體釋放。

```c
static void release_memory(void)
{
    struct hlist_node *n = NULL;
    /* go through and free hashtable */
    for (int i = 0; i < MAX_LENGTH; i++) {
        hlist_for_each_entry_safe(dnode, n, &htable[i], list) {
            bn_free(dnode->data);
            hlist_del(&dnode->list);
            kfree(dnode);
        }
    }
}
```

並在帶有 `__exit` macro 的 function 新增卸載核心模組的記憶體釋放操作。

```diff
static void __exit exit_fib_dev(void)
{
+   release_memory();
    mutex_destroy(&fib_mutex);
    device_destroy(fib_class, fib_dev);
    class_destroy(fib_class);
    cdev_del(fib_cdev);
    unregister_chrdev_region(fib_dev, 1);
}
```

### hashtable 大數測試

測試計算 $Fib(0)$ 至 $Fib(10000)$,再從 $Fib(10000)$ 算回 $Fib(0)$ 的時間
![](https://i.imgur.com/oZ7zoOW.png)
- 圖中測量的 x 座標不是 $Fib(n)$,而是第 n 次的時間量測,x 座標 `10000` 至 `20000` 為由 $Fib(10000)$ 至 $Fib(0)$ 的時間測量

在前 $10000$ 筆中計算 $Fib(0)$ 至 $Fib(10000)$,因 hash table 中都沒有值,所以和原本沒有實作 hash table 的趨勢一樣;而第 $10000$ 至 $20000$ 筆測量中計算 $Fib(10000)$ 至 $Fib(0)$,因 hash table 中都已經有儲存計算過的值,所以直接取用即可

將後半段的資料獨立出來看 (found it in hash table)
![](https://i.imgur.com/9wARxju.png)

可以看到雖然有高低起伏,但基本上已經是常數時間了,特別的突起原因不明,但因為數值 (y 座標) 很小,且在 `hlist_entry` 中的 `container_of` 也是常數時間的函式,這邊特別突起的原因我認為只是量測的雜訊。

### 初步進行多執行緒測試
>[commit 661e543](https://github.com/chiangkd/fibdrv/commit/661e54310e9622ce3302ddbbd50cbb14068829b7)、[commit da57624](https://github.com/chiangkd/fibdrv/commit/da57624289fe2c8681f4193d7ddf3402e90d648c)

前述提到在讀取時不會有 race condition 問題,但在執行 hash table 的寫入時須避免重複的寫入 (雖然以上面的設計來說不會有 collision,但重複寫入顯得沒效率),以及確保查找到值才做讀取。原程式的 `mutex_trylock` 置於 `fib_open`,避免多個執行緒同時 access 這個模組,一旦偵測到 `mutex` 被 lock 住則跳出並 `exit`;此處修改為允許多個執行緒 access 這個模組,並保護讀寫 hash table 的值,避免執行緒 A 判斷 hash table 沒有值並進行 Fibonacci number 的計算,但是在存入 hash table 之前被執行緒 B 搶先,以至於執行緒 B 也判斷 hash table 中沒有值,並重複進行 Fibonacci number 計算

```c
static ssize_t fib_read(struct file *file,
                        char *buf,
                        size_t size,
                        loff_t *offset)
{
    char *p = NULL;
    size_t len = 0, left = 0;
    bn *fib = NULL;
    int key = (int) *offset;
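    /*
     * 補充註解 (非原程式碼):下方以 fib_mutex 將「查表 -> 計算 -> 插入」
     * 整段包進同一個 critical section,避免兩個執行緒先後判斷 hash table
     * 沒有值而重複計算、重複插入同一個 key (即前述 check-then-act 的問題)。
     */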
/* critical section */ mutex_lock(&fib_mutex); if (is_in_ht(offset)) { printk(KERN_INFO "find offset = %d in thread %d\n", key, current->pid); fib = hlist_entry(htable[key].first, hdata_node, list)->data; } else { fib = bn_alloc(1); dnode = kcalloc(1, sizeof(hdata_node), GFP_KERNEL); if (dnode == NULL) printk("kcalloc failed \n"); dnode->data = bn_alloc(1); mode_select(); fib_method(fib, *offset); INIT_HLIST_NODE(&dnode->list); bn_cpy(dnode->data, fib); hlist_add_head(&dnode->list, &htable[key]); // add to hash table } mutex_unlock(&fib_mutex); p = bn_to_string(fib); len = strlen(p) + 1; left = copy_to_user(buf, p, len); bn_free(fib); kfree(p); return left; } ``` 在 kernel module 中引入 `sched.h` 並印出當前 process ID, 參考輸出 ```shell= $ sudo dmesg [ 970.788604] No find offset = 0 in thread 6160 [ 970.788654] No find offset = 1 in thread 6160 [ 970.788668] No find offset = 2 in thread 6160 [ 970.788683] No find offset = 3 in thread 6160 [ 970.788696] No find offset = 4 in thread 6160 [ 970.788709] No find offset = 5 in thread 6160 [ 970.788724] No find offset = 6 in thread 6160 [ 970.788737] find offset = 0 in thread 6161 [ 970.788757] No find offset = 7 in thread 6160 [ 970.788778] No find offset = 8 in thread 6160 [ 970.788792] No find offset = 9 in thread 6160 [ 970.788836] find offset = 1 in thread 6161 [ 970.788866] find offset = 2 in thread 6161 [ 970.788896] find offset = 3 in thread 6161 [ 970.788922] find offset = 4 in thread 6161 [ 970.788960] find offset = 5 in thread 6161 [ 970.788988] No find offset = 10 in thread 6160 [ 970.789020] find offset = 6 in thread 6161 [ 970.789042] find offset = 7 in thread 6161 [ 970.789062] find offset = 8 in thread 6161 [ 970.789083] find offset = 9 in thread 6161 [ 970.789100] find offset = 10 in thread 6161 ``` - 在 `line 2` 至 `line 8` 由 `thread 6160` 先運行,因 hash table 中沒有值,所以會進行 fibonacci number 的運算並將計算後的結果放入 hashtable - 在 `line 9` 中第二個執行緒 `thread 6161` 開始運行並查找 `offset` 為 0 存在於 hashtable 中就直接取用,不必進行 fibonacci number 的運算。 ### 測試多執行緒運行 >**測試流程:** 掛載模組 $\rightarrow$ 建立 10 個執行緒並運算 $Fib(10)$ $\rightarrow$ 程式結束 $\rightarrow$ 在運行一次 建立 10 個執行緒並運算 $\rightarrow$ 卸載模組 **第一次運行** ```shell $ sudo dmesg [ 4984.178783] No find offset = 0 in thread 14846 [ 4984.178831] No find offset = 1 in thread 14846 [ 4984.178848] No find offset = 2 in thread 14846 [ 4984.178865] No find offset = 3 in thread 14846 [ 4984.178876] find offset = 0 in thread 14847 [ 4984.178890] find offset = 0 in thread 14848 [ 4984.178904] No find offset = 4 in thread 14846 [ 4984.178925] find offset = 1 in thread 14847 [ 4984.178948] find offset = 2 in thread 14847 [ 4984.178963] No find offset = 5 in thread 14846 [ 4984.178975] find offset = 3 in thread 14847 [ 4984.178997] find offset = 4 in thread 14847 [ 4984.179012] find offset = 0 in thread 14849 [ 4984.179056] find offset = 5 in thread 14847 [ 4984.179070] No find offset = 6 in thread 14846 [ 4984.179080] find offset = 6 in thread 14847 [ 4984.179124] find offset = 1 in thread 14848 [ 4984.179150] find offset = 1 in thread 14849 [ 4984.179160] find offset = 2 in thread 14848 [ 4984.179173] find offset = 2 in thread 14849 [ 4984.179182] find offset = 0 in thread 14850 [ 4984.179188] find offset = 3 in thread 14848 [ 4984.179202] find offset = 3 in thread 14849 [ 4984.179220] find offset = 4 in thread 14848 [ 4984.179246] find offset = 5 in thread 14848 [ 4984.179275] No find offset = 7 in thread 14847 [ 4984.179301] No find offset = 8 in thread 14847 [ 4984.179364] find offset = 1 in thread 14850 [ 4984.179370] find offset = 0 in thread 14851 [ 
4984.179391] find offset = 2 in thread 14850 [ 4984.179425] find offset = 3 in thread 14850 [ 4984.179444] find offset = 4 in thread 14850 [ 4984.179457] find offset = 0 in thread 14852 [ 4984.179479] find offset = 7 in thread 14846 [ 4984.179491] No find offset = 9 in thread 14847 [ 4984.179513] find offset = 8 in thread 14846 [ 4984.179539] find offset = 0 in thread 14853 [ 4984.179553] find offset = 1 in thread 14851 [ 4984.179596] find offset = 1 in thread 14853 [ 4984.179605] find offset = 0 in thread 14854 [ 4984.179641] find offset = 1 in thread 14852 [ 4984.179666] find offset = 6 in thread 14848 [ 4984.179682] find offset = 7 in thread 14848 [ 4984.179697] find offset = 8 in thread 14848 [ 4984.179709] find offset = 9 in thread 14848 [ 4984.179731] No find offset = 10 in thread 14848 [ 4984.179776] find offset = 10 in thread 14847 [ 4984.179793] find offset = 4 in thread 14849 [ 4984.179848] find offset = 2 in thread 14851 [ 4984.179868] find offset = 5 in thread 14850 [ 4984.179898] find offset = 6 in thread 14850 [ 4984.179921] find offset = 0 in thread 14855 [ 4984.179926] find offset = 7 in thread 14850 [ 4984.179943] find offset = 1 in thread 14855 [ 4984.179955] find offset = 2 in thread 14852 [ 4984.179982] find offset = 9 in thread 14846 [ 4984.179999] find offset = 5 in thread 14849 [ 4984.180005] find offset = 10 in thread 14846 [ 4984.180022] find offset = 3 in thread 14851 [ 4984.180045] find offset = 4 in thread 14851 [ 4984.180068] find offset = 5 in thread 14851 [ 4984.180098] find offset = 6 in thread 14851 [ 4984.180122] find offset = 7 in thread 14851 [ 4984.180144] find offset = 2 in thread 14855 [ 4984.180161] find offset = 3 in thread 14855 [ 4984.180179] find offset = 4 in thread 14855 [ 4984.180198] find offset = 5 in thread 14855 [ 4984.180217] find offset = 6 in thread 14855 [ 4984.180237] find offset = 7 in thread 14855 [ 4984.180254] find offset = 8 in thread 14855 [ 4984.180270] find offset = 9 in thread 14855 [ 4984.180285] find offset = 10 in thread 14855 [ 4984.180350] find offset = 6 in thread 14849 [ 4984.180372] find offset = 7 in thread 14849 [ 4984.180387] find offset = 8 in thread 14849 [ 4984.180406] find offset = 2 in thread 14853 [ 4984.180425] find offset = 3 in thread 14853 [ 4984.180442] find offset = 4 in thread 14853 [ 4984.180460] find offset = 5 in thread 14853 [ 4984.180475] find offset = 6 in thread 14853 [ 4984.180490] find offset = 7 in thread 14853 [ 4984.180506] find offset = 8 in thread 14853 [ 4984.180524] find offset = 9 in thread 14853 [ 4984.180540] find offset = 10 in thread 14853 [ 4984.180574] find offset = 8 in thread 14851 [ 4984.180592] find offset = 9 in thread 14851 [ 4984.180662] find offset = 1 in thread 14854 [ 4984.180684] find offset = 8 in thread 14850 [ 4984.180701] find offset = 9 in thread 14850 [ 4984.180716] find offset = 3 in thread 14852 [ 4984.180742] find offset = 4 in thread 14852 [ 4984.180763] find offset = 5 in thread 14852 [ 4984.180781] find offset = 10 in thread 14850 [ 4984.180799] find offset = 10 in thread 14851 [ 4984.180820] find offset = 9 in thread 14849 [ 4984.180844] find offset = 2 in thread 14854 [ 4984.180863] find offset = 3 in thread 14854 [ 4984.180879] find offset = 4 in thread 14854 [ 4984.180896] find offset = 5 in thread 14854 [ 4984.180918] find offset = 10 in thread 14849 [ 4984.181006] find offset = 6 in thread 14854 [ 4984.181041] find offset = 7 in thread 14854 [ 4984.181087] find offset = 6 in thread 14852 [ 4984.181105] find offset = 8 in thread 14854 [ 4984.181133] 
find offset = 9 in thread 14854 [ 4984.181158] find offset = 7 in thread 14852 [ 4984.181170] find offset = 10 in thread 14854 [ 4984.181192] find offset = 8 in thread 14852 [ 4984.181236] find offset = 9 in thread 14852 [ 4984.181257] find offset = 10 in thread 14852 ``` - 初次運算的值會顯示 `Not find offset` 再次運算則會找到並直接取用 **第二次運行** ```shell $ sudo dmesg [ 5101.955518] find offset = 0 in thread 15029 [ 5101.955606] find offset = 1 in thread 15029 [ 5101.955623] find offset = 2 in thread 15029 [ 5101.955630] find offset = 0 in thread 15030 [ 5101.955647] find offset = 0 in thread 15031 [ 5101.955684] find offset = 1 in thread 15030 [ 5101.955736] find offset = 0 in thread 15032 [ 5101.955753] find offset = 1 in thread 15031 [ 5101.955787] find offset = 0 in thread 15033 [ 5101.955805] find offset = 1 in thread 15032 [ 5101.955843] find offset = 1 in thread 15033 [ 5101.955901] find offset = 0 in thread 15034 [ 5101.955948] find offset = 0 in thread 15035 [ 5101.955956] find offset = 2 in thread 15031 [ 5101.955976] find offset = 1 in thread 15035 [ 5101.955989] find offset = 3 in thread 15031 [ 5101.955998] find offset = 2 in thread 15035 [ 5101.956016] find offset = 4 in thread 15031 [ 5101.956166] find offset = 1 in thread 15034 [ 5101.956250] find offset = 0 in thread 15036 [ 5101.956267] find offset = 0 in thread 15037 [ 5101.956296] find offset = 2 in thread 15030 [ 5101.956316] find offset = 1 in thread 15037 [ 5101.956337] find offset = 3 in thread 15030 [ 5101.956342] find offset = 2 in thread 15037 [ 5101.956383] find offset = 4 in thread 15030 [ 5101.956406] find offset = 2 in thread 15033 [ 5101.956428] find offset = 0 in thread 15038 [ 5101.956445] find offset = 5 in thread 15031 [ 5101.956454] find offset = 2 in thread 15034 [ 5101.956489] find offset = 1 in thread 15038 [ 5101.956567] find offset = 1 in thread 15036 [ 5101.956595] find offset = 3 in thread 15037 [ 5101.956619] find offset = 4 in thread 15037 [ 5101.956640] find offset = 5 in thread 15037 [ 5101.956660] find offset = 6 in thread 15037 [ 5101.956680] find offset = 7 in thread 15037 [ 5101.956727] find offset = 5 in thread 15030 [ 5101.956745] find offset = 3 in thread 15033 [ 5101.956768] find offset = 6 in thread 15030 [ 5101.956795] find offset = 7 in thread 15030 [ 5101.956819] find offset = 3 in thread 15034 [ 5101.956842] find offset = 4 in thread 15034 [ 5101.956861] find offset = 5 in thread 15034 [ 5101.956881] find offset = 6 in thread 15034 [ 5101.956898] find offset = 7 in thread 15034 [ 5101.956946] find offset = 3 in thread 15029 [ 5101.956969] find offset = 2 in thread 15032 [ 5101.956978] find offset = 4 in thread 15029 [ 5101.956990] find offset = 3 in thread 15032 [ 5101.957004] find offset = 5 in thread 15029 [ 5101.957012] find offset = 4 in thread 15032 [ 5101.957029] find offset = 6 in thread 15029 [ 5101.957054] find offset = 7 in thread 15029 [ 5101.957077] find offset = 8 in thread 15029 [ 5101.957092] find offset = 9 in thread 15029 [ 5101.957107] find offset = 6 in thread 15031 [ 5101.957128] find offset = 7 in thread 15031 [ 5101.957142] find offset = 8 in thread 15031 [ 5101.957162] find offset = 2 in thread 15038 [ 5101.957186] find offset = 3 in thread 15038 [ 5101.957223] find offset = 2 in thread 15036 [ 5101.957256] find offset = 3 in thread 15035 [ 5101.957280] find offset = 5 in thread 15032 [ 5101.957306] find offset = 6 in thread 15032 [ 5101.957330] find offset = 4 in thread 15033 [ 5101.957347] find offset = 5 in thread 15033 [ 5101.957371] find offset = 8 in thread 15030 [ 
5101.957397] find offset = 9 in thread 15031 [ 5101.957425] find offset = 8 in thread 15034 [ 5101.957449] find offset = 4 in thread 15038 [ 5101.957467] find offset = 3 in thread 15036 [ 5101.957484] find offset = 4 in thread 15036 [ 5101.957501] find offset = 5 in thread 15036 [ 5101.957515] find offset = 8 in thread 15037 [ 5101.957528] find offset = 7 in thread 15032 [ 5101.957536] find offset = 9 in thread 15037 [ 5101.957555] find offset = 6 in thread 15033 [ 5101.957579] find offset = 9 in thread 15030 [ 5101.957601] find offset = 10 in thread 15030 [ 5101.957651] find offset = 5 in thread 15038 [ 5101.957667] find offset = 4 in thread 15035 [ 5101.957688] find offset = 6 in thread 15036 [ 5101.957717] find offset = 10 in thread 15029 [ 5101.957759] find offset = 8 in thread 15032 [ 5101.957781] find offset = 10 in thread 15037 [ 5101.957793] find offset = 9 in thread 15032 [ 5101.957802] find offset = 7 in thread 15033 [ 5101.957823] find offset = 10 in thread 15031 [ 5101.957831] find offset = 9 in thread 15034 [ 5101.957879] find offset = 7 in thread 15036 [ 5101.957896] find offset = 8 in thread 15036 [ 5101.957921] find offset = 10 in thread 15032 [ 5101.957950] find offset = 8 in thread 15033 [ 5101.957976] find offset = 9 in thread 15033 [ 5101.957992] find offset = 5 in thread 15035 [ 5101.958001] find offset = 10 in thread 15033 [ 5101.958012] find offset = 9 in thread 15036 [ 5101.958055] find offset = 6 in thread 15038 [ 5101.958125] find offset = 10 in thread 15034 [ 5101.958238] find offset = 10 in thread 15036 [ 5101.958287] find offset = 7 in thread 15038 [ 5101.958307] find offset = 6 in thread 15035 [ 5101.958325] find offset = 7 in thread 15035 [ 5101.958342] find offset = 8 in thread 15035 [ 5101.958358] find offset = 9 in thread 15035 [ 5101.958378] find offset = 10 in thread 15035 [ 5101.958415] find offset = 8 in thread 15038 [ 5101.958433] find offset = 9 in thread 15038 [ 5101.958449] find offset = 10 in thread 15038 ``` - 在第一次運行時,已建立好 hashtable,第二次運行時都可以直接從 hashtable 取用。 **卸載模組** 成功卸載沒有運行中的 thread 阻礙卸載。 ### 測試多執行緒下的效能 ```shell make FIBMODE=4 NOHASH=1 // fibonacci series algorithm no.4 w/o hash table implementation make FIBMODE=4 // fibonacci series algorithm no.4 with hash table implementation ``` 測試案例為 10 個執行緒,各自計算至 $Fib(1000)$ 的值,且用 `pthread_mutex_lock` 及 `pthread_mutex_unlock` 保護 user level 的時間量測,其中 kernel level 計算時間的範圍為 critical section 的範圍,若為有實作 hash table 版本則包含 找到 `key` 的 `hlist_entry` 找資料、沒找到 `key` 的節點空間分配、計算 Fibonacci number、初始化節點以及將節點插入 hash table。 ![](https://i.imgur.com/LPJP7cC.png) - 10 個執行緒各自 1000 筆資料,故總共有 10000 筆 而未實作 hash table 版本的效能如下 ![](https://i.imgur.com/MNREGq4.png) 兩者 user level 計算 fibonacci number 的比較 ![](https://i.imgur.com/x9UTgGm.png) - 由圖上明顯可見實作 hash table 的版本在多執行緒的情況下表現明顯比較優秀,因為真正需要進行計算的只有第一個計算到的值 兩者 kernel level 計算 fibonacci number 的比較 ![](https://i.imgur.com/uGERuqg.png) 由圖上可以看到 10 個執行緒分別計算至 $Fib(1000)$ 的條件下,有實作 hashtable 的表現明顯比沒有實作得來的好,在有實作 hashtable 的程式中 (紫色線段),只要計算過且讓 hashtable 儲存後直接取用的話即為常數時間,而若該值沒有被計算過則花費的時間會和沒有實作 hashtable 的差不多 (綠色線段),即都是要計算 Fibnacci number。 - 紫色線段的後半段比較多 hash table 沒有找到的情況,**推測**是因為在計算 $Fib(n)$ , $n$ 值小的情況下計算的很快,在沒有被其他執行緒搶掉的情況下可以運算比較多組 Fibonacci number,數字比較大的 Fibonacci number 都集中在後半段運算。 ## Reference - [What Is a Hypervisor? Types, Use Cases, and Career Opportunities](https://www.coursera.org/articles/what-is-hypervisor)