2023q1 Homework3 (fibdrv)

--- tags: linux --- # 2023q1 Homework3 (fibdrv) contributed by < `kata1219` > ## 開發環境 ```bash! $ gcc --version gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04) $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 12 On-line CPU(s) list: 0-11 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 CPU max MHz: 5600.0000 CPU min MHz: 800.0000 BogoMIPS: 4992.00 Vendor ID: GenuineIntel CPU family: 6 Model: 151 Model name: 12th Gen Intel(R) Core(TM) i5-12400 Stepping: 2 BogoMIPS: 4991.99 Virtualization: VT-x L1d cache: 288 KiB L1i cache: 192 KiB L2 cache: 7.5 MiB L3 cache: 18 MiB ``` ## Fibonacci 運算費氏數列以義大利數學家費波那契(Leonardo Fibonacci)命名，他在他的著作中描述兔子生長的數目時用上了這數列，有以下規則： * 第一個月初有一對剛誕生的兔子 * 第二個月之後（第三個月初）牠們可以生育 * 月每對可生育的兔子會誕生下一對新兔子 * 兔子永不死去將規則整理後可得到以下的方程式 $${a}_{n+1} = {a}_n + {b}_n\\{b}_{n+1} = {a}_n$$ 方程式整理為矩陣的形式 $$\left(\begin{array}{c} {a}_{n+1} \\ {b}_{n+1} \\ \end{array}\right)=\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \\ \end{array}\right)\left(\begin{array}{c} {a}_{n} \\ {b}_{n} \\ \end{array}\right) $$ 可進一步整理 $$Q=\left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \\ \end{array}\right)=\left(\begin{array}{cc} F_2 & F_1 \\ F_1 & F_0 \\ \end{array}\right) \\ Q^n=\left(\begin{array}{cc} F_{n+1} & F_n \\ F_n & F_{n-1} \\ \end{array}\right) $$ ### Fast Doubling 從 $2n+1$ 和 $2n$ 的矩陣出發 $$\begin{split}\left[\begin{array}{c} F(2n+1) \\ F(2n) \\ \end{array}\right]&=\left[\begin{array}{cc} 1 & 1 \\ 1 & 0 \\ \end{array}\right]^{2n}\left[\begin{array}{cc} F(1) \\ F(0) \\ \end{array}\right] \\ &= \left[\begin{array}{cc} 1 & 1 \\ 1 & 0 \\ \end{array}\right]^{n}\left[\begin{array}{cc} 1 & 1 \\ 1 & 0 \\ \end{array}\right]^{n}\left[\begin{array}{cc} F(1) \\ F(0) \\ \end{array}\right] \\ &=\left[\begin{array}{cc} F(n+1) & F(n) \\ F(n) & F(n-1) \\ \end{array}\right] \left[\begin{array}{cc} F(n+1) & F(n) \\ F(n) & F(n-1) \\ \end{array}\right] \left[\begin{array}{c} 1 \\ 0 \\ \end{array}\right] \\ &=\left[\begin{array}{c} F(n+1)^2+F(n)^2 \\ F(n)F(n+1)+F(n-1)F(n) \\ \end{array}\right] \end{split} $$ 可得到 $$ F(2k)=F(k)[2F(k+1)-F(k)]\\ F(2k+1) = F(k+1)^2+F(k) $$ 時間複雜度由原本的 $O(n)$ 降至 $O(log\ n)$。 #### clz 應用於 Fast Doubling 由於二進制的特性，我們可以從左至右計算每一個 bit，當讀到 0 代表數值乘二，讀到 1 代表將數值乘二加一，將以下 pseudocode 應用於 fast doubling。 ``` while(bit) { if(bit == 0) fib(n) = fib(2n) else if(bit == 1) fib(n) = fib(2n+1) bit = next_bit } ``` 在本次作業中，輸入的數值為 long long，除非數字非常大，否則 n 轉換為二進制時，前面會多很多不需要計算的 0，透過 clz / clzll 可以跳過那些 0。 ## 大數運算使用大數運算計算 fibonacci 數，結構如下圖，bignum_head 紀錄 list 長度，bignum_node 紀錄 $10^{18}$ 以內的正整數，若 value 超過 $10^{18}$ 則新增一個 bignum_node。 ```graphviz digraph struct { rankdir=LR; node[shape=record, style=bold]; subgraph cluster_0 { node [shape=record]; head_link [label="{link}"]; len [label="{len}"]; style=bold; label=bignum_head } subgraph cluster_1{ node [shape=record]; node1_link [label="{link}"]; value1 [label="{value}"]; style=bold; label=bignum_node; } subgraph cluster_2{ node [shape=record]; node2_link [label="{link}"]; value2 [label="{value}"]; style=bold; label=bignum_node; } head_link->node1_link; node1_link->node2_link; node2_link->link; } ``` 計算 fib(100000) 的時間為 0.437s，結果與 [wolframalpha](https://www.wolframalpha.com/input?i=fib%2810000%29) 上相同。 ### 加速 Fast Doubling #### 第一版執行時間幾乎與 n 等比例成長，有**很大**的進步空間。 | n | bignum_fib | bignum_fast_doubling | | -------- | -------- | -------- | | 100000 | 0.437s | 0.288s | | 1000000 | 42.579s | 26.524s | 使用 perf 觀察到執行時間幾乎都花費在大數乘法運算上 ``` $ sudo perf record -g --call-graph dwarf ./bignum $ sudo perf report --stdio -g graph,0.5,caller # Children Self Command Shared Object Symbol # ........ ........ ....... .................... ........................... # 99.80% 0.00% bignum bignum [.] _start | ---_start __libc_start_main_impl (inlined) __libc_start_call_main (inlined) main bn_fast_doubling | --99.70%--bignum_mul ``` ## 實作 fibdrv.ko ### 回收記憶體 kernel module 與一般程式不同，在結束後不會[自動刪除記憶體](https://stackoverflow.com/questions/11657387/is-memory-allocated-by-kmalloc-ever-automatically-freed)，因此在使用 kernel 配置記憶體時需要特別小心。 * 使用 valgrind 檢查是否有未收回的記憶體，確認都完整收回後再開發 kernel module。 ``` $ valgrind --tool=memcheck ./bignum --leak-check=full gcc bignum.c -o bignum ==37880== Memcheck, a memory error detector ==37880== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==37880== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==37880== Command: ./bignum --leak-check=full ==37880== fib 100: 354224848179261915075 ==37880== ==37880== HEAP SUMMARY: ==37880== in use at exit: 0 bytes in 0 blocks ==37880== total heap usage: 82 allocs, 82 frees, 2,966 bytes allocated ==37880== ==37880== All heap blocks were freed -- no leaks are possible ==37880== ==37880== For lists of detected and suppressed errors, rerun with: -s ==37880== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) ``` ### 驗證程式碼 * 將大數運算搬進 kernel 後，可以使用 `printk` 查看是否運算正確。 ``` 6,2372,9769804943,-;fib 93: 12200160415121876738 ``` > 可以使用 `sudo cat /dev/kmsg` 或 `sudo cat /proc/kmsg` 查看，實測後發現後者不能穩定輸出，常會有送出多次只收到一次的情形，根據 [difference between /proc/kmsg and /dev/kmsg](https://unix.stackexchange.com/questions/585919/what-is-the-difference-between-proc-kmsg-and-dev-kmsg) 推測可能是同時有多個程式在查看導致被 lock 住。 **更新:** * printk 執行時間容易受影響，改為使用 sysfs 回傳 log。 ### 回傳結果 & 驗證由於原本的方法是用 ssize_t 來回傳結果，但超過 fib(92) 的數都會超過 ssize_t 的大小，使用 `<linux/uaccess.h>` 中的 `copy_to_user` 將結果搬到 user space 中並回傳 copy_to_user 的大小，用來檢查是否成功回傳至 user space。 > 在使用 `copy_to_user(buf, output, len)` 時，原本是直接把 `read()` 函式中的 `size_t size` 當作 `copy_to_user` 的長度，結果發現它雖然可以正常印出 buffer 的大小，但無法轉換為 unsigned long，導致分配給 `copy_to_user` 的長度為 0 並引發以下錯誤 - Kernel memory exposure attempt detected from SLUB object 'kmalloc-8' (offset 0, size 8192)!。原本用來驗證的檔案超過 fib(92) 的數都固定為 fib(92)，將 `expected.txt` 及 `verify.py` 修改成最高計算到 fib(500) 並測試通過。 :::success Passed [-] ::: :::warning 善用作業說明提到的 `sysfs`，這樣可更容易解析 fibdrv 的運算結果。 :notes: jserv > 好的 ::: ### 時間測量與效能分析 #### 改寫 fib_write