# 2022q1 Homework3 (fibdrv) contributed by < [2020leon](https://github.com/2020leon) > ## [作業要求](https://hackmd.io/@sysprog/linux2022-fibdrv) ### 自我檢查清單 - [x] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] 研讀 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU),指出針對大數運算,有哪些加速運算和縮減記憶體操作成本的舉措? - [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢? > 搭配閱讀 [The Linux driver implementer’s API guide » Driver Basics](https://www.kernel.org/doc/html/latest/driver-api/basics.html) - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。 #### 作業要求 - [ ] 回答上述==自我檢查清單==的所有問題,需要附上對應的參考資料和必要的程式碼,以第一手材料 (包含自己設計的實驗) 為佳 :::warning :warning: 如果你在 2022 年 3 月 1 日前,已從 GitHub [sysprog21/fibdrv](https://github.com/sysprog21/fibdrv) 進行 fork,請依據 ==[Alternatives to forking into the same account](https://github.community/t5/Support-Protips/Alternatives-to-forking-into-the-same-account/ba-p/7428)== 一文,對舊的 repository 做對應處置,然後重新 fork ::: - [ ] 在 GitHub 上 fork [fibdrv](https://github.com/sysprog21/fibdrv),目標是修正 Fibonacci 數計算的正確性 (現有實作存在缺陷,請指出),隨後改善 `fibdrv` 計算 Fibinacci 數列的執行效率,過程中需要量化執行時間,可在 Linux 核心模組和使用層級去測量 - 原本的程式碼只能列出到 $Fibonacci(100)$ 而且==部分輸出是錯誤的數值==,請修改程式碼,列出後方更多數值 (注意: 檢查正確性和數值系統的使用) - 務必研讀上述 ==Linux 效能分析的提示== 的描述,降低效能分析過程中的干擾; - 在 Linux 核心模組中,可用 ktime 系列的 API; - 在 userspace 可用 [clock_gettime](https://linux.die.net/man/2/clock_gettime) 相關 API; - 善用統計模型,除去極端數值,過程中應詳述你的手法 - 分別[用 gnuplot 製圖](https://hackmd.io/@sysprog/Skwp-alOg),分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷,單位需要用 us 或 ns (自行斟酌) - 嘗試解讀上述實驗的時間分佈,特別是隨著 Fibonacci 數列增長後,對於 Linux 核心的影響。可修改 VFS 函式,允許透過寫入 `/dev/fibonacci` 裝置檔案來變更不同的 Fibonacci 求值的實作程式碼。 ![](https://i.imgur.com/7xyCUVO.png) - [ ] 用 printk 固然方便,但其執行時間易受各種因素而不穩定,除了透過統計來觀察,也可改用核心的 sysfs 介面,後者可讓我們自行宣告 show 和 store 介面來提供讀寫操作,可參見 [Sample kobject implementation](https://elixir.bootlin.com/linux/latest/source/samples/kobject/kobject-example.c) (注意: 切換到對應的 Linux 核心版本)。 - [ ] 逐步最佳化 Fibonacci 的執行時間,引入[費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number) 提到的策略,並善用 [clz / ffs](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令 (Linux 核心有對應的包裝函式),過程中都要充分量化 - [ ] 嘗試研讀 [bignum](https://github.com/0xff07/bignum/tree/fibdrv) (`fibdrv` 分支) 的實作,理解其中的技巧並與你的實作進行效能比較,探討個中值得取鏡之處。 - 原理和分析可見 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU) - [ ] 當 $Fib(n)$ 隨著 $n$ 越來越大時,由 Linux 核心傳遞字串形式的十進位就成為效能瓶頸,研讀〈[Integer Encoding Algorithm 筆記](https://kkc.github.io/2021/01/30/integer-compression-note/)〉所提及的策略,在 Linux 核心模組先對數值進行編碼和壓縮,以二進位方式傳遞到使用者層級後,再由應用程式予以顯示 $Fib(n)$ 數值 - 儘量降低由核心傳遞到應用程式的資料量 - 確保應用程式不需要過多的計算量才可重現 Fibonacci 數列 ## 環境 ```shell $ uname -a Linux leon-ubuntu-20 5.13.0-35-generic #40~20.04.1-Ubuntu SMP Mon Mar 7 09:18:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 36 bits physical, 48 bits virtual CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 58 Model name: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz Stepping: 9 CPU MHz: 1700.000 CPU max MHz: 3800.0000 CPU min MHz: 1600.0000 BogoMIPS: 6784.21 Virtualization: VT-x L1d cache: 128 KiB L1i cache: 128 KiB L2 cache: 1 MiB L3 cache: 6 MiB NUMA node0 CPU(s): 0-3 ``` ## 開發紀錄 ### 修正可變長度陣列( Variable-Length Array, VLA ) 原程式碼如下。 ```c static long long fib_sequence(long long k) { /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */ long long f[k + 2]; f[0] = 0; f[1] = 1; for (int i = 2; i <= k; i++) { f[i] = f[i - 1] + f[i - 2]; } return f[k]; } ``` 其中宣告處可見 `long long f[k + 2]` ,其雖合乎 C99 標準,但其實有更好的實做可以避免 VLA 。修正後如下。 ```c static long long fib_sequence(long long k) { long long a = 0, b = 1; if (k <= 1) return k; for (long long i = k; i > 1; i--) { k = a + b; a = b; b = k; } return k; } ``` ### 利用 fast doubling 實做 直接以費波那契數的定義實做,時間複雜度為 $O(n)$ ,若改以 fast doubling 的方式則可使時間複雜度降為 $O(logn)$ 。 根據作業要求所給出的虛擬碼實做 fast doubling ,首次嘗試如下。 ```c #define clz(x) __builtin_clzll(x) static long long fib_sequence(long long k) { long long a = 0, b = 1; int k_clz = clz(k); if (k <= 1) return k; k <<= k_clz; for (int i = sizeof(long long) * 8; i > k_clz; i--, k <<= 1) { long long t1 = a * (2 * b - a); long long t2 = b * b + a * a; a = t1; b = t2; if (k & (long long) ~(~0ULL >> 1)) { t1 = a + b; a = b; b = t1; } } return a; } ``` 在以上嘗試中,將 `k` 最高位的 1 對齊 MSB ,並逐次將 `k` 左移偵測最高位以實做 fast doubling 。 後來發現更為簡化的實做方式。 ```c #define clz(x) __builtin_clzll(x) static long long fib_sequence(long long k) { if (k <= 1) return k; long long a = 0, b = 1; for (int mask = 1 << (sizeof(long long) * 8 - 1 - clz(k)); mask > 0; mask >>= 1) { long long t1 = a * (2 * b - a); b = b * b + a * a; a = t1; if (k & mask) { t1 = a + b; a = b; b = t1; } } return a; } ``` 以上實做中,不再將 `k` 進行位移。反之,以 `mask` 變數作為遮罩,逐次將 `mask` 右移以對 `k` 進行檢查。 最後則是為未來大數運算做準備。由於目前的程式碼在 `fib_read` 函式中直接回傳所算得之費波那契數,然而此機制無法應付數值超過 `ssize_t` 所能表示之最大值的情況。因此回歸 `read` 函式之用法,將所得之數放進 `fib_read` 函式的 `buf` 參數中。 原 `fib_read` 實做如下。 ```c /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { return (ssize_t) fib_sequence(*offset); } ``` 先修改 `fib_read` 。當 `buf` 的大小不足以放入 `long long` 整數時,回傳 `-1` 。 ```c /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { if (size >= sizeof(long long)) { *(long long *) buf = fib_sequence(*offset); return (ssize_t) size; } return -1; } ``` 再修改 `client.c` 關於 `read` 的部份。 ```diff int main() { long long sz; - char buf[1]; + char buf[sizeof(long long)]; ... for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); - sz = read(fd, buf, 1); + sz = read(fd, buf, sizeof(buf)); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%lld.\n", - i, sz); + i, *(long long *) buf); } for (int i = offset; i >= 0; i--) { lseek(fd, i, SEEK_SET); - sz = read(fd, buf, 1); + sz = read(fd, buf, sizeof(buf)); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%lld.\n", - i, sz); + i, *(long long *) buf); } close(fd); return 0; } ``` ### 實做大數 在此實做固定長度有號大數。新增 `bignum.h` ,定義 `struct bignum` 。 ```c #define BN_ARRAY_SIZE 7 /* Little-endian */ struct bignum { uint32_t num[BN_ARRAY_SIZE]; int32_t num_and_sign; }; ``` `struct bignum` 以 little endian 的方式儲存數值:最低位位於 `num[0]` ,最高位位於 `num_and_sign` 。在此選擇以 32 位元整數作為成員型態的原因是考慮在進行運算時可以 64 位元的變數儲存會溢位的數值。將最高位獨立為 `int32_t` 型態可避免在實做時過度轉型。此結構的數值範圍為 $[-2^{255}, 2^{255} - 1]$ 。 接下來實做基本大數運算於 `bignum.c` 。函式原型如下。 ```c /* bignum = 0 */ void bignum_init(struct bignum *bignum); void bignum_from_int(struct bignum *bignum, int64_t i); void bignum_to_dec(const struct bignum *bignum, char *str, size_t size); /* c = a + b */ void bignum_add(const struct bignum *a, const struct bignum *b, struct bignum *c); /* c = a - b */ void bignum_sub(const struct bignum *a, const struct bignum *b, struct bignum *c); /* c = a * b */ void bignum_mul(const struct bignum *a, const struct bignum *b, struct bignum *c); /* c = a / b */ void bignum_div(const struct bignum *a, const struct bignum *b, struct bignum *c); /* b = -a */ void bignum_neg(const struct bignum *a, struct bignum *b); /* b = |a| */ void bignum_abs(const struct bignum *a, struct bignum *b); /* b = a << 1 */ void bignum_shl1(const struct bignum *a, struct bignum *b, uint32_t lsb); /* b = a >> 1 */ void bignum_shr1(const struct bignum *a, struct bignum *b, uint32_t msb); ``` 實做後,需修改 `Makefile` 以將 `bignum` 導入模組及 client 中。 ```diff ... -TARGET_MODULE := fibdrv +TARGET_MODULE := fibdrvmodule ... +$(TARGET_MODULE)-objs := fibdrv.o bignum.o ... -client: client.c +client: client.c bignum.c $(CC) -o $@ $^ ``` 在編譯前,需了解到在 user mode 和 kernel mode 所引入的標頭檔不盡相同。可以藉由檢查 `__KERNEL__` 巨集是否有被定義來決定所引入的標頭檔。如下。 ```c #ifdef __KERNEL__ #include <linux/kernel.h> #include <linux/module.h> #include <linux/string.h> #else #include <stdio.h> #include <string.h> #endif ``` 接著以 fast doubling 和大數實做費波那契數,最後稍稍修改 `fib_read` 及 `client.c` 。 ```c static void fib_sequence(long long k, struct bignum *result) { if (!result) return; if (k <= 1) { bignum_from_int(result, k); return; } struct bignum a, b; bignum_from_int(&a, 0); bignum_from_int(&b, 1); for (int mask = 1 << (sizeof(long long) * 8 - 1 - clz(k)); mask > 0; mask >>= 1) { struct bignum t; bignum_shl1(&b, &t, 0); bignum_sub(&t, &a, &t); bignum_mul(&t, &a, &t); bignum_mul(&b, &b, &b); bignum_mul(&a, &a, &a); bignum_add(&a, &b, &b); a = t; if (k & mask) { bignum_add(&a, &b, &t); a = b; b = t; } } *result = a; } ``` 最後執行 `make check` 發現無法通過測試,原因為 `scripts/expected.txt` 中大於 92 的費波那契數為錯誤的。修改後方可通過測試。 ```diff Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120. Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429. -Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429. +Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738. +Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167. +Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905. +Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072. +Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977. +Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049. +Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026. +Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075. +Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075. +Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026. +Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049. +Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977. +Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072. +Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905. +Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167. +Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120. ``` ### 效能分析 根據作業要求,將可能會影響效能的因素排除。 ```bash # Disable ASLR sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" # Disable inline turbo sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" # Scaling govenor for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor do sudo sh -c "echo performance > ${i}" done ``` 接著再次改寫程式碼,將多個需比較的內容寫入 driver 中,並以 `enum` 來決定所要執行的程式碼區段。 ```c #define FIBDRV_MODE_SIZE 5 enum fibdrv_mode { FIBDRV_BIGNUM_FAST, FIBDRV_BIGNUM_ORIG, FIBDRV_LL_FAST, FIBDRV_LL_ORIG, FIBDRV_TIME }; ``` 改寫 `fib_write` ,使之成為使用者改變 driver mode 的函式。 ```c static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { if (size == sizeof(mode) && *(enum fibdrv_mode *) buf < FIBDRV_MODE_SIZE) { if (copy_from_user(&mode, buf, sizeof(mode)) != 0) return -1; } else mode = FIBDRV_BIGNUM_FAST; return 1; } ``` 接著在 driver 中安插計時相關程式碼,所得之時間單位為奈秒。 ```c ktime_t kt = ktime_get(); // codes duration = ktime_sub(ktime_get(), kt); // duration is a global var ``` 並在 user mode 中也安插計時相關程式碼。 ```c struct timespec t0, t1; clock_gettime(CLOCK_MONOTONIC, &t0); // code clock_gettime(CLOCK_MONOTONIC, &t1); // to nanosecond long long ut = (long long) (t1.tv_sec * 1e9 + t1.tv_nsec) - (t0.tv_sec * 1e9 + t0.tv_nsec); ``` 最後利用 gnuplot 將資料轉換為折線圖。 第一張圖為利用 bignum fast doubling 所得之數據。 ![](https://i.imgur.com/Y0LhkZ3.png) 第二張及第三張圖為根據不同的演算法所得之時間數據。 ![](https://i.imgur.com/O9fVTSX.png) ![](https://i.imgur.com/ZKbKncj.png)