2022q1 Homework3 (fibdrv)

--- tags: linux2022 --- # 2022q1 Homework3 (fibdrv) contributed by < [Eddielin0926](https://github.com/Eddielin0926) > ## :male-teacher: 自我檢查清單 - [ ] 研讀 [Linux 效能分析](https://hackmd.io/@sysprog/linux2022-fibdrv#-Linux-%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E6%8F%90%E7%A4%BA) 的提示描述，在自己的實體電腦運作 GNU/Linux，做好必要的設定和準備工作 → 從中也該理解為何不希望在虛擬機器中進行實驗。 - [ ] 研讀上述費氏數列相關材料 (包含論文)，摘錄關鍵手法，並思考 [clz / ctz](https://hackmd.io/@sysprog/linux2022-fibdrv#-Linux-%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E6%8F%90%E7%A4%BA) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise)，思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] 研讀 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU) 的報告，指出針對大數運算，有哪些加速運算和縮減記憶體操作成本的舉措？ - [ ] `lsmod` 的輸出結果有一欄名為 `Used by`，這是 “each module’s use count and a list of referring modules”，但如何實作出來呢？模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢？ > 搭配閱讀 The Linux driver implementer’s API guide » Driver Basics - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣，什麼場景中會需要呢？撰寫多執行緒的 userspace 程式來測試，觀察 Linux 核心模組若沒用到 mutex，到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/Reference_counting) 的程式碼來確認。 ## :gear: 開發環境 ### 測試環境 ```shell= ➜ ~ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 ➜ ~ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 48 bits physical, 48 bits virtual CPU(s): 6 On-line CPU(s) list: 0-5 Thread(s) per core: 2 Core(s) per socket: 3 Socket(s): 1 NUMA node(s): 1 Vendor ID: AuthenticAMD CPU family: 21 Model: 2 Model name: AMD FX(tm)-6300 Six-Core Processor Stepping: 0 Frequency boost: enabled CPU MHz: 1400.000 CPU max MHz: 3500.0000 CPU min MHz: 1400.0000 BogoMIPS: 7033.04 Virtualization: AMD-V L1d cache: 48 KiB L1i cache: 192 KiB L2 cache: 6 MiB L3 cache: 8 MiB NUMA node0 CPU(s): 0-5 ``` ### 檢查環境 ```shell= ➜ ~ uname -r 5.13.0-30-generic ➜ ~ dpkg -L linux-headers-`uname -r` | grep "/lib/modules" /lib/modules /lib/modules/5.13.0-30-generic /lib/modules/5.13.0-30-generic/build ➜ ~ whoami eddie ➜ ~ sudo whoami root ``` ## Fix C99 VLA 原本的程式碼使用了動態宣告陣列，但實際上這個函式只關心費氏數列第 k 個元素，因此沒有必要儲存前面計算的結果。計算的過程中只會需要 $fib(n - 1)$ 和 $fib(n - 2)$，所以只需要宣告三個變數即可。 $$ fib(n) = fib(n - 1) + fib(n - 2) $$ ```c static long long fib_sequence(long long k) { long long f[3] = {0, 1, 1}; if (k <= 2) return f[k]; for (int i = 2; i <= k; i++) { f[2] = f[1] + f[0]; f[0] = f[1]; f[1] = f[2]; } return f[2]; } ``` ## Fast Doubling 根據 Fast Doubling 手法，將 `a` 視為 $F(k)$、`b` 視為 $F(k + 1)$。 $$ F(2k) = F(k)[2F(k+1) - F(k)] \\ F(2k+1) = F(k+1)^2+F(k)^2 $$ 這邊利用了 `__builtin_clzll` 來計算 bit width，`__builtin_clzll` 計算了左側有幾個零，因此扣掉 `long long` 的 64 為位元相減就知道這個數有幾位。 ```c #define __binary_digitll(x) ((sizeof(x) * 8) - __builtin_clzll(x)) static long long fib_sequence(long long k) { long long a = 0, b = 1; for (int i = __binary_digitll(k); i > 0; i--) { long long t1 = a * (2 * b - a); long long t2 = (b * b) + (a * a); a = t1; b = t2; if (k & ((1LL << i) >> 1)) { t1 = a + b; a = b; b = t1; } } return a; } ```  ## Timing ![](https://i.imgur.com/WMgSYBc.png) ## 大數運算 ```c #include <stdint.h> #define BIG_INT_SIZE 12 #define LSB 0 #define MSB (BIG_INT_SIZE - 1) struct bigint { uint32_t bytes[BIG_INT_SIZE]; }; struct bigint init_bigint(long long num); int bitwidth_bigint(struct bigint num); int str_bigint(char *buf, int size, struct bigint num); struct bigint add_bigint(struct bigint a, struct bigint b); struct bigint sub_bigint(struct bigint a, struct bigint b); struct bigint neg_bigint(struct bigint a); struct bigint not_bigint(struct bigint a); struct bigint shr_bigint(struct bigint a, int shift); struct bigint shl_bigint(struct bigint a, int shift); ``` ### Double Dabble ```c int str_bigint(char *buf, int size, struct bigint num) { // Double Dabble Algorithm if (!buf || size < 2) return -1; int n = bitwidth_bigint(num); int bcd_sz = (n + 4 * ((n + 2) / 3)) / 8; int idx = 0; uint8_t bcd[bcd_sz]; if (num.bytes[MSB] >> 31) { if (bcd_sz + 1 > size) return -1; num = neg_bigint(num); buf[idx++] = '-'; } else { if (bcd_sz > size) return -1; } for (int i = 0; i < bcd_sz; i++) { bcd[i] = 0; } for (int i = 0; i < BIG_INT_SIZE * 32; i++) { for (int j = 0; j < bcd_sz; j++) { if ((bcd[j] & 0xF) >= 5) { bcd[j] += 3; } if ((bcd[j] >> 4) >= 5) { bcd[j] += 0x30; } } for (int j = bcd_sz - 1; j > 0; j--) { bcd[j] <<= 1; bcd[j] |= ((bcd[j - 1] >> 7) & 1); } bcd[0] <<= 1; bcd[0] |= ((num.bytes[MSB] >> 31) & 1); for (int j = BIG_INT_SIZE - 1; j > 0; j--) { num.bytes[j] <<= 1; num.bytes[j] |= ((num.bytes[j - 1] >> 31) & 1); } num.bytes[0] <<= 1; } bool lead = true; for (int i = bcd_sz - 1; i >= 0; i--) { if (!lead || (bcd[i] >> 4)) { lead = false; buf[idx++] = (bcd[i] >> 4) + '0'; } if (!lead || (bcd[i] & 0xF)) { lead = false; buf[idx++] = (bcd[i] & 0xF) + '0'; } } if (lead) { buf[idx++] = '0'; } buf[idx++] = '\0'; return idx; } ```