Linux 核心專題: 重做 fibdrv

# Linux 核心專題: 重做 fibdrv > 執行人: seasonwang0905 > [期末專題解說影片](https://youtu.be/hFbArw7STjk) :::success :question: 提問清單 * ? ::: ## 任務簡述依據 [fibdrv](https://hackmd.io/@sysprog/linux2023-fibdrv) 作業規範，繼續投入 Linux 核心模組和相關程式的開發。 ## TODO: 紀錄閱讀作業說明中所有的疑惑閱讀 [fibdrv](https://hackmd.io/@sysprog/linux2023-fibdrv) 作業規範，包含「作業說明錄影」和「Code Review 錄影」，本著「誠實面對自己」的心態，在本頁紀錄所有的疑惑，並與授課教師預約討論。過程中，彙整 [Homework3](https://hackmd.io/@sysprog/linux2023-homework3) 學員的成果，挑選至少三份開發紀錄，提出值得借鏡之處，並重現相關實驗。 ## TODO: 回覆「自我檢查清單」回答「自我檢查清單」的所有問題，需要附上對應的參考資料和必要的程式碼，以第一手材料 (包含自己設計的實驗) 為佳 ## TODO: 以 [sysprog21/bignum](https://github.com/sysprog21/bignum) 為範本，實作有效的大數運算理解其中的技巧並導入到 fibdrv 中，並留意以下: * 在 Linux 核心模組中，可用 ktime 系列的 API; * 在 userspace 可用 clock_gettime 相關 API; * 善用統計模型，除去極端數值，過程中應詳述你的手法 * 分別用 gnuplot 製圖，分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷，單位需要用 us 或 ns (自行斟酌) ## 彙整及重現加速 Fibonacci 運算參考 [eecheng87](https://hackmd.io/@eecheng/H18gYdsEI) 的實作，他在 fast doubling 的基礎上加入了 `__builtin_clz()` 來加速 Fibonacci 的運算。 Fast doubling 無 `__builtin_clz()` 版本: ```c unsigned long long double_fib1(int n) { unsigned long long f1 = 0, f2 = 1, t1, t2; for (int i = 31; i >= 0; i--) { t1 = f1 * (f2 * 2 - f1); t2 = f2 * f2 + f1 * f1; f1 = t1; f2 = t2; if ((n & (1 << i))) { t1 = f1 + f2; f1 = f2; f2 = t1; } } return f1; } ``` Fast doubling 有 `__builtin_clz()` 版本: ```c unsigned long long double_fib2(int n) { unsigned long long f1 = 0, f2 = 1; int i = 31 - __builtin_clz(n); for (; i >= 0; i--) { unsigned long long t1, t2; t1 = f1 * (f2 * 2 - f1); t2 = f2 * f2 + f1 * f1; f1 = t1; f2 = t2; if ((n & (1 << i))) { t1 = f1 + f2; f1 = f2; f2 = t1; } } return f1; } ``` 在加入 `__builtin_clz()` 前，不論欲求的 n 值為何，迴圈必定會執行 32 ，加入後，可以減少迭代的次數，避免在 n 值很小的時候執行過多的計算再來，參考並修改了 [Hsuhaoxiang](https://hackmd.io/@Hsuhaoxiang/fibdrv) 的作法，他先計算了傳入的 n 值 (這裡為 k) 在 bitwise 中有多少個 1 並紀錄，以此切換 a 及 b 的位置 ```c unsigned long long double_fib3(int k) { int bin_k[8]; int count = 0; while (k) { bin_k[count] = k % 2; k = k >> 1; count++; } unsigned long long int a = 0, b = 1, t1, t2, temp; for (int i = count - 1; i >= 0; i--) { t1 = a * ((2 * b) - a); t2 = b * b + a * a; a = t1; b = t2; if (bin_k[i] == 1) { temp = a + b; a = b; b = temp; } } return a; } ``` 這裡直接計算 k 的 1 位元個數，然後由 fast doubling 可減少一半迭代數的特性算出 `count` ，在效能上已和使用 `__builtin_clz()` 類的手法接近，改善的方式可能要從記憶體的使用或是加入大數運算 ## 加入大數運算因為 c 語言沒有合適的資料型態可以存儲超過 $2^{64}-1$ 或 $-2^{64}$ 的數值，故我們需要使用其他手段來作大數運算，參考 [eecheng87](https://hackmd.io/@oscarshiang/linux_fibdrv#%E8%A8%88%E7%AE%97%E7%AC%AC-92-%E9%A0%85%E4%B9%8B%E5%BE%8C%E7%9A%84-Fibonacci-%E6%95%B8) 的作業中所述，他使用以下結構來存取一個動態長度的陣列 ```c typedef struct { char digits[MAXDIGITS]; /* represent the number */ int signbit; /* 1 if positive, -1 if negative */ int lastdigit; /* index of high-order digit */ } bignum; ``` 加法的實作為 ```c int add_bignum(bignum *a, bignum *b, bignum *c) { .. if (a->lastdigit < b->lastdigit) return add_bignum(b, a, c); int k = c->lastdigit = a->lastdigit + 1; c->digits[k--] = '\0'; carry = 0; n_carry = 0; for (i = b->lastdigit - 1, j = a->lastdigit - 1; i >= 0; i--, j--) { carry = b->digits[i] - '0' + a->digits[j] - '0' + carry; c->digits[k--] = (carry % 10) + '0'; carry = carry / 10; if (carry) n_carry++; } for (; j >= 0; j--) { carry = a->digits[j] - '0' + carry; c->digits[k--] = (carry % 10) + '0'; carry = carry / 10; if (carry) n_carry++; } if (carry) c->digits[k] = carry + '0'; else { char string[MAXDIGITS]; strlcpy(string, &c->digits[1], MAXDIGITS); strlcpy(c->digits, string, MAXDIGITS); c->lastdigit = c->lastdigit - k - 1; } return n_carry; } ``` 上面省去了判斷加負數的部份，加負數可以替代成減法的形式。我們需要以字串來紀錄大數，使用時再將其一個一個輸出成大數的樣子再來是減法的部份 ```c int subtract_bignum(bignum *a, bignum *b, bignum *c) { // int borrow; /* has anything been borrowed? */ // int v; /* placeholder digit */ register int i, j, op = 0; /* counter */ int n_borrow; int temp; c->signbit = PLUS; if ((a->signbit == MINUS) || (b->signbit == MINUS)) { b->signbit = -1 * b->signbit; n_borrow = add_bignum(a, b, c); b->signbit = -1 * b->signbit; return n_borrow; } .. .. int k = c->lastdigit = MAX(a->lastdigit, b->lastdigit); n_borrow = 0; c->digits[k--] = '\0'; for (i = a->lastdigit - 1, j = b->lastdigit - 1; j >= 0; i--, j--) { temp = a->digits[i] - '0' - (b->digits[j] - '0' + op); if (temp < 0) { temp += 10; op = 1; n_borrow++; } else op = 0; c->digits[k--] = temp + '0'; } while (op) { temp = a->digits[i--] - op - '0'; if (temp < 0) { temp += 10; op = 1; n_borrow++; } else op = 0; c->digits[k--] = temp + '0'; } for (; i >= 0; i--) c->digits[k--] = a->digits[i]; for (i = 0; !(c->digits[i] - '0'); i++) ; c->lastdigit = c->lastdigit - i; if (i == a->lastdigit) strlcpy(c->digits, "0", MAXDIGITS); else { char string[MAXDIGITS]; strlcpy(string, &c->digits[i], MAXDIGITS); strlcpy(c->digits, string, MAXDIGITS); } return n_borrow; } ``` 減法中有很多借位的算法，目前還在嘗試理解 :::info 這邊有個疑問：加法、減法輸出 `n_carry` 、`n_borrow` 進位或是借位目的是什麼，用在哪裡？ ::: 最後是乘法 ```c void multiply_bignum(bignum *a, bignum *b, bignum *c) { // long int n_d; register long int i, j, k = 0; short int num1[MAXDIGITS], num2[MAXDIGITS], of = 0, res[MAXDIGITS] = {0}; // n_d = (a->lastdigit < b->lastdigit) ? b->lastdigit : a->lastdigit; // n_d++; for (i = 0, j = a->lastdigit - 1; i < a->lastdigit; i++, j--) num1[i] = a->digits[j] - 48; for (i = 0, j = b->lastdigit - 1; i < b->lastdigit; j--, i++) num2[i] = b->digits[j] - 48; res[0] = 0; for (j = 0; j < b->lastdigit; j++) { for (i = 0, k = j; i < a->lastdigit || of; k++, i++) { if (i < a->lastdigit) res[k] += num1[i] * num2[j] + of; else res[k] += of; of = res[k] / 10; res[k] = res[k] % 10; } } for (i = k - 1, j = 0; i >= 0; i--, j++) c->digits[j] = res[i] + 48; c->digits[j] = '\0'; c->lastdigit = k; c->signbit = a->signbit * b->signbit; } ``` 因為字串沒有所謂的相乘，所以只能把數字擷取出來再做乘法，最後把結果存到 `c.->digits` 有了以上大數運算，就可以運用到 Fibonacci 的數列上了 Fast doubling 演算法支援大數運算 ```c bignum a, b; bignum big_two; int_to_bignum(0, &a); int_to_bignum(1, &b); int_to_bignum(2, &big_two); for (int i = 31 - __builtin_clz(k); i >= 0; i--) { bignum t1, t2; bignum tmp1, tmp2; multiply_bignum(&b, &big_two, &tmp1); (void) subtract_bignum(&tmp1, &a, &tmp2); multiply_bignum(&a, &tmp2, &t1); multiply_bignum(&a, &a, &tmp1); multiply_bignum(&b, &b, &tmp2); (void) add_bignum(&tmp1, &tmp2, &t2); copy(&a, &t1); copy(&b, &t2); if ((k & (1 << i)) > 0) { (void) add_bignum(&a, &b, &t1); copy(&a, &b); copy(&b, &t1); } } return a; ``` 輸入第 100 、200 項時，看看輸出結果為何 ```c 354224848179261915075 \\ n = 100 280571172992510140037611932413038677189525 \\ n = 200 ``` 比對 [The Fibonacci numbers](https://r-knott.surrey.ac.uk/Fibonacci/fibtable.html) 發現相同，大數運算成功執行。 ## 研讀上述費氏數列相關材料（包含論文），摘錄關鍵手法，並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 * Fibonacci演算法實作程式碼: [seasonwang0905](https://github.com/seasonwang0905/fibdrv) James L. Holloway 發表於 1988 年的碩士論文〈[Algorithms for Computing Fibonacci Numbers Quickly](https://ir.library.oregonstate.edu/downloads/t435gg51w)〉針對多種 Fibonacci 數列之演算法進行模擬及分析，包括遞迴、重複相加、[Binet's Formula](https://proofwiki.org/wiki/Euler-Binet_Formula) (公式解)、[Vorobev等式](https://hackmd.io/SNFjz6Y5SVeYsvooDu-z5A?both)、二項式係數法等等逐項討論，以實際成果來展示各演算法在 Fibonacci 數列之項數 n 改變的情況下，其執行時間及效率究竟如何，最後驗證其論文結果。 * 遞迴 (natural recursive) $fib(n)=\begin{cases} 0 &if \quad n=0\\ 1 &if \quad n=1\\ fib(n-1)+fib(n-2) \quad &if \quad n \ge 2 \end{cases}$ 在介紹函式遞迴的時，常見的例子即計算 Fibonacci 數，因重複呼叫函式的關係，演算法時間複雜度來到 $\mathcal{O}(\lambda_1^{n})$，其中 $\lambda_1>1.5$ ，隨著 n 值增加，執行時間會以指數倍成長，故在 n 值較大時，幾乎不會選用此演算法。 * 重複相加 (repeated addition) 原理與前述相同，只不過使用了 for 迴圈來取代函式遞迴的動作，效率卻有顯著的提昇，隨著 n 值增加，執行時間則呈線性增長。另外，論文中提到的 [K-Fibonacci](https://www.geeksforgeeks.org/k-fibonacci-series/) 數列也有相關的演算法，不過這裡暫不討論此部份。透過 gnuplot 製圖: ![](https://hackmd.io/_uploads/Sy1jhX0D3.png) 橫軸代表 Fibonacci 數列第 n 項，即 $fib(n)$，縱軸則是其執行時間，不難看出僅僅是 n = 10 ，遞迴演算法的執行時間已來到最初的 6 倍了，其效率可說是驚人的低落。重複相加的演算法可以作為 gnuplot 的練習及效能的參考。接著，列出改善計算 Fibonacci 數列的演算法 * [Fast Doubling](https://www.nayuki.io/page/fast-fibonacci-algorithms) $F(2n)=2F(n) \times F(n+1)-F(n)^2 \\ F(2n+1)=F(n)^2+F(n+1)^2$ 根據上述關係，著手撰寫程式碼 ```c unsigned long long fast_doubling(int n) { if (n <= 2) return !!n; unsigned long long f1 = 1,f2 = 1, t1, t2, temp; int count = n / 2; for (int i = 0; i < count; i++) { t1 = 2 * f1 * f2 - f1 * f1; t2 = f1 * f1 + f2 * f2; temp = f2; f2 = f1 + temp; f1 = temp; } if (n & 1) return t2; return t1; } ``` 寫完後，馬上來測試效能如何，結果如下圖所示 ![](https://hackmd.io/_uploads/B1X8XNmO2.png) 藍色的線是參考上述提及的 eecheng87 的 fast doubling 無硬體加速版本的演算法，綠色則是我實作的部份，可以發現綠色的執行時間與 n 呈線性成長，效能只有比重複相加的演算法好一些提問：在 fast doubling 演算法中，最多只需要做 $n/2$ 次的迭代即可算完每項 $F(n)$，即使我設定 n 不會超過 32 次，輸出結果是正確的，但執行時間仍就無法降低呢？是否是因為多了 `f2 = f1 + temp` 這一項呢？又或者應該要支援大數運算後再重新比較效能看看？ :::warning 你用 perf 一類的工具去觀察執行的表現，會發現裡頭存在耗時的指令 :notes: jserv :::