# 2023q1 Homework3 (fibdrv) contributed by < `visitorckw` > ## 自我檢查清單 - [ ] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] 學習指出針對大數運算的加速運算和縮減記憶體操作成本的舉措,紀錄你的認知和疑惑 - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。 $\to$ 搭配閱讀〈[並行和多執行緒程式設計](https://hackmd.io/@sysprog/concurrency)〉 ## 開發環境 ```bash $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 12 On-line CPU(s) list: 0-11 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 165 Model name: Intel(R) Core(TM) i5-10500 CPU @ 3.10GHz Stepping: 3 CPU MHz: 3100.000 CPU max MHz: 4500.0000 CPU min MHz: 800.0000 BogoMIPS: 6199.99 Virtualization: VT-x L1d cache: 192 KiB L1i cache: 192 KiB L2 cache: 1.5 MiB L3 cache: 12 MiB NUMA node0 CPU(s): 0-11 ``` ## 開發過程 由於原本的程式碼當計算到第 92 項過後時,會造成使用 long long 型別變數儲存時出現 overflow 的錯誤。因此改以字串的形式來計算以及儲存結果。 所以首先需要一個字串相加的函數: ```c static void string_number_add(char *a, char *b, char *c) { if (strlen(a) < strlen(b)) { char *tmp = a; a = b; b = tmp; } size_t lenA = strlen(a); size_t lenB = strlen(b); reverse_str(a); reverse_str(b); int carry = 0, i = 0; for (i = 0; i < lenA; i++) { int numA = a[i] - '0'; int numB = (i < lenB ? b[i] - '0' : 0); c[i] = numA + numB + carry; carry = c[i] / 10; c[i] %= 10; c[i] += '0'; } if (carry) c[i++] = '1'; c[i] = '\0'; reverse_str(a); reverse_str(b); reverse_str(c); } ``` 接著再透過上述的 ```string_number_add``` 函數用 ```O(N)``` 的時間複雜度取得費氏數列第 N 項的值: ```c for (int i = 2; i <= k; i++) { string_number_add(a, b, c); char *tmp = a; a = b; b = c; c = tmp; } ``` 另外由於程式是執行在 kernel space 之中,因此需要使用 ```copy_to_user``` 函數傳到 user space 之中。 ```c if (__copy_to_user(buf, b, strlen(b)+1)) { kfree(a); kfree(b); kfree(c); return -EFAULT; } size_t len = strlen(b); kfree(a); kfree(b); kfree(c); return len; ``` ## bignum 支援大數 其開發的思想是用陣列來儲存二補數形式的大數。 * 基本結構 ```c typedef struct bignum bignum; struct bignum{ unsigned char* arr; int len; }; ``` * 加法/減法 由於是二補數的形式,因此加法與減法可以共用同一種運算。 ```c bignum *addsub(bignum *a, bignum *b) { unsigned char signedA = (a->arr[a->len - 1] & (1U << 7)) >> 7; unsigned char signedB = (b->arr[b->len - 1] & (1U << 7)) >> 7; unsigned int newLength = MAX(a->len, b->len); bignum *c; INIT(c, newLength); unsigned int carry = 0; for (int i = 0; i < newLength; i++) { unsigned int A = i < a->len ? a->arr[i] : 0xff * signedA; unsigned int B = i < b->len ? b->arr[i] : 0xff * signedB; unsigned int C = A + B + carry; carry = (C & (1U << 8)) >> 8; c->arr[i] = C & 0xff; } if (carry && !(signedA ^ signedB) && (signedA ^ ((c->arr[c->len - 1] & (1U << 7)) >> 7))) { RESIZE(c, c->len + 1); c->arr[c->len - 1] = signedA ? 0x01 : 0xff; } REDUCE_LENGTH(c); return c; } ``` * 二補數 若遇到負號/減法操作,則需要先轉成二補數。 ```c bignum* twoComp(bignum* a) { bignum* One; INIT(One, 1); One->arr[0] = 0x01; bignum* notA = not(a); bignum* res = addsub(notA, One); FREE(One); return res; } ``` * 乘法 目前採用兩層迴圈暴力計算的方式實現乘法操作,未來可以使用 FFT 或者 karatsuba 演算法進行效率上的改善。 ```c bignum* mul(bignum* a, bignum* b) { bignum* c; INIT(c, a->len + b->len + 1); for(int i = 0; i < a->len; i++){ for(int j = 0; j < b->len; j++){ unsigned int A = a->arr[i]; unsigned int B = b->arr[j]; unsigned int C = A * B; c->arr[i+j] += C & 0xff; c->arr[i+j+1] += (C & 0xff00) >> 8; } } if(c->arr[c->len-1] & (1U << 7)){ RESIZE(c, c->len + 1); c->arr[c->len-1] = 0; } REDUCE_LENGTH(c); return c; } ``` * bignum 費氏數列計算 最後則是利用 fast doubling 的手法搭配 bignum 計算。 ```c bignum* fib_bignum_fastdouble(uint64_t target) { bignum* fib_n0; bignum* fib_n1; bignum* fib_2n0; bignum* fib_2n1; INIT(fib_n0, 1); INIT(fib_n1, 1); fib_n0->arr[0] = 0x01; fib_n1->arr[0] = 0x01; if(!target){ FREE(fib_n0); fib_n1->arr[0] = 0x00; return fib_n1; } if(target <= 2){ FREE(fib_n0); return fib_n1; } // find first 1 uint8_t count = 63 - __builtin_clzll(target); // uint64_t fib_n0 = 1, fib_n1 = 1; for (uint64_t i = count; i-- > 0;) { // fib_2n0 = fib_n0 * ((fib_n1 << 1) - fib_n0); fib_2n0 = mul(fib_n0, addsub(Lshift(fib_n1), twoComp(fib_n0))); // fib_2n1 = fib_n0 * fib_n0 + fib_n1 * fib_n1; fib_2n1 = addsub(mul(fib_n0, fib_n0), mul(fib_n1, fib_n1)); if (target & (1UL << i)) { fib_n0 = fib_2n1; // fib_n1 = fib_2n0 + fib_2n1; fib_n1 = addsub(fib_2n0, fib_2n1); } else { fib_n0 = fib_2n0; fib_n1 = fib_2n1; } } return fib_n0; } ``` ## 時間測量與效率分析 在 client.c 加入了以下程式碼,透過 timespec 以及 clock_gettime 來取得時間: ```c struct timespec tstart = {0, 0}, tend = {0, 0}; clock_gettime(CLOCK_MONOTONIC, &tstart); sz = read(fd, buf, 128); clock_gettime(CLOCK_MONOTONIC, &tend); long long usertime = (1e9 * tend.tv_sec + tend.tv_nsec) - (1e9 * tstart.tv_sec + tstart.tv_nsec); long long kerneltime = write(fd, write_buf, strlen(write_buf)); ``` 在 fibdrv.c 中則是使用教材提及的 fib_time_proxy 函數來獲取時間: ```c static long long fib_time_proxy(long long k, char *buf) { kt = ktime_get(); // long long result = fib_sequence(k); long long result = fib(k, buf); kt = ktime_sub(ktime_get(), kt); return result; } ``` * 使用 fib_sequence 函數進行運算 ( 僅能正確計算到第 92 項的數值 ) ![](https://i.imgur.com/SgxO8o3.png) * 使用字串作儲存數值進行大數運算,採用線性時間演算法 ![](https://i.imgur.com/0mgEwVY.png) * 使用 fast_doubling_iter 函數進行運算 ( 原始碼可見於[連結](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d) ) ![](https://i.imgur.com/9JgkErP.png) * 使用 bn_kernel 作為大數進行計算 ( 原始碼可見於[連結](https://github.com/KYG-yaya573142/fibdrv/blob/master/bn_kernel.c) ) ![](https://i.imgur.com/sko2Ftt.png) * 使用自己撰寫的 bignum 作為大數進行計算 ![](https://i.imgur.com/WTYFaov.png)