# 2023q1 Homework3 (fibdrv) contributed by < [tintinjian12999](https://github.com/tintinjian12999) > ## 實驗環境 ```shell $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 $ ls cpu 架構: x86_64 CPU 作業模式: 32-bit, 64-bit Byte Order: Little Endian Address sizes: 43 bits physical, 48 bits virtual CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數: 2 每通訊端核心數: 4 Socket(s): 1 NUMA 節點: 1 供應商識別號: AuthenticAMD CPU 家族: 23 型號: 24 Model name: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx 製程: 1 Frequency boost: enabled CPU MHz: 1700.000 CPU max MHz: 2300.0000 CPU min MHz: 1400.0000 BogoMIPS: 4591.40 虛擬: AMD-V L1d 快取: 128 KiB L1i 快取: 256 KiB L2 快取: 2 MiB L3 快取: 4 MiB NUMA node0 CPU(s): 0-7 ``` ## 開發過程 ### 計算 F(93) 以後的 Fibonacci 數 參考[AdrianHuang](https://github.com/AdrianHuang/fibdrv/tree/big_number)的作法利用字串的加法做大數運算,並參考 [qwe661234](https://hackmd.io/@qwe661234/linux2022q1-homework3) 的作法先相加完成後再做字串反轉,可以節省多次翻轉字串的時間 ```clike static void string_number_add(char *a, char *b, char *out) { int carry = 0, sum, i = 0; size_t a_len = strlen(a), b_len = strlen(b); // Check string a is longer than string b if (a_len < b_len) { XOR_SWAP(a, b, char); XOR_SWAP(&a_len, &b_len, size_t); } for (i = 0; i < b_len; i++) { sum = (a[i] - '0') + (b[i] - '0') + carry; out[i] = (sum % 10) + '0'; carry = sum / 10; } for (i = b_len; i < a_len; i++) { sum = (a[i] - '0') + carry; out[i] = (sum % 10) + '0'; carry = sum / 10; } if (carry) out[i++] = carry + '0'; out[i] = '\0'; } ``` 利用 copy_to_user 將運算結果傳給 cilent ```clike static long long fib_sequence(long long k, char *buf) { /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */ int i = 0; str_t *f = kmalloc((k + 2) * sizeof(str_t), GFP_KERNEL); strncpy(f[0].numstr, "0", 1); f[0].numstr[1] = '\0'; strncpy(f[1].numstr, "1", 1); f[1].numstr[1] = '\0'; for (i = 2; i <= k; i++) { string_number_add(f[i - 1].numstr, f[i - 2].numstr, f[i].numstr); } size_t f_size = strlen(f[k].numstr); str_reverse(f[k].numstr, f_size); if (copy_to_user(buf, f[k].numstr, f_size - 1)) return -EFAULT; return f_size; } ``` 同時在 client 端也須調整顯示的格式 ```clike for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, sizeof(buf)); buf[sz] = 0; printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%s.\n", i, buf); } ``` ~~由此可以達成 Fibonacci sequence 的運算,但F(93) 以後的數仍然無法正確顯示~~(已解決)。 能成功計算到 F(500) 以上 ``` $ sudo ./client Reading from /dev/fibonacci at offset 499, returned the sequence 86168291600238450732788312165664788095941068326060883324529903470149056115823592713458328176574447204501. Reading from /dev/fibonacci at offset 500, returned the sequence 139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125. ``` ### 效能測試 分別利用 ktime 和 clock_gettime 測試 kernel space 與 user space 執行所花的時間 `ktime:` ```clike //fibdrv.c static ktime_t kt; static long long fib_time_proxy(long long k, char *buf) { kt = ktime_get(); long long result = fib_sequence(k, buf); kt = ktime_sub(ktime_get(), kt); return result; } static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { return (ssize_t) fib_time_proxy(*offset, buf); } ``` ```clike //client.c krtime = write(fd, write_buf, strlen(write_buf)); ``` `clock_gettime:` ```clike //client.c struct timespec t1, t2; clock_gettime(CLOCK_MONOTONIC, &t1); sz = read(fd, buf, sizeof(buf)); clock_gettime(CLOCK_MONOTONIC, &t2); user_time = (t2.tv_sec - t1.tv_sec) * 1E9 + (t2.tv_nsec - t1.tv_nsec); ``` 得到這些執行時間之後就可以透過作圖來得知程式在 kernel space, user space 執行的時間,與 kernal to user 所花的時間,其結果如下 ![](https://i.imgur.com/9ahaGYt.png) 上圖是使用原始的程式碼(如下)做 Fibonacci sequence 運算的執行時間 ```clike static long long fib_sequence(long long k, char *buf) { long long *f = kmalloc((k + 2) * sizeof(long long), GFP_KERNEL); f[0] = 0; f[1] = 1; for (int i = 2; i <= k; i++) { f[i] = f[i - 1] + f[i - 2]; } return f[k]; } ``` 可以看到大部分的時間都是用來做運算,而 kernel to user 的時間相較起來影響不大,預計使用 Fast doubling 改善運算的時間。 ![](https://i.imgur.com/HQuPggH.png) 同樣的結論也可以從利用字串加法做運算的結果看出。 ### 利用 fast doubling 加速 Fibonacci 運算 參考[加速 Fibonacci 運算](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d)的作法 ```clike static uint64_t fast_doubling(uint32_t target) { if(target <= 2) return 1; uint64_t n = fast_doubling(target >> 1); uint64_t n1 = fast_doubling((target >> 1) + 1); if(target & 1) return n * n + n1 * n1; return n * ((n1 << 1) - n); } ``` 執行時間如下 ![](https://i.imgur.com/WfOXQao.png) > 利用 Bottom up fast doubling 進一步加速運算(待做) ### 支援其他大數運算 前面的 Fast Doubling 程式仍舊會有無法顯示到 F(92) 以上的問題,因此需要導入大數運算才能解決,我們先前已經成功實做了字串加法的大數運算並運用在 Fibonacci Sequence 的運算上,然而若要使用 Fast Doubling 的技巧加速運算的話,其他大數的也必須要被實做。 #### string_number_sub ```clike void is_borrow (char *a , int i, int borrow) { if (borrow) { if (a[i] != '0') { a[i]--; } else { int j = i; while (a[j] == '0') { a[j] = '9'; } a[j + 1]--; } } }; void string_sub (char *a, char *b, char *out) { int i = 0, borrow = 0; size_t a_len, b_len; a_len = strlen(a); b_len = strlen(b); // String a must be longer than string b for (i = 0; i < b_len; i++) { is_borrow (a, i, borrow); if (a[i] < b[i]) { borrow = 1; //borrow from the next digit out[i] = (((a[i] - '0') + 10) - (b[i] - '0')) + '0'; } else { borrow = 0; out[i] = ((a[i] - '0') - (b[i] - '0')) + '0'; } } for (i = b_len; i < a_len; i++) { is_borrow (a, i, borrow); out[i] = a[i]; } //Remove the leading 0s e.g 0011 -> 11 while (out[i - 1] == '0') --i; out[i] = '\0'; }; ``` #### string_number_mul ```clike void string_mul (char *a, char *b, char *out) { int i = 0, j = 0; size_t a_len = strlen(a); size_t b_len = strlen(b); size_t total_len = a_len + b_len; int count = 0; int *value = kmalloc(total_len * sizeof(int), GFP_KERNEL); memset(value, 0, sizeof(int) * total_len); if (a_len < b_len) { XOR_SWAP(a, b, char); XOR_SWAP(&a_len, &b_len, size_t); } // Store the value after multiplication of each digit. for (i = 0; i < a_len; i++) for (j = 0; j < b_len; j++) { value[i + j] += (a[i] - '0') * (b[j] - '0'); } // Deal with carry for (i = 0; i < total_len; i++) { value[i + 1] += value[i] / 10; value[i] %= 10; } // Detecting the leading zeros while (value[total_len - count - 1] == 0) { count ++; } count = total_len - count; for (i = 0; i < count; i++) out[i] = value[i] + '0'; out[i] = '\0'; }; ``` ### 利用字串大數運算實做 Fast Doubling 參考原有的 fast doubling ```clike static uint64_t fast_doubling(uint32_t target) { if(target <= 2) return 1; uint64_t n = fast_doubling(target >> 1); uint64_t n1 = fast_doubling((target >> 1) + 1); if(target & 1) return n * n + n1 * n1; return n * ((n1 << 1) - n); } ``` 改寫成字串運算的形式 ```clike char *string_fast_doubling(long long k) { char reg1[STR_NUM], reg2[STR_NUM], reg3[STR_NUM], reg4[STR_NUM]; static char out[STR_NUM]; if(k == 0) { return "0"; } else if(k == 1) { return "1"; } else if(k == 2) { return "1"; } strcpy(reg1, string_fast_doubling(k >> 1)); strcpy(reg2, string_fast_doubling((k >> 1) + 1)); if (k & 1) { string_mul(reg1, reg1, reg3); string_mul(reg2, reg2, reg4); string_add(reg4, reg3, out); } else { string_mul(reg2, "2", reg3); string_sub(reg3, reg1, reg4); string_mul(reg4, reg1, out); } f_size = strlen(out); return out; } ``` ### 比較使用 Iterative 和 Fast Doubling 在 N = 500 時的執行時間 ![](https://i.imgur.com/MuMZgKZ.png) 從上圖可以看出當 N 超過一定數量後 Fast Doubling 的作法快於 Iterative 的作法 (因其時間複雜度為 O(logn)), 與預期相符。 > 使用統計方法取平均與消除極端值並觀察N更大時的情形。(待做) ### 用統計手法克服極端值 參考 [4ce43c4](https://github.com/colinyoyo26/fibdrv/commit/4ce43c4e7ee0b80c4ce9e6dcd995bbfdf239206c) 的作法, 執行 100 次的結果取平均,並刪除離群值。 #### 使用 fast doubling: ![](https://i.imgur.com/SlnrhSc.png) #### 使用 iterative: ![](https://i.imgur.com/D8gTzUu.png) 可以看到使用 iterative 方法的執行時間在 n 大於約 400 之後就開始慢於使用 fast doubling 的執行時間,此結果也符合我們的預期。