# 2020q1 Homework2 (fibdrv) contributed by < `jeeeff` > ## 實驗環境 作業系統 ```shell $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.4 LTS Release: 18.04 Codename: bionic ``` 編譯器 ```shell $ gcc --version gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 Copyright (C) 2017 Free Software Foundation, Inc. ``` ## 作業要求 - [ ] 撰寫適用於 Linux 核心層級的程式 * 學習 ktimer, copy_to_user 一類的核心 API - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise) - [ ] 初探 Linux VFS - [ ] 自動測試機制 - [x] 透過工具進行效能分析 - [x] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢? - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題 ## 實做過程 ### client.c ```cpp int main() { long long sz; char buf[100]; char write_buf[] = "testing writing"; int offset = 100; /* TODO: try test something bigger than the limit */ int fd = open(FIB_DEV, O_RDWR); if (fd < 0) { perror("Failed to open character device"); exit(1); } for (int i = 0; i <= offset; i++) { sz = write(fd, write_buf, strlen(write_buf)); printf("Writing to " FIB_DEV ", returned the sequence %lld\n", sz); } for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, 1); big_fibnum(buf, sz); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%s.\n", i, buf); } for (int i = offset; i >= 0; i--) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, 1); big_fibnum(buf, sz); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%s.\n", i, buf); } close(fd); return 0; } ``` ### 基本工具準備 * 老師都幫我們準備好了基本工具,只要貼上就好,而且也是準備做大數的形式 :::danger 請依循 Linux 核心原始碼風格,避免 [camelCase](https://en.wikipedia.org/wiki/Camel_case),變數或韓式名稱用簡潔的全小寫。 :notes: jserv ::: ```cpp struct BigN { unsigned long long lower, upper; }; static inline void addBigN(struct BigN *output, struct BigN x, struct BigN y) { output->upper = x.upper + y.upper; if (y.lower > ~x.lower) output->upper++; output->lower = x.lower + y.lower; } ``` ### addBigN() * 做大數加法的關鍵,這個工具老師也幫我們準備好 ```cpp static inline void addBigN(struct BigN *output, struct BigN x, struct BigN y) { output->upper = x.upper + y.upper; if (y.lower > ~x.lower) output->upper++; output->lower = x.lower + y.lower; } ``` ### fib_sequence() * 工具都準備好了,用數學課遞迴的方式做 fibonacci sequence 運算 ```cpp static struct BigN fib_sequence(long long k) { /* FIXME: use clz/ctz and fast algorithms to speed up */ struct BigN f[k + 2]; f[0].upper = 0; f[0].lower = 0; f[1].upper = 0; f[1].lower = 1; for (int i = 2; i <= k; i++) { addBigN(&f[i], f[i - 1], f[i - 2]); } return f[k]; } ``` * gnuplot 部份是參考 < `pingsutw` > 同學的 analysis * 這邊值得一提的是因為 k+2 這麼長的陣列比起用 3 個變數做相對浪費記憶體空間,但不確定效能是是否有明顯影響,所以拿 fib_sequence 和下面這個 code 做比較 ```cpp static struct BigN fib_3(int k) { struct BigN t1,t2,t3; t1.upper = 0; t1.lower = 1; t2.upper = 0; t2.lower = 0; for(int i=2;i<=k;i++) { addBigN(&t3,&t1,&t2); t2 = t1; t1 = t3; } return t3; } ``` * 做一萬次取平均,先貼 code 再貼圖 :::danger 避免用「平均」,改用統計手法去除 [outlier](https://en.wikipedia.org/wiki/Outlier) :notes: jserv ::: ```cpp #define count 10000 int main() { FILE *fp = fopen("./performance", "w"); char time_buf[100] = {0}; double t1_rs, t3_rs; for (int i = 0; i <= MAX_LENGTH; i++) { struct timespec start, stop; t1_rs = 0; t3_rs = 0; /* fib_sequence */ for(int j = 0; j < count; j++) { clock_gettime(CLOCK_MONOTONIC, &start); struct BigN res1 = fib_sequence(i); clock_gettime(CLOCK_MONOTONIC, &stop); //if(j == 0) // display_big_fibnum(i, res1); double t1 = diff_in_ns(start, stop); t1_rs += t1; } /* fib_3 */ for(int j = 0; j < count; j++) { clock_gettime(CLOCK_MONOTONIC, &start); struct BigN res3 = fib_3(i); clock_gettime(CLOCK_MONOTONIC, &stop); //if(j == 0) // display_big_fibnum(i, res3); double t3 = diff_in_ns(start, stop); t3_rs += t3; } t1_rs /= count; t3_rs /= count; snprintf(time_buf, sizeof(time_buf), "%d %.10lf %.10lf\n", i, t1_rs, t3_rs); fputs(time_buf, fp); } return 0; } ``` ![](https://i.imgur.com/A55Icif.png) * 因為把 display_big_fibnum 註解掉所以時間有再降更低,不過若要看答案是否正確,要刪掉註解 * 結果與猜的相反!原本以為 K + 2 的不需要做 t1, t2, t3 交換的動作會比較快,結果證實 3-variable 較快,可能是因為耗費 memory 很少,讓 CPU 不需要做 paging ``` 92 = 7540113804746346429 92 = 7540113804746346429 93 = 12200160415121876738 93 = 12200160415121876738 94 = 19740274219868223167 94 = 19740274219868223167 ... 132 = 1725375039079340637797070384 132 = 1725375039079340637797070384 133 = 2791715456571051233611642553 133 = 2791715456571051233611642553 134 = 45170904956460969042((712937 134 = 45170904956460969042((712937 gnuplot analysis.gp eog performance.png ``` * 對照 [The first 300 Fibonacci numbers](http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/fibtable.html) 答案是對的,可以做到 fib(133) ,從 fib(134) 開始會出現一些符號 ``` 92 : 7540113804746346429 = 3 x 139 x 461 x 4969 x 28657 x 275449 93 : 12200160415121876738 = 2 x 557 x 2417 x 4531100550901 94 : 19740274219868223167 = 2971215073 x 6643838879 ... 132 : 1725375039079340637797070384 = 24 x 32 x 43 x 89 x 199 x 307 x 9901 x 19801 x 261399601 133 : 2791715456571051233611642553 = 13 x 37 x 113 x 3457 x 42293 x 351301301942501 134 : 4517090495650391871408712937 = 269 x 4021 x 116849 x 1429913 x 24994118449 ``` * fib_sequence (k + 2), fib_3 和 fib_ctz 都各自跑一萬次取平均值的結果 * 之前跟同學寫作業的時候,fast doubling 就一直都比 recursion 累加的方式慢 ### fast_doubling * 這邊寫的 fib_fast_logn 就是 fast_doubling ,單純因為[來源網站](https://www.hackerearth.com/zh/practice/notes/fast-doubling-method-to-find-nth-fibonacci-number/),只是速度仍然比 fib_sequence 還慢,費氏數列一樣可以算到 n = 133 還是對的,從 n = 134 開始出現一些符號 ```cpp void fib_fast_doubling(int k, uint128_t *ans0, uint128_t *ans1) { if (k == 0) { ans0->upper = 0llu; ans0->lower = 0llu; ans1->upper = 0llu; ans1->lower = 1llu; return; } fib_fast_doubling((k/2), ans0, ans1); uint128_t a = *ans0; uint128_t b = *ans1; uint128_t b2; uint128_t c = {.upper = 0llu, .lower = 0llu}; uint128_t d = {.upper = 0llu, .lower = 0llu}; add_t(&b2,b,b); sub_t(&c,b2,a); uint128_t cpa = {.upper = 0llu, .lower = 0llu}; mul_t(&cpa,a,c); mul_t(&a,a,a); mul_t(&b,b,b); add_t(&d,a,b); if (k%2) { *ans0 = d; add_t(ans1,cpa,d); } else { *ans0 = cpa; *ans1 = d; } } ``` ![](https://i.imgur.com/5N3qHDL.png) ``` 92 = 7540113804746346429 92 = 7540113804746346429 92 = 7540113804746346429 93 = 12200160415121876738 93 = 12200160415121876738 93 = 12200160415121876738 94 = 19740274219868223167 94 = 19740274219868223167 94 = 19740274219868223167 ... 132 = 1725375039079340637797070384 132 = 1725375039079340637797070384 132 = 1725375039079340637797070384 133 = 2791715456571051233611642553 133 = 2791715456571051233611642553 133 = 2791715456571051233611642553 134 = 45170904956460969042((712937 134 = 45170904956460969042((712937 134 = 45170904956460969042((712937 ``` * 因為抖動蠻大的,所以 < `pingsutw` > 同學建議我跑一百萬次,跑出來的圖在下面: ![](https://i.imgur.com/xZ2znsJ.png) * 程式在單核上執行,過程中電腦有自己切換 CPU ,下圖是從 CPU1 切到 CPU3 的截圖 ![](https://i.imgur.com/Lu2Iwtm.png) ### clz 和 ctz * 試驗有用 clz, ctz 的演算法和沒有用 clz, ctz 的,分別是 fib_clzctz() 和 fib_fast_doubling() #### fib_clzctz * 這邊借鑒 < `AndybnACT` > 同學的演算法,然後前面 fast_doubling 的減法器和乘法器也是借鑒 < `AndybnACT` > 同學的,所以可以確保實驗變數只有一個(是否使用 clz 和 ctz) ```cpp static uint128_t fib_clzctz(int k) { uint128_t a = {.lower = 0, .upper = 0}; uint128_t b = {.lower = 1, .upper = 0}; if (k <= 1) { if(k == 0) a = a; else a = b; return a; } int clz = __builtin_clzll(k); for (int i = (64 - clz); i > 0; i--) { uint128_t t1, t2, tmp; ls_t(&tmp, b, 1); // tmp = 2b sub_t(&tmp, tmp, a); // tmp = tmp -a mul_t(&t1, a, tmp); // t1 = a*tmp mul_t(&a, a, a); // a = a^2 mul_t(&b, b, b); // b = b^2 add_t(&t2, a, b); // t2 = a^2 + b^2 a = t1; b = t2; if (k & (1ull << (i - 1))) { // current bit == 1 add_t(&t1, a, b); a = b; b = t1; } } return a; } ``` * 跑一萬次取平均 ![](https://i.imgur.com/Hx3gkcx.png) * 跑一百萬次取平均 (這裡的 fib_fast_logn 就是 fib_fast_doubling) ![](https://i.imgur.com/iEShVFk.png) * 雖然 n > 133 之後答案都是錯的,但可以看出有用 clz, ctz 的演算法略快一些(綠色線大部份是貼在紫色線上面的) ## 參考資料 * [[ gcc Built-in Functions for binary ] gcc 內建處理二進位函數](http://sunmoon-template.blogspot.com/2017/04/gcc-built-in-functions-for-binary-gcc.html) * [The first 300 Fibonacci numbers](http://www.maths.surrey.ac.uk/hosted-sites/R.Knott/Fibonacci/fibtable.html) * [uint128_t does not name a type](https://stackoverflow.com/questions/34588650/uint128-t-does-not-name-a-type) * [Fast Doubling method to find nth Fibonacci number](https://www.hackerearth.com/zh/practice/notes/fast-doubling-method-to-find-nth-fibonacci-number/)