2020q1 Homework2 (fibdrv)

# 2020q1 Homework2 (fibdrv) contributed by < `pingsutw` > ## 開發環境 ```shell $ uname -a Linux kevin 5.0.0-38-generic #41-Ubuntu SMP Tue Dec 3 00:27:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 19.04 Release: 19.04 Codename: disco $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 60 Model name: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz Stepping: 3 CPU MHz: 1527.846 CPU max MHz: 3600.0000 CPU min MHz: 800.0000 BogoMIPS: 5188.08 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 6144K NUMA node0 CPU(s): 0-7 ``` ## 作業要求 - [ ] 撰寫適用於 Linux 核心層級的程式 * 學習 ktimer, copy_to_user 一類的核心 API - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise) - [ ] 初探 Linux VFS - [ ] 自動測試機制 - [x] 透過工具進行效能分析 - [x] 研讀上述 ==Linux 效能分析的提示== 描述，在自己的實體電腦運作 GNU/Linux，做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [ ] 研讀上述費氏數列相關材料 (包含論文)，摘錄關鍵手法，並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise)，思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] `lsmod` 的輸出結果有一欄名為 `Used by`，這是 "each module's use count and a list of referring modules"，但如何實作出來呢？模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢？ - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣，什麼場景中會需要呢？撰寫多執行緒的 userspace 程式來測試，觀察 Linux 核心模組若沒用到 mutex，到底會發生什麼問題 ## 開發過程執行 `make check` 遇到以下錯誤，因為當 fibonacci 數超過93時會 overflow，需對 `fibdrv.c` 進行改寫 ```bash= Passed [-] f(93) fail input: 7540113804746346429 expected: 12200160415121876738 ``` 添加一個特製的結構來處理大數運算，並添加其加法器 ```cpp= struct BigN { unsigned long long lower, upper; }; static inline void addBigN(struct BigN *output, struct BigN x, struct BigN y) { output->upper = x.upper + y.upper; if (y.lower > ~x.lower) output->upper++; output->lower = x.lower + y.lower; } ``` 一開始沒有特別的運算加速，單存用新的結構 `BigN` 進行大數運算, 並對其進行效能分析 ```cpp= static struct BigN fib_sequence(long long k) { struct BigN f[k + 2]; f[0].upper = 0; f[0].lower = 0; f[1].upper = 0; f[1].lower = 1; for (int i = 2; i <= k; i++) { addBigN(&f[i], f[i - 1], f[i - 2]); } return f[k]; } ``` ![](https://i.imgur.com/dSU8EId.png) TODO: 在第一次計算的時間，明顯比前20個數慢很多很多，需找出原因 TODO: 需測量更多次實驗 ($10^5$) 並取 $95$% 的信賴區間 - 執行結果都正確，接下來是效能優化 ``` Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738. Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167. Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905. Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072. Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977. Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049. Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026. Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075. ``` - 利用 Fast Doubling 進行加速，以下為虛擬碼，需在為大數運算實做減法器與乘法器 ```cpp= Fast_Fib(n) a = 0; b = 1; // m = 0 for i = (number of binary digit in n) to 1 t1 = a*(2*b - a); t2 = b^2 + a^2; a = t1; b = t2; // m *= 2 if (current binary digit == 1) t1 = a + b; // m++ a = b; b = t1; return a; ``` - 減法器 ```cpp= static struct BigN *subtractBigN(struct BigN *r, struct BigN *x, struct BigN *y) { if (x->lower < y->lower) { unsigned long long mycarry = ULLONG_MAX; r->lower = mycarry + x->lower - y->lower + 1; r->upper = x->upper - y->upper - 1; } else { r->lower = x->lower - y->lower; r->upper = x->upper - y->upper; } return r; } ``` - 乘法器 ```cpp= static struct BigN *multiplieBigN(struct BigN *x, struct BigN *y) { struct BigN *r = (struct BigN *) malloc(sizeof(struct BigN)); r->lower = 0; r->upper = 0; size_t w = 8 * sizeof(unsigned long long); for (size_t i = 0; i < w; i++) { if ((y->lower >> i) & 0x1) { struct BigN tmp; r->upper += x->upper << i; tmp.lower = (x->lower << i); tmp.upper = i == 0 ? 0 : (x->lower >> (w - i)); addBigN(r, r, &tmp); } } for (size_t i = 0; i < w; i++) { if ((y->upper >> i) & 0x1) { r->upper += (x->lower << i); } } return r; } ``` - fast doubling 實做 ```cpp= static struct BigN fast_fib_sequence_wo_clz(long long k) { int bs = 0; long long t = k; while (t) { bs++; t >>= 1; } struct BigN a, b; a.upper = 0; a.lower = 0; b.upper = 0; b.lower = 1; for (int i = bs; i >= 1; i--) { struct BigN t; struct BigN t1; struct BigN t2; addBigN(&t ,&b, &b); t1 = *(multiplieBigN(&a, subtractBigN(&t, &t, &a))); addBigN(&t2, multiplieBigN(&a, &a), multiplieBigN(&b, &b)); a = t1; b = t2; if (k & (1 << (i - 1)) && k > 0) { addBigN(&t1, &a, &b); a = b; b = t1; k &= ~(1 << (i - 1)); } } return a; } ``` - **大數**運算 fast doubling 跑得比沒有特別運算加速還慢，後面會慢慢找出原因 ![](https://i.imgur.com/5Fs5Byc.png) - **小數**運算的 fast doubling 明顯比一般運算快很多，在大數運算時不能依照小數運算那樣的思維去做運算，更應該多用一些 bitwise operation 來提高大數運算的效率 ![](https://i.imgur.com/0rh8fS8.png) ### 效能優化 fast doubling 這邊宣告三個變數來儲存加法器與乘法器的結果，這些變數可以移出 for 回去避免每次迴圈重新宣告，效能有提昇一些，後面持續優化。 ```cpp= for (int i = bs; i >= 1; i--) { struct BigN t; struct BigN t1; struct BigN t2; ........ ``` ![](https://i.imgur.com/G2Zv2o1.png) - 算數左移 ```cpp= static inline struct BigN *lsftBigN(struct BigN *r, struct BigN *x, unsigned char shift) { if (shift == 0) { return x; } else if (shift >= 64) { r->upper = x->lower << (shift - 64); r->lower = 0ull; return r; } r->upper = x->upper << shift; r->lower = x->lower << shift; r->upper |= x->lower >> (64 - shift); return r; } ``` ![](https://i.imgur.com/e2ABQkj.png) - 把乘法器的 malloc 拿掉，多傳一個 result 的指標給乘法器，並把最後結果寫到這個指標，大概快了 2000~3000 ns ```cpp= static struct BigN *multiplieBigN(struct BigN *r, struct BigN *x, struct BigN *y) ``` ![](https://i.imgur.com/pbEaXht.png) ### 錯誤分析用 sprintf 遇到以下錯誤，錯誤原因是 sprintf 沒有檢查 buffer 邊界，所以可能導致 overflow ```bash= Dangerous function detected in /home/kevin/git/fibdrv/client.c 16: sprintf(lower, "%s", buf); 23: sprintf(scale, "%s", "18446744073709551616"); ``` 解決辦法用 `snprintf` 改寫成以下 ```bash= sprintf(lower, sizeof(lower), "%s", buf); ``` ## 參考資料 - [H03: fibdrv](https://hackmd.io/CTwO5ldOSgKxP-N4YNRVOA?view) - [c++中size_t與ssize_t詳解](https://www.itread01.com/content/1544835455.html) - [淺談探索 Linux 系統設計之道](https://www.slideshare.net/jserv/linux-discovery) - [Character Device Drivers](https://lwn.net/Articles/195805/) - [The cdev interface](https://lwn.net/Articles/195805/) - [Common vulnerabilities guide for C programmers](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml) - [Arbitrary-precision arithmetic Explanation](https://stackoverflow.com/questions/1218149/arbitrary-precision-arithmetic-explanation) ###### tags: `sysprog2020`