# 2020q1 Homework2 (fibdrv) contributed by < `gagachang` > ###### tags: `linux2020` > [H03: fibdrv](https://hackmd.io/@sysprog/linux2020-fibdrv) ## 自我檢查清單 - [ ] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢? - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題 ## 測試環境 測試電腦為 MacBook Air 2017,並啟動安裝在外接硬碟中的實體 Linux。 ```shell $ cat /proc/version Linux version 5.3.0-42-generic (buildd@lcy01-amd64-019) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020 $ uname -a Linux 5.3.0-42-generic #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux ``` :::warning 請在 [2020 年系統軟體系列課程討論區](https://www.facebook.com/groups/system.software2020/) 發文分享你如何在 MacBook Air 2017 透過外接儲存裝置來啟動 Ubuntu Linux :notes: jserv ::: ## 測試原始版本與 Fast Doubling ### 原始版本 在 `fibdrv.c` 中套用 ktime API,這部份先參考老師共筆的範例程式碼: ```cpp static ktime_t kt; static long long fib_sequence(long long k) { long long f[k + 2]; f[0] = 0; f[1] = 1; for (int i = 2; i <= k; i++) { f[i] = f[i - 1] + f[i - 2]; } return f[k]; } static long long fib_time_proxy(long long k) { kt = ktime_get(); long long result = fib_sequence(k); kt = ktime_sub(ktime_get(), kt); printk(KERN_INFO "The time: %lld", ktime_to_ns(kt)); return result; } /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { return (ssize_t) fib_time_proxy(*offset); } /* write operation is skipped */ static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { return 1; } ``` 執行 `make check` 的同時,將 `ktime_to_ns(kt)` 回傳值以 `printk` 輸出,執行完成後使用 `dmesg` 查看結果,以下擷取片段: ```clike [ 1042.752671] The computing time of 80 th fib number: 247 [ 1042.752673] The computing time of 81 th fib number: 248 [ 1042.752676] The computing time of 82 th fib number: 252 [ 1042.752679] The computing time of 83 th fib number: 260 [ 1042.752681] The computing time of 84 th fib number: 260 [ 1042.752684] The computing time of 85 th fib number: 260 [ 1042.752686] The computing time of 86 th fib number: 267 [ 1042.752689] The computing time of 87 th fib number: 270 [ 1042.752691] The computing time of 88 th fib number: 271 [ 1042.752694] The computing time of 89 th fib number: 273 [ 1042.752696] The computing time of 90 th fib number: 277 [ 1042.752699] The computing time of 91 th fib number: 280 [ 1042.752702] The computing time of 92 th fib number: 282 [ 1042.752704] The computing time of 92 th fib number: 282 ``` 將以上數據繪製成相對應的 gnuplot 圖表,x 軸紀錄至第 93 個費氏數列,y 軸紀錄至 300 nano seconds: ![](https://i.imgur.com/yA38dLz.png) ### Fast Doubling 版本 參考共筆中提供的資料之實作,程式碼如下: ```clike static long long fib_sequence_fast_doubling(long long k) { if (k < 0) return 0; long long h = 0; for (long long i = k; i; ++h, i >>= 1); long long a = 0, b = 1; for (long long mask = 1 << (h - 1); mask; mask >>= 1) { long long c = a * (2 * b - a); long long d = b * b + a * a; if (mask & k) { a = d; b = c + d; } else { a = c; b = d; } } return a; } ``` `dmesg` 查看費式數列至第 93 個數字的計算時間,以下擷取片段: ```clike [ 2723.202714] The computing time of 80 th fib number: 78 [ 2723.202716] The computing time of 81 th fib number: 58 [ 2723.202718] The computing time of 82 th fib number: 58 [ 2723.202721] The computing time of 83 th fib number: 65 [ 2723.202723] The computing time of 84 th fib number: 75 [ 2723.202725] The computing time of 85 th fib number: 60 [ 2723.202728] The computing time of 86 th fib number: 60 [ 2723.202731] The computing time of 87 th fib number: 71 [ 2723.202734] The computing time of 88 th fib number: 87 [ 2723.202737] The computing time of 89 th fib number: 65 [ 2723.202739] The computing time of 90 th fib number: 76 [ 2723.202742] The computing time of 91 th fib number: 66 [ 2723.202744] The computing time of 92 th fib number: 73 [ 2723.202746] The computing time of 92 th fib number: 60 ``` 將 Fast Doubling 的計算時間繪製成 gnuplot 圖表,x 軸紀錄至 93,y 軸紀錄至 300 nano seconds: ![](https://i.imgur.com/kssXvQj.png) ### 兩版本做比較 最後將原始版本與 Fast Doubling 版本畫在一起,如下圖: ![](https://i.imgur.com/oVGIbL7.png) 可發現從第 15 個數列開始,有較明顯的計算時間差距。 ## 提升效能 ### 關閉 CPU 0 並限定 CPU 0 給 `client` 使用 先查看電腦 CPU 資訊: ```clike Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 61 Model name: Intel(R) Core(TM) i5-5350U CPU @ 1.80GHz Stepping: 4 CPU MHz: 2068.793 CPU max MHz: 2900.0000 CPU min MHz: 500.0000 BogoMIPS: 3600.20 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K NUMA node0 CPU(s): 0-3 ``` 預計將 `client` 單獨執行在 CPU 0 上,先試著關閉第 0 顆 CPU。 修改 `/etc/default/grub` 檔案,在 `GRUB_CMDLINE_LINUX_DEFAULT=` 後方添加 `isolcpus=0` 接著執行 `$ update-grub` 後重開機。 重開機後執行 `htop`,可看見 CPU 0 已不參與排程: ![](https://i.imgur.com/ncbLm6z.png) :::info 以下測試均使用 Fast Doubling 版本,且尚未處理第 93 個數列之後的計算。 ::: 接著使用 taskset 指定在 CPU 0 上單獨執行 `client`,以下為執行結果: ```clike ~/fibdrv $ sudo taskset 0x1 make check ... ... make[1]: Leaving directory '/home/gagachang/Sysprog21/fibdrv' Passed [-] f(93) fail input: 7540113804746346429 expected: 12200160415121876738 ``` 使用 `dmesg` 查看計算時間,以下擷取片段: ```clike [ 830.721700] The computing time of 80 th fib number: 76 [ 830.721701] The computing time of 81 th fib number: 64 [ 830.721703] The computing time of 82 th fib number: 72 [ 830.721705] The computing time of 83 th fib number: 54 [ 830.721707] The computing time of 84 th fib number: 50 [ 830.721709] The computing time of 85 th fib number: 65 [ 830.721711] The computing time of 86 th fib number: 50 [ 830.721712] The computing time of 87 th fib number: 54 [ 830.721714] The computing time of 88 th fib number: 75 [ 830.721716] The computing time of 89 th fib number: 47 [ 830.721718] The computing time of 90 th fib number: 61 [ 830.721720] The computing time of 91 th fib number: 65 [ 830.721722] The computing time of 92 th fib number: 66 [ 830.721724] The computing time of 92 th fib number: 51 ``` 將原先未指定 CPU 的 Fast Doubling 版本,與目前限定在 CPU 0 執行的版本做比較,並使用 gnuplot 畫出以下圖表,將 y 軸限定為 150 nano seconds 以內,可看出限定 CPU 執行後 (綠線),效能略為提升: ![](https://i.imgur.com/XSoNzio.png) ### 抑制 address space layout randomization (ASLR) ```clike $ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" ``` 以 Fast Doubling 版本,限定執行在 CPU 0 上,比較 ASLR 抑制與否的效能,y 軸加大到 300 nano seconds,可發現關閉 ASLR 的效能較穩定: ![](https://i.imgur.com/JIs47rr.png) ## TODO * 使用 `__builtin_clzll()` * 大數運算