# 2020q1 Homework2 (fibdrv)
contributed by < `gagachang` >
###### tags: `linux2020`
> [H03: fibdrv](https://hackmd.io/@sysprog/linux2020-fibdrv)
## 自我檢查清單
- [ ] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗;
- [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
- [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢?
- [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題
## 測試環境
測試電腦為 MacBook Air 2017,並啟動安裝在外接硬碟中的實體 Linux。
```shell
$ cat /proc/version
Linux version 5.3.0-42-generic (buildd@lcy01-amd64-019)
(gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1))
#34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020
$ uname -a
Linux 5.3.0-42-generic #34~18.04.1-Ubuntu SMP
Fri Feb 28 13:42:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
```
:::warning
請在 [2020 年系統軟體系列課程討論區](https://www.facebook.com/groups/system.software2020/) 發文分享你如何在 MacBook Air 2017 透過外接儲存裝置來啟動 Ubuntu Linux
:notes: jserv
:::
## 測試原始版本與 Fast Doubling
### 原始版本
在 `fibdrv.c` 中套用 ktime API,這部份先參考老師共筆的範例程式碼:
```cpp
static ktime_t kt;
static long long fib_sequence(long long k)
{
long long f[k + 2];
f[0] = 0;
f[1] = 1;
for (int i = 2; i <= k; i++) {
f[i] = f[i - 1] + f[i - 2];
}
return f[k];
}
static long long fib_time_proxy(long long k)
{
kt = ktime_get();
long long result = fib_sequence(k);
kt = ktime_sub(ktime_get(), kt);
printk(KERN_INFO "The time: %lld", ktime_to_ns(kt));
return result;
}
/* calculate the fibonacci number at given offset */
static ssize_t fib_read(struct file *file,
char *buf,
size_t size,
loff_t *offset)
{
return (ssize_t) fib_time_proxy(*offset);
}
/* write operation is skipped */
static ssize_t fib_write(struct file *file,
const char *buf,
size_t size,
loff_t *offset)
{
return 1;
}
```
執行 `make check` 的同時,將 `ktime_to_ns(kt)` 回傳值以 `printk` 輸出,執行完成後使用 `dmesg` 查看結果,以下擷取片段:
```clike
[ 1042.752671] The computing time of 80 th fib number: 247
[ 1042.752673] The computing time of 81 th fib number: 248
[ 1042.752676] The computing time of 82 th fib number: 252
[ 1042.752679] The computing time of 83 th fib number: 260
[ 1042.752681] The computing time of 84 th fib number: 260
[ 1042.752684] The computing time of 85 th fib number: 260
[ 1042.752686] The computing time of 86 th fib number: 267
[ 1042.752689] The computing time of 87 th fib number: 270
[ 1042.752691] The computing time of 88 th fib number: 271
[ 1042.752694] The computing time of 89 th fib number: 273
[ 1042.752696] The computing time of 90 th fib number: 277
[ 1042.752699] The computing time of 91 th fib number: 280
[ 1042.752702] The computing time of 92 th fib number: 282
[ 1042.752704] The computing time of 92 th fib number: 282
```
將以上數據繪製成相對應的 gnuplot 圖表,x 軸紀錄至第 93 個費氏數列,y 軸紀錄至 300 nano seconds:

### Fast Doubling 版本
參考共筆中提供的資料之實作,程式碼如下:
```clike
static long long fib_sequence_fast_doubling(long long k)
{
if (k < 0)
return 0;
long long h = 0;
for (long long i = k; i; ++h, i >>= 1);
long long a = 0, b = 1;
for (long long mask = 1 << (h - 1); mask; mask >>= 1) {
long long c = a * (2 * b - a);
long long d = b * b + a * a;
if (mask & k) {
a = d;
b = c + d;
}
else {
a = c;
b = d;
}
}
return a;
}
```
`dmesg` 查看費式數列至第 93 個數字的計算時間,以下擷取片段:
```clike
[ 2723.202714] The computing time of 80 th fib number: 78
[ 2723.202716] The computing time of 81 th fib number: 58
[ 2723.202718] The computing time of 82 th fib number: 58
[ 2723.202721] The computing time of 83 th fib number: 65
[ 2723.202723] The computing time of 84 th fib number: 75
[ 2723.202725] The computing time of 85 th fib number: 60
[ 2723.202728] The computing time of 86 th fib number: 60
[ 2723.202731] The computing time of 87 th fib number: 71
[ 2723.202734] The computing time of 88 th fib number: 87
[ 2723.202737] The computing time of 89 th fib number: 65
[ 2723.202739] The computing time of 90 th fib number: 76
[ 2723.202742] The computing time of 91 th fib number: 66
[ 2723.202744] The computing time of 92 th fib number: 73
[ 2723.202746] The computing time of 92 th fib number: 60
```
將 Fast Doubling 的計算時間繪製成 gnuplot 圖表,x 軸紀錄至 93,y 軸紀錄至 300 nano seconds:

### 兩版本做比較
最後將原始版本與 Fast Doubling 版本畫在一起,如下圖:

可發現從第 15 個數列開始,有較明顯的計算時間差距。
## 提升效能
### 關閉 CPU 0 並限定 CPU 0 給 `client` 使用
先查看電腦 CPU 資訊:
```clike
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i5-5350U CPU @ 1.80GHz
Stepping: 4
CPU MHz: 2068.793
CPU max MHz: 2900.0000
CPU min MHz: 500.0000
BogoMIPS: 3600.20
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
```
預計將 `client` 單獨執行在 CPU 0 上,先試著關閉第 0 顆 CPU。
修改 `/etc/default/grub` 檔案,在 `GRUB_CMDLINE_LINUX_DEFAULT=` 後方添加 `isolcpus=0`
接著執行 `$ update-grub` 後重開機。
重開機後執行 `htop`,可看見 CPU 0 已不參與排程:

:::info
以下測試均使用 Fast Doubling 版本,且尚未處理第 93 個數列之後的計算。
:::
接著使用 taskset 指定在 CPU 0 上單獨執行 `client`,以下為執行結果:
```clike
~/fibdrv $ sudo taskset 0x1 make check
...
...
make[1]: Leaving directory '/home/gagachang/Sysprog21/fibdrv'
Passed [-]
f(93) fail
input: 7540113804746346429
expected: 12200160415121876738
```
使用 `dmesg` 查看計算時間,以下擷取片段:
```clike
[ 830.721700] The computing time of 80 th fib number: 76
[ 830.721701] The computing time of 81 th fib number: 64
[ 830.721703] The computing time of 82 th fib number: 72
[ 830.721705] The computing time of 83 th fib number: 54
[ 830.721707] The computing time of 84 th fib number: 50
[ 830.721709] The computing time of 85 th fib number: 65
[ 830.721711] The computing time of 86 th fib number: 50
[ 830.721712] The computing time of 87 th fib number: 54
[ 830.721714] The computing time of 88 th fib number: 75
[ 830.721716] The computing time of 89 th fib number: 47
[ 830.721718] The computing time of 90 th fib number: 61
[ 830.721720] The computing time of 91 th fib number: 65
[ 830.721722] The computing time of 92 th fib number: 66
[ 830.721724] The computing time of 92 th fib number: 51
```
將原先未指定 CPU 的 Fast Doubling 版本,與目前限定在 CPU 0 執行的版本做比較,並使用 gnuplot 畫出以下圖表,將 y 軸限定為 150 nano seconds 以內,可看出限定 CPU 執行後 (綠線),效能略為提升:

### 抑制 address space layout randomization (ASLR)
```clike
$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"
```
以 Fast Doubling 版本,限定執行在 CPU 0 上,比較 ASLR 抑制與否的效能,y 軸加大到 300 nano seconds,可發現關閉 ASLR 的效能較穩定:

## TODO
* 使用 `__builtin_clzll()`
* 大數運算