# 2022q1 Homework3 (fibdrv)
###### tags: `Linux 核心設計` `成大課程`
contributed by < `hbr890627` >
## 實驗環境
```shell
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 151
Model name: 12th Gen Intel(R) Core(TM) i5-12400
Stepping: 5
CPU MHz: 2500.000
CPU max MHz: 5600.0000
CPU min MHz: 800.0000
BogoMIPS: 4992.00
Virtualization: VT-x
L1d cache: 288 KiB
L1i cache: 192 KiB
L2 cache: 7.5 MiB
L3 cache: 18 MiB
NUMA node0 CPU(s): 0-11
```
## 前期準備
* 檢查 Linux 核心版本
```shell
$ uname -r
5.13.0-35-generic
```
* 安裝 linux-headers 套件
```shell
$ sudo apt install linux-headers-`uname -r`
```
* 確認 linux-headers 套件已正確安裝於開發環境
```shell
$ dpkg -L linux-headers-`uname -r` | grep "/lib/modules"
/lib/modules
/lib/modules/5.13.0-35-generic
/lib/modules/5.13.0-35-generic/build
```
* 檢驗目前的使用者身份
```shell
$ whoami
hbr
$ sudo whoami
root
```
* 取得原始程式碼
```shell
$ git clone https://github.com/hbr890627/fibdrv
```
* 編譯並測試
```shell
$ cd fibdrv
$ make check
Git hooks are installed successfully.
cc -o client client.c
make: cc: Command not found
make: *** [Makefile:28:client] 錯誤 127
```
並未看到預期的輸出,還以為是前面某個步驟做錯了,仔細觀察錯誤訊息
```shell
make: cc: Command not found
```
檢查是否尚未安裝 gcc
```shell
$ gcc
Command 'gcc' not found, but can be installed with:
sudo apt install gcc
```
gcc 安裝完成後,再次進行 make check,成功編譯,出現預期輸出
```shell
f(93) fail
input: 7540113804746346429
expected: 12200160415121876738
```
* 觀察產生的 fibdrv.ko 核心模組
```shell
$ modinfo fibdrv.ko
filename: /home/hbr/Desktop/linux2022/fibdrv/fibdrv.ko
version: 0.1
description: Fibonacci engine driver
author: National Cheng Kung University, Taiwan
license: Dual MIT/GPL
srcversion: 9A01E3671A116ADA9F2BB0A
depends:
retpoline: Y
name: fibdrv
vermagic: 5.13.0-35-generic SMP mod_unload modversions
```
* 將模組載入核心
```shell
$ sudo insmod fibdrv.ko
```
* 觀察 fibdrv.ko 核心模組在 Linux 核心掛載後的行為
```shell
$ ls -l /dev/fibonacci
crw------- 1 root root 510, 0 三 12 16:52 /dev/fibonacci
$ cat /sys/class/fibonacci/fibonacci/dev
510:0
$ cat /sys/module/fibdrv/version
0.1
$ lsmod | grep fibdrv
fibdrv 16384 0
$ cat /sys/module/fibdrv/refcnt
0
```
> TODO:
> 為何 $ cat /sys/class/fibonacci/fibonacci/dev 的結果是 510 ?
## 效能分析
### 查看行程的 CPU affinity
```shell
$ taskset -p 1
pid 1's current affinity mask: fff
$ taskset -cp 1
pid 1's current affinity list: 0-11
```
:::info
此實驗環境的CPU核心數的確為 12
:::
### 將行程固定在特定的 CPU 中執行
```shell
$ taskset -p 0x001 1
pid 1's current affinity list: 0-11
taskset: failed to set pid 1's affinity: Operation not permitted
```
並未開啟 [CAP_SYS_NICE](https://man7.org/linux/man-pages/man7/capabilities.7.html) 這個權限
> TODO: 要怎麼開啟?
### 以特定的 CPU 執行程式
```shell
$ taskset COREMASK EXECUTABLE
```
### 限定 CPU 給特定的程式使用
修改 `/etc/default/grub` ,讓第 12 個 CPU 核心被保留
```
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=11"
```
並依照文件中的說明,輸入指令更新
```
$ sudo update-grub
```
再重新開機後,使用 `htop` 確認第 12 個 CPU 核心並沒有再使用,也使用 `taskset` 再確認一次
```
$ taskset -p 1
pid 1's current affinity mask: 7ff
$ taskset -cp 1
pid 1's current affinity list: 0-10
```
### 排除干擾效能分析的因素
* 抑制 address space layout randomization (ASLR)
```
$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"
```
* 設定 scaling_governor 為 performance。準備以下 shell script,檔名為 performance.sh
```
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
echo performance > ${i}
done
```
之後再用 `$ sudo sh performance.sh` 執行
* 針對 Intel 處理器,關閉 turbo mode
```
$ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
```
## 修正 Fibonacci 數計算的正確性
原本給予計算 Fibonacci 數的程式碼如下,是最基本的算法
```
static long long fib_sequence(long long k)
{
/* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
long long f[k + 2];
f[0] = 0;
f[1] = 1;
for (int i = 2; i <= k; i++) {
f[i] = f[i - 1] + f[i - 2];
}
return f[k];
}
```
發現有一段註解 `FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel.` ,在編譯時也會跳出警告訊息
```
warning: ISO C90 forbids variable length array ‘f’ [-Wvla]
```
修改程式碼,將 VLA 的部份移除
```
static long long fib_sequence(long long k)
{
long long a = 0, b = 1, t;
for (int i = 2; i <= k; i++) {
t = a + b;
a = b;
b = t;
}
return b;
}
```
## Fast Doubling
以下為虛擬碼
```c
Fast_Fib(n)
a = 0; b = 1; // m = 0
for i = (number of binary digit in n) to 1
t1 = a*(2*b - a);
t2 = b^2 + a^2;
a = t1; b = t2; // m *= 2
if (current binary digit == 1)
t1 = a + b; // m++
a = b; b = t1;
return a;
```
對應的實作
```c
unsigned long long Fast_Fib(unsigned int n)
{
unsigned long long a = 0, b = 1;
unsigned long long t1, t2;
for (int i = 256; i > 0; i /= 2)
{
t1 = a * (2 * b - a);
t2 = b * b + a * a;
a = t1;
b = t2; // m *= 2
if (n & i)
{
t1 = a + b;
a = b;
b = t1;
}
}
return a;
}
```
> TODO: 比較使用 Fast Doubling 後帶來的效能提升
## 自我檢查清單
- [x] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗;
- [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
- [ ] 研讀 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU),指出針對大數運算,有哪些加速運算和縮減記憶體操作成本的舉措?
- [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢?
> 搭配閱讀 [The Linux driver implementer’s API guide » Driver Basics](https://www.kernel.org/doc/html/latest/driver-api/basics.html)
- [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。