---
tags: linux2022
---
# 2022q1 Homework3 (fibdrv)
contributed by < [ShienF](https://github.com/ShienF/fibdrv) >
> [](https://hackmd.io/@sysprog/linux2022-fibdrv)
## 實驗環境
```shell
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Stepping: 9
CPU MHz: 2800.000
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 5599.85
Virtualization: VT-x
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 1 MiB
L3 cache: 6 MiB
NUMA node0 CPU(s): 0-7
```
## 自我檢查清單
- [ ] 研讀 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗;
- [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
- [ ] 研讀 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU),指出針對大數運算,有哪些加速運算和縮減記憶體操作成本的舉措?
- [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢?
> 搭配閱讀 [The Linux driver implementer’s API guide » Driver Basics](https://www.kernel.org/doc/html/latest/driver-api/basics.html)
- [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。
## Makefile
```cmake=
CONFIG_MODULE_SIG = n
TARGET_MODULE := fibdrv
obj-m := $(TARGET_MODULE).o
ccflags-y := -std=gnu99 -Wno-declaration-after-statement
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
GIT_HOOKS := .git/hooks/applied
all: $(GIT_HOOKS) client
$(MAKE) -C $(KDIR) M=$(PWD) modules
$(GIT_HOOKS):
@scripts/install-git-hooks
@echo
clean:
$(MAKE) -C $(KDIR) M=$(PWD) clean
$(RM) client out
load:
sudo insmod $(TARGET_MODULE).ko
#install module
unload:
sudo rmmod $(TARGET_MODULE) || true >/dev/null
#remove module
client: client.c
$(CC) -o $@ $^
PRINTF = env printf
PASS_COLOR = \e[32;01m
NO_COLOR = \e[0m
pass = $(PRINTF) "$(PASS_COLOR)$1 Passed [-]$(NO_COLOR)\n"
check: all
$(MAKE) unload
$(MAKE) load
sudo ./client > out
$(MAKE) unload
@diff -u out scripts/expected.txt && $(call pass)
@scripts/verify.py
```
- reference [makefile說明](https://hackmd.io/@sysprog/SySTMXPvl)
- 行尾不能有空格
- target: prerequisites
\<tab>instructions
- = recursive assignment (?)
- := immediate assignment
- $(MAKE) 預設 make, 防止 user 將 make 指令修改
- $(MAKE) -C $(KDIR) M=\$(PWD) modules
used to invoke make to build kernel modules located in the current working directory $(PWD) using the kernel source tree specified by $(KDIR)
- obj-m, ccflags-y
When the make command is invoked with the target modules, the kernel build system will use the value of obj-m to determine which modules to build. In this case, it will attempt to build the module specified by $(TARGET_MODULE).o.
the kernel build system will use the value of ccflags-y to pass additional compiler flags to the compiler when building the kernel module
## 設置環境
* 依據作業要求設置環境,在編譯並測試時遇到以下錯誤:
```shell
$ cd fibdrv
$ make check
Git hooks are installed successfully.
cc -o client client.c
make: cc: Command not found
make: *** [Makefile:28: client] Error 127
```
檢查錯誤訊息後發現可能是沒有安裝 gcc,但跑完安裝指令後發現沒有安裝與更新任何東西
```shell
$ sudo apt update
$ sudo apt install build-essential
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
```
查看 gcc 版本,發現確實有安裝 gcc
```shell
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
所以再查查看 `make: cc: Command not found` 這種錯誤,發現可能沒有建立好 symbolic link
```shell
$ make CC=gcc
```
再編譯並測試就可以了
```shell
$ make check
Passed [-]
f(93) fail
input: 7540113804746346429
expected: 12200160415121876738
```
* 觀察產生的 `fibdrv.ko` 核心模組
```shell
$ modinfo fibdrv.ko
```
掛載模組及檢查是否掛載成功
```shell
$ sudo insmod fibdrv.ko
```
- /dev => hardwares and devices, eg. disks, cpu, network, fibonacci(which we load)...
```shell
$ ls -l /dev/fibonacci
crw------- 1 root root 235, 0 一 5 17:10 /dev/fibonacci
##first c means characters device
##wner, group, others permision
##first root is owner
##second root is group
##235 means size in bytes
```
- the /sys directory is a virtual filesystem that provides an interface to the kernel, allowing userspace programs to interact with various aspects of the system's hardware, drivers, and kernel configuration.
```shell
$ cat /sys/class/fibonacci/fibonacci/dev
235:0
```
關鍵在 fibdrv.c 中的函式 `register_chrdev()`, 用來註冊 kernel device major and minor number, major number 代表 device type, minor number 代表在此 type 中, fibdrv 這個 device
- 查看 module 的版本,和 fibdrv.c 透過 `MODULE_VERSION` 所指定的版本號碼相同
```shell
$ cat /sys/module/fibdrv/version
```
- 計算此 module [reference counting](https://en.wikipedia.org/wiki/Reference_counting) (storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.), 意指此 module 被多少 device referenced, 若為 0, 代表可以安全釋放此 kernel module, 因為沒有被其他 device 使用
```shell
$ lsmod | grep fibdrv
fibdrv 16384 0
## size
$ cat /sys/module/fibdrv/refcnt
0
```
## Iterative 方式
* 修正原程式碼 VLA 無法在 linux kernel 的情況
原程式碼:
```c
static long long fib_sequence(long long k)
{
/* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
long long f[k + 2];
f[0] = 0;
f[1] = 1;
for (int i = 2; i <= k; i++) {
f[i] = f[i - 1] + f[i - 2];
}
return f[k];
}
```
迭代 i 時只需記錄前兩項數字,不需要把 $F(0)$ 到 $F(k)$ 都保存起來,因此修改後的程式碼:
```c
static long long fib_sequence(long long k)
{
if(k < 2) {
return k;
}
long long k1 = 0, k2 = 1, sum = 0;
for(int i = 2; i <= k; i++) {
sum = k1 + k2;
k1 = k2;
k2 = sum;
}
return k2;
}
```
## 探討使用者層級程式 `client.c` 與 fibdrv 的互動
在` Makefile` 中,`check` 執行了下述:
```cmake
check: all
$(MAKE) unload #unload kernel
$(MAKE) load
sudo ./client > out #execute client.c and output to out file
$(MAKE) unload
@diff -u out scripts/expected.txt && $(call pass)
@scripts/verify.py
```
可以看出 `out`(提供部分內容如下) 的內容是由 `client.c` 開始執行的,於是下述將從 `client.c` 開始 trace。
```
...
Writing to /dev/fibonacci, returned the sequence 1
Writing to /dev/fibonacci, returned the sequence 1
Writing to /dev/fibonacci, returned the sequence 1
Writing to /dev/fibonacci, returned the sequence 1
Reading from /dev/fibonacci at offset 0, returned the sequence 0.
Reading from /dev/fibonacci at offset 1, returned the sequence 1.
Reading from /dev/fibonacci at offset 2, returned the sequence 1.
Reading from /dev/fibonacci at offset 3, returned the sequence 2.
Reading from /dev/fibonacci at offset 4, returned the sequence 3.
Reading from /dev/fibonacci at offset 5, returned the sequence 5.
Reading from /dev/fibonacci at offset 6, returned the sequence 8.
...
```
### Linux system call in [`client.c`](https://github.com/ShienF/fibdrv/blob/master/client.c)
* `open()` 根據[手冊](https://man7.org/linux/man-pages/man2/open.2.html)定義如下:
> int open(const char *pathname, int flags);
從 `pathname` 中開啟 `flags` 相對應權限的檔案。若成功開啟檔案,回傳的值為 file descriptor,根據[參考](https://www.computerhope.com/jargon/f/file-descriptor.htm):
> When a process makes a successful request to open a file, the kernel returns a file descriptor which points to an entry in the kernel's global file table.
* `write()` 根據[手冊](https://man7.org/linux/man-pages/man2/write.2.html)定義如下:
> ssize_t write(int fd, const void *buf, size_t count);
將 `buf` 裡的內容寫入 `count` bytes 到 `fd` 這個 file table 的位置。若成功寫入,回傳的值為成功寫入多少 bytes。
* `lseek()` 根據[手冊](https://man7.org/linux/man-pages/man2/lseek.2.html)定義如下:
> off_t lseek(int fd, off_t offset, int whence);
根據 `whence` 相應的設定方式,及 `offset` 去更改 `fd` 裡的讀寫檔案游標位置(read/write file offset)。若成功更改,回傳值為現在的游標位置(從檔案開使到現在這個位置的 byte 數)。
* `read()` 根據[手冊](https://man7.org/linux/man-pages/man2/read.2.html)定義如下:
> ssize_t read(int fd, void *buf, size_t count);
從 `fd` 內讀取 `count` bytes 到 `buf` 裡,並且增加相應的 file offset。若成功讀取,回傳值為實際讀取了多少 bytes。
顯然地 `out` 的內容並非上述 `client.c` 中的 system call 直接輸出的,下述為閱讀作業說明後得到`out` 來源的解答。
### [`fibdrv.c`](https://github.com/ShienF/fibdrv/blob/master/fibdrv.c) 如何連動 `client.c` 中的 system call?
根據[作業說明](https://hackmd.io/@sysprog/linux2022-fibdrv#K04-fibdrv)其中《`fibdrv` 核心模組內部》及《Linux Virtual File System 介面》兩節,透過 [Linux Virtual File System 介面](https://www.win.tue.nl/~aeb/linux/lk/lk-8.html),宣告一個 file_operations 的資料型態(如下程式碼),並設定一些對應到 VFS system call 的函式。當在使用者模式程式(`client.c`)中呼叫到 system call(`read()`,` write()`...)時,藉由 VFS 將其對應到在 file_operations 裡設定的函式(`fib_read()`, `fib_write()`...)。
```c
const struct file_operations fib_fops = {
.owner = THIS_MODULE,
.read = fib_read,
.write = fib_write,
.open = fib_open,
.release = fib_release,
.llseek = fib_device_lseek,
};
```
:::warning
TODO:
#define FIB_DEV "/dev/fibonacci" 意義?註冊到 kernel?
說明實際 fib 如何輸出
fib_read 為何需要用不到的參數?
lseek接到llseek? offset又是如何傳遞到fib_read裡的?
:::
## Fast doubling
根據[費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number)一文,探討到當數列來到大數區,公式解當中本來我們假定 $O(1)$ 的加法、乘法都不再是常數時間,而是與數字的位元數有關。因此只需 $O(logn)$ 次遞迴呼叫的演算法:Fast Doubling,在大數區的計算上有其優勢。
根據[作業要求](https://hackmd.io/@sysprog/linux2022-fibdrv#-%E8%B2%BB%E6%B0%8F%E6%95%B8%E5%88%97) Fast Doubling 的遞迴式子為下述:
> $F(2k) = F(k)[2F(k+1)-F(k)]$
> $F(2k+1) = F(k+1)^2 + F(k)^2$
由式子可知若知道 $F(k)$ 及 $F(k+1)$,便可求出 $F(2k)$ 及 $F(2k+1)$。舉個例子說明,若要求 F(10),需要遞迴幾次才能得出呢?
```graphviz
digraph fib{
S [label="F(0)=0"]
0 [label="F(0)"]
1 [label="F(1)=1", style=filled]
2 [label="F(2)=1", style=filled]
3 [label="F(3)=2"]
4 [label="F(4)"]
5 [label="F(5)=5", style=filled]
6 [label="F(6)=8"]
7 [label="F(7)"]
8 [label="F(8)"]
9 [label="F(9)"]
10 [label="F(10)=55", style=filled]
11 [label="F(11)=89"]
S->0 S->1
1->2 1->3
2->4 2->5
3->6 3->7
4->8 4->9
5->10 5->11
}
```
從 $F(0)$ 開始往下找 $F(2*0)=F(0)$ 及 $F(2*0+1)=F(1)$,$F(1)$ 再往下找 $F(2*1)=F(2)$ 及 $F(2*1+1)=F(3)$,$F(2)$ 往下找...$F(5)$ 往下找得到 $F(2*5)=F(10)$ 及 $F(2*5+1)=F(11)$,共找了 4 次。
對應的[虛擬碼](https://hackmd.io/@sysprog/linux2020-fibdrv#-fibdrv-%E5%8F%AF%E8%BC%B8%E5%87%BA-Fibonacci-%E6%95%B8%E5%88%97%E7%9A%84-Linux-%E6%A0%B8%E5%BF%83%E6%A8%A1%E7%B5%84)如下:
```python
Fast_Fib(n)
a = 0; b = 1; // m = 0
for i = (number of binary digit in n) to 1
t1 = a*(2*b - a);
t2 = b^2 + a^2;
a = t1; b = t2; // m *= 2
if (current binary digit == 1)
t1 = a + b; // m++
a = b; b = t1;
return a;
```
e.g. 求 $F(10)$,我們知道 $10_{10}=1010_{2}$:
| i | start | 4 | 3 | 2 | 1 | result |
|---|-------|----------|----------|----------|---------|--------|
| n | - | ==1==010 | 1==0==10 | 10==1==0 | 101==0== | - |
|F(m) | F(0) | F(0*2+==1==) | F(1*2+==0==) | F(2*2+==1==) | F(5*2+==0==) | F(10) |
| a | 0 | 1 | 1 | 5 | 55 | 55 |
| b | 1 | 1 | 2 | 8 | 89 | - |
實作如下:
```c
long long fib_sequence_fastb(long long k)
{
if (k < 2) {
return k;
}
int bits = 0;
for (unsigned int i = k; i; bits++, i >>= 1);
long long k1 = 0, k2 = 1;
for (unsigned int i = bits; i; i -= 1) {
long long Tk1, Tk2;
Tk1 = k1 * (2 * k2 - k1);
Tk2 = k2 * k2 + k1 * k1;
k1 = Tk1;
k2 = Tk2;
if (k & (1UL << (i-1))) {
Tk1 = k1 + k2;
k1 = k2;
k2 = Tk1;
}
}
return k1;
}
```
## 效能分析
### 分析前準備
參考作業說明及 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU#%E6%8E%92%E9%99%A4%E5%B9%B2%E6%93%BE%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E5%9B%A0%E7%B4%A0) <排除干擾效能分析的因素> 一節
* 獨立 CPU 給特定程式使用
```shell
$ cd /etc/default
$ sudo vim grub
revise:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=1" #core name will be 0~7 if possesses 8 cores
reboot
$ cd /sys/devices/system/cpu
$ cat isolated
>> 1
$ taskset -cp 1 #check the affinity of PID=1
>> 0,2-7
or
$ taskset -p 1
>> pid 1's current affinity mask: fd #0xfd means 11111101
```
* 限定程式在特定 CPU 執行
```shell
$ sudo taskset -c 1 ./client
```
因為 clent 只有在 fibdrv 掛載時才能被執行, 因此這個指令將會寫在一個專門量測的 bash 檔
* 抑制 [address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization) (ASLR)
查看 [Linux randomize_va_space](https://docs.kernel.org/admin-guide/sysctl/kernel.html?highlight=randomize_va_space#randomize-va-space):
0: Turn the process address space randomization off.
1:
2: ...full randomization
```shell
$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"
```
* 設定 scaling_governor 為 "performance" 這個級別(意指最高速運行)
```shell
$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>> powersave
>> powersave
>> powersave
>> powersave
>> powersave
>> powersave
>> powersave
>> powersave
```
同樣將以下指令放入專門量測的 bash 檔
```shell
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
echo performance > ${i}
done
```
* 針對 Intel 處理器,關閉 turbo boost
```shell
$ cat /sys/devices/system/cpu/intel_pstate/no_turbo
>> 0
$ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
```
### 時間量測方式
參考作業說明及 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU#%E9%87%8F%E6%B8%AC%E6%99%82%E9%96%93%E7%9A%84%E6%96%B9%E5%BC%8F) <量測時間的方式> 一節
* User space - include <time.h> 使用 [`clock_gettime`](https://man7.org/linux/man-pages/man2/clock_getres.2.html) 來取得時間
```c
struct timespec start, end;
for (int i = 0; i <= offset; i++) {
lseek(fd, i, SEEK_SET);
clock_gettime(CLOCK_MONOTONIC, &start);
sz = read(fd, buf, 1);
clock_gettime(CLOCK_MONOTONIC, &end);
printf("%lld\n", (long long)((end.tv_sec * 1e9 + end.tv_nsec)- \
(start.tv_sec * 1e9 + start.tv_nsec)));
}
```
* Kernel space - 使用 [ktime相關API](https://www.kernel.org/doc/html/latest/core-api/timekeeping.html)
因為 fib_write() 沒有用到, 因此寫在此函式裡
```c
static ssize_t fib_write(struct file *file,
const char *buf,
size_t size,
loff_t *offset)
{
static ktime_t kt;
long long res;
switch (size) {
case 0:
kt = ktime_get();
res = fib_sequence_vla(*offset);
kt = ktime_sub(ktime_get(), kt);
break;
case 1:
kt = ktime_get();
res = fib_sequence(*offset);
kt = ktime_sub(ktime_get(), kt);
break;
case 2:
kt = ktime_get();
res = fib_sequence_fastd(*offset);
kt = ktime_sub(ktime_get(), kt);
break;
default:
return 0;
}
return (ssize_t)ktime_to_ns(kt);
}
```
在另個 client 程式裡把在 kernel 所花時間存下來
```c
long long fib, fib_fastd, fib_vla;
char *filename = "performance.txt";
FILE *fp = fopen(filename, "w");
for (int i = 0; i <= offset; i++) {
lseek(fd, i, SEEK_SET);
fib_vla = write(fd, buf, 0); //fib VLA
fib = write(fd, buf, 1); //fib
fib_fastd = write(fd, buf, 2); //fib_fastdoubling
fprintf(fp, "%d %lld %lld %lld\n", i, fib_vla, fib, fib_fastd);
}
```
* 準備 gnuplot, 建立專門量測的 bash 檔及修改 Makefile
透過 [gnuplot](https://hackmd.io/@sysprog/Skwp-alOg) 讀進先前存下來的時間
```shell
set title "Iterative vs Fast-doubling"
set xlabel "n-th"
set ylabel "ns"
set terminal png font " Verdana,12 "
set output "plot.png"
set xtics 0 ,10 ,92
set key left
plot \
"performance.txt" using 1:2 with linespoints linewidth 2 title "VLA", \
"performance.txt" using 1:3 with linespoints linewidth 2 title "iterative", \
"performance.txt" using 1:4 with linespoints linewidth 2 title "fast-doubling", \
```
新增 Makefile 內容
```cmake
client_m: client_measurement.c
$(CC) -o $@ $^
plot: all
sh performance.sh
```
專門量測的 bash 檔
```shell
#!/bin/sh
#This script adjust some system settings to avoid interference while analyzing the performance of fibdrv.
#2022/04/19
CPUID=1
ORIG_ASLR=`cat /proc/sys/kernel/randomize_va_space`
ORIG_GOV=`cat /sys/devices/system/cpu/cpu$CPUID/cpufreq/scaling_governor`
ORIG_TURBO=`cat /sys/devices/system/cpu/intel_pstate/no_turbo`
# adjust system setting
sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"
sudo sh -c "echo performance > /sys/devices/system/cpu/cpu$CPUID/cpufreq/scaling_governor"
sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
# analyze the performance
make client_m
make unload
make load
sudo taskset -c 1 ./client_m
gnuplot plot.gp
make unload
# restore system setting
sudo sh -c "echo $ORIG_ASLR > /proc/sys/kernel/randomize_va_space"
sudo sh -c "echo $ORIG_GOV > //sys/devices/system/cpu/cpu$CPUID/cpufreq/scaling_governor"
sudo sh -c "echo $ORIG_TURBO > /sys/devices/system/cpu/intel_pstate/no_turbo"
```
* VLA vs Iterative vs Fast-Doubling
下指令 $ make plot 繪製
![](https://i.imgur.com/nVdi9yi.png)
量測出來的時間十分不穩定,因此透過統計手法除去極端值
* 核心計算和傳遞到 userspace 的時間開銷
![](https://i.imgur.com/CcpGWsL.png)
:::warning
TODO:
閱讀 affinity 的優點
統計手法除去極端值
為何不希望在虛擬機器中進行實驗?
:::
## 大數問題
`fib_sequence()` 的參數 `k` 在超過 92 時,輸出的答案皆相同,那是因為 `fibdrv.c` 裡設定最大數僅為 92,如此設置的原因為 `fib_sequence(93)` 的真實數值會 overflow,詳細解說[在此](https://hackmd.io/@KYWeng/rkGdultSU#F92-%E4%BB%A5%E5%BE%8C%E7%9A%84%E6%95%B8%E5%80%BC%E9%8C%AF%E8%AA%A4%E7%9A%84%E5%8E%9F%E5%9B%A0)。
作業說明中提到兩種大數相加等其他運算的實作方式,一種是將大數拆成前後半兩個 long long 型別的數,如此可以儲存 16 bytes 的數,最大有號數可為 $2^{127} - 1_{10}$。另一種是將大數$_{10}$ 以 string 的形式儲存,如此就可以表達會造成 overflow 的數值。
* 在此研讀 [AdrianHuang 的大數運算](https://github.com/AdrianHuang/fibdrv) :將大數$_{10}$ 以 string 的形式儲存
首先來看在 `xs.h` 大數的結構 :
```c=
#define MAX_STR_LEN_BITS (54)
#define MAX_STR_LEN ((1UL << MAX_STR_LEN_BITS) - 1)
#define LARGE_STRING_LEN 256
#define xs_literal_empty() \
(xs) { .space_left = 15 }
#define xs_tmp(x) xs_new(&xs_literal_empty(), x)
typedef union {
/* allow strings up to 15 bytes to stay on the stack
* use the last byte as a null terminator and to store flags
* much like fbstring:
* https://github.com/facebook/folly/blob/master/folly/docs/FBString.md
*/
char data[16];
struct {
uint8_t filler[15],
/* how many free bytes in this stack allocated string
* same idea as fbstring
*/
space_left : 4, //剩多少 bytes
/* if it is on heap, set to 1 */
is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1;
};
/* heap allocated */
struct {
char *ptr;
/* supports strings up to 2^MAX_STR_LEN_BITS - 1 bytes */
size_t size : MAX_STR_LEN_BITS,//十進制位數, 一位數代表一個 character = 1 byte
/* capacity is always a power of 2 (unsigned)-1 */
capacity : 6;
/* the last 4 bits are important flags */
};
} xs;
...
static inline char *xs_data(const xs *x)
{
if (!xs_is_ptr(x))//stack
return (char *) x->data;
if (xs_is_large_string(x)) {//heap and large string
return (char *) (x->ptr + 4);
}
return (char *) x->ptr;//heap
}
...
static inline void xs_set_ref_count(const xs *x, int val)
{
*((int *) ((size_t) x->ptr)) = val;
}
```
xs 實際為 union,其內容為其中資料結構佔最多 bytes 的結構。
另外可以從註解判斷 xs 裡的第一個 struct 裡處理 stack 有關的資料,第二個stuct處理heap有關的資料。
搭配創建大數操作來分析:
```c=
#include "xs.h"
static void xs_allocate_data(xs *x, size_t len, bool reallocate)//heap
{
size_t n = 1 << x->capacity;//配置比原 string 還大的空間
/* Medium string */
if (len < LARGE_STRING_LEN) {
x->ptr = reallocate ? krealloc(x->ptr, n, GFP_KERNEL)
: kmalloc(n, GFP_KERNEL);//address in heap
return;
}
/*
* Large string
*/
x->is_large_string = 1;
/* The extra 4 bytes are used to store the reference count */
x->ptr = reallocate ? krealloc(x->ptr, n + 4, GFP_KERNEL)
: kmalloc(n + 4, GFP_KERNEL);
xs_set_ref_count(x, 1);//set ->ptr int part?
}
xs *xs_new(xs *x, const void *p)
{
*x = xs_literal_empty();
size_t len = strlen(p) + 1;
if (len > 16) {//heap
x->capacity = ilog2(len) + 1;//二進制位數, + 1 為 '\0' 的位數
x->size = len - 1;//十進制位數, 一位數代表一個 character = 1 byte
x->is_ptr = true;//in heap
xs_allocate_data(x, x->size, 0);
memcpy(xs_data(x), p, len);
} else {//stack
memcpy(x->data, p, len);
x->space_left = 15 - (len - 1);
}
return x;
}
```
第 28 行初始化 xs,設置其在 stack 裡有 15 bytes 可使用。
第 30 行開始,如果大數$_{10}$ 超過 16 位,將大數透過 `xs_allocate_data` 存在 heap 裡,且透過log2.h 裡以 macro 形式用 bitwise operation 算出的 logrithem: ilog2 算出 capacity(2 進制佔多少 bits)。在動態配置記憶體時,使用 kernel API-[kmalloc](https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html),其中參數 GFP_KERNEL 意義為 Allocate normal kernel ram. May sleep. 若尚未超過 16 位,直接放進 stack 裡。
若用 iterative 方式實作 fibonacci 數列,需要有針對大數加法的函式:
```c=
#define XOR_SWAP(a, b, type) \
do { \
type *__c = (a); \
type *__d = (b); \
*__c ^= *__d; \
*__d ^= *__c; \
*__c ^= *__d; \
} while (0)
static void __swap(void *a, void *b, size_t size)
{
if (a == b)
return;
switch (size) {
case 1:
XOR_SWAP(a, b, char);
break;
case 2:
XOR_SWAP(a, b, short);
break;
case 4:
XOR_SWAP(a, b, unsigned int);
break;
case 8:
XOR_SWAP(a, b, unsigned long);
break;
default:
/* Do nothing */
break;
}
}
static void reverse_str(char *str, size_t n)
{
for (int i = 0; i < (n >> 1); i++)
__swap(&str[i], &str[n - i - 1], sizeof(char));
}
static void string_number_add(xs *a, xs *b, xs *out)
{
char *data_a, *data_b;
size_t size_a, size_b;
int i, carry = 0;
int sum;
/*
* Make sure the string length of 'a' is always greater than
* the one of 'b'.
*/
if (xs_size(a) < xs_size(b))
__swap((void *) &a, (void *) &b, sizeof(void *));
data_a = xs_data(a);
data_b = xs_data(b);
size_a = xs_size(a);
size_b = xs_size(b);
reverse_str(data_a, size_a);
reverse_str(data_b, size_b);
char buf[size_a + 2];
for (i = 0; i < size_b; i++) {
sum = (data_a[i] - '0') + (data_b[i] - '0') + carry;
buf[i] = '0' + sum % 10;
carry = sum / 10;
}
for (i = size_b; i < size_a; i++) {
sum = (data_a[i] - '0') + carry;
buf[i] = '0' + sum % 10;
carry = sum / 10;
}
if (carry)
buf[i++] = '0' + carry;
buf[i] = 0;
reverse_str(buf, i);
/* Restore the original string */
reverse_str(data_a, size_a);
reverse_str(data_b, size_b);
if (out)
*out = *xs_tmp(buf);//將buf存入out
}
```
第 1 行開始的部份,若要交換兩個指標的內容,可以透過做 3 次 `xor` 來達成。
第 34 行開始,透過迭代一半字串長可以反轉字串。
核心即為兩個數字當作 string 來加法,透過將兩數反轉,將 character 平移成真實數值,從最高位開始相加,並處理進位問題,最後再反轉回來。
* 實作 [AdrianHuang 的大數運算](https://github.com/AdrianHuang/fibdrv):將大數$_{10}$ 以 string 的形式儲存
首先如果要把大數傳回 user space,因為在 user space 配置的記憶體位置地址,在 kernel space 裡去讀取有可能會得到不同的地址。因此需要靠 copy_to_user() 將大數傳回 user space。
一開始我將 xs 整合進 fib_sequence() 時犯了一個錯誤:
```c=
int fib_sequence_bigNum(long long k, char __user *buf)
{
char *result = (char *)kmalloc(sizeof(char), GFP_KERNEL);
int length = 1;
if(k == 0) {
*result = '0';
} else if(k == 1) {
*result = '1';
} else {
xs *k1 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL), *k2 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL),\
*sum = (xs *)kmalloc(sizeof(xs), GFP_KERNEL);//
*k1 = *xs_tmp("0");
*k2 = *xs_tmp("1");
// sum = xs_tmp("0");
for(int i = 2; i <= k; i++) {
string_number_add(k1, k2, sum);
*k1 = *xs_new(k1, xs_data(k2));
*k2 = *xs_new(k2, xs_data(sum));
}
length = xs_size(k2);
result = xs_data(k2);
printk(KERN_INFO "result = %s", xs_data(k2));
xs_free(k1);
xs_free(k2);
xs_free(sum);
}
copy_to_user(buf, result, length);
return length;
}
```
在第 3 行我只配置了一個 byte 的空間給 `result`,但當第 34 行回傳的數值超過 8 個 bytes 時 buffer 看起來裝了其他東西:
```
...
Reading from /dev/fibonacci at offset 81, returned the sequence 3788906237314390�.
Reading from /dev/fibonacci at offset 82, returned the sequence 6130579072161159.
Reading from /dev/fibonacci at offset 83, returned the sequence 9919485309475549.
...
```
改寫了一下把 result 這個變數拿掉,`out` 的答案就是對的了:
```c
int fib_sequence_bigNum(long long k, char __user *buf)
{
int length = 1;
if(k == 0) {
char *tmp = "0";
copy_to_user(buf, tmp, length);
} else if(k == 1) {
char *tmp = "1";
copy_to_user(buf, tmp, length);
} else {
xs *k1 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL), *k2 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL),\
*sum = (xs *)kmalloc(sizeof(xs), GFP_KERNEL);//
*k1 = *xs_tmp("0");
*k2 = *xs_tmp("1");
for(int i = 2; i <= k; i++) {
string_number_add(k1, k2, sum);
*k1 = *xs_new(k1, xs_data(k2));
*k2 = *xs_new(k2, xs_data(sum));
}
length = xs_size(k2);
// printk(KERN_INFO "result = %s", xs_data(k2));
copy_to_user(buf, xs_data(k2), length);
xs_free(k1);
xs_free(k2);
xs_free(sum);
}
return length;
}
```
:::info
此法以十進制思考,若改以二進制執行(eg.GCD),十進制顯示,可以改善(做實驗)
:::
* 大數以二進制儲存,實作時以二進制計算,輸出再以十進制表示。==好處??==
在此參考 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU) 及 [cwl0429](https://hackmd.io/@cwl0429/fibdrv) 的大數運算來實作
* 大數${_2}$ 結構為動態的 array,array[0] 放大數${_2}$ 從 lsb 開始算的前 32 bits,array[1] 放第前 33~64 bits,...以此類推,例如將以 $2^{33} + 1$ 2進制表示為:00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000001 其存放在大數結構如下示意圖:
```graphviz
digraph{
node[shape="record"]
a0[label = "0|0|0|0|...|0|0|0|1"]
a1[label = "0|0|0|0|...|0|0|1|0"]
node[shape=none]
array0[label = "array[0] total 32 bits"]
array0 -> a0
array1[label = "array[1] total 32 bits"]
array1 -> a1
}
```
* 若大數為負數,則大數裡的動態 array 存放的為 $-(大數)$,由 sign 為 0 表示負數。
* 大數${_2}$ 結構
```c
/* number[size - 1] = msb 32 bits, number[0] = lsb 32 bits*/
typedef struct _bn {
unsigned *int number;
/* the length of array: number */
unsigned int size;
/* 0 means positive number, 1 means negative number */
int sign;
} bn;
```
* 大數${_2}$ 結構轉十進制字串
以例子說明轉換方式: bn a 的 a->number[0] 為 0...001101 = $13_{10}$,a->size 為 1,十進制位數為 13,對應結構如下:
```graphviz
digraph{
node[shape="record"]
a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"]
a1[label = "0|0|0|0|0|0|0|0|0|0|0|0|\\0"]
node[shape=none]
array0[label = "a->number[0]"]
array0 -> a0
array1[label = "十進位以 string 表示"]
array1 -> a1
}
```
從 a->number[0] 的 msb 開始往右掃,若為1則carry為1,並更新整個string:對每個string的位元更新s[j] += s[j] - '0' + carry,更新後的 s[j] 若 > '10',則 carry 再為 1,讓輪到 s[j-1] 更新時的 carry 為 1,達到進位的效果, 此時s[j] -= '10'
實際流程:
```graphviz
digraph{
node[shape="record"]
a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"]
a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>0 + 0 + 1 = 1|\\0"]
node[shape=none]
array0[label = "current bit"]
array0 -> a0:1
array1[label = "s[j]"]
array1 -> a1:2
}
```
```graphviz
digraph{
node[shape="record"]
a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"]
a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>1 + 1 + 1 = 3|\\0"]
node[shape=none]
array0[label = "current bit"]
array0 -> a0:2
array1[label = "s[j]"]
array1 -> a1:2
}
```
```graphviz
digraph{
node[shape="record"]
a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"]
a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>3 + 3 + 0 = 6|\\0"]
node[shape=none]
array0[label = "current bit"]
array0 -> a0:3
array1[label = "s[j]"]
array1 -> a1:2
}
```
```graphviz
digraph{
node[shape="record"]
a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"]
a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>6 + 6 + 1 = 13|\\0"]
node[shape=none]
array0[label = "current bit"]
array0 -> a0:4
array1[label = "s[j]"]
array1 -> a1:2
}
```
因為 s[j]溢位:
```graphviz
digraph{
node[shape="record"]
a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0 + 0 + 1 = 1|<2>13 - 10 = 3|\\0"]
node[shape=none]
array1[label = "s[j]"]
array1 -> a1:2
array2[label = "s[j-1]"]
array2 -> a1:1
}
```
其實 s[j] = s[j] + (s[j] - '0') 便為兩倍 s[j] 當中的數值,所以每更新整個 string 即為乘以二之意,而加上 carry, 代表為下一輪的乘以二做準備,更新 string 次數減一, 即為乘以二的次數, 以 1011 的 msb 來看, 即為 1000_2 = 2 的 3 次方, 更新次數為4-1=3, 也乘以2 3次
```c
char *bnToString(bn *src)
{
/* total bits in decimal = log10(x)
* log10(x) = log2(x) / log2(10)
* log2(10) ~= 3.~
* log2(x) = totals bits in binary
* so log10(x) ~= totals bits in binary / 3 + 1
* one more bit for '\0'
* and another bit for sign */
size_t len = (8 * sizeof(unsigned int) * src->size) / 3 + 2 + src->sign;
char *s = (char *) kmalloc(len, GFP_KERNEL);
char *p = s;
/* initialize decimal string */
memset(s, '0', len - 1);
s[len - 1] = '\0';
/* binary to decimal string*/
for (int i = src->size - 1; i >= 0; i--) {
/* starts from msb of binary number */
for (unsigned int check = 1U << 31; check; check >>= 1) {
int carry = !!(check & src->number[i]);
/* starts from 2nd lsb of decimal number, lsb = '\0' */
for (int j = len - 2; j >= 0; j--) {
s[j] += s[j] - '0' + carry; //(s[j] + s[j]) equals (2 * s[j])
carry = (s[j] > '9');
if (carry) {
s[j] -= 10;
}
}
}
}
/* skip leading zeros */
while (p[0] == '0' && p[1] != '\0') {
p++;
}
if (src->sign)
*(--p) = '-'; //add sign bit
memmove(s, p, strlen(p) + 1);
return s;
}
```
* 大數${_2}$ 加法
首先探討加數與被加數的正負號, 若同號則由`bnDoAdd`實作無號數相加, 若異號則為相減之意, 先確保被加數為正數, 再來判斷被加數與加數的絕對值大小, 因為接下來要以`bnDoSub`無號數相減來計算, 不論相加減, 皆以無號數運算, 最後再確認相應的sign bit
```c
/* c = a + b */
void bnAdd(bn *a, bn *b, bn *c)
{
/* a,b > 0 or a,b < 0 */
if (a->sign == b->sign) {
bnDoAdd(a, b, c);
c->sign = a->sign;
}
/* a and b have different sign */
else {
if (a->sign)
__swap(a, b);
int cmp = bnCmp(a, b);
/* |a| > |b|, a > 0, b < 0, a + b --> |a| - |b| */
if (cmp == 1) {
bnDoSub(a, b, c);
c->sign = 0;
}
/* |a| < |b|, a > 0, b < 0, a + b --> -(|b| - |a|) */
else if (cmp == -1) {
bnDoSub(b, a, c);
c->sign = 1;
}
/* |a| = |b| */
else {
bnResize(c, 1);
c->number[0] = 0;
c->sign = 0;
}
}
}
```
舉例a, 與b同號, 計算 a + b = c, 無號數相加過程為先計算c需要多大多大的 number array(32bits) 來儲存大數
接下來從a, b 的lsb開始前32bits相加, 再把結果複製到c的前32bits, 若有overflow的部分將由carry承接, 計算a,b下一個32bits的相加時一併加上carry
```c
/* the sign of a and b are the same */
void bnDoAdd(bn *a, bn *b, bn *c)
{
int d = __max(digitsAtmsb(a), digitsAtmsb(b)) + 1; // count a and b's digits in binary; + 1 is sign bit
d = divRoundUp(d, 32);
bnResize(c, d);
unsigned long long int carry = 0; // 64 bits
for (int i = 0; i < c->size; i++) {
unsigned int tmpA = i < a->size ? a->number[i] : 0;
unsigned int tmpB = i < b->size ? b->number[i] : 0;
carry += (unsigned long long int) tmpA + tmpB; // operate in unsigned long long int to prevent unsigned int overflow
c->number[i] = carry;
carry >>= 32;
}
if (!c->number[c->size - 1] && c->size > 1)
bnResize(c, c->size - 1);
}
```
無號數相減
```c
/* |a| > |b| */
void bnDoSub(bn *a, bn *b, bn *c)
{
bnResize(c, a->size);
long long int carry = 0;
for (int i = 0; i < c->size; i++) {
unsigned int tmpA = i < a->size ? a->number[i] : 0;
unsigned int tmpB = i < b->size ? b->number[i] : 0;
carry = (long long int) tmpA - tmpB - carry;
c->number[i] = carry;
if (carry < 0) {
carry = 1;
} else {
carry = 0;
}
}
int d = bnClz(c) / 32;
if (d == c->size)
d--;
bnResize(c, c->size - d);
}
```
## clz / ctz 搭配 Fast doubling
:::warning
TODO:
並思考 clz / ctz 一類的指令對 Fibonacci 數運算的幫助, 請列出關鍵程式碼並解說
實作大數+fast doubling(研讀 KYG-yaya573142 的報告,指出針對大數運算,有哪些加速運算和縮減記憶體操作成本的舉措?)
作業要求最後兩點
自我檢查清單最後兩點
sysfs 介面
:::