# 2023q1 Homework3 (fibdrv)
contributed by < `visitorckw` >
## 自我檢查清單
- [ ] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗;
- [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
- [ ] 學習指出針對大數運算的加速運算和縮減記憶體操作成本的舉措,紀錄你的認知和疑惑
- [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。 $\to$ 搭配閱讀〈[並行和多執行緒程式設計](https://hackmd.io/@sysprog/concurrency)〉
## 開發環境
```bash
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 165
Model name: Intel(R) Core(TM) i5-10500 CPU @ 3.10GHz
Stepping: 3
CPU MHz: 3100.000
CPU max MHz: 4500.0000
CPU min MHz: 800.0000
BogoMIPS: 6199.99
Virtualization: VT-x
L1d cache: 192 KiB
L1i cache: 192 KiB
L2 cache: 1.5 MiB
L3 cache: 12 MiB
NUMA node0 CPU(s): 0-11
```
## 開發過程
由於原本的程式碼當計算到第 92 項過後時,會造成使用 long long 型別變數儲存時出現 overflow 的錯誤。因此改以字串的形式來計算以及儲存結果。
所以首先需要一個字串相加的函數:
```c
static void string_number_add(char *a, char *b, char *c)
{
if (strlen(a) < strlen(b)) {
char *tmp = a;
a = b;
b = tmp;
}
size_t lenA = strlen(a);
size_t lenB = strlen(b);
reverse_str(a);
reverse_str(b);
int carry = 0, i = 0;
for (i = 0; i < lenA; i++) {
int numA = a[i] - '0';
int numB = (i < lenB ? b[i] - '0' : 0);
c[i] = numA + numB + carry;
carry = c[i] / 10;
c[i] %= 10;
c[i] += '0';
}
if (carry)
c[i++] = '1';
c[i] = '\0';
reverse_str(a);
reverse_str(b);
reverse_str(c);
}
```
接著再透過上述的 ```string_number_add``` 函數用 ```O(N)``` 的時間複雜度取得費氏數列第 N 項的值:
```c
for (int i = 2; i <= k; i++) {
string_number_add(a, b, c);
char *tmp = a;
a = b;
b = c;
c = tmp;
}
```
另外由於程式是執行在 kernel space 之中,因此需要使用 ```copy_to_user``` 函數傳到 user space 之中。
```c
if (__copy_to_user(buf, b, strlen(b)+1)) {
kfree(a);
kfree(b);
kfree(c);
return -EFAULT;
}
size_t len = strlen(b);
kfree(a);
kfree(b);
kfree(c);
return len;
```
## bignum 支援大數
其開發的思想是用陣列來儲存二補數形式的大數。
* 基本結構
```c
typedef struct bignum bignum;
struct bignum{
unsigned char* arr;
int len;
};
```
* 加法/減法
由於是二補數的形式,因此加法與減法可以共用同一種運算。
```c
bignum *addsub(bignum *a, bignum *b)
{
unsigned char signedA = (a->arr[a->len - 1] & (1U << 7)) >> 7;
unsigned char signedB = (b->arr[b->len - 1] & (1U << 7)) >> 7;
unsigned int newLength = MAX(a->len, b->len);
bignum *c;
INIT(c, newLength);
unsigned int carry = 0;
for (int i = 0; i < newLength; i++) {
unsigned int A = i < a->len ? a->arr[i] : 0xff * signedA;
unsigned int B = i < b->len ? b->arr[i] : 0xff * signedB;
unsigned int C = A + B + carry;
carry = (C & (1U << 8)) >> 8;
c->arr[i] = C & 0xff;
}
if (carry && !(signedA ^ signedB) &&
(signedA ^ ((c->arr[c->len - 1] & (1U << 7)) >> 7))) {
RESIZE(c, c->len + 1);
c->arr[c->len - 1] = signedA ? 0x01 : 0xff;
}
REDUCE_LENGTH(c);
return c;
}
```
* 二補數
若遇到負號/減法操作,則需要先轉成二補數。
```c
bignum* twoComp(bignum* a)
{
bignum* One;
INIT(One, 1);
One->arr[0] = 0x01;
bignum* notA = not(a);
bignum* res = addsub(notA, One);
FREE(One);
return res;
}
```
* 乘法
目前採用兩層迴圈暴力計算的方式實現乘法操作,未來可以使用 FFT 或者 karatsuba 演算法進行效率上的改善。
```c
bignum* mul(bignum* a, bignum* b)
{
bignum* c;
INIT(c, a->len + b->len + 1);
for(int i = 0; i < a->len; i++){
for(int j = 0; j < b->len; j++){
unsigned int A = a->arr[i];
unsigned int B = b->arr[j];
unsigned int C = A * B;
c->arr[i+j] += C & 0xff;
c->arr[i+j+1] += (C & 0xff00) >> 8;
}
}
if(c->arr[c->len-1] & (1U << 7)){
RESIZE(c, c->len + 1);
c->arr[c->len-1] = 0;
}
REDUCE_LENGTH(c);
return c;
}
```
* bignum 費氏數列計算
最後則是利用 fast doubling 的手法搭配 bignum 計算。
```c
bignum* fib_bignum_fastdouble(uint64_t target)
{
bignum* fib_n0;
bignum* fib_n1;
bignum* fib_2n0;
bignum* fib_2n1;
INIT(fib_n0, 1);
INIT(fib_n1, 1);
fib_n0->arr[0] = 0x01;
fib_n1->arr[0] = 0x01;
if(!target){
FREE(fib_n0);
fib_n1->arr[0] = 0x00;
return fib_n1;
}
if(target <= 2){
FREE(fib_n0);
return fib_n1;
}
// find first 1
uint8_t count = 63 - __builtin_clzll(target);
// uint64_t fib_n0 = 1, fib_n1 = 1;
for (uint64_t i = count; i-- > 0;) {
// fib_2n0 = fib_n0 * ((fib_n1 << 1) - fib_n0);
fib_2n0 = mul(fib_n0, addsub(Lshift(fib_n1), twoComp(fib_n0)));
// fib_2n1 = fib_n0 * fib_n0 + fib_n1 * fib_n1;
fib_2n1 = addsub(mul(fib_n0, fib_n0), mul(fib_n1, fib_n1));
if (target & (1UL << i)) {
fib_n0 = fib_2n1;
// fib_n1 = fib_2n0 + fib_2n1;
fib_n1 = addsub(fib_2n0, fib_2n1);
} else {
fib_n0 = fib_2n0;
fib_n1 = fib_2n1;
}
}
return fib_n0;
}
```
## 時間測量與效率分析
在 client.c 加入了以下程式碼,透過 timespec 以及 clock_gettime 來取得時間:
```c
struct timespec tstart = {0, 0}, tend = {0, 0};
clock_gettime(CLOCK_MONOTONIC, &tstart);
sz = read(fd, buf, 128);
clock_gettime(CLOCK_MONOTONIC, &tend);
long long usertime = (1e9 * tend.tv_sec + tend.tv_nsec) -
(1e9 * tstart.tv_sec + tstart.tv_nsec);
long long kerneltime = write(fd, write_buf, strlen(write_buf));
```
在 fibdrv.c 中則是使用教材提及的 fib_time_proxy 函數來獲取時間:
```c
static long long fib_time_proxy(long long k, char *buf)
{
kt = ktime_get();
// long long result = fib_sequence(k);
long long result = fib(k, buf);
kt = ktime_sub(ktime_get(), kt);
return result;
}
```
* 使用 fib_sequence 函數進行運算 ( 僅能正確計算到第 92 項的數值 )
![](https://i.imgur.com/SgxO8o3.png)
* 使用字串作儲存數值進行大數運算,採用線性時間演算法
![](https://i.imgur.com/0mgEwVY.png)
* 使用 fast_doubling_iter 函數進行運算 ( 原始碼可見於[連結](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d) )
![](https://i.imgur.com/9JgkErP.png)
* 使用 bn_kernel 作為大數進行計算 ( 原始碼可見於[連結](https://github.com/KYG-yaya573142/fibdrv/blob/master/bn_kernel.c) )
![](https://i.imgur.com/sko2Ftt.png)
* 使用自己撰寫的 bignum 作為大數進行計算
![](https://i.imgur.com/WTYFaov.png)