# 2023q1 Homework3 (fibdrv)
contributed by < [tintinjian12999](https://github.com/tintinjian12999) >
## 實驗環境
```shell
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
$ ls cpu
架構: x86_64
CPU 作業模式: 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
每核心執行緒數: 2
每通訊端核心數: 4
Socket(s): 1
NUMA 節點: 1
供應商識別號: AuthenticAMD
CPU 家族: 23
型號: 24
Model name: AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx
製程: 1
Frequency boost: enabled
CPU MHz: 1700.000
CPU max MHz: 2300.0000
CPU min MHz: 1400.0000
BogoMIPS: 4591.40
虛擬: AMD-V
L1d 快取: 128 KiB
L1i 快取: 256 KiB
L2 快取: 2 MiB
L3 快取: 4 MiB
NUMA node0 CPU(s): 0-7
```
## 開發過程
### 計算 F(93) 以後的 Fibonacci 數
參考[AdrianHuang](https://github.com/AdrianHuang/fibdrv/tree/big_number)的作法利用字串的加法做大數運算,並參考 [qwe661234](https://hackmd.io/@qwe661234/linux2022q1-homework3) 的作法先相加完成後再做字串反轉,可以節省多次翻轉字串的時間
```clike
static void string_number_add(char *a, char *b, char *out)
{
int carry = 0, sum, i = 0;
size_t a_len = strlen(a), b_len = strlen(b);
// Check string a is longer than string b
if (a_len < b_len) {
XOR_SWAP(a, b, char);
XOR_SWAP(&a_len, &b_len, size_t);
}
for (i = 0; i < b_len; i++) {
sum = (a[i] - '0') + (b[i] - '0') + carry;
out[i] = (sum % 10) + '0';
carry = sum / 10;
}
for (i = b_len; i < a_len; i++) {
sum = (a[i] - '0') + carry;
out[i] = (sum % 10) + '0';
carry = sum / 10;
}
if (carry)
out[i++] = carry + '0';
out[i] = '\0';
}
```
利用 copy_to_user 將運算結果傳給 cilent
```clike
static long long fib_sequence(long long k, char *buf)
{
/* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
int i = 0;
str_t *f = kmalloc((k + 2) * sizeof(str_t), GFP_KERNEL);
strncpy(f[0].numstr, "0", 1);
f[0].numstr[1] = '\0';
strncpy(f[1].numstr, "1", 1);
f[1].numstr[1] = '\0';
for (i = 2; i <= k; i++) {
string_number_add(f[i - 1].numstr, f[i - 2].numstr, f[i].numstr);
}
size_t f_size = strlen(f[k].numstr);
str_reverse(f[k].numstr, f_size);
if (copy_to_user(buf, f[k].numstr, f_size - 1))
return -EFAULT;
return f_size;
}
```
同時在 client 端也須調整顯示的格式
```clike
for (int i = 0; i <= offset; i++) {
lseek(fd, i, SEEK_SET);
sz = read(fd, buf, sizeof(buf));
buf[sz] = 0;
printf("Reading from " FIB_DEV
" at offset %d, returned the sequence "
"%s.\n",
i, buf);
}
```
~~由此可以達成 Fibonacci sequence 的運算,但F(93) 以後的數仍然無法正確顯示~~(已解決)。
能成功計算到 F(500) 以上
```
$ sudo ./client
Reading from /dev/fibonacci at offset 499, returned the sequence 86168291600238450732788312165664788095941068326060883324529903470149056115823592713458328176574447204501.
Reading from /dev/fibonacci at offset 500, returned the sequence 139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125.
```
### 效能測試
分別利用 ktime 和 clock_gettime 測試 kernel space 與 user space 執行所花的時間
`ktime:`
```clike
//fibdrv.c
static ktime_t kt;
static long long fib_time_proxy(long long k, char *buf)
{
kt = ktime_get();
long long result = fib_sequence(k, buf);
kt = ktime_sub(ktime_get(), kt);
return result;
}
static ssize_t fib_read(struct file *file,
char *buf,
size_t size,
loff_t *offset)
{
return (ssize_t) fib_time_proxy(*offset, buf);
}
```
```clike
//client.c
krtime = write(fd, write_buf, strlen(write_buf));
```
`clock_gettime:`
```clike
//client.c
struct timespec t1, t2;
clock_gettime(CLOCK_MONOTONIC, &t1);
sz = read(fd, buf, sizeof(buf));
clock_gettime(CLOCK_MONOTONIC, &t2);
user_time = (t2.tv_sec - t1.tv_sec) * 1E9 + (t2.tv_nsec - t1.tv_nsec);
```
得到這些執行時間之後就可以透過作圖來得知程式在 kernel space, user space 執行的時間,與 kernal to user 所花的時間,其結果如下

上圖是使用原始的程式碼(如下)做 Fibonacci sequence 運算的執行時間
```clike
static long long fib_sequence(long long k, char *buf)
{
long long *f = kmalloc((k + 2) * sizeof(long long), GFP_KERNEL);
f[0] = 0;
f[1] = 1;
for (int i = 2; i <= k; i++) {
f[i] = f[i - 1] + f[i - 2];
}
return f[k];
}
```
可以看到大部分的時間都是用來做運算,而 kernel to user 的時間相較起來影響不大,預計使用 Fast doubling 改善運算的時間。

同樣的結論也可以從利用字串加法做運算的結果看出。
### 利用 fast doubling 加速 Fibonacci 運算
參考[加速 Fibonacci 運算](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d)的作法
```clike
static uint64_t fast_doubling(uint32_t target)
{
if(target <= 2)
return 1;
uint64_t n = fast_doubling(target >> 1);
uint64_t n1 = fast_doubling((target >> 1) + 1);
if(target & 1)
return n * n + n1 * n1;
return n * ((n1 << 1) - n);
}
```
執行時間如下

> 利用 Bottom up fast doubling 進一步加速運算(待做)
### 支援其他大數運算
前面的 Fast Doubling 程式仍舊會有無法顯示到 F(92) 以上的問題,因此需要導入大數運算才能解決,我們先前已經成功實做了字串加法的大數運算並運用在 Fibonacci Sequence 的運算上,然而若要使用 Fast Doubling 的技巧加速運算的話,其他大數的也必須要被實做。
#### string_number_sub
```clike
void is_borrow (char *a , int i, int borrow)
{
if (borrow) {
if (a[i] != '0') {
a[i]--;
}
else {
int j = i;
while (a[j] == '0') {
a[j] = '9';
}
a[j + 1]--;
}
}
};
void string_sub (char *a, char *b, char *out)
{
int i = 0, borrow = 0;
size_t a_len, b_len;
a_len = strlen(a);
b_len = strlen(b);
// String a must be longer than string b
for (i = 0; i < b_len; i++) {
is_borrow (a, i, borrow);
if (a[i] < b[i]) {
borrow = 1; //borrow from the next digit
out[i] = (((a[i] - '0') + 10) - (b[i] - '0')) + '0';
}
else {
borrow = 0;
out[i] = ((a[i] - '0') - (b[i] - '0')) + '0';
}
}
for (i = b_len; i < a_len; i++) {
is_borrow (a, i, borrow);
out[i] = a[i];
}
//Remove the leading 0s e.g 0011 -> 11
while (out[i - 1] == '0')
--i;
out[i] = '\0';
};
```
#### string_number_mul
```clike
void string_mul (char *a, char *b, char *out)
{
int i = 0, j = 0;
size_t a_len = strlen(a);
size_t b_len = strlen(b);
size_t total_len = a_len + b_len;
int count = 0;
int *value = kmalloc(total_len * sizeof(int), GFP_KERNEL);
memset(value, 0, sizeof(int) * total_len);
if (a_len < b_len) {
XOR_SWAP(a, b, char);
XOR_SWAP(&a_len, &b_len, size_t);
}
// Store the value after multiplication of each digit.
for (i = 0; i < a_len; i++)
for (j = 0; j < b_len; j++) {
value[i + j] += (a[i] - '0') * (b[j] - '0');
}
// Deal with carry
for (i = 0; i < total_len; i++) {
value[i + 1] += value[i] / 10;
value[i] %= 10;
}
// Detecting the leading zeros
while (value[total_len - count - 1] == 0) {
count ++;
}
count = total_len - count;
for (i = 0; i < count; i++)
out[i] = value[i] + '0';
out[i] = '\0';
};
```
### 利用字串大數運算實做 Fast Doubling
參考原有的 fast doubling
```clike
static uint64_t fast_doubling(uint32_t target)
{
if(target <= 2)
return 1;
uint64_t n = fast_doubling(target >> 1);
uint64_t n1 = fast_doubling((target >> 1) + 1);
if(target & 1)
return n * n + n1 * n1;
return n * ((n1 << 1) - n);
}
```
改寫成字串運算的形式
```clike
char *string_fast_doubling(long long k)
{
char reg1[STR_NUM], reg2[STR_NUM], reg3[STR_NUM], reg4[STR_NUM];
static char out[STR_NUM];
if(k == 0) {
return "0";
}
else if(k == 1) {
return "1";
}
else if(k == 2) {
return "1";
}
strcpy(reg1, string_fast_doubling(k >> 1));
strcpy(reg2, string_fast_doubling((k >> 1) + 1));
if (k & 1) {
string_mul(reg1, reg1, reg3);
string_mul(reg2, reg2, reg4);
string_add(reg4, reg3, out);
}
else {
string_mul(reg2, "2", reg3);
string_sub(reg3, reg1, reg4);
string_mul(reg4, reg1, out);
}
f_size = strlen(out);
return out;
}
```
### 比較使用 Iterative 和 Fast Doubling 在 N = 500 時的執行時間

從上圖可以看出當 N 超過一定數量後 Fast Doubling 的作法快於 Iterative 的作法 (因其時間複雜度為 O(logn)), 與預期相符。
> 使用統計方法取平均與消除極端值並觀察N更大時的情形。(待做)
### 用統計手法克服極端值
參考 [4ce43c4](https://github.com/colinyoyo26/fibdrv/commit/4ce43c4e7ee0b80c4ce9e6dcd995bbfdf239206c) 的作法, 執行 100 次的結果取平均,並刪除離群值。
#### 使用 fast doubling:

#### 使用 iterative:

可以看到使用 iterative 方法的執行時間在 n 大於約 400 之後就開始慢於使用 fast doubling 的執行時間,此結果也符合我們的預期。