# 分組報告 : fibdrv
contributed by < `zodf0055980`, `njjack` >
[GitHub repository](https://github.com/zodf0055980/fibdrv)
:dart: 重做 [fibdrv](https://hackmd.io/@sysprog/SJ2DKLZ8E?type=view) 作業,符合所有指定要求並彙整 [2019 年春季班](https://hackmd.io/s/rygjaEK8V) 其他同學的成果
## copy_to_user 與直接對 user space 的記憶體做直接操作
首先先透過利用 ktime_t 的函式去取得 kernel 的執行時間,並把時間從 kernelspace 回傳給 userspace。回想到大三上 OS 課程寫的 [mailbox](https://github.com/zodf0055980/OSclass_hw2_mailbox/blob/master/module/mailbox.c) 作業,kernel 透過一個 char *buf 去把要回傳的東西寫入 userspace 的記憶體位置中,來完成 user 與 kernel 的溝通。因此想套用此方法去取得回傳的時間
```clike
ktime_t start = ktime_get();
ssize_t re = fib_sequence(*offset);
ktime_t end = ktime_get();
ktime_t runtime = ktime_sub(end, start);
char tmp[128];
memset(tmp, 0, 128);
strncpy(buf,tmp,128);
```
但發現程式執行到一半會結束執行,並且觀察程式的輸出檔 out ,發現程式會執行到 write 中途就結束了。
```
Writing to /dev/fibonacci, returned the sequence 1
Writing to
```
在 [Linux Device Drivers](https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch03.html) 一書中的第三章中 read and write 有提到:
>1. Depending on which architecture your driver is running on, and how the kernel was configured, ==the user-space pointer may not be valid while running in kernel mode at all.== There may be no mapping for that address, or it could point to some other, random data.
>2. Even if the pointer does mean the same thing in kernel space, user-space memory is paged, and ==the memory in question might not be resident in RAM when the system call is made. Attempting to reference the user-space memory directly could generate a page fault, which is something that kernel code is not allowed to do.== The result would be an "oops," which would result in the death of the process that made the system call.
>3. ==The pointer in question has been supplied by a user program, which could be buggy or malicious. If your driver ever blindly dereferences a user-supplied pointer, it provides an open doorway allowing a user-space program to access or overwrite memory anywhere in the system.== If you do not wish to be responsible for compromising the security of your users' systems, you cannot ever dereference a user-space pointer directly.
因此改透過 copy_to_user() 去避免 kernel space 對傳入的指標所指向的記憶體空間進行操作,可能造成非預期的結果的問題。
```clike
ktime_t start = ktime_get();
ssize_t re = fib_sequence(*offset);
ktime_t end = ktime_get();
ktime_t runtime = ktime_sub(end, start);
char tmp[128];
memset(tmp, 0, 128);
sprintf(tmp, "%lld\n", runtime);
copy_to_user(buf, tmp, 128);
```
## 初步測量的時間
![](https://i.imgur.com/hjWKGuW.png)
由上圖可以發現除了一開始因為要設定初值要花時間較多外,時間大致以 O(n) 上升。
## Q-matrix
參考 [Zames-Chang的共筆](https://hackmd.io/@YISK29ORR0awElwR06OpkQ/ry2gJh1vN?type=view) 實作非遞迴的 q-matrix 演算法
$Q$ 和 $Q^n$ 為 2x2 矩陣
$Q =\begin{pmatrix}
1 & 1 \\
1 & 0
\end{pmatrix}$
$Q^n = \begin{pmatrix}
F_{n + 1} & F_n \\
F_ n & F_{n - 1}
\end{pmatrix}$
則有如下公式
if $m = \left\lfloor \frac{n}{2} \right\rfloor$
$Q^n = \begin{cases}
\left(Q^m\right)^2 & \textrm{ if } n \equiv 0 \pmod {2},\\
\left(Q^m\right)^2 Q & \textrm { if } n \equiv 1 \pmod{2}
\end{cases}$
其時間複雜度為 $O(log_2K)$
```clike
static long long fib_sequence_qmatrix(long long k) // qmatrix
{
if (k == 0)
return 0;
long long fn[2][2] = {{1, 1}, {1, 0}};
long long f1[2][2] = {{1, 1}, {1, 0}};
int log = 100;
int stack[log];
int n = -1;
while (k > 1) {
if (k % 2 == 1) {
stack[++n] = 0;
}
stack[++n] = 1;
k /= 2;
}
for (int i = n; i >= 0; i--) {
if (!stack[i]) {
matrix_mult(fn, f1);
} else {
matrix_mult(fn, fn);
}
}
return fn[0][1];
}
```
## Fast doubling
我們可以透過矩陣乘法相將兩個 n 次方矩陣相乘而得到新的 Fibonacci 公式:
$\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}^n =
\begin{bmatrix}
F(n+1) & F(n) \\
F(n) & F(n-1)
\end{bmatrix}$
$\begin{split}
\begin{bmatrix}
F(2n+1) \\
F(2n)
\end{bmatrix} &=
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}^{2n}
\begin{bmatrix}
F(1) \\
F(0)
\end{bmatrix}\\ \\ &=
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}^n
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}^n
\begin{bmatrix}
F(1) \\
F(0)
\end{bmatrix}\\ \\ &=
\begin{bmatrix}
F(n+1) & F(n) \\
F(n) & F(n-1)
\end{bmatrix}
\begin{bmatrix}
F(n+1) & F(n) \\
F(n) & F(n-1)
\end{bmatrix}
\begin{bmatrix}
1 \\
0
\end{bmatrix}\\ \\ &=
\begin{bmatrix}
F(n+1)^2 + F(n)^2\\
F(n)F(n+1) + F(n-1)F(n)
\end{bmatrix}
\end{split}$
透過上述之數學推論,我們最後可得兩公式可供使用。
$\begin{split}
F(2k) &= F(k)[2F(k+1) - F(k)] \\
F(2k+1) &= F(k+1)^2+F(k)^2
\end{split}$
我們可以用上面兩個公式求出 O(logn) 的演算法。
```clike
if (k == 0)
return 0;
long long fn = 1;
long long fn1 = 1;
long long f2n = 1;
long long f2n1 = 0;
long long i = 1;
while (i < k) {
if ((i << 1) <= k) {
f2n1 = fn1 * fn1 + fn * fn;
f2n = fn * (2 * fn1 - fn);
fn = f2n;
fn1 = f2n1;
i = i << 1;
} else {
fn = f2n;
f2n = f2n1;
f2n1 = fn + f2n1;
i++;
}
}
return f2n;
```
求出的時間如下:
![](https://i.imgur.com/YWB5krQ.png)
並去比較 Iterative 和 Fast Doubling 在kernel 中執行時間:
![](https://i.imgur.com/C6dwj8U.png)
我們發現當數字越大, Fast Doubling 的執行時間越短,但由於我們只有測量到 FIB(100),因此看不太出來 logn 的趨勢,但明顯 Fast Doubling 是比較快的。
## 使用 CLZ 加速 Fast doubling
對於 Fast doubling 求得 f(n),可以由要 n 的二進位的每個 bit 是 0 還是 1 去做判斷的簡化。
首先,先利用 clz 去求位數,並由位數由高到低去做判斷。
如果是 1 則先求 f(2n) 和 f(2n+1),並再依此求 f(2n+2)
如果是 0 則求 f(2n) 和 f(2n+1)即可。
假如要求Fib(10),10的二進位表示式為1010。
| bit | start | 1 | 0 | 1 | 0 | return |
| ------ | ------ | ------ | ------ | ------ | ------| ------|
| f(n) | f(0) | f(2*0+1) | f(1*2) | f(2*2+1) | f(5*2) | f(10)|
| a | 0 | 1 | 1 | 5 | 55 | 55 |
| b | 1 | 1 | 2 | 8 | 89 | |
因此能透過clz去得知判斷次數,減少不必要的判斷。
我們使用 64 bit 的 byte-shift version 的 clz:
```clike
int n = 0;
unsigned long long clz = k;
if (clz <= 0x00000000FFFFFFFF) {
n += 32;
clz <<= 32;
}
if (clz <= 0x0000FFFFFFFFFFFF) {
n += 16;
clz <<= 16;
}
if (clz <= 0x00FFFFFFFFFFFFFF) {
n += 8;
clz <<= 8;
}
if (clz <= 0x0FFFFFFFFFFFFFFF) {
n += 4;
clz <<= 4;
}
if (clz <= 0x3FFFFFFFFFFFFFFF) {
n += 2;
clz <<= 2;
}
if (clz <= 0x7FFFFFFFFFFFFFFF) {
n += 1;
clz <<= 1;
}
k <<= n;
n = 64 - n;
```
最後求出除了 clz 外的 bit 數作為判斷次數。並把 k 作往左位移使 MSB 為 1 使方便作判斷,並透過做 k 做 Fast doubling。
```clike
unsigned long long fn = 0;
unsigned long long fn1 = 1;
unsigned long long f2n = 0;
unsigned long long f2n1 = 0;
int i;
for (i = 0; i < n; i++) {
f2n1 = fn1 * fn1 + fn * fn;
f2n = fn * (2 * fn1 - fn);
if (k & 0x8000000000000000) {
fn = f2n1;
fn1 = f2n + f2n1;
} else {
fn = f2n;
fn1 = f2n1;
}
k <<= 1;
}
return fn;
```
利用 k & 0x8000000000000000 去判斷 MSB 的位元,如果是 1 則先求 f(2n+1) 和 f(2n+2),如果是 0 則求 f(2n) 和 f(2n+1)。
測試結果:
![](https://i.imgur.com/aauPmOn.png)
由上圖可見透過 clz 可以有效加速 Fast doubling。
而為何 Fast doubling 執行時間有較大的變化,使之執行時間產生鋸齒狀,我認為和 bit 有關。例如: 19 的二進為表示式為 10011,除了五次利用公式求出 f(2n) 外,還需要3次加法。而 24 的二進位表示是為 11000 卻只需要2次加法而已。這種因為 1 bit 個數差距所導致額外的加法我想是造成執行時間起伏的原因。
# 大數運算
先實作出大數版本 Fast Doubling 會用到的大數加法、減法、乘法
```clike
#define BASE 100000000
#define bits_per_part 8
#define part_num 8
typedef struct bigNum {
long long part[part_num];
} bigNum;
```
```clike
void big_add(bigNum a, bigNum b, bigNum *result)
{
memset(result, 0, sizeof(bigNum));
long long carry = 0;
for (int i = 0; i < part_num; i++) {
long long tmp = carry + a.part[i] + b.part[i];
result->part[i] = tmp % BASE;
carry = tmp / BASE;
}
}
```
```clike
void big_sub(bigNum a, bigNum b, bigNum *result)
{
big_assign(result, &a);
for (int i = 0; i < part_num; i++) {
result->part[i] -= b.part[i];
if (result->part[i] < 0) {
result->part[i] += BASE;
result->part[i + 1]--;
}
}
}
```
```clike
void big_mul(bigNum a, bigNum b, bigNum *result)
{
memset(result, 0, sizeof(bigNum));
for (int i = 0; i < part_num; i++) {
result->part[i] = 0;
}
for (int i = 0; i < part_num; i++) {
long long carry = 0;
for (int j = 0; i + j < part_num; j++) {
long long tmp = a.part[i] * b.part[j] + carry + result->part[i + j];
result->part[i + j] = tmp % BASE;
carry = tmp / BASE;
}
}
}
```
# 大數運算的時間
我們利用上述大數運算去建立 fibonacci 的計算公式,並且每5次運算取其執行時間的中位數去建立執行時間的圖表並分析,避免產生過大的誤差。
## 初始:
```clike=
static unsigned long long fib_sequence(long long k, bigNum *result)
{
if (k == 0) {
memset(result, 0, sizeof(bigNum));
return 0;
}
bigNum *f0, *f1, *tmp;
f0 = kmalloc(sizeof(struct bigNum), GFP_KERNEL);
f1 = kmalloc(sizeof(struct bigNum), GFP_KERNEL);
memset(f0, 0, sizeof(bigNum));
memset(f1, 0, sizeof(bigNum));
f1->part[0] = 1;
for (int i = 2; i <= k; i++) {
big_add(*f0, *f1, f0);
tmp = f0;
f0 = f1;
f1 = tmp;
}
big_assign(result, f1);
return 0;
}
```
![](https://i.imgur.com/O0Vh6SW.png)
因為我們使用大數算法去計算,如果使用舊版的方法要計算很大的數字的話,會需要很大量的空間。因此我們使用 swap 去取代建立大量的儲存空間。
## Fast doubling
```clike
static unsigned long long fib_sequence_fd(long long n,
bigNum *result) // Fast doubling
{
if (n == 0)
return 0;
bigNum t[7]; // 0:F(n) 1:F(n + 1) 2: F(2n) 3:F(2n+1) 4:tmp 5:tmp 6:2
for (int i = 0; i < 7; i++) {
memset(&t[i], 0, sizeof(bigNum));
}
t[0].part[0] = t[1].part[0] = t[2].part[0] = 1;
t[6].part[0] = 2;
int i = 1;
while (i < n) {
if ((i << 1) <= n) {
big_mul(t[1], t[1], &t[4]); // t4 = t1 * t1 + t0 * t0;
big_mul(t[0], t[0], &t[5]);
big_add(t[4], t[5], &t[3]);
big_mul(t[6], t[1], &t[5]); // t3 = t0 * (2 * t1 - t0);
big_sub(t[5], t[0], &t[4]);
big_mul(t[0], t[4], &t[2]);
big_assign(&t[0], &t[2]); // t0 = t3
big_assign(&t[1], &t[3]); // t1 = t4
i = i << 1;
} else {
big_assign(&t[0], &t[2]); // t0 = t3
big_assign(&t[2], &t[3]); // t3 = t4
big_add(t[0], t[3], &t[4]); // t4 = t0 + t4;
big_assign(&t[3], &t[4]);
i++;
}
}
big_assign(result, &t[2]);
return 0;
}
```
![](https://i.imgur.com/GDkYC5B.png)
## Fast doubling with clz
```clike
static unsigned long long fib_sequence_fd_clz(unsigned long long k,
bigNum *result)
{
if (k == 0)
return 0;
int n = 0;
unsigned long long clz = k;
if (clz <= 0x00000000FFFFFFFF) {
n += 32;
clz <<= 32;
}
if (clz <= 0x0000FFFFFFFFFFFF) {
n += 16;
clz <<= 16;
}
if (clz <= 0x00FFFFFFFFFFFFFF) {
n += 8;
clz <<= 8;
}
if (clz <= 0x0FFFFFFFFFFFFFFF) {
n += 4;
clz <<= 4;
}
if (clz <= 0x3FFFFFFFFFFFFFFF) {
n += 2;
clz <<= 2;
}
if (clz <= 0x7FFFFFFFFFFFFFFF) {
n += 1;
clz <<= 1;
}
k <<= n;
n = 64 - n;
bigNum f[7]; // 0:fn 1:fn1 2:f2n 3:f2n1 4:tmp 5:tmp 6:2
for (int i = 0; i < 7; i++) {
memset(&f[i], 0, sizeof(bigNum));
}
f[1].part[0] = 1;
f[6].part[0] = 2;
int i;
for (i = 0; i < n; i++) {
big_mul(f[0], f[0], &f[4]); // f2n1 = fn1 * fn1 + fn * fn;
big_mul(f[1], f[1], &f[5]);
big_add(f[4], f[5], &f[3]);
big_mul(f[6], f[1], &f[4]); // f2n = fn * (2 * fn1 - fn)
big_sub(f[4], f[0], &f[5]);
big_mul(f[0], f[5], &f[2]);
if (k & 0x8000000000000000) {
big_assign(&f[0], &f[3]); // fn = f2n1
big_add(f[2], f[3], &f[1]); // fn1 = f2n + f2n1;
} else {
big_assign(&f[0], &f[2]); // fn = f2n
big_assign(&f[1], &f[3]); // fn1 = f2n1
}
k <<= 1;
}
big_assign(result, &f[0]);
return 0;
}
```
![](https://i.imgur.com/2VAQspg.png)
## 總比較
![](https://i.imgur.com/lqb6Oq1.png)
我們發現 clz 在大數運算中可以有效的加速,而單純使用 Fast doubling 的執行時間和原本的相去不遠,和沒有使用大數運算的結果比較,可以猜測因為大數乘法需要大量的運算,即使 Fast doubling 的時間複雜度是 O(log n),運算反而比單純使用加法運算方法的還要多,而在沒有大數運算時,因為目前電腦有對單純的乘法指令做加速,因此時間才會比單純使用加法的方法還要迅速。
###### tags: `linux2019`