# 2020q1 Homework2 (fibdrv)
contributed by < `pingsutw` >
## 開發環境
```shell
$ uname -a
Linux kevin 5.0.0-38-generic #41-Ubuntu SMP Tue Dec 3 00:27:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 19.04
Release: 19.04
Codename: disco
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
Stepping: 3
CPU MHz: 1527.846
CPU max MHz: 3600.0000
CPU min MHz: 800.0000
BogoMIPS: 5188.08
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7
```
## 作業要求
- [ ] 撰寫適用於 Linux 核心層級的程式
* 學習 ktimer, copy_to_user 一類的核心 API
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise)
- [ ] 初探 Linux VFS
- [ ] 自動測試機制
- [x] 透過工具進行效能分析
- [x] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗;
- [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
- [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢?
- [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題
## 開發過程
執行 `make check` 遇到以下錯誤,因為當 fibonacci 數超過93時會 overflow,需對 `fibdrv.c` 進行改寫
```bash=
Passed [-]
f(93) fail
input: 7540113804746346429
expected: 12200160415121876738
```
添加一個特製的結構來處理大數運算,並添加其加法器
```cpp=
struct BigN {
unsigned long long lower, upper;
};
static inline void addBigN(struct BigN *output, struct BigN x, struct BigN y)
{
output->upper = x.upper + y.upper;
if (y.lower > ~x.lower)
output->upper++;
output->lower = x.lower + y.lower;
}
```
一開始沒有特別的運算加速,單存用新的結構 `BigN` 進行大數運算, 並對其進行效能分析
```cpp=
static struct BigN fib_sequence(long long k)
{
struct BigN f[k + 2];
f[0].upper = 0;
f[0].lower = 0;
f[1].upper = 0;
f[1].lower = 1;
for (int i = 2; i <= k; i++) {
addBigN(&f[i], f[i - 1], f[i - 2]);
}
return f[k];
}
```

TODO: 在第一次計算的時間,明顯比前20個數慢很多很多,需找出原因
TODO: 需測量更多次實驗 ($10^5$) 並取 $95$% 的信賴區間
- 執行結果都正確,接下來是效能優化
```
Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738.
Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167.
Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905.
Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072.
Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977.
Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049.
Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026.
Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075.
```
- 利用 Fast Doubling 進行加速,以下為虛擬碼,需在為大數運算實做減法器與乘法器
```cpp=
Fast_Fib(n)
a = 0; b = 1; // m = 0
for i = (number of binary digit in n) to 1
t1 = a*(2*b - a);
t2 = b^2 + a^2;
a = t1; b = t2; // m *= 2
if (current binary digit == 1)
t1 = a + b; // m++
a = b; b = t1;
return a;
```
- 減法器
```cpp=
static struct BigN *subtractBigN(struct BigN *r, struct BigN *x, struct BigN *y)
{
if (x->lower < y->lower) {
unsigned long long mycarry = ULLONG_MAX;
r->lower = mycarry + x->lower - y->lower + 1;
r->upper = x->upper - y->upper - 1;
} else {
r->lower = x->lower - y->lower;
r->upper = x->upper - y->upper;
}
return r;
}
```
- 乘法器
```cpp=
static struct BigN *multiplieBigN(struct BigN *x, struct BigN *y)
{
struct BigN *r = (struct BigN *) malloc(sizeof(struct BigN));
r->lower = 0;
r->upper = 0;
size_t w = 8 * sizeof(unsigned long long);
for (size_t i = 0; i < w; i++) {
if ((y->lower >> i) & 0x1) {
struct BigN tmp;
r->upper += x->upper << i;
tmp.lower = (x->lower << i);
tmp.upper = i == 0 ? 0 : (x->lower >> (w - i));
addBigN(r, r, &tmp);
}
}
for (size_t i = 0; i < w; i++) {
if ((y->upper >> i) & 0x1) {
r->upper += (x->lower << i);
}
}
return r;
}
```
- fast doubling 實做
```cpp=
static struct BigN fast_fib_sequence_wo_clz(long long k)
{
int bs = 0;
long long t = k;
while (t) {
bs++;
t >>= 1;
}
struct BigN a, b;
a.upper = 0;
a.lower = 0;
b.upper = 0;
b.lower = 1;
for (int i = bs; i >= 1; i--) {
struct BigN t;
struct BigN t1;
struct BigN t2;
addBigN(&t ,&b, &b);
t1 = *(multiplieBigN(&a, subtractBigN(&t, &t, &a)));
addBigN(&t2, multiplieBigN(&a, &a), multiplieBigN(&b, &b));
a = t1;
b = t2;
if (k & (1 << (i - 1)) && k > 0) {
addBigN(&t1, &a, &b);
a = b;
b = t1;
k &= ~(1 << (i - 1));
}
}
return a;
}
```
- **大數**運算 fast doubling 跑得比沒有特別運算加速還慢,後面會慢慢找出原因

- **小數**運算的 fast doubling 明顯比一般運算快很多,在大數運算時不能依照小數運算那樣的思維去做運算,更應該多用一些 bitwise operation 來提高大數運算的效率

### 效能優化
fast doubling 這邊宣告三個變數來儲存加法器與乘法器的結果,這些變數可以移出 for 回去避免每次迴圈重新宣告,效能有提昇一些,後面持續優化。
```cpp=
for (int i = bs; i >= 1; i--) {
struct BigN t;
struct BigN t1;
struct BigN t2;
........
```

- 算數左移
```cpp=
static inline struct BigN *lsftBigN(struct BigN *r, struct BigN *x, unsigned char shift)
{
if (shift == 0) {
return x;
}
else if (shift >= 64) {
r->upper = x->lower << (shift - 64);
r->lower = 0ull;
return r;
}
r->upper = x->upper << shift;
r->lower = x->lower << shift;
r->upper |= x->lower >> (64 - shift);
return r;
}
```

- 把乘法器的 malloc 拿掉,多傳一個 result 的指標給乘法器,並把最後結果寫到這個指標,大概快了 2000~3000 ns
```cpp=
static struct BigN *multiplieBigN(struct BigN *r, struct BigN *x, struct BigN *y)
```

### 錯誤分析
用 sprintf 遇到以下錯誤,錯誤原因是 sprintf 沒有檢查 buffer 邊界,所以可能導致 overflow
```bash=
Dangerous function detected in /home/kevin/git/fibdrv/client.c
16: sprintf(lower, "%s", buf);
23: sprintf(scale, "%s", "18446744073709551616");
```
解決辦法用 `snprintf` 改寫成以下
```bash=
sprintf(lower, sizeof(lower), "%s", buf);
```
## 參考資料
- [H03: fibdrv](https://hackmd.io/CTwO5ldOSgKxP-N4YNRVOA?view)
- [c++中size_t與ssize_t詳解](https://www.itread01.com/content/1544835455.html)
- [淺談探索 Linux 系統設計之道](https://www.slideshare.net/jserv/linux-discovery)
- [Character Device Drivers](https://lwn.net/Articles/195805/)
- [The cdev interface](https://lwn.net/Articles/195805/)
- [Common vulnerabilities guide for C programmers](https://security.web.cern.ch/security/recommendations/en/codetools/c.shtml)
- [Arbitrary-precision arithmetic Explanation](https://stackoverflow.com/questions/1218149/arbitrary-precision-arithmetic-explanation)
###### tags: `sysprog2020`