---
tags: linux2022
---
# 2022q1 Homework3 (fibdrv)
contributed by < [Eddielin0926](https://github.com/Eddielin0926) >
## :male-teacher: 自我檢查清單
- [ ] 研讀 [Linux 效能分析](https://hackmd.io/@sysprog/linux2022-fibdrv#-Linux-%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E6%8F%90%E7%A4%BA) 的提示描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 → 從中也該理解為何不希望在虛擬機器中進行實驗。
- [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://hackmd.io/@sysprog/linux2022-fibdrv#-Linux-%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E6%8F%90%E7%A4%BA) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
- [ ] 研讀 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU) 的報告,指出針對大數運算,有哪些加速運算和縮減記憶體操作成本的舉措?
- [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 “each module’s use count and a list of referring modules”,但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢?
> 搭配閱讀 The Linux driver implementer’s API guide » Driver Basics
- [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/Reference_counting) 的程式碼來確認。
## :gear: 開發環境
### 測試環境
```shell=
➜ ~ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0
➜ ~ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 2
Core(s) per socket: 3
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 2
Model name: AMD FX(tm)-6300 Six-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 1400.000
CPU max MHz: 3500.0000
CPU min MHz: 1400.0000
BogoMIPS: 7033.04
Virtualization: AMD-V
L1d cache: 48 KiB
L1i cache: 192 KiB
L2 cache: 6 MiB
L3 cache: 8 MiB
NUMA node0 CPU(s): 0-5
```
### 檢查環境
```shell=
➜ ~ uname -r
5.13.0-30-generic
➜ ~ dpkg -L linux-headers-`uname -r` | grep "/lib/modules"
/lib/modules
/lib/modules/5.13.0-30-generic
/lib/modules/5.13.0-30-generic/build
➜ ~ whoami
eddie
➜ ~ sudo whoami
root
```
## Fix C99 VLA
原本的程式碼使用了動態宣告陣列,但實際上這個函式只關心費氏數列第 k 個元素,因此沒有必要儲存前面計算的結果。計算的過程中只會需要 $fib(n - 1)$ 和 $fib(n - 2)$,所以只需要宣告三個變數即可。
$$
fib(n) = fib(n - 1) + fib(n - 2)
$$
```c
static long long fib_sequence(long long k)
{
long long f[3] = {0, 1, 1};
if (k <= 2)
return f[k];
for (int i = 2; i <= k; i++) {
f[2] = f[1] + f[0];
f[0] = f[1];
f[1] = f[2];
}
return f[2];
}
```
## Fast Doubling
根據 Fast Doubling 手法,將 `a` 視為 $F(k)$、`b` 視為 $F(k + 1)$。
$$
F(2k) = F(k)[2F(k+1) - F(k)] \\
F(2k+1) = F(k+1)^2+F(k)^2
$$
這邊利用了 `__builtin_clzll` 來計算 bit width,`__builtin_clzll` 計算了左側有幾個零,因此扣掉 `long long` 的 64 為位元相減就知道這個數有幾位。
```c
#define __binary_digitll(x) ((sizeof(x) * 8) - __builtin_clzll(x))
static long long fib_sequence(long long k)
{
long long a = 0, b = 1;
for (int i = __binary_digitll(k); i > 0; i--) {
long long t1 = a * (2 * b - a);
long long t2 = (b * b) + (a * a);
a = t1;
b = t2;
if (k & ((1LL << i) >> 1)) {
t1 = a + b;
a = b;
b = t1;
}
}
return a;
}
```
<!-- Why is `typedef` considered harmful?
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs -->
## Timing
![](https://i.imgur.com/WMgSYBc.png)
## 大數運算
```c
#include <stdint.h>
#define BIG_INT_SIZE 12
#define LSB 0
#define MSB (BIG_INT_SIZE - 1)
struct bigint {
uint32_t bytes[BIG_INT_SIZE];
};
struct bigint init_bigint(long long num);
int bitwidth_bigint(struct bigint num);
int str_bigint(char *buf, int size, struct bigint num);
struct bigint add_bigint(struct bigint a, struct bigint b);
struct bigint sub_bigint(struct bigint a, struct bigint b);
struct bigint neg_bigint(struct bigint a);
struct bigint not_bigint(struct bigint a);
struct bigint shr_bigint(struct bigint a, int shift);
struct bigint shl_bigint(struct bigint a, int shift);
```
### Double Dabble
```c
int str_bigint(char *buf, int size, struct bigint num)
{
// Double Dabble Algorithm
if (!buf || size < 2)
return -1;
int n = bitwidth_bigint(num);
int bcd_sz = (n + 4 * ((n + 2) / 3)) / 8;
int idx = 0;
uint8_t bcd[bcd_sz];
if (num.bytes[MSB] >> 31)
{
if (bcd_sz + 1 > size)
return -1;
num = neg_bigint(num);
buf[idx++] = '-';
}
else
{
if (bcd_sz > size)
return -1;
}
for (int i = 0; i < bcd_sz; i++)
{
bcd[i] = 0;
}
for (int i = 0; i < BIG_INT_SIZE * 32; i++)
{
for (int j = 0; j < bcd_sz; j++)
{
if ((bcd[j] & 0xF) >= 5)
{
bcd[j] += 3;
}
if ((bcd[j] >> 4) >= 5)
{
bcd[j] += 0x30;
}
}
for (int j = bcd_sz - 1; j > 0; j--)
{
bcd[j] <<= 1;
bcd[j] |= ((bcd[j - 1] >> 7) & 1);
}
bcd[0] <<= 1;
bcd[0] |= ((num.bytes[MSB] >> 31) & 1);
for (int j = BIG_INT_SIZE - 1; j > 0; j--)
{
num.bytes[j] <<= 1;
num.bytes[j] |= ((num.bytes[j - 1] >> 31) & 1);
}
num.bytes[0] <<= 1;
}
bool lead = true;
for (int i = bcd_sz - 1; i >= 0; i--)
{
if (!lead || (bcd[i] >> 4))
{
lead = false;
buf[idx++] = (bcd[i] >> 4) + '0';
}
if (!lead || (bcd[i] & 0xF))
{
lead = false;
buf[idx++] = (bcd[i] & 0xF) + '0';
}
}
if (lead)
{
buf[idx++] = '0';
}
buf[idx++] = '\0';
return idx;
}
```