Try   HackMD

2020q1 Homework2 (fibdrv)

contributed by < pingsutw >

開發環境

$ uname -a
Linux kevin 5.0.0-38-generic #41-Ubuntu SMP Tue Dec 3 00:27:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 19.04
Release:	19.04
Codename:	disco

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               60
Model name:          Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
Stepping:            3
CPU MHz:             1527.846
CPU max MHz:         3600.0000
CPU min MHz:         800.0000
BogoMIPS:            5188.08
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-7

作業要求

  • 撰寫適用於 Linux 核心層級的程式
    • 學習 ktimer, copy_to_user 一類的核心 API
  • 複習 C 語言 數值系統bitwise operation
  • 初探 Linux VFS
  • 自動測試機制
  • 透過工具進行效能分析
  • 研讀上述 Linux 效能分析的提示 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作
    從中也該理解為何不希望在虛擬機器中進行實驗;
  • 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 clz / ctz 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
  • 複習 C 語言 數值系統bitwise operation,思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
  • lsmod 的輸出結果有一欄名為 Used by,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 (reference counting) 在 Linux 核心如何追蹤呢?
  • 注意到 fibdrv.c 存在著 DEFINE_MUTEX, mutex_trylock, mutex_init, mutex_unlock, mutex_destroy 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題

開發過程

執行 make check 遇到以下錯誤,因為當 fibonacci 數超過93時會 overflow,需對 fibdrv.c 進行改寫

Passed [-] f(93) fail input: 7540113804746346429 expected: 12200160415121876738

添加一個特製的結構來處理大數運算,並添加其加法器

struct BigN { unsigned long long lower, upper; }; static inline void addBigN(struct BigN *output, struct BigN x, struct BigN y) { output->upper = x.upper + y.upper; if (y.lower > ~x.lower) output->upper++; output->lower = x.lower + y.lower; }

一開始沒有特別的運算加速,單存用新的結構 BigN 進行大數運算, 並對其進行效能分析

static struct BigN fib_sequence(long long k) { struct BigN f[k + 2]; f[0].upper = 0; f[0].lower = 0; f[1].upper = 0; f[1].lower = 1; for (int i = 2; i <= k; i++) { addBigN(&f[i], f[i - 1], f[i - 2]); } return f[k]; }


TODO: 在第一次計算的時間,明顯比前20個數慢很多很多,需找出原因
TODO: 需測量更多次實驗 (
105
) 並取
95
% 的信賴區間

  • 執行結果都正確,接下來是效能優化
Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738.
Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167.
Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905.
Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072.
Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977.
Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049.
Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026.
Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075.
  • 利用 Fast Doubling 進行加速,以下為虛擬碼,需在為大數運算實做減法器與乘法器
Fast_Fib(n) a = 0; b = 1; // m = 0 for i = (number of binary digit in n) to 1 t1 = a*(2*b - a); t2 = b^2 + a^2; a = t1; b = t2; // m *= 2 if (current binary digit == 1) t1 = a + b; // m++ a = b; b = t1; return a;
  • 減法器
static struct BigN *subtractBigN(struct BigN *r, struct BigN *x, struct BigN *y) { if (x->lower < y->lower) { unsigned long long mycarry = ULLONG_MAX; r->lower = mycarry + x->lower - y->lower + 1; r->upper = x->upper - y->upper - 1; } else { r->lower = x->lower - y->lower; r->upper = x->upper - y->upper; } return r; }
  • 乘法器
static struct BigN *multiplieBigN(struct BigN *x, struct BigN *y) { struct BigN *r = (struct BigN *) malloc(sizeof(struct BigN)); r->lower = 0; r->upper = 0; size_t w = 8 * sizeof(unsigned long long); for (size_t i = 0; i < w; i++) { if ((y->lower >> i) & 0x1) { struct BigN tmp; r->upper += x->upper << i; tmp.lower = (x->lower << i); tmp.upper = i == 0 ? 0 : (x->lower >> (w - i)); addBigN(r, r, &tmp); } } for (size_t i = 0; i < w; i++) { if ((y->upper >> i) & 0x1) { r->upper += (x->lower << i); } } return r; }
  • fast doubling 實做
static struct BigN fast_fib_sequence_wo_clz(long long k) { int bs = 0; long long t = k; while (t) { bs++; t >>= 1; } struct BigN a, b; a.upper = 0; a.lower = 0; b.upper = 0; b.lower = 1; for (int i = bs; i >= 1; i--) { struct BigN t; struct BigN t1; struct BigN t2; addBigN(&t ,&b, &b); t1 = *(multiplieBigN(&a, subtractBigN(&t, &t, &a))); addBigN(&t2, multiplieBigN(&a, &a), multiplieBigN(&b, &b)); a = t1; b = t2; if (k & (1 << (i - 1)) && k > 0) { addBigN(&t1, &a, &b); a = b; b = t1; k &= ~(1 << (i - 1)); } } return a; }
  • 大數運算 fast doubling 跑得比沒有特別運算加速還慢,後面會慢慢找出原因
  • 小數運算的 fast doubling 明顯比一般運算快很多,在大數運算時不能依照小數運算那樣的思維去做運算,更應該多用一些 bitwise operation 來提高大數運算的效率

效能優化

fast doubling 這邊宣告三個變數來儲存加法器與乘法器的結果,這些變數可以移出 for 回去避免每次迴圈重新宣告,效能有提昇一些,後面持續優化。

for (int i = bs; i >= 1; i--) { struct BigN t; struct BigN t1; struct BigN t2; ........

  • 算數左移
static inline struct BigN *lsftBigN(struct BigN *r, struct BigN *x, unsigned char shift) { if (shift == 0) { return x; } else if (shift >= 64) { r->upper = x->lower << (shift - 64); r->lower = 0ull; return r; } r->upper = x->upper << shift; r->lower = x->lower << shift; r->upper |= x->lower >> (64 - shift); return r; }

  • 把乘法器的 malloc 拿掉,多傳一個 result 的指標給乘法器,並把最後結果寫到這個指標,大概快了 2000~3000 ns
static struct BigN *multiplieBigN(struct BigN *r, struct BigN *x, struct BigN *y)

錯誤分析

用 sprintf 遇到以下錯誤,錯誤原因是 sprintf 沒有檢查 buffer 邊界,所以可能導致 overflow

Dangerous function detected in /home/kevin/git/fibdrv/client.c 16: sprintf(lower, "%s", buf); 23: sprintf(scale, "%s", "18446744073709551616");

解決辦法用 snprintf 改寫成以下

sprintf(lower, sizeof(lower), "%s", buf);

參考資料

tags: sysprog2020