sternacht
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    --- tags: linux2022 --- # 2022q1 Homework3 (fibdrv) contributed by < `sternacht` > ## 自我檢查清單 - [x] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [x] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] 研讀 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU),指出針對大數運算,有哪些加速運算和縮減記憶體操作成本的舉措? - [ ] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 &#34;each module&#39;s use count and a list of referring modules&#34;,但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢? &gt; 搭配閱讀 [The Linux driver implementer’s API guide » Driver Basics](https://www.kernel.org/doc/html/latest/driver-api/basics.html) - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。 ## [作業要求](https://hackmd.io/@sysprog/linux2022-fibdrv#-作業要求) ## 開發過程 ### 環境搭建 為了排除可能的干擾,根據[作業中的提示](https://hackmd.io/@sysprog/linux2022-fibdrv#排除干擾效能分析的因素)執行以下步驟調整至適合實驗的環境 - 抑制 address space layout randomization (ASLR) ```cmd $ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" ``` - 設定 scaling_governor 為 performance。準備以下 shell script,檔名為 `performance.sh` : ```shell for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor do echo performance > ${i} done ``` 之後再用 `$ sudo sh performance.sh` 執行 - 針對 Intel 處理器,關閉 turbo mode ```cmd $ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" ``` 修改 `/etc/default/grub` 裡的設定為 `GRUB_CMDLINE_LINUX="isolcpus=0,1"` ,用 `sudo update-grub` 命令使 grub 檔案更新,最後重新開機 ![](https://i.imgur.com/mXT3L2X.png) 開機後可以從內建的 System Monitor 中看到 cpu 0, 1(圖中編號是 1, 2) 已經不會有負載了 以下涉及執行時間觀察的實驗皆是在被保留的 cpu 中執行。 ### fast doubling 將檔案 git clone 下來之後,可以看到裡面計算 fibonacci sequence 的演算法是根據定義實作的,隨著數列的值增加,迴圈次數為 $O(n)$ 線性成長。根據作業內容,我們可以透過 fast doubling 的演算法來實作出迴圈次數為 $O(logn)$ 的程式如下 ```c static long long fast_doubling(int k){ if (k <= 0) return 0; long long a = 0, b = 1; long long t1, t2; for (int i = 31 - __builtin_clz(k); i; i--){ t1 = a * (2 * b - a); t2 = b * b + a * a; a = t1; b = t2; if (k & (1 << i)){ t1 = a + b; a = b; b = t1; } } return a; } ``` 接著修改 fib_read 讓程式執行的同時印出我們想要的訊息,這部分參考了 [OscarShiang](https://hackmd.io/@oscarshiang/linux_fibdrv#使用-fibdrv-進行實驗) 的筆記得知該如何使用 printk 來觀察結果 ```c static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { ktime_t t1 = ktime_get(); ssize_t ret = fib_sequence(*offset); t1 = ktime_sub(ktime_get(), t1); ktime_t t2 = ktime_get(); fast_doubling(*offset); t2 = ktime_sub(ktime_get(), t2); printk(KERN_INFO "%d %lld %lld", *offset, ktime_to_ns(t1), ktime_to_ns(t2)); return ret; } ``` 執行 `./client` 後,再用 `dmesg` 命令來觀察,部分結果如下 ``` [ 9904.070177] 0 35 25 [ 9904.070182] 1 25 25 [ 9904.070184] 2 35 22 [ 9904.070186] 3 35 20 [ 9904.070188] 4 34 22 [ 9904.070190] 5 34 20 [ 9904.070191] 6 37 19 [ 9904.070193] 7 38 19 [ 9904.070195] 8 37 20 [ 9904.070197] 9 41 19 [ 9904.070199] 10 41 21 [ 9904.070201] 11 59 20 [ 9904.070203] 12 53 21 [ 9904.070205] 13 53 22 [ 9904.070207] 14 45 22 [ 9904.070209] 15 47 21 ... ``` 將結果利用 gnuplot 製作成圖表,fast doubling 花費的時間明顯比原本的演算法快上許多 ![](https://i.imgur.com/nbjne7z.png) > 註 : 此結果為不正確的,詳細請看接下來的討論 ### w/o clz 指令的影響與調整 接著比較一下若不使用 clz 這樣的指令會對系統的運算方面造成什麼影響。以下直接將 i 設為從 MSB 開始,這樣的話無論 k 的大小,迴圈數都是固定的。 ```c static long long fast_doubling_2(int k){ if (k <= 0) return 0; long long a = 0, b = 1; long long t1, t2; for (int i = 31; i; i--){ t1 = a * (2 * b - a); t2 = a * a + b * b; a = t1; b = t2; if (k & (1 << i)){ t1 = a + b; a = b; b = t1; } } return a; } ``` ![](https://i.imgur.com/zo7FduE.png) ![](https://i.imgur.com/IorIlHR.png) >原本以為不使用 clz 這樣個函式會對執行時間有所影響,但實驗結果在花費時間上卻差不多? 但若是搬到 user space 上執行,差距就很大了 (why?) 仔細看過 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU#clz-的影響)後,發現有一段有提到相同的問題,於是我根據裡面提到的步驟去察看編譯後的組合語言,確認程式執行的過程使用了哪些指令 ```cmd objdump -dS fibdrv.ko > out ``` [objdump](https://man7.org/linux/man-pages/man1/objdump.1.html) 可以查看 obj file 的內容,搭配 -S 可以將已經變成 binary code 的檔案反組譯,我們主要是在 fib_read 的函式中執行 fibonacci 數列的計算,因此直接看這一段的組合語言在做甚麼 ```asm 00000000000000c0 <fib_read>: { c0: e8 00 00 00 00 callq c5 <fib_read+0x5> c5: 55 push %rbp c6: 48 89 e5 mov %rsp,%rbp c9: 53 push %rbx t1 = ktime_get(); ca: e8 00 00 00 00 callq cf <fib_read+0xf> cf: 48 89 c3 mov %rax,%rbx t1 = ktime_sub(ktime_get(), t1); d2: e8 00 00 00 00 callq d7 <fib_read+0x17> d7: 48 29 d8 sub %rbx,%rax } da: 5b pop %rbx db: 5d pop %rbp dc: c3 retq dd: 0f 1f 00 nopl (%rax) ``` 對照 C 語言的程式碼 ```c static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { ktime_t t1; t1 = ktime_get(); long long fib = fast_doubling_l(*offset); t1 = ktime_sub(ktime_get(), t1); return (ssize_t) ktime_to_ns(t1); } ``` 可以看到編譯過後的組合語言就只剩下兩個 `ktime_get()` 的系統呼叫,以及將兩數相減的過程,而 fibonacci 的計算則完全不見了,這並不是我們想要的。而之前的另一個疑問是,為什麼搬到 userspace 上兩者卻又有差距呢 ? 原因是 userspace 的程式編譯過程與 kernel module 不一樣,因此編譯出來的程式並沒有將這個函式呼叫省略掉,所以執行上是正常的 ```asm= main: .LFB7: .cfi_startproc endbr64 pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 subq $16, %rsp movl $50, -12(%rbp) movl $0, -16(%rbp) jmp .L8 .L9: movl -12(%rbp), %eax cltq movq %rax, %rdi call fast_doubling movq %rax, -8(%rbp) addl $1, -16(%rbp) .L8: movl -16(%rbp), %eax cmpl -12(%rbp), %eax jl .L9 movl $0, %eax leave .cfi_def_cfa 7, 8 ret .cfi_endproc ``` > 第 18 行有呼叫 `fast_doubling`,是真的有執行到 至於要如何修掉這個問題呢 ? 我猜測這個問題是由於最佳化所導致的,雖然不明白是甚麼使得程式認為這段可以省略(因為沒有用到變數 `fib` ?),但是把最佳化關掉試試看吧,在 Makefile 這段後面加上關閉最佳化的選項 `-O0` ``` ccflags-y := -std=gnu99 -Wno-declaration-after-statement -O0 ``` 重新編譯並再次輸出 `fibdrv.ko` 的資訊 ```asm 0000000000001263 <fib_read>: /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { 1263: e8 00 00 00 00 callq 1268 <fib_read+0x5> 1268: 55 push %rbp 1269: 48 89 e5 mov %rsp,%rbp 126c: 48 83 ec 30 sub $0x30,%rsp 1270: 48 89 7d e8 mov %rdi,-0x18(%rbp) 1274: 48 89 75 e0 mov %rsi,-0x20(%rbp) 1278: 48 89 55 d8 mov %rdx,-0x28(%rbp) 127c: 48 89 4d d0 mov %rcx,-0x30(%rbp) t1 = ktime_get(); 1280: e8 00 00 00 00 callq 1285 <fib_read+0x22> 1285: 48 89 45 f0 mov %rax,-0x10(%rbp) fib = fast_doubling(*offset); 1289: 48 8b 45 d0 mov -0x30(%rbp),%rax 128d: 48 8b 00 mov (%rax),%rax 1290: 48 89 c7 mov %rax,%rdi 1293: e8 9c fe ff ff callq 1134 <fast_doubling> 1298: 48 89 45 f8 mov %rax,-0x8(%rbp) t1 = ktime_sub(ktime_get(), t1); 129c: e8 00 00 00 00 callq 12a1 <fib_read+0x3e> 12a1: 48 2b 45 f0 sub -0x10(%rbp),%rax 12a5: 48 89 45 f0 mov %rax,-0x10(%rbp) return (ssize_t) ktime_to_ns(t1); 12a9: 48 8b 45 f0 mov -0x10(%rbp),%rax 12ad: 48 89 c7 mov %rax,%rdi 12b0: e8 5a ed ff ff callq f <ktime_to_ns> } 12b5: c9 leaveq 12b6: c3 retq ``` 這次就可以看的到 `fast_doubling` 的函式呼叫了 最後我們再看一下關閉最佳化的情況下,是否用 `__builin_clzll` ,以及原本的方式,三者之間的執行時間差距 ![](https://i.imgur.com/6Kph2w9.png) 雖然未能完全解決關於干擾的問題,但從圖上還是可以明顯看出使用 clz 的 fast_doubling 是最快的,不使用 clz 的 fast_doubling 與加法的實作則隨著 fibonacci 數列增長而有交叉,最終仍是 fast_doubling 略勝一籌。 > 值得一提的是,因為最佳化會導致函式沒有被執行的問題,因此在這張圖中,加法實作比起有最佳化慢了約一倍的時間,所以如果有辦法使函式不被最佳化吃掉的話, fast_doubling 的速度也能更快,例如在前面提到 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU#clz-的影響)中便有使用其他方式去避免最佳化不去執行函式。 ### 計算 fib(92)之後的數 fib(92) 之後的數在 long long 的形態中會造成 overflow (unsigned long long 可以容納至第 93 項),一開始我利用如下的自定義結構來儲存大於 64 bits 的數 ```c typedef struct{ unsigned long long lower, upper; } bignum; ``` 當數字小於 $2^{64}$ 時儲存在 lower 中,一旦超過就使 upper + 1 ,如此一來可以達到相當於 __int128 的大小。針對這種結構的加減法計算相對簡單,只要判斷是否有進位或借位的狀況發生去調整就好,但遇到乘法的時候就不行了,因為結構的大小是固定的,超出範圍的進位無法表示,亦即上限仍然存在。因此考慮另一種可動態調整大小的整數陣列如下 ```c typedef struct{ int len; unsigned long long *bn; } bignum; ``` len 用來記錄陣列 bn 中有多少個元素,而每個元素則是一個 long long 型態的整數,當達到存取的上限時,會動態的去調整陣列大小。 加減法的實作如下 ```c bignum bignum_add(bignum a, bignum b) { bignum ret = bignum_init(); int len = b.len; ret.len = len + 1; ret.bn = malloc(sizeof(long long) * (len + 1)); // c stand for 'carry or not' int i = 0, c = 0; while (i < a.len) { *(ret.bn + i) = *(a.bn + i) + *(b.bn + i) + c; c = (*(ret.bn + i) < *(a.bn + i)) ? 1 : 0; i++; } while (i < b.len) { *(ret.bn + i) = *(b.bn + i) + c; c = (*(ret.bn + i) < *(b.bn + i)) ? 1 : 0; i++; } *(ret.bn + i) = c; bignum_check(&ret); return ret; } bignum bignum_sub(bignum a, bignum b) { bignum ret = bignum_init(); ret.len = a.len; ret.bn = malloc(sizeof(long long) * a.len); int i = 0, c = 0; while (i < b.len) { *(ret.bn + i) = *(a.bn + i) - *(b.bn + i) - c; c = (*(b.bn + i) + c > *(a.bn + i)) ? 1 : 0; i++; } while (i < a.len) { *(ret.bn + i) = *(a.bn + i) - c; c = *(a.bn + i) ? 0 : 1; i++; } bignum_check(&ret); return ret; } ``` 加法會預先多給空間、減法則可能會使陣列的最後一個元素變成 0 ,`bignum_check` 用來檢查是否有最尾端的元素為 0 的情況發生,進而調整記憶體的使用。 為了測試這兩個函示是否正常運作,因此我先參考了[這篇討論](https://stackoverflow.com/questions/8023414/how-to-convert-a-128-bit-integer-to-a-decimal-ascii-string-in-c)的內容,改寫成適用於動態 bignum 結構的輸出十進位值之實作,程式碼如下 ```c char *bignum2dec(bignum a) { char s[64 * a.len / 3 + 2]; long long n[a.len]; char *p = s; int i; memset(s, '0', sizeof(s) - 1); s[sizeof(s) - 1] = '\0'; memcpy(n, a.bn, sizeof(n)); // INT64_M = 0xFFFFFFFFFFFFFFFF // INT64_H = 0x8000000000000000 for (i = 0; i < 64 * a.len; i++) { int j, c, l = a.len; c = (n[l - 1] >= INT64_H); while (--l) { n[l] = ((n[l] << 1) & INT64_M) + (n[l - 1] >= INT64_H); } n[0] = ((n[0] << 1) & INT64_M); for (j = sizeof(s) - 2; j >= 0; j--) { s[j] += s[j] - '0' + c; c = (s[j] > '9'); if (c) s[j] -= 10; } } while (*p == '0' && p < &s[sizeof(s) - 2]) p++; return p; } ``` 概念是透過 bitwise 操作逐步將十進位的值以每次一個乘 2 的方式算出來,並記錄是否進位到下一個字元,缺點是 s 大小是根據 bignum 陣列長度設置,因此還需要在最後將多餘的 '0' 給排除掉。 因為 `*p` 是宣告在函式中,回傳後 stack 被釋放了,並不能保證原本位址裡面的值不會被改變,印出來的數字會發生錯誤,因此前面計算的部分保留,修改 `*p` 的宣告以及給值的方式如下 ```c i = 0; while (s[i] == '0' && i < sizeof(s) - 2) i++; char *p = malloc(sizeof(s) - i); memcpy(p, s + i, sizeof(s) - i); return p; ``` 結合加法與輸出的實作,已經可以計算出相當大的 fibonacci 數了,但是只是根據定義的加法實作,而前面所提到的 fast doubling 還需要乘法。以下是一個簡易的實作方式,主要利用的概念是,兩個 long long 長度的數相乘,其積不可能超過兩個 long long 的長度,因此用一個較大的型態儲存乘積,再將乘積分開成前後兩部份分別加到相對應的陣列位置當中,對兩個 bignum 的陣列種每兩兩一組元素重複著個動作,並考慮進位的情況做調整,就可以得到答案。 再來考慮進位的問題,當乘法循環時,高 64 位元數可能有來自低 64 位元的進位、以及上一次迭代時所產生的進位兩種,因此需要分別計算進位是在哪個時期產生 ```c bignum bignum_mul(bignum a, bignum b) { bignum ret = bignum_init(); ret.len = a.len + b.len; ret.bn = calloc(sizeof(long long) * ret.len); int i, j; for(i = 0; i < a.len; i++){ // 前 64 bits 與結果相加時產生的進位 int uc = 0; for(j = 0; j < b.len; j++){ // 後 64 bits 與結果相加時產生的進位 int lc = 0; __int128 tmp = (__int128)(*(a.bn + i)) * (__int128)(*(b.bn + j)); *(ret.bn + i + j) += (tmp & INT64_M); lc = (*(ret.bn + i + j) < (tmp & INT64_M))? 1: 0; *(ret.bn + i + j + 1) += (long long)(tmp >> 64) + lc + uc; uc = (*(ret.bn + i + j + 1) < (long long)(tmp >> 64))? 1: 0; } *(ret.bn + i + j) += uc; } bignum_check(&ret); return ret; } ``` 於是我們可以用以下的 fast doubling 函式來計算更大的 fibonacci 數了, ```c bignum fast_doubling(int n) { bignum a = bignum_init(), b = bignum_init(); *(b.bn) = 1; bignum t1, t2; for (int i = 32 - __builtin_clz(n); i--;){ // t2 = a * a + b * b t1 = bignum_mul(a, a); t2 = bignum_mul(b, b); t2 = bignum_add(t1, t2); // t1 = a * (2 * b - a) t1 = bignum_add(b, b); t1 = bignum_sub(t1, a); t1 = bignum_mul(t1, a); a = t1; b = t2; if(n & (1 << i)){ t1 = bignum_add(a, b); a = b; b = t1; } } return a; } ``` 輸出結果與查詢到的[前 500 fibonacci 數列](https://blog.abelotech.com/posts/first-500-fibonacci-numbers/)亦符合。 ![](https://i.imgur.com/pLDnfjs.png) ### memory leak 到目前為止的 bignum 實作還是在 userspace 的環境下,在搬到 kernel 之前,我先透過 `valgrind --leak-check=full ./test` 命令來查看 memory leak 相關的問題 ![](https://i.imgur.com/CJHLDPv.png) 果不其然有相當嚴重的 memory leak 的問題,因為大部分的函式都有用到 malloc ,但是都沒 free 掉,而主要會發生 memory leak 的地方有三個 : 第一個是 fast_doubling 函式會回傳一個 bignum 結構,而這個結構中的記憶體沒有釋放掉,並在下一次被覆蓋掉,造成 memory leak ,需要在輸出結果後連同結果的字串將其釋放 第二個是在計算每一個 fibonacci 數時,需要額外幾個暫存器放置運算所需的數,這些數要在函數回傳之前全部釋放(a 回傳 , b, t1, t2 釋放) > 更新 : 最後的版本中是回傳字串, a, b, t1, t2 全部在函式中釋放 第三個則是在運算的過程中所產生的,原本的函式設計是會產生一個新的 bignum 作為回傳,但如此一來原先的變數中所佔據的記憶體卻沒被釋放,於是這裡我改成另一種寫法,將原本回傳要放的變數位址同時傳給函式,並直接改動裡面的值,如下 ```c void bignum_add(bignum a, bignum b, bignum *ret) { ret->len = b.len + 1; // 被加數與加數可能與和是同一個 bignum , bn 的指標是相同的 // 因此先將計算結果寫入在另一塊記憶體上,最後合併到 ret uint64_t *bn = malloc(sizeof(uint64_t) * (ret->len)); int i = 0, c = 0; while (i < a.len) { *(bn + i) = *(a.bn + i) + *(b.bn + i) + c; c = (*(bn + i) < *(a.bn + i)) ? 1 : 0; i++; } while (i < b.len) { *(bn + i) = *(b.bn + i) + c; c = (*(bn + i) < *(b.bn + i)) ? 1 : 0; i++; } *(bn + i) = c; // 替換前將舊的 bn 指標釋放 free(ret->bn); ret->bn = bn; bignum_check(ret); return; } ``` 將全部的函式整理過後再測試一次 ![](https://i.imgur.com/7QU9yMj.png) ### 引入 fibdrv 先看一下原本的函式是如何處理傳入值以及回傳值的, `fib_read` 會把 fibonacci 數的計算結果當做 read 的返回值直接回傳給變數 `sz`,而後者的回傳型態是 long long ,並在 `client.c` 輸出。 而在改成 bignum 結構後,回傳的方式必須要改成用作業說明裡提到的 `copy_to_user` 寫入,然後再從 `client.c` 裡面的 buf 中讀取。 接著在改程式碼的過程中,又做了一些修改,首先是因為 stdint.h 屬於 userspace , kernel 不好使用,因此改成 kernel 底下有的 type,如 `__u64 (unsigned long long)`,但也因為不能使用像`__int128` 這樣大的整數結構,所以將整段程式的存取位元減半成 `__u32`。 再來是 stdlib.h 同樣屬於 userspace,底下的 malloc, free 等函式,也要改成 kernel 使用的 [kmalloc](https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html), [kfree](https://www.kernel.org/doc/htmldocs/kernel-api/API-kfree.html) 等。 > fibdrv.c ```c /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { // fast_doubling() retrun a string of number char *ret = fast_boudling(*offset); copy_to_user(buf, ret, strlen(ret) + 1); kfree(ret); return 0; } ``` > client.c ```c for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, 1); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%s.\n", i, buf); } ``` 此外,在編譯時期會出現以下警告 ``` warning: ISO C90 forbids variable length array 'f' ``` 這是因為陣列的記憶體分配是在編譯時期就會做處理,變動長度會使得系統不知道該分配多少空間給該陣列,同時若傳入作為長度的變數太大,程式又缺乏檢查機制,就容易造成 stack overflow,因此改成用 malloc, kmalloc 等方式來索取記憶體是比較安全的方式 最後將 `scripts/expected.txt` 裡的 fib(93) ~ fib(100) 修改成正確的值後,執行 `make check` ,`scripts/verify.py` 終於沒有報錯了~。 ### 效能比較 拿來比較的是老師的 [bignum 實作](https://github.com/sysprog21/bignum),為了方便反覆測試,這裡選擇在 userspace 上測量時間。 ![](https://i.imgur.com/l1fO6Ni.png) 最初的實作慢了相當多,於是第一步先考慮有哪些程式碼可以更加精簡,例如原本 fast_doubling 中有一段程式碼如下註解部分,要將 a = b, b = a + b,原先的做法是先做加法併存到另一個位置,然後再將內容複製過去,但其實可以將 a, b 內容先對調,再把相加的結果存到 b 之中,結果相同 ```c if(n & (1UL << i)){ // bignum_add(a, b, t1); // bignum_copy(a, b); // bignum_copy(b, t1); bignum_swap(a, b); bignum_add(b, a, b); } ``` 以及另一段乘以 2 的實作,原本是用加法來實現,但相對於加法,直接用二進位的位移操作其實更快,因此可以寫一個函式來做乘以二的乘法 ```c void bignum_bwdouble(bignum *ret) { int len = ret->len; if (*(ret->bn + len - 1) > INT64_H){ ret->bn = realloc(ret->bn, sizeof(uint64_t) * (len + 1)); *(ret->bn + len) = 1; } while(--len){ *(ret->bn + len) <<= 1; *(ret->bn + len) |= (*(ret->bn + len - 1) > INT64_H); } *(ret->bn) <<= 1; return; } ``` 第一步的改進成果如下 ![](https://i.imgur.com/tQax82L.png)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully