王彥傑
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2023q1 Homework3 (fibdrv) contributed by < `yanjiew1` > > [GitHub](https://github.com/yanjiew1/fibdrv) ## 實驗環境 ```bash $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz CPU family: 6 Model: 142 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 Stepping: 10 CPU max MHz: 3400.0000 CPU min MHz: 400.0000 BogoMIPS: 3600.00 ``` ## 改用 Fast Doubling 計算費氏數 原本的 fibdrv 是使用 Dynamic Programming 來計算,時間複雜度為 $O(n)$ ,且使用可變長度陣列來存放第 0 ~ n 的費氏數,空間複雜度為 $O(n)$ 。 參考[作業說明](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-a)和 [Calculating Fibonacci Numbers by Fast Doubling](https://chunminchang.github.io/blog/post/calculating-fibonacci-numbers-by-fast-doubling),修改 fibdrv 用 Fast Doubling 來計算: ```c static long long fib_sequence(long long k) { if (k < 2) return k; long long fib[2] = {0, 1}; int len = fls64(k); k <<= 64 - len; for (; len; len--, k <<= 1) { /* Fast doubling */ long long tmp = fib[0]; fib[0] = fib[0] * (2 * fib[1] - fib[0]); fib[1] = fib[1] * fib[1] + tmp * tmp; if (k & (1ULL << 63)) { /* Fast doubling + 1 */ tmp = fib[0]; fib[0] = fib[1]; fib[1] = fib[1] + tmp; } } return fib[0]; } ``` 實作方式是先用 Linux kernel 提供的 `fls64` 來得到最高位元是在哪一位,並且把它移到整個 64 bits 整數的最左邊。另外也順便得到 `len` 就是整數二進位有效位數。例: `0010001`, `len` 就會是 5 ,前面二個 0 沒作用。 接下來跟據老師題供的[作業說明](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-a)公式,計算 fast doubling ,並且遇到左邊 bit 為 1 時,再額外計算下一個費氏數。公式如下: $F_{2k} = F_k(2F_{k+1}-F_k)$ $F_{2k+1} = {F_{k+1}}^2+{F_k}^2$ 編譯載入核心模組後,確認可以正確計算到第 92 個費氏數: ```bash $ make $ make load $ sudo ./client ... Reading from /dev/fibonacci at offset 88, returned the sequence 1100087778366101931. Reading from /dev/fibonacci at offset 89, returned the sequence 1779979416004714189. Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120. Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429. ... ``` 上述程式改進後,時間複雜度變為 $O(\log n)$,空間複雜度為 $O(1)$ 。 :::warning TODO: 提交 pull request,讓 GitHub Actions 得以自動測試。 :notes: jserv > 已提交自動測試 GitHub Actions [Pull Request](https://github.com/sysprog21/fibdrv/pull/6) ::: ## 效能分析工具設定與測試 作業要求中,需要針對每一個實作分析效能。為了及早分析,故在開始實作前,先建立效能分析的架構。 我們需要精確計時器來分析效能。在作業說明中提到使用 `ktime_t` 來做,此外此作業 `fibdrv` 中的 `write` 剛好沒作用,可以用來量測和回傳量測結果。另外參考 [`KYG-yaya573142`](https://hackmd.io/@KYWeng/rkGdultSU) 的實作來開發。 ### 初步 `ktime_t` 測試 宣告 `static` 變數: ```c static ktime_t kt; ``` 修改 `fib_read` 進行量測: ```c /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { ktime_t start = ktime_get(); long long result = fib_sequence(*offset); kt = ktime_sub(ktime_get(), start); return (ssize_t) result; } ``` 修改 `fib_write` 用來查詢量測結果: ```c /* write operation is skipped */ static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { return kt; } ``` 新增 [`measure.c`](https://github.com/yanjiew1/fibdrv/blob/8889aed7971a27429c9b638afbb905df3001ce1b/measure.c) 測試程式,測試程式中,使用 `write(fd, write_buf, 0)` 來取得上次執行的時間。 執行 `./measure` 後,成功輸出計算費氏數時間,輸出如下: ``` Reading from /dev/fibonacci at offset 0, returned the sequence 0. Time taken to calculate the sequence 43. Reading from /dev/fibonacci at offset 1, returned the sequence 1. Time taken to calculate the sequence 39. Reading from /dev/fibonacci at offset 2, returned the sequence 1. Time taken to calculate the sequence 73. Reading from /dev/fibonacci at offset 3, returned the sequence 2. Time taken to calculate the sequence 60. Reading from /dev/fibonacci at offset 4, returned the sequence 3. Time taken to calculate the sequence 76. Reading from /dev/fibonacci at offset 5, returned the sequence 5. Time taken to calculate the sequence 46. Reading from /dev/fibonacci at offset 6, returned the sequence 8. Time taken to calculate the sequence 57. Reading from /dev/fibonacci at offset 7, returned the sequence 13. Time taken to calculate the sequence 49. Reading from /dev/fibonacci at offset 8, returned the sequence 21. Time taken to calculate the sequence 76. Reading from /dev/fibonacci at offset 9, returned the sequence 34. Time taken to calculate the sequence 62. Reading from /dev/fibonacci at offset 10, returned the sequence 55. ... ``` ### 使用者模式時間同步 因為要測量運算完成後,把資料從核心搬到使用者模式,及使用者模式收到資料的時間。故在使用者模式取得到時間要能跟核心模式進行對應。 在使用者模式可以用 `clock_gettime` 函式來取得時間,其中又有不同的 clock id 。跟據 [Linux 核心 ktime API 文件](https://www.kernel.org/doc/html/latest/core-api/timekeeping.html),在 Linux 核心內,以 `ktime_get` 函式所取得的時間可以對應到 `CLOCK_MONOTONIC` 的時間。 跟據 Linux 原始碼 [`include/linux/ktime.h`](https://github.com/torvalds/linux/blob/master/include/linux/ktime.h), `ktime_t` 的單位為柰秒,並且為 64-bit 有號數型態。 ```c /* Nanosecond scalar representation for kernel time values */ typedef s64 ktime_t; ``` `struct timespec` 為使用者模式透過 `clock_gettime` 取得的時間。要轉成 `ktime_t` 也很簡單。 `struct timespec` 定義如下: ```c struct timespec { time_t tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ }; ``` 把 `tv_sec` 乘上 $10^9$ 再加上 `tv_nsec` 即可,可直接在使用者模式實作。因此我們就可利用 `write` 的回傳值來傳遞在核心模式量測到的時間。而使用者模式量測到時間可以轉成 `ktime_t` 後再比較。 :::info **Linux 支援的最大日期** `ktime_t` 以柰秒來存放時間。故能設定的最大時間落在 2262 年左右。以 `date -s "2270-01-01"` 指令來設定時間果然不能運作。看來解決了 Y2038 問題,二世紀後的人們要面對 Y2262 的問題。 ::: ## 多演算法選擇 這個作業會需要比較多種演算法的效能。觀察目前 `read` 和 `write` 用到的參數,可以發現 `size` 未被使用。故就拿它來選擇演算法用。 `read` 會回傳計算完成的費氏數,並把花費時間記錄在一個全域變數內。而 `write` 中,若 `size` 傳入 0 ,則回傳全域變數內上次 `read` 所花費的時間。若傳入大於 0 的數字,則會選定演算法跑一次費氏數運算,但不回傳結果,而是回傳執行時間。 故我修改 `fib_sequence` 讓它可以選擇演算法 ```c /* Fibonacci Sequence using big number */ static long long fib_sequence(long long k, size_t choose) { switch (choose) { case 0: case 1: return fib_sequence_dp(k); default: return fib_sequence_fast_doubling(k); } } ``` 然後再修改 `read` 和 `write` 函式: ```c /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { ktime_t start = ktime_get(); long long result = fib_sequence(*offset, size); kt = ktime_sub(ktime_get(), start); return (ssize_t) result; } static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { if (!size) return kt; static volatile long long result; ktime_t start = ktime_get(); result = fib_sequence(*offset, size); return ktime_sub(ktime_get(), start); } ``` ### 系統環境設定 先排除效能干擾因素 - 關閉 SMT (Hyperthread) 這是在作業說明沒寫,但我覺得也會影響測試結果的部份。因為在 SMT 開啟的狀態下,會有二個硬體的 Thread 跑在同一個 CPU 核心上,會互相競爭,故把它關閉。 ```bash $ sudo sh -c 'echo off > /sys/devices/system/cpu/smt/control' ``` - 針對 Intel CPU 關閉 Turbo boost ```bash $ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" ``` - 設定 scaling_governor 為 performance 按照作業說明,把以下內容複製到一個獨立檔案 `performance.sh` 之後再執行 `$ sudo sh performance.sh` ```bash for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor do echo performance > ${i} done ``` - CPU isolation 把某一核心空出來給待測程式跑。原本的作業說明是在 GRUB 加上 `isolcpus=...` 。但我希望能夠在 Linux 執行時,能直接空出 CPU Core 不必改 boot loader 。 我查到[一篇說明文件](https://www.suse.com/c/cpu-isolation-practical-example-part-5/)說明如何隔離 CPU core ,使用的是 cpuset 。 先安裝 cpuset 套件: ```bash $ sudo apt install cpuset ``` 用以下指令來建立 cpuset 。這裡假設要隔離第 0 個 core ,而 1-3 這四個 core 不要隔離。 建立二個 cpuset 分別是 `others` 和 `isolated` 。我們把要放在第 0 個 core 的行程放在 `isolated` ,其他的行程都放在 `others` 。 ```bash $ sudo cset set -c 0 isolated $ sudo cset set -c 1-3 others ``` 確保 `isolated` 這個 cpuset 不執行 task scheduling : ```bash $ sudo sh -c "echo 0 > /cpusets/isolated/sched_load_balance" ``` 把所有行程放到 `others` 內 ```bash $ sudo cset proc -m root others ``` 至此就建立好量測環境了。可以用下列指令來讓行程放在 `isolated` 內執行 ```bash $ sudo cset proc -e isolated -- sh -c './measure > data.txt' ``` :::info 最近再次使用 `cset` 時,一直出現 `cset` 無法去掛載 `cpuset` 相關檔案系統到 `/cpusets` 錯誤訊息。後來是發現它跟 `cgroup v2` 有衝突,當使用 `cgroup v2` 的 `cpuset` 機制時,需要透過 `cgroup v2` 機制來設定 `cpuset` ,故 `cset` 就無法用傳統方式操作 `cpuset`。 原本系統沒有這個問題,不確定是哪次更新系統或安裝某個套件後開始有這個問題。 故後來的我先把 `cgroup v2` 關閉,改用 `cgroup v1` 。 修改 `/etc/default/grub` ,在 `GRUB_CMDLINE_LINUX_DEFAULT=` 後面,加上 `systemd.unified_cgroup_hierarchy=false` ,例如: ``` GRUB_CMDLINE_LINUX_DEFAULT="quiet splash systemd.unified_cgroup_hierarchy=false" ``` 再執行 `update-grub` 更新 GRUB 設定。 ::: ## 支援大數運算 在作業說明有提供二種方式進行第 93 費氏數後的運算,一個是用 `__int128` ,另一個方式是用十進位字串的方式。但因為十進位字串的方式感覺很浪費空間,故我決定實作一個使用 64-bit 無號整數組成陣列的方式來提供大數運算。 :::info Linux 內建的大數運算函式 [`include/linux/mpi.h`](https://elixir.bootlin.com/linux/v6.2.3/source/include/linux/mpi.h) ,可作為效能比較標的之一。 > 為何不直接使用 Linux 核心 MPI 相關 API 呢? > :notes: jserv > 我一開始打算按照作業說明提供的材料,自己練習寫一個大數運算的程式,應用作業說明提到的方式來試試看。後來看到 Linux 核心有 MPI 相關 API 後,打算也用 MPI 相關 API 來實作比較效能,當下沒有想到就直接使用它來做。 > > 今天下午我嘗試用 MPI 相關 API 來開發,但很關鍵的函式 `mpi_mul` (乘法) 沒有被 export 出來,編譯後沒辦法連結,無法使用。我用 [lxr](https://elixir.bootlin.com/linux/latest/source) 查,發現它是在 v6.0 以後才被 export 出來可以用在核心模組。之後我會安裝 v6.0 以後版本的核心再來試看看。 ::: ### 計算所需位元數 按照[作業說明](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-b)來計算。 透過這個公式可直接計算第 $n$ 個費氏數: $$ F(n)= \frac{(\phi^n)-(1-\phi)^n}{\sqrt5} \\ \phi \text{ (golden ratio)} = \frac{1+\sqrt5}{2} \approx 1.61803398875 $$ 當 n 足夠大時,可得到約略值: $$ F(n) \approx \frac{\phi^n}{\sqrt5} \approx 0.4472135955 \times 1.61803398875^n $$ 取得以 2 為底的對數 $$ \log F(n) \approx -1.160964 + n\times0.694242 $$ 但因為 Linux 核心不能適合使用浮點數,故乘上 $10^6$ 轉成整數運算 $$ \lfloor\frac{-1160964 + n \times694242}{10^6}\rfloor + 1 $$ 寫成 C 語言函式。因為 `k` 小時誤差大,故多一個比較式確保至少有 2 個 bit 被使用。 ```c static int fib_num_of_bits(int k) { int digits = ((long) k * 694242 - 1160964) / (10 * 6) + 1; digits = digits < 2 ? 1 : digits; return digits; } ``` ### 存放數字的資料結構 定義一個資料結構,可以用來放大數。其中 `capacity` 代表 `digits` 有幾個 `unsigned long` ,而 `size` 代表用到幾個 `unsigned long` 來存放目前數字。 另外在計算費氏數時,不會出現負數,故不針對負數做處理。此外一開始就把所需的空間開好,故也不考慮溢位要增加空間的處理。 ```c struct bignum { unsigned long *digits; int capacity; int size; }; ``` 定義所需大數運算的操作。由 fast-doubling 公式可知,需要有左移一位、加法、減法和乘法運算。 ```c void bn_add(struct bignum *c, struct bignum *a, struct bignum *b); void bn_sub(struct bignum *c, struct bignum *a, struct bignum *b); void bn_mul(struct bignum *c, struct bignum *a, struct bignum *b); void bn_lshift1(struct bignum *c); ``` #### 加法 用小學減法來實作 `bn_add`: ```c void bn_add(struct bignum *c, struct bignum *a, struct bignum *b) { int i, j; int carry = 0; for (i = 0; i < a->size && i < b->size && i < c->capacity; i++) { c->digits[i] = a->digits[i] + b->digits[i] + carry; if (carry) carry = c->digits[i] <= a->digits[i]; else carry = c->digits[i] < a->digits[i]; } for (; i < a->size && i < c->capacity; i++) { c->digits[i] = a->digits[i] + carry; carry = c->digits[i] < a->digits[i]; } for (; i < b->size && i < c->capacity; i++) { c->digits[i] = b->digits[i] + carry; carry = c->digits[i] < b->digits[i]; } if (i < c->capacity && carry) { c->digits[i] = carry; i++; } j = i; for (; j < c->size; j++) c->digits[j] = 0; c->size = i; } ``` #### 減法 用小學減法來實作 `bn_sub`: ```c void bn_sub(struct bignum *c, struct bignum *a, struct bignum *b) { int i, j; int borrow = 0; for (i = 0; i < a->size && i < b->size && i < c->capacity; i++) { c->digits[i] = a->digits[i] - b->digits[i] - borrow; if (borrow) borrow = c->digits[i] >= a->digits[i]; else borrow = c->digits[i] > a->digits[i]; } for (; i < a->size && i < c->capacity; i++) { c->digits[i] = a->digits[i] - borrow; borrow = c->digits[i] > a->digits[i]; } for (; i < b->size && i < c->capacity; i++) { c->digits[i] = b->digits[i] - borrow; borrow = c->digits[i] > b->digits[i]; } j = i; for (; j < c->size; j++) c->digits[j] = 0; c->size = i; while (c->size > 0 && c->digits[c->size - 1] == 0) c->size--; } ``` #### 乘法 用小學乘法來做:因為二個 64-bit 整數相乘會得到 128-bit ,故此函式有用到 GCC Extension `__uint128_t` 。 ```c void bn_mul(struct bignum *c, struct bignum *a, struct bignum *b) { int i, j; /* Clear c */ for (i = 0; i < c->size; i++) c->digits[i] = 0; for (i = 0; i < a->size && i < c->capacity; i++) { for (j = 0; j < b->size && i + j < c->capacity; j++) { __uint128_t product = (__uint128_t) a->digits[i] * (__uint128_t) b->digits[j]; u64 product0 = product; u64 product1 = product >> 64; int carry = 0; c->digits[i + j] += product0; if (i + j + 1 >= c->capacity) continue; /* Overflow */ carry = c->digits[i + j] < product0; c->digits[i + j + 1] += product1 + carry; if (carry) carry = c->digits[i + j + 1] <= product1; else carry = c->digits[i + j + 1] < product1; for (int k = i + j + 2; k < c->capacity && carry; k++) { c->digits[k] += carry; carry = c->digits[k] < carry; } } } /* Calculate the size */ c->size = a->size + b->size; if (c->size > c->capacity) c->size = c->capacity; while (c->size > 0 && c->digits[c->size - 1] == 0) c->size--; } ``` #### 左移 1 位元 (乘以 2) 從最高位的左邊 1 位開始計算,每次左移 1 位元時,跟目前處理右邊一位的最高位元做 AND 運算。 ```c void bn_lshift1(struct bignum *c, struct bignum *a) { int i = a->size; if (i >= c->capacity) i = c->capacity - 1; int j = i; for (; i >= 1; i--) c->digits[i] = a->digits[i] << 1 | a->digits[i - 1] >> 63; c->digits[0] = a->digits[0] << 1; if (c->digits[j] == 0) c->size = j; else c->size = j + 1; } ``` ### 把計算結果放進 `buf` 目前的計算結果是直接把數字回傳,但因為回傳值型態是 `size_t` 為 64-bit 無號整數,在大的費氏數無法用。改把計算結果用 `buf` 傳回。 但是把結果放進 `buf` 會需要使用 `size` 參數才能知道 `buf` 大小,但又顧及到需動態選擇演算法,因為改成用 `write` 介面來選擇演算法。 我把計算結果暫存至 `struct file` 內的 `private_data`。 我新增了一個結構 `struct fibdrv_priv` ,這個結構會在 `open` 時被建立,然後讓 `private_data` 指向它,並在 `release` 時被釋放。 ```c struct fibdrv_priv { size_t size; size_t pos; char *result; int impl; struct mutex lock; ktime_t start; ktime_t end; }; ``` 這個資料結構可以針對每一個獨立開啟 file descriptor 保存一份狀態,這樣子就可以讓 fibdrv 被同時使用,此外原本放在全域的 mutex 就可以移除。 新增 `fib_init_priv` 和 `fib_free_priv` 二個函式來初始化和釋放 `struct fibdrv_priv` : ```c static void fib_init_priv(struct fibdrv_priv *priv) { priv->size = 0; priv->result = NULL; priv->impl = 0; priv->start = 0; priv->end = 0; mutex_init(&priv->lock); } ``` ```c static void fib_free_priv(struct fibdrv_priv *priv) { if (priv->result) kfree(priv->result); mutex_destroy(&priv->lock); kfree(priv); } ``` 另外修改 `open` 和 `release` 函式。`open` 會配置空間給 `struct fibdrv_priv` 並初始化它,而 `release` 則會釋放 `struct fibdrv_priv` 空間: ```c static int fib_open(struct inode *inode, struct file *file) { struct fibdrv_priv *priv = kmalloc(sizeof(struct fibdrv_priv), GFP_KERNEL); if (!priv) { printk(KERN_ALERT "kmalloc failed"); return -ENOMEM; } fib_init_priv(priv); file->private_data = priv; return 0; } ``` ```c static int fib_release(struct inode *inode, struct file *file) { fib_free_priv((struct fibdrv_priv *) file->private_data); return 0; } ``` 另外原本計算費氏數的函式會回傳計算結果,現在改成把計算結果寫進 `struct fibdrv_priv` : ```c priv->result = kmalloc(sizeof(long long), GFP_KERNEL); if (!priv->result) return -ENOMEM; priv->size = sizeof(long long); memcpy(priv->result, &f[k % 2], sizeof(long long)); ``` --- `read` 的行為修改成: - 如果呼叫時,沒有已計算好的費氏數結果,就會進行計算。 - 除了計算外,也會把最多 `size` 個位元組,複製到使用者傳入的 `buf` 中。 - 當複製完成後,即使用者提供的 buffer 空間大於所需的空間時,就把核心內部的計算結果清掉,不要佔用空間。 ```c /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { struct fibdrv_priv *priv = (struct fibdrv_priv *) file->private_data; ssize_t ret = 0; mutex_lock(&priv->lock); /* Whether we need to calculate new value */ if (!priv->result) { ktime_t start, end; start = ktime_get(); fib_sequence((int) *offset, priv); end = ktime_get(); priv->start = start; priv->end = end; } if (size) { /* Copy to user */ char *bufread = priv->result + priv->pos; int release = 0; if (priv->size - priv->pos < size) { /* The returned data is less than the data copied. * The user space program should know that it read all the data. */ release = 1; size = priv->size - priv->pos; } ret = size; if (copy_to_user(buf, bufread, size) != size) ret = -EFAULT; else priv->pos += size; if (release) { /* Discard the result */ kfree(priv->result); priv->result = NULL; } } mutex_unlock(&priv->lock); return ret; } ``` 修改 `lseek` ,在操作時要鎖住 mutex ,並且把先前的結果清除掉: (中間程式跟原來一樣,故省略) ```c static loff_t fib_device_lseek(struct file *file, loff_t offset, int orig) { struct fibdrv_priv *priv = (struct fibdrv_priv *) file->private_data; loff_t new_pos = 0; mutex_lock(&priv->lock); kfree(priv->result); priv->result = NULL; ... mutex_unlock(&priv->lock); return new_pos; } ``` --- `write` 可以用來選擇計算費氏數的實作和取得上次計算費氏數的時間: ```c /* write operation is skipped */ static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { struct fibdrv_priv *priv = (struct fibdrv_priv *) file->private_data; ssize_t ret = 0; mutex_lock(&priv->lock); if (size == 0) ret = priv->start; else if (size == 1) ret = priv->end; else priv->impl = (int) size; mutex_unlock(&priv->lock); return ret; } ``` ### 實作大數運算版本費氏數列 我分別實作了 Dynamic Programming 和 Fast doubling 版本如下: **Dynamic Programming 版本** ```c static int fib_sequence_bignum_dp(unsigned k, struct fibdrv_priv *priv) { struct bignum f[2], tmp; int bits = fib_num_of_bits(k); int nlong = bits / BITS_PER_LONG + 1; int ret = 0; bn_init(&f[0], nlong); bn_init(&f[1], nlong); bn_init(&tmp, nlong); bn_set_ul(&f[0], 0); bn_set_ul(&f[1], 1); for (int i = 2; i <= k; i++) { bn_add(&tmp, &f[0], &f[1]); bn_swap(&tmp, &f[i % 2]); } /* Save the result */ struct bignum *res = &f[k % 2]; if (res->size == 0) res->size = 1; priv->result = kmalloc(res->size * sizeof(unsigned long), GFP_KERNEL); if (!priv->result) { ret = -ENOMEM; goto out; } priv->size = res->size * sizeof(unsigned long); memcpy(priv->result, res->digits, priv->size); out: bn_free(&f[0]); bn_free(&f[1]); bn_free(&tmp); return ret; } ``` **Fast Doubling 版本** ```c static int fib_sequence_bignum_fast_doubling(unsigned k, struct fibdrv_priv *priv) { int len = fls(k); int bits = fib_num_of_bits(k); int nlong = bits / BITS_PER_LONG + 1; int ret = 0; if (len > 0 && len < 32) k <<= 32 - len; struct bignum f0, f1, tmp0, tmp1, tmp2; bn_init(&f0, nlong); bn_init(&f1, nlong); bn_init(&tmp0, nlong); bn_init(&tmp1, nlong); bn_init(&tmp2, nlong); bn_set_ul(&f0, 0); bn_set_ul(&f1, 1); for (; len; len--, k <<= 1) { /* Fast doubling */ bn_lshift1(&tmp0, &f1); bn_sub(&tmp1, &tmp0, &f0); bn_mul(&tmp0, &f0, &tmp1); bn_mul(&tmp1, &f0, &f0); bn_mul(&tmp2, &f1, &f1); bn_add(&f1, &tmp1, &tmp2); bn_swap(&f0, &tmp0); if (k & (1U << 31)) { /* Fast doubling + 1 */ bn_add(&tmp0, &f0, &f1); bn_swap(&f0, &f1); bn_swap(&f1, &tmp0); } } /* Save the result */ if (f0.size == 0) f0.size = 1; priv->result = kmalloc(f0.size * sizeof(unsigned long), GFP_KERNEL); if (!priv->result) { ret = -ENOMEM; goto out; } priv->size = f0.size * sizeof(unsigned long); memcpy(priv->result, f0.digits, priv->size); out: bn_free(&f0); bn_free(&f1); bn_free(&tmp0); bn_free(&tmp1); bn_free(&tmp2); return ret; } ``` :::warning TODO: 引入 hash table 來保存已經運算過的數值,再利用已知 Fibonacci 數列的原理來加速後續運算。 :notes: jserv ::: ### 把大數轉成十進位字串 要在使用者模式印出由核心模式傳來由 64-bit 無號整數組成的大數字串,需要先轉成 十進位由 0 ~ 9 組成的字串。 為了避免時常重新配置空間,字串所需大小可以用這個公式計算: $$ \lceil\frac{n}{\log_2 10}\rceil $$ 其中 $n$ 是原來數字是由幾個位元組成,計算出來的結果就是字串所需的長度。 因為直接對大數做除法很花時間,故先找到最大可以存進 64-bit 無號整數,以 1 為開頭後面都是 0 的十進位數字。可以找到 $1 \times 10^19$ 。 每次都先對大數除以 $1 \times 10^19$ 取得商數和餘數。把餘數轉成 10 進位數並串接到字串上(先以反方向串接)。再對商數重複上述步驟,直到商數為 0 。最後再把字串反轉就是答案了。 ```c static char *bn_to_str(char *buf, const uint64_t *bn, int nlimbs) { char *p = buf; uint64_t *q = (uint64_t *) malloc(nlimbs * sizeof(uint64_t)); int qlimbs = nlimbs; if (!q) return NULL; memcpy(q, bn, nlimbs * sizeof(uint64_t)); while (qlimbs > 0) { uint64_t rem; bn_div_ul(q, &rem, q, qlimbs, &qlimbs, 10000000000000000000ULL); uint64_t tmp = rem; for (int i = 0; i < 19; i++) { *p++ = '0' + tmp % 10; tmp /= 10; if (!tmp && !qlimbs) break; } } /* swap buffer */ size_t bufsize = p - buf; for (size_t i = 0; i < bufsize / 2; i++) { char tmp = buf[i]; buf[i] = buf[bufsize - i - 1]; buf[bufsize - i - 1] = tmp; } *p = '\0'; free(q); return buf; } ``` ## 初步效能分析 圖中的 kernel 代表在 kernel 量測計算費氏數時間的結果, user 則代表在 user space 量測時間的結果, kernel to user 則是把 user 時間減去 kernel 時間,代表 system call 和在 user space 呼叫 `clock_gettime` 取得時間的成本。 我使用作業說明提到的 [Python 程式](https://github.com/colinyoyo26/fibdrv/commit/4ce43c4e7ee0b80c4ce9e6dcd995bbfdf239206c)來進行分析。我把它[引進到我的程式](https://github.com/yanjiew1/fibdrv/blob/09cf4ceb8d695b26e8503c2201f654ac14da2d75/scripts/driver.py)中。 **64-bit 整數 DP 版本** ![](https://i.imgur.com/W4VbL8S.png) 可以看到花費時間,線性往上增加。從演算法也可知,DP 版本的時間複雜度為 $O(n)$ ,圖也符合這個趨勢。另外從 kernel to user 時間,可以知道 syscall overhead 相當多。 kernel time 的數據大約是從一開始的 138ns 到最後變成 471ns 。而 kernel to user 大約是 210ns。而 `kernel to user` 除了最前面的高峰外,大約在 1800ns 左右。 :::info 一開始在 kernel to user 有一個小高峰,目前推測是 cache 的問題。故我修改量測程式,讓量測是先把待測的程式都跑過一次,確保它們在 cache 內,解決了這個高峰的問題。 > Commit [7e7355f](https://github.com/yanjiew1/fibdrv/commit/7e7355f6190f95e228684b3eee97dc9a0a6d7150) 對比未排除 cache miss 因素的量測結果: ![](https://i.imgur.com/qLy47xE.png) 圖中的數據比前面的圖還快很多,是因為我忘了關 Turbo Boost 。後面的圖都會重新量測做調整。 ::: **64-bit 整數 Fast Doubling 版本** ![](https://i.imgur.com/5xkKWcv.png) Fast Doubling 的時間複雜度為 $O(\log n)$ ,圖中的線大致上是平的,但有時會上下坡動,量測可能還是有一些干擾在。kernel 的時間介於 123ns 到 212ns 之間 ,基本上是平的。而 kernel to user 大約是 1800ns 左右,開銷還是很大。 **大數 DP 版本** ![](https://i.imgur.com/JIR9PEh.png) 相較於前面花費不到 100ns 就可以算完,用大數運算成本增加非常多。最高 kernel 的時間就到 53788ns 。另外有點意外的是它在 93 以內,也是呈現階梯狀。因為在 93 以後,才會需要第 2 個 64-bit 無號整數來存,在第 93 費氏數以前的數字應該都固定用 1 個 64-bit 無號整數存就夠了。 圖中可知我寫的大數運算成本相當高,之後參考 [`sysprog21/bignum`](https://github.com/sysprog21/bignum) 來重新實作一個。 **大數 Fast Doubling 版本** 大數的 Fast Doubling 版本又比 DP 版本花的時間更長。目前估計是因為乘法沒做優化,乘法的時間複雜度是 $O(n^2)$ 除了時間複雜度高外,針對相同的二數相乘,應該可以把重複計算的部份省下來,但我的程式沒有這麼做。這都是未來可以優化的點。之後參考 [`sysprog21/bignum`](https://github.com/sysprog21/bignum) 並搭配快速乘法演算法,來重新實作。 ![](https://i.imgur.com/GcJRUmr.png)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully