坤諦江
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    --- tags: linux2023 --- # fibdrv Note - follow [2023 年 Linux 核心設計/實作課程作業 —— fibdrv](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-a) :::info 參考資料繁多,僅用以紀錄自己瀏覽狀況 - 預期目標+費氏數列 - [ ] 費氏數列相關研讀 - [x] 介紹短片、fast doubling、Fibonacci Q-martix - [ ] [PRNG](https://en.wikipedia.org/wiki/Pseudorandom_number_generator) - [ ] [你所不知道的C語言:遞迴呼叫篇](https://hackmd.io/@sysprog/c-recursion) - [ ] [LFG](https://en.wikipedia.org/wiki/Lagged_Fibonacci_generator) - [ ] [Linux 核心的 hash table 實作](https://hackmd.io/@sysprog/linux-hashtable) - Linux 核心模組 - [x] [Part 1: Introduction](http://derekmolloy.ie/writing-a-linux-kernel-module-part-1-introduction/) - [x] [Part 2: A Character Device](http://derekmolloy.ie/writing-a-linux-kernel-module-part-2-a-character-device/) ::: ## 作業要求 - 在 GitHub 上 fork [fibdrv](https://github.com/sysprog21/fibdrv),目標是修正 Fibonacci 數計算並改善 fibdrv 核心模組的計算效率,過程中需要量化執行時間,可在 Linux 核心模組和使用者空間 (userspace) 中測量 - 原本的程式碼只能列出到 $Fibonacci(100)$ 而且部分輸出是錯誤的數值,請修改程式碼,列出後方更多數值 (注意: 檢查正確性和數值系統的使用) - 務必研讀上述 Linux 效能分析的提示 的描述,降低效能分析過程中的干擾; - 在 Linux 核心模組中,可用 ktime 系列的 API; - 在 userspace 可用 [clock_gettime](https://linux.die.net/man/2/clock_gettime) 相關 API; - 善用統計模型,除去極端數值,過程中應詳述你的手法 - 分別用 [gnuplot 製圖](https://hackmd.io/@sysprog/Skwp-alOg),分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷,單位需要用 us 或 ns (自行斟酌) - 嘗試解讀上述實驗的時間分佈,特別是隨著 Fibonacci 數列增長後,對於 Linux 核心的影響。可修改 VFS 函式,允許透過寫入 `/dev/fibonacci` 裝置檔案來變更不同的 Fibonacci 求值的實作程式碼。 ![](https://i.imgur.com/avhw95C.png) - 用 printk 固然方便,但其執行時間易受各種因素而不穩定,除了透過統計來觀察,也可改用核心的 sysfs 介面,後者可讓我們自行宣告 show 和 store 介面來提供讀寫操作,可參見 [Sample kobject implementation](https://elixir.bootlin.com/linux/latest/source/samples/kobject/kobject-example.c) (注意: 切換到對應的 Linux 核心版本)。 - 逐步縮減 Fibonacci 的執行時間,過程中要充分量化 - 嘗試研讀 [sysprog21/bignum](https://github.com/sysprog21/bignum) 的實作,理解其中的技巧並與你的實作進行效能比較,探討個中值得取鏡之處。 - 當 $Fib(n)$ 隨著 $n$ 越來越大時,由 Linux 核心傳遞字串形式的十進位就成為效能瓶頸 - 儘量降低由核心傳遞到應用程式的資料量 - 確保應用程式不需要過多的計算量才可重現 Fibonacci 數列 - BONUS: 研讀〈[Integer Encoding Algorithm 筆記](https://kkc.github.io/2021/01/30/integer-compression-note/)〉所提及的策略,在 Linux 核心模組先對數值進行編碼和壓縮,再由應用程式予以顯示 $Fib(n)$ 數值 ## 前期準備 >步驟皆 follow [文件](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-a) 首先檢查 kernel 版本 ```shell uname -r ``` ```shell 5.19.0-35-generic ``` ```shell uname -a // full information ``` ```shell Linux aaron-System-Product-Name 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux ``` 確認 `linux-headers` 套件已經正確安裝於開發環境 ```shell dpkg -L linux-headers-5.19.0-35-generic | grep "/lib/modules" ``` 預期輸出 ```shell /lib/modules /lib/modules/5.19.0-35-generic /lib/modules/5.19.0-35-generic/build ``` 檢查目前使用者身份,避免使用 `root` ```shell whoami ``` 安裝工具 ```shell sudo apt install util-linux strace gnuplot-nox ``` - [strace(1)](https://man7.org/linux/man-pages/man1/strace.1.html): 可以追蹤系統呼叫 - `util-linux` (utility linux) ```shell dpkg -l | grep util-linux ``` - `dpkg -l` 列出以安裝的包 ```shell ii util-linux 2.37.2-4ubuntu3 amd64 miscellaneous system utilities ``` - linux 的系統工具 clone [fibdrv](https://github.com/sysprog21/fibdrv) 並嘗試編譯測試 ```shell make check ``` ```shell Passed [-] f(93) fail input: 7540113804746346429 expected: 12200160415121876738 ``` 在 `f(93)` 的時候 `fail` 了 編譯時會產生 `.ko` 檔案,此為 kernel object 檔案,可以使用 [modinfo(8)](https://man7.org/linux/man-pages/man8/modinfo.8.html) 查看 kernel module 的資訊 ``` modinfo fibdrv.ko ``` 預期輸出 ``` filename: /home/aaron/linux2023/fibdrv/fibdrv.ko version: 0.1 description: Fibonacci engine driver author: National Cheng Kung University, Taiwan license: Dual MIT/GPL srcversion: 9A01E3671A116ADA9F2BB0A depends: retpoline: Y name: fibdrv vermagic: 5.19.0-35-generic SMP preempt mod_unload modversions ``` - 延伸閱讀 [EXPORT_SYMBOL用法](http://bluequiet.blogspot.com/2011/10/exportsymbolkernelkernel.html) 掛載進去 ```shell sudo insmod fibdrv.ko ``` 用 [dmesg(1)](https://man7.org/linux/man-pages/man1/dmesg.1.html) 可以看到相關資訊 ```shell $ sudo dmesg [ 3585.255612] fibdrv: loading out-of-tree module taints kernel. [ 3585.255647] fibdrv: module verification failed: signature and/or required key missing - tainting kernel ``` ```shell $ lsmod | grep fib /* Module Size Used by*/ fibdrv 16384 0 ``` - size: 16384 - use: 0 ```shell $ ls -l /dev/fibonacci crw------- 1 root root 509, 0 三 11 17:57 /dev/fibonacci ``` - `1` 是硬連結數 - `root` 為擁有者 (user) 和所屬群組 (group) - `509` 是文件大小 - 延伸閱讀 [Character device drivers](https://linux-kernel-labs.github.io/refs/heads/master/labs/device_drivers.html) **chatGPT 給的解釋** :::info 在 `ls -l` 命令的輸出中,如果一個文件的大小為 0,那麼第五欄的數字就會是 0。如果文件的大小不為 0,那麼第五欄的數字就代表該文件的大小,表示該文件佔用了多少字節的磁盤空間。 而在 `ls -l` 命令的輸出中,第一欄的第一個數字是用來表示該文件的類型的,例如普通文件、目錄、符號鏈接、設備文件等。在 Linux 系統中,設備文件是一種特殊的文件,用來表示硬體設備。設備文件的第一個數字通常是 0,表示這是一個設備文件。 因此,如果一個文件的第一個數字是 4,這表示這是一個目錄文件。如果一個文件的第一個數字是 0,這表示這是一個設備文件,其大小可能為 0,也可能不為 0,具體取決於該設備的容量。因此,同時出現 4 和 0 的情況,代表這是一個目錄文件,並且其中一些條目是設備文件。 ::: ```shell $ cat /sys/module/fibdrv/refcnt 0 ``` - 代表 [reference counting](https://en.wikipedia.org/wiki/Reference_counting) 在重新 `make check` 之前要 `sudo rmmod fibdrv` ## 什麼是費氏數列 - [ ] [費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number) ![](https://i.imgur.com/pDjDhtH.png) 舉例來說 $$ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233…… $$ - 下一個數字為前兩個數字相加 Recursive 實作 (Python) ```python def normal_recursion(n): if n < 2: return n return normal_recursion(n - 1) + normal_recursion(n - 2) ``` - 就是執行的效率很差 ![](https://i.imgur.com/RTqq7sF.png) - 做了非常多重複的運算 ### Fast doubling 計算方法 根據 [Fast doubling](https://chunminchang.github.io/blog/post/matrix-difference-equation-for-fibonacci-sequence) 可得: $$ \begin{split} &F&(2k) = F(k)[2F(k+1) - F(k)] \\ &F&(2k+1) = F(k+1)^2+F(k)^2 \end{split} $$ 且遞迴過程縮減,如下 ```graphviz strict digraph G { 1[label="F(6)"] 2[label="F(3)"] 3[label="F(4)"] 4[label="F(1)", style=filled] 5[label="F(2)", style=filled] 6[label="F(2)", style=filled] 7[label="F(3)"] 8[label="F(1)", style=filled] 9[label="F(2)", style=filled] 1 -> {2, 3} 2 -> {4, 5} 3 -> {6, 7} 7 -> {8, 9} } ``` 此時再次利用 $F(3)$ 和遞迴 $F(3)$ 時所得到的 $F(2)$ 來計算 $F(4)$ ,可以再次降低運算的次數 ```graphviz strict digraph G { 1[label="F(6)"] 2[label="F(3)"] 3[label="F(4)"] 4[label="F(1)", style=filled] 5[label="F(2)", style=filled] 6[label=" " ,color=white] 7[label=" " ,color=white] {rank = same; 2;3;} {rank = same; 4;5;} 1 -> {2, 3} 2 -> {4, 5} 2 -> 3 [style=dashed; arrowhead=vee] 5 -> 3 [style=dashed; arrowhead=vee] 3 -> {6, 7} [color=white] } ``` ### 硬體加速 $F(n)$ 的手法 透過 [clz/ctz](https://en.wikipedia.org/wiki/Find_first_set) 搭配 fast doubling 1. 省略 $F(0)$,直接從 $F(1)$ 開始 2. clz/[ffs](https://man7.org/linux/man-pages/man3/ffs.3.html) - `a = 0b1000`, `ffs(a)=4` - `a = 0b1010`, `ffs(a)=2` - `a = 0b1011`, `ffs(a)=1` - 遇到 `0` -> fast doubling,求 $F(2n)$ 和 $F(2n+1)$ - 遇到 `1` -> fast doubling,求 $F(2n)$ 和 $F(2n+1)$,相加後求 $F(2n+2)$ **求解 $F(11)$** (11~10~ = 1011~2~) | | 1 | 0 |1|1| result | -------- | -------- | -------- |-------- |-------- |-------- | | F(n)| F(0*2+1)| F(1*2) |F(2*2+1)|F(5*2+1) |F(11) | a |1 |1|5|89|89 | b |1|2|8|144 - 左邊第一欄,遇到 `1`,計算 $F(2n)=F(2*0)=0$ 以及 $F(2n+1)=F(1)=1 = a$、相加後求得 $b = F(2*0+2)=F(2)=1$ - 左邊第二欄,遇到 `0`,計算 $F(2n)=F(2*1)=1=a$ 以及 $F(2n+1)=F(3)=2=b$ - 左邊第三欄,遇到 `1`,計算 $F(2n)=F(4)=3$ 以及 $F(2n+1) = 5 = a$、相加後求得 $F(2n+2)=8=b$ - 左邊第四欄,遇到 `1`,計算 $F(2n)=F(10)=F(5)[2F(5+1)-F(5)]=5[2*8-5]=55$,計算 $F(2n+1)=F(11)=F(6)^2+F(5)^2=64+25=89=a$,相加得 $F(2n+2)=F(12)=55+89=144$ ### Fibonacci 數的應用 [Pseudorandom number generator](https://en.wikipedia.org/wiki/Pseudorandom_number_generator) ### 加速 Fibonacci 運算 範例: 計算 $F(6)$ ```graphviz strict digraph G { 1[label="F(6)"] 2[label="F(3)", color=red] 3[label="F(4)"] 4[label="F(2)", style=filled] 5[label="F(1)", style=filled] 6[label="F(3)", color=red] 7[label="F(2)", style=filled] 8[label="F(2)", style=filled] 9[label="F(1)", style=filled] {rank = same; 2;3;} {rank = same; 4;5;6;7} {rank = same; 8;9} 1 -> {2, 3} 2 -> {4, 5} 3 -> {6, 7} 6 -> {8, 9}; } ``` - 根據 fast doubling 計算,但是還是有重複計算的部份,當 `target` 數值越大,重複的計算效能衝擊越顯著。 ### Bottom-up Fast Doubling >[Top-down vs Bottom-up approach in Dynamic Programming](https://www.enjoyalgorithms.com/blog/top-down-memoization-vs-bottom-up-tabulation) 以 87~10~ 為例: ```pyhon 87 = 1010111 (87 >> i+1) i = 0 : 43 = (1010111 - 1) >> 1 = 101011 i = 1 : 21 = ( 101011 - 1) >> 1 = 10101 i = 2 : 10 = ( 10101 - 1) >> 1 = 1010 i = 3 : 5 = ( 1010 - 0) >> 1 = 101 i = 4 : 2 = ( 101 - 1) >> 1 = 10 i = 5 : 1 = ( 10 - 0) >> 1 = 1 i = 6 : 0 = ( 1 - 1) >> 1 = 0 ^ 87 的第 i 個位元 ``` 移項並反過來看 ```python (87 >> i+1) i = 6 : 1 = 0 << 1 + 1 = 1 i = 5 : 2 = 1 << 1 + 0 = 10 i = 4 : 5 = 10 << 1 + 1 = 101 i = 3 : 10 = 101 << 1 + 0 = 1010 i = 2 : 21 = 1010 << 1 + 1 = 10101 i = 1 : 43 = 10101 << 1 + 1 = 101011 i = 0 : 87 = 101011 << 1 + 1 = 1010111 ^ 87 的第 i 個位元 ``` 這邊感謝 [Eroiko 的筆記](https://hackmd.io/@Eroiko/fibdrv-impl) 幫助很大。 ### $F(92)$ 以後的數值錯誤原因 測試結果如同作業說明及二補數計算,在返回值為 `long long` 的情況下 $F(93)$ 會造成 overflow ```shell F(93) = -6246583658587674878 ``` 二補數計算 $$ \begin{split} if(A+B) &>& T_{Max} (overflow)\\ result &=& A+B-2^{64}\\ &=& F(91)+F(92)-2^{64}\\ &=& -6246583658587674878 \end{split} $$ ## 初步支援大數運算 引入 `bn` 結構體,計算 92 項以後的費氏數列 ```c /* number[size - 1] = msb, number[0] = lsb */ typedef struct _bn { unsigned int *number; unsigned int size; int sign; } bn; ``` - `number` - 指向儲存的數值,之後以 **array** 的形式來取用 - `size` - 配置的記憶體大小,單位為 `sizeof(usigned int)` - `sign` - 0 為 正數,1 為負數 由於大數沒辦法直接以數值的形式列出,故改用**字串**來做呈現,轉換部份利用 **ASCII** 特性並搭配 **fast doubling** 的邏輯來**組合**出 10 進位 Trace 其運作流程 ```c static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { bn *fib = bn_alloc(1); bn_fib_fdoubling(fib, *offset); // bn_fib(fib, *offset); char *p = bn_to_string(fib); size_t len = strlen(p) + 1; size_t left = copy_to_user(buf, p, len); // printk(KERN_DEBUG "fib(%d): %s\n", (int) *offset, p); bn_free(fib); kfree(p); return left; // return number of bytes that could not be copied } ``` 首先宣告一指標 `fib` 指向 `bn` 結構體,並分配空間 `bn_alloc` ```c /* * alloc a bn structure with the given size * the value is initialized to +0 */ bn *bn_alloc(size_t size) { bn *new = (bn *) kmalloc(sizeof(bn), GFP_KERNEL); new->number = kmalloc(sizeof(int) * size, GFP_KERNEL); memset(new->number, 0, sizeof(int) * size); new->size = size; new->sign = 0; return new; } ``` - `bn_alloc` 會使用 `kmalloc` 分配 `bn` 大小的空間,`kmalloc` 和 `vmalloc` 都用以分配核心的記憶體空間,但 `kmalloc` 會分配**連續的虛擬以及實體地址**,但 `vmalloc` 只保證**連續的虛擬地址**,**不保證連續的實體地址**, kernel 中常用 `kmalloc` 因為使用 `vamlloc` 需要進行映射 (mapping) 會讓效能變差 - 參考 [Memory Management APIs](https://www.kernel.org/doc/html/latest/core-api/mm-api.html) (useful GFP flag combinations): - `GFP_KERNEL` is typical for kernel-internal allocations. The caller requires `ZONE_NORMAL` or a lower zone for direct access but can direct reclaim. - 其中 Linux 會將 kernel 地址分成三部份 (`ZONE_DMA`、`ZONE_NORMAL` 和 `ZONE_HIGHMEM`) - 之後便將其結構體中的元素做初始化 接下來將分配好空間的 `bn` 結構體 `fib` 以及要計算的 $n^{th}$ 費式數透過 `bn_fib_fdoubling` 計算 `bn_fib_fdoubling` ```c /* * calc n-th Fibonacci number and save into dest * using fast doubling algorithm */ void bn_fib_fdoubling(bn *dest, unsigned int n) { bn_resize(dest, 1); if (n < 2) { // Fib(0) = 0, Fib(1) = 1 dest->number[0] = n; return; } bn *f1 = dest; /* F(k) */ bn *f2 = bn_alloc(1); /* F(k+1) */ f1->number[0] = 0; f2->number[0] = 1; bn *k1 = bn_alloc(1); bn *k2 = bn_alloc(1); /* walk through the digit of n */ for (unsigned int i = 1U << 31; i; i >>= 1) { /* F(2k) = F(k) * [ 2 * F(k+1) – F(k) ] */ bn_cpy(k1, f2); bn_lshift(k1, 1); bn_sub(k1, f1, k1); bn_mult(k1, f1, k1); /* F(2k+1) = F(k)^2 + F(k+1)^2 */ bn_mult(f1, f1, f1); bn_mult(f2, f2, f2); bn_cpy(k2, f1); bn_add(k2, f2, k2); if (n & i) { bn_cpy(f1, k2); bn_cpy(f2, k1); bn_add(f2, k2, f2); } else { bn_cpy(f1, k1); bn_cpy(f2, k2); } } // return f[0] bn_free(f2); bn_free(k1); bn_free(k2); } ``` - 這裡的實作方法和 iterattion 版本的 fast doubling 一樣,只是在四則運算都分別利用其他函式處理,以 `bn_add` 為例 `bn_add` (`c = a + b`) ```c /* * c = a + b * Note: work for c == a or c == b */ void bn_add(const bn *a, const bn *b, bn *c) { if (a->sign == b->sign) { // both positive or negative bn_do_add(a, b, c); c->sign = a->sign; } else { // different sign if (a->sign) // let a > 0, b < 0 SWAP(a, b); int cmp = bn_cmp(a, b); if (cmp > 0) { /* |a| > |b| and b < 0, hence c = a - |b| */ bn_do_sub(a, b, c); c->sign = 0; } else if (cmp < 0) { /* |a| < |b| and b < 0, hence c = -(|b| - |a|) */ bn_do_sub(b, a, c); c->sign = 1; } else { /* |a| == |b| */ bn_resize(c, 1); c->number[0] = 0; c->sign = 0; } } } ``` - 首先透過結構體中定義的 `sign` 判斷 `a, b` 正負號是否相等,如果相等相加過後輸出的正負號會相同,交由 `bn_do_add` 負責運算,且設定 `c` 符號相同 - 如果 `a, b` 符號不同,則透過比較正負號使 `a` 大於 `b`,再來判斷絕對值數值的大小,這會影響相加後結果的正負,例如 - 如果 `a = -100, b = 10`,做 `swap` 使得 `a = 10, b = -100`,透過比較絕對值後進行運算得到結果為**負的** `|b| - |a|` - 如果 `a = 100, b = -10`,不會做 `swap` 且運算的結果為**正的** `|a| - |b|` - 如果 `a, b` 不同號, 且 `|a| = |b|` ,則輸出為 `0` 實作判斷可以如下表格所示 (`a, b` 不同號,且 `a` > `b`) | Condition | $\|a\|>\|b\|$ | $\|a\| < \|b\|$ | | ---------- | ------------- | --------------- | | Output `c` | $c = +(\|a\|-\|b\|)$ | $c = -(\|b\|-\|a\|)$ | `bn_do_add` ```c /* |c| = |a| + |b| */ static void bn_do_add(const bn *a, const bn *b, bn *c) { // max digits = max(sizeof(a) + sizeof(b)) + 1 int d = MAX(bn_msb(a), bn_msb(b)) + 1; d = DIV_ROUNDUP(d, 32) + !d; bn_resize(c, d); // round up, min size = 1 unsigned long long int carry = 0; for (int i = 0; i < c->size; i++) { unsigned int tmp1 = (i < a->size) ? a->number[i] : 0; unsigned int tmp2 = (i < b->size) ? b->number[i] : 0; carry += (unsigned long long int) tmp1 + tmp2; c->number[i] = carry; carry >>= 32; } if (!c->number[c->size - 1] && c->size > 1) bn_resize(c, c->size - 1); } ``` - 其中 `bn_msb` 會回傳傳入的 `bn` 結構體中所有 array 儲存的資料的 leading zero 有幾個,取較多的那個 + 1 就會是相加後最多有幾個 digits (進位) - `bn_resize` 會將最終結果 `c` resize 成 `d` 的大小,代表著需要幾個 `unsigned int *number` 去儲存相加後的結果 假設結構體為 `uint8_t` 為例,若 `a = 0b11111111 = 255`,`b = 0b10000001 = 129` 我們想要獲得的答案為 `0b 1 10000000 = 384`,但只有 8 bit 的話無法表達,所以會輸出會用兩個 `uint8_t` 數值的陣列表示 ```graphviz digraph g{ rankdir=LR node[shape=record, height=.1]; labela[label="a"][shape=plaintext] labelb[label="b"][shape=plaintext] a[label="{1|1|1|1|1|1|1|1}"] b[label="{1|0|0|0|0|0|0|1}"] labela->a labelb->b labelc[label="c"][shape=plaintext] c2[label="{0|0|0|0|0|0|0|1}"] c1[label="{1|0|0|0|0|0|0|0}"] labelc->c1 labelc->c2 } ``` `bn_do_sub` ```c /* * |c| = |a| - |b| * Note: |a| > |b| must be true */ static void bn_do_sub(const bn *a, const bn *b, bn *c) { // max digits = max(sizeof(a) + sizeof(b)) int d = MAX(a->size, b->size); bn_resize(c, d); long long int carry = 0; for (int i = 0; i < c->size; i++) { unsigned int tmp1 = (i < a->size) ? a->number[i] : 0; unsigned int tmp2 = (i < b->size) ? b->number[i] : 0; carry = (long long int) tmp1 - tmp2 - carry; if (carry < 0) { c->number[i] = carry + (1LL << 32); carry = 1; } else { c->number[i] = carry; carry = 0; } } d = bn_clz(c) / 32; if (d == c->size) --d; bn_resize(c, c->size - d); } ``` 同樣的,在 `bn_do_sub` 中,輸出 `c` 最大的 digit 數為 兩個 digit 數高的那個 (不發生借位),計算 `carry` 值,若兩數值相減後 msb 還是 `1` 的話,代表發生借位,會將 `c->number[i]` 的值設定為 `carry + (1LL << 32)` ,這邊的 `(1LL << 32)` 看起來是要把第 32 bit (含) 以上的都設定為 0 (但實測後不做這件事情可以正確計算) 這裡的判斷是否借位可改寫成 ```diff - carry = (long long int) tmp1 - tmp2 - carry; + tmp1 - tmp2 - carry < 0 ? carry = 1 : 0; + c->number[i] = carry; - if (carry < 0) { - c->number[i] = carry + (1LL << 32); - carry = 1; - } else { - c->number[i] = carry; - carry = 0; - } ``` 一樣可以正確計算 Fibonacci number `bn_sub` ```c /* * c = a - b * Note: work for c == a or c == b */ void bn_sub(const bn *a, const bn *b, bn *c) { /* xor the sign bit of b and let bn_add handle it */ bn tmp = *b; tmp.sign ^= 1; // a - b = a + (-b) bn_add(a, &tmp, c); } ``` 這裡的 `bn_sub` 可以偷過 `bn_add` 來實作,因 `bn` 結構為 `unsigned int` 的 number 陣列,由另外的 `sign` 元素決定其正負,若要執行 `bn_sub`,可以透過反轉 b 的 `sign` 元素再進行 `bn_add` 即可 `bn_mult` ```c /* * c = a x b * Note: work for c == a or c == b * using the simple quadratic-time algorithm (long multiplication) */ void bn_mult(const bn *a, const bn *b, bn *c) { // max digits = sizeof(a) + sizeof(b)) int d = bn_msb(a) + bn_msb(b); d = DIV_ROUNDUP(d, 32) + !d; // round up, min size = 1 bn *tmp; /* make it work properly when c == a or c == b */ if (c == a || c == b) { tmp = c; // save c c = bn_alloc(d); } else { tmp = NULL; bn_resize(c, d); } for (int i = 0; i < a->size; i++) { for (int j = 0; j < b->size; j++) { unsigned long long int carry = 0; carry = (unsigned long long int) a->number[i] * b->number[j]; bn_mult_add(c, i + j, carry); } } c->sign = a->sign ^ b->sign; if (tmp) { bn_cpy(tmp, c); // restore c bn_free(c); } } ``` 實作一般直式長乘法,用輔助函式 `bn_mult_add` 負責將每一行的計算結果疊加上去,假設 `number` 為 4 bit,`a=0b0111 0111 (size=2), b=0b0101 0101 (size=2)` 計算過程如下 ![](https://i.imgur.com/T46FXy3.jpg) ![](https://i.imgur.com/M9FTMs7.jpg) ![](https://i.imgur.com/L2rUJRC.jpg) `bn_lshift` ```c /* left bit shift on bn (maximun shift 31) */ void bn_lshift(bn *src, size_t shift) { size_t z = bn_clz(src); shift %= 32; // only handle shift within 32 bits atm if (!shift) return; if (shift > z) bn_resize(src, src->size + 1); /* bit shift */ for (int i = src->size - 1; i > 0; i--) src->number[i] = src->number[i] << shift | src->number[i - 1] >> (32 - shift); src->number[0] <<= shift; } ``` 此處實作先僅限於移動 32 bit 以內,函式內用一個 bitwise or `|` 來處理跨越 `number` 之間的 bit shift `bn_to_string` ```c char *bn_to_string(const bn *src) { // log10(x) = log2(x) / log2(10) ~= log2(x) / 3.322 size_t len = (8 * sizeof(int) * src->size) / 3 + 2 + src->sign; char *s = kmalloc(len, GFP_KERNEL); char *p = s; memset(s, '0', len - 1); s[len - 1] = '\0'; /* src.number[0] contains least significant bits */ for (int i = src->size - 1; i >= 0; i--) { /* walk through every bit of bn */ for (unsigned int d = 1U << 31; d; d >>= 1) { /* binary -> decimal string */ int carry = !!(d & src->number[i]); for (int j = len - 2; j >= 0; j--) { s[j] += s[j] - '0' + carry; carry = (s[j] > '9'); if (carry) s[j] -= 10; } } } // skip leading zero while (p[0] == '0' && p[1] != '\0') { p++; } if (src->sign) *(--p) = '-'; memmove(s, p, strlen(p) + 1); return s; } ``` 上述實作中關鍵的程式碼為 ```c /* binary -> decimal string */ int carry = !!(d & src->number[i]); for (int j = len - 2; j >= 0; j--) { s[j] += s[j] - '0' + carry; carry = (s[j] > '9'); if (carry) s[j] -= 10; } ``` - 由高位 (MSB) 開始檢測 binary 中每個 bit - 若為 `1`,則輸出**乘二加一** - 若為 `0`,則輸出**乘二** ### 計算 F~93~ (包含) 之後的 Fibonacci 數 參考 [AdrianHuang/fibdrv](https://github.com/AdrianHuang/fibdrv/tree/big_number) 實作程式碼 ### bignum 以數字陣列來儲存(同時也可以指定陣列大小),以 `uint8_t` 為例,其數值範圍在 0\~255,若要表示 300 則以數字陣列 `digit` 來表示,假設陣列大小為 3,則 300 表示為 $$ \begin{split} 300&:&&00000000 \space &00000001 \space &00101100 \\ digit&:&&digit[2] &digit[1]&digit[0]\\ value&:& &0&1&44& \end{split} $$ 然後在透過 [format.c](https://github.com/sysprog21/bignum/blob/master/format.c) 格式轉換為十進位 可以透過以下兩種演算法實現大數的乘法 - **Karatsuba 演算法** - **Schönhage–Strassen Algorithm** ## 改善大數運算 ### 改善方案 1: 改寫 `bn_fib_fdoubling` 直接引入 [Reference](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-d#%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-1-%E6%94%B9%E5%AF%AB-bn_fib_fdoubling) 竟然報錯 ```shell sudo ./client > out *** stack smashing detected ***: terminated Aborted ``` 查看 `out` 檔案發現計算的值有很大的問題 ```shell Reading from /dev/fibonacci at offset 0, returned the sequence 0. Reading from /dev/fibonacci at offset 1, returned the sequence 1. Reading from /dev/fibonacci at offset 2, returned the sequence 10. Reading from /dev/fibonacci at offset 3, returned the sequence 24. Reading from /dev/fibonacci at offset 4, returned the sequence 392. Reading from /dev/fibonacci at offset 5, returned the sequence 724. ``` 在**尚未優化的版本中**, bignum 版本和正常版本的運算可以直接對應 **不支援大數的 fast doubling iteration** ```c static long long fib_sequence_fd_iter(long long k) { // Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, ... if (k <= 2) return !!k; ... for (uint64_t i = count, f2k0, f2k1; i-- > 0;) { /* F(2k) = F(k)[2F(k+1) - F(k)] * F(2k+1) = F(k+1)^2 + F(k)^2 */ f2k0 = fk0 * ((fk1 << 1) - fk0); f2k1 = fk1 * fk1 + fk0 * fk0; if (k & (1UL << i)) { fk0 = f2k1; fk1 = f2k0 + f2k1; } else { fk0 = f2k0; fk1 = f2k1; } } return fk0; } ``` **支援大數的 fast doubling iteration** ```c= void bn_fib_fdoubling(bn *dest, unsigned int n) { bn_resize(dest, 1); if (n < 2) { // Fib(0) = 0, Fib(1) = 1 dest->number[0] = n; return; } ... /* walk through the digit of n */ for (unsigned int i = 1U << 31; i; i >>= 1) { /* F(2k) = F(k) * [ 2 * F(k+1) – F(k) ] */ bn_cpy(k1, f2); bn_lshift(k1, 1); // k1 = 2 * F(k+1) bn_sub(k1, f1, k1); // k1 = 2 * F(k+1) - F(k) bn_mult(k1, f1, k1); // k1 = F(k) * (2F(k+1) - F(k)) /* F(2k+1) = F(k)^2 + F(k+1)^2 */ bn_mult(f1, f1, f1); // f1 = F(k)^2 bn_mult(f2, f2, f2); // f2 = F(k+1)^2 bn_cpy(k2, f1); // k2 = f1 = F(k)^2 bn_add(k2, f2, k2); // k2 = F(k+1)^2 + F(k)^2 if (n & i) { bn_cpy(f1, k2); // f1 = F(k+1)^2 + F(k)^2 bn_cpy(f2, k1); // f2 = F(k) * (2F(k+1) - F(k)) bn_add(f2, k2, f2); } else { bn_cpy(f1, k1); // f1 = F(k) * (2F(k+1) - F(k)) bn_cpy(f2, k2); // f2 = F(k+1)^2 + F(k)^2 } } ... } ``` - 注意根據函式引數, 我們需要的計算結果為 `f1` 也就是傳入的 `dest` ```c void bn_fib_fdoubling_v1(bn *dest, unsigned int n) { ... for (unsigned int i = 1U << (31 - __builtin_clz(n)); i; i >>= 1) { /* F(2k) = F(k) * [ 2 * F(k+1) – F(k) ] */ /* F(2k+1) = F(k)^2 + F(k+1)^2 */ bn_lshift_2(f2, 1, k1);// k1 = 2 * F(k+1) bn_sub(k1, f1, k1); // k1 = 2 * F(k+1) – F(k) bn_mult(k1, f1, k2); // k2 = k1 * f1 = F(2k) bn_mult(f1, f1, k1); // k1 = F(k)^2 bn_swap(f1, k2); // f1 <-> k2, f1 = F(2k) now bn_mult(f2, f2, k2); // k2 = F(k+1)^2 bn_add(k1, k2, f2); // f2 = f1^2 + f2^2 = F(2k+1) now if (n & i) { bn_swap(f1, f2); // f1 = F(2k+1) bn_add(f1, f2, f2); // f2 = F(2k+2) } } ... } ``` 而此改善方案旨在降低使用 `malloc` 及 `memcpy` 的次數,首先新的 `bn_lshift_2` 可以將第一個引數 `src` 左移第二個引數 `shift` 後儲存至第三個引數 `dest` `bn_lshift_2` 函式修改 >[wanghanchi](https://hackmd.io/@wanghanchi/linux2023-fibdrv#%E5%BC%95%E5%85%A5%E6%9B%B4%E4%B8%80%E6%AD%A5%E7%9A%84%E5%84%AA%E5%8C%96) ```diff void bn_lshift_2(const bn *src, size_t shift, bn *dest) { size_t z = bn_clz(src); shift %= 32; // only handle shift within 32 bits atm if (!shift) return; if (shift > z) { bn_resize(dest, src->size + 1); + dest->number[src->size] = src->number[src->size - 1] >> (32 - shift); } else { bn_resize(dest, src->size); } /* bit shift */ for (int i = src->size - 1; i > 0; i--) dest->number[i] = src->number[i] << shift | src->number[i - 1] >> (32 - shift); dest->number[0] = src->number[0] << shift; } ``` 結果在測試時還是錯誤 ```shell make check FIBMODE=3 ``` 檢查過後發現在 `bn_mult` 中當資料來源與目的**沒有**重疊時,並沒有將其 `number` 初始化,以至於產生非預期的取值,之前沒有發現是因為在 bignum 的 fast-doubing 版本中使用 `bn_mult` 的案例都屬於 `(c == a || c == b)` ```diff void bn_mult(const bn *a, const bn *b, bn *c) { // max digits = sizeof(a) + sizeof(b)) int d = bn_msb(a) + bn_msb(b); d = DIV_ROUNDUP(d, 32) + !d; // round up, min size = 1 bn *tmp; /* make it work properly when c == a or c == b */ if (c == a || c == b) { tmp = c; // save c c = bn_alloc(d); } else { tmp = NULL; + for (int i = 0; i < c->size; i++) + c->number[i] = 0; bn_resize(c, d); } for (int i = 0; i < a->size; i++) { for (int j = 0; j < b->size; j++) { unsigned long long int carry = 0; carry = (unsigned long long int) a->number[i] * b->number[j]; bn_mult_add(c, i + j, carry); } } c->sign = a->sign ^ b->sign; if (tmp) { bn_swap(tmp, c); // restore c bn_free(c); } } ``` 修正完成後此節改善方案可通過測試。 ### 改善方案 2: 運用 Q-Matrix [參考網站](https://chunminchang.github.io/blog/post/matrix-difference-equation-for-fibonacci-sequence) 所提供的 fast doubling 方法如下 $$ \begin{split} \begin{bmatrix} F(2n+1) & F(2n)\\ F(2n) & F(2n-1) \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{2n} \end{split} $$ 左右欄對調 $$ \begin{split} \begin{bmatrix} F(2n) & F(2n+1)\\ F(2n-1) & F(2n) \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^{2n} \end{split} $$ 可以將第一欄取出為 $$ \begin{split} \begin{bmatrix} F(2n)\\ F(2n-1) \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^{2n} \begin{bmatrix} 1 \\ 0 \end{bmatrix}\\ &= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^{2n} \begin{bmatrix} F(1) \\ F(0) \end{bmatrix} \end{split} $$ 列對調 $$ \begin{split} \begin{bmatrix} F(2n-1)\\ F(2n) \end{bmatrix} &= \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^{2n} \begin{bmatrix} 0 \\ 1 \end{bmatrix}\\ &= \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^{2n} \begin{bmatrix} F(0) \\ F(1) \end{bmatrix} \end{split} $$ 觀察 [sysprog21/bignum](https://github.com/sysprog21/bignum) 中費氏數列的實作函式 [fibonacci](https://github.com/sysprog21/bignum/blob/master/fibonacci.c),會發現雖看似採用 fast doubling,但實際是 Q-matrix 這樣的變形,推導如下: $$ \begin{split} \begin{bmatrix} F(2n-1) \\ F(2n) \end{bmatrix} &= \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}^{2n} \begin{bmatrix} F(0) \\ F(1) \end{bmatrix}\\ \\ &= \begin{bmatrix} F(n-1) & F(n) \\ F(n) & F(n+1) \end{bmatrix} \begin{bmatrix} F(n-1) & F(n) \\ F(n) & F(n+1) \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}\\ \\ &= \begin{bmatrix} F(n)^2 + F(n-1)^2\\ F(n)F(n) + 2F(n)F(n-1) \end{bmatrix} \end{split} $$ 整理後可得 $$ \begin{split} F(2k-1) &= F(k)^2+F(k-1)^2 \\ F(2k) &= F(k)[2F(k-1) + F(k)] \end{split} $$ 使用上述公式改寫 `bn_fib_fdoubling`,搭配使用 `clz` ,後者可讓 n 值越小的時候,減少越多次迴圈運算,從而達成加速。 ### 改善方案 3: 引入 memory pool 原本在實作中在計算前會使用 `bn_resize` 來確保有足夠的空間儲存計算結果 (number 的數量),而現在引入 `capacity` 的方式來管理 `bn` 的可用大小,減少 `bn_resize` 呼叫 `realloc` 的次數。 ```c static int bn_resize(bn *src, size_t size) { if (!src) return -1; if (size == src->size) return 0; if (size == 0) // prevent krealloc(0) = kfree, which will cause problem return bn_free(src); if (size > src->capacity) { /* need to allocate larger capacity */ src->capacity = (size + (ALLOC_CHUNK_SIZE - 1)) & ~(ALLOC_CHUNK_SIZE - 1); // ceil to 4*n src->number = krealloc(src->number, sizeof(int) * src->capacity, GFP_KERNEL); } if (!src->number) { // realloc fails return -1; } if (size > src->size) { /* memset(src, 0, size) */ for (int i = src->size; i < size; i++) src->number[i] = 0; } src->size = size; return 0; } ``` - 只有當 `size` 超過 `capacity` 時才會 `realloc`,並以 4 為單位 - 計算仍以 `size` 作為範圍,不必計算多餘的空間 - trim 時只要縮小 size 就好,不需要實際 `realloc` 來縮小空間 ## [Part 1: Introduction](http://derekmolloy.ie/writing-a-linux-kernel-module-part-1-introduction/) 閱讀筆記 >This article is focused on the system configuration, tools and code required to build and deploy a “Hello World!” kernel module. ### 什麼是 Kernel Module? Loadable kernel module (LKM) 是一個在 Linux kernel run time 下新增、移除程式的機制。這樣使得 kernel 無須知道硬體是如何運作的便可以使 kernel 和 硬體進行通訊 (ideal for device drivers!) - LKM 的替代方案就是將每個驅動都直接灌到 Linux Kernel 中 如果沒有這個東西的話, Linux kernel 將會非常的龐大,且要更新 hardware driver 的話每次都要重新建構 kernel。 LKM 的缺點就是每個 device 的 driver 都需要被 maintain LKMs 是在 runtime 被加載的,但並不是在 user space 運行。 Kernel modules 在 kernel space 運行,Application 在 user space 運行 ![](https://i.imgur.com/SFpAlJU.png) - kernel space 和 user space 都有自己**唯一且不重疊的記憶體位置**,確保應用程式不管在任何硬體平台上在 user space 都有 consistent view。 ### 為什麼要寫 kernel module? 在嵌入式 Linux 中利用 `sysfs` 以及低階 (low-level) 檔案操作來與電子電路進行互動,這樣的方法非常沒有效率。但是,這樣的方法在各種應用程式上已經足夠 (原文提供證明)。 而替代方案就是使用支援中斷 (interrupt) 的 kernel module,但 kernel code 是非常編寫和除錯的。 :::warning 在寫測試 LKMs 時是非常容易 crash system 的,是有可能損壞檔案系統的,若有需要就要備份 ::: ### The Module Code 一般典型的電腦程式在 runtime life cycle 非常直觀。 loader 分配記憶體給程式,然後載入任何需要的 libraries。 指令執行的起始點通常東在 `main()`,然後接著執行、回報錯誤、例外狀況,分配動態記憶體,然後最終運行完成。在程式離開時,作業系統會將 memroy leaks 以及沒有 free 乾淨的 memory 丟到 pool 中。 **但 kernel module 沒有這些東西**,且沒有 `main()` function! 以下是一些主要區別: - **不會按照順序執行**: kernel module 是用自己的 initialization function 來註冊自己 handle request,該函數可以運行然後中止。它可以 handle 的類型會在 module code 中定義。這和 GUI 中的 [event-driven programming model](https://en.wikipedia.org/wiki/Event-driven_programming) 非常類似。 - **沒有自動清理 (automatic cleanup)**: 分配給 module 的任何資源 (resources) 都必須在卸載 (unloaded) module 的時候釋放掉 (released) ,否則在重開機之前這些資源**不可用** - **沒有 `printf()` 函式**: kernel code 無法訪問 (access) user space 的 libraries。kernel module 在 kernel space 生成及執行,**有自己的 memory address space**, kernel space 和 user space 的界面 (interface) 被明確定義,但確實有一個 `printk()` 函數可以輸出 information,且可以在 user space 查看 - **可以被中斷 (interrupt)**: kernel module 可以同時被多個不同的程式 (programs) 或 行程 (process) 使用,須仔細的建構以確保在被中斷時具有一致且有效的行為。就算是單核心的 processor ,也還是要考慮到多行程的問題。 - **具有更高級別的權限**: 一般來說,分配給 kernel module 的 cpu cycle 會比 user space 的 program 來的多,但是就必須要注意 kernel module 不要對整體效能產生不利的影響。 - **不支援浮點數**: kernel code 使用 traps 將系統從整數模式轉換為浮點數模式,但是在 kernel space 下難以執行這些 traps,替代方案是手動保存和恢復浮點數運算,但是最好還是留給 user space code 來處理。 - Reference [Linux 核心的浮點數運算](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-b#-Linux-%E6%A0%B8%E5%BF%83%E7%9A%84%E6%B5%AE%E9%BB%9E%E6%95%B8%E9%81%8B%E7%AE%97) ## [Part 2: A Character Device](http://derekmolloy.ie/writing-a-linux-kernel-module-part-2-a-character-device/) 閱讀筆記 ### Character Device Drivers ChatGPT 給的解釋 :::info Character Device 是指一種在 Unix/Linux 系統中的設備類型,該設備對外提供了一個字符(Character)流的接口,用於與其他軟件或硬件進行通信。一些例子包括終端設備(如 tty)、滑鼠、鍵盤、語音輸入、音訊裝置等。 Character Device 與另一種設備類型 Block Device 不同。Block Device 是指提供塊(Block)訪問方式的設備,例如硬碟驅動器,其將數據劃分為固定大小的塊(通常為512位元組或4KB),用於高效地存儲和讀取數據。 在 Linux 系統中,Character Device 和 Block Device 均被視為一種特殊的檔案,可以使用文件描述符(File Descriptor)對其進行讀寫操作。Character Device 提供了一種簡單的、無緩衝的 I/O 模型,允許使用者以字節為單位進行讀寫操作。相較之下,Block Device 的 I/O 模型則更複雜,需要進行塊緩存、讀寫計劃等操作,以達到更高的性能。 ::: Character decive 一般在 user application 收送資料,就像是 pipes 或 serial ports,立即讀寫 character-by-character stream 中的位元組資料。為典型的驅動軟體提供框架,連接 serial communcations、video capture、audio devices,其主要的替代品是 [block device](https://www.codingame.com/playgrounds/2135/linux-filesystems-101---block-devices/about-block-devices), block device 的行為和常規的文件相似,允許透過讀取、寫入、查找等等操作來查看或操作 buffered array of cached data 這兩種 device types 都可以被 attached 到 file system tree 的 device file 來 access 本文提供了一個運行在 linux kernel space 下,且可以讓 Linux user-space program 和 loadable kernel module (LKM) 之間 pass information 的 character driver,本例中, C user-space application 發送一段 string 給 LKM,然後 LKM 會回傳其收到的字節數。接著解釋為何要解決同步的問題 (本文透過 mutex lock 解決) ### Major and Minor Numbers Device drivers 有一個 major 和 minor 號碼, major number 是 kernel 在當這個 device 被 accessed 時用來識別正確的 device driver,而 minor number 取決於 device,並在驅動程式內部處理 ```shell :/dev$ ls -l |grep i2c crw------- 1 root root 89, 0 三 17 19:19 i2c-0 ``` 如果是 character deviceds ,首欄字母為 `c`,如果是 block device,首欄字母為 `b`,接下來就是常見的 access permission, owner, grops. 可以手動建立 block 或 character device,例如 `sudo mknod /dev/test c 92 1`,但是這個方法必須確認 `92` 這個數字沒有被使用 ### The File Operation Data Structure >`file_operations` 的 data structure 定義於 [`/linux/fs.h`](https://github.com/torvalds/linux/blob/master/include/linux/fs.h),內部結構包含 function pointer 並允許設計人員定義以下操作。 ```c struct file_operations { struct module *owner; // Pointer to the LKM that owns the structure loff_t (*llseek) (struct file *, loff_t, int); // Change current read/write position in a file ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); // Used to retrieve data from the device ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); // Used to send data to the device ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t); // Asynchronous read ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t); // Asynchronous write ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); // possibly asynchronous read ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); // possibly asynchronous write int (*iterate) (struct file *, struct dir_context *); // called when VFS needs to read the directory contents unsigned int (*poll) (struct file *, struct poll_table_struct *); // Does a read or write block? long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long); // Called by the ioctl system call long (*compat_ioctl) (struct file *, unsigned int, unsigned long); // Called by the ioctl system call int (*mmap) (struct file *, struct vm_area_struct *); // Called by mmap system call int (*mremap)(struct file *, struct vm_area_struct *); // Called by memory remap system call int (*open) (struct inode *, struct file *); // first operation performed on a device file int (*flush) (struct file *, fl_owner_t id); // called when a process closes its copy of the descriptor int (*release) (struct inode *, struct file *); // called when a file structure is being released int (*fsync) (struct file *, loff_t, loff_t, int datasync); // notify device of change in its FASYNC flag int (*aio_fsync) (struct kiocb *, int datasync); // synchronous notify device of change in its FASYNC flag int (*fasync) (int, struct file *, int); // asynchronous notify device of change in its FASYNC flag int (*lock) (struct file *, int, struct file_lock *); // used to implement file locking … } __randomize_layout; ``` - 結尾的 `__randomize_layout` 標示可參考 [Linux Kernel之randomized layout](http://xuxinting.cn/2020/12/20/2020-12-20-kernel-randomize-layout/),其目的在於增強 kernel 的安全性 這個 driver 提供了 `read`, `write`, `open`, `release` 等 system call file operation,如果沒有實作的話,那麼這些 function pointer 會指向 `NULL` ### The Device Driver Source Code 本節[原文](http://derekmolloy.ie/writing-a-linux-kernel-module-part-2-a-character-device/)以 BeagleBone 做為舉例,但其架構可從 [fibdrv](https://github.com/sysprog21/fibdrv) 找到相對應的 `init()`、 `exit()`,以及 `file_operations` 等等實作,其中 - `dev_open()`: Called each time the device is opened from user space. - `dev_read()`: Called when data is sent from the device to user space. - `dev_write()`: Called when data is sent from user space to the device. - `dev_release()`: Called when the device is closed in user space. - `llseak()`: Change current read/write position in a file - 在 userr-level 的 entry point 為`lseek(int __fd, off_t __offset, int __whence)` ,控制檔案的讀寫位置,引數 `__offset` 根據引數 `__whence` 來移動讀寫位置的位移數, `__whence` 可以是以下其中一種: - `SEEK_SET`: 引數 `__offset` 即為新的讀寫位置 - `SEEK_CUR`: 以目前的讀寫位置往後增加 `__offset` 個位移量 - `SEEK_END`: 將讀寫位置指向檔案未端後在增加 `__offset` 個位移量 - 在 `fibdrv.c` 中的 `fib_device_lseek` 有對應的實作 在 user space 呼叫 `read()` 函式時,是**無法直接指定文件 offset** 的,文件的偏移量是和 file descriptor 的屬性,只能透過呼叫 `lseek()` 函式或其他類似的 system call 來修改,故若要指定文件偏移量,需在**讀取文件之前使用 `lseek()` 設定偏移量,然後在呼叫 `read()` 函式進行讀取**,例如 `client.c` 中的作法 ```c for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); // cppcheck-suppress unreadVariable sz = read(fd, buf, 1); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%s.\n", i, buf); } ``` 在原文的留言處有提到 >**devnull**: You should add “.owner = THIS_MODULE,” in struct file_operations fops or you will get nasty kernel oopses when unloading or rebooting. Took me 2 weeks to find out. - 在 fibdrv 中有做 ## fibdrv 核心模組內部 可以發現有些東西是在 user-level 運作的 查看 `client.c` ```c #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/types.h> #include <unistd.h> #define FIB_DEV "/dev/fibonacci" int main() { long long sz; char buf[1]; char write_buf[] = "testing writing"; int offset = 100; /* TODO: try test something bigger than the limit */ int fd = open(FIB_DEV, O_RDWR); if (fd < 0) { perror("Failed to open character device"); exit(1); } for (int i = 0; i <= offset; i++) { sz = write(fd, write_buf, strlen(write_buf)); printf("Writing to " FIB_DEV ", returned the sequence %lld\n", sz); } for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, 1); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%lld.\n", i, sz); } for (int i = offset; i >= 0; i--) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, 1); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%lld.\n", i, sz); } close(fd); return 0; } ``` 上述這段 code 會用系統呼叫 `open` 打開 device file `/dev/fibonacci`,其中也使用了 `open`、`write`、`lseek`、`read` 等系統呼叫來對 `fibonacci` 進行操作,操作之後就可以**從 kernel 讀取出之前運算出來的 fibonacci 數值**,然後就可以在 user space 中進行輸出 `client.c` ```c fd = open(FIB_DEV, O_RDWR); if (fd < 0) { perror("Failed to open character device"); exit(1); } for (i = 0; i <= offset; i++) { sz = write(fd, write_buf, strlen(write_buf)); printf("Writing to " FIB_DEV ", returned the sequence %lld\n", sz); } for (i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, 1); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%lld.\n", i, sz); } ``` - 此處透過 `read` 系統呼叫來得到輸出 查看 `fibdrv.c` (參考 [物件導向程式設計篇](https://hackmd.io/@sysprog/c-oop#%E4%BD%A0%E6%89%80%E4%B8%8D%E7%9F%A5%E9%81%93%E7%9A%84-C-%E8%AA%9E%E8%A8%80%EF%BC%9A%E7%89%A9%E4%BB%B6%E5%B0%8E%E5%90%91%E7%A8%8B%E5%BC%8F%E8%A8%AD%E8%A8%88%E7%AF%87) (designated initializers)、[file_operations Struct Reference](https://docs.huihoo.com/doxygen/linux/kernel/3.7/structfile__operations.html)) ```c static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { return (ssize_t) fib_sequence(*offset); } const struct file_operations fib_fops = { .owner = THIS_MODULE, .read = fib_read, .write = fib_write, .open = fib_open, .release = fib_release, .llseek = fib_device_lseek, }; ``` - 上述 `fib_read` 宣告為 `static` (表示其能見度只有在該檔案),但在 `fib_fops` 被初始化成 `read` function pointer,所以之後只要操作該 function pointer,就會像是在操作 `fib_read`,且可以避免**命名衝突** 這裡實作了 `fib_read` 函數內將 `fib_sequence` 的值回傳,透過使用者給定不同的 `offset` 作為 Fibonacci 數列的 $x$ 回傳 $fib(x)$ ### [Linux Virtual File System (VFS)](https://www.win.tue.nl/~aeb/linux/lk/lk-8.html) ![](https://i.imgur.com/lWt3Ymw.png) 在傳統 UNIX 系統中,檔案系統是固定的,但是在 Linux 中,為了結合各個檔案系統,因而加上了一層虛擬檔案系統 (VFS), VFS 是一組**檔案操作的抽象界面**,可以將任何真實的檔案系統透過 VFS **掛載**到 Linux 中。 VFS 並不直接處理檔案格式,而是規定這些處理請求的介面及操作語意,然後交由真實的檔案系統 (像是 EXT2) 去處理。 而透過 VFS, `fibdrv` 核心模組可以將計算出來 Fibonacci 數讓 `client.c` 程式得以存取 ![](https://i.imgur.com/8phveoG.png) Linux 的裝置驅動程式大致分為: - Character Device Driver - Block Device Driver - Network Device Driver 參照 [Linux 核心設計: 檔案系統概念及實作手法](https://hackmd.io/@sysprog/linux-file-system) ### Makefile - [How to autofill "obj-y" or "xxx-objs" in linux kernel module Makefile?](https://stackoverflow.com/questions/67673126/how-to-autofill-obj-y-or-xxx-objs-in-linux-kernel-module-makefile) - [objs in Makefile breaks kernel module](https://stackoverflow.com/questions/19934142/objs-in-makefile-breaks-kernel-module) 在 `Makefile` 中,`make check` 會去加載編譯好的 `fibdrv` kernel module,並將 `client` 輸出儲存至 `out`,透過以下指令 ```shell diff -u out scripts/expected.txt && $(call pass) ``` 來檢驗輸出和預期是否一樣,在第一版本的 `fibdrv` 中僅能輸出正確的 $Fib(92)$ ```shell Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429. ``` 在和目標檔案 `expected.txt` 比對時會給 `pass` ,因為 `expected.txt` 本來就給定這樣的預期輸出,若要檢測後續的 Fibonacci number 是否正確則連同 `expected.txt` 也要一起改 ### `copy_[to/from]_user` - user-space 無法直接存取 kernel-space 的記憶體 - Linux device driver 與 user-space 間的 I/O 會與 `fops->read`、`fops->write`、`fops->ioctl` 這三個 system call 有關 不論是從 user-space 讀取資料至 kernel-space,或是將 kernel-space 的資料寫至 user-space, 必須透過 kernel 提供的兩個 API 來進行 - `long copy_tp_user(void *to, const void * from, long n)` - `long copy_from_user(void *to, const void* from, long n)` **`copy_to_user`** - `to`: 資料的目的位置 (指向 user-space 記憶體的指標) - `from`: 資料的來源位址 (指向 kernel-space 記憶體的指標) **`copy_from_user`** - `to`: 資料的目的位置 (指向 kernel-spcae 記憶體的指標) - `from`: 資料的來源位置 (指向 user-space 記憶體的指標) Kernel 裡面有 [copy_to_user](https://elixir.bootlin.com/linux/latest/ident/copy_to_user),可以把資料從 kernel space copy 到 user space - 注意到 `copy_to_user` 其實是有成本的 ## 核心模式的時間測量 >參照 [核心模式的時間測量](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c)、[Linux 核心設計: Timer 及其管理機制](https://hackmd.io/@sysprog/linux-timer) 如果只在 user space 做測量,那就只能測量進出 kernel 內部的時間,測量不到 kernel 內部,進出 kernel 其實有其他的成本開銷 (例如 [mode switch](https://www.ibm.com/docs/en/aix/7.1?topic=performance-mode-switching)) 這時候就需要用到 `hrtimer` (high-resolution timer),在 linux 上的通常可以達到 1 micro second 的精準度 `jiffies` 是一種計時機制,計算自開機以來計時器中斷發生的次數,**較舊的** Linux 核心有提到 timer wheel,而一個新的機制是將計時機制建立在新資料結構 `ktime_t` 上 >參照 [The high-resolution timer API](https://lwn.net/Articles/167897/)、[ktime 相關 API](https://www.kernel.org/doc/html/latest/core-api/timekeeping.html) ### 關於 `client.c` 根據[作業說明](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c#%E9%97%9C%E6%96%BC-clientc)描述,在 user-mode 的位置配置一個 buffer 空間時, kernel driver 不能直接寫入該地址,而是要透過 [copy-to-user](https://elixir.bootlin.com/linux/latest/ident/copy_to_user),將想傳給 user-mode (運作中的 client) 的字串複製到 `fib_read` 的 `buf` 參數後, `client` 方可接收此字串 而 [Hierarchical performance measurement and modeling of the Linux file system](https://www.researchgate.net/publication/221556402_Hierarchical_performance_measurement_and_modeling_of_the_Linux_file_system) 研究指出,從 kernel-level 複製資料到 user-level 的時間成本是每位元組 22ns,故在進行效能分析時,**必須要考慮到 copy_to_user 函式的時間開銷**,特別留意**資料大小成長後造成的量測誤差** ## Linux 效能分析的提示 - 傳統多核心的系統使用 [SMP](https://zh.wikipedia.org/zh-tw/%E5%AF%B9%E7%A7%B0%E5%A4%9A%E5%A4%84%E7%90%86) 進行設計,各個 CPU 共享相同的記憶體,每個 CPU 都可以訪問記憶體中的任何地址,**所需要的時間都是相同的** - 由於 SMP 在擴展上的限制,非統一記憶體存取架構 (NUMA)因而誕生,可以把更多數量的 CPU 組合起來, CPU 可以同時訪問不同的記憶體,**大幅提高並行性**, NUMA 模式下,處理器被劃分為多個節點,每個節點有自己的 local memory,同樣的每個 CPU 都可以訪問所有的記憶體位址,但離自己越遠則需要花費比較多的時間。 ![](https://i.imgur.com/cwjkpa3.png) 參考[Linux 效能分析的提示](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-c#-Linux-%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E6%8F%90%E7%A4%BA)、[KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU),使用 [processor affinity / CPU pinning](https://en.wikipedia.org/wiki/Processor_affinity) 讓行程在特定的 CPU 中執行不受排程的干擾 ```shell $ cat /proc/cpuinfo | grep "^processor" | wc -l 8 ``` 也可以使用 `perf record` 來取樣並紀錄目標的執行狀態 ```shell sudo perf record -g --call-graph dwarf ./measure ``` - `-g` 回同時紀錄取樣點的 stack trace - `--call-graph` 指定走訪 stack trace 的方法 follow [The DWARF Debugging Standard](https://dwarfstd.org/) - 記得要先 `make load` 載入 module ### 限定 CPU 給特定的程式使用 修改 `/etc/default/grub` 內的 `GRUP_CMDLINE_LINUX_DEFAULT` ,加入 `isolcpus=7` 來指定編號 `7` 的核心 **不被排程器賦予任務**,修改完成後 `sudo update-grub` 來更新設定 - `quiet`: 啟動系統的過程中,如果沒有 `quiet`, kernel 就會輸出很多訊息,包括啟動過程中運行了哪些程式,如果系統可以正常啟動,通常就不會需要這些訊息,故會設定成 `quiet` - `splash`: 與可視化界面有關 - `GRUB_CMDLINE_LINUX`: 一直生效 - `GRUB_CMDLINE_LINUX_DEFAULT`: 恢復模式 (recovery mode) 下不會運作 ### 排除干擾效能的因素 - 抑制 [address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization) (ASLR) ```shell $ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" ``` - 設定 scaling_governor 為 performance。 ```shell for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor do echo performance > ${i} done ``` 以我的電腦來說,會將 `performance` 分別寫入 ```shell /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor ``` 參考 [CPU frequency scaling](https://wiki.archlinux.org/title/CPU_frequency_scaling),會讓 CPU 在最高頻率下運轉 ![](https://i.imgur.com/cAN9Vlc.png) - 針對 Intel 處理器,關閉 turbo mode ```shell $ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" ``` - 關閉 SMT (Hyper-threading) ```shell $ sudo sh -c "echo off > /sys/devices/system/cpu/smt/control" ``` ## gnuplot ```shell set title "Performance with setting affinity" set xlabel "n^{th} fibonacci number" set ylabel "Time(ns)" set terminal png font "Times_New_Roman, 12" set output "tt.png" set key left FILES = system("ls -1 *.txt") plot for [data in FILES]\ data using 3 with linespoints linewidth 2 title data\ ``` ## 觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題 >[並行和多執行緒設計](https://hackmd.io/@sysprog/concurrency/https%3A%2F%2Fhackmd.io%2F%40sysprog%2FS1AMIFt0D) ### POSIX Threads >POSIX Threads 是一套符合 [POSIX](https://en.wikipedia.org/wiki/POSIX) 標準的 API,方便開發者設計出 User-level 的多執行緒程式。 在 [linux/unistd.h](https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/unistd.h) 中提供了 POSIX 作業系統 API 的存取功能標頭檔名稱。通常為大量的 system call。 在可移植性方面,符合 POSIX 標準的 API 其函數名稱、返回值、參數類型等等都按照標準定義,而 POSIX 相容也就指定這些接口 (interface) 函式相容,但是**並不管 API 具體如何實現** 利用 `strace` 簡單測試一個 `printf` 到底呼叫了哪些 system call ```c int main() { printf("Hello!\n"); return 0; } ``` ```shell= execve("./strace", ["./strace"], 0x7ffe4f88a900 /* 54 vars */) = 0 brk(NULL) = 0x55b334ae3000 arch_prctl(0x3001 /* ARCH_??? */, 0x7fff9841a5c0) = -1 EINVAL (Invalid argument) mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc18d976000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=70567, ...}, AT_EMPTY_PATH) = 0 mmap(NULL, 70567, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fc18d964000 close(3) = 0 openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\237\2\0\0\0\0\0"..., 832) = 832 pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784 pread64(3, "\4\0\0\0 \0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0"..., 48, 848) = 48 pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0i8\235HZ\227\223\333\350s\360\352,\223\340."..., 68, 896) = 68 newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=2216304, ...}, AT_EMPTY_PATH) = 0 pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784 mmap(NULL, 2260560, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fc18d600000 mmap(0x7fc18d628000, 1658880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x28000) = 0x7fc18d628000 mmap(0x7fc18d7bd000, 360448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7fc18d7bd000 mmap(0x7fc18d815000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x214000) = 0x7fc18d815000 mmap(0x7fc18d81b000, 52816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fc18d81b000 close(3) = 0 mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc18d961000 arch_prctl(ARCH_SET_FS, 0x7fc18d961740) = 0 set_tid_address(0x7fc18d961a10) = 21183 set_robust_list(0x7fc18d961a20, 24) = 0 rseq(0x7fc18d9620e0, 0x20, 0, 0x53053053) = 0 mprotect(0x7fc18d815000, 16384, PROT_READ) = 0 mprotect(0x55b333cc8000, 4096, PROT_READ) = 0 mprotect(0x7fc18d9b0000, 8192, PROT_READ) = 0 prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 munmap(0x7fc18d964000, 70567) = 0 newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x1), ...}, AT_EMPTY_PATH) = 0 getrandom("\xaf\xb4\x13\x5c\x53\x7c\x1c\x55", 8, GRND_NONBLOCK) = 8 brk(NULL) = 0x55b334ae3000 brk(0x55b334b04000) = 0x55b334b04000 write(1, "Hello!\n", 7Hello! ) = 7 exit_group(0) = ? +++ exited with 0 +++ ``` - 在 `line 37` 中 `write` system call 把字符串 `Hello!` 傳給 file descriptor 1 的設備 ```shell $ ls /dev/std* -l lrwxrwxrwx 1 root root 15 四 10 19:55 /dev/stderr -> /proc/self/fd/2 lrwxrwxrwx 1 root root 15 四 10 19:55 /dev/stdin -> /proc/self/fd/0 lrwxrwxrwx 1 root root 15 四 10 19:55 /dev/stdout -> /proc/self/fd/1 ``` - 對應到的為 `stdout` ### 建立執行緒 >[pthread API](https://www.ibm.com/docs/en/i/7.3?topic=ssw_ibm_i_73/apis/rzah4mst.html) 首先程式運行時第一個 tread 負責運行 `main()`,建立一個以上的執行緒則使用 [pthread_create](https://man7.org/linux/man-pages/man3/pthread_create.3.html) ,而執行緒完成工作後很多種結束的方法 >The new thread terminates in one of the following ways: > * It calls pthread_exit(3), specifying an exit status value that is available to another thread in the same process that calls pthread_join(3). > * It returns from start_routine(). This is equivalent to calling pthread_exit(3) with the value supplied in the return statement. > * It is canceled (see pthread_cancel(3)). > * Any of the threads in the process calls exit(3), or the main thread performs a return from main(). This causes the termination of all threads in the process. 如果不使用 `pthread_join` ,該執行緒會繼續佔用資源直到 process 結束。 `pthread_create` ```c int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg); ``` - `thread`: 指向 `pthread_t` 結構體的指標 - `attr`: 設定 thread 的 attribute,如果設定 `attr` 為 NULL ,則這個 thread 使用 default attributes - `start_routine`: 這個 thread 會 invoke `start_routine()` - `arg` 會作為 `start_routine` 的引數 ## 嘗試建立 hash table ```c hlist_add_head(&dnode->list, &htable[key]); // add to hash table ``` ```shell [ 7850.213620] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 7850.213621] #PF: supervisor write access in kernel mode [ 7850.213622] #PF: error_code(0x0002) - not-present page ``` - `#PF` 代表 page fault,當 page fault 發生時,CPU 會將相關的錯誤訊息除存在 `error code` 的特殊暫存器中 (32bit value) - ` supervisor write access in kernel mode`: 可能發生的原因包含 1. 程式碼對 kernel 空間的指標操作,但是這些指標操作沒有經過適當得初始化。 2. 對該資源進行寫操作,但是該資源已經被鎖定 or 被其他程式使用。 3. 使用了已經被取消或是被棄用的函式或 API,這些函式可能會對 kernel 的 memory 進行寫操作而沒有足夠的權限 ### Memroy management 在多執行緒的預期流程,`insmode` -> 多執行緒進行 hash table 的查表、填表 -> 結束運算,走訪 hash table 並釋放空間 -> `rmmod` >[lkmpg 4.3 The __init and __exit Macros](https://sysprog21.github.io/lkmpg/#the-init-and-exit-macros) >The __exit macro causes the omission of the function when the module is built into the kernel, and like __init , has no effect for loadable modules. Again, if you consider when the cleanup function runs, this makes complete sense; built-in drivers do not need a cleanup function, while loadable modules do. ## quick note - [function declaration isn't a prototype](https://stackoverflow.com/questions/42125/warning-error-function-declaration-isnt-a-prototype) - 引數加上 `void` ## Reference - [Linux 當中的 VFS 虛擬檔案系統](http://sp1.wikidot.com/linuxvfs) - [KYG-2020q1 Homework2 (fibdrv)](https://hackmd.io/@KYWeng/rkGdultSU) - [smp irq affinity介绍](https://blog.csdn.net/yue530tomtom/article/details/76095739) - [POSIX Thread 介紹](https://ithelp.ithome.com.tw/m/articles/10280830) - [posix 是什麼都不知道,就別說你懂 Linux 了!](https://www.readfog.com/a/1645003345925083136) - [Driver (驅動)](https://hackmd.io/@combo-tw/ryRp--nQS#Character-devices) - [調試工具ltrace strace ftrace的使用](https://jasonblog.github.io/note/linux_kernel/diao_shi_gong_ju_ltrace_strace_ftrace_de_shi_yong.html) - [ltrace-strace](https://littlebees.github.io/2021/10/ltrace-strace/)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully