2022q1 Homework3 (fibdrv)

--- tags: linux2022 --- # 2022q1 Homework3 (fibdrv) contributed by < `MicahDoo` > [Sidenotes & Checklist](https://hackmd.io/PWo8t85MRnm1wB-OsPILSw) ## 初期閱讀 ### [費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number)： > 這個演算法的效率改進不少 —— 當 n 不太大時堪用。計算 fib2(n) 時只需 $O(n)$ 個遞迴呼叫，但這表示這個演算法的時間複雜度是 $O(n)$ 嗎？ > 一般我們會說用反覆的加法（也就是用兩個變數，存最近的兩個 Fibonacci 數的手法，即 memoization）算 $F(n)$ 的演算法的時間複雜度為 $O(n)$。但這當然是假設加法運算只需 $O(1)$。如果考慮大數字的 bit 數，以這個演算法算 $F(n)$ 需要的 bit 運算數目大約是 $N(n^2/2 - n/2)$ (注意: 這邊說的都不是 big $O$）這邊讓我想到在 LeetCode 上解題時常常會用到一些利用 (exploit) 數值大小限制的奇招 (hack, maneuver) 來達到 $O(1)$ 複雜度的情況。由於題目的數字通常長度都會在 32 個位元之內，所以常常都可以想到常數時間的解法。另外關於最後一句「(注意: 這邊說的都不是 big $O$）」，由於 big O 通常講的是上界，所以我想撰寫者的意思應為該數字不代表其 order of magnitude 最大就如此。（也有可能是要讀者不要把 $N$ 看成 $O$？） > Binet's Formula (公式解) 呢？$\sqrt{5}$ 可以用牛頓法來算，假設每個 $n$ bit 除法需要 $n^2$ 個 bit 運算，但每一輪只需要留下前面 $N\ \ n + \log n$ 個 bit. 整體上說來需要的 bit 運算數中最大宗的大約是 $\log n \times (N n + \log n)^2$, 再加上一些較不顯著的運算 —— 其實沒有明顯地比反覆加法快。這邊看不太懂，所以直接把論文點開來看。第 31 頁底部寫道： > We will use $Nn$ to represent the number of bits in the $n$th Fibonacci number where $N$ is a constant > $$ > Nn = \log_{2}{{\lambda_1}^n} = n\log_{2}{\lambda_1} > $$ > So $N = n\log_{2}{\lambda_1} \approx 0.69424$ 所以 $N$ 是個常數，值為 0.69424，$N \times n$ 或 $Nn$ 其實就是前一段裡說到的 $N(n)$。第 35 頁第 5 行： > It will be assumed that a division requires $n^2$ bit operations, denoted as $D(n)$. > $$ > T(n) = T(n/2) + D(Nn/2) > $$ 可以看到作者一開始所說的 $n^2$ bit operations 應是指 $D(n)$ 的情況下（有 $n$ 個 bits）： > [Edited] It will be assumed that <span style="color:green">an $n$-bit division</span> requires $n^2$ bit operations, denoted as $D(n)$. 而往下讀也可看到其證明中已插入實際的 bit 數量 $D(Nn/2)$ （第 $n/2$ 個費氏數所需的除法運算數量）。綜上所述，推斷此段落所述之前兩個 $n$ 跟後面的 $n$ 所指不同，易造成誤導，茲試將該段落修改成如下： > Binet's Formula (公式解) 呢？$\sqrt{5}$ 可以用牛頓法來算，<span style="color:green">假設第 $n$ 個費氏數因有 $Nn$ 個 bits ，除法需要 $(Nn)^2$ 個 bit 運算</span>，但每一輪只需要留下前面 $Nn + \log n$ 個 bit. 整體上說來需要的 bit 運算數中最大宗的大約是 $\log n \times (N n + \log n)^2$, 再加上一些較不顯著的運算 —— 其實沒有明顯地比反覆加法快。 ### Fast Doubling 這個部分讀了很久才看懂，做點重點整理，以便未來參考： * Fast Doubling 意思就是透過「一次算兩個+下次能往上算的費氏數第次可達兩倍」這兩個原則，把本來算$F(n)$需要算$F(0)$到$F(n-1)$ 等 $n$ 個數字的情況改良成只需要算 $\log_2{n}$ 個數字。 * 二進位搭配前面 $\log$ 底為 2 的特性，兩者一起考慮，便能透過 bitwise operation 去做出有效率的實作： * 每一個位數代表計算的一個步驟（遞迴中的一次函式呼叫），例如 $F(100, 101)$ 是一個步驟，會用到$F(50)$和$F(51)$，接下來 $F(50, 51)$是第二個步驟，會用到$F(25)$和$F(26)$，但因為$F(25)$是奇數，因此往下算一步，下一步便為 $F(24,25)$是第三步驟，以此類推。 * 每一步驟都會用到兩個運算： 1. 除二，同等於向右位移一位。 2. 奇數偶數調整，也就是往前算一個費氏數。 * 以上兩者綜合起來就是從最低一個 bit 開始往高位一位一位檢視，每次都代表往下呼叫一次函式，且每次呼叫所算的第次會除二，再根據較小數的奇偶判斷要不要往前算一個費氏數。 * 文章中是使用迴圈的方式，因此一個 bit 代表一次迭代，且是從最高位開始檢視，並且計算的方式相反（乘二、往後算一個費氏數）。 :::warning TODO: 畫個圖表來解釋 Fib 的 bit ops，並用`clz`實作。 ::: ### [Writing a Linux Kernel Module — Part 1: Introduction](http://derekmolloy.ie/writing-a-linux-kernel-module-part-1-introduction/): > I begin with a straightforward “Hello World!” loadable kernel module (LKM) and work towards developing a module that can control GPIOs on an embedded Linux device (such as the BeagleBone) through the use of IRQs. Keywords: Loadable kernel module (LKM), GPIOs (Genertal-purpose I/O), IRQ (interrupt request). :::spoiler > I have broken the discussion up over a number of articles, each providing a practical example and outcome. > ... > I have also aligned the tasks performed against my book, Exploring BeagleBone, albeit the articles are self-contained and do not require that you own a copy of the book. ::: > This article is focused on the system configuration, tools and code required to build and deploy a “Hello World!” kernel module. **The second article in this series examines the topic of writing character device drivers** and how to write C/C++ programs in user space that can communicate with kernel space modules. * Character device: devices with which the driver communicates by sending and receiving single characters (bytes), e.g. serial ports, parallel ports, sounds cards. ## 前置作業紀錄  在跟著前置檢查跑的時候，用了下列的指令，將模組加進核心： ```shell $ sudo insmod fibdrv.ko ``` 接著以下的命令： ```shell cat /sys/class/fibonacci/fibonacci/dev ``` 得到 `235:0`。 > 新建立的裝置檔案 /dev/fibonacci，注意到 236 這個數字，在你的電腦也許會有出入。試著對照 fibdrv.c，找尋彼此的關聯。以上這個部分看不太懂。 (TODO) ## FIXME: VLA not allowed in Linux Kernel 在前置作業的測試中，會發現以下這行會出現問題： ```c long long f[k+2] ``` 這是因為 C 中在指定陣列長度時，不能在執行當下決定長度（用變數當長度）。這時通常會有兩個解決辦法： 1. 動態配置記憶體。 2. 直接不用陣列。這裡從 `fib_sequence` 的演算法中可以發現，運算中只需要維護 (maintain) 三個變數即可，因此可以直接不用陣列。 ```c static long long fib_sequence(long long k) { long long first = 0; long long second = 1; long long next = k; // initialize next to k to account for trivial cases (k = 0 or 1) for (int i = 2; i <= k; i++) { next = first + second; first = second; second = next; } return next; } ``` 改完後 `make check` 結果跟一開始一樣，代表結果應相同，沒有改錯地方。 ## 尋找缺陷在檢視程式碼的過程中，發現一件弔詭的事： `fib_sequence()` 回傳的就是費氏數計算的結果，但是往後看字元裝置 (character device) 讀取的函示，回傳的竟是 `fib_sequence()` 的回傳值型別轉換至 `ssize_t` ，「有號大小型別 (signed size type)」，讀至此，心裡出現幾個疑惑： 1. 何以使用 `ssize_t` 型別來表示大小、長度根據系統而可能不同的變數？ 2. 為什麼不用無號的 `size_t` 而要用 `ssize_t`？ 3. 為何用 `ssize_t` 來傳遞資料？為解疑惑，先到 Linux 的 manpage 找到 `ssize_t`： ```shell ssize_t Include: <sys/types.h>. Alternatively, <aio.h>, <monetary.h>, <mqueue.h>, <stdio.h>, <sys/msg.h>, <sys/socket.h>, <sys/uio.h>, or <unistd.h>. Used for a count of bytes or an error indication. According to POSIX, it shall be a signed integer type capable of storing values at least in the range [-1, SSIZE_MAX], and the implementation shall support one or more programming environments where the width of ssize_t is no greater than the width of the type long. Glibc and most other implementations provide a length modifier for ssize_t for the printf(3) and the scanf(3) families of functions, which is z; resulting commonly in %zd or %zi for printing ssize_t values. Although z works for ssize_t on most implementations, portable POSIX programs should avoid using it—for example, by converting the value to intmax_t and using its length modifier (j). Conforming to: POSIX.1-2001 and later. See also: read(2), readlink(2), readv(2), recv(2), send(2), write(2) See also the ptrdiff_t and size_t types in this page. ``` > Used for a count of bytes or an error indication. 這回答了第 2. 題：使用有號型別的原因是因為這樣才能使用 `-1`，來表示錯誤。另外，在 Linux Kernel Module Programming Guide 中，關於 `read` 的段落： ```c= /* Called when a process, which already opened the dev file, attempts to * read from it. */ static ssize_t device_read(struct file *filp, /* see include/linux/fs.h */ char __user *buffer, /* buffer to fill with data */ size_t length, /* length of the buffer */ loff_t *offset) { /* Number of bytes actually written to the buffer */ int bytes_read = 0; const char *msg_ptr = msg; if (!*(msg_ptr + *offset)) { /* we are at the end of message */ *offset = 0; /* reset the offset */ return 0; /* signify end of file */ } msg_ptr += *offset; /* Actually put the data into the buffer */ while (length && *msg_ptr) { /* The buffer is in the user data segment, not the kernel * segment so "*" assignment won't work. We have to use * put_user which copies data from the kernel data segment to * the user data segment. */ put_user(*(msg_ptr++), buffer++); length--; bytes_read++; } *offset += bytes_read; /* Most read functions return the number of bytes put into the buffer. */ return bytes_read; } ``` 從以上的程式碼來看，一般的慣例 (convention) 裡， read 做的事就是在 file 的某個位置讀取資料，把資料填到 buffer 再回傳成功讀取了幾個 bytes (line 34) 。而本次作業的每個 read 其實就是一個查詢 (query)，而結果是從直接運算得到，而非從檔案裡讀取。另外**答案不是丟進緩衝，而是直接回傳成一個變數**。這樣做也許是為了讓效能更好的巧思，但是這樣可能不符合慣例，也要跟客戶端能事先協調好回傳值的意義才行，不符合模組化的初衷。另外一個點是，`ssize_t` 的大小有所限制，因此回傳的數之範圍不大，且根據系統不同而有可能會有差異。以上觀察回答了 1. 和 3.：這應該就是本次作業為了凸顯核心運作特色以及演示大數字計算而故意寫出的缺陷，是需要去修改的部分。 ## 大數值的處理: Upper Half and Lower Half :::spoiler FIXME 在 StackOverflow 查到如何在 console 輸出 `dmesg` 後就不知道要如何關掉功能，導致之後使用 `make check` 只要有 `printk` 的部分就一定會同時印出來。當時輸入的指令為：`dmesg -wH &`。 ::: 首先一開始嘗試用 `struct BigNum()` 的方式存 128 位元的數值。 ```c typedef struct BigNum { unsigned long long lower, upper; } BigNum; static inline void addBigNum(BigNum *sum, BigNum x, BigNum y) { sum->upper = x.upper + y.upper; if (y.lower > ~x.lower) sum->upper++; sum->lower = x.lower + y.lower; } ``` 但是接著遇到的問題便是要如何把數字印出來。如果是一個變數就能表示的數，那便可以用 `sprintf` 和 `%llu` 存到字串裡，這裡則要分別考慮。一般若要 hand-code 十進位的表示式，通常都是 `% 10` 和 `/ 10` 兩者不停重複，把位數同個位數開始一位一位取出來。然而遇到用兩個變數表示時，就要特別考慮高的那一半對低的那一半的影響： 1. `% 10` 的部分：二的冪 `% 10` 後的值有規律可循（2 4 8 6 2 4 8 6 ...)，可以很快推知 `(2 << 64) % 10` 值為 6，因此只要將高位的餘數乘上六，跟低位的餘數加起來，再取一次餘數，其值就是整個數字的餘數了。 ```c (num->lower % 10 + (num->upper % 10) * 6) % 10 ``` 2. `/ 10` 的部分：由於高位的數字會下溢 (underflow) 到低位，因此不能只對兩個變數分別作 `/ 10`，還要再低位的部分加上高位下溢後的貢獻。同時要記得要特別把各自被截斷 (truncated) 的部分拿出來分開算，如此才有考慮到進位的部分。  ```c static inline void rightShiftBigNum(BigNum *num) { unsigned long long upper_rem = num->upper % 10; num->upper /= 10; unsigned long long lower_rem = num->lower % 10; unsigned long long max_num_64 = (unsigned long long) -1; /* Remember to account for carry from lower_rem + \ * upper_rem * 2^32 (or simply lower_rem + upper_rem * 6)*/ num->lower = num->lower / 10 + upper_rem * (max_num_64 / 10) + (lower_rem + upper_rem * 6) / 10; } ``` :::danger `expected.txt` 的數值似乎是錯誤的，要自己修改。或直接從 `Makefile` 裡把 `diff` 的指令去除。 ::: 以下是更改後的結果。這個做法最多可以算到 $F(187)$。 ```c static BigNum *fib_sequence(long long k) { BigNum *first = vmalloc(sizeof(*first)); first->upper = 0; first->lower = 0; BigNum *second = vmalloc(sizeof(*second)); second->upper = 0; second->lower = 1; BigNum *next = vmalloc(sizeof(*next)); next->upper = 0; next->lower = k; BigNum *temp = next; for (int i = 2; i <= k; i++) { addBigNum(next, *first, *second); temp = next; next = first; first = second; second = temp; } return temp; } ``` ```c static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { BigNum *num = fib_sequence(*offset); char str[45]; int idx = 0; while (idx < 44) { if (num->upper != 0) { str[idx] = (num->lower % 10 + (num->upper % 10) * 6) % 10 + '0'; rightShiftBigNum(num); } else { str[idx] = num->lower % 10 + '0'; num->lower /= 10; } if (!num->lower && !num->upper) break; ++idx; } buf[idx + 1] = '\0'; for (int j = idx; j >= 0; --j) { buf[j] = str[idx - j]; } return (ssize_t) 1; } ``` :::warning TODO: 用 `unsigned long long` 陣列或字串來擴大範圍。 :::