2019q1 Homework2 (fibdrv)

# 2019q1 Homework2 (fibdrv) contributed by < `tony2037` > ### Reviewed by `Zames-Chang` * 在 `fibdrv.c` 中一直的呼叫 kmalloc ，但是都沒有 free 掉，造成很嚴重的 memory leak，數字小可能沒事，但是如果數字大一點或是呼叫的次數多一點的話， kernel 會 stack overflow，電腦會當機。 `f[i] = kmalloc(2 * sizeof(unsigned long long)` * 然後建議可以在實作不同的方法時候，計算其時間與空間複雜度，舉例以 `How to use clz to optimize fast doubling` 這個主題，我大概可以猜到這個結果是 $O(logN)$ 且空間複雜度為 $O(1)$ 但是，這只是我的猜測，如果有寫上一些分析的話，可以與閱讀者之間交叉比對，且除了正確性，更應該驗證一下在電腦上實測跑過的結果(速度)。 * 實作大數的 `fib_sequence` 時你使用了兩個` unsigned long long `大小的記憶體空間作為大數的儲存結構，但實際上應該可以根據輸入的 K 來判別需要多大的記憶體，這個大小大概等於 $bits(N) = 0.7*N$ 。 :::danger 注意看作業要求，無論標題和內文中，中文和英文字元之間要有==空白字元== (對排版和文字搜尋有利)，及早變更！ :notes: jserv ::: > fix > ## `insmod` & `rmmod` * How insmod works 1) Insmod is a small program, which calls **init_module()** to intimate the kernel that a module is attempted to be loaded and transfers the control to the kernel. 2) In kernel, **sys_init_module()** is run. It does a sequence of operations as follows a) Verifies if the user who attempts to load the module has the permission to do so or not. b) After verification, **load_module** function is called. b.1) The **load_module** function assigns temporary memory and copies the elf module from user space to kernel memory using **copy_from_user**. b.2) It then checks the sanity of the ELF file (Verification if it is a proper ELF file etc) b.3) Then based on the ELF file interpretation, it generates offset in the temporary memory space allocated. This is called the convenience variables. b.4) User arguments to the module are also copied to the kernel memory b.5) The state of the module is updated to **MODULE_STATE_COMING** b.6) The actual location in the kernel memory is allocated using **SHF_ALLOC** b.7) Symbol resolution is done. b.8) The **load_module** function returns **a reference to the kernel module**. c) The reference to the module returned by load_module is added to a **doubly linked list that has a list of all the modules loaded in the system**. d) Then the **module_init function in the module code is called**. e) **Module state** is updated to **MODULE_STATE_LIVE** * How rmmod works 1) rmmod calls **delete_module** which hints the kernel that an rmmod request has come in and the control is transferred to kernel. 2) The **sys_delete_module()** is called in the kernel. This does following operations. a) Checks if the user who attempts to remove the module has the permission to do so or not. b) Checks if any other module that is loaded is dependent on the module attempted to be unloaded. This information can be obtained from the **modules_which_uses_me list**. c) Checks if the module is actually loaded or not in the kernel. This is done by verifying if the current state of the module is **MODULE_STATE_LIVE**. d) It executes the **module_exit** function written in the module code. e) Then **free_module** function is called, which does the following. e.1) Removes any **sysfs references of the module** e.2) Removes all the kernel module object references. e.3) Performs architecture specific clean up if any. e.4) Unloads the module from kernel e.5) Updates the state of module to **MODULE_STATE_GOING** e.6) **Frees the memory used by user space** arguments for the module. ## Character module [https://blog.csdn.net/zqixiao_09/article/details/50850475](https://blog.csdn.net/zqixiao_09/article/details/50850475) ## Measure time [https://stackoverflow.com/questions/3946842/measuring-time-taken-by-a-function-clock-gettime](https://stackoverflow.com/questions/3946842/measuring-time-taken-by-a-function-clock-gettime) [How to Measure Time to execute a function in Kernel space](https://www.linuxquestions.org/questions/linux-software-2/how-to-measure-time-to-execute-a-function-in-kernel-space-607837/) ## fibdrv: `fibdrv.c` * fib_sequence() : 用來直接硬幹 * const struct file_operations `fib_fops`: f_ops 結構體 * .owner = THIS_MODULE, * .read = fib_read: read() * .write = fib_write: write() * .open = fib_open: open() * .release = fib_release: user close() 時呼叫 * .llseek = fib_device_lseek : 改變檔案的存取位置,使得下次讀寫在新位置開始 [reference](https://blog.csdn.net/zqixiao_09/article/details/50850475) ## fibdrv: `client.c` * `#define FIB_DEV "/dev/fibonacci"`: 透過操控這個(使用 `struct file_operations` 的各函數)與 kernel module 互動 * First for loop: 基本就是一直 write(), 從0 到 `offset` , 對應 fib_write() always returns `1`. * Second for loop: 從0 read() 到 offset , 對應 fib_read(), 算出對應的 fib_sequece * Third for loop: 反過來從 offset read() 到 0, 反過來的 fib_sequence * buf: 指向 user space 的 **char pointer**, 但目前沒有被使用, 在後面的大數運算會拿來放置運算結果. ## Make k > 100 看一下 `fibdrv.c` ```cmake= #define MAX_LENGTH 92 ... static long long fib_sequence(long long k) { /* FIXME: use clz/ctz and fast algorithms to speed up */ long long f[k + 2]; f[0] = 0; f[1] = 1; for (int i = 2; i <= k; i++) { f[i] = f[i - 1] + f[i - 2]; } return f[k]; } ``` 基本就是硬幹幹出 fib sequence. 首先 `long long` 在 `4.15.0-45-generic #48~16.04.1-Ubuntu x86_64` 為 8 bytes 大, 其範圍為 `-9,223,372,036,854,775,808 ~ 9,223,372,036,854,775,807` 而根據 [https://www.dcode.fr/fibonacci-numbers](https://www.dcode.fr/fibonacci-numbers) 計算出來的結果: F(91) 4660046610375530309 F(92) 7,540,113,804,746,346,429 ======9,223,372,036,854,775,807 F(93) 12,200,160,415,121,876,738 F(94) 19,740,274,219,868,223,167 F(95) 31,940,434,634,990,099,905 `long long 型態顯然不允許 k > 92` 果不其然看看結果: ```= Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 93, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 94, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 95, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 96, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 97, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 98, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 99, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 100, returned the sequence 7540113804746346429. ``` 92 之後就滿了. 本來想說改成 `unsigned long long` 可以吧... 結果還是撐不過100 喔對了 commit 的時候如果是 `clang-format-6.0` git會報錯. 這時候 `export PATH=$PATH:/usr/lib/llvm-6.0/bin/` 就可. 所以得大數運算啦, 在這邊我決定用 `unsigned long long [2]` 也就是總共16 bytes 來表示大數 :::warning 參照 [lib/test_overflow.c](https://elixir.bootlin.com/linux/latest/source/lib/test_overflow.c)，注意裡頭 `u64` 和 `u32` 一類的用法，透過這些有明確聲明空間的衍生型態來改寫程式碼 :notes: jserv ::: 修改如下: - [ ] `fibdrv.c` ```clike static unsigned long long *fib_sequence(int k) { /* FIXME: use clz/ctz and fast algorithms to speed up */ unsigned long long *f[k + 2]; for (size_t i = 0; i < (k + 2); i++) { f[i] = kmalloc(2 * sizeof(unsigned long long), GFP_KERNEL); if (f[i] == NULL) { printk("kmalloc error"); return NULL; } f[i][0] = 0; f[i][1] = 0; } f[0][0] = 0; f[1][0] = 1; for (int i = 2; i <= k; i++) { // f[i] = f[i - 1] + f[i - 2]; char carry = 0; if ((ULONG_MAX - f[i - 2][0]) <= f[i - 1][0]) carry = 1; f[i][0] = f[i - 1][0] + f[i - 2][0]; f[i][1] = f[i - 1][1] + f[i - 2][1] + (unsigned long long) (carry); } return f[k]; } ... /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { // memcpy(buf, "Ztex", 4); unsigned long long *f; f = fib_sequence(*offset); memset(buf, 0, 16); for (size_t i = 0; i < 8; i++) { buf[i] = (f[0] >> (4 * i)) & 0xFF; } for (size_t i = 8; i < 16; i++) { buf[i] = (f[1] >> (4 * (i - 8))) & 0xFF; } return f[0]; } ``` 其中 `kmalloc()` 有點像是 kernel 版的 `malloc`, [reference](https://www.linuxjournal.com/article/6930) 根據 [https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html](https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html) 的描述 ``` Description kmalloc is the normal method of allocating memory for objects smaller than page size in the kernel. The flags argument may be one of: GFP_USER - Allocate memory on behalf of user. May sleep. GFP_KERNEL - Allocate normal kernel ram. May sleep. ... ``` 要比 **page size** 小. f[i][0] overflow 時, carry = 1. - [ ] `client.c` ```clike for (i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); sz = read(fd, buf, 16); printf("Reading from " FIB_DEV " at offset %d, returned the sequence " "%llu + (%d * 18446744073709551616).\n", i, sz, buf[8]); } ``` printf() 大數版本還沒有實作出來, 先驗證結果對不對 ```shell Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429 + (0 * 18446744073709551616). Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738 + (0 * 18446744073709551616). Reading from /dev/fibonacci at offset 94, returned the sequence 1293530146158671551 + (1 * 18446744073709551616). Reading from /dev/fibonacci at offset 95, returned the sequence 13493690561280548289 + (1 * 18446744073709551616). Reading from /dev/fibonacci at offset 96, returned the sequence 14787220707439219840 + (2 * 18446744073709551616). Reading from /dev/fibonacci at offset 97, returned the sequence 9834167195010216513 + (4 * 18446744073709551616). Reading from /dev/fibonacci at offset 98, returned the sequence 6174643828739884737 + (7 * 18446744073709551616). Reading from /dev/fibonacci at offset 99, returned the sequence 16008811023750101250 + (11 * 18446744073709551616). Reading from /dev/fibonacci at offset 100, returned the sequence 3736710778780434371 + (19 * 18446744073709551616) ``` :::warning 事先撰寫驗證工具 :notes: jserv ::: ``` F(92) 7540113804746346429 F(93) 12200160415121876738 F(94) 19740274219868223167 F(95) 31940434634990099905 F(96) 51680708854858323072 F(97) 83621143489848422977 F(98) 135301852344706746049 F(99) 218922995834555169026 F(100) 354224848179261915075 ``` ```python Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34) [GCC 7.3.0] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> 1293530146158671551 + 18446744073709551616 19740274219868223167L (94) >>> 3736710778780434371 + 19 * 18446744073709551616 354224848179261915075L (100) ``` 看起來至少結果先對了, 也可以做到100以上了 (灑花, 現在要加速囉~(還有實作 `BIG NUMBER print` ) ## Speed up 為了使用快速 fibonacci sequence , 必須實作出 big number 的乘法器( multiplier ) * $f(2n)=2*f(n+1)*f(n)-{f(n)}^2$ * $f(2n+1)={f(n+1)}^2+{f(n)}^2$ (驗算過之後,覺得大開眼界) - [ ] multiplier ```clike= static unsigned long long *multiplier(unsigned long long *k1, unsigned long long *k2) { unsigned long long *r = kmalloc(2 * sizeof(unsigned long long), GFP_KERNEL); if (r == NULL) { printk("kmalloc error"); return NULL; } r[0] = 0; r[1] = 0; size_t width = 8 * sizeof(unsigned long long); for (size_t i = 0; i < width; i++) { if ((k2[0] >> i) & 0x1) { r[1] += k1[1] << i; unsigned long long t = k1[0]; (i == 0) ? (t = 0) : (t = t >> (width - i)); r[1] += t; r[0] += k1[0] << i; } } for (size_t i = 0; i < width; i++) { if ((k2[1] >> i) & 0x1) { r[1] += k1[0] << i; } } return r; } ``` :::warning TODO: 應該用 `u64` 取代 `unsigned long long`, working on it ::: :::info 我認為可以用 `ctz` 加速 `multiplier`, 與其 **bit-by-bit** 檢查 **least significant bit** 是否為1, 不如直接找出所需 shift 的量. 我打算先驗證 fast fib 的正確性再來 optimize 這部分 ::: - [ ] subtractor 為了 fast fibonacci 必須實作 big number subtractor, 處理 `borrow` 之類的問題先在 `user space` 驗證, 所以順便寫了個 unit test `foo.c`: ```cmake= #include <limits.h> #include <stdio.h> #include <stdlib.h> #include <assert.h> static unsigned long long *subtractor(unsigned long long *k1, unsigned long long *k2) { /* Assume k1 >= k2, return positive, or NULL as fail */ if (k1 == NULL || k2 == NULL) return NULL; if (k1[1] < k2[1]) return NULL; if ((k1[1] == k2[1]) && (k1[0] < k2[0])) return NULL; unsigned long long *r = malloc(2 * sizeof(unsigned long long)); if (r == NULL) { printf("kmalloc error"); return NULL; } if (k1[0] < k2[0]) { /* Borrow */ k1[1] -= 1; r[0] = ULONG_MAX + 1 - k2[0] + k1[0]; r[1] = k1[1] - k2[1]; return r; } else { r[1] = k1[1] - k2[1]; r[0] = k1[0] - k2[0]; return r; } } int main(int argc, char **argv) { unsigned long long k1[2] = {0}; unsigned long long k2[2] = {0}; /* Normal Case */ k1[1] = 2; k2[1] = 1; k1[0] = 100; k2[0] = 1; unsigned long long *r = subtractor(k1, k2); if (r == NULL) printf("Negative\n"); printf("Normal Case: [%llu] [%llu] \n", r[1], r[0]); assert(r[1] == 1); assert(r[0] == 99); /* Borrow Case */ k1[1] = 2; k2[1] = 1; k1[0] = 0; k2[0] = 1; unsigned long long *t = subtractor(k1, k2); if (t == NULL) printf("Negative\n"); printf("Borrow Case: [%llu] [%llu] \n", t[1], t[0]); assert(t[1] == 0); assert(t[0] == 0xFFFFFFFFFFFFFFFF); /* Negative Case */ k1[1] = 2; k2[1] = 3; k1[0] = 0; k2[0] = 1; unsigned long long *k = subtractor(k1, k2); if (k == NULL) printf("Negative\n"); assert(k == NULL); } ``` `Makefile`: ```shell= CC = gcc CFLAGS += -g -Wall foo: foo.c $(CC) $(CFLAGS) $^ -o $@ all: foo gdb: foo gdb $^ --tui clean: $(RM) foo ``` Result: ```shell= $> make gcc -g -Wall foo.c -o foo $> ./foo Normal Case: [1] [99] Borrow Case: [0] [18446744073709551615] Negative ``` 看起來是可以的... 只能先相信了 - [ ] Fastfib 把 **fastFib** 帶到 userspace 做 unit test 驗證一下結果 ```clike= int main(int argc, char **argv) { /* Base case */ unsigned long long *f0, *f1; f0 = fast_fib(0); f1 = fast_fib(1); printf("f(0): [%llu] [%llu]\n", f0[1], f0[0]); printf("f(1): [%llu] [%llu]\n", f1[1], f1[0]); assert(f0[1] == 0 && f0[0] == 0 && f1[1] == 0 && f1[0] == 1); /* Using fast fibonacci formula case k = 2 */ unsigned long long *f2; f2 = fast_fib(2); printf("f(2): [%llu] [%llu]\n", f2[1], f2[0]); assert(f2[1] == 0 && f2[0] == 1); /* Using fast fibonacci formula case k = 3 */ unsigned long long *f3; f3 = fast_fib(3); printf("f(3): [%llu] [%llu]\n", f3[1], f3[0]); assert(f3[1] == 0 && f3[0] == 2); /* Using fast fibonacci formula case k = 4 */ unsigned long long *f4; f4 = fast_fib(4); printf("f(4): [%llu] [%llu]\n", f4[1], f4[0]); assert(f4[1] == 0 && f4[0] == 3); /* Using fast fibonacci formula case k = 47 */ /* should value = [0][2971215073] */ unsigned long long *f47; f47 = fast_fib(47); printf("f(47): [%llu] [%llu]\n", f47[1], f47[0]); assert(f47[1] == 0 && f47[0] == 2971215073); /* Using fast fibonacci formula case k = 46 */ /* should value = [0][1836311903] */ unsigned long long *f46; f46 = fast_fib(46); printf("f(46): [%llu] [%llu]\n", f46[1], f46[0]); assert(f46[1] == 0 && f46[0] == 1836311903); /* Using fast fibonacci formula case k = 93 */ /* should value = [0][12200160415121876738] */ unsigned long long *f93; f93 = fast_fib(93); printf("f(93): [%llu] [%llu]\n", f93[1], f93[0]); assert(f93[1] == 0 && f93[0] == 12200160415121876738U); /* Using fast fibonacci formula case k = 94 */ /* should value = [1][1293530146158671552] */ unsigned long long *f94; f94 = fast_fib(94); printf("f(94): [%llu] [%llu]\n", f94[1], f94[0]); assert(f94[1] == 1 && f94[0] == 1293530146158671551U); } ``` 結果是對的欸～～可喜可賀. 接下來要搬到 kernel space 囉 ... - [ ] multiplier error: Overflow handle issue 在 `fastFib` Test case f(96) 時遇到問題 `f(96) = [0][...]` 在 `k=94` 時就已經溢位了, 明顯不合理啊. 原來在 `multiplier` 我是這麼寫的 ```clike= unsigned long long t = k1[0]; (i == 0) ? (t = 0) : (t = t >> (width - i)); r[1] += t; r[0] += k1[0] << i; ``` 其中第 4 行進行運算時一旦 **overflow** 也不會自動進位所以改成 ```clike= unsigned long long tmp[2] = {0}; tmp[1] = 0; tmp[0] = k1[0] << i; r = adder(r, tmp); ``` 使用之前寫好的 `adder` 進行大數運算才對 ### 移植到 kernel ``` (fast)Reading from /dev/fibonacci at offset 94, returned the sequence 1293530146158671551 + (1 * 18446744073709551616). (fast)Reading from /dev/fibonacci at offset 95, returned the sequence 13493690561280548289 + (1 * 18446744073709551616). (fast)Reading from /dev/fibonacci at offset 96, returned the sequence 14787220707439219840 + (2 * 18446744073709551616). (fast)Reading from /dev/fibonacci at offset 97, returned the sequence 9834167195010216513 + (4 * 18446744073709551616). (fast)Reading from /dev/fibonacci at offset 98, returned the sequence 6174643828739884737 + (7 * 18446744073709551616). (fast)Reading from /dev/fibonacci at offset 99, returned the sequence 16008811023750101250 + (11 * 18446744073709551616). (fast)Reading from /dev/fibonacci at offset 100, returned the sequence 3736710778780434371 + (19 * 18446744073709551616). ``` 到 `k=100` 看起來結果都正確. 接下來要測量時間惹. ## Verify 之前老師有提示, `事先撰寫驗證工具`, 確實對於後面的驗證工作起到事半功倍效果新增了 * 一隻 script: `verify.py` * Makefile rule - [ ] verify.py ```python= expect = [0, 1] result = [] result_split = [] dics = [] for i in range(2, 101): expect.append(expect[i - 1] + expect[i - 2]) with open('result.txt', 'r') as f: tmp = f.readline() while (tmp): result.append(tmp) tmp = f.readline() f.close() for r in result: if (r.find('Reading') != -1): result_split.append(r.split(' ')) k = int(result_split[-1][5].split(',')[0]) f0 = int(result_split[-1][9]) f1 = int(result_split[-1][-3].split('(')[-1]) print('%s/%s/%s' %(k, f0, f1)) dics.append((k, f0, f1)) for i in dics: f = i[1] + i[2] * (0xFFFFFFFFFFFFFFFF + 1) if (expect[i[0]] == f): print('f(%s) sucess' % str(i[0])) else: print('f(%s) fail' % str(i[0])) print(f) print(expect[i[0]]) exit() ``` - [ ] Makefile ``` verify: verify.py client sudo ./client > result.txt $(PY) $< ``` - [ ] Result `$> Make verify` ``` f(0) sucess f(1) sucess f(2) sucess ... ``` ## How to use `clz` to optimize **fast doubling** 如何運用 `clz` 來加速 fast doubling 運算？感謝 `uno` 學長的幫忙, 我終於搞懂了先說說 [clz](https://en.wikipedia.org/wiki/Find_first_set) 的功用：算出 **The most significant bit** 前面有幾個 0 找到 **The most significant bit** 之後怎麼做呢？舉個栗子🌰 eg1: $f(10)$ 是多少呢？首先, 10 = b1010 那我們正常用 fast doubling 可以怎麼做呢？大概就是知道 $f(1)$ $f(2)$ $\rightarrow_{fast,doubling}$ $f(2) f(3)$ $\rightarrow_{fast,doubling}$ $f(4) f(5)$ $\rightarrow_{fast,doubling}$ $\rightarrow_{fast,doubling}$ $f(8) f(9)$ 再利用 $f(8) f(9)$ 直接 iterate 出 $f(10)$ 但是如果今天 $k=60$ 呢？那不就利用 **fast doubling** 算出 $f(32) f(33)$ 一直 iterate 到 $f(60)$ ? 回到 $k=10=0b1010$ 這個例子我們可以 **0b10=2** 代表先利用 **fast doubling** 算出$f(2) f(3)$ 接著遇到 1 $\rightarrow$ $0b101=5$ 代表先 **fast doubling** 到 $f(4)$ 然後 interate 算出 $f(5) f(6)$ 接著遇到 0 $\rightarrow$ 所以 **fast doubling** $f(10) f(11)$ eg2: $f(13)=?$ 首先 $13=b1101$ b1 -> f(1) f(2) b11 -> f(2) f(3) -> f(3) f(4) b110 -> f(6) f(7) b1101 -> f(12) f(13) -> f(13) f(14) summary: * 找到 **The most significant bit** * 由左而右 * 遇到 0: 直接 **fast doubling** * 遇到 1: 先 **fast doubling**, 再往前 **iterate** 1 個 ## Learning list * `MODULE_LICENSE`, `MODULE_AUTHOR`, `MODULE_DESCRIPTION`, `MODULE_VERSION` 舉例來說: ```clike= #define MODULE_LICENSE(_license) MODULE_INFO(license, _license) ``` https://www.ibm.com/developerworks/cn/linux/l-cn-kernelmodules/index.html ``` ``` ## Reference * [What does insmod internally do?](https://www.quora.com/What-does-insmod-internally-do) * [深入淺出 insmod](http://www.jollen.org/blog/2006/11/hello_world_insmod_1.html)