# 2020q2 Homework2 (fibdrv) contributed by < `nickyanggg` > ###### tags: `linux2020` ## 作業說明 - [fibdrv](https://hackmd.io/@sysprog/linux2020-fibdrv#H03-fibdrv) ## 實驗環境 ```shell $ uname -a Linux nickyanggg-ubuntu18 5.3.0-40-generic #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 ``` ## 環境設定 - 獨立出一顆專門只做效能分析的 cpu,以避免受到其他 process 干擾。 - 利用 `taskset` 檢查是否已經空出某特定 cpu。 - 關閉 [address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization) (ASLR)。 - 設定前面獨立出的 cpu 的 `scaling_governor` 為 performance。 - 針對 Intel 處理器,關閉 turbo mode。 ```shell $ taskset -cp 1 pid 1's current affinity list: 0-5 ``` 在 `/boot/grub/grub.cfg` 新增 `isolcpus=1`,並重新開機。 ```shell $ taskset -cp 1 pid 1's current affinity list: 0,2-5 $ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" $ sudo sh -c "echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor" $ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" ``` ## Fibonacci - 參考 [費氏數列](https://hackmd.io/CTwO5ldOSgKxP-N4YNRVOA?view#-%E8%B2%BB%E6%B0%8F%E6%95%B8%E5%88%97)、[Fast Doubling](https://chunminchang.github.io/blog/post/calculating-fibonacci-numbers-by-fast-doubling)。 - 藉由下面這二條公式實作 `fast_fib` 函式。 $$ \begin{split} F(2k) &= F(k)[2F(k+1) - F(k)] \\ F(2k+1) &= F(k+1)^2+F(k)^2 \end{split} $$ - 比較 `fast_fib` 和使用 dynamic programming 方式實作的 `dp_fib` 的 performance。 - 實驗時利用 `taskset` 去指定被閒置的 cpu 來跑。 ### fast_fib ```cpp unsigned long long fast_fib(int n) { unsigned long long a = 0, b = 1, t1, t2; for (int i = 31; i >= 0; i--) { t1 = a * (2 * b - a); t2 = b * b + a * a; a = t1; b = t2; if ((n >> i) & 1) { t1 = a + b; a = b; b = t1; } } return a; } ``` ### dp_fib ```cpp unsigned long long dp_fib(int n) { unsigned long long f[n+2]; f[0] = 0; f[1] = 1; for (int i = 2; i <= n; i++) f[i] = f[i-1] + f[i-2]; return f[n]; } ``` ### 實驗比較 ![](https://i.imgur.com/1pmCi6X.png) - `fast_fib` 在大概計算到 fib(50) 以後效能才比 `dp_fib` 優。 - `fast_fib` 在代入 $n=0$ ~ $2^{32}$ 效能皆相同,因為都是從 most significant bit 開始,也就是從第 31 位開始直到算到第 0 位。 - improve `fast_fib`,利用 `__builtin_clz` 來推導出非 0 的第一個 bit 的 position,使 time complexity = $O(logn)$。 ### faster_fib ```cpp unsigned long long faster_fib(int n) { unsigned long long a = 0, b = 1, t1, t2; int i = 31 - __builtin_clz(n); for (; i >= 0; i--) { t1 = a * (2 * b - a); t2 = b * b + a * a; a = t1; b = t2; if ((n >> i) & 1) { t1 = a + b; a = b; b = t1; } } return a; } ``` ### 實驗結果 ![](https://i.imgur.com/sI3vqT9.png) - `faster_fib` 效能明顯提升。 - 由於 n 目前最大值只代入 92,因此並不能明顯看出 `faster_fib` = $O(logn)$。 - 以 `faster_fib` 來實作 `fibdrv.c` 中的 `fib_sequence`。 ## Fibonacci Driver - fib(93) 開始超過 `long long` 儲存範圍,導致溢位。 - 實作可以儲存超過 64 bits 的資料結構。 - 實作 `bignum` 的 operations,包括加法、減法、乘法。 ### bignum - 原本打算參考只儲存 lower bits、upper bits 的方式,但是在思考之後要轉成十進位時實作會較困難,再加上此結構只能儲存 128 bits,因此更改成直接儲存十進位的 char 陣列。 - `MAX_DIGIT` 代表著 `bignum` 提供到幾位數,`MAX_DIGIT` 設為 100 能夠提供 `bignum` 99 位數 (最後一個 byte 要儲存結束字元)。 - `decimal` 儲存數字 (十進制),並且是從個位數往高位儲存,也就是 `decimal[0]` 儲存個位數、`decimal[1]` 儲存十位數 ...。 - `length` 代表著目前儲存的數字轉成字串的長度,也代表著此數字是幾位數。 - 不考慮負數 (unsigned),在 fibonacci 中並不會出現負數,且運算過程中也不會出現負數的狀況。 ```cpp #define MAX_DIGIT 100 typedef struct { char decimal[MAX_DIGIT]; int length; } bignum; ``` ### new ```cpp #define bn_tmp(x) \ bn_new(&(bignum) { .length = 0 }, x) bignum *bn_new(bignum *bn, char decimal[]) { bn->length = strlen(decimal); memcpy(bn->decimal, decimal, bn->length); return bn; } ``` ### add - `OFFSET` 為 '0' 的 ascii (十進位)。 - 為了方便運算,使 `x->length` $\geq$ `y->length`。 - 實作方式大概是從個位數開始往高位去一位一位相加,並且要考慮 `carry` (進位)。 - `decimal` 中的字元和 `int` 比較時要考慮到 `OFFSET`,二個字元相加時也要把多餘的 `OFFSET` 扣掉。 ```cpp #define OFFSET 48 void bn_add(bignum *x, bignum *y, bignum *dest) { if (x->length < y->length) return bn_add(y, x, dest); int carry = 0; for (int i = 0; i < y->length; i++) { dest->decimal[i] = y->decimal[i] + x->decimal[i] + carry - OFFSET; if (dest->decimal[i] >= 10 + OFFSET) { carry = 1; dest->decimal[i] -= 10; } else carry = 0; } for (int i = y->length; i < x->length; i++) { dest->decimal[i] = x->decimal[i] + carry; if (dest->decimal[i] >= 10 + OFFSET) { carry = 1; dest->decimal[i] -= 10; } else carry = 0; } dest->length = x->length; dest->decimal[dest->length] = carry + OFFSET; if (carry) dest->length++; dest->decimal[dest->length] = '\0'; } ``` ### subtract - 由於已知不會出現負數的狀況,因此 `bn_subtract` 是以 x 所代表的數字 > y 所代表的數字為前提。 - 實作概念是先從最高位相減,直到 x 的長度 $>$ y 的長度,接著再從 x 運算後的最高位提出 1 出來,變成每個位數都是 9 的數字以及 1,拿前者去和 y 運算後的結果去做相減,此時並不用考慮是否需要借位,因為前者每位數字皆為 9,最後再呼叫 `bn_add` 去加回 1 以及剩下的 x。 e.g. $123456 - 121789 \\= 2456 - 789 \\= (1456 + 999 + 1) - 789 \\= (999 - 789) + 1456 + 1 \\= 210 + 1456 + 1 \\= 1667$ ```cpp void bn_subtract(bignum *x, bignum *y, bignum *dest) { bignum *x_copy = bn_tmp(x->decimal); if (x->length == y->length) { int length = x->length; while (x->decimal[length - 1] == y->decimal[length - 1]) { length--; if (length == 1) { dest->length = 1; dest->decimal[0] = x->decimal[0] - y->decimal[0]; } } x_copy->decimal[length - 1] = x->decimal[length - 1] - y->decimal[length - 1] + OFFSET; x_copy->decimal[length] = '\0'; x_copy->length = length; } char nine[MAX_DIGIT]; for (int i = 0; i < x->length - 1; i++) nine[i] = '9'; nine[x_copy->length - 1] = '\0'; bignum *nines = bn_tmp(nine); if (!(--x_copy->decimal[x_copy->length - 1])) x_copy->length--; for (int i = 0; i < y->length; i++) { nines->decimal[i] -= y->decimal[i]; nines->decimal[i] += OFFSET; } bn_add(x_copy, nines, x_copy); bn_add(bn_tmp("1"), x_copy, dest); } ``` ### multiply ```cpp void bn_multiply(bignum *x, bignum *y, bignum *dest) { if (x->length < y->length) return bn_multiply(y, x, dest); bignum *sum = bn_tmp("0"); for (int i = 0; i < y->length; i++) { bignum *tmp = bn_tmp("0"); for (int j = 0; j < y->decimal[i] - OFFSET; j++) bn_add(x, tmp, tmp); for (int j = tmp->length; j >= 0; j--) tmp->decimal[j + i] = tmp->decimal[j]; for (int j = 0; j < i; j++) tmp->decimal[j] = '0'; tmp->length += i; bn_add(tmp, sum, sum); } dest->length = sum->length; for (int i = 0; i <= sum->length; i++) dest->decimal[i] = sum->decimal[i]; } ``` ### fib_sequence - 實作方式參考上面的 `faster_fib`。 - 要注意 `k` 為 0 的狀況,因為 `__builtin_clz(0)` 的回傳結果為 `undefined`。 ```cpp bignum *fib_sequence(int k) { /* FIXME: use clz/ctz and fast algorithms to speed up */ bignum *a = bn_tmp("0"), *b = bn_tmp("1"); bignum *t1 = bn_tmp("0"), *t2 = bn_tmp("0"); bignum *tmp = bn_tmp("2"); if (k == 0) return a; int i = 31 - __builtin_clz(k); for (; i >= 0; i--) { bn_multiply(bn_tmp("2"), b, tmp); bn_subtract(tmp, a, tmp); bn_multiply(a, tmp, t1); bn_multiply(b, b, tmp); bn_multiply(a, a, t2); bn_add(tmp, t2, t2); for (int j = 0; j < t2->length; j++) { a->decimal[j] = t1->decimal[j]; b->decimal[j] = t2->decimal[j]; } a->length = t1->length; b->length = t2->length; if ((k >> i) & 1) { bn_add(a, b, t1); for (int j = 0; j < t1->length; j++) { a->decimal[j] = b->decimal[j]; b->decimal[j] = t1->decimal[j]; } a->length = b->length; b->length = t1->length; } } return a; } ``` ### fib_read - `rev_fib` 儲存的值是以十進位反向存起來的,因此將結果反轉。 - 呼叫 [copy_to_user](https://www.fsl.cs.sunysb.edu/kernel-api/re256.html),將結果傳遞到 user space。 ```cpp static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { char *rev_fib = fib_sequence(*offset)->decimal; int length = strlen(rev_fib); char result[MAX_DIGIT]; for (int i = 0; i < length; i++) result[i] = rev_fib[length - 1 - i]; result[length] = '\0'; copy_to_user(buf, result, length + 1); return 0; } ``` ### 結果 - 順利的傳遞到 `client`。 - 比對 [fibonacci](http://www.protocol5.com/Fibonacci/) 結果一致。 ```shell Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120. Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738. Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167. Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905. Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072. Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977. Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049. Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026. Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075. Reading from /dev/fibonacci at offset 100, returned the sequence 354224848179261915075. Reading from /dev/fibonacci at offset 99, returned the sequence 218922995834555169026. Reading from /dev/fibonacci at offset 98, returned the sequence 135301852344706746049. Reading from /dev/fibonacci at offset 97, returned the sequence 83621143489848422977. Reading from /dev/fibonacci at offset 96, returned the sequence 51680708854858323072. Reading from /dev/fibonacci at offset 95, returned the sequence 31940434634990099905. Reading from /dev/fibonacci at offset 94, returned the sequence 19740274219868223167. Reading from /dev/fibonacci at offset 93, returned the sequence 12200160415121876738. Reading from /dev/fibonacci at offset 92, returned the sequence 7540113804746346429. Reading from /dev/fibonacci at offset 91, returned the sequence 4660046610375530309. Reading from /dev/fibonacci at offset 90, returned the sequence 2880067194370816120. ``` ### 觀察一 - 在 Linux kernel module 使用 [ktime](https://blog.csdn.net/tiantao2012/article/details/75633389)。 - 在 userspace 使用 [clock_gettime](https://starforcefield.wordpress.com/2012/08/06/c%E8%AA%9E%E8%A8%80%EF%BC%9A%E4%BD%BF%E7%94%A8clock_gettime%E8%A8%88%E7%AE%97%E7%A8%8B%E5%BC%8F%E7%A2%BC%E7%9A%84%E6%99%82%E9%96%93%E9%9C%80%E6%B1%82/)。 - 相減後大部分耗費的時間為 `copy_to_user`,從下圖可知隨著 size 增加,要從 kernel space 傳遞到 user space 的資料大小也會隨之增加,然而耗費的時間卻差不多。 ![](https://i.imgur.com/2XDZ2t7.png) ### 觀察二 - 嘗試代入更大的數字進去。 - 發現仍然有正常的傳回 user space,然而卻在最後 `return 0` 的時候 crash 了。 - 推測可能是呼叫 `read` 時,`copy_to_user` 傳到 `buf` 的內容過多,造成 buffer overflow,而因為 [canary](https://ctf-wiki.github.io/ctf-wiki/pwn/linux/mitigation/canary-zh/) 保護機制,因此提前終止。 ```shell ... Reading from /dev/fibonacci at offset 997, returned the sequence 10261062362033262336604926729245222132668558120602124277764622905699407982546711488272859468887457959087733119242564077850743657661180827326798539177758919828135114407499369796465649524266755391104990099120377. Reading from /dev/fibonacci at offset 998, returned the sequence 16602747662452097049541800472897701834948051198384828062358553091918573717701170201065510185595898605104094736918879278462233015981029522997836311232618760539199036765399799926731433239718860373345088375054249. Reading from /dev/fibonacci at offset 999, returned the sequence 26863810024485359386146727202142923967616609318986952340123175997617981700247881689338369654483356564191827856161443356312976673642210350324634850410377680367334151172899169723197082763985615764450078474174626. Reading from /dev/fibonacci at offset 1000, returned the sequence 43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875. *** stack smashing detected ***: <unknown> terminated Aborted ``` 使用 gdb 觀察,下斷點在 main,並觀察剛進 main function 的 stack 樣貌,可看出最後在執行 main function 在 return 時,會去取 0x7fffffffe418 的值,也就是跳到 0x7ffff7a05b97。 ```shell [------------------------------------stack-------------------------------------] 0000| 0x7fffffffe410 --> 0x555555554a30 (<__libc_csu_init>: push r15) 0008| 0x7fffffffe418 --> 0x7ffff7a05b97 (<__libc_start_main+231>: mov edi,eax) 0016| 0x7fffffffe420 --> 0x1 0024| 0x7fffffffe428 --> 0x7fffffffe4f8 --> 0x7fffffffe764 ("/home/nickyanggg/linux2020/fibdrv/client") 0032| 0x7fffffffe430 --> 0x100008000 0040| 0x7fffffffe438 --> 0x5555555548ea (<main>: push rbp) 0048| 0x7fffffffe440 --> 0x0 0056| 0x7fffffffe448 --> 0xcaa5ca8b917e084a [------------------------------------------------------------------------------] ``` 使用 `disas main` 觀察 main function,可推測最後在 `je 0x555555554a2e` 我們會繼續執行下去,而不會直接跳到 `leave`,因此就進 `call 0x555555554750 <__stack_chk_fail@plt>`。將斷點下在 `0x0000555555554a27` 並繼續執行。 ```shell 0x0000555555554a0b <+289>: mov eax,DWORD PTR [rbp-0x34] 0x0000555555554a0e <+292>: mov edi,eax 0x0000555555554a10 <+294>: call 0x555555554780 <close@plt> 0x0000555555554a15 <+299>: mov eax,0x0 0x0000555555554a1a <+304>: mov rcx,QWORD PTR [rbp-0x8] 0x0000555555554a1e <+308>: xor rcx,QWORD PTR fs:0x28 0x0000555555554a27 <+317>: je 0x555555554a2e <main+324> 0x0000555555554a29 <+319>: call 0x555555554750 <__stack_chk_fail@plt> 0x0000555555554a2e <+324>: leave 0x0000555555554a2f <+325>: ret ``` 執行後,觀察目前 stack 狀態不難看出 stack 已經被搞爆了,除了將 return 的 address 給覆蓋過去,也將 canary 的值給蓋過,因此觸發。 ```shell 0000| 0x7fffffffe3d0 --> 0x3e9000003e9 0008| 0x7fffffffe3d8 --> 0x3000003e8 0016| 0x7fffffffe3e0 --> 0x0 0024| 0x7fffffffe3e8 --> 0x3400000000000000 ('') 0032| 0x7fffffffe3f0 ("34665576869374564356885276750406258025646605173717804024817290895365554179490518904038798400792551692959225930803226347752096896232398733224711616429964409065331879382989696499285160037044761377951668"...) 0040| 0x7fffffffe3f8 ("86937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875") 0048| 0x7fffffffe400 ("435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875") 0056| 0x7fffffffe408 ("7675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875") 0064| 0x7fffffffe410 ("25802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875") 0072| 0x7fffffffe418 ("660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875") 0080| 0x7fffffffe420 ("1780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875") ``` 後來發現原來是沒有將 `client.c` 中一開始初始化 `buf` 時配給它的空間增大,導致複製過來的資料大小超出預期,因此造成 buffer overflow,更改後程式也可以順利的執行完畢。