2022q1 Homework3 (fibdrv)

--- tags: linux2022 --- # 2022q1 Homework3 (fibdrv) contributed by < `oucs638` > > [Assignment](https://hackmd.io/@sysprog/linux2022-fibdrv) > [oucs638/fibdrv](https://github.com/oucs638/fibdrv) ## Development Environment :::spoiler Compiler ```shell $ gcc --version gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 ``` ::: :::spoiler Hardware ```shell $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 39 bits physical, 48 bits virtual CPU(s): 12 On-line CPU(s) list: 0-11 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz Stepping: 13 CPU MHz: 2600.000 CPU max MHz: 4500.0000 CPU min MHz: 800.0000 BogoMIPS: 5199.98 Virtualization: VT-x L1d cache: 192 KiB L1i cache: 192 KiB L2 cache: 1.5 MiB L3 cache: 12 MiB NUMA node0 CPU(s): 0-11 ``` ::: :::spoiler Linux kernel version ```shell $ uname -r 5.13.0-30-generic ``` ::: :::spoiler Install `linux-headers` package. ```shell $ sudo apt install linux-headers-`uname -r` $ dpkg -L linux-headers-5.13.0-30-generic | grep "/lib/modules" /lib/modules /lib/modules/5.13.0-30-generic /lib/modules/5.13.0-30-generic/build ``` ::: :::spoiler Confirm user identity ```shell $ whoami oucs638 $ sudo whoami root ``` ::: :::spoiler Install tools ```shell $ sudo apt install util-linux strace gnuplot-nox ``` ::: ## Exclude factors that affect performance analysis - Isolate a CPU from the kernel scheduler. ```shell $ sudo vim /etc/default/grub ... // modify GRUB_CMDLINE_LINUX GRUB_CMDLINE_LINUX="isolcous=11" ``` - Update grub and reboot. ```shell $ sudo update-grub $ reboot ``` - Check whether the CPU was successfully isolated. ```shell $ taskset -cp 1 pid 1's current affinity list: 0-10 ``` - Restrain address space layout randomization (ASLR) ```shell $ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" ``` - Create a shell script named `performance.sh` to set `scaling_governor` as `performance`. ```shell // performance.sh for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor do echo performance > ${i} done // execute script $ sudo sh performance.sh ``` - For Intel processors, turn off turbo mode. ```shell $ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" ``` ## Use class `clz/ctz` instructions to improve Fibonacci number operations. - Definition of Fibonacci number: $$ \left \{ \begin{split} & F_0 = 0, F_1 = 1 \\ & F_n = F_{n-1} + F_{n-2} , n \ge 2 \end{split} \right. $$ - Fibonacci number can be expressed as: $$ \vec{F_n} = \begin{bmatrix} F_n \\ F_{n-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} F_{n-1} \\ F_{n-2} \end{bmatrix} $$ - Implement in C code: ```c int F(int n) { if (n == 0 || n == 1) return n; return F(n - 1) + F(n - 2); } ``` - Assuming `n = 6`, the result of the program executing the function calls are as follows: ```graphviz strict digraph G { 1[label="F(6)"] 2[label="F(4)"] 3[label="F(5)"] 4[label="F(2)"] 5[label="F(3)"] 6[label="F(3)"] 7[label="F(4)"] 8[label="F(0)", style=filled] 9[label="F(1)", style=filled] 10[label="F(1)", style=filled] 11[label="F(2)"] 12[label="F(1)", style=filled] 13[label="F(2)"] 14[label="F(2)"] 15[label="F(3)"] 16[label="F(0)", style=filled] 17[label="F(1)", style=filled] 18[label="F(0)", style=filled] 19[label="F(1)", style=filled] 20[label="F(0)", style=filled] 21[label="F(1)", style=filled] 22[label="F(1)", style=filled] 23[label="F(2)", style=filled] 24[label="F(0)", style=filled] 25[label="F(1)", style=filled] 1 -> {2, 3} 2 -> {4, 5} 3 -> {6, 7} 4 -> {8, 9} 5 -> {10, 11} 6 -> {12, 13} 7 -> {14, 15} 11 -> {16, 17} 13 -> {18, 19} 14 -> {20, 21} 15 -> {22, 23} 23 -> {24, 25} } ``` - Obviously there are a lot of unnecessary repeated function calls. - Refer to [Calculating Fibonacci Numbers by Fast Doubling](https://chunminchang.github.io/blog/post/calculating-fibonacci-numbers-by-fast-doubling), the Fibonacci number can be redefined as: $$ \left \{ \begin{split} & F_0 = 0, F_1 = 1, F_2 = 1 \\ & F_{2n + 1} = {F_{n+1}}^2 + {F_{n}}^2 \\ & F_{2n} = F_n \cdot (2 \cdot F_{n + 1} - F_n) \\ & n \ge 0 \end{split} \right. $$ - Implement in C code: ```c int F(int n) { if (n == 0 || n == 1) return n; if (n == 2) return 1; int k = 0; if (n % 2) { k = (n - 1) / 2; return F(k) * F(k) + F(k + 1) * F(k + 1); } else { k = n / 2; return F(k) * (2 * F(k + 1) - F(k)); } } ``` - Assuming `n = 6`, the result of the program executing the function calls are as follows: ```graphviz strict digraph G { 1[label="F(6)"] 2[label="F(3)"] 3[label="F(4)"] 4[label="F(1)", style=filled] 5[label="F(2)", style=filled] 6[label="F(2)", style=filled] 7[label="F(3)"] 8[label="F(1)", style=filled] 9[label="F(2)", style=filled] 1 -> {2, 3} 2 -> {4, 5} 3 -> {6, 7} 7 -> {8, 9} } ``` - The Fast Doubling method can significantly reduce thr function calls required for program execution. - Because computers represent numbers in binary, we can start from MSB to determine which operation to perform. - Find `0` : Calculate $F_{2n}$ and $F_{2n + 1}$. - Find `1` : Calculate $F_{2n}$ and $F_{2n + 1}$ first, then calculate $F_{2n + 2}$. - But the actual number would not have lead 0s, so we can use class `ctz` instructions to count how many lead 0s and skip them. ## Development Progress Records - Test. ```shell $ make check make -C /lib/modules/5.13.0-30-generic/build M=/home/oucs638/GitHub/fibdrv modules make[1]: Entering directory '/usr/src/linux-headers-5.13.0-30-generic' make[1]: Leaving directory '/usr/src/linux-headers-5.13.0-30-generic' make unload make[1]: Entering directory '/home/oucs638/GitHub/fibdrv' sudo rmmod fibdrv || true >/dev/null rmmod: ERROR: Module fibdrv is not currently loaded make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv' make load make[1]: Entering directory '/home/oucs638/GitHub/fibdrv' sudo insmod fibdrv.ko make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv' sudo ./client > out make unload make[1]: Entering directory '/home/oucs638/GitHub/fibdrv' sudo rmmod fibdrv || true >/dev/null make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv' Passed [-] f(93) fail input: 7540113804746346429 expected: 12200160415121876738 ``` - Observe the resulting `fibdrv.ko` module. ```shell $ modinfo fibdrv.ko filename: fibdrv.ko version: 0.1 description: Fibonacci engine driver author: National Cheng Kung University, Taiwan license: Dual MIT/GPL srcversion: 9A01E3671A116ADA9F2BB0A depends: retpoline: Y name: fibdrv vermagic: 5.13.0-30-generic SMP mod_unload modversions ``` - Observe the behavior of the `fibdrv.ko` after the linux kernel is mounted. ```shell $ sudo insmod fibdrv.ko $ ls -l /dev/fibonacci crw------- 1 root root 510, 0 Mar 7 01:50 /dev/fibonacci $ cat /sys/class/fibonacci/fibonacci/dev 510:0 $ cat /sys/module/fibdrv/version 0.1 $ lsmod | grep fibdrv fibdrv 16384 0 $ cat /sys/module/fibdrv/refcnt 0 ``` ### Calculate the Fibonacci number after the 93rd item (inclusive) - Refer to [assignment](https://hackmd.io/PMozqaxqRA-jF2KmgTZcPQ?view#%E8%A8%88%E7%AE%97-F93-%E5%8C%85%E5%90%AB-%E4%B9%8B%E5%BE%8C%E7%9A%84-Fibonacci-%E6%95%B8---%E4%BD%BF%E7%94%A8%E6%95%B8%E5%AD%97%E5%AD%97%E4%B8%B2%E4%B8%A6%E5%A5%97%E7%94%A8-quiz2-SSO-Small-String-Optimization), attempt to introduce big number that operate on string-number. - Use [Small/Short String Optimization](http://wdv4758h.github.io/notes/cpp/string-optimization.html). - Refer to the [example code](https://github.com/AdrianHuang/fibdrv/tree/big_number) provided in the assignment. - The example code only use string-number addition to calculate the Fibonacci number. - Expect to add a string-number subtraction and multiplication based on the addition-related code of the example code. - With string-number substraction and multiplication, fast doubling can be intoduced to calculate Fibonacci numbers. - But in this part, the Fibonacci number will be calculated by addtion first, and fast doubling will be introduced in the later part. - Because C99 variable-length array(VLA) is not allowed in Linux kernel, so I use `kmalloc` to allocate memory space for the array dynamically. ```shell ... /home/oucs638/GitHub/fibdrv/fibdrv.c: In function ‘fib_sequence’: /home/oucs638/GitHub/fibdrv/fibdrv.c:70:5: warning: ISO C90 forbids variable length array ‘f’ [-Wvla] 70 | bn f[k + 2]; | ^~ /home/oucs638/GitHub/fibdrv/fibdrv.c: In function ‘bn_add’: /home/oucs638/GitHub/fibdrv/fibdrv.c:263:5: warning: ISO C90 forbids variable length array ‘buf’ [-Wvla] 263 | char buf[size_a + 2]; | ^~~~ ... ``` - After calculate the Fibonacci number, the allocated memory space should be freed. ```c static int fib_sequence(long long k, char __user *buf) { /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */ // bn f[k + 2]; bn *f = (bn *) kmalloc((k + 2) * sizeof(bn), GFP_KERNEL); f[0] = *bn_tmp("0"); f[1] = *bn_tmp("1"); for (int i = 2; i <= k; i++) bn_add(&f[i - 1], &f[i - 2], &f[i]); int res = bn_size(&f[k]); if (copy_to_user(buf, bn_data(&f[k]), res)) return -EFAULT; for (int i = 0; i <= k; i++) bn_free(&f[i]); return res; } ``` - String-number addition will reverse the string first, then add digit one by one from the least significant digit to the most significant digit. ```c static void bn_add(bn *a, bn *b, bn *out) { char *data_a, *data_b; size_t size_a, size_b; int i, s, c = 0; if (bn_size(a) < bn_size(b)) __swap((void *) &a, (void *) &b, sizeof(void *)); data_a = bn_data(a); data_b = bn_data(b); size_a = bn_size(a); size_b = bn_size(b); reverse_str(data_a, size_a); reverse_str(data_b, size_b); // char buf[size_a + 2]; char *buf = (char *) kmalloc((size_a + 2), GFP_KERNEL); for (i = 0; i < size_b; i++) { s = (data_a[i] - '0') + (data_b[i] - '0') + c; buf[i] = '0' + s % 10; c = s / 10; } for (i = size_b; i < size_a; i++) { s = (data_a[i] - '0') + c; buf[i] = '0' + s % 10; c = s / 10; } if (c) buf[i++] = '0' + c; buf[i] = 0; reverse_str(buf, i); reverse_str(data_a, size_a); reverse_str(data_b, size_b); if (out) *out = *bn_tmp(buf); } ``` ### Calculating Fibonacci Numbers by Fast Doubling (big number) - I found it challenging to overcome positive-negtive sign problem issues when implementing "fasting doubling" with string numbers. - It needs a lot of reallocating memory space to implement big-number operations with string number. - Refer to [eecheng87](https://github.com/eecheng87/fibdrv), rebuilt the big number operation. - Preset a space large enough for Fibonacci number value. - As [previous part](https://hackmd.io/4u8VJAWiTEafpMav-6BTuA?both#Use-class-clzctz-instructions-to-improve-Fibonacci-number-operations), the Fibonacci number can be represented as: $$ \left \{ \begin{split} & F_0 = 0, F_1 = 1, F_2 = 1 \\ & F_{2n + 1} = {F_{n+1}}^2 + {F_{n}}^2 \\ & F_{2n} = F_n \cdot (2 \cdot F_{n + 1} - F_n) \\ & n \ge 0 \end{split} \right. $$ - Expressed as functions: $$ \left \{ \begin{split} f(0) &= 0 = f(k) \\ f(1) &= 1 = f(k + 1) \\ f(2k) &= f(k) \cdot [2 \cdot f(k + 1) - f(k)] \\ &= 2 \cdot f(k) \cdot f(k + 1) - f(k) \cdot f(k) \\ f(2k + 1) &= {f(k)}^2 + {f(k + 1)}^2 \\ &= f(k) \cdot f(k) + f(k + 1) \cdot f(k + 1)\\ k &\ge 0 \\ \end{split} \right. $$ - Assume: $$ \left \{ \begin{split} t(0) &= f(k) \cdot f(k) \\ t(1) &= f(k + 1) \cdot f(k + 1) \\ f(2) &= 2 \cdot f(k) \cdot f(k + 1) \\ \end{split} \right. $$ - Because computers represent numbers in binary, we can start from MSB to determine which operation to perform. - Find `1` : next $k$ is odd. $$ \left \{ \begin{split} next\ \ \ \ k &= 2k + 1 \\ f(k) &= f(2k + 1) \\ &= t(0) + t(1) \\ f(k + 1) &= f(2k + 2) \\ &= f(2k) + f(2k + 1) \\ &= t(2) - t(0) + t(0) + t(1)\\ &= t(2) + t(1) \end{split} \right. $$ - Find `0` : next $k$ is even. $$ \left \{ \begin{split} next\ \ \ \ k &= 2k \\ f(k) &= f(2k) \\ &= t(2) - t(0) \\ f(k + 1) &= f(2k + 1) \\ &= t(0) + t(1) \end{split} \right. $$ - But the actual number would not have lead 0s, so we can use class `ctz` instructions to count how many lead 0s and skip them. - Use `__builtin_clzll(k)` for `typeof(k) = unsigned long long` ```c static bn fib_sequence(unsigned long long k) { bn f[2]; bn_init(&f[0], 0); // f(k) bn_init(&f[1], 1); // f(k + 1) if (k < 2) return f[k]; // t(0) = f(k) * f(k) // t(1) = f(k + 1) * f(k + 1) // t(2) = 2 * f(k) * f(k + 1) bn t[3]; // fast doubling // f(2k) = f(k) * [2 * f(k + 1) - f(k)] // = 2 * f(k) * f(k + 1) - f(k) * f(k) // = t(2) - t(0) // f(2k + 1) = f(k) ^ 2 + f(k + 1) ^ 2 // = f(k) * f(k) + f(k + 1) * f(k + 1) // = t(0) + t(1) for (unsigned long long i = 1U << (63 - __builtin_clzll(k)); i; i >>= 1) { bn_mul(&f[0], &f[0], &t[0]); // t(0) bn_mul(&f[1], &f[1], &t[1]); // t(1) bn_mul(&f[0], &f[1], &t[2]); bn_add(&t[2], &t[2], &t[2]); // t(2) if (i & k) { // next k = 2k + 1 // f(k) = f(2k + 1) = t(0) + t(1) bn_add(&t[0], &t[1], &f[0]); // f(k + 1) = f(2k + 2) = f(2k) + f(2k + 1) // = t(2) - t(0) + t(0) + t(1) // = t(2) + t(1) bn_add(&t[2], &t[1], &f[1]); } else { // next k = 2k // f(k) = f(2k) bn_sub(&t[2], &t[0], &f[0]); // f(k + 1) = f(2k + 1) bn_add(&t[0], &t[1], &f[1]); } } // return f(k) return f[0]; } ``` ## Analyze - Use big number. ![](https://i.imgur.com/cyltHbr.png) - Use builtin `int128`. ![](https://i.imgur.com/wgNBRxw.png) - Found that if used builtin `int128`, the execution time would have unpredictable peaks.