# 2023q1 Homework3 (fibdrv) contributed by < [slipet](https://github.com/slipet/fibdrv) > <details> <summary><font color=darkred>開發環境</font></summary> ``` $ gcc --version gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 $lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 48 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: AuthenticAMD Model name: AMD Ryzen 5 5600X 6-Core Processor CPU family: 25 Model: 33 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Stepping: 0 Frequency boost: enabled CPU max MHz: 4650.2920 CPU min MHz: 2200.0000 BogoMIPS: 7399.35 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_ts c rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_ lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale v mcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm Virtualization features: Virtualization: AMD-V Caches (sum of all): L1d: 192 KiB (6 instances) L1i: 192 KiB (6 instances) L2: 3 MiB (6 instances) L3: 32 MiB (1 instance) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-11 Vulnerabilities: Gather data sampling: Not affected Itlb multihit: Not affected L1tf: Not affected Mds: Not affected Meltdown: Not affected Mmio stale data: Not affected Retbleed: Not affected Spec rstack overflow: Mitigation; safe RET, no microcode Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected Srbds: Not affected Tsx async abort: Not affected ``` </details> ## Checklist - [ ] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [ ] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] 學習指出針對大數運算的加速運算和縮減記憶體操作成本的舉措,紀錄你的認知和疑惑 - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。 $\to$ 搭配閱讀〈[並行和多執行緒程式設計](https://hackmd.io/@sysprog/concurrency)〉 ## Experiment procedure ### Modification to calculate Fib(n), when n > 92 In the original design, the Fibonacci sequence is stored in a `long long` type array, which will overflow after fib(92). In this report, I followed the recommendation from [link](https://hackmd.io/@sysprog/linux2023-fibdrv-b#%E8%A8%88%E7%AE%97-F93-%E5%8C%85%E5%90%AB-%E4%B9%8B%E5%BE%8C%E7%9A%84-Fibonacci-%E6%95%B8-%E5%8A%9F%E8%83%BD%E5%8F%97%E9%99%90) to implement two methods to deal with this issue: #### 1. GCC `__uint128` type: Declare the array for recording Fibonacci sequence using `__uint128` type. The key difference between the `long long` and `__uint128` types is that `printf()` cannot directly output the result with `__uint128` type. As the result of this, we need to convert the number into a string and pass it to the user process through a buffer. <details> <summary><font color=darkred>Details</font></summary> ```c static size_t fib_sequence(long long k, char *buf) { uint128_t f[k + 2]; f[0] = 0; f[1] = 1; for (int i = 2; i <= k; i++) { f[i] = f[i - 1] + f[i - 2]; } char *result = uint128_to_string(f[k]); size_t len = uint128_len(f[k]); __reverse(result, len); if (copy_to_user(buf, result, len)) return -EFAULT; kfree(result); return len; } ``` </details> `uint128_to_string` and `uint128_len` are the functions associated with `uint128_t` type. Below are the details of the implementation: <details> <summary><font color=darkred>Details</font></summary> ```c static size_t uint128_len(uint128_t x) { size_t size = 0; while (x) { size++; x /= 10; } size += (size == 0); // digit len can not less than 1 return size; } static char *uint128_to_string(uint128_t x) { char *str = (char *) kmalloc(uint128_len(x) + 1, GFP_KERNEL); char *p = str; *p = '0'; // initilize the string to zero; while (x) { *p++ = x % 10 + '0'; x /= 10; } return str; } ``` </details> So far, we are able to accurately calculate the Fibonacci sequence up to n = 100; however, we still encounter overflow after n = 187. ```shell ##Modify verify.py to calculate fibonacci sequence til fib(200) f(187) fail input: 198239973509362327032045173661212819077 expected: 538522340430300790495419781092981030533 ``` #### 2. [myBigN](https://github.com/slipet/fibdrv/blob/master/myBigN.h) There is an alternative for the large number representation mentioned in [link](https://hackmd.io/@sysprog/linux2023-fibdrv-b#%E8%A8%88%E7%AE%97-F93-%E5%8C%85%E5%90%AB-%E4%B9%8B%E5%BE%8C%E7%9A%84-Fibonacci-%E6%95%B8-%E5%8A%9F%E8%83%BD%E5%8F%97%E9%99%90), which is a customized data structure. I reference the implementations from [AdrianHuang](https://github.com/AdrianHuang/fibdrv/tree/big_number) and [SSO-23](https://github.com/elliotgoodrich/SSO-23) to design `myBigN` for large number representation. <details> <summary><font color=darkred>Structure details</font></summary> ```c typedef struct { char *ptr; size_t size : MY_BIGNUM_MAX_LEN, capcity : 6, is_ptr : 1, reserved : 3; } myLong; typedef struct { char data[15]; uint8_t free_space : 4, reserved : 4; } myShort; typedef union { myLong long_num; myShort short_num; } myBigN; ``` </details> Numbers up to 15 digits are represented using the `myShort` structure. Whenever a number exceeds 15 digits, we dynamically allocate memory space to a pointer in `myLong` for storing large numbers. Both structures include a reserved byte to record information about free space, size, and a control flag. The following is the implementation associated with `myBigN`. <details> <summary><font color=darkred>fib_sequence</font></summary> ```c static ssize_t fib_sequence(long long k, char *buf) { myBigN f[k + 2]; f[0] = *MY_BIGNUM_TMP("0"); f[1] = *MY_BIGNUM_TMP("1"); for (int i = 2; i <= k; i++) { myBigN_Add(&f[i], &f[i - 1], &f[i - 2]); } ssize_t len = myBigN_size(&f[k]); if (copy_to_user(buf, myBigN_data(&f[k]), len)) return -EFAULT; for (int i = 0; i <= k; i++) myBigN_free(&f[i]); return len; } ``` </details> The strategy for `myBigN` variables addition is implemented character by character. Below are the details of this method. <details> <summary><font color=darkred>myBigN_Add</font></summary> ```c void myBigN_Add(myBigN *out, const myBigN *x, const myBigN *y) { size_t sizeX = myBigN_size(x), sizeY = myBigN_size(y); char *dataX = myBigN_data(x), *dataY = myBigN_data(y); size_t i = 0; if (sizeX < sizeY) { __SWAP(dataX, dataY, char); __SWAP(&sizeX, &sizeY, size_t); } __reverse(dataX, sizeX); __reverse(dataY, sizeY); uint8_t carry = 0; char result[sizeX + 2]; for (; i < sizeX || carry; i++) { uint8_t sum = ((i < sizeX) ? (dataX[i] - '0') : 0) + ((i < sizeY) ? (dataY[i] - '0') : 0) + carry; result[i] = '0' + sum % 10; carry = sum / 10; } result[i] = 0; __reverse(result, i); __reverse(dataX, sizeX); __reverse(dataY, sizeY); if (out) *out = *MY_BIGNUM_TMP(result); } ``` </details> The macros `MY_BIGNUM_TMP` and `INIT_STRING_LITERALS`, referenced from [AdrianHuang](https://github.com/AdrianHuang/fibdrv/tree/big_number), leverage the designated initializer to make the initialization of variables more elegant and readable. <details> <summary><font color=darkred>MY_BIGNUM_TMP and INIT_STRING_LITERALS</font></summary> ```c #define INIT_STRING_LITERALS() \ (myBigN) \ { \ .short_num = {.free_space = 15 } \ } #define MY_BIGNUM_TMP(x) myBigN_init(&INIT_STRING_LITERALS(), x) ``` </details> In `MY_BIGNUM_TMP(x)`, we can initialize a myBigN pointer with `NIT_STRING_LITERALS()` and a string literal. Declaring the `INIT_STRING_LITERALS()` macro is equivalent to declaring a myBigN object. `myBigN_init` receives the address of the object and a string literal to complete the subsequent allocation. Numbers with a digit length of up to 15 are stored in a character array. When the length exceeds 15, we allocate heap space to `x->long_num.ptr` for storing the larger number. <details> <summary><font color=darkred>myBigN_init</font></summary> ```c myBigN *myBigN_init(myBigN *x, const char *strNum) { *x = INIT_STRING_LITERALS(); //+ 1 for '\0' ssize_t len = strlen(strNum) + 1; if (len > 16) { x->long_num.size = len - 1; x->long_num.is_ptr = true; x->long_num.ptr = kmalloc(myBigN_size(x), GFP_KERNEL); memcpy(myBigN_data(x), strNum, len); } else { memcpy(x->short_num.data, strNum, len); x->short_num.free_space = 15 - len + 1; } return x; } ``` </details> Up until now, everything seemed set up to calculate Fibonacci sequence to fib(500). However, every time I execute `$make check` command, a failure message appears, followed by my Ubuntu system freezing, requiring a restart to resolve. The failure message is looks like this: ```shell ##Modify verify.py to calculate fibonacci sequence til fib(500) f(312) fail input: 1719558092601766452430641106302905217344934236440122960529002115744 expected: 71558092601766452430641106302905217344934236440122960529002115744 ``` Furthermore, the terminated numbers of the Fibonacci sequence are not the same and appear to occur randomly. At first, I intserted `printk` into the function and checked the information about myBigN variables. I used `$sudo dmesg | grep "xxxxx"` to filter the kernel messages. With the help of the print messages, I found that there is a condition uncovered in `myBigN_init`. The error was that the large number should be assigned to `x->long_num.ptr` rather than `x->short_num.data` when `len = 16`. The revised code is as follows: <details> <summary><font color=darkred>myBigN_init (revised)</font></summary> ```diff myBigN *myBigN_init(myBigN *x, const char *strNum) { *x = INIT_STRING_LITERALS(); //+ 1 for '\0' ssize_t len = strlen(strNum) + 1; - if (len > 16) { + if (len >= 16) { x->long_num.size = len - 1; x->long_num.is_ptr = true; x->long_num.ptr = kmalloc(myBigN_size(x), GFP_KERNEL); memcpy(myBigN_data(x), strNum, len); } else { memcpy(x->short_num.data, strNum, len); x->short_num.free_space = 15 - len + 1; } return x; } ``` </details> However, the failure message persists despite the modification. This leads me to suspect other parts of my implementation. Subsequently, I spent two days addressing this issue. Ultimately, I discovered that the problem lies in `memcpy(myBigN_data(x), strNum, len);`. It copies the entire string including `\0` into `myBigN_data(x)`, but the allocated size for `x->long_num.ptr` is only len - 1. Thus, an extra byte is copied into `myBigN_data(x)`, overwriting the area designated for storing the size. Below is the modification: <details> <summary><font color=darkred>myBigN_init (final)</font></summary> ```diff myBigN *myBigN_init(myBigN *x, const char *strNum) { *x = INIT_STRING_LITERALS(); //+ 1 for '\0' ssize_t len = strlen(strNum) + 1; if (len >= 16) { x->long_num.size = len - 1; x->long_num.is_ptr = true; x->long_num.ptr = kmalloc(myBigN_size(x), GFP_KERNEL); - memcpy(myBigN_data(x), strNum, len); + memcpy(myBigN_data(x), strNum, myBigN_size(x)); } else { memcpy(x->short_num.data, strNum, len); x->short_num.free_space = 15 - len + 1; } return x; } ``` </details> Now, we are able to calculate Fibonacci sequence up to fib(500). The following is the result of fib(500) in the out file, which is same as [link](https://gist.github.com/brchiu/71bf87f51be35d6e334f4898b1ebd011) ``` /fibonacci at offset 500, returned the sequence 139423224561697880139724382870407283950070256587697307264108962948325571622863290691557658876222521294125. ``` TODO: I should learn using other debugging tools to help me debug kernel modules. ### [sysprog21/bignum](https://github.com/sysprog21/bignum) From the description of [sysprog21/bignum 程式碼分析](https://hackmd.io/@sysprog/linux2023-fibdrv/%2F%40sysprog%2Flinux2023-fibdrv-e), we can found that ## Reference [AdrianHuang](https://github.com/AdrianHuang) [Linux 核心專題: 重做 fibdrv](https://hackmd.io/@sysprog/Hk2NJG3H2)