2022q1 Homework2 (quiz2)

# 2022q1 Homework2 (quiz2) contributed by < [Build-A-Moat](https://github.com/Build-A-Moat/lab0-c) > ###### tags: `linux2022` ### 測驗 `1` 考慮以下對二個無號整數取平均值的程式碼: ```c #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a + b) / 2; } ``` 我們可改寫為以下等價的實作: ```c #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a >> 1) + (b >> 1) + (EXP1); } ``` | a | b | (a + b) / 2 | (a + b) >> 1 | (a >> 1) + (b >> 1) | |:---:|:---:|:-----------:|:------------:|:-------------------:| | 0 | 0 | 0 | 0 | 0 | | 0 | 1 | 0 | 0 | 0 | | 1 | 0 | 0 | 0 | 0 | | 1 | 1 | 1 | 1 | **0** | * 從真值表中得知 `(a + b) / 2` 等價於 `(a + b) >> 1` ，而與 `(a >> 1) + (b >> 1)` 不等價，所以 `>>` 沒有分配律。 * 數值被 `>>` 作用時最低位元會被捨去，所以若 `a` 和 `b` 最低位元相加的結果，只影響到最低位元，則`(a + b) >> 1` 與 `(a >> 1) + (b >> 1)` 等價，而其中當 `a` 和 `b` 的最低位元皆為 `1` 時，會產生進位 * `(EXP1)` 就是進位的部份，即 `(a & b & 1)` ， `& 1`代表取 `a` 與 `b` 的最低位元。再次改寫為以下等價的實作: ```c uint32_t average(uint32_t a, uint32_t b) { return (EXP2) + ((EXP3) >> 1); } ``` * 從<[你所不知道的 C 語言：數值系統篇](https://hackmd.io/@sysprog/c-numerics)> ，我們可以知道 `(x + y) / 2` = `(x + y) >> 1` = `(x ^ y + (x & y) << 1) >> 1` = `(x & y) + ((x ^ y) >> 1)` * `EXP2` 即為 `a & b` ， `EXP3` 即為 `a ^ b` * 但是剛剛上面才說 `>>` 沒有分配律，為何在此處卻成立， | x | y | (x ^ y + (x & y) << 1) >> 1 | (x & y) + ((x ^ y) >> 1) | |:---:|:---:|:---------------------------:|:------------------------:| | 0 | 0 | 0 | 0 | | 0 | 1 | 0 | 0 | | 1 | 0 | 0 | 0 | | 1 | 1 | 1 | 1 | * 從真值表中的確可以看出結果相等， * 由上面兩個例子可以推斷，當 `(a + b) >> 1` 中的 `a + b` 若會產生進位，則不可以將 `>>` 分別運算於 `a` 和 `b` * 以 `(a ^ b + a ^ b) >> 1` 驗證猜測是否正確 | a | b | (a ^ b + a ^ b) >> 1 | (a ^ b) >> 1 + (a ^ b) >> 1 | |:---:|:---:|:--------------------:|:---------------------------:| | 0 | 0 | 0 | 0 | | 0 | 1 | 1 | **0** | | 1 | 0 | 1 | **0** | | 1 | 1 | 0 | 0 | * 猜測正確先用 `gcc -O3 -c average.c` 編譯，再用 `objdump -d average.o` 來看。 ```c 0000000000000000 <average>: 0: f3 0f 1e fa endbr64 4: 8d 04 37 lea (%rdi,%rsi,1),%eax 7: d1 e8 shr %eax 9: c3 retq ``` ```c 0000000000000000 <average>: 0: f3 0f 1e fa endbr64 4: 89 f8 mov %edi,%eax 6: 89 f2 mov %esi,%edx 8: 21 f7 and %esi,%edi a: d1 e8 shr %eax c: d1 ea shr %edx e: 83 e7 01 and $0x1,%edi 11: 01 d0 add %edx,%eax 13: 01 f8 add %edi,%eax 15: c3 retq ``` ```c 0000000000000000 <average>: 0: f3 0f 1e fa endbr64 4: 89 f8 mov %edi,%eax 6: 21 f0 and %esi,%eax 8: 31 f7 xor %esi,%edi a: d1 ef shr %edi c: 01 f8 add %edi,%eax e: c3 retq ``` - [ ] 比較上述實作在編譯器最佳化開啟的狀況，對應的組合語言輸出，並嘗試解讀 (可研讀 CS:APP 第 3 章) - [ ] 研讀 Linux 核心原始程式碼 include/linux/average.h，探討其 Exponentially weighted moving average (EWMA) 實作，以及在 Linux 核心的應用 --- ### 測驗 `2` 改寫[〈解讀計算機編碼〉](https://hackmd.io/@sysprog/binary-representation) 一文的「不需要分支的設計」一節提供的程式碼 `min`，我們得到以下實作 (`max`): ```c #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((EXP4) & -(EXP5)); } ``` * 考慮以下兩種狀況 * *Case 1* : a > b, `return a ^ 0`，`EXP5` 需要為 `0`，則 `EXP5` 為 `a < b` 或 `a <= b`，當 `a == b` 時，return `a` 即可，所以選擇 `a < b` * *Case 2* : a < b, `return a ^ (a ^ b)`，`EXP4` 為 `a ^ b` - [ ] 對 32 位元無號/有號整數，撰寫同樣 branchless 的實作 - [ ] Linux 核心也有若干 branchless / branch-free 的實作，例如 lib/sort.c: 請在 Linux 核心原始程式碼中，找出更多類似的實作手法。請善用 git log 檢索。 --- ### 測驗 `3` 考慮以下 64 位元 GCD ([greatest common divisor](https://en.wikipedia.org/wiki/Greatest_common_divisor), 最大公因數) 求值函式: ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; while (v) { uint64_t t = v; v = u % v; u = t; } return u; } ``` 改寫為以下等價實作: ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (COND); return RET; } ``` GCD 演算法 > 1. If both x and y are 0, gcd is zero gcd(0,0)=0. > 2. gcd(x,0)=x and gcd(0,y)=y because everything divides 0. `if (!u || !v) return u | v;` 可以表達1與2 > 3. If x and y are both even, gcd(x,y)=2∗gcd(x2,y2) because 2 is a common divisor. Multiplication with 2 can be done with bitwise shift operator. ```c for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } ``` 只要 `u` or `v` 可以同時被 `2` 整除，那就表達成 gcd(u,v)=$2^{shift}$∗gcd($\dfrac{u}{2^{shift}}$,$\dfrac{v}{2^{shift}}$) > 4. If x is even and y is odd, gcd(x,y)=gcd(x2,y). > 5. On similar lines, if x is odd and y is even, then gcd(x,y)=gcd(x,y2). It is because 2 is not a common divisor. ```c while (!(u & 1)) u /= 2; while (!(v & 1)) v /= 2; ``` 當 `u` 或 `v` 為一基數與一偶數時，`2` 不會再是公因數，所以可以直接除`2` ```c if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } ``` 最後的部份就是在做輾轉相除法，可以看出當 `v` 為 `0` 時，就是結束，因此 `COND` 為 `v`，而 `RET` 就是 `u << shift` - [ ] 在 x86_64 上透過 __builtin_ctz 改寫 GCD，分析對效能的提升; - [ ] Linux 核心中也內建 GCD (而且還不只一種實作)，例如 lib/math/gcd.c，請解釋其實作手法和探討在核心內的應用場景。 --- ### 測驗 `4` 在影像處理中，bit array (也稱 bitset) 廣泛使用，考慮以下程式碼: ```c #include <stddef.h> size_t naive(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; for (size_t k = 0; k < bitmapsize; ++k) { uint64_t bitset = bitmap[k]; size_t p = k * 64; for (int i = 0; i < 64; i++) { if ((bitset >> i) & 0x1) out[pos++] = p + i; } } return pos; } ``` 考慮 GNU extension 的 `__builtin_ctzll` 的行為是回傳由低位往高位遇上連續多少個 `0` 才碰到 `1`。用以改寫的程式碼如下: ```c= #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = EXP6; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` * 第一反應是 `0x1 << r` 這樣就能做出最低位 `1` 的mask，但 `r` 的宣告在 `t` 之後，所以這不是答案。 * 想到 `__builtin_ctzll` 也是需要找到同樣的mask才能去算出return的值，於是去看了 `__builtin_ctzl` 的實做，如下： ```c int __builtin_ctzl(unsigned long x) { int r = 63; x &= ~x + 1; if (x & 0x00000000FFFFFFFF) r -= 32; if (x & 0x0000FFFF0000FFFF) r -= 16; if (x & 0x00FF00FF00FF00FF) r -= 8; if (x & 0x0F0F0F0F0F0F0F0F) r -= 4; if (x & 0x3333333333333333) r -= 2; if (x & 0x5555555555555555) r -= 1; return r; } ``` `x &= ~x + 1` 可以得到mask，而 `~x + 1` 即是 `-x`，因此 `EXP6` 是 `bitwise & -bitwise` - [ ] 解釋上述程式運作原理，並舉出這樣的程式碼用在哪些真實案例中; - [ ] 設計實驗，檢驗 ctz/clz 改寫的程式碼相較原本的實作有多少改進？應考慮到不同的 bitmap density; - [ ] 思考進一步的改進空間; - [ ] 閱讀 Data Structures in the Linux Kernel 並舉出 Linux 核心使用 bitmap 的案例; --- ### 測驗 `5` 以下是 [LeetCode 166. Fraction to Recurring](https://leetcode.com/problems/fraction-to-recurring-decimal/) Decimal 的可能實作: ```c= #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "list.h" struct rem_node { int key; int index; struct list_head link; }; static int find(struct list_head *heads, int size, int key) { struct rem_node *node; int hash = key % size; list_for_each_entry (node, &heads[hash], link) { if (key == node->key) return node->index; } return -1; } char *fractionToDecimal(int numerator, int denominator) { int size = 1024; char *result = malloc(size); char *p = result; if (denominator == 0) { result[0] = '\0'; return result; } if (numerator == 0) { result[0] = '0'; result[1] = '\0'; return result; } /* using long long type make sure there has no integer overflow */ long long n = numerator; long long d = denominator; /* deal with negtive cases */ if (n < 0) n = -n; if (d < 0) d = -d; bool sign = (float) numerator / denominator >= 0; if (!sign) *p++ = '-'; long long remainder = n % d; long long division = n / d; sprintf(p, "%ld", division > 0 ? (long) division : (long) -division); if (remainder == 0) return result; p = result + strlen(result); *p++ = '.'; /* Using a map to record all of reminders and their position. * if the reminder appeared before, which means the repeated loop begin, */ char *decimal = malloc(size); memset(decimal, 0, size); char *q = decimal; size = 1333; struct list_head *heads = malloc(size * sizeof(*heads)); for (int i = 0; i < size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; remainder; i++) { int pos = find(heads, size, remainder); if (pos >= 0) { while (PPP > 0) *p++ = *decimal++; *p++ = '('; while (*decimal != '\0') *p++ = *decimal++; *p++ = ')'; *p = '\0'; return result; } struct rem_node *node = malloc(sizeof(*node)); node->key = remainder; node->index = i; MMM(&node->link, EEE); *q++ = (remainder * 10) / d + '0'; remainder = (remainder * 10) % d; } strcpy(p, decimal); return result; } ``` * 第13-22行，將 remainder 經過 hash function 後，找尋是否已經存在 `head[hash]`，若有則是代表小數部份已經開始循環，回傳值 pos 是開始循環小數的位置，若無則回傳-1。 * 第51-60行，將結果的正負號串到 `p`，並將結果分為 `remainder` 與 `division`，但最後一行的 `division > 0 ? (long) division : (long) -division` 有點問題，因為 `n` 與 `d` 都是正數，不會出現 `division` 為負的情況，所以要改為 `sprintf(p, "%ld", (long) division)`。 * 第73-75行，將要儲存 hash table 的 array 用 `INIT_LIST_HEAD` 初始化。 * 第79-88行，若 `pos >= 0` 代表已經找到開始重複小數的位置，要先印出沒有重複的部份，`pos` 以前的數字都代表不重複的，`PPP` 是 `pos--`。 * 第89-93行，要將還沒出現在 `heads` 內的點串上去對應的 heads[hash]，所以 `MMM` 是 `list_add`，`EEE` 是 `&heads[remainder % size]`。為了測試 `edge case`，我採用`numerator = 2147483646, remainder = 2147483647`，將參數輸入後執行程式，發現會觸發 `Segmentation fault`，使用GDB debug時發現，有node的位址無法存取，但如果是 `malloc` 失敗，`node->key` 時就會觸發 `Segmentation fault`，有串上list代表執行當下是成功的，但要存取時卻無法存取，還需要再研究。 ```c= Breakpoint 1, find (heads=0x555555559ac0, size=1333, key=1171491056) at fractionToDecimal.c:16 16 int hash = key % size; (gdb) next 17 list_for_each_entry (node, &heads[hash], link) { (gdb) p hash $7 = 2 (gdb) heads[2] Undefined command: "heads". Try "help". (gdb) p heads[2] $8 = {prev = 0x3937383439363536, next = 0x3234303032343638} (gdb) p heads[2]->next $9 = (struct list_head *) 0x3234303032343638 (gdb) p heads[2]->next->next Cannot access memory at address 0x3234303032343640 (gdb) p heads[2]->prev $10 = (struct list_head *) 0x3937383439363536 (gdb) p heads[2]->prev->prev Cannot access memory at address 0x3937383439363536 ``` - [ ] 解釋上述程式碼運作原理，指出其中不足，並予以改進 - [ ] 在 Linux 核心原始程式碼的 mm/ 目錄 (即記憶體管理) 中，找到特定整數比值 (即 fraction) 和 bitwise 操作的程式碼案例，予以探討和解說其應用場景 --- ### 測驗 `6` [__alignof__](https://gcc.gnu.org/onlinedocs/gcc/Alignment.html) 是 GNU extension，以下是其可能的實作方式: ```c /* * ALIGNOF - get the alignment of a type * @t: the type to test * * This returns a safe alignment for the given type. */ #define ALIGNOF(t) \ ((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X) ``` - [ ]在 Linux 核心原始程式碼中找出 __alignof__ 的使用案例 2 則，並針對其場景進行解說 - [ ]在 Linux 核心源使程式碼找出 ALIGN, ALIGN_DOWN, ALIGN_UP 等巨集，探討其實作機制和用途，並舉例探討 (可和上述第二點的案例重複)。思索能否對 Linux 核心提交貢獻，儘量共用相同的巨集 ### 測驗 `7`