2022q1 Homework2 (quiz2)

--- tags: linux2022 --- # 2022q1 Homework2 (quiz2) contributed by < [Eddielin0926](https://github.com/Eddielin0926) > [2022q1 第 2 週測驗題](https://hackmd.io/@sysprog/linux2022-quiz2) ## 測驗一對二個無號整數取平均值 ```c #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a >> 1) + (b >> 1) + (a & b & 1); } ``` `a >> 1` 和 `b >> 1` 代表將 a 和 b 個別除以 2，`a & b & 1` 用來檢查兩個數是不是都是奇數。另一個對二個無號整數取平均值的實作。 ```c uint32_t average(uint32_t a, uint32_t b) { return (a & b) + ((a ^ b) >> 1); } ``` 我認為可以想像成全加器除二。 ![](https://hackmd.io/_uploads/B1un5A2ec.png) 原本是 `(a + b) >> 1`，將 carry 提出來後就變成 `(a & b) + ((a ^ b) >> 1)`。 ### 比較不同時做的組合語言輸出使用 `gcc -S -masm=intel -Os` 來產生組合語言的輸出。 - `(a + b) / 2` ```c lea eax, [rdi+rsi] // Load effective address, 將 rdi+rsi 的結果放到 eax shr eax // logical shift right ret // return ``` - `low + (high - low) / 2` ```c mov eax, esi sub eax, edi shr eax add eax, edi ret ``` - `(a >> 1) + (b >> 1) + (a & b & 1);` ```c mov eax, edi mov edx, esi and edi, esi shr eax shr edx and edi, 1 add eax, edx add eax, edi ret ``` - `(a & b) + ((a ^ b) >> 1)` ```c mov eax, edi and edi, esi xor eax, esi shr eax add eax, edi ret ``` 單純的比較指令數 `(a + b) / 2`，會是最少的，我嘗試以下程式碼來測試 x64 的電腦是否在計算過程中會溢位，結果是**不會的**。 ```c #include <stdio.h> #include <stdint.h> #include <limits.h> uint32_t average(uint32_t a, uint32_t b) { return (a & b) + ((a ^ b) >> 1); } int main() { uint32_t a = UINT_MAX, b = UINT_MAX; printf("%u\n", average(a, b)); } ``` ## 測驗二 ```c uint32_t max(uint32_t a, uint32_t b) { return a ^ ((a ^ b) & -(a < b)); } ``` 我們從 `a < b` 開始思考， - 當 a < b 結果會是 1，取負數後會是 `0xFFFFFFFF` 與 `(a ^ b)` 做 AND 還是 `(a ^ b)`，最後的結果會是 `a ^ a ^ b` 也就是 b。 - 當 a => b 結果會是 0，取負數後還是 0，與 `(a ^ b)` 做 AND 還是 0，最後的結果會是 `a ^ 0` 也就是 a。 ## 測驗三 ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); return u << shift; } ``` 最前面迴圈，目的是為了計算兩個數字的公因數且是 2 的指數，先計算這個的好處就是方便，因為以二進位來說判斷最後一個 bit 是不是 0 就能知道是不是 2 的倍數。因此這邊迴圈結束後，可得到他們的公因數 `1 << shift` ，且 u 和 v 都不會是 2 的倍數了。再來看 do-while 迴圈，這邊就是類似 GCD 的作法，但去除了取於數的方式，改成不斷相減，當 v 等於 0 的時候，就是餘數為零了。最後的結果 u 要在乘上一開始除掉的公因數，最後結果就是 `u * (1 << shift)` 也就是 `u << shift`。 ## 測驗四先觀察原本函式的功能 - `bitmap` 是要處理的資料，以 64 bits 為區間 - `bitmapsize` 是總共有要處的資料大小 - `out` 是結果存放的位置在第一層迴圈中 `bitset` 會儲存目前處理資料，第二層的迴圈目的是要找到資料的位元是 1 的位置並且去除掉 1 後，將位置記錄在 `out` 中。 ```c #include <stddef.h> size_t naive(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; for (size_t k = 0; k < bitmapsize; ++k) { uint64_t bitset = bitmap[k]; size_t p = k * 64; for (int i = 0; i < 64; i++) { if ((bitset >> i) & 0x1) out[pos++] = p + i; } } return pos; } ``` 嘗試一個簡單的[例子](https://onlinegdb.com/qxgIaXhvp) ``` bitmap[0] = 0x11 bitmapsize = 1 ``` 我們可以得到 `bitmap` 中 1 的數量以及 1 的位置 ``` pos = 2 out = {0, 4} ``` 接下來透過 [`__builtin_ctzll`](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 將函式改寫 `__builtin_ctzll` 用來找到由低位往高位的第一個 1 中間會有幾個 0，因此可以用來取代重複判斷的 `if ((bitset >> i) & 0x1)` ，就能減少分支的次數，在硬體有支援的情況下可以加速運算，再來看 `while` 的條件是 `bitset != 0`，加上 `__builtin_ctzll` 的特性，因此我們可以知道，當我們找到 `bitset` 的 1 時，要將 1 去處除掉，才有辦法利用 `__builtin_ctzll` 計算 1 的位置。 ```c #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = bitset & -bitset; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` 當我們在思考如何去除掉某個位元的 1 時，第一個想法可能會是與一個在對應位置的 0 做 bitwise AND ```c bitset &= ~(1 << r) ```  但在這邊的做法利用了**二補數**來去除最低位元的 1。 ```c bitset ^= bitset & -bitset ``` 在我們剛剛的計算中，我們目標都是 `bitset` 中最低位的 1，例如 XXX**1**00。當我們對 `bitset` 做二補數計算時，最低位的 1 左邊的位元會是原本的補數，右邊則會與原本一樣。 $$ 0110100_2 \xrightarrow{取二補數} 1001100_2 \\ 0110100_2 \land 1001100_2 = 0000100_2 \\ 0110100_2 \oplus 0000100_2 = 0110000_2 $$ 與原本的 `bitset` 做 XOR 就能將最右邊的 1 變成 0 了。 ## 測驗五 ```c #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "list.h" struct rem_node { int key; int index; struct list_head link; }; static int find(struct list_head *heads, int size, int key) { struct rem_node *node; int hash = key % size; list_for_each_entry (node, &heads[hash], link) { if (key == node->key) return node->index; } return -1; } char *fractionToDecimal(int numerator, int denominator) { int size = 1024; char *result = malloc(size); char *p = result; if (denominator == 0) { result[0] = '\0'; return result; } if (numerator == 0) { result[0] = '0'; result[1] = '\0'; return result; } /* using long long type make sure there has no integer overflow */ long long n = numerator; long long d = denominator; /* deal with negtive cases */ if (n < 0) n = -n; if (d < 0) d = -d; bool sign = (float) numerator / denominator >= 0; if (!sign) *p++ = '-'; long long remainder = n % d; long long division = n / d; sprintf(p, "%ld", division > 0 ? (long) division : (long) -division); if (remainder == 0) return result; p = result + strlen(result); *p++ = '.'; /* Using a map to record all of reminders and their position. * if the reminder appeared before, which means the repeated loop begin, */ char *decimal = malloc(size); memset(decimal, 0, size); char *q = decimal; size = 1333; struct list_head *heads = malloc(size * sizeof(*heads)); for (int i = 0; i < size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; remainder; i++) { int pos = find(heads, size, remainder); if (pos >= 0) { while (pos-- > 0) *p++ = *decimal++; *p++ = '('; while (*decimal != '\0') *p++ = *decimal++; *p++ = ')'; *p = '\0'; return result; } struct rem_node *node = malloc(sizeof(*node)); node->key = remainder; node->index = i; list_add(&node->link, &(heads + (remainder % size ))); *q++ = (remainder * 10) / d + '0'; remainder = (remainder * 10) % d; } strcpy(p, decimal); return result; } ``` ## 測驗六 ```c /* * ALIGNOF - get the alignment of a type * @t: the type to test * * This returns a safe alignment for the given type. */ #define ALIGNOF(t) \ ((char *)(&((struct { char c; t _h; } *)0)->_h) - (char *)0) ``` `alignof` 的目的在於對齊位置，不同的機器在某些資料類型上大小並不相同，因此不能直接將位移量寫死，這時就需要 `alignof` 的幫助。 ## 測驗七 ```c static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char **argv) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << div3) << div5; char fmt[9]; strncpy(fmt, &"FizzBuzz%u"[(9 >> div5) >> (div3 << 2)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` `(9 >> div5) >> (div3 << 2)], length)` 的 9 應為 8，才不會印出 u 來。