2020q3 第 4 週測驗題

# [2020q3](http://wiki.csie.ncku.edu.tw/sysprog/schedule) 第 4 週測驗題 ###### tags: `sysprog2020` :::info 目的: 檢驗學員對 [C 語言指標](https://hackmd.io/@sysprog/c-pointer), [數值系統](https://hackmd.io/@sysprog/c-numerics) 及 [bitwise 操作](https://hackmd.io/@sysprog/c-bitwise) 的認知 ::: ==[作答表單](https://docs.google.com/forms/d/e/1FAIpQLSeeCzlevUyMjcUSpkdh3sVp9Zwqg-MSUHv0JU0yt5XLgKBr9Q/viewform)== ### 測驗 `1` LeetCode [461. Hamming Distance](https://leetcode.com/problems/hamming-distance/) 提及，兩個整數間的 [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) 為其二進位的每個位元的差。請計算輸入參數兩整數 x 與 y 的 [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance)，例如整數 `1` 的二進位為 `0 0 0 1`，而整數 `4` 的二進位為 `0 1 0 0`，則 `1` 與 `4` 的 [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) 為 `2`。 ``` 1 (0 0 0 1) 4 (0 1 0 0) [ ] [ ] | | \_ 2 _/ ``` ![](https://i.imgur.com/q9YzhCs.png) 上圖 [hypercube](https://en.wikipedia.org/wiki/Hypercube) (中文翻譯：超方形) 中，紅色路徑的 `0100` 到 `1001` 的 [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) 為 `3`，而藍色路徑的 `0110` 到 `1110` 的 [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) 則是 `1`。運用[第 3 周測驗](https://hackmd.io/@sysprog/2020-quiz3) 提到的 `__builtin_popcount` (gcc extension 之一)，我們可實作如下: ```cpp int hammingDistance(int x, int y) { return __builtin_popcount(x OP y); } ``` 請補完程式碼。參考資訊: * [詳解二進位數中 `1` 的個數](http://github.tiankonguse.com/blog/2014/11/16/bit-count-more.html) ==作答區== OP = ? * `(a)` `|` * `(b)` `&` * `(c)` `^` * `(d)` `+` * `(e)` `-` :::success 延伸問題: 1. 解釋上述程式碼運作原理，並舉出不使用 gcc extension 的 C99 實作程式碼; 2. 練習 LeetCode [477. Total Hamming Distance](https://leetcode.com/problems/total-hamming-distance/)，你應當提交 (submit) 你的實作 (C 語言實作)，並確保執行結果正確且不超時 3. 研讀[錯誤更正碼簡介](https://w3.math.sinica.edu.tw/math_media/d184/18404.pdf)及[抽象代數的實務應用](http://bit.ly/abstract-algebra)，摘錄其中 Hamming Distance 和 Hamming Weight 的描述並重新闡述，舉出實際的 C 程式碼來解說; - 搭配閱讀 [Hamming codes, Part I](https://youtu.be/X8jsijhllIA) 及 [Hamming codes, Part II](https://youtu.be/b3NxrZOu_CE)，你可適度摘錄影片中的素材，但應當清楚標註來源 4. 研讀 [Reed–Solomon error correction](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction)，再閱讀 Linux 核心原始程式碼的 [lib/reed_solomon 目錄](https://github.com/torvalds/linux/tree/master/lib/reed_solomon)，抽離後者程式碼為單獨可執行的程式，作為 ECC 的示範。 ::: --- ### 測驗 `2` LeetCode [1483. Kth Ancestor of a Tree Node](https://leetcode.com/problems/kth-ancestor-of-a-tree-node/) 大意: > 給你一棵樹，樹上有 n 個節點，編號自 0 到 $n - 1$。樹以父節點陣列的形式提供，其中 `parent[i]` 是節點 i 的父節點。樹的根節點是編號為 0 的節點。請設計 `treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k)` 函式，回傳節點 node 的第 k 個祖先節點。若不存在這樣的祖先節點，回傳 `-1`。樹節點的第 k 個祖先節點是從該節點到根節點路徑上的第 k 個節點 ![](https://i.imgur.com/xT8OuCR.png) Input: ```json ["TreeAncestor","getKthAncestor","getKthAncestor","getKthAncestor"] [[7,[-1,0,0,1,1,2,2]],[3,1],[5,2],[6,3]] ``` :::info :information_source: 上方為 [JSON](https://www.json.org/json-en.html) 格式，其中: * `7` 表示共有 7 個節點 * `GetKthAncestor(3, 1)` 預期回傳 `1`，後者是 `3` 的父節點，即「上 `1` 代」; * `GetKthAncestor(5, 2)` 預期回傳 `0`，後者是 `5` 的祖父節點，即「上 `2` 代」; * `GetKthAncestor(6, 3)` 預期回傳 `-1`，因為不存在這樣的節點 ::: Output: ```json [null,1,0,-1] ``` 以下是參考 C 程式實作: ```cpp= #include <stdlib.h> typedef struct { AAA; int max_level; } TreeAncestor; TreeAncestor *treeAncestorCreate(int n, int *parent, int parentSize) { TreeAncestor *obj = malloc(sizeof(TreeAncestor)); obj->parent = malloc(n * sizeof(int *)); int max_level = 32 - __builtin_clz(n) + 1; for (int i = 0; i < n; i++) { obj->parent[i] = malloc(max_level * sizeof(int)); for (int j = 0; j < max_level; j++) obj->parent[i][j] = -1; } for (int i = 0; i < parentSize; i++) obj->parent[i][0] = parent[i]; for (int j = 1;; j++) { int quit = 1; for (int i = 0; i < parentSize; i++) { obj->parent[i][j] = obj->parent[i][j + BBB] == -1 ? -1 : obj->parent[obj->parent[i][j - 1]][j - 1]; if (obj->parent[i][j] != -1) quit = 0; } if (quit) break; } obj->max_level = max_level - 1; return obj; } int treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k) { int max_level = obj->max_level; for (int i = 0; i < max_level && node != -1; ++i) if (k & (CCC)) node = obj->parent[node][i]; return node; } void treeAncestorFree(TreeAncestor *obj) { for (int i = 0; i < obj->n; i++) free(obj->parent[i]); free(obj->parent); free(obj); } ``` 請補完。 ==作答區== AAA = ? * `(a)` `int ***parent` * `(b)` `int **parent` * `(c)` `int *parent` BBB = ? * `(a)` `(-2)` * `(b)` `(-1)` * `(c)` `0` * `(d)` `1` * `(e)` `2` CCC = ? * `(a)` `1` * `(b)` `i` * `(c)` `i >> 1` * `(d)` `i >> k` * `(e)` `k << i` * `(f)` `1 << i` :::success 延伸問題: 1. 解釋上述程式碼運作原理，指出可改進的地方，並實作對應的程式碼; 2. 在 `treeAncestorCreate` 函式內部，若干記憶體被浪費，請撰寫完整測試程式碼，並透過工具分析; 3. 在 LeetCode [1483. Kth Ancestor of a Tree Node](https://leetcode.com/problems/kth-ancestor-of-a-tree-node/) 提交你的實作，你應該要在 C 語言項目中，執行時間領先 75% 的提交者; ::: --- ### 測驗 `3` :::info > 白板 coding 其實本意 (最好不要是) 不是考一些你已經會的東西，而是考一個你可能不會的問題，然後你要 keep trying, keep doing 下去，因為它的本質是在考一個未來你可能碰到的問題 (而且可能 Google 不到)。出處: [簡單的 FizzBuzz 藏有深度 (Google 面試題)](https://medium.com/@Bear_/%E7%B0%A1%E5%96%AE%E7%9A%84-fizzbuzz-%E8%97%8F%E6%9C%89-%E6%B7%B1%E5%BA%A6-google-%E9%9D%A2%E8%A9%A6%E9%A1%8C-f5e66e3a97be) ::: 考慮貌似簡單卻蘊含實作深度的 [FizzBuzz](https://en.wikipedia.org/wiki/Fizz_buzz) 題目: * 從 1 數到 n，如果是 3的倍數，印出 "Fizz" * 如果是 5 的倍數，印出 "Buzz" * 如果是 15 的倍數，印出 "FizzBuzz" * 如果都不是，就印出數字本身直覺的實作程式碼如下: (`naive.c`) ```cpp #include <stdio.h> int main() { for (unsigned int i = 1; i < 100; i++) { if (i % 3 == 0) printf("Fizz"); if (i % 5 == 0) printf("Buzz"); if (i % 15 == 0) printf("FizzBuzz"); if ((i % 3) && (i % 5)) printf("%u", i); printf("\n"); } return 0; } ``` 觀察 `printf` 的(格式)字串，可分類為以下三種: 1. 整數格式字串 `"%d"` : 長度為 2 B 2. "Fizz" 或 "Buzz" : 長度為 4 B 3. "FizzBuzz" : 長度為 8 B 考慮下方程式碼: ```cpp #define MSG_LEN 8 char fmt[MSG_LEN + 1]; strncpy(fmt, &"FizzBuzz%u"[start], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); ``` 我們若能精準從給定輸入 `i` 的規律去控制 `start` 及 `length`，即可符合 FizzBuzz 題意: ```cpp string literal: "fizzbuzz%u" offset: 0 4 8 ``` 以下是利用 bitwise 和上述技巧實作的 FizzBuzz 程式碼: (`bitwise.c`) ```cpp #define MSG_LEN 8 static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char *argv[]) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << div3) << div5; char fmt[MSG_LEN + 1]; strncpy(fmt, &"FizzBuzz%u"[(MSG_LEN >> KK1) >> (KK2 << KK3)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` gcc-9 還內建了 [FizzBuzz optimization](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853) (Bug 82853 - Optimize x % 3 == 0 without modulo)。請補完。 ==作答區== KK1 = ? * `(a)` div5 * `(b)` div3 * `(c)` 2 * `(d)` 1 KK2 = ? * `(a)` 0 * `(b)` 1 * `(c)` 2 * `(d)` div3 * `(e)` div5 KK3 = ? * `(a)` 0 * `(b)` 1 * `(c)` 2 * `(d)` div3 * `(e)` div5 :::success 延伸問題: 1. 解釋上述程式運作原理並評估 `naive.c` 和 `bitwise.c` 效能落差 * 避免 stream I/O 帶來的影響，可將 `printf` 更換為 `sprintf` 2. 分析 [Faster remainders when the divisor is a constant: beating compilers and libdivide](https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/) 一文的想法 (可參照同一篇網誌下方的評論)，並設計另一種 bitmask，如「可被 3 整除則末位設為 1」「可被 5 整除則倒數第二位設定為 1」，然後再改寫 `bitwise.c` 程式碼，試圖運用更少的指令來實作出 branchless; * 參照 [fastmod: A header file for fast 32-bit division remainders on 64-bit hardware](https://github.com/lemire/fastmod) ::: --- ### 測驗 `4` 考慮到整數 `PAGE_QUEUES` 可能有以下數值: * (1 or 2 or 3 or 4) * (5 or 6 or 7 or 8) $\times$ (2^n^), n from 1 to 14 給定 `size_t offset`，試著將原本運算: ```cpp #include <stdint.h> size_t blockidx = offset / PAGE_QUEUES; ``` 由於 `PAGE_QUEUES` 的數值分佈已知，在整數除法時，可思考加速運算的機制。需要注意，某些硬體平台缺乏整數除法指令 (如 Arm Cortex-A8)，即使 Intel 公司出品的 Core 處裡器 [Sandy Bridge 微架構](https://en.wikipedia.org/wiki/Sandy_Bridge)中，針對 64 位元整數的除法運算，會佔用 40 到 103 個處理器週期，運算成本仍屬沈重。 > 來源: [Agner Fog’s instruction tables](https://www.agner.org/optimize/instruction_tables.pdf)，第 180 頁於是我們可預先進行計算，從而避免整數除法指令的使用: (假設在 x86_64 硬體執行 GNU/Linux 並透過 gcc/clang 編譯) ```cpp #include <stdint.h> #define ffs32(x) ((__builtin_ffs(x)) - 1) size_t blockidx; size_t divident = PAGE_QUEUES; blockidx = offset >> ffs32(divident); divident >>= ffs32(divident); switch (divident) { CASES } ``` 其中 `CASES` 可能形如: ```cpp case 2: blockidx /= 2; break; ``` 這樣的組合，請用最少的 case-break 敘述做出同等 `size_t blockidx = offset / PAGE_QUEUES;` 效果的程式碼。參考資料: * 摘自 [Built-in Functions Provided by GCC](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html): > Built-in Function: `int __builtin_ffs (int x)` > Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero. * [Integer division is slow](https://kristerw.blogspot.com/2017/05/integer-division-is-slow.html) ==作答區== `CASES` 至少該包含哪些數字: (複選) * `(a)` 2 * `(b)` 3 * `(c)` 4 * `(d)` 5 * `(e)` 6 * `(f)` 7 * `(g)` 8 * `(i)` 9 * `(j)` 10 :::success 延伸問題: 1. 解釋程式運算原理，可搭配 Microsoft Research 的專案 [snmalloc](https://github.com/microsoft/snmalloc) 來解說; * 對應的原始程式碼 [src/mem/sizeclass.h](https://github.com/microsoft/snmalloc/blob/master/src/mem/sizeclass.h#L54-L145) 2. 練習 LeetCode [51. N-Queens](https://leetcode.com/problems/n-queens/)，應選擇 C 語言並善用上述的 `__builtin_ffs`，大幅降低記憶體開銷; ::: ---