2022q1 Homework3 (quiz3)

---
tags: Linux, linux2022
---
# 2022q1 Homework3 (quiz3)
contributed by < `NOVBobLee` >

> [Assignment requirements](https://hackmd.io/@sysprog/BJJMuNRlq)

## Quiz 1
A macro that produces a [bitmask](https://en.wikipedia.org/wiki/Mask_(computing)) with the width of `unsigned long` under [LP64](https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models), where the two arguments control which bit positions are set.

```c
#define GENMASK(h, l) \
    (((~0UL) >> (LEFT)) & ((~0UL) >> (l) << (RIGHT)))
```

Expected results:
* `GENMASK(6, 4)` yields $01110000_2$ (if the code is adapted to an 8-bit version)
* `GENMASK(39, 21)` yields `0x000000ffffe00000` (64-bit)

The macro takes the all-ones value `(~0UL)`, shifts it right and left, and ANDs the two shifted values to get the final result. From the expected results, `l` is the number of [trailing zeros](https://en.wikipedia.org/wiki/Find_first_set). The right-hand operand of the AND starts with a right shift by `l`; the only purpose of that operand is to clear the lowest `l` bits, so it must shift left again by the same amount to put zeros back into those `l` positions. RIGHT is therefore `l`.

```graphviz
digraph {
    node [shape=record];
    rankdir=LR;
    te2 [ shape=plaintext; label="(uint8_t)(~0) >> 3 << 3"; ];
    bits2 [ label="{1111 1000|____ ____}"; ];
    te1 [ shape=plaintext; label="(uint8_t)(~0) >> 3"; ];
    bits1 [ label="{0001 1111|111_ ____}"; ];
    te0 [ shape=plaintext; label="(uint8_t)(~0)"; ];
    bits0 [ label="{1111 1111|____ ____}"; ];
    te2 -> bits2 [style=invis];
    te1 -> bits1 [style=invis];
    te0 -> bits0 [style=invis];
}
```

That leaves the left-hand operand of the AND. From the expected results, `h = 6` needs a right shift of 1 (8-bit) and `h = 39` needs a right shift of 24 (64-bit), which suggests the relation `h + shift + 1 = total bit width`. LEFT is therefore `64 - h - 1`.

:::info
Extension questions:
1. Explain how the code above works
2. Compare it with the Linux kernel's `GENMASK` macro and describe the additional considerations there
3. Give two use cases of the `GENMASK` macro and [include/linux/bitfield.h](https://github.com/torvalds/linux/blob/master/include/linux/bitfield.h) in the Linux kernel source
:::

## Quiz 2
With `struct foo` only forward-declared, convert an address `v` into a `struct fd` (`struct foo` is 4-byte aligned).

```c
struct foo;

struct fd {
    struct foo *foo;
    unsigned int flags;
};

enum {
    FOO_DEFAULT = 0,
    FOO_ACTION,
    FOO_UNLOCK,
} FOO_FLAGS;

static inline struct fd to_fd(unsigned long v)
{
    return (struct fd){EXP1, v & 3};
}
```

The value `v` carries two pieces of information: an address and a `FOO_FLAGS` value. Because of the 4-byte alignment, the two lowest bits of the address (the $2^0$ and $2^1$ positions) are always zero, so the flag can be packed into them. To recover the pointer we mask those two bits off, that is, align the value down (what C++ Boost calls [align_down](https://www.boost.org/doc/libs/1_78_0/doc/html/align/reference.html#align.reference.functions.align_down)), which gives `v & ~3`. Since `v` has type `unsigned long`, a cast is still needed, so EXP1 is `(struct foo*)(v & ~3)`.

:::info
Extension questions:
1. Explain how the code above works
2. Find an implementation of a similar technique in the Linux kernel source (for example in the file systems) and explain it
:::

The Linux kernel also packs information into the bits left unused by alignment. One example is the red-black tree node. A node stores three addresses: its parent and its two children. The color is stored together with the parent address in the field `__rb_parent_color`, laid out as `((parent node's address) | (current node's color))`, where red is `0` and black is `1`. The parent address can be extracted with the `__rb_parent` macro.

```c
/* include/linux/rbtree_types.h*/
struct rb_node {
    unsigned long __rb_parent_color;
    struct rb_node *rb_right;
    struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
```

Storing the color needs only one bit, so the space-saving approach is to use the alignment guarantee and put the information into bits that are otherwise unused.

```c
/* include/linux/rbtree_augmented.h */
#define RB_RED 0
#define RB_BLACK 1

#define __rb_parent(pc) ((struct rb_node *)(pc & ~3))
```

## Quiz 3
Bit reversal of an 8-bit unsigned integer. The method is a little tricky, so it is easiest to start from a diagram.

```
original: a b c d e f g h
----------------------------------
1st line: (a b c d)(e f g h)          <- swap 1 with 2
              1        2
result1 : (e f g h)(a b c d)
              2        1
----------------------------------
2nd line: (e f)(g h)(a b)(c d)        <- swap 1 with 2, 3 with 4
            1    2    3    4
result2 : (g h)(e f)(c d)(a b)
            2    1    4    3
----------------------------------
3rd line: (g)(h)(e)(f)(c)(d)(a)(b)    <- swap per two elements
           1  2  3  4  5  6  7  8
result3 : (h)(g)(f)(e)(d)(c)(b)(a)
           2  1  4  3  6  5  8  7
```
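The three rounds in the diagram can be checked with a small sketch. The function name `rev8_sketch` and the masks for the left-shifted halves are filled in here as an illustration of the diagram, not taken from the quiz code itself.

```c
#include <stdint.h>

/* Reverse the bit order of an 8-bit value in three swap rounds,
 * mirroring the three "lines" of the diagram above.
 * The name rev8_sketch is hypothetical. */
static uint8_t rev8_sketch(uint8_t x)
{
    /* Round 1: swap the two 4-bit halves. */
    x = (uint8_t) ((x >> 4) | (x << 4));
    /* Round 2: swap neighbouring 2-bit groups. */
    x = (uint8_t) (((x & 0xCC) >> 2) | ((x & 0x33) << 2));
    /* Round 3: swap neighbouring single bits. */
    x = (uint8_t) (((x & 0xAA) >> 1) | ((x & 0x55) << 1));
    return x;
}
```

Calling `rev8_sketch(0xB2)` (`1011'0010`) gives `0x4D` (`0100'1101`). The casts to `uint8_t` matter because the shifts are evaluated in `int` after integer promotion, so the bits shifted past position 7 have to be discarded explicitly.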
The code performs each swap by selecting bit groups with bitmasks, shifting one group left and the other right, and merging the two shifted results with OR. The figures below swap `(a b)` and `(c d)` in `a b c d`.

```graphviz
graph {
    node [shape=record];
    a; b; c; d;
}
```

1. Use the bitmask `0xC` (`1100`) to get `(a b)`, and the bitmask `0x3` (`0011`) to get `(c d)`.

```graphviz
graph {
    node [shape=record];
    a; b;
    c [style=invis];
    d [style=invis];
}
```

```graphviz
graph {
    node [shape=record];
    a [style=invis];
    b [style=invis];
    c; d;
}
```

2. Shift `(a b)` right by 2 and `(c d)` left by 2.

```graphviz
digraph {
    node [shape=record];
    c [style=invis];
    d [shape=plaintext; label="----\>"];
    a; b;
}
```

```graphviz
digraph {
    node [shape=record];
    c; d;
    a [shape=plaintext; label="\<----"];
    b [style=invis];
}
```

3. Merge `(a b)` and `(c d)` with OR, which completes the swap.

```graphviz
graph {
    node [shape=record];
    c; d; a; b;
}
```

Following the code and the figures, the job is always to select the bit regions to be swapped with bitmasks and then swap them. The first line simply shifts left by 4 and right by 4 and ORs the results. The second line starts with `0xCC` (`1100'1100`) as the bitmask and shifts right by 2; the missing part is the symmetric half, which uses `0x33` (`0011'0011`) as the bitmask and shifts left by 2, so EXP2 is `(x & 0x33) << 2`. The third line masks with `0xAA` (`1010'1010`) and shifts right by 1, and its symmetric half masks with `0x55` (`0101'0101`) and shifts left by 1, so EXP3 is `(x & 0x55) << 1`.

```c
#include <stdint.h>

uint8_t rev8(uint8_t x)
{
    x = (x >> 4) | (x << 4);
    x = ((x & 0xCC) >> 2) | (EXP2);
    x = ((x & 0xAA) >> 1) | (EXP3);
    return x;
}
```

:::info
Extension questions:
1. Explain how the code above works
2. Find an implementation of a similar technique in the Linux kernel source and explain where it is used
:::

Linux: bit rotation, crypto

## Quiz 4
Write `foreach` macros that satisfy the requirements below.

```c
/* usage examples of the foreach_ macros */
int i;
foreach_int(i, 0, -1, 100, 0, -99) {
    printf("i is %i\n", i);
}

const char *p;
foreach_ptr(p, "Hello", "world") {
    printf("p is %s\n", p);
}
```

Judging from the code and the expected output, every argument from the second one onward is assigned in turn to the variable given as the first argument.

```shell
## expected output
i is 0
i is -1
i is 100
i is 0
i is -99
p is Hello
p is world
```

Both macros use [`__VA_ARGS__`](https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html) and put those arguments into an array through a [compound literal](https://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html). After that, the `foreach_` macro only has to assign the array elements to the given variable one at a time.

```c
#define foreach_int(i, ...)                                            \
    for (unsigned _foreach_i = (((i) = ((int[]){__VA_ARGS__})[0]), 0); \
         _foreach_i < sizeof((int[]){__VA_ARGS__}) / sizeof(int);      \
         (i) = ((int[]){__VA_ARGS__, 0})[EXP4])
```

Start with the initialization of `foreach_int`. It places `__VA_ARGS__` into an array with a compound literal and stores element zero in the variable `i`.

```c
((i) = ((int[]){__VA_ARGS__})[0])
```

Walking through the array requires a variable that records the current position. That variable is `_foreach_i`. Because of the [comma operator](https://stackoverflow.com/a/52558/16257547), the initial value of `_foreach_i` is `0`.

```c
unsigned _foreach_i = (((i) = ((int[]){__VA_ARGS__})[0]), 0);
```

> N1256 (6.5.17)
> The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value. If an attempt is made to modify the result of a comma operator or to access it after the next sequence point, the behavior is undefined.

The loop condition simply checks whether the position is still inside the array.

```c
_foreach_i < sizeof((int[]){__VA_ARGS__}) / sizeof(int);
```

The update step has two variables to update: `i`, which holds the element, and `_foreach_i`, which holds the position. Both are updated in a single expression, so EXP4 is `++_foreach_i`.

```c
(i) = ((int[]){__VA_ARGS__, 0})[EXP4]
```

The array here is initialized with one extra element. The `for` control flow needs it: with `++_foreach_i`, the update step would otherwise read one element past the end of the array right before the loop stops.

```c
#include <assert.h>

#define _foreach_no_nullval(i, p, arr) \
    assert((i) >= sizeof(arr) / sizeof(arr[0]) || (p))

#define foreach_ptr(i, ...)                                         \
    for (unsigned _foreach_i =                                      \
             (((i) = (void *) ((typeof(i)[]){__VA_ARGS__})[0]), 0); \
         (i); (i) = (void *) ((typeof(i)[]){__VA_ARGS__,            \
                                            NULL})[EXP5],           \
              _foreach_no_nullval(_foreach_i, i,                    \
                                  ((const void *[]){__VA_ARGS__})))
```

The principle is the same as in `foreach_int`, and EXP5 is also `++_foreach_i`. `foreach_ptr` places one restriction on its arguments: `__VA_ARGS__` must not contain a `NULL` pointer, which `_foreach_no_nullval` checks after every update of `i`. The loop stops when `i` is `NULL`, so `i` must stay non-`NULL` for every element of `__VA_ARGS__` and become `NULL` only at the extra element appended after `__VA_ARGS__`, which is the sentinel we added. One open question remains: why is the element cast to `(void *)` instead of keeping its original type?

> N1256 (6.3.2.3-1)
> A pointer to void may be converted to or from a pointer to any incomplete or object type. A pointer to any incomplete or object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.

:::info
Extension questions:
1. Explain how the code above works
2. Find an implementation of a similar technique in the Linux kernel source and explain where it is used
:::

## Quiz 5
A possible implementation for LeetCode [Divide Two Integers](https://leetcode.com/problems/divide-two-integers/). The divisor `divisor` is nonzero, multiplication, division and modulo are not allowed, and the result is an integer (the fractional part of the quotient is truncated).

```c
#include <limits.h>

int divide(int dividend, int divisor)
{
    int signal = 1;
    unsigned int dvd = dividend;
    if (dividend < 0) {
        signal *= -1;
        dvd = ~dvd + 1;
    }

    unsigned int dvs = divisor;
    if (divisor < 0) {
        signal *= -1;
        dvs = ~dvs + 1;
    }

    int shift = 0;
    while (dvd > (EXP6)) /* dvs << shift */
        shift++;

    unsigned int res = 0;
    while (dvd >= dvs) {
        while (dvd < (dvs << shift))
            shift--;
        res |= (unsigned int) 1 << shift;
        EXP7; /* dvd -= dvs << shift */
    }

    if (signal == 1 && res >= INT_MAX)
        return INT_MAX;
    return res * signal;
}
```

The procedure is essentially long division. The difference is that multiplication is replaced by left shifts, so the quotient is assembled from powers of $2$. Decimal long division finds each digit with one multiplication; with shifts the quotient is accumulated from several powers of $2$, which from a binary point of view is still one digit at a time.

```c
int signal = 1;
unsigned int dvd = dividend;
if (dividend < 0) {
    signal *= -1;
    dvd = ~dvd + 1;
}

unsigned int dvs = divisor;
if (divisor < 0) {
    signal *= -1;
    dvs = ~dvs + 1;
}
```

The first two blocks check whether the inputs `dividend` and `divisor` are negative. All later arithmetic is done on the corresponding unsigned variables (`dvd`, `dvs`). A negative input is negated so that the unsigned variable holds its magnitude, and the combined sign of the two inputs is recorded in `signal`, which is multiplied into the result at the end.

```c
int shift = 0;
while (dvd > (EXP6))
    shift++;
```

The next two blocks perform the division. The first `while` finds the highest power of $2$ the quotient can reach, so EXP6 is `dvs << shift`. The division then works downward from this highest `shift` position.

```c
unsigned int res = 0;
while (dvd >= dvs) {
    while (dvd < (dvs << shift))
        shift--;
    res |= (unsigned int) 1 << shift;
    EXP7;
}
```

This block has two `while` loops. The outer loop runs while `dvd >= dvs` and stops once the divisor is larger than the remaining dividend. The inner loop (`while (dvd < (dvs << shift))`) checks whether the current position `shift` can still contribute to the quotient; if not, it moves to the next lower position (`shift--`), otherwise it stays. Since this is binary division, the quotient step just sets one bit at that position, the remainder `dvd` is updated, and the division continues with the next position. EXP7 is therefore `dvd -= dvs << shift`.

```c
if (signal == 1 && res >= INT_MAX)
    return INT_MAX;
return res * signal;
```

Finally, overflow. When `dividend` is `INT_MIN`, copying it into `dvd` and computing `~dvd + 1` gives `0x8000'0000`, which exceeds the upper limit of `int`. If the divisor is `-1` (so `dvs` is `1` and `signal` is `1`), the division produces a quotient for which `res >= INT_MAX` holds. The LeetCode problem therefore adds a final rule: when overflow occurs, the result is reported as `INT_MAX`.

:::info
Extension questions:
1. Explain how the code above works, point out possible improvements, and implement them
2. Find code in the Linux kernel source written specifically for integer / floating-point / [fixed-point](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) division and discuss it
:::

## Quiz 6
A possible implementation for LeetCode [Max Points on a Line](https://leetcode.com/problems/max-points-on-a-line/). Given a number of points on a 2D plane, find the line that passes through the most points.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include "list.h"

struct Point {
    int x, y;
};

struct point_node {
    int p1, p2;
    struct list_head link;
};
```

`struct Point` is a point whose two members hold its coordinates. `struct point_node` is a segment connecting two points: `p1` and `p2` record the indices of the two points, and `link` chains collinear segments together in a linked list.

```c
static bool can_insert(struct list_head *head, int p1, int p2)
{
    struct point_node *pn;
    list_for_each_entry (pn, head, link)
        return EXP8;
    return true;
}

static int gcd(int x, int y)
{
    while (y) {
        int tmp = y;
        y = x % y;
        x = tmp;
    }
    return x;
}
```

Besides horizontal lines, vertical lines and duplicate points, collinear points can also lie on slanted lines (finite, nonzero slope). The many possible slopes are recorded in a hash table. Before inserting, `can_insert` checks whether the segment may be inserted, which is a check against double counting. A line with $k$ points produces $C^k_2$ segments; `can_insert` filters out the redundant ones and keeps only the necessary segments, so that in the final count the number of points on the line is the number of recorded segments plus one (see the example figure below). EXP8 is therefore `pn->p1 == p1`.

Take the figure below as an example, together with the double `for` loop in `maxPoints`. Suppose `points[]` holds `A B C D E F` in that order. The code first examines segment `AB`, then `AC`, `AD`, `AE`, `AF`, `BC`, `BD` and so on. Looking only at collinear points, parallel lines and double counting: the collinear segments that pass `can_insert` are `AB`, `AC`, `AD`, while `BC`, `BD`, `CD` fail because they would be counted twice. However, `can_insert` also rejects the parallel segment `EF`, which is a bug. (TODO: fix bug)

![Parallel lines, collinear points, duplicate points](https://i.imgur.com/a0xUzzj.png)

Slopes are recorded as reduced fractions. A slope is computed from the coordinate differences of two points, and storing it in a `float` or `double` introduces rounding error, which also makes the collinearity test unreliable. It is better not to divide at all and to record the reduced fraction directly. A reduced fraction has coprime numerator and denominator, so `gcd` computes the greatest common divisor to help reduce it.

```c
static int maxPoints(struct Point *points, int pointsSize)
{
    if (pointsSize <= 2)
        return pointsSize;

    int i, j, slope_size = pointsSize * pointsSize / 2 + 133;
    int *dup_cnts = malloc(pointsSize * sizeof(int));
    int *hori_cnts = malloc(pointsSize * sizeof(int));
    int *vert_cnts = malloc(pointsSize * sizeof(int));
    int *slope_cnts = malloc(slope_size * sizeof(int));
    memset(hori_cnts, 0, pointsSize * sizeof(int));
    memset(vert_cnts, 0, pointsSize * sizeof(int));
    memset(slope_cnts, 0, slope_size * sizeof(int));

    for (i = 0; i < pointsSize; i++)
        dup_cnts[i] = 1;

    struct list_head *heads = malloc(slope_size * sizeof(*heads));
    for (i = 0; i < slope_size; i++)
        INIT_LIST_HEAD(&heads[i]);
```

First the counters for each kind of line are initialized to zero, and the hash table that will record slanted lines is initialized.

```c
    for (i = 0; i < pointsSize; i++) {
        for (j = i + 1; j < pointsSize; j++) {
            if (points[i].x == points[j].x)
                hori_cnts[i]++, hori_cnts[j]++;

            if (points[i].y == points[j].y)
                vert_cnts[i]++, vert_cnts[j]++;

            if (points[i].x == points[j].x && points[i].y == points[j].y)
                dup_cnts[i]++, dup_cnts[j]++;

            if (points[i].x != points[j].x && points[i].y != points[j].y) {
                int dx = points[j].x - points[i].x;
                int dy = points[j].y - points[i].y;
                int tmp = gcd(dx, dy);
                dx /= tmp;
                dy /= tmp;
                int hash = dx * dy - 1333 * (dx + dy);
                if (hash < 0)
                    hash = -hash;
                hash %= slope_size;
                if (can_insert(&heads[hash], i, j)) {
                    struct point_node *pn = malloc(sizeof(*pn));
                    pn->p1 = i;
                    pn->p2 = j;
                    EXP9;
                }
            }
        }
    }
```

The loop examines every pair of points. The pairing order is shown in the table below, and the total number of pairs is $C^{pointsSize}_2$. The coordinate differences of each pair tell whether the two points form a horizontal line, a vertical line, or coincide; if none of these holds, they form a slanted line. The horizontal, vertical and duplicate cases are counted directly. Slanted segments are first recorded in the hash table and counted later: the slope is reduced to lowest terms, its numerator and denominator are fed to the hash function to get a hash value, and `can_insert` decides whether the segment is collinear with the recorded ones or merely parallel. A collinear segment is added to the hash table, so EXP9 is `list_add(&pn->link, &heads[hash])`.

| `points[]` | `[0]` | `[1]` | `[2]` | `[3]` |
|:--------:|:-----:|:-----:|:-----:|:-----:|
| `[0]` | x | 1 | 2 | 3 |
| `[1]` | x | x | 4 | 5 |
| `[2]` | x | x | x | 6 |
| `[3]` | x | x | x | x |

(TODO: check hash function collision)

```c
    for (i = 0; i < slope_size; i++) {
        int index = -1;
        struct point_node *pn;
        list_for_each_entry (pn, &heads[i], link) {
            index = pn->p1;
            slope_cnts[i]++;
        }
        if (index >= 0)
            slope_cnts[i] += dup_cnts[index];
    }

    int max_num = 0;
    for (i = 0; i < pointsSize; i++) {
        if (hori_cnts[i] + 1 > max_num)
            max_num = hori_cnts[i] + 1;
        if (vert_cnts[i] + 1 > max_num)
            max_num = vert_cnts[i] + 1;
    }
    for (i = 0; i < slope_size; i++) {
        if (slope_cnts[i] > max_num)
            max_num = slope_cnts[i];
    }

    return max_num;
}
```

Finally the largest number of points on one line is computed. The stored segments are already in minimal form, so the count is $\text{points} = \text{segments} + 1$, and the maximum is returned.

:::info
Extension questions:
1. Explain how the code above works, point out possible improvements, and implement them
2. Extend the LeetCode problem: for much larger inputs (for example more than 100,000 points), design an effective data structure that reduces memory overhead while keeping execution fast
:::

On how horizontal and vertical segments are recorded: if a horizontal line contains $k$ points, the code records $C^k_2$ segments, whereas the maximum only requires the number of segments recorded at the first point of the line. (TODO: implement it)

## Quiz 7
Given a 32-bit unsigned integer, compute the minimum number of bits needed to store it once the leading `0` bits are dropped. (the minimum number of bits required to store an unsigned 32-bit value without any leading zero bits)

```c
int ilog32(uint32_t v)
{
    int ret = v > 0;
    int m = (v > 0xFFFFU) << 4;
    v >>= m;
    ret |= m;
    m = (v > 0xFFU) << 3;
    v >>= m;
    ret |= m;
    m = (v > 0xFU) << 2;
    v >>= m;
    ret |= m;
    m = EXP10;
    v >>= m;
    ret |= m;
    EXP11;
    return ret;
}
```

The procedure resembles the division in Quiz 5. Take an 8-bit unsigned integer as an example, with `v` initially 178 (decimal). It is first compared with `0xF` (half the width). If `v` is larger, some bit is set in the upper half (where `0xF` has `0` bits), so the lower half no longer matters and `v` is shifted right by 4.

```graphviz
digraph {
    node [shape=record];
    rankdir=LR;
    mm [ label="{0|0|0|0|0|1|0|0}", xlabel="m = 4"];
    ff [ label="{0|0|0|0|1|1|1|1}", xlabel="0xF" ];
    vv [ label="{1|0|1|1|0|0|1|0}", xlabel="v = 178" ];
}
```

After the shift `v` is 11 (decimal). `m` is the shift amount and is always a power of 2, while `ret` is the answer to return, which accumulates the total number of positions shifted, so `m` is added into `ret`. Each `m` is a distinct power of 2, so OR can be used for these additions. The next comparison halves the width again and compares with `0x3`, and so on down to the last bit.

```graphviz
digraph {
    node [shape=record];
    rankdir=LR;
    vv [ label="{0|0|0|0|1|0|1|1}", xlabel="v = 11" ];
}
```

By the same reasoning, EXP10 is `(v > 0x3) << 1`. The last step cannot use OR, because bit 0 of `ret` is already occupied by the initial `v > 0`; the final comparison has to be added instead, so EXP11 is `ret += v > 1`.

:::info
Extension questions:
1. Explain how the code above works
2. Find a similar implementation in the Linux kernel source and explain where it is used
3. Read the paper "[Using de Bruijn Sequences to Index a 1 in a Computer Word](http://supertech.csail.mit.edu/papers/debruijn.pdf)" and discuss how to implement a branchless `ilog` on microprocessors that lack hardware ctz/clz instructions
4. Using the techniques from "[你所不知道的 C 語言：前置處理器應用篇](https://hackmd.io/@sysprog/c-preprocessor)" and "[你所不知道的 C 語言：數值系統篇](https://hackmd.io/@sysprog/c-numerics)", implement a compile-time `ilog2`
:::

[`__ffs`](https://github.com/torvalds/linux/blob/1930a6e739c4b4a654a69164dbe39e554d228915/include/asm-generic/bitops/__ffs.h#L13) (effectively ctz, count trailing zeros)

## Quiz 8
Below is part of a C++ binary search tree implementation. The task is to rewrite `remove_data` with an indirect pointer.

```cpp
typedef struct tnode *tree;
struct tnode {
    int data;
    tnode *left;
    tnode *right;
    tnode(int d)
    {
        data = d;
        left = right = 0;
    }
};
```

This declares the node structure `struct tnode`, the type name `tree` for the root, and a constructor. The function below removes a node from the tree while keeping the binary search tree structure intact. If the removed node has no children, it can simply be removed. If it has exactly one child, that child takes over the vacated position. The harder case is a node with two children, where there are two choices. An [in-order traversal](https://en.wikibooks.org/wiki/A-level_Computing/AQA/Paper_1/Fundamentals_of_algorithms/Tree_traversal#In-order) of a binary search tree visits the nodes in ascending order. Suppose the order is `a b c d e f g` and `d` is removed: either `c` or `e` can replace `d`. These are the rightmost node of `d`'s left subtree (the largest element smaller than `d`) and the leftmost node of its right subtree (the smallest element larger than `d`).

```cpp
void remove_data(tree &t, int d)
{
    tnode *p = t;
    tnode *q;
    while (p != 0 && p->data != d) {
        if (d < p->data) {
            q = p;
            p = p->left;
        } else {
            q = p;
            p = p->right;
        }
    }
```

The function first searches for the node whose `data` equals `d`. Once found, it checks how many children the node has. The first `if` handles no children and a right child only, the second `if` handles a left child only, and the final branch handles two children, where the replacement chosen is the smallest node larger than `d`. After the replacement, the node to be removed is deleted.

```cpp
    if (p != 0) {
        if (p->left == 0)
            if (p == t)
                t = p->right;
            else if (p == q->left)
                q->left = p->right;
            else
                q->right = p->right;
        else if (p->right == 0)
            if (p == t)
                t = p->left;
            else if (p == q->left)
                q->left = p->left;
            else
                q->right = p->left;
        else {
            tnode *r = p->right;
            while (r->left) {
                q = r;
                r = r->left;
            }
            p->data = r->data;
            if (r == p->right)
                p->right = r->right;
            else
                q->left = r->right;
            p = r;
        }
        delete p;
    }
}
```

This approach has to modify one of the pointers in the parent of the removed node, so `q` tracks the previously visited node (ending on the parent) while `p` tracks the current node (ending on the node to remove).

```graphviz
digraph {
    node [shape=record;];
    rankdir=LR;
    p;
    inv [shape=point, width=.01];
    q;
    q -> inv[arrowhead=none];
    inv -> p;
}
```

Every step of this approach updates two pointers. An indirect pointer can replace both and also makes the code visibly shorter. It points to the pointer that points to the current node, so it serves both purposes at once: dereferencing it gives the address of the current node, and it is itself the location of the parent's pointer that has to be modified.

```graphviz
digraph {
    node [shape=record];
    rankdir=LR;
    indirect [shape=ellipse];
    p;
    inv [shape=point, width=.01];
    q;
    q -> inv [arrowhead=none, weight=2];
    inv -> p;
    indirect -> inv;
}
```

The first `while` locates the node to remove by binary search. EXP12 is reached when `d < (*p)->data`, so the search goes into the left subtree and the update is `p = &(*p)->left`. EXP13 goes the opposite way, so it is `p = &(*p)->right`.

```cpp
void remove_data(tree &t, int d)
{
    tnode **p = &t;
    while (*p != 0 && (*p)->data != d) {
        if (d < (*p)->data)
            EXP12;
        else
            EXP13;
    }
    tnode *q = *p;
    if (!q)
        return;

    if (!q->left)
        *p = q->right;
    else if (!q->right)
        *p = q->left;
    else {
        tnode **r = &q->right;
        while ((*r)->left)
            r = EXP14;
        q->data = (*r)->data;
        q = *r;
        *r = q->right;
    }
    delete q;
}
```

EXP14 looks for the smallest node in the right subtree, which means walking left repeatedly, so it is `&(*r)->left`.

:::info
Extension questions:
1. Explain how the code above works
2. Rewrite the code above in C99, write a more concise and efficient implementation, and discuss the efficiency gain compared with the original
3. Read [Static B-Trees](https://en.algorithmica.org/hpc/data-structures/s-tree/) and discuss strategies for improving the memory layout
:::

## Quiz 9
A macro that rounds an address up to an alignment boundary. Rounding up can be written as

$ROUNDUP(x) = \begin{cases} x & \text{ if } x\equiv 0\pmod{16} \\ \displaystyle\min_a {(x + a)} \text{ s.t. } (x + a)\equiv 0\pmod{16} & \text{ otherwise} \end{cases}$

Aligning to a multiple of 16 means the lowest 4 bits are all zero in binary. The code first adds a number to `x` and then appears to clear the lowest 4 bits with a bitmask, so NNN should be `MAX_ALIGNMENT - 1` (that is, `0xF`).

```c
/* maximum alignment needed for any type on this platform, rounded up to a
   power of two */
#define MAX_ALIGNMENT 16

/* Given a size, round up to the next multiple of sizeof(void *) */
#define ROUND_UP_TO_ALIGNMENT_SIZE(x) \
    (((x) + MAX_ALIGNMENT - MMM) & ~(NNN))
```

The number to add can be found from the boundary cases. If `x % MAX_ALIGNMENT` is `0`, adding the number and applying the bitmask should give back `x` itself, which allows any addend from `0` to `MAX_ALIGNMENT - 1`. If `x % MAX_ALIGNMENT` is `1`, the result should land on the next aligned position, and the only addend that works is `MAX_ALIGNMENT - 1`. MMM is therefore `1`.

:::info
Extension questions:
1. Explain how the code above works, and write the corresponding `ROUND_DOWN` macro
2. Find similar macros and code in the Linux kernel and explain where round-up/down is used
:::

## Quiz 10
Integer division rounded to the nearest integer. The key part of the code seems to be `((typeof(x)) -1) > 0`, which tests whether the type is unsigned. There is also the remark "undefined when divisor < 0", which is probably related to type conversion. The conditions for reaching each branch are summarized below:

```c
/* undefined when divisor < 0 */
#define DIVIDE_ROUND_CLOSEST(x, divisor)                       \
    ({                                                         \
        typeof(x) __x = x;                                     \
        typeof(divisor) __d = divisor;                         \
        (((typeof(x)) -1) > 0 || ((typeof(divisor)) -1) > 0 || \
         (((__x) > 0) == ((__d) > 0)))                         \
            ? ((RRR) / (__d))                                  \
            : ((SSS) / (__d));                                 \
    })
```

* Conditions for reaching `((RRR) / (__d))`:
    1. `x` has an unsigned type
    2. `x` has a signed type and `divisor` has an unsigned type
    3. `x` and `divisor` both have signed types and the same sign

The discussion below involves the types and their integer conversion rank; different combinations of type and rank lead to different results. In the first case, if `x` and `divisor` are both unsigned, both are nonnegative, and the round-up trick from Quiz 9 applies: add a suitable offset to the dividend `x` so that the quotient rounds to the nearest integer. RRR can therefore be `(__x) + ((__d) >> 1)`. (Before the offset, any remainder of `x` from `0` to `divisor - 1` gives the quotient `x / divisor`, as in the left half of the figure. After the offset, if the remainder of `x` plus `divisor >> 1` lies between `divisor >> 1` and `divisor - 1`, the quotient is still `x / divisor`; if it lies between `divisor` (`0`) and `divisor + (divisor >> 1) - 1` (`(divisor >> 1) - 1`), the quotient becomes `x / divisor + 1`, as in the right half of the figure.)

![division with offset](https://i.imgur.com/fkcapAG.jpg)

> N1256 (6.3.1.8)
> […] Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
>> If both operands have the same type, then no further conversion is needed.
>> Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
>> __Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.__
>> __Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.__
>> Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

:::spoiler Integer Conversion Rank
> N1256 (6.3.1.1-1)
> Every integer type has an integer conversion rank defined as follows:
> - No two signed integer types shall have the same rank, even if they have the same representation.
> - The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.
> - The rank of `long long int` shall be greater than the rank of `long int`, which shall be greater than the rank of `int`, which shall be greater than the rank of `short int`, which shall be greater than the rank of `signed char`.
> - The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
> - The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.
> - The rank of `char` shall equal the rank of `signed char` and `unsigned char`.
> - The rank of `_Bool` shall be less than the rank of all other standard integer types.
> - The rank of any enumerated type shall equal the rank of the compatible integer type.
> - The rank of any extended signed integer type relative to another extended signed integer type with the same precision is implementation-defined, but still subject to the other rules for determining the integer conversion rank.
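> - For all integer types `T1`, `T2`, and `T3`, if `T1` has greater rank than `T2` and `T2` has greater rank than `T3`, then `T1` has greater rank than `T3`.
:::

Still in the first case, if `divisor` is a signed integer it may be positive or negative. A positive value behaves exactly as computed above, and the type conversion causes no trouble.

> N1256 (6.3.1.3)
> 1 When a value with integer type is converted to another integer type other than `_Bool`, if the value can be represented by the new type, it is unchanged.
> 2 __Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.__

If `divisor` is negative, there are two possibilities depending on the integer conversion rank. If `x` has the same or a higher rank, `divisor` is converted to the type of `x`. The conversion follows paragraph 2 of the quote above: one more than the maximum value of `x`'s type is repeatedly added to (or subtracted from) `divisor` until it falls within the range of `x`'s type. The division result is then very different from what was intended.

```c
/* mix-type division test */
#include <stdio.h>

int main(void)
{
    unsigned a = 4;
    int b1 = 2, b2 = -2;
    printf("0x%X\n", a / b1); // 0x2
    printf("0x%X\n", a / b2); // 0x0
    return 0;
}
```

If `divisor` is negative, has the higher rank, and its type can represent every value of `x`'s type, then `x` is converted to the type of `divisor`, and the division behaves like ordinary signed integer division.

```c
#include <stdio.h>

int main(void)
{
    unsigned short a = 4;
    int b1 = 2, b2 = -2;
    printf("0x%X\n", a / b1); // 0x2
    printf("0x%X\n", a / b2); // 0xFFFFFFFE
    return 0;
}
```

In the second case, a positive `x` behaves like ordinary division. A negative `x` runs into the same issue as above: if `divisor` has the same or a higher rank, `x` is converted to the type of `divisor`, and the division goes wrong again. In summary, whenever a negative value is converted to an unsigned type, the division result is not the one we want. The third case is ordinary integer division, and since negative divisors are excluded, only the case where both values are positive remains, so RRR can again be `(__x) + ((__d) >> 1)`.

* Conditions for reaching `((SSS) / (__d))`:
    1. `x` and `divisor` both have signed types and different signs

Only signed operands with opposite signs need to be considered here. Since the problem does not cover a negative `divisor`, one situation is left: `x` is negative and `divisor` is positive. To round to the nearest integer, the offset added to `x` is the opposite of the one used for a positive `x`, so SSS is `(__x) - ((__d) >> 1)`. Apart from the integer promotion issue, when `divisor` is negative its right shift is implementation-defined; GCC, for example, preserves the sign bit.

> N1256 (6.5.7-5)
> The result of `E1 >> E2` is `E1` right-shifted `E2` bit positions.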
> If `E1` has an unsigned type or if `E1` has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / $2^{E2}$. If `E1` has a signed type and a negative value, the resulting value is implementation-defined.

:::info
Extension questions:
1. Explain how the code above works
2. Find similar macros and code in the Linux kernel and explain where div round-up/closest is used
:::

## Quiz 11
Compute the integer square root, truncated to an integer ($\lfloor\sqrt n\rfloor$), using [Digit-by-digit calculation](https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_numeral_system_(base_2)). Suppose the target is $n$. Start with an initial value $a_0$ satisfying $a^2_0 \leq n$, then test whether adding a nonzero positive number $a_1$ still satisfies $(a_0 + a_1)^2 \leq n$. If it does, keep it and test $(a_0 + a_1 + a_2)^2 \leq n$; if not, drop $a_1$ and test the next term, $(a_0 + 0 + a_2)^2 \leq n$. This continues until the square of the sum reaches $n$ or no smaller term remains, and the accumulated sum is the desired square root.

In more detail, the method works in binary, so the more efficient left and right shifts can replace multiplication. Suppose the target is $N^2$, so the square root we are looking for is

$N = (a_n + a_{n-1} + \ldots + a_0) \text{ , where } a_k = 2^k \text{ or } 0 \text{ , }\forall k = n, n-1, \ldots, 0$

Let $P_m = a_n + a_{n-1} + \ldots + a_m$ (with $P_{n+1} = 0$). Each step tests $P_{m-1}^2 = (P_m + a_{m-1})^2 \leq N^2$. First substitute $a_{m-1} = 2^{m-1}$: if the left side is still at most $N^2$, then $a_{m-1}$ is fixed at $2^{m-1}$; if it exceeds $N^2$, then $a_{m-1}$ must be $0$. ($m$ runs from $n$ down to $0$. The accumulation resembles the stacked areas in the [golden ratio](https://en.wikipedia.org/wiki/Golden_ratio#Geometry) figure, except that when the newly added area $(P_m + 2^{m-1})^2$ goes out of bounds we skip that side length (keeping $(P_m + 0)^2$) and try the smaller area $(P_m + a_{m-2})^2$, filling the big square $N^2$ as far as possible.)

![Golden ratio geometry](https://i.imgur.com/NBPT5NY.png)

Each test still requires a square. Avoiding it would be better, so rewrite the relation:

$P_{m-1}^2 \leq N^2\implies X_{m-1} = N^2 - P_{m-1}^2 \geq 0$

Each step now computes the difference $X_{m-1}$ and checks whether it is greater than or equal to zero. The difference $X_{m-1}$ is updated as follows:

$\begin{split} Y_m &= X_{m-1} - X_m \\ &= (N^2 - P_{m-1}^2) - (N^2 - P_m^2) \\ &= P_m^2 - P_{m-1}^2 \\ &= P_m^2 - (P_m + a_{m-1})^2 \\ &= -(2P_ma_{m-1} + a_{m-1}^2) \\ &= \begin{cases} -(P_m2^m + 2^{2(m-1)}) & \text{ if } a_{m-1} = 2^{m-1} \\ 0 & \text { if } a_{m-1} = 0 \end{cases} \end{split}$

The squares are gone, and everything involving $2$ can be done with left or right shifts. The formula also shows that the difference never increases from one step to the next. Now split the computation into parts that are convenient to store and update:

$y_m = P_m2^m \\ d_m = 2^{2(m-1)} = 4^{m-1} \\ Y_m = \begin{cases} -(y_m + d_m) & \text{ if } a_{m-1} = 2^{m-1} \\ 0 & \text { if } a_{m-1} = 0 \end{cases}$

together with the update rules for $y_m$ and $d_m$:

$y_{m-1} = P_{m-1}2^{m-1} = (P_m + a_{m-1})2^{m-1} = \begin{cases} y_m/2 + d_m & \text{ if } a_{m-1} = 2^{m-1} \\ y_m/2 & \text { if } a_{m-1} = 0 \end{cases} \\ d_{m-1} = 2^{2(m-2)} = d_m / 4$

The process ends at $m = 0$, where $d_0 = 0$ (in integer arithmetic) and $y_0 = P_0 = N$, which is the square root we want. Since $P_{n+1} = 0$, the initial values are

$X_{n+1} = N^2 \\ y_{n+1} = 0 \\ d_{n+1} = 2^{\max\{2k\in\mathbb{N}\text{ s.t. }2^{2k}\leq N^2 \}}$

```c
static inline unsigned long fls(unsigned long word)
{
    int num = 64 - 1;

    if (!(word & (~0ul << 32))) {
        num -= 32;
        word <<= 32;
    }
    if (!(word & (~0ul << (64 - 16)))) {
        num -= 16;
        word <<= 16;
    }
    if (!(word & (~0ul << (64 - 8)))) {
        num -= 8;
        word <<= 8;
    }
    if (!(word & (~0ul << (64 - 4)))) {
        num -= 4;
        word <<= 4;
    }
    if (!(word & (~0ul << (64 - 2)))) {
        num -= 2;
        word <<= 2;
    }
    if (!(word & (~0ul << (64 - 1))))
        num -= 1;
    return num;
}
```

`fls` helps find the initial value of $d_m$. It computes the position of the highest `1` bit in the binary representation (the last bit set of the value). The exponent of $d_m$ is even, so `~1UL` clears the lowest bit of that position to make it even, and the resulting `m` is the initial $d_{n+1}$.

```c
unsigned long i_sqrt(unsigned long x) /* x = X_{n+1} = N^2 */
{
    unsigned long b, m, y = 0;

    if (x <= 1) /* trivial case */
        return x;

    m = 1UL << (fls(x) & ~1UL); /* m = d_{n+1} */

    while (m) {       /* loop while d_m > 0 */
        b = y + m;    /* b = y_m + d_m = -Y_m */
        y >>= 1;      /* y_m / 2 */
        if (x >= b) { /* X_{m-1} = X_m - (y_m + d_m) >= 0, so a_{m-1} = 2^{m-1} */
            x -= b;   /* X_{m-1} = X_m - (y_m + d_m) */
            y += m;   /* y_{m-1} = y_m / 2 + d_m */
        }
        m >>= 2;      /* d_{m-1} = d_m / 4 */
    }
    return y; /* return y_0 = P_0 = N */
}
```

:::info
Extension questions:
1. Explain how the code above works, and try to rewrite it with the hardware clz/ctz instructions
2. Find similar macros and code in the Linux kernel and explain where they are used
:::
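For extension question 1, one possible direction is to replace the hand-written `fls` with a count-leading-zeros builtin. The sketch below assumes GCC or Clang, which provide `__builtin_clzl`; the function name `i_sqrt_clz` is hypothetical, and the loop is the same digit-by-digit iteration as in `i_sqrt`.

```c
#include <limits.h>

/* floor(sqrt(x)) by the binary digit-by-digit method.
 * Assumption: GCC or Clang, for __builtin_clzl. */
static unsigned long i_sqrt_clz(unsigned long x)
{
    unsigned long y = 0;

    /* Trivial cases; this also avoids __builtin_clzl(0), which is undefined. */
    if (x <= 1)
        return x;

    /* Index of the highest set bit, derived from the leading-zero count. */
    int msb = (int) (sizeof(unsigned long) * CHAR_BIT) - 1 - __builtin_clzl(x);

    /* Start at the largest power of four not exceeding x, then divide by 4. */
    for (unsigned long m = 1UL << (msb & ~1); m; m >>= 2) {
        unsigned long b = y + m; /* candidate amount to subtract */
        y >>= 1;
        if (x >= b) {            /* the candidate bit belongs to the root */
            x -= b;
            y += m;
        }
    }
    return y;
}
```

For instance, `i_sqrt_clz(17)` returns `4`. Using `sizeof(unsigned long) * CHAR_BIT` instead of a hard-coded 64 keeps the sketch valid on targets where `unsigned long` is 32 bits wide.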