2024q1 Homework4 (quiz3+4)

# 2024q1 Homework4 (quiz3+4) contributed by < [allenliao666](https://github.com/allenliao666) > ## [Quiz3](https://hackmd.io/@sysprog/linux2024-quiz3) ### 測驗1 #### 解釋程式碼思路：運用直式開方法的變形，找出要開平方的數的 `MSB` 並持續測試回傳值平方後是否超過原數，直到 `LSB` 。 `ffs` : 回傳第一個(Least significant) 被設定的位元 `fls` : 回傳最後一個(Most significant) 被設定的位元可以用`fls` 取代題目中的`__builtin_clz()`，改寫如下： ```c int i_sqrt_fls(int x) { if (x <= 1) /* Assume x is always positive */ return x; int z = 0; for (int m = 1UL << (fls(x) & ~1UL); m; m >>= 2) { int b = z + m; z >>= 1; if (x >= b) x -= b, z += m; } return z; } ``` #### Linux 核心實例在 Linux 原始程式碼以關鍵字 `sqrt` 搜尋，發現在 arch 、 driver 等目錄下有非常多有關平方根運算的程式碼，其中在 [linux/block/blk-wbt.c](https://github.com/torvalds/linux/blob/master/block/blk-wbt.c) 僅有一段程式碼有關 `sqrt` 如下： ```c static void rwb_arm_timer(struct rq_wb *rwb) { struct rq_depth *rqd = &rwb->rq_depth; if (rqd->scale_step > 0) { /* * We should speed this up, using some variant of a fast * integer inverse square root calculation. Since we only do * this for every window expiration, it's not a huge deal, * though. */ rwb->cur_win_nsec = div_u64(rwb->win_nsec << 4, int_sqrt((rqd->scale_step + 1) << 8)); } else { /* * For step < 0, we don't want to increase/decrease the * window size. */ rwb->cur_win_nsec = rwb->win_nsec; } blk_stat_activate_nsecs(rwb->cb, rwb->cur_win_nsec); } ``` ### 測驗2 #### 解釋程式碼考量在相鄰敘述進行 mod 10 和 div 10 操作，觀察除法原理: $f = g \times Q + r$ 可以發現 $r = f - g \times Q$，若我們知道 div 10，就可以反推得出 mod 10。，因為10並非2的冪，所以我們使用近似值 $\frac{128}{13} \cong 9.84$ 實做。首先透過 bitweise operation 湊出13，因為13 = 8 + 4 + 1 ，等同計算 $\frac{13tmp}{8} \times8$ 如下 ```c (((tmp >> 3) + (tmp >> 1) + tmp) << 3) ``` 但 LSB 數來3個位元會被捨棄，因此在上述操作前要先記錄該位元。文中的 `0b1` 表示二進位表示法的1。 ```c d0 = q & 0b1; d1 = q & 0b11; d2 = q & 0b111; ``` 把上述兩段程式碼結果相加後，即可得到 $13tmp$ 。最後除以128達到 $\frac{tmp}{9.84}$，即 div10的效果。不過令人疑惑的是為什麼 $13tmp$ 不透過 bitwise 左移如下，而是使用右移和 d0, d1,d2 補回位元? ```c ((tmp << 3) + (tmp << 1) + tmp) ``` 完整程式碼如下 ```c void divmod_10(uint32_t in, uint32_t *div, uint32_t *mod) { unsigned d0 = x & 0b1; unsigned d1 = x & 0b11; unsigned d2 = x & 0b111; *div = ((((in >> 3) + (in >> 1) + in) << 3) + d0 + d1 + d2) >> 7; *mod = in - (((*div << 2) + *div) << 1); } ``` 題目程式碼如下 ```c void divmod_10(uint32_t in, uint32_t *div, uint32_t *mod) { uint32_t x = (in | 1) - (in >> 2); /* div = in/10 ==> div = 0.75*in/8 */ uint32_t q = (x >> 4) + x; x = q; q = (q >> 8) + x; q = (q >> 8) + x; q = (q >> 8) + x; q = (q >> 8) + x; *div = (q >> 3); *mod = in - ((q & ~0x7) + (*div << 1)); } ``` ### 測驗3 #### 解釋程式碼 ```c static size_t ilog2(size_t i) { size_t result = 0; while (i >= 65536) { result += 16; i >>= 16; } while (i >= 256) { result += 8; i >>= 8; } while (i >= 16) { result += 4; i >>= 4; } while (i >= 2) { result += 1; i >>= 1; } return result; } ``` 用 $2^{16}$、$2^{8}$、$2^{4}$、$2$ 和輸入值 $i$ 比大小，若 $i$ $\ge$ $2^{x}$，表示 $log i$ $\ge$ x。並對 $i$ 右移 x 位元。反之，若 $i$ $\lt$ $2^{x}$ ，則比較 $i$ 和 $2^{y}$ 的大小，${y}$ $\lt$ ${x}$ ，直到最後。計算以2為底的對數，等同尋找 `fls` ，因此使用 `__builtin_clz` 改寫如下： ```c int ilog32(uint32_t v) { return (31 - __builtin_clz(v)); } ``` #### Linux 核心實例在 [kernel/sched/fair.c](https://github.com/torvalds/linux/blob/master/kernel/sched/fair.c) 中有部分程式碼使用對數，例如`SCHED_TUNABLESCALING_LOG` 等排程器相關設定。 ### 測驗4 #### 解釋程式碼首先 EWMA 使用結構體紀錄相關資訊如下： `internal` : 目前的 EWMA `factor` : 定點數的小數位元數 `weight`：EWMA 的衰變率從 `ewma_read` 在輸出前執行`avg->internal >> avg->factor` 可以發現 EWMA 是以定點數的格式儲存，因此輸出時才要右移 `avg->factor` 個位元。 EWMA原式定義為： $$ s_t = \alpha x_t + (1 - \alpha)s_{t-1}, \ \ t>0 $$ 即隨著 $t$ 增加，前 $t-1$ 個數值的影響逐漸減少。但對應到 `ewma_add` 中： ```c avg->internal = avg->internal ? (((avg->internal << avg->weight) - avg->internal) + (val << avg->factor)) >> avg->weight : (val << avg->factor); ``` 和原式的計算方式略有不同，經過左右同乘 ${1/\alpha}$ 後： $$ s_t/\alpha = x_t + (1/\alpha - 1)s_{t-1} $$ 就會符合程式碼的計算方法，因此可知 `avg->weight` 和 $\alpha$ 的關係為：$$ \alpha = 1/2^{avg->weight}$$ #### Linux 核心實例 ### 測驗5 #### 解釋程式碼完整程式碼: ```c int ceil_ilog2(uint32_t x) { uint32_t r, shift; x--; r = (x > 0xFFFF) << 4; //16 x >>= r; shift = (x > 0xFF) << 3; //8 x >>= shift; r |= shift; shift = (x > 0xF) << 2; //4 x >>= shift; r |= shift; shift = (x > 0x3) << 1; //2 x >>= shift; return (r | shift | x > 1) + 1; } ``` 首先 `x--` 確保後續的 x > $2^r - 1$ 等比較大小時，能輸出1的 x 必大於 $2^r$。換句話說，避免 x 等於2的冪時，最終輸出的對數取頂會比正確值多1。接著拿 x 和 $2^r - 1$ 比較大小，若 x 較大，則用 r 紀錄對應的 bitwise 右移量，同時 r 也代表 x 的對數，並且使用 shift 持續更新 r 。執行到 `return (r | shift | x > 1) + 1;` 時， x 已經右移30位元，只剩下最後2位元可能非零，即 x 可能為0到3其中之一，所以用 `x > 1` 檢測。最後因為要取頂，所以再加1。 ## [Quiz4](https://hackmd.io/@sysprog/linux2024-quiz4) ### 測驗1 #### 解釋程式碼 popcount 用於計算數值的二進為表示中，有多少位元設定為`1`，其數學式: $$popcount(x) = x - \lfloor{\frac{x}{2}}\rfloor - \lfloor{\frac{x}{4}}\rfloor - ... - \lfloor{\frac{x}{2^{31}}}\rfloor $$ 設 $x = {b_{31}}...{b_{2}}{b_{1}}{b_{0}}$，經過推導後，得到: $$popcount(x) = \sum\limits_{n = 0}^{31}{(2^n - \sum\limits_{i = 0}^{n - 1}{2^i}) b_n} = \sum\limits_{n=0}^{31} {b_n} $$ 依照上述數學式，我們以 4 個位元(nibble)為一組計算1的個數，即 $x - \lfloor{\frac{x}{2}}\rfloor - \lfloor{\frac{x}{4}}\rfloor - \lfloor{\frac{x}{8}}\rfloor$ ，實作程式碼如下 ```c unsigned popcount_branchless(unsigned v) { unsigned n; n = (v >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; v = (v + (v >> 4)) & 0x0F0F0F0F; v *= 0x01010101; return v >> 24; } ``` `v >> 1` : 即 $\lfloor{\frac{x}{2}}\rfloor$ 。例如令 v 為7，二進位表示式為0111，右移1位元後就得到3，等同 $\lfloor{\frac{7}{2}}\rfloor$ 。 `n & 0x77777777` : 因為以 nibble 為單位操作，所以要與0x77777777進行 bitwise & 。舉例如下 ```c b7 b6 b5 b4 b3 b2 b1 b0 //v 0 b7 b6 b5 b4 b3 b2 b1 //v >> 1 0 1 1 1 0 1 1 1 //0x77 ------------------------ 0 b7 b6 b5 0 b3 b2 b1 ``` 接著分析 `v = (v + (v >> 4)) & 0x0F0F0F0F` 的功能: 1. 假設 $B_n$ 為前面求出的第 n 個 nibble 的數值，並加總 v 和 v >> 4 ```c B7 B6 B5 B4 B3 B2 B1 B0 //v 0 B7 B6 B5 B4 B3 B2 B1 //v>>4 --------------------------------------------------- B7 (B7+B6) (B6+B5) (B5+B4) (B4+B3) (B3+B2) (B2+B1) (B1+B0) //v + v>>4 ``` 2. 以 `0x0F0F0F0F` 為 mask 得到 ```c B7 (B7+B6) (B6+B5) (B5+B4) (B4+B3) (B3+B2) (B2+B1) (B1+B0) //v + v>>4 0 F 0 F 0 F 0 F ---------------------------------------------------------- 0 (B7+B6) 0 (B5+B4) 0 (B3+B2) 0 (B1+B0) ``` 最後執行 `v *= 0x01010101` ，令 (B1+B0) 為 A0 ，依此類推 ```c 0 A6 0 A4 0 A2 0 A0 x 0 1 0 1 0 1 0 1 -------------------------------------------- 0 A6 0 A4 0 A2 0 A0 A6 0 A4 0 A2 0 A0 A6 0 A4 0 A2 0 A0 A6 0 A4 0 A2 0 A0 -------------------------------------------- ↑ A0+A2+A4+A6 ``` 可以發現在原本 A6 的位置即是 v 的 set bit 的個數，因此將 v 右移24位元即為所求。 #### Hamming Distance Hamming Distance 為兩個整數的二進位表示每個位元差的和，例如 0001 //1 1000 //4 兩數間的 Hamming Distance 為2。以下透過 A $\oplus$ B 計算 A 和 B 之間每個位元差，再用 GCC 內建函式 `__builtin_popcount(x)` 計算位元差的總和。就 [LeetCode 477. Total Hamming Distance](https://leetcode.com/problems/total-hamming-distance/description/) 的其中一解的程式碼 ```c int totalHammingDistance(int* nums, int numsSize) { int total = 0;; for (int i = 0;i < numsSize;i++) for (int j = 0; j < numsSize;j++) total += __builtin_popcount(nums[i] ^ nums[j]); return total >> 1; } ``` 上述程式碼自 num[] 中依序比較 i 和 j 的 Hamming Distance 並記錄在 total ，其時間複雜度為 $O(n^{2})$，因為 num[] 中每個元素都會被算到兩次，所以最後一行右移1位元。 #### 程式碼改進從位元的角度出發，改進如下 ```c int totalHammingDistance(int* nums, int numsSize) { int total = 0;; for (int i = 0;i < 32;i++){ int bitTotal = 0; for (int j = 0; j < numsSize;j++) bitTotal += nums[j] >> i & 0b1; total += bitTotal * (numsSize - bitTotal); } return total; } ``` 以 nums[3] = {4, 14 ,2}為例，他們的二進位表示法如下 0100 //4 1110 //14 0010 //2 首先自 LSB 觀察，發現3個數值的 LSB 皆為0，所以此位元的 Hamming Distance = 0。接著觀察 $1^{th}$位元，1有兩個，所以此位元的 Hamming Distance = 2 * numsize - 2。因此我們可以歸納出每個位元的 Hamming Distance = 該位元1的數量 * numsize - 該位元1的數量，時間複雜度改進為 $O(n)$。程式中 `i` 表示目前的位元位置， nums[j] 右移後和1執行 bitwise & 以檢測該位元是否為1，並用 `bitTotal` 紀錄此位元1的數量。最後再加總各位元的 `bitTotal * (numsSize - bitTotal)` 即為輸出值。 ### 測驗2 #### 解釋程式碼 ### 測驗3 #### 解釋程式碼 AVL tree 和 red-black tree 雖然樹高都屬於 $O(log n )$ ，但實際上因為 AVL tree 會確保左右子樹的樹高相差不超過`1`，因此 AVL tree 的樹會較 red-black tree 茂密(bushy)，然而也必須花費更高成本作旋轉( rotation )。 XTree 嘗試兼具 AVL tree 和 red-black tree 的優點，其中 XTree 的平衡機制類似 AVL tree 。 XTree 的 hint 計算方式為，其左右節點 hint 最大值 +1 並且只有在被呼叫到 xt_update 時才會被更新。 XTree 有4個主要操作， `rotate_left`, `rotate_right`, `replace_right` 和 `replace_left` 。 XTree 在插入和刪除節點後會更新部分節點的 hint ，此時會用到 `rotate_left` 和 `rotate_right` ；移除節點時會用到 `replace_right` 和 `replace_left` 其中之一。更新 hint 的規則為: `rotate_left` : `rotate_right` `replace_right` `replace_left` ## 閱讀教材心得 ### [Linux 核心的紅黑樹](https://hackmd.io/@sysprog/linux-rbtree) [include/linux/rbtree_types.h](https://github.com/torvalds/linux/blob/master/include/linux/rbtree_types.h) 中 : ```c struct rb_node { unsigned long __rb_parent_color; struct rb_node *rb_right; struct rb_node *rb_left; } __attribute__((aligned(sizeof(long)))); ``` #### 理解 [linux/rbtree.h](https://elixir.bootlin.com/linux/v5.10/source/include/linux/rbtree.h) 中的程式碼 ```c rb_parent(r) ((struct rb_node *)((r)->__rb_parent_color & ~3)) ``` 回傳該節點之親代節點的位置，使用 `__rb_parent_color & ~3` 清除 LSB 數來兩位元後，再將其強制轉型為紅黑數節點的指標。 ```c RB_EMPTY_NODE(node) ((node)->__rb_parent_color == (unsigned long)(node)) RB_CLEAR_NODE(node) ((node)->__rb_parent_color = (unsigned long)(node)) ``` 兩者功能皆為將節點的 `__rb_parent_color` 設定為自己。