2020q3 Homework4 (quiz4)

# 2020q3 Homework4 (quiz4) contributed by < `zzzxxx00019` > > [2020q3 第 4 週測驗題](https://hackmd.io/@sysprog/2020-quiz4) ## 測驗 1 ### 題目： [LeetCode 461. Hamming Distance](https://leetcode.com/problems/hamming-distance/) 提及，兩個整數間的 Hamming distance 為其二進位的每個位元的差。請計算輸入參數兩整數 x 與 y 的 Hamming distance，例如整數 1 的二進位為 0 0 0 1，而整數 4 的二進位為 0 1 0 0，則 1 與 4 的 Hamming distance 為 2。 ``` 1 (0 0 0 1) 4 (0 1 0 0) [ ] [ ] | | \_ 2 _/ ``` 運用第 3 周測驗提到的 __builtin_popcount (gcc extension 之一)，我們可實作如下: ```cpp int hammingDistance(int x, int y) { return __builtin_popcount(x OP y); } ``` 請補完程式碼。 ### 原理：根據題目說明，兩個整數間的 [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) 為其二進位的每個位元的差 ( 距離 ) ，並將每個位元間的距離加總，因此在距算每個位元間的距離，透過 `xor` 運算是最合理的，要計算每位元距離加總，其實就是去算兩數做 `xor` 後，共有幾個 `bit 1` ，因此透過 `__builtin_popcount` 去做計算，根據上述說明，判斷 `OP : ^` ### 延伸問題： 1. 舉出不使用 gcc extension 的 C99 實作程式碼 * 參考 [quiz3](https://hackmd.io/@sysprog/2020-quiz3) 中所提供的 `popcnt_branchless` 實作程式碼來改寫： ```gcc= int popcnt_branchless(uint32_t v) { uint32_t n; n = (v >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; v = (v + (v >> 4)) & 0x0F0F0F0F; v *= 0x01010101; return v >> 24; } int hammingDistance(int x, int y) { return popcnt_branchless(x ^ y) ; } ``` ![](https://i.imgur.com/ArNZPnW.png) 2. 練習 [LeetCode 477. Total Hamming Distance](https://leetcode.com/problems/total-hamming-distance/)，你應當提交 (submit) 你的實作 (C 語言實作)，並確保執行結果正確且不超時 * 根據題目所提供的三筆數字做觀察 4 、 14 、 2 ，它們的二進制表示分別為 `0100` 、 `1110` 、`0010` ，假設從右數第二個 `bit` 來做觀察的話，三數依序分別為 `0` 、 `1` 、 `1` ，如此總共會有 $1\times 2$ 種組合 * 要算出 `Total Hamming Distance` 的方法，就是把所有數中，每個 `bit` 的所有組合加總 * 假設每個 `bit` 中，共有 $m$ 個 $0$ 與 $n$ 個 $1$ ，且輸入為 `32 bit` 的正整數，可以寫成以下的數學表事法： $TotalHammingDistance=\sum_{k=0}^{31}(m_k)\times(n_k)$ ```cpp= int totalHammingDistance(int* nums, int numsSize) { int distance = 0 ; int bit1[32] = {0} , bit0[32] = {0} ; for (int i = 0 ; i < numsSize ; i++){ for (int j = 0 ; j < 32 ; j++){ ( nums[i] >> j ) & 1 ? bit1[j]++ : bit0[j]++ ; } } for (int i = 0 ; i < 32 ; i++) distance += bit1[i] * bit0[i] ; return distance ; } ``` ![](https://i.imgur.com/TX4KD9F.jpg) * 但這樣的寫法需要透過兩個 `array` 去存放 `bit 0` 跟 `bit 1` 的數量，明顯造成空間上的浪費，且在檢查每個 `bit` 上，是透過迴圈依序右移並做 `&1` ，步驟明顯較為冗長，因此做以下改正： * 共有 $numsSize$ 個數，則在第 $k$ 個位元，滿足 $numsSize = m_k + n_k$ * 將上述結果套用，$TotalHammingDistance=\sum_{k=0}^{31}(m_k)\times(numsSize - m_k)$ * 根據上面的結果，目前只需要尋找 `bit 1` 的位置，因此改用 `__builtin_ctz` 來取代迴圈 * 在 `__builtin_ctz` 的操作上，參考 [quiz3 測驗 5](https://hackmd.io/@jT29vQ4-SxmlBU5UMj1TGA/quiz3#%E6%B8%AC%E9%A9%97-5) 的部分，原理相似 ```cpp= int totalHammingDistance(int *nums, int numsSize) { if (!nums || !numsSize) return 0 ; int distance = 0 ; int bit1[32] = {0} ; for (int i = 0 ; i < numsSize ; i++){ while(nums[i]){ bit1[ __builtin_ctz(nums[i]) ] ++ ; nums[i] ^= nums[i] & -nums[i] ; } } for (int i = 0 ; i < 32 ; i++ ){ distance += bit1[i] * (numsSize - bit1[i]) ; } return distance ; } ``` ![](https://i.imgur.com/Da7m7pV.png) 3. 研讀錯誤更正碼簡介及抽象代數的實務應用，摘錄其中 Hamming Distance 和 Hamming Weight 的描述並重新闡述，舉出實際的 C 程式碼來解說： * 透過 `encoder-decoder` 的方式，在資料加入一些數學結構，這些數學結構可以協助在一定數量的資料錯誤內進行資料修正 ![](https://i.imgur.com/dSkAkNh.png) * Hamming distance & Hamming weight * 透過兩個向量 $x=(0001011)$ 與 $x'=(1101010)$ 來舉例說明 * Hamming distance 就是它們不一樣的 bit 有幾個，這個例子 $d_h(x,x')=3$ * Hamming weight 是指一個向量不是 0 的地方有幾個，所以 $w_H(x)=3$ 、 $w_H(x')=4$ * Hamming code * `Hamming code` 功能在於資料除錯，**能檢查出 1 位元的錯誤發生與發生錯誤的位置** * `Hamming code` 長度公式： $M≦2^n$ ， $K=n+1$ ， $M$ 為資料長度， $K$ 為檢查位元長度，以 8 bits 資料長度舉例， $8≦2^3$ ，故 $n=3$ ， $K=4$ ，因此加上 `Hamming code` 長度為 `12 bits` * 透過 $M=(10101101)$ 這 `8 bits` 的資料來說明 `hamming code` 的 `encoder-decoder` 流程首先先建立 `12 bits` 的表格，並將 $M$ 依序填入表格中， `hamming code` 通常使用到 $2^n$ 的位置，因此必須空下 $2^n$ 的 bit 供錯誤檢查碼填入， `H0 ~ H3` 代表更正碼填入位置 |1 |2 |3 |4 |5 |6 |7 |8 |9 |10|11|12| |- |- |- |- |- |- |- |- |- |- |- |- | |H0|H1|1 |H2|0 |1 |0 |H3|1 |1 |0 |1 | 接著將 `bit 1` 的位置做 `XOR` 運算，也就是 `3` 、 `6` 、 `9` 、 `10` 、 `12` 的二進制編碼做相加運算 $HammingCode = (0011)_2\oplus (0110)_2\oplus (1001)_2\oplus (1010)_2\oplus (1100)_2 =(1010)_2$ 因此 `H0 ~ H3` 依序為 `1` 、 `0` 、 `1` 、 `0` ，並將其==反填==回表中， $Codeword=(011001011101)$ |1 |2 |3 |4 |5 |6 |7 |8 |9 |10|11|12| |- |- |- |- |- |- |- |- |- |- |- |- | |==0== |==1==|1 |==0==|0 |1 |0 |==1==|1 |1 |0 |1 | 若資料傳輸過程中，沒發生任何錯誤，也就是 `Codeword` 的任何 `bit` 沒有損毀，那麼將 `Codeword` 中對應為 `bit 1` 的二進制編碼做 `XOR` 運算，出來結果會為 `0` ，以上述 decoder 的結果為例： $Check = (0010)_2\oplus (0011)_2\oplus (0110)_2\oplus (1000)_2\oplus (1001)_2\oplus (1010)_2\oplus(1100)_2=(0000)_2$ 但若資料傳輸過程發生損毀， hamming code 功能是能檢查出 1 bit 的資料錯誤與位置，假設 bit 5 發生損毀，那麼 Codeword 則變為 $(011011011101)$ ，同樣將 bit 1 位置的二進制做 XOR 運算： $Check = (0010)_2\oplus (0011)_2\oplus(0101)\oplus (0110)_2\oplus (1000)_2\oplus (1001)_2\oplus (1010)_2\oplus(1100)_2=(0101)_2$ 可以發現結果為 $(0101)$ ，代表 `5` 的位置發生錯誤，藉由此機制修正 `1 bit` 的錯誤發生 ## 測驗 2 ### 題目： LeetCode [1483. Kth Ancestor of a Tree Node](https://leetcode.com/problems/kth-ancestor-of-a-tree-node/) ```cpp= #include <stdlib.h> typedef struct { AAA; int max_level; } TreeAncestor; TreeAncestor *treeAncestorCreate(int n, int *parent, int parentSize) { TreeAncestor *obj = malloc(sizeof(TreeAncestor)); obj->parent = malloc(n * sizeof(int *)); int max_level = 32 - __builtin_clz(n) + 1; for (int i = 0; i < n; i++) { obj->parent[i] = malloc(max_level * sizeof(int)); for (int j = 0; j < max_level; j++) obj->parent[i][j] = -1; } for (int i = 0; i < parentSize; i++) obj->parent[i][0] = parent[i]; for (int j = 1;; j++) { int quit = 1; for (int i = 0; i < parentSize; i++) { obj->parent[i][j] = obj->parent[i][j + BBB] == -1 ? -1 : obj->parent[obj->parent[i][j - 1]][j - 1]; if (obj->parent[i][j] != -1) quit = 0; } if (quit) break; } obj->max_level = max_level - 1; return obj; } int treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k) { int max_level = obj->max_level; for (int i = 0; i < max_level && node != -1; ++i) if (k & (CCC)) node = obj->parent[node][i]; return node; } void treeAncestorFree(TreeAncestor *obj) { for (int i = 0; i < obj->n; i++) free(obj->parent[i]); free(obj->parent); free(obj); } ``` 請補完。 ### 原理：先試著理解題目與整個程式運作模式： #### 題目理解： ``` Input: ["TreeAncestor","getKthAncestor","getKthAncestor","getKthAncestor"] [[7,[-1,0,0,1,1,2,2]],[3,1],[5,2],[6,3]] Output: [null,1,0,-1] Explanation: TreeAncestor treeAncestor = new TreeAncestor(7, [-1, 0, 0, 1, 1, 2, 2]); treeAncestor.getKthAncestor(3, 1); // returns 1 which is the parent of 3 treeAncestor.getKthAncestor(5, 2); // returns 0 which is the grandparent of 5 treeAncestor.getKthAncestor(6, 3); // returns -1 because there is no such ancestor ``` * 透過陣列來表示每個 `node` 所指向的祖節點， `parent[0]` 代表 `node 0` ... * `node 0` 為 `root` ，祖節點不存在，為 `-1` ，而 `node 1` 與 `node 2` 的祖節點為 `0` ，以此類推 #### `TreeAncestor` 結構定義： ```cpp=3 typedef struct { AAA; int max_level; } TreeAncestor; ``` * 用來定義 `TreeAncestor` 的資料型態 #### 解讀 `treeAncestorCreate` 程式內容： ```cpp=10 TreeAncestor *obj = malloc(sizeof(TreeAncestor)); obj->parent = malloc(n * sizeof(int *)); ``` * 透過 `malloc` 動態分配記憶體，將位址回傳給 `TreeAncestor` 型態的指標 `obj` * `obj->parent` 被分配用來記錄 `int` 型態指標的記憶體空間 ```cpp=13 int max_level = 32 - __builtin_clz(n) + 1; ``` * 透過 `32 - __builtin_clz(n) + 1` 計算右數開始的最高位有效 `bit` ，即 `tree` 的 `max_level` * 在下面建表的迴圈中，終止存取條件是最後為 `-1` ，若不 `+1` 的話，在單鏈狀態下最後一個 `node` 會將空間會全部剛好使用，導致迴圈無法順利終止 ```cpp=14 for (int i = 0; i < n; i++) { obj->parent[i] = malloc(max_level * sizeof(int)); for (int j = 0; j < max_level; j++) obj->parent[i][j] = -1; } ``` * 現在 `parent` 中每個 `int*` 紀錄 `malloc(max_level * sizeof(int))` 所分配的記憶體位址，成為 `parent[n][max_level]` 這樣型態的二維陣列，並將陣列中的所有數值初始化為 `-1` * 透過二維陣列 `parent[i][j]` ，代表 `node i` 的第 $2^j$ 代祖節點 * 因 parent 被設定為二維陣列的型態，故 `AAA = int** parent` ```cpp=19 for (int i = 0; i < parentSize; i++) obj->parent[i][0] = parent[i]; ``` * 這邊的 `parent[i]` 為一開始的 `int *parent` ，而非 `obj->parent` 的二維陣列 * 初始化陣列狀態，將 `parent[i][0]` 記為上一代祖節點 ```cpp=22 for (int j = 1;; j++) { int quit = 1; for (int i = 0; i < parentSize; i++) { obj->parent[i][j] = obj->parent[i][j + BBB] == -1 ? -1 : obj->parent[obj->parent[i][j - 1]][j - 1]; if (obj->parent[i][j] != -1) quit = 0; } if (quit) break; } ``` * 所使用的存取邏輯為透過 parent[i][j] ，存取 node i 的第 $2^j$ 的祖節點 * 整個順序可以表示為 `parent[i][j-1] <- parent[i][j] <- parent[i][0]` 這樣的關係 * 因此迴圈是用來檢查 `node i` 的 $2^{j+BBB}$ 代祖先是否存在，如果不存在的話， `node i` 的第 $2^j$ 代祖節點自然也不存在 * 若 `node i` 的 $2^{j+BBB}$ 代祖先存在，代表 `node j` 的祖節點就是 node i 的第 $2^{j+BBB}$ 的祖節點 * 透過此方法可以在建表上省下不少時間，時間複雜度為 $n$ $log(n)$ * 故 `BBB = -1` #### 解讀 `treeAncestorGetKthAncestor` 程式內容： ```cpp=36 int treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k) { int max_level = obj->max_level; for (int i = 0; i < max_level && node != -1; ++i) if (k & (CCC)) node = obj->parent[node][i]; return node; } ``` * 在不超出 `max_level` 的情況下，透過迴圈來尋找節點 * 如果 `node == -1` ，代表該節點已不存在，也沒往上再找的必要，終止迴圈 * 每尋找一個節點，就將自己本身設為上一代祖節點，因此尋訪方式為依序尋找 $2^0$ 、 $2^1$ 、 $2^2$ ... 以此類推，故 `CCC = 1 << i` #### 解讀 `treeAncestorFree` 程式內容： ```cpp=45 void treeAncestorFree(TreeAncestor *obj) { for (int i = 0; i < obj->n; i++) free(obj->parent[i]); free(obj->parent); free(obj); } ``` * 將 `n` 個節點所分配的記憶體空間依序刪除 * `obj->n` 在前面未定義也未使用過，需在 `TreeAncestor` 的結構定義上加入 `int n` * 在 `treeAncestorCreate` 加入 `obj->n = n` 這樣的程式碼 #### 輸入陣列舉例：假設輸入 `[-1,0,0,1,2,0,1,3,6,1]` 這樣的陣列，則 parent 得到以下的結果： |[i][j]| 0 | 1 | 2 | 3 | | - | - | - | - | - | | 0 |-1 |-1 |-1 |-1 | | 1 |0 |-1 |-1 |-1 | | 2 |0 |-1 |-1 |-1 | | 3 |1 |0 |-1 |-1 | | 4 |2 |0 |-1 |-1 | | 5 |0 |-1 |-1 |-1 | | 6 |1 |0 |-1 |-1 | | 7 |3 |1 |-1 |-1 | | 8 |6 |1 |-1 |-1 | | 9 |1 |0 |-1 |-1 | ```graphviz digraph { 0 0 -> 1 0 -> 2 1 -> 3 2 -> 4 0 -> 5 1 -> 6 3 -> 7 6 -> 8 1 -> 9 } ``` ### 延伸問題：在 [LeetCode 1483. Kth Ancestor of a Tree Node](https://leetcode.com/problems/kth-ancestor-of-a-tree-node/) 提交你的實作，你應該要在 C 語言項目中，執行時間領先 75% 的提交者原本的想法是透過犧牲記憶體空間與建表時間，在 `treeAncestorGetKthAncestor` 的部分達到時間複雜度為 $O(1)$ 的實作，但在 `treeAncestorCreate` 的過程就需要 $O(n^2)$ 的時間複雜度，在 leedcode 上果然 `Time Limit Exceeded` 了，還需要想別的方法改善 ```cpp= TreeAncestor *treeAncestorCreate(int n, int *parent, int parentSize) { TreeAncestor *obj = malloc(sizeof(TreeAncestor)); obj->parent = malloc(n * sizeof(int *)); obj->bound = malloc(n* sizeof(int)); obj->n = n; for (int i = 0; i < n; i++){ obj->parent[i] = malloc((i+1) * sizeof(int)); } for (int i = 0; i < parentSize; i++) obj->parent[i][0] = parent[i]; for (int i = 1; i < parentSize ; i++) { for (int j = 1 ;; j++) { if(obj->parent[i][j-1] == 0){ obj->bound[i] = j ; break ; } obj->parent[i][j] = obj->parent[ obj->parent[i][j-1] ][0] ; } } return obj; } int treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k) { if (k > obj->bound[node]) return -1 ; return obj->parent[node][k-1] ; } ``` ![](https://i.imgur.com/gmfiyfe.png) ## 測驗 3 ### 題目：考慮貌似簡單卻蘊含實作深度的 FizzBuzz 題目: * 從 1 數到 n，如果是 3的倍數，印出 “Fizz” * 如果是 5 的倍數，印出 “Buzz” * 如果是 15 的倍數，印出 “FizzBuzz” * 如果都不是，就印出數字本身觀察 printf 的(格式)字串，可分類為以下三種: 1. 整數格式字串 "%d" : 長度為 2 B 2. “Fizz” 或 “Buzz” : 長度為 4 B 3. “FizzBuzz” : 長度為 8 B 考慮下方程式碼: ```cpp #define MSG_LEN 8 char fmt[MSG_LEN + 1]; strncpy(fmt, &"FizzBuzz%u"[start], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); ``` 我們若能精準從給定輸入 i 的規律去控制 start 及 length，即可符合 FizzBuzz 題意: ```cpp string literal: "fizzbuzz%u" offset: 0 4 8 ``` 以下是利用 bitwise 和上述技巧實作的 FizzBuzz 程式碼: (bitwise.c) ```cpp #define MSG_LEN 8 static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char *argv[]) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << div3) << div5; char fmt[MSG_LEN + 1]; strncpy(fmt, &"FizzBuzz%u"[(MSG_LEN >> KK1) >> (KK2 << KK3)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` 請補完。 ### 原理： * `is_divisible` 的部分透過 [quiz2](https://hackmd.io/@jT29vQ4-SxmlBU5UMj1TGA/quiz2#%E6%B8%AC%E9%A9%97-4) 的快速除法技巧來實作 * `fmt` 這個陣列裡面的內容，決定這次要 `print` 出樣的字串，而 `fmt` 的內容由 `strncpy` 來決定 >strncpy(fmt, &"FizzBuzz%u"[start], length) * `strncpy` 透過 `offset` 與 `length` 來決定要複製怎麼樣的字串到 `fmt` 裡面，如下表所示： |倍數 |offset|length| |- |- |- | |3 |0 |4 | |5 |4 |4 | |15 |0 |8 | |other|8 |2 | * 透過 `is_divisible` 檢查 `i` 是否為 `3` 或 `5` 的倍數，並回傳 `1 ( true )` 或是 `0 ( false )` * `length = (2 << div3) << div5` 透過 `div3` 與 `div5` 決定要右推幾個 `bit` ，當 `i` 為 `15` 的倍數，就代表 `div3` 與 `div5` 結果皆為 `true` ，則 `length = 8` * `offset` 的部分由 `(MSG_LEN >> KK1) >> (KK2 << KK3)` 一連串的運算來實作 * 代入 `i = 5`，只需要右移 `1` 次，故推測 `KK1 = div5` ，且 `KK2 = div3` ，如此 `KK2 << KK3` 結果必為 `0` * 代入 `i = 3`，共需要右移 `4` 次，套用 `KK1 = div5` 與 KK2 = div3 的推測， `KK2 << KK3` 的結果需為 `4`， `KK3 = 2` 才能滿足條件 * 代入 `i = 15` ， `offset = ( 8 >> 1 ) >> ( 1 << 2 ) = 0` ，上述假設成立 * 代入 `i = 17` ， `offset = ( 8 >> 0 ) >> ( 0 << 2 ) = 8` ，上述假設成立 * 故 `KK1 = div5` 、 `KK2 = div3` 、 `KK3 = 2` ### 延伸問題：評估 `naive.c` 和 `bitwise.c` 效能落差 * 透過迴圈檢查 `0 ~ 1000000` 資料，並透過 perf 分析效能落差 ==naive.c== ``` Performance counter stats for './naive.o': 1767.93 msec task-clock # 0.053 CPUs utilized 41 context-switches # 0.023 K/sec 4 cpu-migrations # 0.002 K/sec 54 page-faults # 0.031 K/sec 63,6106,7793 cycles # 3.598 GHz 48,2593,9095 instructions # 0.76 insn per cycle 9,6705,4919 branches # 546.999 M/sec 1232,3274 branch-misses # 1.27% of all branches 33.140408084 seconds time elapsed 0.404676000 seconds user 1.366285000 seconds sys ``` ==bitwise.c== ``` Performance counter stats for './bitwise.o': 1664.01 msec task-clock # 0.049 CPUs utilized 19 context-switches # 0.011 K/sec 1 cpu-migrations # 0.001 K/sec 51 page-faults # 0.031 K/sec 62,8016,9210 cycles # 3.774 GHz 47,6773,4807 instructions # 0.76 insn per cycle 9,5679,7062 branches # 574.994 M/sec 1245,0080 branch-misses # 1.30% of all branches 34.217948407 seconds time elapsed 0.439350000 seconds user 1.226187000 seconds sys ``` ## 測驗 4 ### 題目：考慮到整數 `PAGE_QUEUES` 可能有以下數值: * (1 or 2 or 3 or 4) * (5 or 6 or 7 or 8) × ($2^n$), n from 1 to 14 給定 `size_t offset`，試著將原本運算: ```cpp #include <stdint.h> size_t blockidx = offset / PAGE_QUEUES; ``` 由於` PAGE_QUEUES` 的數值分佈已知，在整數除法時，可思考加速運算的機制。需要注意，某些硬體平台缺乏整數除法指令 (如 Arm Cortex-A8)，即使 Intel 公司出品的 Core 處裡器 [Sandy Bridge 微架構](https://en.wikipedia.org/wiki/Sandy_Bridge)中，針對 64 位元整數的除法運算，會佔用 40 到 103 個處理器週期，運算成本仍屬沈重。於是我們可預先進行計算，從而避免整數除法指令的使用: (假設在 x86_64 硬體執行 GNU/Linux 並透過 gcc/clang 編譯) ```cpp #include <stdint.h> #define ffs32(x) ((__builtin_ffs(x)) - 1) size_t blockidx; size_t divident = PAGE_QUEUES; blockidx = offset >> ffs32(divident); divident >>= ffs32(divident); switch (divident) { CASES } ``` 其中 `CASES` 可能形如: ```cpp case 2: blockidx /= 2; break; ``` 這樣的組合，請用最少的 case-break 敘述做出同等 `size_t blockidx = offset / PAGE_QUEUES;` 效果的程式碼。 `CASES` 至少該包含哪些數字: ### 原理： >Built-in Function: int __builtin_ffs (int x) > * Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero. * `(__builtin_ffs(x)) - 1` 回傳的結果為 x 中最後一個為 1 的位置，換句話說，就是從右數來到第一個 `bit 1` ，共經過幾個 `bit 0` ，也就是可以右移幾次 * 功能等同於 `__builtin_ctz(x)` 不同的地方是 `__builtin_ffs(x)` 允許 `x = 0` 的情況發生 * 透過 `blockidx = offset >> ffs32(divident)` 與 `divident >>= ffs32(divident)` 簡化除法動作，可以透過以下數學表示： $blockidx = \dfrac{offset}{PageQueues} = \dfrac{\dfrac{offset}{2^n}}{\dfrac{PageQueues}{2^n}}$ * 再透過下表觀察不同的 QUEUE_PAGES ，向右位移後的結果： * 因為 (5 or 6 or 7 or 8) × ($2^n$), n from 1 to 14 ，皆可直接右移 n 次，因此不列入下表 * 觀察到右移後結果，因此只需要考慮 1 、 3 、 5 、 7 這幾種 case | QUEUE_PAGES | binary | divident >>= ffs32(divident) | | - | - | - | |1 | 0001 | 1 | |2 | 0010 | 1 | |3 | 0011 | 3 | |4 | 0100 | 1 | |5 | 0101 | 5 | |6 | 0110 | 3 | |7 | 0111 | 7 | |8 | 1000 | 1 |