2020q3 Homework4 (quiz4)

--- title: '2020q3 Homework4 (quiz4)' tags: linux2020 --- # 2020q3 Homework4 (quiz4) contributed by < `ChongMingWei` > ## Outline [TOC] ## 環境 ```shell $ uname -a Linux cmw-System-Product-Name 5.4.0-47-generic #51~18.04.1-Ubuntu SMP Sat Sep 5 14:35:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ``` ## 作業要求 ## 測驗 1 要計算每個位元的差，就使用 `XOR` 來檢查哪些 bit 不是同時為 set 或是 clear ，最後在使用 popcount 來計算 set bit 的數量。 ```c= int hammingDistance(int x, int y) { return __builtin_popcount(x ^ y);//OP } ``` :::success 1.解釋上述程式碼運作原理，並舉出不使用 gcc extension 的 C99 實作程式碼; ::: - Line 5 可得到最低位的 set bit ，其他 bits 設為 clear 。 - Line 6 計算 set bit 的數量 - Line 7 計算完 set bit 的數量，將剛剛得到的最低位 set bit 用 `XOR` 來消除 ```c= int hammingDistance(int x, int y){ int a = x^y; int b = 0; while (a){ int t = a & (-a); b++; a ^= t; } return b; //return __builtin_popcount(x ^ y); } ``` :::success 2.練習 [LeetCode 477. Total Hamming Distance](https://leetcode.com/problems/total-hamming-distance/)，你應當提交 (submit) 你的實作 (C 語言實作)，並確保執行結果正確且不超時 ::: - Version 1: Using double for loop to find hamming distance of each pair. - Result: Time Limit Exceeded ```c= int totalHammingDistance(int* nums, int numsSize){ int sum=0; for (int i = 0;i < numsSize - 1; ++i){ for (int j = i + 1; j < numsSize; ++j){ sum += __builtin_popcount(nums[i]^nums[j]); } } return sum; } ``` - Version 2: Replace the `__builtin_popcount` by constant time implementation - Result ![](https://i.imgur.com/RyRrAbb.png) ```c= int totalHammingDistance(int* nums, int numsSize){ int sum=0; for (int i = 0;i < numsSize - 1; ++i){ for (int j = i + 1; j < numsSize; ++j){ int a = nums[i]^nums[j]; unsigned n; unsigned v = (unsigned)a; n = (v >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; v += v >> 4; v &= 0x0F0F0F0F; v *= 0x01010101; v >>= 24; sum +=v; } } return sum; } ``` - Version3: For each bit, we can find the hamming distance of each pair is `countOfOne*countOfZero`. So we traverse 32 bits and computes the hamming distance of each one bit. Finally, sum all hamming distance. - Result ![](https://i.imgur.com/NTslLha.png) > 參考 [O(n) Simple Solution](https://leetcode.com/problems/total-hamming-distance/discuss/865629/O(n)-Simple-Solution) ```c= int totalHammingDistance(int* nums, int numsSize){ int sum=0; for(int i = 0; i < 32;++i){ int count1=0; for (int j = 0;j < numsSize;++j) count1 += (nums[j]&(1u<<i))>>i;//if bit i of nums[j] is 1, count1++ sum += count1*(numsSize-count1);//compute hamming distance of bit i } return sum; } ``` ## 測驗 2 > 參考 [YLowy](https://hackmd.io/@YLowy/S1PmArmLD) LeetCode [1483. Kth Ancestor of a Tree Node](https://leetcode.com/problems/kth-ancestor-of-a-tree-node/) 大意: >給你一棵樹，樹上有 n 個節點，編號自 0 到 n−1。樹以父節點陣列的形式提供，其中 parent[i] 是節點 i 的父節點。樹的根節點是編號為 0 的節點。請設計 treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k) 函式，回傳節點 node 的第 k 個祖先節點。若不存在這樣的祖先節點，回傳 -1。樹節點的第 k 個祖先節點是從該節點到根節點路徑上的第 k 個節點 ```c= #include <stdlib.h> typedef struct { int **parent;//AAA int max_level; } TreeAncestor; TreeAncestor *treeAncestorCreate(int n, int *parent, int parentSize) { TreeAncestor *obj = malloc(sizeof(TreeAncestor)); obj->parent = malloc(n * sizeof(int *)); int max_level = 32 - __builtin_clz(n) + 1; for (int i = 0; i < n; i++) { obj->parent[i] = malloc(max_level * sizeof(int)); for (int j = 0; j < max_level; j++) obj->parent[i][j] = -1; } for (int i = 0; i < parentSize; i++) obj->parent[i][0] = parent[i]; for (int j = 1;; j++) { int quit = 1; for (int i = 0; i < parentSize; i++) { obj->parent[i][j] = obj->parent[i][j + (-1)] == -1//BBB ? -1 : obj->parent[obj->parent[i][j - 1]][j - 1]; if (obj->parent[i][j] != -1) quit = 0; } if (quit) break; } obj->max_level = max_level - 1; return obj; } int treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k) { int max_level = obj->max_level; for (int i = 0; i < max_level && node != -1; ++i) if (k & (1<<i))//CCC node = obj->parent[node][i]; return node; } void treeAncestorFree(TreeAncestor *obj) { for (int i = 0; i < obj->n; i++) free(obj->parent[i]); free(obj->parent); free(obj); } ``` ### 破題根據題意，首先面臨的問題就是要建立怎麼樣的一個資料結構，來儲存如以下圖的 tree: ![](https://i.imgur.com/AVnjv3f.png) 在此實作中，我們採用的是查表的方式，建立一個名為 `TreeAncestor` 的 struct ，使用裡面的表格 `parent` ，來供我們查詢，那麼查詢的方法是甚麼呢? ### 查詢方法我們會建立一個 $node\times (\lceil{log_{_2}node}\rceil+1)$ (此處 node=7)大小的表格，以上圖為範例: | Node | 2^0^ th ancestor | 2^1^ th ancestor | 2^2^ th ancestor | | | -------- | -------- | -------- | -------- | -------- | | 0 | | | | | | 1 | | | | | | 2 | | | | | | 3 | | | | | | 4 | | | | | | 5 | | | | | | 6 | | | | | 這邊其實多出一個 column ，==用途是讓我們有修改的空間== 在表格中， `parent[node][j]` 中儲存的是 `node` 的 2^j^ th 祖先，我們把要找第 k 個祖先的 "k" 拿出來討論: k 是一個正整數，而正整數 k 事實上可以被表示成數個 2^n^ 的和 E.g. 要找第 7 個祖先，就是去找 ((第 1 個祖先) 的第 2 個祖先) 的第 4 個祖先 >因為 7 = 2^0^ + 2^1^ + 2^2^ 這樣一來，我們就可以達到 O(logn) 的搜尋時間 > 假設我們今天的資料結構是用 linked list ，每個 node 都有著其 parent 的位址，那麼我們的搜尋時間就會是 O(n) ### Code Details - 計算樹的高度以及將表格 `parent` 初始化，全部都設成-1 ```c=13 int max_level = 32 - __builtin_clz(n) + 1; for (int i = 0; i < n; i++) { obj->parent[i] = malloc(max_level * sizeof(int)); for (int j = 0; j < max_level; j++) obj->parent[i][j] = -1; } ``` | Node | 2^0^ th ancestor | 2^1^ th ancestor | 2^2^ th ancestor | | | -------- | -------- | -------- | -------- | -------- | | 0 | -1 | -1 | -1 | -1 | | 1 | -1 | -1 | -1 | -1 | | 2 | -1 | -1 | -1 | -1 | | 3 | -1 | -1 | -1 | -1 | | 4 | -1 | -1 | -1 | -1 | | 5 | -1 | -1 | -1 | -1 | | 6 | -1 | -1 | -1 | -1 | - 將每一個 node 的父親填入表格中 ```c=19 for (int i = 0; i < parentSize; i++) obj->parent[i][0] = parent[i]; ``` | Node | 2^0^ th ancestor | 2^1^ th ancestor | 2^2^ th ancestor | | | -------- | -------- | -------- | -------- | -------- | | 0 | -1 | -1 | -1 | -1 | | 1 | 0 | -1 | -1 | -1 | | 2 | 0 | -1 | -1 | -1 | | 3 | 1 | -1 | -1 | -1 | | 4 | 1 | -1 | -1 | -1 | | 5 | 2 | -1 | -1 | -1 | | 6 | 2 | -1 | -1 | -1 | - 將表格 `parent` 中，剩下的欄位填入資料 - Line 25: 如果第2^n-1^個祖先不存在 (-1)，那麼後續第2^n^個祖先也不存在 (設為-1) - Line 27: 如果第2^n-1^個祖先存在 (!= -1)，那麼第2^n^個祖先就是**第2^n-1^個祖先的第2^n-1^個祖先**。 > 2^n^ = 2^n-1^ + 2^n-1^ ```c=22 for (int j = 1;; j++) { int quit = 1; for (int i = 0; i < parentSize; i++) { obj->parent[i][j] = obj->parent[i][j + (-1)] == -1//BBB ? -1 : obj->parent[obj->parent[i][j - 1]][j - 1]; if (obj->parent[i][j] != -1) quit = 0; } if (quit) break; } ``` | Node | 2^0^ th ancestor | 2^1^ th ancestor | 2^2^ th ancestor | | | -------- | -------- | -------- | -------- | -------- | | 0 | -1 | -1 | -1 | -1 | | 1 | 0 | -1 | -1 | -1 | | 2 | 0 | -1 | -1 | -1 | | 3 | 1 | 0 | -1 | -1 | | 4 | 1 | 0 | -1 | -1 | | 5 | 2 | 0 | -1 | -1 | | 6 | 2 | 0 | -1 | -1 | - 樹的高度，前面 Line 13 沒必要+1的(會使得表格多出一個欄位)，這邊再-1也就顯得多餘。 ```c=32 obj->max_level = max_level - 1; ``` - 搜尋第 k 個祖先 - Line 38: 先得到樹的最大高度 - Line 39: for 迴圈的跳出條件中， i 的搜尋次數限定不超過 `max_level` 且判斷目前的 node 是否已經是 root (-1) - Line 40: if 使用 k 的二進位表示法來判斷是否要去找之前的祖先 > E.g. > k = 3, 表示要找第1個祖先的第2個祖先， > 因為3 = 2^0^ + 2^1^，會在 i = 0和1 的時候更新 node 的值 > k = 5, 表示要找第1個祖先的第4個祖先， > 因為5 = 2^0^ + 2^2^，會在 i = 0和2 的時候更新 node 的值 ```c=38 int max_level = obj->max_level; for (int i = 0; i < max_level && node != -1; ++i) if (k & (1<<i))//CCC node = obj->parent[node][i]; return node; ``` -Result ![](https://i.imgur.com/GTr7bDS.png) :::success 2. 在 treeAncestorCreate 函式內部，若干記憶體被浪費，請撰寫完整測試程式碼，並透過工具分析 ::: :::success 3. 在 [LeetCode 1483. Kth Ancestor of a Tree Node](https://leetcode.com/problems/kth-ancestor-of-a-tree-node/) 提交你的實作，你應該要在 C 語言項目中，執行時間領先 75% 的提交者 ::: - Line 13-14: 修改之前發現建立表格 `parent` 時多出來的一個欄位，也就是移除計算 `max_level` 時多餘的部分 - Line 24-31: 之前在建立表格的最後一個步驟時，是每一個欄位都會更新一次，將其更改為，在更新每一個 node 的祖先欄位時，發現出現 -1 就跳出該個迴圈。 | Node | 2^0^ th ancestor | 2^1^ th ancestor | 2^2^ th ancestor | | -------- | -------- | -------- | -------- | | 0 | -1 | -1 | -1 | | 1 | 0 | -1 | -1 | | 2 | 0 | -1 | -1 | | 3 | 1 | 0 | -1 | | 4 | 1 | 0 | -1 | | 5 | 2 | 0 | -1 | | 6 | 2 | 0 | -1 | ```c= #include <stdlib.h> typedef struct { int **parent;//AAA int max_level; } TreeAncestor; TreeAncestor *treeAncestorCreate(int n, int *parent, int parentSize) { TreeAncestor *obj = malloc(sizeof(TreeAncestor)); obj->parent = malloc(n * sizeof(int *)); int max_level = 32 - __builtin_clz(n); obj->max_level = max_level; for (int i = 0; i < n; i++) { obj->parent[i] = malloc(max_level * sizeof(int)); for (int j = 0; j < max_level; j++) obj->parent[i][j] = -1; } for (int i = 0; i < parentSize; i++) obj->parent[i][0] = parent[i]; for (int i = 0; i < parentSize; i++) { for (int j = 1;j<max_level; j++) { if(obj->parent[i][j + (-1)] == -1) break; else obj->parent[i][j] = obj->parent[obj->parent[i][j - 1]][j - 1]; } } return obj; } int treeAncestorGetKthAncestor(TreeAncestor *obj, int node, int k) { int max_level = obj->max_level; for (int i = 0; i < max_level && node != -1; ++i) if (k & (1<<i))//CCC node = obj->parent[node][i]; return node; } void treeAncestorFree(TreeAncestor *obj) { for (int i = 0; i < obj->max_level; i++) free(obj->parent[i]); free(obj->parent); free(obj); } ``` ![](https://i.imgur.com/ogGmvYR.png) ## 測驗 3 考慮貌似簡單卻蘊含實作深度的 [FizzBuzz](https://en.wikipedia.org/wiki/Fizz_buzz) 題目: + 從 1 數到 n，如果是 3的倍數，印出 “Fizz” + 如果是 5 的倍數，印出 “Buzz” + 如果是 15 的倍數，印出 “FizzBuzz” + 如果都不是，就印出數字本身直覺的實作程式碼如下: (`naive.c`) ```cpp #include <stdio.h> int main() { for (unsigned int i = 1; i < 100; i++) { if (i % 3 == 0) printf("Fizz"); if (i % 5 == 0) printf("Buzz"); if (i % 15 == 0) printf("FizzBuzz"); if ((i % 3) && (i % 5)) printf("%u", i); printf("\n"); } return 0; } ``` 利用 bitwise 和由 i 來控制字串起始和長度所實作的 FizzBuzz 程式碼: (`bitwise.c`) ```c= #define MSG_LEN 8 static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char *argv[]) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); // 如果是3或5的倍數，要輸出的長度皆為4，如果是15的倍數，要輸出長度為8 unsigned int length = (2 << div3) << div5; char fmt[MSG_LEN + 1]; strncpy(fmt, &"FizzBuzz%u"[(MSG_LEN >> div5) >> (div3 << 2)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` - 根據是否為3, 5, 15的倍數，分別給予不同的起始 index 。 ```c=18 strncpy(fmt, &"FizzBuzz%u"[(MSG_LEN >> KK1) >> (KK2 << KK3)], length); ``` - 可以整理成以下的表格 | div3 | div5 |start | | -------- | -------- | -------- | | 0 | 0 | 8 | | 0 | 1 | 4 | | 1 | 0 | 0 | | 1 | 1 | 0 | - 從`8`開始的，表示既不是3的倍數，也不是5的倍數 - 從`4`開始的，表示只能是5的倍數 - 從`0`開始的，表示可能是15的倍數，也可能只是3的倍數，這邊要輸出甚麼樣的字串是由 Line 15 計算長度時所控制，所以把他當成同一種狀況就好。 - 回到程式碼，`KK1` 要填的就是 `div5` ，表示在只有能被5整除時，起始點為 `8>>1`(也就是4) - 而後面的 `(KK2 << KK3)` 則需要考慮，如果能被3整除時要讓起始點變成0，反之要能讓起始點保持在8，當我們填 `(div3 << 2)` 就滿足了以上的考慮。 :::success 1. 解釋上述程式運作原理並評估 naive.c 和 bitwise.c 效能落差 - 避免 stream I/O 帶來的影響，可將 printf 更換為 sprintf ::: - 實驗設計: i 從 1 到 100，將每次執行的時間相加後，最後除以100 - 使用 printf 來做實驗 (**加速 1.21x**) | Naive | Bitwise | | -------- | -------- | | 10601 ns | 8769 ns | - 使用 sprintf 來做實驗 (**加速 1.69x**) | Naive | Bitwise | | -------- | -------- | | 584 ns | 345 ns | 可以見得，做 printf 會比 sprintf 耗時，會消耗掉改成 bitwise 操作所加速的時間。 ## 測驗 4 測驗4介紹了一個 GCC 的擴充函式 `int __builtin_ffs (int x)` ，說明如下: >摘自 [Built-in Functions Provided by GCC](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) >Built-in Function: int __builtin_ffs (int x) Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero. 也就是說這個函式返回的值是最低位 set bit 的 index 值再加上1 > E.g. > $6_{10} = 110_2$ > $\_\_builtin\_ffs (6) = 2(index\ of\ the\ least\ significant\ set\ bit = 1)$ 回到題目，這題是在事先知道除數 (PAGE_QUEUES) 有哪些的情況下，要將原本的整數除法修改成兩步驟: 1. 先除以除數中的因數:2^n^ (使用 right shift 來做) 2. 再除以除數中的剩餘因數 PAGE_QUEUES 如下: - (1 or 2 or 3 or 4) - (5 or 6 or 7 or 8) × (2^n^), n from 1 to 14 給定 `size_t offset`，試著將原本運算: ```cpp #include <stdint.h> size_t blockidx = offset / PAGE_QUEUES; ``` 更改成 ```c= #include <stdint.h> #define ffs32(x) ((__builtin_ffs(x)) - 1) size_t blockidx; size_t divident = PAGE_QUEUES; blockidx = offset >> ffs32(divident); divident >>= ffs32(divident); switch (divident) { case 3: blockidx /= 3; break; case 5: blockidx /= 5; break; case 7: blockidx /= 7; break; } ``` 前面有敘述過 `int __builtin_ffs (int x)` 返回值，這邊的 `ffs32(x)` 做的就是計算最低位 set bit 的 index ，**代表著這個數 x 有 2^ffs32(x)^ 這個因數**。 ```c=2 #define ffs32(x) ((__builtin_ffs(x)) - 1) ``` 在看這兩行之前，我們先觀察上述 PAGE_QUEUES 的數值: - (1 or 2 or 3 or 4) - (5 or 6 or 7 or 8) × (2^n^), n from 1 to 14 我們將其做整理: | Original | After | | -------- | -------- | | 1 | 2^0^ | | 2 | 2^1^ | | 3 | 3 x 2^1^ | | 4 | 2^2^ | | 5 x 2^n^ n from 1 to 14 | 5 x 2^n^ n from 1 to 14 | | 6 x 2^n^ n from 1 to 14 | 3 x 2^n+1^ n from 1 to 14 | | 7 x 2^n^ n from 1 to 14 | 7 x 2^n^ n from 1 to 14 | | 8 x 2^n^ n from 1 to 14 | 2^n+3^ n from 1 to 14 | 而這是我們要計算的值: $\dfrac{offset}{PAGE\_QUEUES}$ 我們可以同時對分子分母除以2^n^(n = ffs32(divident))，也就是 Line 5-6 在做的事情 ```c=5 blockidx = offset >> ffs32(divident); divident >>= ffs32(divident); ``` 而根據上面表格，當我們做完分子分母同時 right shift `ffs32(divident)`之後，會有4種情況: 1. divident = 1 2. divident = 3 3. divident = 5 4. divident = 7 所以我們要接著再做除數為 `3`, `5`, `7`的除法(`1`就不用繼續做了) ```c=7 switch (divident) { case 3: blockidx /= 3; break; case 5: blockidx /= 5; break; case 7: blockidx /= 7; break; } ```