2023q1 Homework2 [(quiz2)](https://hackmd.io/@sysprog/linux2023-quiz2)

# 2023q1 Homework2 [(quiz2)](https://hackmd.io/@sysprog/linux2023-quiz2) contributed by < `tseng0201` > ## 測驗一考慮 next_pow2 可針對給定無號 64 位元數值 x，找出最接近且大於等於 2 的冪的值，例如: 以下是可能的實作方式: ```c #include <stdint.h> static inline uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; } uint64_t next_pow2(uint64_t x) { uint8_t lo = 0, hi = 63; while (lo < hi) { uint8_t test = (lo + hi)/2; if (x < pow2(test)) { hi = test; } else if (pow2(test) < x) { lo = test+1; } else { return pow2(test); } } return pow2(lo); } ``` 我們嘗試改寫為以下: ```c uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 8; x |= x >> 16; x |= x >> 32; return x + 1; } ``` ### 1.解釋程式碼原理，並用 __builtin_clzl 改寫解釋上述程式碼原理第一個方法使用二分搜尋演算法，首先將下界與上界設為 0 與 63，分別代表 $2^{0}$ 與 $2^{63}$ ``` uint8_t lo = 0, hi = 63; ``` 每次取下界與上界的中間值(test)與 $x$ 做比較 ``` uint8_t test = (lo + hi)/2; ``` 比較的結果有 3 種可能，x = test、x < test、x > test，當 x = test 回傳 test，其餘調整上界或下界直到 lo == hi 回傳 lo ``` if (x < pow2(test)) { hi = test; } else if (pow2(test) < x) { lo = test+1; } else { return pow2(test); } ``` 第二種方法也是一種採用二分法的算法，透過位元位移與或運算，從輸入值二進制表示中最高位出現的1的位置開始，將其以下所有位數設為1，這樣就可以得到一個大於輸入值x但不大於最接近2的冪的值。接著，將輸入值x加1再進行進位操作，就可以得到大於等於最接近2的冪的值。**此外此方法對於 x 剛好等於 2 的冪時會產生錯誤** 上述方法可以簡化為以下程式碼。且為了修正當 x 剛好等於 2 的冪時產生的錯誤，我們將輸入值 x 減 1。當於x不是2的冪的情況，將其減1不會影響最高位1出現的位置。而對於 x 為 2 的冪的情況，x 減 1 經過或運算並不會改變其值，最後回傳時 x 加 1 又將 x 變回輸入時的數值 ``` uint64_t next_pow2(uint64_t x) { if (!x) return 1; x = x - 1; x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; x |= x >> 32; return x + 1; } ``` 改用 __builtin_clzl 後程式碼如下 ``` uint64_t next_pow2(uint64_t x) { // if x = 0, result of __builtin_clzl(x) is undefined. if (!x) return 1; return 1 << (63 - __builtin_clzl(x)); } ``` 根據 [gcc](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 所述當輸入為 0 時會出現未定義行為 ``` Built-in Function: int __builtin_clz (unsigned int x) Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. Built-in Function: int __builtin_ctzl (unsigned long) Similar to __builtin_ctz, except the argument type is unsigned long. ``` ### 2.在 Linux 核心原始程式碼找出類似的使用案例並解釋 - roundup_pow_of_two() 本習題透過 Chatgpt 完成，詢問內容如下將程式碼餵給Chatgpt 後詢問函式功能 ``` my input: what this code doing in linux kernel uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; x |= x >> 32; return x + 1; } ``` 確認 Chatgpt 回答正確後在詢問 linux kernel 是否有相似的函式得到 roundup_pow_of_two() ``` my input: dose linux kernel has some similiar function? ``` 最後透過查詢 Linux github 確認回答是否屬實該函式位於 include/linux/log2.h ``` roundup_pow_of_two() /** * roundup_pow_of_two - round the given value up to nearest power of two * @n - parameter * * round the given value up to the nearest power of two * - the result is undefined when n == 0 * - this can be used to initialise global variables from constant data */ #define roundup_pow_of_two(n) \ ( \ __builtin_constant_p(n) ? ( \ (n == 1) ? 1 : \ (1UL << (ilog2((n) - 1) + 1)) \ ) : \ __roundup_pow_of_two(n) \ ) fls_long() 位於 ``` ### 3.當上述 clz 內建函式已運用時，編譯器能否產生對應的 x86 指令？ gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1) 將以下程式碼透過 gcc -O2 -std=c99 -S test.c 進行編譯 ``` #include <stdint.h> #include <stdio.h> static inline uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; } uint64_t next_pow2(uint64_t x) { uint8_t lo = 0, hi = 63; while (lo < hi) { uint8_t test = (lo + hi)/2; if (x < pow2(test)) { hi = test; } else if (pow2(test) < x) { lo = test+1; } else { return pow2(test); } } printf("%d \n", lo); return pow2(lo); } uint64_t next_pow2(uint64_t x) { return 1 << ( 63 - __builtin_clzl(x)); } int main() { uint64_t input = pow2(10); printf("%lu, %lu", next_pow3(input), input); } ``` next_pow2 函式表示如下 ``` next_pow2: .LFB14: .cfi_startproc endbr64 // find significant bit index, index start from 0 bsrq %rdi, %rdi movl $63, %ecx movl $1, %eax // number of leading zero = 63 - significant bit index = 63 ^ significant bit index xorq $63, %rdi subl %edi, %ecx sall %cl, %eax cltq ret .cfi_endproc ``` ## 測驗二 LeetCode [1680. Concatenation of Consecutive Binary Numbers](https://leetcode.com/problems/concatenation-of-consecutive-binary-numbers/) 給定一整數 $n$ ，回傳將 1 到 $n$ 的二進位表示法依序串接在一起所得到的二進位字串，其所代表的十進位數字 mod $10 ^ 9 + 7$ 之值。以下是可能的實作: ```c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ /* use long here as it potentially could overflow for int */ long ans = 0; for (int i = 1; i <= n; i++) { // removing the rightmost set bit // e.g. 100100 -> 100000 // 000001 -> 000000 // 000000 -> 000000 // after removal, if it is 0, then it means it is power of 2 // as all power of 2 only contains 1 set bit // if it is power of 2, we increase the bit length if (!(i & (i - 1))) len++; ans = (i | (ans << len)) % M; } return ans; } ``` ### 1.解釋程式碼運作原理上述程式碼將由 1 到 n 透過 len 的長度控制目前的數字最短可以用多少 bit 表示當 i 為 2 的冪時，代表此後的數字必定要在多一位來表達，故 len + 1，最後透過向左位移 len 個位元來與 i 進行拼接 ### 2.嘗試使用 __builtin_{clz,ctz,ffs} 改寫，並改進 mod $10^9 + 7$的運算使用 __builtin_{clz,ctz,ffs} 改寫如下 ```c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ /* use long here as it potentially could overflow for int */ long ans = 0; for (int i = 1; i <= n; i++) { ans = (i | ans << (32 -__builtin_clz(i))) % M; } return ans; } ``` TODO: 改進 mod $10^9 + 7$的運算 ## 測驗三 ### 1.解釋下列程式碼運作原理，比較 SWAR 和原本的實作效能落差 ``` size_t swar_count_utf8(const char *buf, size_t len) { const uint64_t *qword = (const uint64_t *) buf; const uint64_t *end = qword + len >> 3; size_t count = 0; for (; qword != end; qword++) { const uint64_t t0 = *qword; const uint64_t t1 = ~t0; const uint64_t t2 = t1 & 0x04040404040404040llu; const uint64_t t3 = t2 + t2; const uint64_t t4 = t0 & t3; count += __builtin_popcountll(t4); } count = (1 << 3) * (len / 8) - count; count += (len & 7) ? count_utf8((const char *) end, len & 777) : DDDD; return count; } ``` 上述程式碼透過 SWAR 技術一次性檢查 8 個 Byte 計算給定的輸入共有多少個 UTF-8 字元，依範例所提及合法 UTF-8 的字元共有 4 種不同的長度，除了開頭不同外後面都接著 10xxxxxx 格式，因此透過特定的 Mask 統計 10xxxxxx 的格式所出現的次數，將字串長度(len)扣掉 10xxxxxx 的格式出現次數後,便是此字串所含有的 UTF-8 字元數，最後字串長度可能無法整除 8，所以最後剩餘未檢查的字串需以 count_utf8 進行確認 ### 2.在 Linux 核心原始程式碼找出 UTF-8 和 Unicode 相關字串處理的程式碼，探討其原理，並指出可能的改進空間 TODO ## 測驗四 #### 1.解釋下述程式碼運作原理 ``` bool is_pattern(uint16_t x) { const uint16_t n = -1; return (n ^ x) < x; } ``` TODO #### 2.在 Linux 核心原始程式碼找出上述 bitmask 及產生器，探討應用範疇 [參見 Data Structures in the Linux Kernel](https://0xax.gitbooks.io/linux-insides/content/DataStructures/linux-datastructures-3.html) TODO