2023q1 Homework2 (quiz2)

# 2023q1 Homework2 (quiz2) contributed by < `sherrygottalent` > > [第 2 週測驗題](https://hackmd.io/@sysprog/linux2023-quiz2) :::spoiler 開發環境 ```shell $ gcc --version gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0 $ lscpu Architecture: aarch64 CPU op-mode(s): 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Vendor ID: 0x00 Model name: - Model: 0 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 Stepping: 0x0 BogoMIPS: 48.00 ``` ::: :::spoiler ToDo - [ ] 測驗 1 - [x] 解釋上述程式碼原理，並用 `__builtin_clzl` 改寫 - [ ] 在 Linux 核心原始程式碼找出類似的使用案例並解釋 - [ ] 當上述 `clz` 內建函式已運用時，編譯器能否產生對應的 x86 指令？ - [ ] 測驗 2 - [x] 解釋上述程式碼運作原理 - [ ] 嘗試使用 `__builtin_{clz,ctz,ffs}` 改寫，並改進 mod $10^9+7$的運算 - [ ] 測驗 3 - [x] 測驗 4 ::: ## 測驗 1 :::spoiler 測驗內容參照你所不知道的 C 語言: bitwise 操作，考慮 next_pow2 可針對給定無號 64 位元數值 x，找出最接近且大於等於 2 的冪的值 ``` c uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> AAAA; x |= x >> 16; x |= x >> BBBB; return CCCC; } ``` 請補完程式碼，使其符合預期。作答規範: - AAAA 和 BBBB 皆為數值，且 AAAA 小於 BBBB - CCCC 為表示式 :::success 延伸問題: 1. 解釋上述程式碼原理，並用 `__builtin_clzl` 改寫 > int __builtin_clz (unsigned int x) > Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. > int __builtin_clzl (unsigned long) > Similar to `__builtin_clz`, except the argument type is unsigned long. 2. 在 Linux 核心原始程式碼找出類似的使用案例並解釋 3. 當上述 `clz` 內建函式已運用時，編譯器能否產生對應的 x86 指令？ > 提示: 可執行 `cc -O2 -std=c99 -S next_pow2.c` 並觀察產生的 `next_pow2.s` 檔案，確認 `bsrq` 指令 (表示 “Bit Scan Reverse”) ::: ### 程式碼原理理解對 64 位元數值 `x` 操作，逐續填補位元後，要得到最接近且大於等於 2 的冪的值 - step 1. 填補位元，從最高位元開始移動(`>`)填補（`|`），從 `x` 有 1 的位元開始後，後面的位元都會是 1 前段程式處理 8 個位元，直覺上會覺得只處理前 8 個位元；實際是即使 `x` 數值為 1 的最高位元不在前 8 個位元，也會向右移動填補右側 8 個位元。之後再移動 8 個位元，填補完 16 個位元，逐續再移動 16 個位元，填補完 32 個位元，移動 32 個位元，填補完 64 個位元，完成此步驟操作。 - step 2. 為了得到大於等於 2 的冪的值，將 x+1 ，即為所求的結果 ### 答題 AAAA = 8 BBBB = 32 CCCC = x+1 ### [__builtin_clzl](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 改寫 > int __builtin_clz (unsigned int x) > Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. `__builtin_clz` 回傳 `MSB` 方向連續位元 0 的長度，另 `len` 為 x 的位元總數，則大於等於 2 的冪的值為 `1 << (len-__builtin_clz(x))` ``` c uint64_t next_pow2_clz(uint64_t x) { size_t len = sizeof(uint64_t) * 8; return 1 << (len-__builtin_clz((unsigned int)x)); } ``` `__builtin_clzl` > int __builtin_clzl (unsigned long) > Similar to `__builtin_clz`, except the argument type is unsigned long. ``` c uint64_t next_pow2_clzl(uint64_t x) { size_t len = sizeof(uint64_t) * 8; return 1 << (len-__builtin_clzl((unsigned long)x)); } ``` ### Linux 核心原始程式碼找出類似的使用案例 --- ## 測驗 2 :::spoiler 測試內容 LeetCode [1680. Concatenation of Consecutive Binary Numbers](https://leetcode.com/problems/concatenation-of-consecutive-binary-numbers/)給定一整數 $n$ ，回傳將 1 到 $n$ 的二進位表示法依序串接在一起所得到的二進位字串，其所代表的十進位數字 mod $10^9+7$ 之值。 ``` c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ /* use long here as it potentially could overflow for int */ long ans = 0; for (int i = 1; i <= n; i++) { /* removing the rightmost set bit * e.g. 100100 -> 100000 * 000001 -> 000000 * 000000 -> 000000 * after removal, if it is 0, then it means it is power of 2 * as all power of 2 only contains 1 set bit * if it is power of 2, we increase the bit length */ if (!(DDDD)) len++; ans = (i | (EEEE)) % M; } return ans; } ``` 請補完程式碼，使其符合預期。作答規範: - `DDDD` 和 `EEEE` 皆為表示式 :::success 延伸問題: 1. 解釋上述程式碼運作原理 2. 嘗試使用 `__builtin_{clz,ctz,ffs}` 改寫，並改進 mod $10^9+7$的運算 ::: ### 程式碼原理理解將 10 進位數值轉換成 2 進位表示法後，相連，再從 2 進位轉換成 10 進位數值，回傳以 $10^9+7$ 為除數的餘數． - step 1 此處程式碼的目標是要取得 `len` 值，代表之後要向左移的次數 `len` 從初始值 0 開始慢慢增加，在碰到數值的二進位表述法位數增加時跟著增加 ``` if (!(DDDD)) len++; ``` 數值 i | 2 進位表示法| !(DDD) |len| ans ----|--------|-----|-----|--------- 1| $(1)_2$ | 1 | 1 (len++)| $(1)_2$ 2| $(10)_2$| 1 | 2 (len++)| $(110)_2$ 3| $(11)_2$| 0 | 2| $(11011)_2$ 4| $(100)_2$| 1 | 3 (len++)| $(11011100)_2$ 5| $(101)_2$| 0 | 3| $(11011100101)_2$ ...|||| 當數值 `i` 為 2 的冪時， len++ 此時數值有的特性， 1. 二進位表示法中只有一個bit為 `1` 2. 移除最右側位元後數值會是 0 (程式註解提示`removing the rightmost set bit`) :arrow_right: DDD = `i & (i - 1)` - step 2. ``` ans = (i | (EEEE)) % M; ``` 這個步驟要把 `i` 連接(`|`)到 `ans` 上，又已知位元移動數(`len`)，EEE 應該將 `ans` 做適當的位元移動 :arrow_right: EEE = `ans << (len-1)` ### 答題 DDD = `i & (i - 1)` EEE = `ans << (len-1)` ### 改寫 :::info Built-in Function: int __builtin_clz (unsigned int x) Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. Built-in Function: int __builtin_ctz (unsigned int x) Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined. Built-in Function: int __builtin_ffs (int x) Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero. ::: ``` c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ /* use long here as it potentially could overflow for int */ long ans = 0; for (int i = 1; i <= n; i++) { if ((__builtin_clz(i) + __builtin_ctt(i) + 1 == sizeof(n) * 8) & (__builtin_ffs(i) == (1 << (len+1)) len++; ans = (i | (ans << (len-1))) % M; } return ans; } ``` ### 延伸疑問： Why mod $10^9+7$ ? overflow?還在想 :::warning 參照 [Barrett reduction](https://en.wikipedia.org/wiki/Barrett_reduction) :notes: jserv ::: --- ## 測驗 3 ### 程式碼原理理解 - SIMD(single Instruction, multiple data) within a register (SWAR) - :warning: 不是很了解一開始敘述兩數 concatenate 判斷是否為奇數的遮罩跟後續 SWAR 實作的關聯，只是說明 SWAR 是同時對多個位元操作的一種運算方法？ - Unicode using one to four one-byte (8-bit) code units. - [SIMD and SWAR Techniques](https://www.chessprogramming.org/SIMD_and_SWAR_Techniques) > 若輸入的字串是一個有效的 UTF-8 序列，則計算其中的後續位元組數量，並將此數字從總字串長度中減去，即可確定字元數量。 > :question: code point是什麼？ :::info Built-in Function: int __builtin_popcount (unsigned int x) Returns the number of 1-bits in x. ::: SWAR 實作如下: ``` c size_t swar_count_utf8(const char *buf, size_t len) { const uint64_t *qword = (const uint64_t *) buf;// const uint64_t *end = qword + len >> 3; size_t count = 0; for (; qword != end; qword++) { // 對一字元操作 qword = character const uint64_t t0 = *qword; const uint64_t t1 = ~t0; const uint64_t t2 = t1 & 0x04040404040404040llu; const uint64_t t3 = t2 + t2; const uint64_t t4 = t0 & t3; //取出qword 位元組 not bit6 and bit7 count += __builtin_popcountll(t4); // 看第六七位是否為 1 } count = (1 << AAAA) * (len / 8) - count; count += (len & BBBB) ? count_utf8((const char *) end, len & CCCC) : DDDD; return count; } ``` `buf` 型態是 pointer to char (string) `qword` 強制轉型為 64-bit 位元長度 `const uint64_t *end = qword + len >> 3;` 看不懂:warning: for-loop 對每個位元組(byte)操作 ### 作答想法（猜測）： - SWAR看起來是對稱的操作 - 提示： AAAA, BBBB, CCCC, DDDD 皆為常數 `count = (1 << AAAA) * (len / 8) - count;` :arrow_right: AAAA = 3 (因為後面有 / 8 )(**只是猜測:warning:**) --- ## 測驗 4 :::spoiler 測驗內容以下程式碼可判定 16 位元無號整數是否符合特定樣式 (pattern): ``` c #include <stdint.h> #include <stdbool.h> bool is_pattern(uint16_t x) { if (!x) return 0; for (; x > 0; x <<= 1) { if (!(x & 0x8000)) return false; } return true; } ``` 符合上述程式碼的樣式，如下: ``` pattern: 8000 (32768) pattern: c000 (49152) pattern: e000 (57344) pattern: f000 (61440) pattern: f800 (63488) pattern: fc00 (64512) pattern: fe00 (65024) pattern: ff00 (65280) pattern: ff80 (65408) pattern: ffc0 (65472) pattern: ffe0 (65504) pattern: fff0 (65520) pattern: fff8 (65528) pattern: fffc (65532) pattern: fffe (65534) pattern: ffff (65535) ``` 改寫上述程式碼，使其達到等價行為，但更精簡有效。 ::: ### 程式碼原理理解觀察可以通過的結果 MSB 都是 1，且數值如： 8、c、e、f 在二進位表示都是 MSB 後有連續位元 1 的表示法可以知道檢驗的 pattern 是 MSB 為 1 ，且直到出現位元 0 之前連續為 1 不間斷的數值。 ### 答題 ``` bool is_pattern(uint16_t x) { const uint16_t n = EEEE; return (n ^ x) < FFFF; } ``` 參考 [YSRossi](https://hackmd.io/@YSRossi/quiz2-2023#%E6%B8%AC%E9%A9%97-4)重新思考，線索： 1. `n^x` may toggle bits of 1 2. 因為 `(n ^ x) < FFFF`， `n^x` 運算後，不符合樣式的數值會在至少 MSB 後出現的第一個 0 轉換為 1 :::info e.g. x = $(9)_{10} = (1101)_2$， `x ^ n` may be $(1110)_2$，n = $(0011)_2$ = ~x+1 = $-x$ or `x ^ n` may be $(1111)_2$，n = $(0010)_2$ = ~x ::: 以 `~x` 測試可以找到失敗例子，所以 n = EEEE = `-x` or `~x+1` FFFF = x