2020q3 Homework2 (quiz2)

# 2020q3 Homework2 (quiz2) contributed by < `nelsonlai1` > ## 測驗 1 ```c= #include <stddef.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; for (int i = 0; i < size; i++) if (str[i] & 0x80) /* i.e. (unsigned) str[i] >= 128 */ return false; return true; } ``` 逐一字元比對範圍內是否全是有效 ASCII 字元的 code，由於 ASCII 範圍是 0 到 127，也就是七個 bits，所以若是 & 0x80(10000000) 不為 0 的話代表大於 127 ，也就不是有效的 ASCII 字元。 ```c= #include <stdbool.h> #include <stddef.h> #include <stdint.h> #include <string.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; int i = 0; while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); if (payload & MMM) return false; i += 8; } while (i < size) { if (str[i] & 0x80) return false; i++; } return true; } ``` 此題利用上述 code 的基礎，改為一次比對八個字元，增加效率，而最後不足八個字元的部分再使用逐一字元比對。由於 payload 是將八個字元串接在一起，所以可以推斷出 `MMM = 0x8080808080808080`。 ### 延伸問題 1. 為何用到 memcpy? 根據 <s>http://www.cplusplus.com/reference/cstring/memcpy/</s> 提到的特性，確保 payload 複製到的值正好為 64 個位元。 > The underlying type of the objects pointed to by both the source and destination pointers are irrelevant for this function; The result is a binary copy of the data. > > The function does not check for any terminating null character in source - it always copies exactly num bytes. :::danger :warning: 改用 [Linux online manpage](https://man7.org/linux/man-pages/index.html) 作為 C 標準函式庫的主要參考資訊。另外，如果不用 memcpy，而是直接透過另一個指標來指向特定的地址，可以嗎？這才是該思考的議題。 :notes: jserv ::: > [name=nelsonlai1] > > 後來查看 memcpy 在 [glibc](https://github.com/lattera/glibc/blob/master/string/memcpy.c) 中的原始碼發現，當要複製的 byte 數大於一定值 > (len >= OP_T_THRES) 時，會先複製幾個 bytes 來達成 alignment。 > (BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ);)，增加效率。 2. 將上述技巧應用於「給予一個已知長度的字串，檢測裡頭是否包含有效的英文大小寫字母」可以利用測驗 5 的技巧來寫這題 ```c= #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) bool is_ascii(const char str[], size_t size) { if (size == 0) return false; int i = 0; while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); payload |= PACKED_BYTE(0x20); if ((payload & PACKED_BYTE(0x80)) == 0) { uint64_t a = payload + PACKED_BYTE(128 - 'a'); uint64_t z = payload + PACKED_BYTE(128 - 'z' - 1); if ((a ^ z) & PACKED_BYTE(0x80) ^ PACKED_BYTE(0x80)) return false; } i += 8; } while (i < size) { if ((str[i] | 0x20) < 97 || (str[i] | 0x20) > 122) return false; i++; } return true; } ``` 前半段一次比對八個字元時，先將複製進來的字串 `|= PACKED_BYTE(0x20)` 這樣如果有大寫英文就會變小寫，再用測驗 5 的技巧找出 'a'~'z'。後面逐字比對的部分也是 `| 0x20` 後判斷是不是落在小寫字母的範圍內。 ## 測驗 2 ```c= uint8_t hexchar2val(uint8_t in) { const uint8_t letter = in & MASK; const uint8_t shift = (letter >> AAA) | (letter >> BBB); return (in + shift) & 0xf; } ``` 這題可以從最後一行開始看起，由於是 &0xf，可以先看後四個 bits。如此一來，0~9 需要的 (in + shift) 等於 in，而 A~F (a~f) 需要的 (in+shift) 則分別是 1010, 1011, 1100, 1101, 1110, 1111。可以推斷出 in 是數字時，shift 是 0000，而 in 是字母時，shift 是1001。這時可以知道，letter 在 in 是字母時才會有值，所以能選的 MASK 只有 0x40。而得到的 letter 為 01000000，然後就可以算出 AAA 和 BBB 分別是 3 跟 6。 ### 延伸問題 1. 解釋上述程式的運作原理一開始的 MASK 用來分辨數字和字母，並且忽視大小寫。而 shift 用來讓 in 是字母時回傳對應的值。之所以能這麼寫，也是得利於 ASCII 的編排方式。 2. 將 hexchar2val 擴充為允許 "0xFF", "0xCAFEBABE", "0x8BADF00D" 等字串輸入，且輸出對應的十進位數值 ```c= uint64_t hexchar2val(char *in) { uint64_t result = 0; for (int i = 0; i < strlen(in); i++) { uint64_t letter = in[i] & 0x40; uint64_t shift = (letter >> 3) | (letter >> 6); result = result << 4 | ((in[i] + shift) & 0xf); } return result; } ``` 沿用此題概念，從第一個字元開始做，每次都會把之前的結果左移四次 (* 16) 後，or 這次的結果 (加上這次結果)。 ## 測驗 3 ```c= const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(XXX) / (D) + 1)) /* compute (n mod d) given precomputed M */ uint32_t quickmod(uint32_t n) { uint64_t quotient = ((__uint128_t) M * n) >> 64; return n - quotient * D; } ``` 其中第六行在做的是題目敘述中提到的 $n \times \dfrac{\dfrac{2 ^ N}{d}}{2 ^ N}$，所以我們可以得知 N = 64，而 M = $\dfrac{2 ^ N}{d}$。然而其實題目中的 M 不完全是這樣，但選一個最相近的解就是`XXX = 0xFFFFFFFFFFFFFFFF`，所以這裡的 M 其實是 $\dfrac{2 ^ {64} - 1}{d} + 1$。這樣又會衍伸一個問題，兩者出來的結果會相同嗎? 一樣從第六行來驗證結果，$\dfrac{(\dfrac{2 ^ {64} - 1}{d} + 1) \times n}{2 ^ {64}} = \dfrac{\dfrac{n \times 2 ^ {64} - n}{d} + n}{2 ^ {64}} = \dfrac{n \times 2 ^ {64} - n}{d \times 2 ^ {64}} + \dfrac{n}{2 ^ {64}}$ $= \dfrac{n}{d} - \dfrac{n}{d \times 2 ^ {64}} + \dfrac{n}{2 ^ {64}} = \dfrac{n}{d} + (\dfrac{d \times n - n}{d \times 2 ^ {64}}) = \dfrac{n}{d} + (\dfrac{(d - 1) \times n}{d \times 2 ^ {64}}) < \dfrac{n}{d} + \dfrac{n}{2^{64}}$。而因為 n, d 的上限是 $2^{32} - 1$，所以 $\dfrac{n}{2^{64}} < \dfrac{1}{2^{32}}$，而 $\dfrac{n}{d}$ 的小數點部分最高是 $\dfrac{2^{32}-2}{2^{32}-1}$，即使加上此時的 $\dfrac{n}{2^{64}}$ 也不會進位，因此可以確定 M 不管是 $\dfrac{2 ^ {64}}{d}$ 或 $\dfrac{2 ^ {64} - 1}{d} + 1$，出來的結果都是一樣的。 --- 後來想到算 M 時就應該考慮取整數問題了，所以這裡先分兩種情況探討 : 1. $\dfrac{2^{64}}{d}$ 是整數(這裡假設是 K )，這樣求出的 M 會是 $\dfrac{2 ^ {64} - 1}{d} + 1 = (K - 1) + 1 = K$ 2. $\dfrac{2^{64}}{d}$ 不是整數(假設是 K.x )，這樣求出的 M 變成 K + 1，比原本結果 K 大了 1 第一種情況不影響 quotient 結果，而第二種情況會讓 quotient 變成 $\dfrac{n}{d} + \dfrac{n}{2^{64}}$，但根據我上述的推論，最後出來的結果依然是一樣的。 ### 延伸問題 1. 解釋上述程式的原理這題其實就是把敘述中的概念應用出來而已，其中 quotient 是 $\dfrac{n}{D}$ 的商，所以回傳的餘數為 n (被除數) - quotient (商) * D (除數)。 2. 由 Facebook 公司所維護的 jemalloc 是個高效能的記憶體配置器 (memory allocator，即 malloc 和 free 函式對應的實作)，特別在多核多執行緒的環境有優異表現，在其原始程式碼 include/jemalloc/internal/div.h 和 src/div.c 也運用類似的技巧，請閱讀原始程式碼，並抽離快速除法實作為獨立的程式碼，說明運作原理，也該撰寫測試程式，比較透過處理器的整數除法指令及本技巧的效能差距; 此程式原理跟這題類似，先假設一組 n, d，且 d 可以整除 n。 $\left \lfloor \left \lceil \dfrac{2^k}{d} \right \rceil \times \dfrac{n}{2^k} \right \rfloor = \left \lfloor \dfrac{2^k + r}{d} \times \dfrac{n}{2^k} \right \rfloor , r = d - 2^k mod \space d$ $= \left \lfloor \dfrac{2^k}{d} \times \dfrac{n}{2^k} + \dfrac{r}{d} \times \dfrac{n}{2^k} \right \rfloor = \left \lfloor \dfrac{n}{d} + \dfrac{r}{d} \times \dfrac{n}{2^k}\right \rfloor$ $= \dfrac{n}{d} + \left \lfloor \dfrac{r}{d} \times \dfrac{n}{2^k}\right \rfloor$ (因為 d 整除 n) 接著只要確保 $\dfrac{r}{d} \times \dfrac{n}{2^k} < 1$ 就能用一開始的算式來求 $\dfrac{n}{d}$ 首先 $r < d$ 所以 $\dfrac{r}{d} < 1$，而把 k 設為 32 的話，$\dfrac{n}{2^k} < 1$，因為 $n <= 2^{32} - 1$ ```c= #ifndef JEMALLOC_INTERNAL_DIV_H #define JEMALLOC_INTERNAL_DIV_H #include "assert.h" typedef struct div_info_s div_info_t; struct div_info_s { uint32_t magic; }; void div_init(div_info_t *div_info, size_t divisor); static inline size_t div_compute(div_info_t *div_info, size_t n) { assert(n <= (uint32_t)-1); size_t i = ((uint64_t)n * (uint64_t)div_info->magic) >> 32; return i; } #endif /* JEMALLOC_INTERNAL_DIV_H */ void div_init(div_info_t *div_info, size_t d) { assert(d != 0); assert(d != 1); uint64_t two_to_k = ((uint64_t)1 << 32); uint32_t magic = (uint32_t)(two_to_k / d); if (two_to_k % d != 0) { magic++; } div_info->magic = magic; ``` JEMALLOC_DEBUG 只是用來確保計算結果正確，但不影響程式邏輯，這裡就沒放入了。 div_init 部分先確定 d 不等於 0 或 1，確保不會讓除數為零或是 magic 超過上限 ($2^{32} - 1$)。接著讓 two_to_k = $2^{32}$，magic = $\left \lfloor \dfrac{2^{32}}{d} \right \rfloor$，而因為要取的是 ceiling，所以若 d 不整除 $2^{32}$，magic 要再加一。 div_compute 的部分則是先確認 $n <= 2^{32} - 1$ (在 uint32_t 中，-1 代表 0xFFFFFFFF)，然後計算 $\left \lfloor magic \times \dfrac{n}{2^{32}} \right \rfloor$ 得到最後的結果。 ## 測驗 4 ```c= bool divisible(uint32_t n) { return n * M <= YYY; } ``` 這題的 D, M 沿用上一題，所以我們可以知道 D = 3, M = 0x5555555555555556。而 n 可以分成 3k, 3k + 1, 3k + 2 (k 為非負整數)。 if n = 3k, n * M = 3k * (0x5555555555555556) = (0xFFFFFFFFFFFFFFFF) * k + 3k = 2k (因為 uint64_t 超過 0xFFFFFFFFFFFFFFFF 會變回 0) (max = 0xAAAAAAAA) if n = 3k + 1, n * M = 2k + M (min = M) if n = 3k + 2, n * M = 2k + 2M (min = 2M) 如此一來， a, b ,e 都不是答案，而 c(M - 1), d(M >> 1) 都可以是答案。(雖然答案給 c) ## 測驗 5 ```c= #include <ctype.h> #include <stddef.h> #include <stdint.h> #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) /* vectorized implementation for in-place strlower */ void strlower_vector(char *s, size_t n) { size_t k = n / 8; for (size_t i = 0; i < k; i++, s += 8) { uint64_t *chunk = (uint64_t *) s; if ((*chunk & PACKED_BYTE(VV1)) == 0) { /* is ASCII? */ uint64_t A = *chunk + PACKED_BYTE(128 - 'A' + VV2); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' + VV3); uint64_t mask = ((A ^ Z) & PACKED_BYTE(VV4)) >> 2; *chunk ^= mask; } else strlower(s, 8); } k = n % 8; if (k) strlower(s, k); } ``` PACKED_BYTE(b) 取出 b 最後兩位數並將其變成八倍長度，也就是若 b 最後兩位是 87, PACKED_BYTE(b) = 0x8787878787878787。可以看到它一次從 s 裡取出八個字元，如果其中有不為 ASCII 的字元(`if ((*chunk & PACKED_BYTE(VV1)) == 0) { /* is ASCII? */`)，就會利用 strlower 來做逐一字元判斷以及轉換。而取不到八個字元時，也會用 strlower。因為第 13 行在判斷是不是有效 ASCII 字元，根據測驗 1 的結果，VV1 就是 0x80。 14, 15 行的 A, Z 是用來找出大寫字母的。當任一英文字母 ('A'~'Z') + (128 - 'A' + VV2)，都會 >= 128，也就是首個 bit 為 1。而當('A'~'Z') + (128 - 'Z' + VV3) 時會 < 128，首個 bit 為 0。而從上面的式子可以看出，為了精準找出大寫字母，VV2 = 0 (如果為 1，雖不影響結果，但會把 '@' 一同算進來，為 -1，則會在'A'出錯)，VV3 = -1 (如果為 0 或 1，會在 'Z' 或 'Y', 'Z' 出錯)。 16 行的 mask 利用技巧來得到 0x32(00100000)，當某字元是大寫字母， A ^ Z 對應的位置會是 (1XXXXXXX)，與 0x80(10000000) 做 and 變成 (10000000)，再右移兩次就變 (00100000) 了。所以 VV4 = 0x80。最後再用`*chunk ^= mask;`來對大寫字母做轉換。(詳見[你所不知道的 C 語言：數值系統篇](https://hackmd.io/@sysprog/c-numerics)) ``` ('a' ^ ' ') // 得到 'A' ('A' ^ ' ') // 得到 'a' ``` ### 延伸問題 1. 將前述測試程式 main 函式的 char str[] 更換為 char *str，會發生什麼事？請參閱 C 語言規格書，說明 string literal 的行為 ```c= #include <stdio.h> #include <string.h> int main() { /* quote from https://www.gutenberg.org/files/74/74-h/74-h.htm */ char str[] = "This eBook is for the use of anyone anywhere at no cost and with \ almost no restrictions whatsoever. You may copy it, give it away or \ re-use it under the terms of the Project Gutenberg License included \ with this eBook or online at www.gutenberg.net"; int n = strlen(str); strlower_vector(str, n); puts(str); } ``` 會出現 ``` Segmentation fault (core dumped) ``` 根據 [ISO/IEC 9899:1999](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) p.130 中一段 ``` char s[] = "abc", t[3] = "abc"; defines ‘‘plain’’ char array objects s and t whose elements are initialized with character string literals. This declaration is identical to char s[] = { 'a', 'b', 'c', '\0' }, t[] = { 'a', 'b', 'c' }; The contents of the arrays are modifiable. On the other hand, the declaration char *p = "abc"; defines p with type ‘‘pointer to char’’ and initializes it to point to an object with type ‘‘array of char’’ with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined. ``` 中的這句 > If an attempt is made to use p to modify the contents of the array, the behavior is undefined. 在 `strlower` 這個 function 中就有對陣列內容做修改，也因此出現 undefined behavior。 ## 測驗 6 ```c= int singleNumber(int *nums, int numsSize) { int lower = 0, higher = 0; for (int i = 0; i < numsSize; i++) { lower ^= nums[i]; lower &= KKK; higher ^= nums[i]; higher &= JJJ; } return lower; } ``` 這題的 nums 裡的元素除了一個以外，其餘都會重複三次。所以每個 bit 以三個狀態來表示，分別是00 (出現 0, 3 次)，01 (出現 1 次)，10 (出現 2 次)。 ```graphviz digraph { rankdir = LR 00 -> 01 [label = "input = 1"] 00 -> 00 [label = "input = 0"] 01 -> 10 [label = "input = 1"] 01 -> 01 [label = "input = 0"] 10 -> 00 [label = "input = 1"] 10 -> 10 [label = "input = 0"] } ``` 畫成真值表如下 : | higher | lower | input | higher_out | lower_out | | ------ | ----- | ----- | ---------- | --------- | | 0 | 0 | 0 | 0 | 0 | | 0 | 1 | 0 | 0 | 1 | | 1 | 0 | 0 | 1 | 0 | | 0 | 0 | 1 | 0 | 1 | | 0 | 1 | 1 | 1 | 0 | | 1 | 0 | 1 | 0 | 0 | $lower = higher' \times lower \times input' + higher' \times lower' \times input$ $= higher' \times (lower \times input' + lower' \times input)$ $= higher' \times (lower \oplus input)$ 而因為 higher 運算時 lower 已經變成 lower_out 了，所以真值表為 : | higher | lower_out | input | higher_out | | ------ | --------- | ----- | ---------- | | 0 | 0 | 0 | 0 | | 0 | 1 | 0 | 0 | | 1 | 0 | 0 | 1 | | 0 | 1 | 1 | 0 | | 0 | 0 | 1 | 1 | | 1 | 0 | 1 | 0 | $higher = higher \times lower' \times input' + higher' \times lower' \times input$ $= lower' \times (higher \times input' + higher' \times input)$ $= lower' \times (higher \oplus input)$ 因此 KKK = ~higher, JJJ = ~lower。 ### 延伸問題 1. 請撰寫不同上述的程式碼，並確保在 LeetCode 137. Single Number II 執行結果正確且不超時; ```c= int singleNumber(int* nums, int numsSize){ int lower = 0, higher = 0; for (int i = 0; i < numsSize; i++) { int tmp = lower; lower = ~higher & (lower ^ nums[i]); higher = (higher & ~nums[i] & ~tmp) | (~higher & nums[i] & tmp); } return lower; } ``` <s>跟原題目有 87% 像</s> :::danger 你如何估算出 0.87 這數值呢？工程人員說話要精準。 :notes: jserv ::: ，只是把 lower 先存在 tmp 中，這樣 higher 在計算時就可以代入未變動的 lower 值。 ![](https://i.imgur.com/0gP57Nh.png) 2. 延伸題意，將重複 3 次改為重複 4 次或 5 次，並找出可容易推廣到多次的程式碼; 重複四次以及其他偶數次都可以用以下程式碼，因為只有出現奇數次，偶數次兩種狀態。 ```c= int singleNumber(int* nums, int numsSize){ int lower = 0; for(int i = 0; i < numsSize; i ++) { lower ^= nums[i]; } return lower; } ``` 重複五次則相較重複三次多了兩個狀態，但基本概念一樣。 ```graphviz digraph { rankdir = LR 000 -> 000 [label = "input = 0"] 000 -> 001 [label = "input = 1"] 001 -> 001 [label = "input = 0"] 001 -> 010 [label = "input = 1"] 010 -> 010 [label = "input = 0"] 010 -> 011 [label = "input = 1"] 011 -> 011 [label = "input = 0"] 011 -> 100 [label = "input = 1"] 100 -> 100 [label = "input = 0"] 100 -> 000 [label = "input = 1"] } ``` 真值表 : | left | middle | right | input | left_out | middle_out | right_out | | ---- | ------ | ----- | ----- | -------- | ---------- | --------- | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 0 | 0 | 1 | 0 | 0 | 0 | 1 | | 0 | 1 | 0 | 0 | 0 | 1 | 0 | | 0 | 1 | 1 | 0 | 0 | 1 | 1 | | 1 | 0 | 0 | 0 | 1 | 0 | 0 | | 0 | 0 | 0 | 1 | 0 | 0 | 1 | | 0 | 0 | 1 | 1 | 0 | 1 | 0 | | 0 | 1 | 0 | 1 | 0 | 1 | 1 | | 0 | 1 | 1 | 1 | 1 | 0 | 0 | | 1 | 0 | 0 | 1 | 0 | 0 | 0 | $right = ... = left' \times (input \oplus right)$ 由於我程式碼變動的順序為 right -> middle -> left，所以跟前面一樣，在計算 middle 時代入right_out，而計算 left 時代入 middle_out, right_out。 $middle = ... = left' \times (right' \times (middle \oplus input) + (middle \times right))$ $left = ... = middle' \times right' \times (left \oplus input)$ ```c= int singleNumber(int *nums, int numsSize) { int left = 0, middle = 0, right = 0; for (int i = 0; i < numsSize; i++) { right = ~left & (nums[i] ^ right); middle = ~left & ((~right & (middle ^ nums[i])) | (middle & right)); left = ~middle & ~right & (left ^ nums[i]); } return right; } ``` 但利用這種方法的壞處是如果要重複更多奇數次的話，就要每次都算真值表以及邏輯表示式(應該是取最小質因數才對，也就是重複九次可以沿用重複三次的函式，但遇到質數就沒辦法了)，好處是運作效率很高，只有一個迴圈而且都在做位元操作。如果要簡單推廣到多次的方法可以參考 [RinHizakura](https://hackmd.io/@RinHizakura/SJ5NuIANP#%E4%B8%8D%E5%90%8C%E7%9A%84%E8%A7%A3%E9%A1%8C%E6%80%9D%E8%B7%AF) 同學的方式，計算每個位元 "1" 出現的次數並取餘數，最後只有出現一次 "1" 的位元被保留。但因為上述方法需要對每個位元做運算，以及使用 % 運算，所以效率會稍微低一些，單看使用者如何取捨。