# 2023q1 Homework2 (quiz2)
contributed by <[jhin1228](https://github.com/)>

## Quiz 1

## Quiz 2
Based on LeetCode [1680. Concatenation of Consecutive Binary Numbers](https://leetcode.com/problems/concatenation-of-consecutive-binary-numbers/): given an integer $n$, concatenate the binary representations of $1$ to $n$ in order and return the decimal value of the resulting binary string, $mod\ 10^9 + 7$. Taking n = 12 as an example:
> Concatenation: 1101110010111011110001001101010111100
> Decimal value: 118505380540
> After $mod\ 10^9 + 7$, the result is 505379714

A possible implementation:
```c
int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(i & (i - 1)))
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}
```
When appending each number, we have to decide when the shift amount must grow. For example:
> * To append 3 = 0b11 to 110, shift 110 left by 2 bits and OR it with 0b11, which gives 11011.
> * To append 4 = 0b100 to 11011, shifting 11011 left by only 2 bits is no longer enough: whenever a number is a power of 2, its binary representation becomes one bit longer. The shift amount must therefore increase by 1 before the OR, which gives 11011100.

So `(x & (x - 1)) == 0` is used to check whether `x` is a power of 2, and `len` is incremented when it is. The parentheses are required, because `==` binds more tightly than `&` in C.
> * `x & (x - 1)` clears the rightmost 1 and leaves the other bits unchanged.
> * Using `(x & (x - 1)) == 0` to test for a power of 2 goes wrong at `x = 0`; the check should be `x && !(x & (x - 1))`.

### Rewriting with [`__builtin_{clz,ctz,ffs}`](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)
> * [Counting leading zeros (clz)](https://en.wikipedia.org/wiki/Find_first_set): the number of 0 bits before the first 1, scanning from the MSB towards the LSB. The result is undefined for an input of 0.
> * [Counting trailing zeros (ctz)](https://en.wikipedia.org/wiki/Find_first_set): the number of 0 bits before the first 1, scanning from the LSB towards the MSB. The result is undefined for an input of 0.
> * Find first set (ffs): the 1-based position of the first 1, counting from the LSB; it returns 0 for an input of 0.

```c
int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0;
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        len = 32 - __builtin_clz(i);
        ans = (i | ans << len) % M;
    }
    return ans;
}
```
In this environment an `int` occupies 4 bytes, so a variable of type `int` consists of 32 bits. `32 - __builtin_clz(i)` is therefore the 1-based position, counted from the LSB, of the most significant 1 in `i`, which is exactly the bit length of `i`.
```
32 - __builtin_clz(19) = 5
19 = 0b0000 0000 0000 0000 0000 0000 0001 0011
                                        ^ ---> here
```
Besides the method above, we can use the fact that when `i` is a power of 2, shifting it left by `__builtin_clz(i) + 1` pushes its only 1 bit out and leaves 0.
```c
for (int i = 1; i <= n; i++) {
    if (!(i << (__builtin_clz(i) + 1)))
        len++;
    ...
}
```
However, when `i = 1` this shifts left by 32 bits. Since `i` is an `int` (here `sizeof(int) == 4`), this is undefined behavior according to the [C standard](https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf); in other words, only shift counts from `0` to `31` are well-defined.
> Paragraph 3 of 6.5.7 Bitwise shift operators states:
> * The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

An attempt to fix this:
```c
for (int i = 1; i <= n; i++) {
    if (!((i << __builtin_clz(i)) << 1))
        ...
}
```
On LeetCode this fails with `runtime error: left shift of 1 by 31 places cannot be represented in type 'int'`. Going back to the [specification](https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf):
> Paragraph 4 of 6.5.7 Bitwise shift operators states:
> * The result of $E1 << E2$ is $E1$ left-shifted $E2$ bit positions; vacated bits are filled with zeros. If $E1$ has an unsigned type, the value of the result is $E1 × 2^{E2}$, reduced modulo one more than the maximum value representable in the result type. If $E1$ has a signed type and nonnegative value, and **$E1 × 2^{E2}$ is representable in the result type**, then that is the resulting value; **otherwise, the behavior is undefined**.

Because 1 is a **signed** integer, `1 << 31` would be $2^{31}$, which is clearly outside the range of `int` ($[-2^{31},\ 2^{31} - 1]$). Declaring `i` as an **unsigned** integer avoids the problem:
```c
for (unsigned int i = 1; i <= n; i++) {
    if (!((i << __builtin_clz(i)) << 1))
        ...
}
```
Testing with `gcc (Debian 10.2.1-6) 10.2.1`:
```c
#include <stdio.h>

int main()
{
    int input;

    scanf("%d", &input);
    printf("Print 1 << 31 outcome: %d\n", 1 << 31);
    printf("Print 1 << 32 outcome: %d\n", 1 << 32);
    printf("Here is output: %d\n", 1 << input);
}
```
`gcc` only warns about `1 << 32`:
```
$ gcc test2.c -o test2
test2.c: In function ‘main’:
test2.c:9:42: warning: left shift count >= width of type [-Wshift-count-overflow]
    9 |   printf("Print 1 << 32 outcome: %d\n", 1 << 32);
      |
```
At run time, `1 << 32` and `1 << input` (with `input = 32`) print different values: the constant expression was evaluated to 0 at compile time, while the run-time shift produced 1 (on x86, `shl` only uses the low 5 bits of the count for a 32-bit operand). Two different results for the same expression are consistent with `1 << 32` being undefined in `gcc`.
```
$ ./test2
32
Print 1 << 31 outcome: -2147483648
Print 1 << 32 outcome: 0
Here is output: 1
```
As for why `gcc` does not warn about `1 << 31`, [gcc - 4.5 Integers](https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Integers-implementation.html#Integers-implementation) says:
> As an extension to the C language, GCC does not use the latitude given in C99 and C11 only to treat certain aspects of signed ‘<<’ as undefined. However, -fsanitize=shift (and -fsanitize=undefined) will diagnose such cases. They are also diagnosed where constant expressions are required.

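The undefined shifts discussed above can be avoided altogether by doing every shift on an unsigned value. The following is a minimal sketch under that assumption, not the quiz's reference solution; the names `is_pow2_clz` and `concatenated_binary` are made up for illustration, and `__builtin_clz` requires GCC or Clang.

```c
#include <assert.h>

/* Hypothetical helper: nonzero when i is a power of two.
 * The shifts operate on an unsigned value, so bits that leave the
 * 32-bit word are discarded instead of causing undefined behavior. */
static int is_pow2_clz(unsigned int i)
{
    assert(i != 0); /* __builtin_clz(0) is undefined */
    /* Move the highest set bit to bit 31, then shift it out.
     * Only a power of two has nothing left afterwards. */
    return ((i << __builtin_clz(i)) << 1) == 0;
}

/* Hypothetical rewrite of the quiz function using the helper above. */
int concatenated_binary(int n)
{
    const unsigned long long M = 1000000007ULL;
    unsigned long long ans = 0;
    unsigned int len = 0; /* bit length of the current i */

    assert(n >= 1);
    for (unsigned int i = 1; i <= (unsigned int) n; i++) {
        if (is_pow2_clz(i))
            len++;
        /* ans < M < 2^30 and len <= 31, so the shift fits in 64 bits */
        ans = ((ans << len) | i) % M;
    }
    return (int) ans;
}
```

With the examples from the problem statement, `concatenated_binary(3)` gives 27 and `concatenated_binary(12)` gives 505379714.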
To see how `gcc` actually handles `1 << 31`, start with an experiment:
```c
#include <stdio.h>

int main()
{
    int input;
    printf("%d\n", 1 << 31);
    scanf("%d", &input);
    printf("%d\n", 1 << input);
}
```
Output:
```
$ ./test1
-2147483648
31
-2147483648
```
The value computed at compile time is the same as the value computed at run time.
> Compiling with `-fsanitize=shift` reports the problem at run time:
> `runtime error: left shift of 1 by 31 places cannot be represented in type 'int'`

Next, look at it from the assembly level:
```c
#include <stdio.h>

int main()
{
    int input;
    scanf("%d", &input);
    input = 1 << input;
}
```
The following excerpt is the part of the compiled code that performs the shift:
```
0x0000000000001158 <+35>: mov $0x1,%edx
0x000000000000115d <+40>: mov %eax,%ecx
0x000000000000115f <+42>: shl %cl,%edx
0x0000000000001161 <+44>: mov %edx,%eax
```
According to [Intel's manual](https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-1-manual.html) on `shl`, the shift is expected to behave like `(int)(((unsigned)1) << n)`.

Returning to the rewrite with `__builtin_{clz,ctz,ffs}`, the check can also be written as:
```c
for (int i = 1; i <= n; i++) {
    if (!(i >> (__builtin_ctz(i) + 1))) // or if (!(i >> __builtin_ffs(i)))
        len++;
    ...
}
```
### Improving the $mod\ 10^9+7$ operation
#### How the compiler optimizes `mod`
The `mod` operation is expensive for the CPU. Without compiler optimization, a division instruction (`divl`) shows up in the assembly of the following program:
```c
#include <stdio.h>
#include <stdint.h>

int main()
{
    const uint32_t M = 1e9 + 7;
    int a;
    scanf("%d", &a);
    printf("%u", a % M);
}
```
```
0x000000000000116c <+39>: mov -0x8(%rbp),%eax
0x000000000000116f <+42>: mov $0x0,%edx
0x0000000000001174 <+47>: divl -0x4(%rbp)
0x0000000000001177 <+50>: mov %edx,%eax
0x0000000000001179 <+52>: mov %eax,%esi
0x000000000000117b <+54>: lea 0xe82(%rip),%rdi
```
Compiling with `gcc -O1` shows what the optimizer does instead:
```
0x000000000000115f <+26>: mov 0xc(%rsp),%esi
0x0000000000001163 <+30>: mov %esi,%eax
0x0000000000001165 <+32>: imul $0x12e0be63,%rax,%rax
0x000000000000116c <+39>: shr $0x20,%rax
0x0000000000001170 <+43>: mov %rax,%rdx
0x0000000000001173 <+46>: mov %esi,%eax
0x0000000000001175 <+48>: sub %edx,%eax
0x0000000000001177 <+50>: shr %eax
0x0000000000001179 <+52>: add %edx,%eax
0x000000000000117b <+54>: shr $0x1d,%eax
0x000000000000117e <+57>: imul $0x3b9aca07,%eax,%eax
0x0000000000001184 <+63>: sub %eax,%esi
0x0000000000001186 <+65>: lea 0xe77(%rip),%rdi
```
First, the idea behind it:

$n\ mod\ M = n - \left\lfloor \frac{n}{M}\right\rfloor M$, where
$\left\lfloor \frac{n}{M}\right\rfloor = \left\lfloor \frac{n}{2^N} *\frac{2^N}{M}\right\rfloor = \left\lfloor n*\frac{2^N}{M}*\frac{1}{2^N}\right\rfloor\ (1)$

The special number $m_{exact} = \frac{2^N}{M}$ is called the magic number: with it, the division becomes a multiplication followed by a shift. Here, however, $M$ does not divide $2^N$, so $m_{exact}$ is not an integer, and floating-point multiplication with a limited number of bits is troublesome. We therefore want to replace $m_{exact}$ with an integer.

Case 1: let $m = \left\lfloor m_{exact}\right\rfloor$ and take $n = M$.
The true quotient is $\left\lfloor \frac{n}{M} \right\rfloor=\left\lfloor \frac{n}{n} \right\rfloor=1=\left\lfloor m_{exact}*\frac{n}{2^N}\right\rfloor$.
But $m < m_{exact}$, so $m*\frac{n}{2^N} < 1$ and $\left\lfloor m*\frac{n}{2^N}\right\rfloor = 0 \neq 1$. Rounding down therefore gives a wrong quotient.

Case 2: let $m = \left\lceil m_{exact}\right\rceil$ and $e = m*M - 2^N$.
Then $m = \left\lceil \frac{2^N}{M}\right\rceil = \frac{2^N+e}{M},\ 0 < e < M$ ($e = M-2^N\ mod\ M$), and
$\left\lfloor \frac{n}{M}\right\rfloor = \left\lfloor m*\frac{n}{2^N}\right\rfloor =
\left\lfloor\frac{2^N+e}{M}*\frac{n}{2^N}\right\rfloor = \left\lfloor \frac{n}{M} + \frac{e}{M}*\frac{n}{2^N}\right\rfloor$, where $\frac{e}{M}*\frac{n}{2^N}$ is the error term, which shrinks as $N$ grows. We could of course pick a very large $N$ to reduce the error, but then $m = \left\lceil \frac{2^N}{M}\right\rceil$ becomes too large to fit in a fixed-width register, which requires extra handling and hurts performance.
> Goal: choose the smallest $N$ that still keeps the error term harmless.

The fractional part of $\frac{n}{M}$ plus $\frac{e}{M}*\frac{n}{2^N}$ must stay below 1. Since $n$ is an integer, that fractional part is at most $\frac{M-1}{M}$, so it suffices that:
$\frac{M-1}{M}+\frac{e}{M}*\frac{n}{2^N} < 1$
$\Rightarrow \frac{e}{M}*\frac{n}{2^N} < \frac{1}{M}$
$\because \frac{e}{M} < 1$
$\therefore$ it is enough to have $\frac{n}{2^N} \leq \frac{1}{M}$

Because `n` has type `int`, assume `n` is a 32-bit integer:
$\Rightarrow n < 2^{32}$
When $N = 32$, $\frac{n}{2^N} < 1$
When $N \geq 32+log_2M$, $\frac{n}{2^N} < \frac{1}{M}$
So with $N = 32+\left\lceil log_2M\right\rceil$, $(1)$ still holds when $m_{exact}$ is replaced by $m$.

:::success
$\left\lfloor \frac{n}{M}\right\rfloor = \left\lfloor n*\left\lceil \frac{2^N}{M}\right\rceil*\frac{1}{2^N}\right\rfloor,\ as\ N = 32+\left\lceil log_2M\right\rceil$
:::

Next, does this choice of `N` make `m` too large? The range of $m$ is found as follows:
$\Rightarrow m=\left\lceil \frac{2^N}{M}\right\rceil = \left\lceil \frac{2^{32+\left\lceil log_2M \right\rceil}}{M}\right\rceil$
$\because M \leq 2^{\left\lceil log_2M\right\rceil}$
$\therefore m \geq \left\lceil \frac{2^{32+\left\lceil log_2M\right\rceil}}{2^{\left\lceil log_2M\right\rceil}}\right\rceil= \left\lceil 2^{32}\right\rceil = 2^{32}$
In addition,
$\because M \geq 2^{\left\lfloor log_2M\right\rfloor}$
$\therefore m \leq \left\lceil \frac{2^{32+\left\lceil log_2M\right\rceil}}{2^{\left\lfloor log_2M\right\rfloor}}\right\rceil = \left\lceil 2^{32+\left\lceil log_2M \right\rceil-\left\lfloor log_2M \right\rfloor}\right\rceil \leq \left\lceil 2^{33}\right\rceil = 2^{33}$

:::success
$\therefore 2^{32} \leq m \leq 2^{33},\ as\ N = 32 + \left\lceil log_2M\right\rceil$
:::

When computing $\left\lfloor m*n*\frac{1}{2^N}\right\rfloor$, $m$ may be a 33-bit number, so the product $m*n$ needs care (it may need 65 bits). The distributive law helps here. For $n*m$:
$\Rightarrow n*m = n(m-2^{32}+2^{32}) = n(m-2^{32}) + n2^{32}$ (that is, the 33rd bit is handled separately from the other bits)
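To make the sizes concrete before splitting the product further, here is a small sketch for $M = 10^9+7$ that evaluates $\left\lfloor \frac{n*m}{2^N}\right\rfloor$ directly with a 128-bit product. It assumes a 64-bit GCC or Clang target, since `unsigned __int128` is a compiler extension; the names `magic`, `div_by_magic` and `mod_by_magic` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MOD 1000000007u /* M = 10^9 + 7 */
#define SHIFT 62        /* N = 32 + ceil(log2 M) = 32 + 30 */

/* m = ceil(2^N / M). It needs 33 bits, so it is kept in 64 bits. */
static uint64_t magic(void)
{
    return ((UINT64_C(1) << SHIFT) + MOD - 1) / MOD;
}

/* floor(n / M) as a multiply and a shift.
 * n * m may need 65 bits, hence the 128-bit intermediate. */
uint32_t div_by_magic(uint32_t n)
{
    unsigned __int128 prod = (unsigned __int128) n * magic();
    return (uint32_t) (prod >> SHIFT);
}

/* n mod M = n - floor(n / M) * M */
uint32_t mod_by_magic(uint32_t n)
{
    uint32_t r = n - div_by_magic(n) * MOD;
    assert(r < MOD); /* holds when the quotient is exact */
    return r;
}
```

Here `magic() - 2^32` equals `0x12e0be63`, the constant that appears in the `-O1` listing. The 128-bit product is exactly what the rest of the derivation avoids, so this version is only a reference to compare against.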
Substituting this split into the quotient:
$\Rightarrow \left\lfloor \frac{n}{M}\right\rfloor = \left\lfloor m*\frac{n}{2^N}\right\rfloor = \left\lfloor \frac{n(m-2^{32})+n2^{32}}{2^N}\right\rfloor$
Let $m_1 = m - 2^{32}$ (that is, $m$ without its 33rd bit)
Let $q = (n*m_1) >> 32$
$\Rightarrow\left\lfloor \frac{n(m-2^{32})+n2^{32}}{2^N}\right\rfloor = \left\lfloor \frac{q+n}{2^{N-32}}\right\rfloor$

Here $n+q$ can still overflow 32 bits ($n+q \leq 2^{33}-1$), so one more step is needed:
$\left\lfloor \frac{q+n}{2^{N-32}}\right\rfloor = \left\lfloor \frac{n-q+2q}{2}/2^{N-33}\right\rfloor = \left\lfloor (\left\lfloor \frac{n-q}{2}\right\rfloor+q)/2^{N-33}\right\rfloor \equiv \left\lfloor t/2^{N-33}\right\rfloor,\ where\ t = \left\lfloor \frac{n-q}{2}\right\rfloor + q$

Since $q \leq \frac{n*m_1}{2^{32}}$:
$\Rightarrow n - q \geq \frac{n(2^{33}-m)}{2^{32}}$
which leads to the following inequalities:
$$
\begin{cases}
0\leq(n-q)\leq n,\ as\ n \geq0 \\
n\leq(n-q)\leq 0,\ as\ n < 0
\end{cases}
$$
$\Rightarrow\ n - q$ does not cause integer underflow
$\because \left\lfloor \frac{n-q}{2} \right\rfloor+q \leq (\frac{n-q}{2}) + q = \frac{n+q}{2} \leq n$ (as $q \leq n$)
$\therefore\ t$ does not overflow either

Summary (computing $\left\lfloor \frac{n}{M}\right\rfloor$):
> * Precompute: $p = \left\lceil log_2M\right\rceil$
> * Precompute: $m = \left\lceil \frac{2^N}{M}\right\rceil = \left\lceil \frac{2^{32+p}}{M}\right\rceil$ (note: in this example the magic number that appears in the compiler's assembly is $m_1$, not $m$)
> * Compute: $q = (n*m_1) >> 32,\ m_1 = m-2^{32}$
> * Compute: $t = [(n-q) >> 1] + q$
> * Compute: $t = t >> (N-32-1)$

Now check this algorithm against the `-O1` assembly listing above:
> * $p = 30$
> * `%eax` holds the input value `a`
> * `0x12e0be63` is $m_1$, and `q` is stored in `%rax` and `%rdx` ($m = \left\lceil \frac{2^{62}}{M}\right\rceil$, and $m_1$ is $m$ without its 33rd bit)
> * `%eax` is then reloaded with `n`, and at `<+50>:` it holds $(n-q) >> 1$
> * `<+52>: add %edx,%eax` computes `t`
> * `t` is updated as $t = t >> (62-32-1)$. At this point $\left\lfloor \frac{n}{M}\right\rfloor$ has been computed and is stored in `%eax`
> * Finally, the result of $n\ mod \ M$ is stored in `%esi`

## Quiz 3

## Quiz 4
The following code determines whether a 16-bit unsigned integer matches a particular pattern:
```c
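#include <stdint.h>
#include <stdbool.h>

bool is_pattern(uint16_t x)
{
    if (!x)
        return false;
    for (; x > 0; x <<= 1) {
        if (!(x & 0x8000))
            return false;
    }
    return true;
}
```
The values that match are:
```
pattern: 8000 (32768)
pattern: c000 (49152)
pattern: e000 (57344)
pattern: f000 (61440)
pattern: f800 (63488)
pattern: fc00 (64512)
pattern: fe00 (65024)
pattern: ff00 (65280)
pattern: ff80 (65408)
pattern: ffc0 (65472)
pattern: ffe0 (65504)
pattern: fff0 (65520)
pattern: fff8 (65528)
pattern: fffc (65532)
pattern: fffe (65534)
pattern: ffff (65535)
```
From the code and the list above, the function checks that the MSB is 1 and that no 0 appears between the MSB and the lowest set bit; if one does, it returns `false`. The loop runs once per set bit, so larger matching values, which have more leading ones, need more iterations (up to 16). It can be improved as follows:
```c
bool is_pattern(uint16_t x)
{
    const uint16_t n = -x;
    return (n ^ x) < x;
}
```
First, by the nature of the pattern, the two's complement of a matching value is guaranteed to contain exactly one 1, namely the rightmost 1 of the original value:
```
 x = 0b11100000
         ^ ---> rightmost 1 bit
-x = 0b00100000
         ^ ---> only this bit remains
```
Then `n ^ x` clears the rightmost 1 of the original value, so the result is guaranteed to be smaller than the original.
```
       x = 0b11100000
x ^ (-x) = 0b11000000
               ^ ---> rightmost 1 removed
```
If the input `x` does not match the pattern, that is, there is at least one 0 between the MSB and the lowest set bit, taking the two's complement turns those zeros into ones. The XOR then clears only the rightmost 1 while the ones in between remain, so the result is larger than the input:
```
       x = 0b11100100
      -x = 0b00011100
x ^ (-x) = 0b11111000
                ||^ ---> rightmost 1 removed
                || the zeros in between end up as 1
```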
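Since the input is only 16 bits wide, the argument above can also be checked exhaustively. The sketch below compares the loop version with the branchless version on all 65536 inputs; the function names are made up for illustration and are not part of the quiz.

```c
#include <stdbool.h>
#include <stdint.h>

/* Loop version from the quiz: every shifted value must keep its MSB set. */
static bool is_pattern_loop(uint16_t x)
{
    if (!x)
        return false;
    for (; x > 0; x <<= 1) {
        if (!(x & 0x8000))
            return false;
    }
    return true;
}

/* Branchless version: x ^ (-x) is smaller than x only for matching values. */
static bool is_pattern_fast(uint16_t x)
{
    const uint16_t n = -x; /* two's complement, reduced to 16 bits */
    return (n ^ x) < x;
}

/* Hypothetical checker: number of inputs on which the two versions differ. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t v = 0; v <= UINT16_MAX; v++) {
        if (is_pattern_loop((uint16_t) v) != is_pattern_fast((uint16_t) v))
            mismatches++;
    }
    return mismatches;
}
```

If the reasoning holds, `count_mismatches()` returns 0, including for the edge case `x = 0`, where both versions return `false`.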