2020q3 Homework2 (quiz2)

# 2020q3 Homework2 (quiz2) contributed by < `sammer1107` > ###### tags: `進階電腦系統理論與實作` > [測驗題](https://hackmd.io/@sysprog/2020-quiz2) # 測驗 1 ```cpp #include <stdbool.h> #include <stddef.h> #include <stdint.h> #include <string.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; int i = 0; while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); if (payload & MMM) return false; i += 8; } while (i < size) { if (str[i] & 0x80) return false; i++; } return true; } ``` + 這題我們要檢查每個 byte 是否為 extended ASCII，對應的數字範圍為 128~255，也就是第 8 個 bit 一定要是 1 + 若我們要檢查一個 byte 中第 8 個 bit 是否為 1 ，可以把這個 byte 跟二進位的 `10000000` 作 AND。對應到 16 進位的 `80` + 所以要一次檢查 8 個 byte ，就複製 8 次即可。只要有一個為 extended ascii，出來結果界不會為 0。 + 所以答案選 `(d) 0x8080808080808080` # 測驗 2 ```c= uint8_t hexchar2val(uint8_t in) { const uint8_t letter = in & MASK; const uint8_t shift = (letter >> AAA) | (letter >> BBB); return (in + shift) & 0xf; } ``` + `'0'`, `'1'`, `'2'`, …, `'9'` 對應到 `0x30`, `0x31`, `0x32`, … `0x39` + `'a'`, `'b'`, `'c'`, …, `'f'` (小寫) 對應到 `0x61`, `0x62`, `0x63`, …, `0x66` + `'A'`, `'B'`, `'C'`, …, `'F'` 對應到 `0x41`, `0x42`, `0x43`, …, `0x46` 1. 首先我根據名字 `letter` 推測，第 3 行應該是要判斷輸入是否屬於字母 a~f。我注意到數字的16進位表示法中開頭都為 3 ，而小寫字母則是 6 ，大寫為 4。將 3、4、6 的二進位寫出來，如下: | 數字 | 二進位表示 | | -------- | -------- | | 3 | 0011 | | 4 | 0100 | | 6 | 0110 | 可以看出來第三個 bit 可用來區分是否為字母，故 **MASK** = `0b01000000` = `0x40` 2. 由於不清楚第 4 行的作用，我先看第五行。 + `& 0xf` 是只取最小的四個 bit 的意思，作用是確保回傳的值為 0~15。 + 如果 in 為數字的話，可以看出來單純取前四個 bit 就完成轉換了。 e.g. `'1' = 0x31`, `0x31 & f = 0x01 = 1` 故我推測只有當 letter 非 0 時，shift 才為非 0 3. 回到第 4 行，我現在的任務就是當 `letter` 不為 0，我要用他湊出能把字母轉成數字的方法。我繼續觀察目前有的資訊: + 當 in 為字母，`letter` 為 `0b01000000` + 不管小寫大寫，字母的編碼有此規律: | 字母 | lower 4 bit <br>(Decimal) | 對應數值 | | --- | --- | --- | | a | 1 | 10 | | b | 2 | 11 | | c | 3 | 12 | | d | 4 | 13 | | e | 5 | 14 | | f | 6 | 15 | a 對應到前四個 bit 為 1，b 對應到 2，後面的規律一樣。所以要讓字母對應到他們的數值，其實就只要 +9 就好了，所以 shift 應該要是 9。 + 9 的二進位為 `00001001`，故可以用 letter 向右位移 3 和位移 6 來合成，所以選擇 **AAA**=3, **BBB**=6 # 測驗 3 ## jemalloc 由於我根本看不懂題目說明，所以決定先追查到 jemalloc 的原始碼，看看有何線索。以下為 `jemalloc/src/div.c` 的註解所說（數學這裡用英文比較順）: Suppose $n = qd$ for some integer $q$, we have \begin{align} \newcommand{\floor}[1]{\lfloor{#1}\rfloor} \newcommand{\ceil}[1]{\lceil{#1}\rceil} \newcommand{\naturals}[]{\mathbb{N}} &\floor{\ceil{\frac{2^k}{d}} \cdot n \cdot \frac{1}{2^k}} \\ =& \floor{\frac{2^k+r'}{d} \cdot n \cdot \frac{1}{2^k}} \\ =& \floor{\frac{n}{d}\ +\ \frac{r'}{d} \cdot \frac{n}{2^k} } \\ =& \frac{n}{d}\ +\ \floor{\frac{r'}{d} \cdot \frac{n}{2^k} } \\ \end{align} where $2^k+r' = q'd$ for some integer $r',q'$ and $r'<d$. Because $\frac{r'}{d} < 1$, if we select $k$ such that $\frac{n}{2^k} < 1$, we'll have $\frac{r'}{d} \cdot \frac{n}{2^k} < 1$ and \begin{align} \frac{n}{d}\ +\ \floor{\frac{r'}{d} \cdot \frac{n}{2^k} } = \frac{n}{d}\ . \end{align} 這告訴我們**只要今天 n 是被 d 整除且我們選擇的 k 夠大**，我們就可以用 $\ceil{\frac{2^k}{d}}$ 當作 magic number $M$。**要做除法時就做 $n \cdot M \cdot \frac{1}{2^k}$ 即可**，對應到 c 語言中，即是乘法與位移兩個動作。但要注意的是，他假設 $n$ 是被 $d$ 整除。但我們的情況不同，因為我們要算餘數， $n$ 當然不一定會被整除，因此我們需要修改一下他的推導。 ## Our Case According to division rule, we can say that $n = qd + r$ for some $q,r \in\naturals$ and $r < d$. Then, we have \begin{align} &\floor{\ceil{\frac{2^k}{d}} \cdot n \cdot \frac{1}{2^k}} \\ =& \floor{\frac{2^k+r'}{d} \cdot n \cdot \frac{1}{2^k}} \\ =& \floor{\frac{n}{d}\ +\ \frac{r'}{d} \cdot \frac{n}{2^k} } \\ =& \floor{q + \frac{r}{d}\ +\ \frac{r'}{d} \cdot \frac{n}{2^k} } \\ =&\ q\ +\ \floor{\frac{r}{d}+\frac{r'}{d} \cdot \frac{n}{2^k} }. \end{align} The first and second equality comes from the same reasoning in the jemalloc case. The third equality comes from the fact that $\frac{n}{d} = q+\frac{r}{d}$. The final equality stands because $q$ is an integer. The final line equals to $q$ only if $\frac{r}{d}+\frac{r'}{d} \cdot \frac{n}{2^k} < 1$. ## 回到測驗題 ```c= const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(XXX) / (D) + 1)) /* compute (n mod d) given precomputed M */ uint32_t quickmod(uint32_t n) { uint64_t quotient = ((__uint128_t) M * n) >> 64; return n - quotient * D; } ``` 我們可以從 shift 64 的動作看出來 $k=64$，再經過計算我們得到 $2^{64}\mod{3} = 1$ 和 $r'=2$。而 $M$ 應該要等於 $\ceil{\frac{2^{k}}{d}} = \floor{\frac{2^{k}}{d}} + 1 = \frac{2^{k}-1}{d} + 1$，等式成立因為 $2^{64}\mod{3} = 1$。**因此答案選擇 `0xFFFFFFFFFFFFFFFF` $=2^{64}-1$**。帶進上面的推導，我們的方法要成立必須要滿足 $\frac{r}{3} + \frac{2}{3}\cdot \frac{n}{2^{64}}$。由於 $r$ 在這裡最大是 2，而 $n$ 受型別的限制所以 $n < 2^{32}$，我們可以看出來 $\frac{r}{3} + \frac{2}{3}\cdot \frac{n}{2^{64}} < \frac{2}{3} + \frac{1}{3} = 1$，**所以演算法成立**:clap:。 ## 探討此方法的限制 **我們上面討論了演算法為何成立，那有沒有不成立的情況呢？** 我們令 $k$ 一樣是 $64$，則若我們選擇 $n \geq 2^{63}$，且這個 $n$ 對應到 $r=2$，則 \begin{align} &\frac{r}{3} + \frac{2}{3}\cdot \frac{n}{2^{64}}\\ = &\frac{2}{3} + \frac{2}{3}\cdot \frac{n}{2^{64}}\\ \geq &\frac{2}{3} + \frac{2}{3}\cdot \frac{1}{2}\\ \geq &\ 1 \end{align} 演算法就會崩潰。為了驗證我的理論，我修改了題目程式： ```cpp #include <stdio.h> #include <stdint.h> const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(0xFFFFFFFFFFFFFFFF) / (D) + 1)) /* compute (n mod d) given precomputed M */ uint64_t quickmod(uint64_t n) { uint64_t quotient = ((__uint128_t) M * n) >> 64; return n - quotient * D; } int main(void) { for (uint64_t i= ((uint64_t)1 << 63) - 10, ans=i%3; i < ((uint64_t)1 << 63)+10; ++i) { printf("i=%lu\n",i); uint64_t ret = quickmod(i); if (ans != ret) { printf("wrong ans for %lu, expect %lu but get %lu\n", i, ans, ret); } ans = (ans+1) % 3; } return 0; } ``` + 首先我改了 `quickmod` 使用的 type 為 `uint64_t`，來因應我要測試的 $n$（在 $2^{63}$上下） + 在 `main`，我從 $2^{63}-10$ 開始測試到 $2^{63}+10$ + `ans`為對應初始值的正解，可以算出之後的正解。 + 我期望在數值 $n$ 超過 $2^{63}$ 之後，會看到錯誤 **程式輸出** ``` i=9223372036854775798 i=9223372036854775799 i=9223372036854775800 i=9223372036854775801 i=9223372036854775802 i=9223372036854775803 i=9223372036854775804 i=9223372036854775805 i=9223372036854775806 i=9223372036854775807 i=9223372036854775808 wrong ans for 9223372036854775808, expect 2 but get 18446744073709551615 i=9223372036854775809 i=9223372036854775810 i=9223372036854775811 wrong ans for 9223372036854775811, expect 2 but get 18446744073709551615 i=9223372036854775812 i=9223372036854775813 i=9223372036854775814 wrong ans for 9223372036854775814, expect 2 but get 18446744073709551615 i=9223372036854775815 i=9223372036854775816 i=9223372036854775817 wrong ans for 9223372036854775817, expect 2 but get 18446744073709551615 ``` :::success :clap: Bingo! 我們看到在 $n=2^{63}$ 時，剛好 $r=2$，所以程式就出錯了。之後隨著 $n$ 愈來愈大，每次只要 $r=2$ 就會出錯，因此每 3 個 input 會有一個錯。 ::: + 其中錯誤的答案皆為 $18446744073709551615 = 2^{64}-1$，這是因為我們原本期望要得到 $q$，卻得到了 $q+1$，因此 `n - quotient * D` 原本的 $2$ 再多減了 $3$ 變成 $-1$，所以在無號數才會變成很大的數字。 **從這裡我們看到，隨著 $d$ 以及 $n$ 的範圍不同，選擇適當的 $k$ 是很重要的，否則演算法不一定會成立。** # 測驗 4 這題我嘗試沿用上面的推導但不得其門而入，所以我重新觀察解答為何成立： ```c= bool divisible(uint32_t n) { return n * M <= M-1; } ``` **什麼樣的整數乘以 $M$ 會小於 $M$ 呢？** 答案是不存在這樣的整數，要回答這題，必須把 overflow 考慮進來。由於 `M` 的 type 為 `uint64_t`，根據 **Integer Promotion** 的規則，所有的運算與比較會在 `uint64_t` 底下完成。這對應到數學中的 $\mod{2^{64}}$ 的操作。以下分成兩個情況討論： $$ \newcommand{\divs}{\mid} \newcommand{\ndivs}{\nmid} $$ 1. $d \divs n$.Let $n=3q$, then $$ \begin{align} n\cdot M &= 3q \cdot (\frac{2^{64}+2}{3})\\ &= 2^{64}q + 2q\\ &\equiv 2q \pmod{2^{64}}. \end{align} $$ Because $n<2^{32}$, we know $q<2^{32}$. Thus, $2q < 2^{33} < \frac{2^{64}}{4} < \frac{2^{64}+2}{4} < \frac{2^{64}+2}{3} = M$. 我們證明了當 $d \divs n$ ， $n\cdot M < M \pmod{2^{64}}$。 3. $d \ndivs n$. Let $n=3q+r$ where $1 < r < 3$, then $$ \begin{align} n\cdot M &= (3q+r) \cdot M\\ &\equiv 2q + r\cdot M \pmod{2^{64}}. \end{align} $$ We can see that if $2q + r\cdot M < 2^{64}$，then $n \cdot M \equiv 2q + r\cdot M \geq M$. It's equivalent to prove that $r\cdot M < 2^{64}-2^{33} < 2^{64}-2q$ since $q < 2^{32}$. So, $$ \begin{align} r \cdot M &\leq 2\cdot M \\ &= 2\cdot \frac{2^{64}+2}{3} \\ &< \frac{2}{3}(2^{64}) \\ &= 2^{64}(1-\frac{1}{3}) \\ &< 2^{64}(1-\frac{1}{2^{31}}) \\ &= 2^{64} - 2^{33} \end{align} $$ 我們證明了當 $d\ndivs n$，$n\cdot M \equiv 2q+r\cdot M \geq M$。我好像找不到方法直接推出解答，但根據上面證明 `(c) M - 1` 的確是正確答案。這裡我也用了小測試程式測試我的推導，但沒有測試所有數字： :::spoiler ```c= #include <stdio.h> #include <stdint.h> #include <stdbool.h> const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(0xFFFFFFFFFFFFFFFF) / (D) + 1)) bool divisible(uint32_t n) { return n * M <= M - 1; } int main(void) { for (uint32_t i=0; i<100; ++i) { uint32_t q = i / 3, r = i % 3; uint64_t result = i * M; if (r == 0){ if (result != (uint64_t)2*q) { printf("conjecture is wrong for i=%d=%dd+%d, result is %lu instead of %lu\n", i, q, r, result, (uint64_t)2*q); } else { printf("conjecture is correct for i=%d=%dd+%d, result is %lu\n", i, q, r, result); } } else { if (r != 0 && result != r*M+2*q) { printf("conjecture is wrong for i=%d=%dd+%d, result is %lu instead of %lu\n", i, q, r, result, r*M+2*q); } else { printf("conjecture is correct for i=%d=%dd+%d, result is %lu\n", i, q, r, result); } } } return 0; } ``` ::: # 測驗 5 ```cpp= #include <ctype.h> #include <stddef.h> #include <stdint.h> #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) /* vectorized implementation for in-place strlower */ void strlower_vector(char *s, size_t n) { size_t k = n / 8; for (size_t i = 0; i < k; i++, s += 8) { uint64_t *chunk = (uint64_t *) s; if ((*chunk & PACKED_BYTE(VV1)) == 0) { /* is ASCII? */ uint64_t A = *chunk + PACKED_BYTE(128 - 'A' + VV2); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' + VV3); uint64_t mask = ((A ^ Z) & PACKED_BYTE(VV4)) >> 2; *chunk ^= mask; } else strlower(s, 8); } k = n % 8; if (k) strlower(s, k); } ``` + **PACKED_BYTE**：要了解這段程式碼，最重要的是先理解 `PACKED_BYTE` 的作用。可以看到 b 先被 `&0xff`，也就是取最低的一個 byte。再來 `* 0x0101010101010101u` 的作用相當於是**將此 byte 複製 8 次**。 + 12行: 這裡我們一次抓了 8 個 byte 過來處理。 + 13行: 這裡根據提示我們是要確認這 8 個字元是否都為 ASCII。所以與**測驗1**一樣，**VV1** 應該用 `0x80` 來做 mask 。 + 由 16~17 行判斷，`mask` 的作用應該是在字元是 `'A'-'Z'` 的情況下，對應的 byte 的第6個 bit 應該要為 1，其他皆為0，如此來做到轉小寫。<br> 所以可以判斷 `(A ^ Z) & PACKED_BYTE(VV4)` 的作用是在屬於 `'A'-'Z'` 的 byte 的第 8 個位元產生一個 1 。合理判斷 **VV4** 為 `0x80`，前提是只要當字母屬於 `'A'-'Z'` 的情況下， `A` 與 `Z` 的第 8 個 bit 不同即可。 + 若我們選擇 **VV2=0** 和 **VV3=-1**，我們可以觀察到下面的規律(不用考慮 wrap around 因為我們已經去掉 extended ASCII ) | 原始區間 | 經 `+128-'A'`<br>轉換後 | 經 `+128-'Z'-1`<br>轉換後 | |--|--|--| | `<'A'` | `<128` | `<128` | | `>'Z'` | `>128` | `>128` | | `'A'-'Z'` | `>128` | `<128` | 只有 `'A'-'Z'` 區間的字元會在 bit 8 不一樣。如此就可以達成區間的判別了。 # 測驗 6 > 受到 [`RinHizakura`](https://hackmd.io/@RinHizakura/SJ5NuIANP#%E6%B8%AC%E9%A9%97-4) 的啟發，以下是我嘗試自己整理一次邏輯。 ## 觀念 + 首先每個 bit 的動作是獨立的，不會互相影響。所以我們其實是在分別計算每個 bit 的出現次數。 + 我們可以利用 $\mod{3}$ 算術來紀錄每個 bit 的出現次數。一開始為 0，每出現一次就 $+1$，加到3的時候又歸零。根據題目，第 $n$ 個 bit 最後累計出現的次數一定是 $i_n + 3k \equiv i_n \pmod{3}$，其中 $i$ 為只出現過 1 次的那個數字，$i_n$ 代表 $i$ 的第 $n$ 個 bit。而因為其他數字出現次數一定為 3，但他們不一定會出現在第 $n$ 個 bit，因此我用 $3k$ 來代表有 $k$ 個數字用到這個 bit。 + 若我們用 `0 => 00`,`1 => 01`,`2 => 10` 3個狀態來實作 $\mod{3}$ 算術。左邊叫 higher，右邊叫他 lower。根據題目，最後每個位置的狀態要嘛是 `0=>00`，要嘛是 `1=>01`。而取所有的 lower 反應出來的就是我們要的 $i$ 了。 ## 回到測驗題 ```c= int singleNumber(int *nums, int numsSize) { int lower = 0, higher = 0; for (int i = 0; i < numsSize; i++) { lower ^= nums[i]; lower &= KKK; higher ^= nums[i]; higher &= JJJ; } return lower; } ``` + 假設我想要透過上面的程式碼來完成下表的狀態轉換： | $H$ | $L$ | $i$ | $H_o$ | $L_o$ | | --- | --- | --- | --- | --- | | 0 | 0 | 1 | 0 | 1 | | 0 | 1 | 1 | 1 | 0 | | 1 | 0 | 1 | 0 | 0 | | 1 | 1 | 1 | X | X | | 0 | 0 | 0 | 0 | 0 | | 0 | 1 | 0 | 0 | 1 | | 1 | 0 | 0 | 1 | 0 | | 1 | 1 | 0 | X | X | $\newcommand{\xor}{\oplus}$ + 藉由把 $L_o$ 中的 X 設為 0，我們可得到 $L_o = H'L'i+H'Li' = (i\xor L)H'$，因此第 5 行做完 XOR 後，第6行做 `&= ~higher` 可以獲得我們要的結果。 + 到了第 7 行，目前 $H_o = (Hi'+H'i)f(H,L,i)$，我們需要知道 $f$ 是什麼。若 $H_o = (Hi'+H'i)$ ，真值表長這樣： | $H$ | $L$ | $i$ | $H_o$ | $L_o$ | | --- | --- | --- | --- | --- | | 0 | 0 | 1 | 1 | 1 | | 0 | 1 | 1 | 1 | 0 | | 1 | 0 | 1 | 0 | 0 | | 1 | 1 | 1 | 0 | 0 | | 0 | 0 | 0 | 0 | 0 | | 0 | 1 | 0 | 0 | 1 | | 1 | 0 | 0 | 1 | 0 | | 1 | 1 | 0 | 1 | 0 | + (a) higher => 不對，這樣根本沒變 + (b) lower => 在第1列就不對了 + (c) ~higher => 會全零，不對 + (d) ~lower => 剛好是對的 + (e) 0xFFFFFFFF => 如表，第1列就錯了 ## 觀念推廣 + 今天如果要做 $n=4$，則我們就是要選擇一個 $m$ 並確保 $4k \mod{m} = 0$ 即可。我們可以選擇模二算術，因為 $4k \mod{2} = 0,\ \forall k\in\naturals$。事實上，當 $n$ 屬於偶數，我們都可以套用 $n=2$ 的算法。 + 如果 $n=5$，我們可以用模五算術，如此就需要用 3 個變數來實作。對於更長的奇數 $n$，我們可以取 $m$ 為 $n$ 的最小質因數。 e.g. $7$ 取 $m=7$，$9$ 取 $m=3$，$15$ 取 $m=3$。以 $n=15$ 再細部說明，我們可以看出來 $15k \mod{3} = 0$，因此我們可以使用本題的解法來解任何 $n$ 為 $3$ 的倍數的情況。 ## 延伸問題二我使用更直覺的加法器設計，每個位置一樣使用兩個位 (lower, higher) 代表狀態： 1. 先把低位的進位加到高位 2. 再計算低位加法的結果 3. 最後把所有狀態為 11 的位置重新歸 0 這樣就完成 Mod 3 計數器了。 ```cpp int singleNumber(int* nums, int numsSize){ int lower = 0, higher = 0; for (int i = 0; i < numsSize; i++) { int null_mask; // add carry from lower higher ^= (lower & nums[i]); // add bit to lower lower ^= nums[i]; // set all 11 back to 00 null_mask = lower & higher; lower ^= null_mask; higher ^= null_mask; } return lower; } ```