2020q3 Homework2 (quiz2)

# 2020q3 Homework2 (quiz2) contributed by < `huang-me` > ###### tags: `embedded_master` [第二周題目](https://hackmd.io/@sysprog/2020-quiz2) > [toc] # 測驗1 ```cpp #include <stddef.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; for (int i = 0; i < size; i++) if (str[i] & 0x80) /* i.e. (unsigned) str[i] >= 128 */ return false; return true; } ``` 1. 因為英文字母的 ascii code 從十進位的 65-90 以及 97-122，不管是哪一個字母的 binary code 第 8 個 bit 都一定是 0，所以我們只需要將所有的字都跟 0x80 做 AND，如果得到的結果不是 0，則代表此記憶體位置內的資料必定不是字母。 2. 不過以上的方法必須要逐字進行比較所以執行時間可能比較慢，況且現在電腦至少都是32位元，因此我們希望可以一次比較64位元以加快執行速度。 ```c bool is_ascii(const char str[], size_t size) { if (size == 0) return false; int i = 0; while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); if (payload & 0x8080808080808080) return false; i += 8; } while (i < size) { if (str[i] & 0x80) return false; i++; } return true; } ``` - 在大概一萬個字的文件中判定全部的字是否為英文字母原本要 148,336 instructions per call，而更改為一次比較64位元之後只需要 26,294 instructions per call，有顯著的差異。 - 一個 char 的 size 是8 bits 而如果要一次比對64 bits 只需要將 0x80重複8次即可。 :::warning 為何需要用 memcpy? 因為 memcpy 在複製的時候會先確認資料是否已經 aligned，如果沒有的話就會先將前幾個 byte 搬移使得資料達成 data aligned後再以 word 為單位進行複製，由此可以使得複製的速度更快。 ::: :::danger 不能只用「更快」解釋，需要從處理器架構去考慮，如果在 ARM/MIPS 架構缺乏 unaligned access，會發生什麼事？ :notes: jserv ::: # 測驗2 1. 解釋 code **Ascii Code 對照表** |大寫|大寫ascii|小寫|小寫ascii| |---|---|---|---| |A|01000001|a|01100001| |B|01000010|b|01100010| |C|01000011|c|01100011| |D|01000100|d|01100100| |E|01000101|e|01100101| |**F**|**01000110**|**f**|**01100110**| - 所有的大小寫的 ascii code 的差別都只有第6 bit，大寫的第6 bit 為0，而小寫的為1，所以如果要將 f 或 F 都對應到 15 (0b00001111) 需要以下幾個步驟 1. 取 f 以及 F 的共同位元（第7個 bit ），所以需要拿 0b01000000 (0x40) 跟 f 或者 F 做 and 以取得一位的 1 2. 將 0x40 向右移動兩次，分別移動3、6格後再互相進行一次 or ，可得到 shift = 0b00001001 3. 將 shift 加上 f 或 F 的 ascii code 後，與 0x0f(0b00001111) 做 and 就可以得到後面4個 bit，也就得到答案 0b00001111 ```c uint8_t hexchar2val(uint8_t in) { const uint8_t letter = in & 0x40; const uint8_t shift = (letter >> 3) | (letter >> 6); return (in + shift) & 0xf; } ``` :::warning 為何題目只給 A-F/a-f 的 ascii code？ F/f 之後的 ascii code 如果一樣做以上計算的話就會超出4個 bits，在最後與 0x0f and 之後的值就不會在繼續按照順序增加，所以這樣的方法只適用於 A-F & a-f ::: 2. 將 hexchar2val 擴充為允許 "0xFF", "0xCAFEBABE", "0x8BADF00D" 等字串輸入，且輸出對應的十進位數值 ```c uint64_t hexchar2val(char *in) { int len = strlen(in), len_cpy = len - 1; uint32_t output = 0; while (--len >= 0) { uint8_t in_char = in[len]; const uint8_t letter = in_char & 0x40; const uint8_t shift = (letter >> 3) | (letter >> 6); output |= ((in_char + shift) & 0xf) << (4 * (len_cpy - len)); } return output; } ``` 1. 首先，將讀取的資料改為 char * 形態以讀取更多的字元 2. 計算此次傳入的字串長度，再用 while 決定執行的次數，對每個字做一樣的計算後轉換為 0-15 的數字，再將此數字向左位移適當個位置 3. 最後，將要 output 的數字與這次的計算結果進行一次 OR :::warning Hexspeak 的功用：在寫程式的時候常常需要用到一些記憶體的空間，於是工程師們利用數字以及A-F組合形成的16進位數字當作某些特別的記憶體位置，較便於記憶或者理解記憶體空間內資料的意思。 ::: # 測驗3 ```c const uint32_t D = 3; #define M ((uint64_t)(UINT64_C(0xFFFFFFFFFFFFFFFF) / (D) + 1)) /* compute (n mod d) given precomputed M */ uint32_t quickmod(uint32_t n) { uint64_t quotient = ((__uint128_t) M * n) >> 64; return n - quotient * D; } ``` - 因為除法的計算時間比較長，所以我們希望將除法轉換為 bitwise 的計算，不過 bitwise 的計算只能乘或除 $2^N$，所以如果想要達成除以3的話就必須要修改一下初始的數字。因為後面有向右移動64 bits，所以前面原本也應該有相對應的將2左移64 bits（0x10000000000000000），不過題目有給提示要使用 UINT64_C ，所以這題選擇使用 0xFFFFFFFFFFFFFFFF 去除以3後再加一以補回程式計算的時候自動消除的小數部分 :::warning 之所以使用 UINT64_C 應該是因為如果要使用 $2^{64}$ 就必須要多配置記憶體，不過如果使用 0xFFFFFFFFFFFFFFFF 也可以達成一樣的效果 ::: --- ``` c void div_init(div_info_t *div_info, size_t d) { /* Nonsensical. */ assert(d != 0); /* * This would make the value of magic too high to fit into a uint32_t * (we would want magic = 2^32 exactly). This would mess with code gen * on 32-bit machines. */ assert(d != 1); uint64_t two_to_k = ((uint64_t)1 << 32); uint32_t magic = (uint32_t)(two_to_k / d); /* * We want magic = ceil(2^k / d), but C gives us floor. We have to * increment it unless the result was exact (i.e. unless d is a power of * two). */ if (two_to_k % d != 0) { magic++; } div_info->magic = magic; #ifdef JEMALLOC_DEBUG div_info->d = d; #endif } ``` - 計算出 $2^{32}/d$ 的值之後再進行調整，因為計算除法的時候 c 會自己幫我們忽略剩下的小數，不過我們需要的是比 $2^{32}/d$ 大一點的數字，所以如果 ```two_to_k % d != 0``` 的話應該要將 magic 的值加一 ```c div_compute(div_info_t *div_info, size_t n) { assert(n <= (uint32_t)-1); /* * This generates, e.g. mov; imul; shr on x86-64. On a 32-bit machine, * the compilers I tried were all smart enough to turn this into the * appropriate "get the high 32 bits of the result of a multiply" (e.g. * mul; mov edx eax; on x86, umull on arm, etc.). */ size_t i = ((uint64_t)n * (uint64_t)div_info->magic) >> 32; #ifdef JEMALLOC_DEBUG assert(i * div_info->d == n); #endif return i; } ``` 1. (uint32_t) -1 = 0xffff ffff ```assert(n <= (uint32_t)-1);``` 檢測輸入的值有沒有超過32 bits 2. 把 n 跟之前提前先計算好的 magic 相乘之後再位移32個 bits 以得到對的答案 ## 比較 mod 以及 quick_mod 效能差距 ```c for (int count = 0; count < 1000; count++) { uint32_t dividend = rand(); start = clock(); for (int i = 0; i < 1000; i++) dividend = mod(dividend, count + 2); end = clock(); fprintf(mod_file, "%f\n", (double) (end - start) / CLOCKS_PER_SEC); printf("ans: %d\n", dividend); } ``` ```c for (int count = 0; count < 1000; count++) { uint32_t dividend = rand(); start = clock(); div_init(count + 2); for (int i = 0; i < 1000; i++) { // div_init(count + 2); dividend = quick_mod(dividend); } end = clock(); fprintf(quick, "%f\n", (double) (end - start) / CLOCKS_PER_SEC); printf("ans: %d\n", dividend); } ``` - 利用 count 當作被除數，並且每次計算皆執行1000次並記錄執行時間再使用MATLAB畫出計算時間圖 1. gcc default ![](https://i.imgur.com/90qeMBt.png) - 可以看得出已經有蠻大的效能差距 2. gcc -O2 ![](https://i.imgur.com/boQ3yFh.png) - 不過編譯時使用 -O2 優化效能之後竟然兩者沒有差別，參考[RinHizakura](https://hackmd.io/@RinHizakura/SJ5NuIANP?fbclid=IwAR00Irsw6hPXgSQkZbFMpgRUj4T1soKYdA869CHTEVRoytpTAfHaaEn6mpg#%E6%9C%80%E4%BD%B3%E5%8C%96%E7%9A%84%E8%A7%80%E5%AF%9F%E8%88%87%E5%AF%A6%E9%A9%97%E6%94%B9%E9%80%B2)發現原來是因為沒有使用所以編譯時 gcc 直接把不需要的程式碼刪除了 3. gcc -O2 (print 出計算結果迫使程式必須計算) - 同樣的數字都有重複計算1000次，比較每次計算都重新計算 magic 會不會耗費很多時間 1. div_init 只有執行一次 ![](https://i.imgur.com/Xkvrsy8.png) 2. 每次計算皆執行 div_init ![](https://i.imgur.com/BaZrFtC.png) # 測驗4 ```c bool divisible(int input) { return (input * M) <= (M - 1); } ``` - $M = (2^{64} / d) + 1$ $M * n = n * ((2^{64}/d) + 1)$ $M * n = k*2^{64} + n$ 而 $k*2^{64}$ 必定超過64位元所以剩下的就是 n ，因此如果 $M * n <= M - 1$ 就可以判定是 d 的倍數，反之則因為 n 跟 d 不會約分，因而64位元以下的值不會小於 $M - 1$ # 測驗5 - 原理：因為大寫的英文字母的 ascii code 介於 0x41-0x5A，無論是哪一個字母加上 0x3F 後第8個 bit 必定為1，而加上 0x25，則必定為0。小寫的字母因為 ascii code 介於 0x61-0x7A，無論加上 0x3F 或者 0x25 第8個 bit 都一定是0。因此，只要將字母分別位移 0x3F 以及 0x25 後再做一次 XOR 就可以確定第8個 bit 是否相同，等同於得知此字母是大寫或者小寫，最後只需要再與 0x80 做 and 就可以知道那些部分需要處理才可以變回小寫。最後，因為大小寫不相同的 bit 是第6個 bit 不過我們判斷是否為大寫是藉助第8個 bit 的輔助，所以我們必須在 toggle 之前先將 mask 右移兩個 bits。 ```c #define PACKED_BYTE(b) (((uint64_t)(b) & (0xff)) * 0x0101010101010101u) /* in-place implementation for converting all characters into lowercase. */ void strlower(char *s, size_t n) { for (size_t j = 0; j < n; j++) { if (s[j] >= 'A' && s[j] <= 'Z') s[j] ^= 1 << 5; else if ((unsigned) s[j] >= '\x7f') /* extended ASCII */ s[j] = tolower(s[j]); } } /* vectorized implementation for in-place strlower */ void strlower_vector(char *s, size_t n) { size_t k = n / 8; for (size_t i = 0; i < k; i++, s += 8) { uint64_t *chunk = (uint64_t *) s; if ((*chunk & PACKED_BYTE(0x80)) == 0) { /* is ASCII? */ uint64_t A = *chunk + PACKED_BYTE(128 - 'A' + 0); uint64_t Z = *chunk + PACKED_BYTE(128 - 'Z' + (-1)); uint64_t mask = ((A ^ Z) & PACKED_BYTE(0x80)) >> 2; *chunk ^= mask; } else strlower(s, 8); } k = n % 8; if (k) strlower(s, k); } int main() { /* quote from https://www.gutenberg.org/files/74/74-h/74-h.htm */ char str[] = "This eBook is for the use of anyone anywhere at no cost and with \ almost no restrictions whatsoever. You may copy it, give it away or \ re-use it under the terms of the Project Gutenberg License included \ with this eBook or online at www.gutenberg.net"; int n = strlen(str); strlower_vector(str, n); puts(str); } ``` ## str[] & *str 差異 - 如果直接把 main 宣告的 ```char str[]``` 直接更改為 ```char *str``` 會出現 segmentation fault 1. 如果宣告的是 ```char *str = "some thing"```，此時程式會將 "some thing" 存入一個 read only 的記憶體並且讓 pointer str 指向這個記憶體位置。 2. 當使用 ```char str[] = "some thing"```，此時程式一樣將 "some thing" 放入一個 read only 的記憶體位置，不過也會將這個字串複製並放入 stack。 3. 因此只有使用 ```char str[]``` 的時候才可以修改字串的內容以對其做大小寫的轉換。 :::danger TODO: 翻閱 C 語言規格書並摘錄相關內容來論述 :notes: jserv ::: # 測驗6 ```c int singleNumber(int *nums, int numsSize) { int lower = 0, higher = 0; for (int i = 0; i < numsSize; i++) { lower ^= nums[i]; lower &= ~higher; higher ^= nums[i]; higher &= ~lower; } return lower; } ``` - 對任何數字的 1's complement 做 AND 只會留下自己為1且對方為0的 bit，所以利用 higher & lower 就可以互相抵消輸入，當輸入相同數字第二次時就會因為 higher 還沒有更新新的數值，所以可以把本次輸入抵消，而第三次則是 higher 上一輪讀入的值與本輪相同，所以兩個都是1的話就會被消除。 ## 撰寫不同程式碼，並確保在 [LeetCode 137. Single Number II](https://leetcode.com/problems/single-number-ii/) 執行結果正確且不超時 ```c int singleNumber(int* nums, int numsSize){ int last1 = 0, last2 = 0, output = 0; for(int i=0;i<numsSize;i++) { output = (nums[i] & ~(last1 | last2)) | (~nums[i] & last1); last2 = last1 & nums[i] | last2 & ~nums[i]; last1 = output & nums[i] | last1 & ~nums[i]; } return output; } ``` ![](https://i.imgur.com/giWXNpz.png) - 利用卡諾圖整理出期望達成的 state machine，希望每個 bit 的運作規則都按照下圖 ```graphviz digraph state{ node[shape=circle] rankdir=LR A[label="0"] B[label="1"] C[label="0"] A->B->C->A[label="1"] A->A[label=0] B->B[label=0] C->C[label=0] initial[label="Initial", shape=plaintext] initial->A } ``` - 由圖可得 ```output = (nums[i] & ~(last1 | last2)) | (~nums[i] & last1)``` - 並且，需要更新到 last 的 bit 只有 input 為1的 bit，因此 ```last1 = output & nums[i] | last[i] & ~nums[i]``` 且 ```last2 = last1 & nums[i] | last2 & ~nums[i]``` - 擴展到重複4次或者5次只需要重複記憶過去的 state 分別4、5次，其他相同 - 以下4次： ```c int singleNumber(int *nums, int numsSize) { int last1 = 0, last2 = 0, output = 0, last3 = 0; for (int i = 0; i < numsSize; i++) { output = (nums[i] & ~(last1 | last2 | last3)) | (~nums[i] & last1); last3 = (last2 & nums[i]) | (last3 & ~nums[i]); last2 = (last1 & nums[i]) | (last2 & ~nums[i]); last1 = (output & nums[i]) | (last1 & ~nums[i]); } return output; } ``` 輸入測資 [1, 2, 3, 3, 2, 2, 3, 3, 2] 時，output 為1 如圖： :::danger 文字訊息避免用圖片呈現 :notes: jserv ::: ![](https://i.imgur.com/UqSMbQY.png) - 以下5次： ```c int singleNumber5(int *nums, int numsSize) { int last1 = 0, last2 = 0, output = 0, last3 = 0, last4 = 0; for (int i = 0; i < numsSize; i++) { output = (nums[i] & ~(last1 | last2 | last3 | last4)) | (~nums[i] & last1); last4 = (last3 & nums[i]) | (last4 & ~nums[i]); last3 = (last2 & nums[i]) | (last3 & ~nums[i]); last2 = (last1 & nums[i]) | (last2 & ~nums[i]); last1 = (output & nums[i]) | (last1 & ~nums[i]); } return output; } ```