2019q1 Homework3 (review)

# 2019q1 Homework3 (review) contributed by < `bauuuu1021` > * [作業要求](https://hackmd.io/s/S1-wcjavN) ## 第 1 周第 1 題 ### 題目西洋棋中的皇后可以直線前進，吃掉遇到的所有棋子，如果棋盤上有 8 個皇后，則這 8 個皇后如何相安無事地放置在棋盤上，1970 年與 1971 年， E.W.Dijkstra 與 N.Wirth 曾用這個問題來講解程式設計之技巧。關於棋盤的問題，都可以用遞迴求解，然而如何減少遞迴的次數？在八個皇后的問題中，不必要所有的格子都檢查過，例如若某列檢查過，該該列的其它格子就不用再檢查了，這個方法稱為分支修剪。 ![](https://i.imgur.com/Ntc7TDQ.png) 所以檢查時，先判斷是否在已放置皇后的可行進方向上，如果沒有再行放置下一個皇后，如此就可大大減少遞迴的次數，例如以下為修剪過後的遞迴檢查行進路徑： ![](https://i.imgur.com/0aUWpaI.png) 8 個皇后的話，會有 92 個解答，如果考慮棋盤的旋轉，則旋轉後扣去對稱的，會有 12 組基本解。 [ [source](https://openhome.cc/Gossip/AlgorithmGossip/EightQueen.htm) ] 透過 [bitwise 演算法](http://jgpettibone.github.io/bitwise-n-queens/) 改寫為以下 C 程式: ```cpp= #include <stdio.h> #include <stdlib.h> static int sol_count = 0; void recursive_solver(int row, int maj_con, int min_con, int col_con, int n) { int queen_position; int conflicts = min_con | maj_con | col_con; int i = 0; min_con &= 65535; while (i < n) { queen_position = 1 << (A1 - i); if (!(queen_position & conflicts)) { if (row == n - 1) sol_count++; else recursive_solver(row + A2, (maj_con | queen_position) >> A3, (min_con | queen_position) << A4, col_con | queen_position, n); } i++; } } void solve_queens(int n) { int minor_dconflict = 0, major_dconflict = 0, column_conflict = 0; recursive_solver(0, column_conflict, major_dconflict, minor_dconflict, n); printf("solutions = %d\n", sol_count); } ``` 其中 `solve_queens` 的參數 `n`，數值介於 `4` 到 `16`。請補完程式碼。 ### 解題與想法 * N-Queen problem 指的是在 n*n 大小的棋盤上放置 n 個皇后，使沒有任二皇后在同一個直行或橫列或斜線上 * 首先閱讀完 [bitwise 演算法](http://jgpettibone.github.io/bitwise-n-queens/)，可對這個演算法有基本的認知 * conflicts ```Cpp=8 int conflicts = min_con | maj_con | col_con; ``` coflicts 紀錄了在哪些位子放皇后是會產生衝突的，在稍後的 ```Cpp=14 if (!(queen_position & conflicts)) { ``` 被用來檢查，若 `queen_position & conflicts` 為 0，`queen_position` 現在表示的位子可以放置皇后 * A1 ```=11 min_con &= 65535; ``` * 因為 min_con 會隨遞迴不斷變大，可能超出預期範圍 (亦即棋盤大小) 而造成錯誤，故需要在開始計算前使用 mask 使之回到正確範圍 * 根據題幹 ==其中 `solve_queens` 的參數 `n`，數值介於 `4` 到 `16`== ，可知棋盤最大應為 16\*16 ，故 mask 為 65535 (0xFFFF) ```C=12 while (i < n) { queen_position = 1 << (A1 - i); ... ``` * 每一列皇后都可能被放在 column index 15~0 這 16 個位子，對應到 1000 0000 0000 0000 ~ 0000 0000 0000 0001 * 因為 i 初始為 0，且 i 在 while loop 中不斷加大 * 故 queen_position 應在第一個 while iteration 最大(1000 0000 0000 0000)，再隨著 iteration 次數增多而減少（至 0000 0000 0000 0001)。 * 最大應在棋盤的左上角，即 $1000 0000 0000 0000_{(2)} = 1<<15$ * ==A1 = 15== * A2 ```Cpp=18 recursive_solver(row + A2, (maj_con | queen_position) >> A3, (min_con | queen_position) << A4, col_con | queen_position, n); ``` * 根據程式初始呼叫 ```C=28 recursive_solver(0, column_conflict, major_dconflict, minor_dconflict, n); ``` 本小題討論的參數(第一個參數)為 0，表示是從第一個 row 往下逐行遞迴呼叫。 * ==A2 = 1== * A3 & A4 * major conflict (maj_con) 表示右斜對角線，minor conflict (min_con) 表示左斜對角線 * 舉例說明：假設棋盤大小為 4\*4 ，row 0 在左上角 (0,0) 放置皇后，則 `maj_con | queen_position` 值為 1000，圖示如下 | 1 | 0 | 0 | 0 | | -------- | -------- | -------- | --- | | 0 | 0 | 0 | 0 | | 0 | 0 | 0 | 0 | | 0 | 0 | 0 | 0 | * 遞迴呼叫下個 row 時 ```Cpp=18 recursive_solver(row + 1, (maj_con | queen_position) >> A3, (min_con | queen_position) << A4, col_con | queen_position, n); ``` 將 `(maj_con | queen_position) >> A3)` 設為 0100，即可避免 row 1 在 (1,1) 放置皇后 * 要達成這樣的目標可以透過將 `(maj_con | queen_position)` 執行 `>>1` * `(min_con | queen_position)` 則是執行 `<<1` * ==A3 = 1== * ==A4 = 1== ### 延伸 :::success 延伸題目: 1. 讓參數 `n` 的有效範圍比 `[4, 16]` 更大，但仍透過 bitwise 操作，並改寫為 iteration 程式; 2. 考慮到 tail-call optimization (TCO)，改進上述遞迴程式碼並分析效能表現; 3. 考慮旋轉後扣去對稱的解法，求出基本解法的總量; ::: * [tail-call optimization](https://stackoverflow.com/questions/310974/what-is-tail-call-optimization) ## 第 2 周第 2 題 ### 題目 [population count](https://en.wikichip.org/wiki/population_count) 簡稱 popcount 或叫 sideways sum，是計算數值的二進位表示中，有多少位元是 `1`，在一些場合下很有用，例如計算 0-1 稀疏矩陣 (sparse matrix)或 bit array 中非 `0` 元素個數、計算兩個字串的 [Hamming distance](https://en.wikipedia.org/wiki/Hamming_weight)。Intel 在 2008 年 11 月 Nehalem 架構的處理器 Core i7 引入 SSE4.2 指令集，其中就有 `CRC32` 和 `POPCNT` 指令，`POPCNT` 可處理 16-bit, 32-bit, 64-bit 整數。對應到 C 程式的實作: ```clike unsigned popcnt_naive(unsigned v) { unsigned n = 0; while (v) v &= (v - 1), n = -(~n); return n; } ``` 可改寫為以下常數時間的實作: ```clike unsigned popcnt(unsigned v) { unsigned n; n = (v >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; v = (v + (v >> X)) & Y; v *= 0x01010101; return v >> 24; } ``` ### 解題與想法 * 首先我注意到的是 `popcnt_naive` 這個函式 ```clike= unsigned popcnt_naive(unsigned v) { unsigned n = 0; while (v) v &= (v - 1), n = -(~n); return n; } ``` * 第 4 行的 `n = -(~n)` 讓人感到困惑，但實際帶入數字: 1. n = 0 2. n = -(~n)，得 n = 1 3. n = -(~n)，得 n = 2 4. n = -(~n)，得 n = 3 5. ... * 可發現 while 每執行一次就會使 n = n+1，故 `n = -(~n)` 作用等同於 n++ >後來發現其實有**科學**一點的想法： >* 因為 n 是無號數 >* -n = ~n + 1 >* -n - 1 = ~n >* -(~n) = n + 1 > >[name=bauuuu1021] [color=#0ABAB5] * 而 `v &= (v - 1)` 帶入數字可發現可以逐步消去最低位的 1，例如： 1. v = $1101_{(2)}$ 2. v &= (v - 1)，得 $1100_{(2)}$ 3. v &= (v - 1)，得 $1000_{(2)}$ 4. v &= (v - 1)，得 $0000_{(2)}$，結束迴圈 5. 共執行 3 次 while loop iteration，n = 3，表 $1101_{(2)}$ 中有 3 個 1 * 接下來探討常數時間可完成操作的 `popcnt` 函式，可以用一個簡單的舉例來說明這題的核心操作： 假設存在一種 data type `p2_t`，可以儲存 4-bit 無號數 ```C p2_t v; // 4 bits ``` `v - v>>1` 的結果為 * $\frac{1}{2}v+1$，若 v 尾數為 1 * $\frac{1}{2}v$，若 v 尾數為 0 則以下程式碼 ```C= p2_t n; n = (v >> 1); v -= n; n = (n >> 1); v -= n; n = (n >> 1); v -= n; ``` 第 2, 4, 6 行的 n 分別表示 $\frac{1}{2}$v, $\frac{1}{4}$v, $\frac{1}{8}$v，故第 7 行執行後 v = v-(v >> 1)-(v >> 2)-(v >> 3) = $\frac{1}{8}$v + (最低 3-bit 1 數)，故 v 存的為 v 之二進位表示式中 1 的數量 * 了解以上舉例後再來看 `popcnt` 函式就變得簡單許多 ```clike= unsigned popcnt(unsigned v) { unsigned n; n = (v >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; v = (v + (v >> X)) & Y; v *= 0x01010101; return v >> 24; } ``` * 4~9 行只是改成一次執行 8 組 4-bit 數組 * 加入 mask `0x77777777` 只是為了避免執行 `>>1` 後數組中最低位的 1 變成右邊數組的最高位 * 第 11 行是為了將這 8 組 (index 7~0) 數組兩兩相加成為 4 個新數組，並利用 mask 除去不必要的部份 * 原： | 7 |6 | 5 | 4 | 3 | 2 | 1 | 0 | |---|---|---|---|---|---|---|---| * 相加後： | 7 | 7+6 | 5 | 5+4 | 3 | 3+2 | 1 | 1+0 | |---|---|---|---|---|---|---|---| * 加入 mask： | (all 0)| 7+6 | (all 0) | 5+4 |(all 0) | 3+2 | (all 0) | 1+0 | |---|---|---|---|---|---|---|---| * ==X=4==，==Y=0x0F0F0F0F== * 第 12 及 14 行將這 4 個新數組相加，並調整回正確數值 * v *= 0x01010101 (unsingned 只能存 32-bit，故不討論 overflow 部份) | (all 0)| 7+6 | (all 0) | 5+4 |(all 0) | 3+2 | (all 0) | 1+0 | |---|---|---|---|---|---|---|---| | (all 0)| 5+4 | (all 0) | 3+2 |(all 0) |1+0 | (all 0) | (all 0) | | (all 0)| 3+2 | (all 0) | 1+0 |(all 0) | (all 0) | (all 0) | (all 0) | | (all 0)| 1+0 | (all 0) | (all 0) |(all 0) | (all 0) | (all 0) | (all 0) | 相加得 | (all 0)| 7+6+5+4 +3+2+1+0 | (all 0) | 5+4+3 +2+1+0 |(all 0) | 3+2 +1+0 | (all 0) | 1+0 | |---|---|---|---|---|---|---|---| * return v >> 24 | (all0)| (all 0) | (all 0) |(all 0) |(all 0) | (all 0) | (all 0) | 7+6+5+4+ 3+2+1+0 | |---|---|---|---|---|---|---|---| ### 延伸 :::success 延伸題目: 解釋原理並找出可抵抗 [timing attack](https://en.wikipedia.org/wiki/Timing_attack) 的相關程式和場景 ::: * 原理 * Timing attack 是一種透過分析不同 input 執行所需時間來破解加密的 [side channel attack](https://en.wikipedia.org/wiki/Side-channel_attack) * Timing attack 可藉由實作常數時間 function 或嚴格的 final test 來避免 * 舉例 * 共享 memory 之 processes 間可能存在 timing attack 疑慮 >Two otherwise securely isolated processes running on a single system with either cache memory or virtual memory can communicate by **deliberately causing page faults and/or cache misses** in one process, then **monitoring the resulting changes in access times** from the other. Likewise, if an application is trusted, but its paging/caching is affected by branching logic, it may be possible for a second application to determine the values of the data compared to the branch condition by monitoring access time changes; in extreme examples, this can allow recovery of cryptographic key bits. 若一個程式 A 的 paging/caching 與 branch 執行與否有關，另一個共享 paging/caching device 的程式 B 可能藉由分析程式 A 的 access time，來得知程式 A 進行 branch condition 比較時所使用的 data value，在極端狀況下甚至可能還原加密密鑰 * Secure and Insecure string comparison * insecure ```Basic= Function InsecureCompare(StrA As String, StrB As String, length As Integer) As Boolean Dim result As Boolean For i = 1 To length If Mid(StrA, i, 1) <> Mid(StrB, i, 1) Then Exit For Next result = (i > length) InsecureCompare = result End Function ``` * secure ```Basic= Function SecureCompare(StrA As String, StrB As String, length As Integer) As Boolean Dim result As Boolean For i = 1 To length result = result Or (Asc(Mid(StrA, i, 1)) Xor Asc(Mid(StrB, i, 1))) Next SecureCompare = Not result End Function ``` 兩者不同的點在於前者只要找到相異處就會跳出迴圈並 return 結果，因此兩字串第一個 bit 就相異或最後一個 bit 才相異會有不同的執行時間；而後者則是不管相異的是哪個 bit 執行時間都相同。故後者可以避免 timing attack。 * 延伸問題的延伸問題 : `strcmp` 屬於 insecure or secure string comparison? * 以下是一個使用舉例，可以發現在完全相符、第一個不相符處前者 ascii 值較大、第一個不相符處前者 ascii 值較小會分別輸出 0、1、-1 ```C= #include <stdio.h> #include <string.h> int main() { printf("%d\n", strcmp("bauuuu", "bauuuu")); // output 0 printf("%d\n", strcmp("bauuuu", "baUUUU")); // output 1 printf("%d\n", strcmp("baUUUU", "bauuuu")); // output -1 } ``` * 從 [strcmp](https://code.woboq.org/userspace/glibc/string/strcmp.c.html) 的 glibc source code 中節錄如下 ```C= int STRCMP (const char *p1, const char *p2) { const unsigned char *s1 = (const unsigned char *) p1; const unsigned char *s2 = (const unsigned char *) p2; unsigned char c1, c2; do { c1 = (unsigned char) *s1++; c2 = (unsigned char) *s2++; if (c1 == '\0') return c1 - c2; } while (c1 == c2); return c1 - c2; } ``` 根據第 14 行可發現只要有字元不相符即跳出迴圈，故 `strcmp` 屬於 **insecure string comparison** ## 第 4 周 (下) ### 題目考慮略為複雜的 reference counting 實作，如下: ```clike #include <limits.h> #include <stddef.h> #include <stdlib.h> /** * Destructor callback. * \param ptr Pointer to space to destroy and free */ typedef void (*rc_dtor)(void *ptr); typedef int volatile refcnt; #define REFCOUNTER_MAX (INT_MAX / 2) /* FIXME: reference counting should be robust against race conditions in * multi-threaded applications. Use C11 atomics. */ static inline refcnt refcnt_acquire(refcnt *dest) { return (*dest += 1) - 1; } static inline refcnt refcnt_release(refcnt *dest) { return (*dest -= 1) + 1; } struct refcnt_entry { refcnt n; rc_dtor dtor; void *ptr; }; /** * \brief Allocate some reference counted space * \param sz Size of space in bytes * \param dtor Destructor to use in case of free * \return the space on sucess, NULL otherwise */ void *rc_malloc(size_t sz, rc_dtor dtor) { size_t const static rc_offset = offsetof(struct refcnt_entry, ptr); size_t const static max_size = (WW) - offsetof(struct refcnt_entry, ptr); if (sz >= max_size) goto reject; void *out = malloc(sz + rc_offset); if (out) { struct refcnt_entry *const outref = (struct refcnt_entry *) out; outref->n = 1; outref->dtor = dtor; return XX; } reject: return NULL; } /** * \brief Acquire a reference to some space * \param ptr pointer, previously allocated with `rc_malloc` * \return the same pointer on sucess, NULL otherwise */ void *rc_acquire(void *ptr) { size_t const static rc_offset = offsetof(struct refcnt_entry, ptr); struct refcnt_entry *const ptrref = (struct refcnt_entry *) (void *) (((char *) ptr) - rc_offset); /* try to acquire a reference */ refcnt next = refcnt_acquire(&ptrref->n); if (YY) return ptr; /* artifically refuse the reference */ refcnt prev = refcnt_release(&ptrref->n); if (prev == 1) { if (ptrref->dtor) (*ptrref->dtor)(ptr); free(ptrref); } return NULL; } /** * \brief Release a reference to some space * \param ptr pointer, previously allocated with `rc_malloc` * \note If the reference count drops to zero, the space is freed. */ void rc_release(void *ptr) { size_t const static rc_offset = offsetof(struct refcnt_entry, ptr); if (!ptr) return; struct refcnt_entry *const ptrref = (struct refcnt_entry *) ZZ; /* release the reference */ refcnt prev = refcnt_release(&ptrref->n); if (prev == 1) { if (ptrref->dtor) (*ptrref->dtor)(ptr); free(ptrref); } } /* -- unit test -- */ #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <time.h> static void test_destructor(void *); static void *test_thread(void *); struct test_box *test_constructor(int j); struct test_box { int j; }; struct test_box *test_boxes[10] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL}; int test_repeat_count = 1000000; struct test_box *test_constructor(int j) { struct test_box *box = rc_malloc(sizeof(struct test_box), &test_destructor); assert(box != NULL); box->j = j; fprintf(stderr, "created item %d\n", box->j); return box; } void test_destructor(void *arg) { struct test_box *const box = (struct test_box *) arg; fprintf(stderr, "destroying item %d\n", box->j); box->j = -1; return; } void *test_thread(void *arg) { int const thread_id = *(int *) arg; int const repeat_count = test_repeat_count; int personal_refs[10]; int faults = 0; int overdrops = 0; int outstanding = 0; fprintf(stderr, "[Thread %i starting]\n", thread_id); for (int i = 0; i < 10; ++i) { personal_refs[i] = 0; } for (int i = 0; i < repeat_count; ++i) { int const step_rand = rand(); int command = step_rand % 2; int j = (step_rand / 2) % 10; if (command == 0) { /* acquire */ struct test_box *box; box = rc_acquire(test_boxes[j]); if (box == test_boxes[j]) { personal_refs[j] += 1; } else faults += 1; } else { /* release */ if (personal_refs[j] > 0) { rc_release(test_boxes[j]); personal_refs[j] -= 1; } else overdrops += 1; } } for (int i = 9; i >= 0; --i) { while (personal_refs[i] > 0) { rc_release(test_boxes[i]); outstanding += 1; personal_refs[i] -= 1; } } fprintf(stderr, "[Thread %i done: %i faults, %i overdrops, %i outstanding]\n", thread_id, faults, overdrops, outstanding); return NULL; } int main(int argc, char **argv) { int corrupt_count = 0; srand((int) (time(NULL))); /* make the boxes */ { for (int j = 0; j < 10; ++j) { test_boxes[j] = test_constructor(j); } } /* run thread start code */ { int thread_i = 0; test_thread(&thread_i); } /* inspect boxes */ { for (int j = 0; j < 10; ++j) { if (test_boxes[j]->j != j) { fprintf(stderr, "Box %i was corrupted.\n", j); corrupt_count += 1; } rc_release(test_boxes[j]); test_boxes[j] = NULL; } } return corrupt_count == 0 ? EXIT_SUCCESS : EXIT_FAILURE; } ``` 預期的輸出如下: (其中一例) ``` created item 0 created item 1 created item 2 created item 3 created item 4 created item 5 created item 6 created item 7 created item 8 created item 9 [Thread 0 starting] [Thread 0 done: 0 faults, 2575 overdrops, 2721 outstanding] destroying item 0 destroying item 1 destroying item 2 destroying item 3 destroying item 4 destroying item 5 destroying item 6 destroying item 7 destroying item 8 destroying item 9 ``` ## 參考資料 * [2019q1 Homework3 (作業區)](https://hackmd.io/s/BJgx6jav4) * [Secure and Insecure string comparison](https://security.stackexchange.com/questions/83660/simple-string-comparisons-not-secure-against-timing-attacks) ###### tags: `bauuuu1021`, `sysprog`,`2019spring`