# F05: review
contributed by < `st9540808` >

* [F05: review](https://hackmd.io/s/S1-wcjavN)

## [Week 2 Quiz 2](https://hackmd.io/s/H1Pik8M8E#%E6%B8%AC%E9%A9%97-2)

[Population count](https://en.wikichip.org/wiki/population_count), abbreviated popcount and also called sideways sum, counts how many bits are `1` in the binary representation of a value. It is useful in several situations, for example counting the nonzero elements of a 0-1 sparse matrix or a bit array, or computing the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_weight) between two strings. In November 2008 Intel introduced the SSE4.2 instruction set with the Nehalem-architecture Core i7 processors. It includes the `CRC32` and `POPCNT` instructions, and `POPCNT` handles 16-bit, 32-bit, and 64-bit integers. The corresponding C implementation:

```c
unsigned popcnt_naive(unsigned v)
{
    unsigned n = 0;
    while (v)
        v &= (v - 1), n = -(~n);
    return n;
}
```

It can be rewritten as the following constant-time implementation:

```c
unsigned popcnt(unsigned v)
{
    unsigned n;
    n = (v >> 1) & 0x77777777;
    v -= n;
    n = (n >> 1) & 0x77777777;
    v -= n;
    n = (n >> 1) & 0x77777777;
    v -= n;

    v = (v + (v >> 4)) & 0x0F0F0F0F;
    v *= 0x01010101;

    return v >> 24;
}
```

### How it works

#### `popcnt_naive()`

Start with the first implementation. `popcnt_naive()` repeatedly clears the lowest set bit until the input value `v` becomes 0.

* `v &= (v - 1)` sets the lowest set bit to 0 (**reset LSB**).
  For example, suppose the input value is 20:
  ```c
  0001 0100 // 20 => LSB in bit position 2
  0001 0011 // 20-1
  0001 0000 // 20 & (20-1)
  ```
  > A similar operation is `x & -x`, which extracts the lowest set bit of `x` (**isolate LSB**).
* `n = -(~n)` is equivalent to `n++`
    * because in two's complement, $-n =\ \sim n + 1$
    * $-(\sim n) = n + 1$

The running time of `popcnt_naive()` therefore depends on the number of 1s (set bits) in the input, which is why it is replaced by the constant-time `popcnt()`.

---

#### `popcnt()`

For a 32-bit unsigned integer, popcount can be written as the following formula:

$popcnt(x) = x - \left \lfloor{{\dfrac{x}{2}}}\right \rfloor - \left \lfloor{{\dfrac{x}{4}}}\right \rfloor - ... - \left \lfloor{{\dfrac{x}{2^{{31}}}}}\right \rfloor$

Let $x = b_{31}...b_3b_2b_1b_0$ and look first at the 4 bits $x[3:0]$. Applying the formula above gives:

$(2^3b_3 + 2^2b_2 + 2^1b_1 + 2^0b_0) - (2^2b_3 + 2^1b_2 + 2^0b_1) - (2^1b_3 + 2^0b_2) - 2^0b_3$

> $\left \lfloor{{\dfrac{x}{2}}}\right \rfloor$ corresponds to the C expression `x >> 1`

Grouping the terms by bit gives:

$(2^3 - 2^2 - 2^1 - 2^0)b_3 + (2^2 - 2^1 - 2^0)b_2 + (2^1 - 2^0)b_1 + 2^0b_0$

So the general form of popcnt can be rewritten as:

$popcnt(x) = \sum\limits_{n=0}^{31} {}(2^n - \sum\limits_{i=0}^{n-1} 2^{i})b_n = \sum\limits_{n=0}^{31}b_n$

Because $2^n - \sum\limits_{i=0}^{n-1} 2^{i} = 1$, every $b_n$ that is 1 adds exactly one to the popcnt total, which matches what `popcnt_naive()` counts. This confirms that the initial formula does compute the population count. Moreover, a 32-bit unsigned integer has at most 32 set bits, a value that fits in a single byte, so the word can be split into blocks that are counted in parallel and then summed into one byte. This gives a better time complexity than checking 32 bits one by one.

---

The implementation first counts the 1s in **units of 4 bits (a nibble)**, using the initial formula to compute $x - \left \lfloor{{\dfrac{x}{2}}}\right \rfloor - \left \lfloor{{\dfrac{x}{4}}}\right \rfloor - \left \lfloor{{\dfrac{x}{8}}}\right \rfloor$ within each nibble:

```c=
n = (v >> 1) & 0x77777777;
v -= n;
n = (n >> 1) & 0x77777777;
v -= n;
n = (n >> 1) & 0x77777777;
v -= n;
```

1. `n = (v >> 1) & 0x77777777` : divides the input `v` by 2, giving $\left \lfloor{{\dfrac{v}{2}}}\right \rfloor$ for each nibble. The mask `0x77777777` clears the top bit of every nibble, discarding the bit that was shifted in from the neighbouring nibble.
   ```c
   b_31 b_30 b_29 b_28 ... b7 b6 b5 b4 b3 b2 b1 b0  // v
   0    b_31 b_30 b_29 ... 0  b7 b6 b5 0  b3 b2 b1  // (v >> 1) & 0x77777777
   ```
2. `v -= n` : the result corresponds to $v - \left \lfloor{{\dfrac{v}{2}}}\right \rfloor$
3. `n = (n >> 1) & 0x77777777` : divides `n` by 2 again, giving $\left \lfloor{{\dfrac{v}{4}}}\right \rfloor$
4. `v -= n` : computes $v - \left \lfloor{{\dfrac{v}{2}}}\right \rfloor - \left \lfloor{{\dfrac{v}{4}}}\right \rfloor$
5. and 6. repeat the same operation. When this part finishes, it has computed $v - \left \lfloor{{\dfrac{v}{2}}}\right \rfloor - \left \lfloor{{\dfrac{v}{4}}}\right \rfloor - \left \lfloor{{\dfrac{v}{8}}}\right \rfloor$, so each nibble now holds the number of set bits it originally contained.

---

**`v = (v + (v >> 4)) & 0x0F0F0F0F`** : adds the per-nibble set-bit counts together into bytes:

1. Let $B_n$ denote the value held in the n-th nibble (4 bits).
   ```c
   B7 B6 B5 B4 B3 B2 B1 B0 // v
   0  B7 B6 B5 B4 B3 B2 B1 // (v >> 4)
   ```
2. Adding them gives the following. Each nibble count is at most 4, so each sum is at most 8 and never carries into the next nibble.
   ```c
   // (v + (v >> 4))
   B7 (B7+B6) (B6+B5) (B5+B4) (B4+B3) (B3+B2) (B2+B1) (B1+B0)
   ```
3. Finally, masking with 0x0F0F0F0F gives
   ```c
   // (v + (v >> 4)) & 0x0F0F0F0F
   0 (B7+B6) 0 (B5+B4) 0 (B3+B2) 0 (B1+B0)
   ```

---

**`v *= 0x01010101`** : the last statement multiplies `v` by 0x01010101.

* Let A = B7+B6, B = B5+B4, C = B3+B2, D = B1+B0. By the distributive law (keeping only the low 32 bits of the product):
```
v * 0x01010101 =
   (A + B + C + D)   (B + C + D)      (C + D)          (D)
  |<-- 1 byte -->|<-- 1 byte -->|<-- 1 byte -->|<-- 1 byte -->|
```

**`return v >> 24`** : the final result sits in the most significant byte, so it is shifted right by 24 bits.

---

> The following uses the same idea, but I find this version easier to understand: it merges the set-bit counts step by step into fields of 2, 4, 8, and 16 bits, and finally the whole 32-bit word.

```c
unsigned int count_bit(unsigned int x)
{
    x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
    x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
    x = (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF);
    return x;
}
```

[Reference](https://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer)

### Timing attacks: related code and scenarios

When data is transmitted, a message authentication code (MAC) is used to guarantee the integrity of the data and to authenticate the sender. The MAC is usually computed with the HMAC algorithm. When comparing HMACs, a non-constant-time comparison must be avoided, otherwise an attacker can use the execution time to guess the HMAC of an arbitrary message.

Node.js provides a recommended function for comparing HMACs, [crypto.timingSafeEqual()](https://nodejs.org/api/crypto.html#crypto_crypto_timingsafeequal_a_b), implemented as follows:

[node/src/node_crypto.cc Line 6065](https://github.com/nodejs/node/blob/51e0948862f8920c0387f6702843e8fd79f24172/src/node_crypto.cc#L6065)
```cpp
void TimingSafeEqual(const FunctionCallbackInfo<Value>& args) {
  CHECK(Buffer::HasInstance(args[0]));
  CHECK(Buffer::HasInstance(args[1]));

  size_t buf_length = Buffer::Length(args[0]);
  CHECK_EQ(buf_length, Buffer::Length(args[1]));

  const char* buf1 = Buffer::Data(args[0]);
  const char* buf2 = Buffer::Data(args[1]);

  return args.GetReturnValue().Set(CRYPTO_memcmp(buf1, buf2, buf_length) == 0);
}
```

The comparison function `CRYPTO_memcmp()` is defined in the OpenSSL implementation:

[openssl/crypto/cryptlib.c Line 401](https://github.com/openssl/openssl/blob/a4cc3c8041104896d51ae12ef7b678c31808ce52/crypto/cryptlib.c#L401)
```c
int CRYPTO_memcmp(const void *in_a, const void *in_b, size_t len)
{
    size_t i;
    const unsigned char *a = in_a;
    const unsigned char *b = in_b;
    unsigned char x = 0;

    for (i = 0; i < len; i++)
        x |= a[i] ^ b[i];

    return x;
}
```

`CRYPTO_memcmp()` XORs each pair of elements of `a` and `b` and ORs the result into `x`. If any pair differs, `x` ends up nonzero, and the loop always visits every element no matter where the first mismatch occurs.

## Week 2 Quiz 3

```c
#include <stdio.h>
#define cons(x, y) (struct llist[]){{x, y}}

struct llist {
    int val;
    struct llist *next;
};

int main()
{
    struct llist *list = cons(9, cons(5, cons(4, cons(7, NULL))));
    struct llist *p = list;
    for (; p; p = p->next)
        printf("%d", p->val);
    printf("\n");
    return 0;
}
```

§6.5.2.5 Compound literals, ¶4
> A postfix expression that consists of a **parenthesized type name** followed by a brace-enclosed list of initializers is a compound literal. It provides an unnamed object whose value is given by the **initializer list**.81)
>
> 81) Note that this differs from a cast expression. For example, a cast specifies a conversion to scalar types or void only, and the result of a cast expression is not an lvalue.

> Note that the leading `(struct llist[])` is not a cast. It is part of the compound literal.

§6.7.8 Initialization, ¶17
> Each brace-enclosed initializer list has an associated **current object**. When no designations are present, subobjects of the current object are __***initialized in order***__ according to the type of the current object: array elements in increasing subscript order, structure members in declaration order, and the first named member of a union. ...

§6.7.8 Initialization, ¶20
> If the **aggregate or union** _contains_ elements or members that are **aggregates or unions**, these rules apply recursively to the subaggregates or contained unions. If the initializer of a subaggregate or contained union begins with a left brace, the initializers enclosed by that brace and its matching right brace initialize the elements or members of the subaggregate or the contained union. ...

An aggregate type is an array or a structure. When the second left brace of `{{x, y}}` is reached, rule §6.7.8:20 applies, and then §6.7.8:17 initializes the members of `struct llist` in declaration order. Returning to the outer aggregate type (an array here), that initializer is placed in the first element. So:

```c
struct llist *list = cons(9, cons(5, cons(4, cons(7, NULL))));
```

makes `list` point to a `struct llist` object, and `list` behaves as a linked list. Because the compound literals appear inside a function body, every element has automatic storage duration, as described in 6.5.2.5:6 of the standard.

If designated initializers are preferred, the `cons` in this quiz can also be defined as:

```c
#define cons(x, y) &(struct llist){.val = x, .next = y}
```

This works because a compound literal is an lvalue, so its address can be taken with `&`.
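To make the designated-initializer variant concrete, here is a minimal sketch that builds the same four-element list and walks it. The extra parentheses in the macro and the helper names `llist_sum()` and `demo_sum()` are assumptions added for illustration and are not part of the quiz. The list is built and consumed inside one function because the compound literals only live as long as the enclosing block.

```c
#include <stddef.h>

struct llist {
    int val;
    struct llist *next;
};

/* Designated-initializer form of cons. The surrounding parentheses are an
 * addition for macro hygiene; the quiz version omits them. */
#define cons(x, y) (&(struct llist){.val = (x), .next = (y)})

/* Hypothetical helper: walk the list and add up the stored values. */
int llist_sum(const struct llist *p)
{
    int sum = 0;
    for (; p; p = p->next)
        sum += p->val;
    return sum;
}

/* Hypothetical helper: build the list and use it within the same block,
 * since each compound literal has automatic storage duration here. */
int demo_sum(void)
{
    struct llist *list = cons(9, cons(5, cons(4, cons(7, NULL))));

    /* A compound literal is an lvalue, so its members can be modified. */
    list->val += 1; /* 9 becomes 10 */

    return llist_sum(list); /* 10 + 5 + 4 + 7 */
}
```

Returning `list` from `demo_sum()` instead of the sum would leave the caller with a dangling pointer, which is the practical consequence of the automatic storage duration noted above.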