2019q1 Homework3 (review)

# 2019q1 Homework3 (review) contributed by < `grant7163`> ###### tags: `sysprog2019_q1` ## 作業要求依據 [F03: fibdrv](https://hackmd.io/s/S1-wcjavN) * 從 [第 1 週](https://hackmd.io/s/SyrZMGYr4), [第 2 週](https://hackmd.io/s/H1Pik8M8E), [第 3 週(上)](https://hackmd.io/s/S1weT4iLE), [第 3 週(下)](https://hackmd.io/s/SkrVSKiU4), [第 4 週(上)](https://hackmd.io/s/H1KLoTEv4), [第 4 週(下)](https://hackmd.io/s/BJKk1ANv4) 選出你感興趣的 3 個題組進行作答，並且回答附帶的「延伸問題」 * 比照 [課前測驗參考解答: Q1](https://hackmd.io/s/ByzoiggIb) 和 [對 Linked List 進行氣泡排序](https://hackmd.io/s/BklGm1MhZ) 的模式來撰寫共筆，需要詳細分析自己的想法、參閱的材料 (包含 C 語言規格書的章節)，以及==進行相關實驗== * 若有不能理解的部分，請標註出來。善用 HackMD 的語法 `:::info` 和 `:::` 標註你的提問 ## 實驗環境 ```shell $ lscpu 架構： x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 每核心執行緒數： 2 每通訊端核心數： 4 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 158 Model name: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz 製程： 9 CPU MHz： 900.044 CPU max MHz: 3800.0000 CPU min MHz: 800.0000 BogoMIPS: 5616.00 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 6144K NUMA node0 CPU(s)： 0-7 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d $ cat /proc/version` Linux version 4.15.0-46-generic (buildd@lgw01-amd64-038) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 $ cat /etc/lsb-release` DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS" ``` ## 第一週測驗 1 ### 程式目的 * 探討西洋棋八個皇后在互相不會吃掉對方的情形下有幾種排列方式 * 如何減少遞迴的次數 ### 想法 & 思考依據西洋棋皇后的走法是在同一條直線(方向有上下左右，右上至左下，左上至右下)上都可以吃掉別人。 ### 程式實作 ## 第一週測驗 2 ### 程式目的 * 將資料放置在 $2^N$ 的 memory address 中，為了增加 memory 存取效率 ### 想法 & 思考 * x 為 align 的位址 * a 為 align a-byte a 為$2^N$，所以只有一位會是1，且 mask 經由 a - 1 所產生的結果為0…011…1。以此題為例 x = 0x1006, a = 4, mask = 3 ： ```shell 0001 0000 0000 0110 + 0000 0000 0000 0011 ---------------------- 0001 0000 0000 1000 & 1111 1111 1111 1100 ---------------------- 0001 0000 0000 1000 ``` ```clike= #include <stdint.h> #define ALIGN_uint32(x, a) __ALIGN_MASK(x, ((uint32_t)(a)) - 1) #define __ALIGN_MASK(x, mask) (((x) + (mask)) & ~(mask)) ``` ## 第二週測驗 2 ### 程式目的 * 計算數值的二進位表示中，有多少位元是 1 * 使用 bitwise 達到常數時間的操作 ### 想法 & 思考使用 gdb 將 24 代進去觀察發現 n = -(~n) 是對 n 做加一的動作 * (~n): 對 n 取 1s' complement * -(~n): 再取 2's complement v &= (v - 1) 是用來清除最低位的 1 * v - 1 = 11000 - 1 = 10111 * v & (v - 1) = 11000 & 10111 = 10000 ```shell v = 24, n = 0 v = 16, n = 1 v = 0, n = 2 ``` ```clike= unsigned popcnt_naive(unsigned v) { unsigned n = 0; while (v) v &= (v - 1), n = -(~n); return n; } ``` 接下來換常數時間的操作由 0x7 的二進位表示 0b0111 可以發現是以 4 bit 為一組作操作。將 v 假設成以 4 bit 為一組的表示 $A_7 A_6 A_5 A_4 A_3 A_2 A_1 A_0$ 在以其中的 $A_0$ = $a_3*2^3+a_2*2^2+a_1*2^1+a_0$ 為例第4 ~ 5行 $A_0 = (a_3*2^3+a_2*2^2+a_1*2^1+a_0)-(a_3*2^2+a_2*2^1+a_1)$ 第6 ~ 7行 $A_0 = (a_3*2^3+a_2*2^2+a_1*2^1+a_0)-(a_3*2^2+a_2*2^1+a_1)-(a_3*2^1+a_2)$ 第8 ~ 9行 $A_0 = (a_3*2^3+a_2*2^2+a_1*2^1+a_0)-(a_3*2^2+a_2*2^1+a_1)-(a_3*2^1+a_2)-a_3$ 最後發現 $A_0 = a_3+a_2+a_1+a_0$(所有 bit 的和!) 第11行將奇偶位相加後存在偶數位 ```shell A7 A6 A5 A4 A3 A2 A1 A0 + A7 A6 A5 A4 A3 A2 A1 -------------------------- & 0 F 0 F 0 F 0 F ``` 第12行發現在 A6 的位置就是所求，因此最後在右移 24 ```shell 0 A6 0 A4 0 A2 0 A0 x 0 1 0 1 0 1 0 1 ----------------------------------------------- 0 |A6| 0 A4 0 A2 0 A0 0 A6 0 |A4| 0 A2 0 A0 0 0 A6 0 A4 0 |A2| 0 A0 0 0 A6 0 A4 0 A2 0 |A0| 0 ``` ```clike= unsigned popcnt(unsigned v) { unsigned n; n = (v >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; n = (n >> 1) & 0x77777777; v -= n; v = (v + (v >> 4)) & 0x0F0F0F0F; v *= 0x01010101; return v >> 24; } ``` ### [Timing Attack](https://en.wikipedia.org/wiki/Timing_attack) 硬體在處理不同的資料所花費的時間不儘相同，因此可以透過輸入不同的資料，然後觀察每筆資料所花費的時間，再透過數學上的分析來推測所輸入的資料。可應用在攻擊 RSA 演算法上，RSA 的本質就是冪模運算，根據加解密的冪模運算時間推測出 key。抵抗的方法有如下3種 * constant exponentiation time : 在回傳計算結果前，確定所有指數運算使用相同時間。 * random delay : 在指數運算中，加入隨機延遲。 * blinding : 在指數運算前，將密文乘以一個隨機數。依據 [openssl](https://github.com/openssl/openssl/blob/39c44eee7fd89ce13e805873e1c43bd8e488a93f/crypto/rsa/rsa_crpt.c) 中的程式碼 ```clike= int RSA_blinding_on(RSA *rsa, BN_CTX *ctx) { int ret = 0; if (rsa->blinding != NULL) RSA_blinding_off(rsa); rsa->blinding = RSA_setup_blinding(rsa, ctx); if (rsa->blinding == NULL) goto err; rsa->flags |= RSA_FLAG_BLINDING; rsa->flags &= ~RSA_FLAG_NO_BLINDING; ret = 1; err: return ret; } ``` ## 第二週測驗 3 ### 程式目的 * 使用靜態配置的方式將物件加入到 link list 中 ### 想法 & 思考使用到 compound literal 的技巧 Syntax : ( type ) { initializer-list } * 資料型態為 object type 或是一個未知大小的陣列 * 可初始化一個未知大小的物件 * 在 function 外使用 compound literal 的變數，變數為一個靜態配置，否則變數可視範圍只有在 function 內根據 [C99 規格書](http://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf)對 Compound literals 的敘述為 >6.5.2.5 Compound literals > >1 The type name shall specify an object type or an array of unknown size, but not a variable length array type. 2 No initializer shall attempt to provide a value for an object not contained within the entire unnamed object specified by the compound literal. 3 If the compound literal occurs outside the body of a function, the initializer list shall consist of constant expressions. 4 A postfix expression that consists of a parenthesized type name followed by a brace-enclosed list of initializers is a compound literal. It provides an unnamed object whose value is given by the initializer list. 5 If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.8, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type of the compound literal is that specified by the type name. In either case, the result is an lvalue. 6 The value of the compound literal is that of an unnamed object initialized by the initializer list. If the compound literal occurs outside the body of a function, the object has static storage duration; otherwise, it has automatic storage duration associated with the enclosing block. 將題目中的第 9 行一一展開可得到如下結果 list_7 -> NULL ```shell cons(7, NULL) llist->val = 7; llist->next = NULL; ``` list_4 -> list_7 -> NULL ```shell cons(4, cons(7, NULL)) llist->val = 4; llist->next = cons(7, NULL); ``` list_5 -> list_4 -> list_7 -> NULL ```shell cons(5, cons(4, cons(7, NULL))) llist->val = 5; llist->next = cons(4, cons(7, NULL)); ``` list_9 -> list_5 -> list_4 -> list_7 -> NULL ```shell cons(9, cons(5, cons(4, cons(7, NULL)))) list->val = 9; list->next = cons(5, cons(4, cons(7, NULL))); ``` 最後印出結果為 9547 ```clike= #include <stdio.h> #define cons(x, y) (struct llist[]){{x,y}} struct llist { int val; struct llist *next; }; int main() { struct llist *list = cons(9, cons(5, cons(4, cons(7, NULL)))); struct llist *p = list; for (; p; p = p->next) printf("%d", p->val); printf("\n"); return 0; } ``` ### 程式測試使用 compound literal 初始化一個變數與一個未知大小的陣列並觀察其印出資訊。 ```clike= int z = (int){10}; int *ptr = (int[]){2, 4}; int main() { int n = 5, *ptr = &n; ptr = (int [2]){*ptr}; printf("array_0 %d \n", *ptr); printf("array_1 %d \n", *(ptr + 1)); printf("z %d \n", z); return 0; } ``` int[0] 印出結果符合預期但 int[1] 的數值遽然不為0 ```shell array_0 5 array_1 -1813443584 z 10 ``` 將原本程式第5行改為 int n[2] = {5, 9}, *ptr = n; 印出結果發現這次 int[1] 確實變為0了 ```shell array_0 5 array_1 0 z 10 ``` 延伸題目: 找出 Linux 核心內部相似的程式碼 ## 第三週(上)測驗 2 ### 程式目的 * 找出最接近的平方根 * 使用定點數來表達浮點數 ### 想法 & 思考 ### 程式實作延伸問題: 1 解釋上述程式碼運作原理，並找出類似的開放原始碼程式; 2 上述 sqrt_I2F 函式存在 overflow 風險，請提出修正方案並驗證; ## 第三週(下)測驗 1 ### 程式目的 * 利用特製的資料結構減少 doubly linked list 的記憶體開銷 ### 想法 & 思考 ``` addr 0x1 0x2 0x3 0x4 data HAT CAT EAT BAT link (0x0 xor 0x2) (0x1 xor 0x3) (0x2 xor 0x4) (0x3 xor 0x0) ``` xor 運算特性 * A xor A = 0 * A xor 0 = A * A xor 1 = ~A * A xor B = AB 分析使用 xor 的運算紀錄位址假設現在 p_this 為 CAT 時，p_left(p_current) 變為 HAT。第10行作用: * 將 p_left->lxr 的位置與新配置空間 (p_this) 的位址作 xor，運算結果存在 p_left->lxr (原本 p_left->lxr 中紀錄的 link 為0x0，經過上個步驟後就更新成 0x2) 第12行作用 : * p_left 與 p_right 作 xor，運算結果存在 p_this->lxr (原本 p_this->lxr 中紀錄的 link 為0x0，經過上個步驟後就更新成 0x1) 更新 p_this 為 EAT 時，p_left(p_current) 變為 CAT。第10行作用: * 將 p_left->lxr 的位置與新配置空間 (p_this) 的位址作 xor，運算結果存在 p_left->lxr (原本 p_left->lxr 中紀錄的 link 為0x1，經過上個步驟後就更新成為0x1 xor 0x3) 第12行作用 : * p_left 與 p_right 作 xor，運算結果存在 p_this->lxr (原本 p_this->lxr 中紀錄的 link 為0x0，經過上個步驟後就更新成 0x2) 依此類推... ```clike= static inline void shift_lxr(item *lxr, item *known, item *update) { lxr->lxr = (lxr->lxr ^ (unsigned long) known) ^ (unsigned long) update; } item *append_item(lxr_info *p_lxri, item *p_current, int d1, int d2) { item *p_this, *p_left, *p_right; if ((p_this = calloc(1, sizeof(item)))) { p_left = p_lxri->left = p_current; p_right = p_lxri->right; if (p_left) shift_lxr(p_left, p_right, p_this); if (p_right) shift_lxr(p_right, p_left, p_this); p_this->lxr = (unsigned long) p_current ^ (unsigned long) p_lxri->right; p_this->data1 = d1; p_this->data2 = d2; } return p_this; } ``` ```clike= static inline item *extract_address(item *lxr, item *other) { return (item *) (lxr->lxr ^ (unsigned long) other); } void list_walk(void) { item *pcur = lh; lxr_info lxri = {.left = NULL, .right = NULL}; lxri.right = extract_address(pcur, lxri.left); while (pcur) { print_at(&lxri, pcur); pcur = lxr_move_right(&lxri, pcur); } } ``` ### 程式測試使用 xor 的運算紀錄 next, prev 的資料結構。 ```shell typedef struct { unsigned long lxr; int data; } item; ``` 使用 xor 的方式將物件加入到 list 的尾端並使用一個 pcur 指標紀錄每次加入到 list 中物件的位置，以減少加入到 list 中都要收尋到尾端所耗費的時間。使用包裝過的 malloc 觀察記憶體配置空間與次數。 ```shell static inline void shift_lxr(item *lxr, item *known, item *update) { lxr->lxr = (lxr->lxr ^ (unsigned long) known) ^ (unsigned long) update; } static inline void append_item(lxr_info *p_lxri, item *p_current, item *p_new) { item *p_left, *p_right; p_left = p_lxri->left = p_current; p_right = p_lxri->right; if(p_left) shift_lxr(p_left, p_right, p_new); p_new->lxr = (unsigned long) p_current ^ (unsigned long) p_lxri->right; } int main(int argc, char **argv) { ... for(int i = 0; i < 1; i++) { clock_gettime(CLOCK_REALTIME, &start); while(index < maxrun) { p_this = test_malloc(sizeof(item)); if(p_this == NULL) break; p_this->data = 3 * index; append_item(&lxri, pcur, p_this); pcur = p_this; if(!lh) lh = pcur; index++; } clock_gettime(CLOCK_REALTIME, &end); list_time = diff_in_second(start, end); fprintf(output, "%.8lf\n", list_time); index = 0; } ... } ``` 使用2個 pointer 紀錄 next, prev 的資料結構，為了方便書寫，在此稱為一般典型的方式。 ```shell struct Node { int data; struct Node* next; struct Node* prev; }; ``` 使用 pointer 的方式將物件加入到 list 的尾端並使用一個 pcur 指標紀錄每次加入到 list 中物件的位置，以減少加入到 list 中都要收尋到尾端所耗費的時間。使用包裝過的 malloc 觀察記憶體配置空間與次數。 ```shell static inline void list_add_end(struct Node *pcur, struct Node *node) { node->next = NULL; if(pcur == NULL) { node->prev = NULL; return; } pcur->next = node; node->prev = pcur; } int main(int argc, char *argv[]) { ... for(int j = 0; j < 1; j++) { clock_gettime(CLOCK_REALTIME, &start); for(i = 0; i < 10000; i++) { node = (struct Node *) test_malloc(sizeof(*node)); if(node == NULL) break; node->data = 3 * i; list_add_end(pcur, node); pcur = node; if(!head) head = pcur; } clock_gettime(CLOCK_REALTIME, &end); list_time = diff_in_second(start, end); fprintf(output, "%.8lf\n", list_time); } ... } ``` ### 測試方法使用作業一 (lab0) 的方法，將 malloc 函式外加入如下要觀察的資訊並記錄下來。 * 總呼叫次數 * 總配置空間大小 ```clike= static void *test_malloc(size_t size) { block_ele_t *new_block = malloc(size + sizeof(block_ele_t) + sizeof(size_t)); if (new_block == NULL) { printf("Couldn't allocate any more memory"); return NULL; } new_block->magic_header = MAGICHEADER; new_block->payload_size = size; *find_footer(new_block) = MAGICFOOTER; void *p = (void *) &new_block->payload; memset(p, FILLCHAR, size); new_block->next = allocated; new_block->prev = NULL; if (allocated) allocated->prev = new_block; allocated = new_block; allocated_count++; allosize += size; max_allosize = max_allosize > allosize ? max_allosize : allosize; return p; } ``` free 函式也是一樣，不過要將 allosize, allocated_count 作減的動作。 ````shell static void test_free(void *p) { ... b->magic_header = MAGICFREE; *find_footer(b) = MAGICFREE; memset(p, FILLCHAR, b->payload_size); allosize -= b->payload_size; ... free(b); allocated_count--; } ```` 將要加入到 link list 中的物件數量設定為一萬，先執行程式觀察會不會產生 memory leak。從印出來的資訊看到 cnt and tsize 最後結果為0，代表測試過程中呼叫 test_malloc and test_free 的次數與配置空間大小是一樣的，可作為檢查 memory leak 的方式。 ```shell ./xorlink cnt 0, tsize 0, msize 160000, time 0.000677 ./dlink cnt 0, tsize 0, msize 240000, time 0.000557 ``` 平均 100 次的結果 : * 由圖中看出典型版本比 xor 版本耗費的時間要來的少。 ![](https://i.imgur.com/q1TXMky.png) * 不過從配置記憶體空間來看的話，很明顯 xor 版本較省記憶體空間。 ![](https://i.imgur.com/UMjajgd.png) 延伸問題: 在 Linux 核心原始程式碼找出特製的 linked list，並予以解說