quiz8 實作紀錄

--- tags: jserv --- # quiz8 實作紀錄 ## Floating Add * 題目缺陷 * 發現題目只考慮 normalized 和 normalized 相加條件 * 先完成 normalized 和 normalized 相加實作細節 + 抓出 floating point 內，各部分 signed bit、 exponent 和 significand + 因為兩 operand 皆為 normalized 故在 mantissa 前補 1 + 對齊 exponent + 根據正負號決定是否轉為 two's complement 再相加 + 發現[**缺陷**](https://github.com/posutsai/EmbSysProject/blob/master/proj1/FloatAdd/FloatAdd.c#L114) + 在更改前為 `if (sig_Z & 0x01000000)` 只檢查第 25 個 bit，若是兩者 exp 沒對齊的情況下，就不會進位到 25 bit，故應檢查第 24 和 25 位 + 以 `FloatAdd(1.5, 5.5)` 為例 + 1.5 經過右移的 mantissa 為 `0b 00000000001100000000000000000000` + 5.5 經過右移的 mantissa 為 `0b 00000000101100000000000000000000` + 從這個例子就可以知道相加後的 mantissa 並不會進位到第 25 bit + 甚至若其中一個數值為負值，還需要判斷是否 exp 全不為 0 且全不為 1，根據 mantissa 以 while loop 調整 exponent + 把相加後的各部分填入 `Z` ，因為 Z 也是 normalized 要去掉小數點前的 1 + 轉為 int 回傳 ```c= // 缺陷條件 if (sig_Z & 0x1800000) { sig_Z = sig_Z >> 1; exp_Z += 1; if (exp_Z & 0b100000000) { if (sign_Z > 0) { return 1.0 / 0.0; return -1.0 / 0.0; } } while ((sig_Z & 0b100000000000000000000000) == 0) { sig_Z = sig_Z << 1; exp_Z -= 1; if (exp_Z == 0) return 0; } sig_Z &= 0x7fffff; unsigned int Z = 0; Z |= sign_Z << 31; Z |= exp_Z << 23; Z |= sig_Z; return int_to_float(Z); } ``` * 在處理 IEEE 754 定義的 special number 根據題目原始實作並沒有定義，故在取出 exp_X 及 exp_Y 後判斷二者是否為 `0xFF`，若是則回傳 `-nan` 或 `nan` ## Merge Sort Linked List 一開始先初始化需要排序的資料 ```C= struct mylist *temp; struct list_head *new_head; temp = malloc(sizeof *temp); init_list_head(&temp->list); temp->data = 700; temp->c = 'a'; new_head = &temp->list; for (int i = 1; i < 4; i++) { temp = malloc(sizeof *temp); init_list_head(&temp->list); temp->data = rand() % 1000; temp->c = 'a' + i; tail_add(&temp->list, new_head); } ``` linked-list traversal ```C= struct list_head *curr; curr = new_head; do { printf("%d(%c) ", list_entry(curr, struct mylist, list)->data, list_entry(curr, struct mylist, list)->c); curr = curr->next; } while(curr != new_head); ``` ```shell= $ ./a.out 700(a) 383(b) 886(c) 777(d) ``` 透過以上的程式碼可以建立下圖的結構 ![](https://i.imgur.com/U9W3Kva.png) 先做 divide ```C= struct list_head *merge_sort(struct list_head *head) { if (head->next == head) { return head; } int c = COUNT(head); struct list_head *last_few = SLICE_BW(c / 2, head); struct list_head *first_few = SLICE(c % 2 ? c / 2 + 1 : c / 2, head); return merge(merge_sort(first_few), merge_sort(last_few), predicate); } ``` ![](https://i.imgur.com/Vp7N0fV.png) ![](https://i.imgur.com/2LPSt84.png) 接著做 conquer ```C= static struct list_head *merge(struct list_head *left, struct list_head *right, bool (*pred)(struct list_head *, struct list_head *)) { if (left == NULL) return right; if (right == NULL) return left; struct list_head *head, *tail; /* Fill your solution here */ if(pred(left, right)) { head = left; left = right; right = head; } else { head = right; } tail = head_remove(right); return head_add(head, merge(left, tail, pred)); } ``` `tail = head_remove(right);` 這行的作用是移除 min(left->head, right->head)，所以在上面利用 pred 判斷時，如果 left->head 比 right->head 小，就必須 left/right 對調。 conquer 的過程如下 ![](https://i.imgur.com/SoX6oNU.png) ![](https://i.imgur.com/JdxfMz4.png) ```shell= $ ./a.out 700(a) 383(b) 886(c) 777(d) 383(b) 700(a) 777(d) 886(c) ``` * 題目缺陷在 merge sort 的演算法中會遞迴的呼叫 merge 以及 merge_sort 這兩個函式，且每次的遞迴都需要 push 東西進到 stack 裡面，因此如果需要排序的資料太長的話，會造成遞迴的深度太深，進而形成 stack overflow，先做個簡單的測試： ```shell= ## n = 1000000 $ ./a.out Segmentation fault (core dumped) ``` 的確在長度過長的情況下會因為 stack 被塞滿而造成程式停止。在進行其他實驗前，考量到初始序列排列的不同會造成不同深度的遞迴，因此在初始任何長度的序列時都按照 temp->data = n - i - 1 的規則來設定順序，例如 n = 8 時，初始序列為 8, 7, 6, 5, 4, 3, 2, 1。 ```shell= $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 128026 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 102400 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 128026 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited ``` 在我們所用的機器上 stack size 為 8192 kbytes。為了計算遞迴過程中所累加的 stack frame size，我們在 merge 以及 merge_sort 加入計算的程式碼。 ```C= static struct list_head *merge(struct list_head *left, struct list_head *right, bool (*pred)(struct list_head *, struct list_head *)) { stack_size += (char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0); max = max > stack_size ? max : stack_size; if (left == NULL) { stack_size -= (char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0); return right; } if (right == NULL) { stack_size -= (char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0); return left; } struct list_head *head, *tail; /* Fill your solution here */ if(pred(left, right)) { head = left; left = right; right = head; } else { head = right; } tail = head_remove(right); struct list_head *tmp_head_add = head_add(head, merge(left, tail, pred)); stack_size -= (char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0); return tmp_head_add; } struct list_head *merge_sort(struct list_head *head) { stack_size += (char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0); max = max > stack_size ? max : stack_size; if (head->next == head) { stack_size -= (char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0); return head; } int c = COUNT(head); struct list_head *last_few = SLICE_BW(c / 2, head); struct list_head *first_few = SLICE(c % 2 ? c / 2 + 1 : c / 2, head); struct list_head *tmp; tmp = merge(merge_sort(first_few), merge_sort(last_few), predicate); stack_size -= (char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0); return tmp; } ``` :::info 以 `(char *)__builtin_frame_address(1) - (char *)__builtin_frame_address(0);` 量測單一 stack frame 的大小，GCC extension 有提供 [__buitin_frame_address](https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html) 這個函數存取 frame 所在記憶體 stack 中的位置，若丟入參數 `1` 會取得 caller 的位置，反之，若丟入 `0` 則會獲得 callee 的位址，故相減就可得 frame 大小。 ::: stack_size 紀錄現階段因為 merge 及 merge_sort 所造成遞迴的 stack size (因為大部分的遞迴都在這兩個地方造成)，max 則記錄整個過程中最大的 stack_size，期望若 max < 8192 kbytes 時會順利的跑完， ```shell= ## n = 209400 $ ./a.out max: 8376176 ## n = 209500 $ ./a.out Segmentation fault (core dumped) ``` 計算在 n = 209400 時，max = 8376176 bytes = 8179 kbytes，這個數字很接近 8192 了，所以當我們稍微增加序列長度，就會造成 stack overflow，為了檢驗這種計算當下使用的 stack size 的方法是不是可行的，我們想要調整系統 stack size 的大小來驗證，看是否在不同 stack size 下，我們算出的 max 會很接近。 ![](https://i.imgur.com/OR9wcT0.png) 由於我們所使用的機器是公用的，因此只能利用 docker 來達成這件事。為了能改變 stack size 的大小，在設定 container 時需要 --privileged 這個參數， ```shell $ docker run --privileged ``` 接著測試在不同的 n 會需要多大的 stack size ```shell= ## n = 32 $ ulimit -s 8 $ ./a.out max: 1456 ## 1456 bytes = 1.42 kbytes ## n = 128 $ ulimit -s 12 $ ./a.out max: 5296 ## 5296 bytes = 5.17 kbytes ## n = 1024 $ ulimit -s 46 $ ./a.out max: 41136 ## 41136 bytes = 40 kbytes ## n = 32768 $ ulimit -s 1287 $ ./a.out max: 1310896 ## 1310896 bytes = 1280 kbytes ``` 可以發現我們算的 max 都會比實際上用到的大小少了 6-7 個 bytes，這可能是因為裏頭其他 function 所造成的 overhead。 ## todo 1. 以更不具侵略性的測量方式，量測 frame size 以目前的做法，在量測 frame size 的同時也增加了 frame 本身的大小，我想比較好的方法會是用老師那天在直播中談到的 eBPF 在 kernel 內做量測，我在 Brenden gregg 的網站有看到類似的[文章](http://www.brendangregg.com/blog/2016-01-18/ebpf-stack-trace-hack.html?fbclid=IwAR2MW_dJ83TN5AaSyHQ1PGn0U79XsdzaEOTaQNUzv5DsBXo4RlI2IHLlS9s)，或許可以試試。 :::info 善用 GDB ::: 2. 在實驗中，實際觀察到的數據並不是每一次的 stack max depth 都會一樣，我們找不到合理的解釋，以及實際上 stack 使用的最大深度總是會和我們的量測數據有常數 7k byte 的落差，我們找不到合理的解釋。