--- tags: 進階電腦系統理論與實作, NCKU Linux Kernel Internals, 作業系統 --- # 2020q3 Homework6 (quiz6) contributed by < `RusselCK` > ###### tags: `RusselCK` > [浮點數運算 & 前置處理器 & 程式設計技巧 練習題](https://hackmd.io/@sysprog/2020-quiz6) ## Q1. float32 轉成 [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) [bfloat16.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/bfloat16.h)、[bfloat16.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/bfloat16.cc) ```cpp= float fp32tobf16(float x) { float y = x; int *py = (int *) &y; unsigned int exp, man; exp = *py & 0x7F800000u; man = *py & 0x007FFFFFu; if (!exp && !man) /* zero */ return x; if (exp == 0x7F800000u) /* infinity or NaN */ return x; /* Normalized number. round to nearest */ float r = x; int *pr = (int *) &r; *pr &= BB1; // BB1 = 0xff800000 r /= 256; y = x + r; *py &= BB2; // BB2 = 0xffff0000 return y; } ``` :::spoiler 測試程式 ```cpp void print_hex(float x) { int *p = (int *) &x; printf("%f=%x\n", x, *p); } int main() { float a[] = {3.140625, 1.2, 2.31, 3.46, 5.63}; for (int i = 0; i < sizeof(a) / sizeof(a[0]); i++) { print_hex(a[i]); float bf_a = fp32tobf16(a[i]); print_hex(bf_a); } return 0; } ``` Output ``` 3.140625=40490000 3.140625=40490000 1.200000=3f99999a 1.203125=3f9a0000 2.310000=4013d70a 2.312500=40140000 3.460000=405d70a4 3.453125=405d0000 5.630000=40b428f6 5.625000=40b40000 ``` ::: * [FP64, FP32, FP16, BFLOAT16, TF32, and other members of the ZOO](https://medium.com/@moocaholic/fp64-fp32-fp16-bfloat16-tf32-and-other-members-of-the-zoo-a1ca7897d407) ![](https://i.imgur.com/hBX4CdO.png) * 根據測試結果,我們可知 `fp32tobf16()` 的功能相當於 ==將 Fraction 0 捨 1 入 到小數點第 7 位== * 程式碼解析: ( 可與 [quiz5 Q2](https://hackmd.io/RXv0Ed5yQXOPAGv0X06u2g#Q2-%E6%B5%AE%E9%BB%9E%E6%95%B8%E9%96%8B%E4%BA%8C%E6%AC%A1%E6%96%B9%E6%A0%B9%E7%9A%84%E8%BF%91%E4%BC%BC%E5%80%BC) 比較 ) ```cpp=2 float y = x; int *py = (int *) &y; unsigned int exp, man; exp = *py & 0x7F800000u; man = *py & 0x007FFFFFu; ``` * `*py` = `(int *) &y` : 將 `&y` 轉換為 ==指向 int 型態的指標== ( `*py` 與 `y` 指向同一塊記憶體,但看法不同 ) * `exp` : `*py` Exponent 部份 * ( 0 11111111 0000...0000 ) = ( 0111 1111 1000 0000 ... 0000 ) = 0x7F800000 * `man` : `*py` Fraction ( Mantisa ) 部份 * ( 0 00000000 1111...1111 ) = ( 0000 0000 0111 1111 ... 1111 ) = 0x007FFFFF :::warning * 為何這裡的 mask 後面要加 `u`,但 quiz5 Q2. 的卻不用? * quiz5 Q2.浮點數開根號 已將 負數的情形 排除 * 是否能改成 `unsigned int *py = (int *) &y` ,mask 後面就不用加 `u` 了呢? ::: ```cpp=7 if (!exp && !man) /* zero */ return x; if (exp == 0x7F800000u) /* infinity or NaN */ return x; ``` * 0 以 float32 表示 為 ( * 00000000 0000...0000 ) * `exp` = 0 * `man` = 0 * $∞$ 以 float32 表示 為 ( 0 11111111 0000...0000 ) * NaN 以 float32 表示 為 ( * 11111111 **** ... **** ) * `exp` = ( 0 11111111 0000...0000 ) = 0x7F800000 ```cpp=12 /* Normalized number. round to nearest */ float r = x; int *pr = (int *) &r; *pr &= BB1; r /= 256; y = x + r; ``` * 根據提示,要將 `y` 的 Fraction 0 捨 1 入 到小數點第 7 位 * 因此, `r` 應為 ( * ******** 0000000 1000 0000 0000 0000 ) $=0.00000001 \times 2^{Exp-bias}$ $=1 \times 2^{(Exp-bias)-8}$ $=(1 \times 2^{Exp-bias}) \div 2^8$ * `#15` : 保留 Sign 與 Exponent * 故,**BB1 = 0xFF800000** = ( 1 11111111 0000...0000 ) = ( 1111 1111 1000 0000 .... 0000 ) * `#16` : 除以 $2^8$ = 256 * 以上,完成進位,但尚未將後 16 bits 設為 0 ```cpp=19 *py &= BB2; // BB2 = 0xffff0000 return y; ``` * 將後 16 bits 設為 0 * 故,**BB2 = 0xFFFF0000** = ( 1 11111111 1111111 0000...0000) = ( 1111 1111 1111 1111 0000 ... 0000 ) ## Q2. [Ring Buffer](https://en.wikipedia.org/wiki/Circular_buffer) 實作 ```cpp= #define RINGBUF_DECL(T, NAME) \ typedef struct { \ int size; \ int start, end; \ T *elements; \ } NAME #define RINGBUF_INIT(BUF, S, T) \ { \ static T static_ringbuf_mem[S + 1]; \ BUF.elements = static_ringbuf_mem; \ } \ BUF.size = S; \ BUF.start = 0; \ BUF.end = 0; #define NEXT_START_INDEX(BUF) \ (((BUF)->start != (BUF)->size) ? ((BUF)->start + RB1) : 0) //RB1 = 1 #define NEXT_END_INDEX(BUF) (((BUF)->end != (BUF)->size) ? ((BUF)->end + RB2) : 0) // RB2 = 1 #define is_ringbuf_empty(BUF) ((BUF)->end == (BUF)->start) #define is_ringbuf_full(BUF) (NEXT_END_INDEX(BUF) == (BUF)->start) #define ringbuf_write_peek(BUF) (BUF)->elements[(BUF)->end] #define ringbuf_write_skip(BUF) \ do { \ (BUF)->end = NEXT_END_INDEX(BUF); \ if (is_ringbuf_empty(BUF)) \ (BUF)->start = NEXT_START_INDEX(BUF); \ } while (0) #define ringbuf_read_peek(BUF) (BUF)->elements[(BUF)->start] #define ringbuf_read_skip(BUF) (BUF)->start = NEXT_START_INDEX(BUF); #define ringbuf_write(BUF, ELEMENT) \ do { \ ringbuf_write_peek(BUF) = ELEMENT; \ ringbuf_write_skip(BUF); \ } while (0) #define ringbuf_read(BUF, ELEMENT) \ do { \ ELEMENT = ringbuf_read_peek(BUF); \ ringbuf_read_skip(BUF); \ } while (0) ``` :::spoiler 測試程式 ```cpp #include <assert.h> RINGBUF_DECL(int, int_buf); int main() { int_buf my_buf; RINGBUF_INIT(my_buf, 2, int); assert(is_ringbuf_empty(&my_buf)); ringbuf_write(&my_buf, 37); ringbuf_write(&my_buf, 72); assert(!is_ringbuf_empty(&my_buf)); int first; ringbuf_read(&my_buf, first); assert(first == 37); int second; ringbuf_read(&my_buf, second); assert(second == 72); return 0; } ``` ::: ## Q2. 進階題 #### `do{...}while(0)` 的考量 1. 對應 C 語言 的 撰寫風格 * 考慮下列程式碼 ```cpp #include <stdio.h> #define sum_two_num(x,y) {x++;y++;} int main(void) { int a = 0, b = 100; if(a == b) sum_two_num(a,b); else ... return 0; } ``` 展開 macro ```cpp= #include <stdio.h> #define sum_two_num(x,y) {x++;y++;} int main(void) { int a = 0, b = 100; if(a == b) //sum_two_num(a,b); { a++; b++; }; else ... return 0; } ``` * 由於`#12` 的 `;`,`#13` 的 `else` 會找不到對應的 `if` 使用 `do{...}while(0)` 並展開 macro ```cpp #include <stdio.h> #define sum_two_num(x,y) do{x++;y++;}while(0) int main(void) { int a = 0, b = 100; if(a == b) // sum_two_num(a,b); do{ a++; b++; }while(0); else ... return 0; } ``` * 合法敘述 且 達到我們預期的功能 :::warning TODO : 2. 避免 goto ::: #### [Super Fast Circular Ring Buffer Using Virtual Memory trick](https://medium.com/@abhinavagarwal1996/a-fast-circular-ring-buffer-4d102ef4d4a3) ## Q3. 靜態初始化的 Singly-Linked List 實作 ```cpp= #include <stdio.h> /* clang-format off */ #define cons(x, y) (struct llist[]){{y, x}} /* clang-format on */ struct llist { int val; struct llist *next; }; void sorted_insert(struct llist **head, struct llist *node) { if (!*head || (*head)->val >= node->val) { SS1; // SS1 = node->next = *head SS2; // SS2 = *head = node return; } struct llist *current = *head; while (current->next && current->next->val < node->val) current = current->next; node->next = current->next; current->next = node; } void sort(struct llist **head) { struct llist *sorted = NULL; for (struct llist *current = *head; current;) { struct llist *next = current->next; sorted_insert(&sorted, current); current = next; } *head = sorted; } int main() { struct llist *list = cons(cons(cons(cons(NULL, A), B), C), D); // (A, B, C, D) = (7, 4, 5, 9) struct llist *p; for (p = list; p; p = p->next) printf("%d", p->val); printf("\n"); sort(&list); for (p = list; p; p = p->next) printf("%d", p->val); printf("\n"); return 0; } ``` output ``` 9547 4579 ``` :::info * 根據 C 語言規格書 **6.5.2.5 Compound literal**: >A postfix expression that consists of a parenthesized type name followed by a braceenclosed list of initializers is a compound literal. It provides an unnamed object whose value is given by the initializer list. 在括號(`()`)中為型別名稱,並接續 braces(`{}`) 中的初始化 list 會行成一個 compound literal。這讓我們可以定義一個匿名的物件,並指派給某一變數或者作為 function 的參數使用。 >If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.8, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type of the compound literal is that specified by the type name. * [[心得] Compound literals, flexible array members](https://www.ptt.cc/man/C_and_CPP/DB9B/DC33/M.1271034153.A.7F9.html) ::: * 程式碼解析: ```cpp=3 /* clang-format off */ #define cons(x, y) (struct llist[]){{y, x}} /* clang-format on */ struct llist { int val; struct llist *next; }; ``` * 範例 ```cpp struct llist *A = con(x,y); ``` 等同於 ```cpp struct llist *A ={ .val = y; .next = x; }; ``` ```cpp=39 struct llist *list = cons(cons(cons(cons(NULL, A), B), C), D); ``` * 根據上述,可以得到 ```graphviz digraph graphname{ node[shape=box] rankdir=LR NULL[label=NULL,shape=plaintext] D->C->B->A->NULL } ``` * 依題意,初始的 Linked list 為 ```graphviz digraph graphname{ node[shape=box] rankdir=LR NULL[label=NULL,shape=plaintext] 9->5->4->7->NULL } ``` * 因此,**(A, B, C, D) = (7, 4, 5, 9)** ## Q3. 進階題 #### [Doubly Linked Lists in C](http://locklessinc.com/articles/dlist/) #### 比對 Linux 核心的 linked list 實作 #### LeetCode [148. Sort List](https://leetcode.com/problems/sort-list/) ( 引入 [Optimizing merge sort](https://en.wikipedia.org/wiki/Merge_sort#Optimizing_merge_sort) ) * [Fundamental Sorting Algorithms](https://victoramelkin.com/ta/cs32/pa/2/pa2.html) * tiled merge sort 此種 cache-aware 演算法的考量點 * 對於排序大量資料而言,quick sort 和 merge sort 勝過 insertion sort * $O(nlogn)<O(n^2)$ * 對於小量資料則取決於執行環境 * 因此在 merge sort 將問題切成 L1 cache size 後,即不再往下遞迴,改用 insertion sort 進行排序。 * [eecheng87/quiz3](https://hackmd.io/@eecheng/ryqwNCoVI) ## Q4. LeetCode [287. Find the Duplicate Number](https://leetcode.com/problems/find-the-duplicate-number/) * Given an array of integers `nums` containing `n + 1` integers where each integer is in the range `[1, n]` inclusive. * There is only one duplicate number in `nums`, return this duplicate number. ```cpp= int findDuplicate(int *nums, int numsSize) { int res = 0; const size_t log_2 = 8 * sizeof(int) - __builtin_clz(numsSize); for (size_t i = 0; i < log_2; i++) { int bit = 1 << i; int c1 = 0, c2 = 0; for (size_t k = 0; k < numsSize; k++) { if (k & bit) ++c1; if (nums[k] & bit) ++c2; } if (CCC) // CCC = c1 < c2 res += bit; } return res; } ``` * 根據題目條件,我們可以知道下面兩件事: 1. `nums` index 必為 $0$ ~ $n$ 各出現 1 次 2. `nums` 的內容 必為 $1$ ~ $n$ 其中 1 個數字出現多次,其餘只出現 0 或 1 次 * 考慮 二進制 ( 以 0 ~ 7 為例 ) | Decimal | Binary | |:-------:|:-------:| | 0 | 000 | | 1 | 00==1== | | 2 | 010 | | 3 | 01==1== | | 4 | 100 | | 5 | 10==1== | | 6 | 110 | | 7 | 11==1== | * 假設 `nums[8] = {1, 2, 3, 4, 5, 5, 6, 7}` 1. index : $0+1+2+3+4+5+6+7$ 的 第一個 bit 的 1 的個數為 **4** 2. nums : $1+2+3+4+5+5+6+7$ 的 第一個 bit 的 1 的個數為 **5** * 其餘位置同理 * 我們可以發現,==**在重複數字中為 1 的 bit,nums 加總 比 index 加總 還多**== * 當然, 為 0 的 bit,nums 加總 可能會變少 * 程式碼解析: ```cpp=4 const size_t log_2 = 8 * sizeof(int) - __builtin_clz(numsSize); ``` * `log_2` : 只少要檢查幾個 bits ( i.e. 最大的 1 在 第幾個 bit ) :::warning * 為何使用 `8 * sizeof(int)` 而非 `32` ? * 不同架構的 `sizeof(int)` 不同 ::: ```cpp=5 for (size_t i = 0; i < log_2; i++) { int bit = 1 << i; int c1 = 0, c2 = 0; for (size_t k = 0; k < numsSize; k++) { if (k & bit) ++c1; if (nums[k] & bit) ++c2; } if (CCC) res += bit; } ``` * 逐一檢查每個 bit : * 檢查 nums 中的每個數字: * `c1` : index 的加總 * `c2` : nums 內容 的加總 * `#14`、`#15` : 若 `c1 < c2`,代表該 bit 在重複數字中也為 1,要加進 `res` * 故 **CCC = `c1 < c2`** * [效能評估](https://leetcode.com/submissions/detail/413303314/): ![](https://i.imgur.com/ey9IlVG.png) ## Q4. 進階題 提出效能更好的程式碼 #### Leetcode [142. Linked List Cycle II](https://leetcode.com/problems/linked-list-cycle-ii/) * Given a linked list, return the node where the cycle begins. * If there is no cycle, return `null`. * There is a cycle in a linked list if there is some node in the list that can be reached again by continuously following the `next` pointer. Internally, `pos` is used to denote the index of the node that tail's `next` pointer is connected to. * Note that `pos` is not passed as a parameter. ```cpp= struct ListNode { int val; struct ListNode *next; }; struct ListNode *detectCycle(struct ListNode *head) { struct ListNode *fast = head, *low = head; while(1) { if (fast != NULL && fast->next != NULL) { low = low->next; fast = fast->next->next; } else return NULL; if (low == fast) break; } while(low != head) { low = low->next; head = head->next; } return head; } ``` ![](https://i.imgur.com/heu0Cy4.png) ``` Input: head = [3,2,0,-4], pos = 1 Output: tail connects to node index 1 Explanation: There is a cycle in the linked list, where tail connects to the second node. ``` * 假設 Cycle 長度 為 $C$ 若 `low` 走 $k$ 步, `fast` 就走 $2K$ 步 * 若 `low`、`fast` 相遇了,代表 `fast` 比 `low` 多走了 $n$ 圈,$n \geq 1$ $\Rightarrow (2K - K) = nC$ 1. 假設 Cycle 起點 與 第一次相遇點 差距 $a$ `head` 與 Cycle 起點 差距 $F$ * `head` 與 第一次相遇點 差距 $F + a$ ![](https://i.imgur.com/MGL2K9U.png) $$ 2(F+a) = F + nC + a \\ \Rightarrow F + a = nC $$ 2. 令 `low` 從 `head` 出發、 `fast` 從 第一次相遇點 出發 且 **兩者前進速度改為相同** * 當兩者都前進 $F$ 後 * `low` 會停在 Cycle 起點,與 `head` 差距 $F$ * `fast` 會停在 與 `head` 差距 $(F + a) + F = nC+F$ * `low`、`fast` 兩者差距為 $nC$ * 因此,兩者會再相遇的位置 必為 Cycle 起點 #### [Floyd's algorithm](https://en.wikipedia.org/wiki/The_Tortoise_and_the_Hare) ```cpp= int findDuplicate(int *nums, int numsSize) { int fast = 0, low = 0; while (true){ fast = nums[nums[fast]]; // fast = fast->next->next; low = nums[low]; // low = low->next; if (low == fast){ low = 0; break; } } while (true){ fast = nums[fast]; // low = low->next; low = nums[low]; // head = head->next; if (low== fast) return low; } return -1; } ``` ![](https://i.imgur.com/jTHs5G7.png) ![](https://i.imgur.com/QzQyqup.png) * 把題目想成解 Leetcode [142. Linked List Cycle II](https://leetcode.com/problems/linked-list-cycle-ii/) * Index 當成 node * Value 當成連接的下一個 node