2020q3 Homework6 (quiz6)

# 2020q3 Homework6 (quiz6) contributed by < `Sisker1111` > ###### tags: `進階電腦系統理論與實作` > [測驗題](https://hackmd.io/@sysprog/2020-quiz6) > ## 測驗 1 [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) 浮點數格式由 Google 公司發展，最初用於該公司第三代 Tensor 處理單元 (Cloud TPU)。bfloat16 的主要想法是提供 16 位元浮點數格式，其動態範圍與標準 IEEE 754 的 FP32 (Single-precision floating-point format) 相同，但精度較低，相當於指數區和 FP32 保持相同的 8 位元，並將 FP32 的 fraction 區域縮減到 7 位元。 ![](https://i.imgur.com/1jbYaRL.png) ```cpp= float fp32tobf16(float x) { float y = x; int *py = (int *) &y; unsigned int exp, man; exp = *py & 0x7F800000u; man = *py & 0x007FFFFFu; if (!exp && !man) /* zero */ return x; if (exp == 0x7F800000u) /* infinity or NaN */ return x; /* Normalized number. round to nearest */ float r = x; int *pr = (int *) &r; *pr &= BB1; r /= 256; y = x + r; *py &= BB2; return y; } ``` 對應的測試程式: ```cpp= void print_hex(float x) { int *p = (int *) &x; printf("%f=%x\n", x, *p); } int main() { float a[] = {3.140625, 1.2, 2.31, 3.46, 5.63}; for (int i = 0; i < sizeof(a) / sizeof(a[0]); i++) { print_hex(a[i]); float bf_a = fp32tobf16(a[i]); print_hex(bf_a); } return 0; } ``` 程式碼說明: 1. fp32tobf16 內的 5~10 行，屬於常規的確認是否為 0/INF/NAN 的判斷，處理完後，接著需要把精度轉換為bfloat 的 format. 2. 原前面一樣，先用一個 int * 去指向 float r 的 address，因為我們無法直接對 float 的 bitwise operation. 3. 從後面`y = x + r`可以推測出，r 是一個維持精度的補償值，且 bfloat 的 fraction 只有 7 個 bits，因此我們只需要把 fraction 的第 8 個 bits 加回來即可. 程式碼 13 ~ 17 在做第 3 點的事情，因此`BB1 = 0xff800000`，另外 float 的`r /= 256;`不是當純的向右 shift 8 個 bits，而是 exponent 減 8 的結果. 最後把 bloat 的格式存入 `*py`，因此`BB2 = 0xffff0000`. ## 測驗 2 考慮以下 [ring buffer](https://en.wikipedia.org/wiki/Circular_buffer) 的實作: 其測試程式如下: ``` cpp= #include <assert.h> RINGBUF_DECL(int, int_buf); int main() { int_buf my_buf; RINGBUF_INIT(my_buf, 2, int); assert(is_ringbuf_empty(&my_buf)); ringbuf_write(&my_buf, 37); ringbuf_write(&my_buf, 72); assert(!is_ringbuf_empty(&my_buf)); int first; ringbuf_read(&my_buf, first); assert(first == 37); int second; ringbuf_read(&my_buf, second); assert(second == 72); return 0; } ``` 以下從原程式分段擷取: ``` cpp #define RINGBUF_DECL(T, NAME) \ typedef struct { \ int size; \ int start, end; \ T *elements; \ } NAME #define RINGBUF_INIT(BUF, S, T) \ { \ static T static_ringbuf_mem[S + 1]; \ BUF.elements = static_ringbuf_mem; \ } \ BUF.size = S; \ BUF.start = 0; \ BUF.end = 0; #define is_ringbuf_empty(BUF) ((BUF)->end == (BUF)->start) #define is_ringbuf_full(BUF) (NEXT_END_INDEX(BUF) == (BUF)->start) ``` 根據其資料結構及 empty/full 的 marco 來看: * 若想存放 S 個 value，會把 buffer size 訂為 S+1，剩下的空格只用來指示 end 的位置(假設插入一新值 X，end 指向的位置永遠是新值 X 所在位置的下一個). * 當 buffer 為 empty 時，start/end 會指向同一個位置. * 當 buffer 為 full 時，end 會指向 first 的前一個位置. ``` cpp #define NEXT_START_INDEX(BUF) \ (((BUF)->start != (BUF)->size) ? ((BUF)->start + 1) : 0) #define NEXT_END_INDEX(BUF) (((BUF)->end != (BUF)->size) ? ((BUF)->end + 1) : 0) ``` 一開始，start/end 會指向相同的位置，每新增一個 value 時，會將 end 指向 buffer 的下一個位置. ``` cpp= #define ringbuf_write_peek(BUF) (BUF)->elements[(BUF)->end] #define ringbuf_write_skip(BUF) \ do { \ (BUF)->end = NEXT_END_INDEX(BUF); \ if (is_ringbuf_empty(BUF)) \ (BUF)->start = NEXT_START_INDEX(BUF); \ } while (0) #define ringbuf_write(BUF, ELEMENT) \ do { \ ringbuf_write_peek(BUF) = ELEMENT; \ ringbuf_write_skip(BUF); \ } while (0) ``` * `ringbuf_write_peek(BUF)`用來將要新增的 value 放在 buffer->end 的位置. * `ringbuf_write_skip(BUF)`當發現 start/end 指到相同的位置時，會將 start 從 A 移動到 A+1，buffer->end 也會指到原先 A ，此時原先 A 值就視為 invalid value. ## 測驗 3 考慮到以下靜態初始化的 singly-linked list 實作: ```cpp= #include <stdio.h> /* clang-format off */ #define cons(x, y) (struct llist[]){{y, x}} /* clang-format on */ struct llist { int val; struct llist *next; }; void sorted_insert(struct llist **head, struct llist *node) { if (!*head || (*head)->val >= node->val) { SS1; SS2; return; } struct llist *current = *head; while (current->next && current->next->val < node->val) current = current->next; node->next = current->next; current->next = node; } void sort(struct llist **head) { struct llist *sorted = NULL; for (struct llist *current = *head; current;) { struct llist *next = current->next; sorted_insert(&sorted, current); current = next; } *head = sorted; } int main() { struct llist *list = cons(cons(cons(cons(NULL, A), B), C), D); struct llist *p; for (p = list; p; p = p->next) printf("%d", p->val); printf("\n"); sort(&list); for (p = list; p; p = p->next) printf("%d", p->val); printf("\n"); return 0; } ``` ```cpp 9547 4579 ``` `#define cons(x, y) (struct llist[]){{y, x}}`定義了一個靜態初始化的 macro，在`cons(cons(cons(cons(NULL, A), B), C), D);`這段 code 中，因為函式有後進先出的特性，很自然的我們要從最裡面的`cons`開始解析，`cons`會傳遞 2 個參數，對應 list 內部資料結構的順序我們可以知道， y 是 node_value、x 是 next_node 的指標. + 內層的 node，會成為外層 node 指標指向的位址，且 link-list 最尾端的 node 應指向 NULL，對應答案應為`A = 7` `B = 4` `C = 5` `D = 9` + 程式碼 14~17 行即是很常規的將某一 node 設為 head 的操作，交換時根據以下步驟; + 將`node->next`指向 head. + 將`node` 設為 head . + 因此 SS1 為`node->next = *head` + SS2 為`*head = node` ## 測驗 4 LeetCode [287. Find the Duplicate Number](https://leetcode.com/problems/find-the-duplicate-number/) 給定一個整數序列，其中會有一個數值重複，請找出。已知條件： 1. 假設陣列長度為 n，數值的範圍是 1 到 n−1. 2. 重複的數值不一定只重複一次考慮以下程式碼是可能的解法: ```cpp= int findDuplicate(int *nums, int numsSize) { int res = 0; const size_t log_2 = 8 * sizeof(int) - __builtin_clz(numsSize); for (size_t i = 0; i < log_2; i++) { int bit = 1 << i; int c1 = 0, c2 = 0; for (size_t k = 0; k < numsSize; k++) { if (k & bit) ++c1; if (nums[k] & bit) ++c2; } if (CCC) res += bit; } return res; } ``` 1. log_2 這個變數是用來判別，該數列的最大數值轉換成 2 進位共有幾個 bits ，因為我們後面會從右至左逐一 bit 來檢查. 2. 再來看到程式 8 ~ 13 行，根據鴿籠原理，有 n 個數字，其中數字的範圍式 1 ~ n-1，那麼這 n 個數字中必定有一個數字重複. 3. 假設該重複的數字為 1 ，我們可以檢查 n 個數字的最右邊的 bit 是否為 1 的並把是 1 的次數加總，如果總和比 1 ~ n-1 最右邊 bit 為 1 的次數總和還多，那麼我們可以斷定，該重複數字的最右邊 bit 必定是 1，其餘 bit 以此類推. 4. 第 14 行判斷上述條件是否達成，是的話累加進`res` 5. 因此`CCC`為`c1 < c2`，注意不可是`c1 <= c2`，發生等於的情況代表重複的數字該 bit 為 0，但會把`1 << i`累加進`res`