# 2022q1 Homework2 (quiz2) contributed by < [`leewei05`](https://github.com/leewei05) > ###### tags: `linux2022` ## [測驗 1](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-1) :::info - [x] 解釋下方程式碼運作的原理 - [ ] 比較下方實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀 (可研讀 CS:APP 第 3 章) - [ ] 研讀 Linux 核心原始程式碼 include/linux/average.h,探討其 Exponentially weighted moving average (EWMA) 實作,以及在 Linux 核心的應用 >移動平均(Moving average),又稱滾動平均值、滑動平均,在統計學中是種藉由建立整個資料集合中不同子集的一系列平均數,來分析資料點的計算方法。 移動平均通常與時間序列資料一起使用,以消除短期波動,突出長期趨勢或周期。短期和長期之間的閾值取決於應用,移動平均的參數將相應地設置。例如,它通常用於對財務數據進行技術分析,如股票價格、收益率或交易量。它也用於經濟學中研究國內生產總值、就業或其他宏觀經濟時間序列。 ::: 這個直覺的解法會有 overflow 的問題,若我們已知 a, b 數值的大小,可用下方程式避免 overflow: ```c #include <stdint.h> uint32_t average(uint32_t low, uint32_t high) { return low + (high - low) / 2; } ``` 接著我們可改寫為以下等價的實作: 下列程式碼 a 與 b 分別先除 2 再相加,當兩個數值皆為奇數時平均數會少加 1,故透過 `a & b & 1` 來判斷是否把少加的 1 加回去。 ```c #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { // EXP1 return (a >> 1) + (b >> 1) + (a & b & 1); // a = 6, b = 4, avg = 3 + 2 + 0 = 5 // a = 6, b = 3, avg = 3 + 1 + 0 = 4 // a = 5, b = 3, avg = 2 + 1 + 1 = 4 } ``` 根據[數值系統的 bit-wise operator 筆記](https://hackmd.io/@sysprog/c-numerics#%E9%81%8B%E7%94%A8-bit-wise-operator),運用加法器的邏輯思考。兩數的和為 `a ^ b`,兩數的進位為 `(a & b) << 1`,可得 `a + b = a ^ b + (a & b) << 1`。 ```c (a + b) / 2 = (a ^ b + (a & b) << 1) >> 1 = (a & b) + ((a ^ b) >> 1) ``` `EXP2` 為 `a & b`,`EXP3` 為 `a ^ b`。 ```c uint32_t average(uint32_t a, uint32_t b) { // EXP2 EXP3 return (a & b) + ((a ^ b) >> 1); } ``` ### 編譯器輸出結果分析 環境:Ubuntu 20.04.3 LTS 編譯器版本:gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 ```c // avg.c #include <stdio.h> #include <stdint.h> uint32_t average(uint32_t low, uint32_t high) { return low + (high - low) / 2; } uint32_t average_bitwise(uint32_t a, uint32_t b) { return (a >> 1) + (b >> 1) + (a & b & 1); } uint32_t average_bitadder(uint32_t a, uint32_t b) { return (a & b) + ((a ^ b) >> 1); } int main() { int low = 3; int high = 5; printf("average is %u", average(low, high)); return 0; } ``` gdb 反組譯結果,或是可以透過 `gcc -Og -S avg.c` 輸出組合語言。gcc 預設是匯出 ATT 格式, Intel 格式的則是可以透過 `-masm=intel` 來產生。 ```shell $ gcc -g -o avg -O2 avg.c $ gdb -q avg Reading symbols from avg... (No debugging symbols found in avg) (gdb) break average Breakpoint 1 at 0x1149 (gdb) run Starting program: /home/lee/dev/playground/average/avg Breakpoint 1, average (low=21845, high=1431654832) at avg.c:5 5 { (gdb) disassemble Dump of assembler code for function average: => 0x0000555555555149 <+0>: endbr64 ... 0x0000555555555167 <+30>: retq End of assembler dump. ``` 閱讀 [Intel IA-32 規格書](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html) 以及 CS:APP 第三章並試著解讀以下的組合語言指令。首先分析 `avg.s` 的內容,`LFB` 以及 `LFE` 分別是 [gcc](https://github.com/gcc-mirror/gcc/blob/releases/gcc-9/gcc/dwarf2out.c#L298) 編譯時根據 [DWARF(除錯輸出格式)](http://eagercon.com/dwarf/dwarf-2.0.0.pdf) 所產生的 label (`.L` 為 [local label](https://sourceware.org/binutils/docs/as/Symbol-Names.html#Symbol-Names)),分別代表函式的開始以及結束。 > Local symbols are defined and used within the assembler, but they are normally not saved in object files. Thus, they are not visible when debugging. 因為 `gcc -S` 只會編譯 (compile) 程式碼並輸出組合語言,尚未進行組譯 (assemble) 以及連結 (link),所以可以直接看到組合語言用到的 local label。 ```shell # avg.s average: .LFB23: ... .LFE23: ``` `.cfi` 代表 DWARF 所定義的 [`Call Frame Instruction(CFI)`](http://eagercon.com/dwarf/dwarf-2.0.0.pdf): > Debuggers often need to be able to view and modify the state of any subroutine activation that is on the call stack. An activation consists of: • A code location that is within the subroutine. This location is either the place where the program stopped when the debugger got control (e.g. a breakpoint), or is a place where a subroutine made a call or was interrupted by an asynchronous event (e.g. a signal). • An area of memory that is allocated on a stack called a ‘‘call frame.’’ The call frame is identified by an address on the stack. We refer to this address as the Canonical Frame Address or CFA. • A set of registers that are in use by the subroutine at the code location. 根據以上的描述,可以理解成 CFI 為一個子程序 (subroutine) 的起始或是結束觸發點。一個配置於 stack memory 的記憶體區塊稱作 `call frame`,而此 `call frame` 的記憶體位址我們稱作 `Canonical Frame Address(CFA)`。回到下方的組合語言,[`.cfi-startproc`](https://sourceware.org/binutils/docs/as/CFI-directives.html#CFI-directives) 以及 [`.cfi-endproc`](https://sourceware.org/binutils/docs/as/CFI-directives.html#CFI-directives) 分別是組合語言中定義的 CFI directives (指令),代表著函式的起始以及結束。 > .cfi_startproc is used at the beginning of each function that should have an entry in .eh_frame. It initializes some internal data structures. Don’t forget to close the function by .cfi_endproc. ```shell # avg.s average: .LFB23: .cfi_startproc endbr64 ... .cfi_endproc .LFE23: ``` 在了解 `endbr64` 指令以前,需要先知道 [ROP](https://en.wikipedia.org/wiki/Return-oriented_programming) 以及 [COP/JOP](https://www.comp.nus.edu.sg/~liangzk/papers/asiaccs11.pdf) 這兩項攻擊手法。黑客可以藉由上述提到的手法竄改原先的指令(e.g. RET, CALL, JUMP)來操作既有的指令。而 Intel 處理器提供的 Control-Flow Enforcement Technology (CET) 則是用來防範上述提到的攻擊,以下是 CET 所提供的兩項主要功能: - Shadow stack: Return address protection to defend against ROP - Indirect branch tracking: Free branch protection to defend against COP/JOP `endbr64` 則是隸屬於 [Indirect branch tracking](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html) 的指令。 > 18.1.2 Indirect Branch Tracking > > The `ENDBRANCH` instruction is a new instruction that is used to mark valid jump target addresses of indirect calls and jumps in the program. ... The CPU implements a state machine that tracks indirect `JMP` and `CALL` instructions. When one of these instructions is executed, the state machine moves from `IDLE` to `WAIT_FOR_ENDBRANCH` state. In `WAIT_FOR_ENDBRANCH` state the next instruction in the program stream must be an `ENDBRANCH`. If the next instruction is not an `ENDBRANCH`, the processor causes a control protection exception (#CP); otherwise, the state machine moves back to `IDLE` state. 接著我們看 average 函式當中的指令。`mov` ```shell 0000000000000000 <average>: 0: f3 0f 1e fa endbr64 4: 89 f0 mov %esi,%eax 6: 29 f8 sub %edi,%eax 8: d1 e8 shr %eax a: 01 f8 add %edi,%eax c: c3 retq d: 0f 1f 00 nopl (%rax) ``` ```shell 0000000000000010 <average_bitwise>: 10: f3 0f 1e fa endbr64 14: 89 f8 mov %edi,%eax 16: 89 f2 mov %esi,%edx 18: 21 f7 and %esi,%edi 1a: d1 e8 shr %eax 1c: d1 ea shr %edx 1e: 83 e7 01 and $0x1,%edi 21: 01 d0 add %edx,%eax 23: 01 f8 add %edi,%eax 25: c3 retq 26: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 2d: 00 00 00 0000000000000030 <average_bitadder>: 30: f3 0f 1e fa endbr64 34: 89 f8 mov %edi,%eax 36: 21 f7 and %esi,%edi 38: 31 f0 xor %esi,%eax 3a: d1 e8 shr %eax 3c: 01 f8 add %edi,%eax 3e: c3 retq ``` --- ## [測驗 2](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-2) :::info 延伸問題: - [x] 解釋下方程式碼運作的原理 - [ ] 針對 32 位元無號/有號整數,撰寫同樣 branchless 的實作 - [ ] Linux 核心也有若干 branchless / branch-free 的實作,例如 lib/sort.c: ```c /* * Logically, we're doing "if (i & lsbit) i -= size;", but since the * branch is unpredictable, it's done with a bit of clever branch-free * code instead. */ __attribute_const__ __always_inline static size_t parent(size_t i, unsigned int lsbit, size_t size) { i -= size; i -= size & -(i & lsbit); return i / 2; } ``` 請在 Linux 核心原始程式碼中,找出更多類似的實作手法。請善用 git log 檢索。 ::: ```c int32_t min(int32_t a, int32_t b) { int32_t diff = (a - b); return b + (diff & (diff >> 31)); } ``` 改寫〈解讀計算機編碼〉一文的「不需要分支的設計」一節提供的程式碼 min,我們得到以下實作 (max): > EXP4 為 a 和 b 進行某種特別 bitwise 操作,限定用 OR, AND, XOR 這三個運算子 > EXP5 為 a 和 b 的比較操作,亦即 logical operator,限定用 >, <, ==, >=, <=, != ```c #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((EXP4) & -(EXP5)); } ``` - 假設 a 是比較大的值,右邊的 `((EXP4) & -(EXP5))` 會需要等於 0,因為 `a ^ 0 = a` - 假設 b 是比較大的值,右邊的 `((EXP4) & -(EXP5))` 會需要是 `b - a` > 6.5.8 Relational operators > Each of the operators < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is false.92) The result has type int. 根據上述規格書的內容,`EXP5` 只會返回 1 或是 0。 - 假設 `EXP5 == 1`,則可以得到 `((EXP4) & -1)`,亦等於 `((EXP4) & 0xFFFF)` - 假設 `EXP5 == 0`,則可以得到 `((EXP4) & 0x0000)`。從第二個條件回推,任何數跟 0 做 AND 運算會等於 0,條件二即符合 `max = a`。 根據現有條件,可以推測出 `EXP5` 為 `a <= b`,如果 a 等於 b 那直接回傳 a 也合理。但題目有說到以最精簡的方式撰寫程式碼,那其實 `a < b` 即可。`EXP4` 為 `a ^ b` 也等同於 `b - a`,任何數跟 `0xFFFF` 做 AND 運算都會返回自己。 ```c #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((a ^ b) & -(a < b)); } ``` ### 32 位元無號/有號整數實作 --- ## [測驗 3](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-3) :::info - [x] 解釋下方程式運作原理; - [x] 在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升; - [ ] Linux 核心中也內建 GCD (而且還不只一種實作),例如 lib/math/gcd.c,請解釋其實作手法和探討在核心內的應用場景。 ::: 完整程式碼 ```c #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (COND); return RET; } ``` 任何一個變數為 0 則返回另一個非零的變數。如果兩個變數皆為偶數則兩個變數皆除以二,並且把除以二的次數記錄到 `shift`。 ```c if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } ``` 如果 `u` 為偶數則除以二。 `do ... while` 的迴圈裡頭則是實作 [Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm#:~:text=In%20mathematics%2C%20the%20Euclidean%20algorithm,them%20both%20without%20a%20remainder.)。根據 Euclid's algorithm 的描述可以推斷`COND` 為 `u != v && u && v`,`RET` 為 `u << shift`。最後需要把一開始兩邊皆除以二的次數乘回 `u` 才會是最大公因數。 ```c while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (u != v && u && v); return u << shift; ``` :::danger COND = v 不需要 u != v 因為如果 u == v,COND = v 會直接跳出 while loop。而 COND = v 是因為在做輾轉相除法時,只要當 u - v 為 0 時就會跳出 while loop。 ::: ### 透過 `__builtin_ctz` 改寫 GCD > Built-in Function: int __builtin_ctz (unsigned int x) Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined. > Built-in Function: int __builtin_ctzll (unsigned long long) Similar to __builtin_ctz, except the argument type is unsigned long long. 根據 gcc 官方文件的描述,可以透過 `__builtin_ctz` 來取得無號數 `x` 的 trailing 0 bits,換句話說即等於 `shift` 的值。而原本題目給定的變數是 `unsigned long long`,即替換 `__builtin_ctz` 為 `__builtin_ctzll`。 ```c #include <stdint.h> #include <stdio.h> #define MIN(a,b) (((a)<(b))?(a):(b)) uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int u_ctz = __builtin_ctzll(u); int v_ctz = __builtin_ctzll(v); u >>= u_ctz; v >>= v_ctz; do { if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); return u << MIN(u_ctz, v_ctz); } ``` --- ## [測驗 4](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-4) :::info - [x] 解釋上述程式運作原理,並舉出這樣的程式碼用在哪些真實案例中; - [ ] 設計實驗,檢驗 ctz/clz 改寫的程式碼相較原本的實作有多少改進?應考慮到不同的 bitmap density; - [ ] 思考進一步的改進空間; - [ ] 閱讀 Data Structures in the Linux Kernel 並舉出 Linux 核心使用 bitmap 的案例; ::: 其中第 9 行的作用是找出目前最低位元的 1,並紀錄到 t 變數。舉例來說,若目前 bitset 為 $000101_b$,那 t 就會變成 $000001_b$,接著就可以透過 XOR 把該位的 1 清掉,其他保留 (此為 XOR 的特性)。 請補完程式碼。書寫規範: - bitwise 運算子和運算元之間用一個半形空白區隔,如: bitset | 0x1 - 考慮到 -bitwise 的特性 ```c= #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = EXP6; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` 考慮到 `-bitset` 會對 `bitset` 做二的補數,透過下列程式範例理解: ```c #include <stdio.h> #include <stdint.h> int main() { uint8_t biset = 5; printf("%u\n", (int8_t)biset); printf("%u\n", (uint8_t)-biset); printf("%u\n", (uint8_t)-biset&biset); return 0; } //5 //251 //1 ``` :::danger EXP6 bitset & -bitset -bitset & bitset ::: 真實案例:Bloomfilter, Compression ## [測驗 5](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-5) :::info - [ ] 解釋上述程式碼運作原理,指出其中不足,並予以改進 例如,判斷負號只要寫作 bool isNegative = numerator < 0 ^ denominator < 0; 搭配研讀 The simple math behind decimal-binary conversion algorithms - [ ] 在 Linux 核心原始程式碼的 mm/ 目錄 (即記憶體管理) 中,找到特定整數比值 (即 fraction) 和 bitwise 操作的程式碼案例,予以探討和解說其應用場景 ::: [LeetCode 166. Fraction to Recurring Decimal](https://leetcode.com/problems/fraction-to-recurring-decimal/) 給定 numerator (分子)以及 denominator (分母),返回小數的字串。如果遇到重複的小數迴圈,則把重複的部分以括號 `()` 表示。 從建立的 Hash table 中找尋特定的 key。 ```c #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "list.h" struct rem_node { int key; int index; struct list_head link; }; static int find(struct list_head *heads, int size, int key) { struct rem_node *node; int hash = key % size; list_for_each_entry (node, &heads[hash], link) { if (key == node->key) return node->index; } return -1; } char *fractionToDecimal(int numerator, int denominator) { ``` 分母為零,則直接返回 `\0` (null terminated string);分子為零,則返回 `0`。雖然在 Leetcode 題目中有提到輸入的分母並不會為 0,但保險起見,還是檢查分母是否為零以免出現非預期行為。 > Constraints > - -2^31 <= numerator, denominator <= 2^31 - 1 > - denominator != 0 根據 C 語言規格書,第二個運算元如果為 0 則此行為是 `undefined`。 > The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined. ```c int size = 1024; char *result = malloc(size); char *p = result; if (denominator == 0) { result[0] = '\0'; return result; } if (numerator == 0) { result[0] = '0'; result[1] = '\0'; return result; } /* using long long type make sure there has no integer overflow */ long long n = numerator; long long d = denominator; ``` > 6.5.3.3 Unary arithmetic operators > The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type. Promote 在此解釋為 **型別提升**。例如 `float, double, long double` 皆為浮點數,假設 `float` 跟 `double` 做運算,需要先對 `float` 做型別提升,否則運算的結果會不如預期,因為 `float` 跟 `double` 所佔有的位元組空間不同。 ```c /* deal with negtive cases */ if (n < 0) n = -n; if (d < 0) d = -d; ``` > 6.5.3.4 The sizeof and _Alignof operators > The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type 判別小數為正數或是負數。`p` 為 pointer to char,在 x84_64 位元的儲存空間為 8 bytes。 > 6.5.3.2 Address and indirection operators > The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. 根據以上敘述,`p` 為 pointer to char type 則 `*p` 為 char type。如果 `sign` 為負數,則把 `'-'` 寫入至第一個 char slot,即 `result[0]`,並且把 `p` 指標指向 `result[1]`。 ```c bool sign = (float) numerator / denominator >= 0; if (!sign) *p++ = '-'; ``` 可以透過下方的小程式了解其中的關係: ```c #include <stdio.h> #include <stdlib.h> int main() { int size = 1024; char *result = malloc(size); char *p = result; printf("result[0]'s value is %c \n", result[0]); printf("result[0]'s address %p \n", &result[0]); printf("*p's value is %c \n", *p); printf("*p's address %p \n", &(*p)); *p++ = 'a'; printf("result[0]'s value is %c \n", result[0]); printf("*p's address %p \n", &(*p)); printf("result[1]'s address %p \n", &result[1]); return 0; } ``` 可以看到輸出結果如我們預期,一開始 `p` 是指向 `result[0]`。`*p++` 是指派 `a` 至 `result[0]`,再指派 `p` 的指標至下一個 `char` 的起始位置。 ```shell result[0]'s value is result[0]'s address 0x55bda8fab2a0 *p's value is *p's address 0x55bda8fab2a0 result[0]'s value is a *p's address 0x55bda8fab2a1 result[1]'s address 0x55bda8fab2a1 ``` 計算除完的結果 (division) 以及餘數 (remainder),如果整除則直接返回 division。 ```c long long remainder = n % d; long long division = n / d; ``` > 7.19.6.6 The sprintf function ```c #include <stdio.h> int sprintf(char * restrict s, const char * restrict format, ...); ``` > 2 The sprintf function is equivalent to fprintf, except that the output is written into an array (specified by the argument s) rather than to a stream. A null character is written at the end of the characters written; it is not counted as part of the returned value. If copying takes place between objects that overlap, the behavior is undefined. 3 The sprintf function returns the number of characters written in the array, not counting the terminating null character, or a negative value if an encoding error occurred. > 6.5.2.1 Array subscripting > A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently,apointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero). `sprintf` 將會把 `division` 寫入至 `p (pointer to char)` 現在的位置,也就是 `result[0]` (正數)或是 `result[1]` (負數),並再最後加入 `.` 字元。 ```c sprintf(p, "%ld", division > 0 ? (long) division : (long) -division); if (remainder == 0) return result; p = result + strlen(result); *p++ = '.'; ``` 初始化 hash table 的各個 hash slots。 ```c /* Using a map to record all of reminders and their position. * if the reminder appeared before, which means the repeated loop begin, */ char *decimal = malloc(size); memset(decimal, 0, size); char *q = decimal; size = 1333; struct list_head *heads = malloc(size * sizeof(*heads)); for (int i = 0; i < size; i++) INIT_LIST_HEAD(&heads[i]); ``` 可以判斷如果 `pos` 小於 0 則表示沒有在 hash table 找到 `remainder`,所以需要把目前 `remainder` 的值加入至目前 hash slot 所處的 linked list 中。而 `find()` 中的 hash function 為 `int hash = key % size;` 根據上述的描述,可以推測 `MMM` 為 `list_add`,而 `EEE` 為 `&heads[remainder % size]`。 ```c int pos = find(heads, size, remainder); if (pos >= 0) { ... } struct rem_node *node = malloc(sizeof(*node)); node->key = remainder; node->index = i; MMM(&node->link, EEE); *q++ = (remainder * 10) / d + '0'; remainder = (remainder * 10) % d; ``` 如果 pos 有值則表示為有循環小數,所以需要先把循環小數前的數寫到 `p`,故 `PPP = pos--`。 ```c for (int i = 0; remainder; i++) { int pos = find(heads, size, remainder); if (pos >= 0) { while (PPP > 0) *p++ = *decimal++; *p++ = '('; while (*decimal != '\0') *p++ = *decimal++; *p++ = ')'; *p = '\0'; return result; } ... } strcpy(p, decimal); return result; } ``` --- ## [測驗 6](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-6) :::info - [x] 請寫出 M & X 並解釋上述程式碼運作原理 - [ ] 在 Linux 核心原始程式碼中找出 __alignof__ 的使用案例 2 則,並針對其場景進行解說 - [ ] 在 Linux 核心源使程式碼找出 ALIGN, ALIGN_DOWN, ALIGN_UP 等巨集,探討其實作機制和用途,並舉例探討 (可和上述第二點的案例重複)。思索能否對 Linux 核心提交貢獻,儘量共用相同的巨集 ::: ```c /* * ALIGNOF - get the alignment of a type * @t: the type to test * * This returns a safe alignment for the given type. */ #define ALIGNOF(t) \ ((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X) ``` `ALIGNOF(t)` 即為計算出 type t 的 alignment。 > 6.3.2.3 Pointers > An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.5 ```c ((struct { char c; t _h; } *)0 // null pointer contant &((struct { char c; t _h; } *)0->M // dereference M(_h) (char *)&((struct { char c; t _h; } *)0->M // cast to pointer to char(_h) ``` 下列相減要得出 type t 的 alignment。兩項相減需要為 type t 的 alignment,故 X 為 0。 ```c ((char *)(...) - (char *)X) ``` 可以透過下列實驗得證,為什麼 dereference type t 的記憶體位置相減會得到 type t 的 alignment。 ```c #include <stdio.h> #define ALIGNOF(t) \ ((char *)(&((struct { char c; t _h; } *)0)->_h) - (char *)0) int main() { struct foo { char c; // 1 byte // 1 byte padding short b; // 2 byte }; struct foo1 { char c; // 1 byte // 7 bytes padding long b; // 8 byte }; struct foo f; struct foo1 f1; printf("size of struct foo: %lu\n", sizeof(f)); printf("%p\n", &f.c); printf("%p\n", &f.b); printf("gcc built-in alignof: %lu\n", __alignof__(short)); printf("ALIGNOF: %lu\n", ALIGNOF(short)); printf("size of struct foo1: %lu\n", sizeof(f1)); printf("%p\n", &f1.c); printf("%p\n", &f1.b); printf("gcc built-in alignof: %lu\n", __alignof__(long)); printf("ALIGNOF: %lu\n", ALIGNOF(long)); return 0; } /* size of struct foo: 4 0x7ffdfd970c4c 0x7ffdfd970c4e gcc built-in alignof: 2 ALIGNOF: 2 size of struct foo1: 16 0x7ffdfd970c50 0x7ffdfd970c58 gcc built-in alignof: 8 ALIGNOF: 8 */ ``` -- ## [測驗 7](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-7) :::info - [ ] 解釋上述程式運作原理並評估 naive.c 和 bitwise.c 效能落差 - 避免 stream I/O 帶來的影響,可將 printf 更換為 sprintf - [ ] 分析 Faster remainders when the divisor is a constant: beating compilers and libdivide 一文的想法 (可參照同一篇網誌下方的評論),並設計另一種 bitmask,如「可被 3 整除則末位設為 1」「可被 5 整除則倒數第二位設定為 1」,然後再改寫 bitwise.c 程式碼,試圖運用更少的指令來實作出 branchless; - 參照 fastmod: A header file for fast 32-bit division remainders on 64-bit hardware - [ ] 研讀 The Fastest FizzBuzz Implementation 及其 Hacker News 討論,提出 throughput (吞吐量) 更高的 Fizzbuzz 實作 - [ ] 解析 Linux 核心原始程式碼 kernel/time/timekeeping.c 裡頭涉及到除法運算的機制,探討其快速除法的實作 (注意: 你可能要對照研讀 kernel/time/ 目錄的標頭檔和程式碼)過程中,你可能會發現可貢獻到 Linux 核心的空間,請充分討論 ::: 以下是利用 bitwise 和上述技巧實作的 FizzBuzz 程式碼: (bitwise.c) `KK1 = div5`, `KK2 = div3`, `KK3 = div3 << 2` ```c static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char **argv) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << KK1) << KK2; char fmt[9]; strncpy(fmt, &"FizzBuzz%u"[(8 >> div5) >> (KK3)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ```