# 2022q1 Homework2 (quiz2)
contributed by < [`leewei05`](https://github.com/leewei05) >
###### tags: `linux2022`
## [測驗 1](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-1)
:::info
- [x] 解釋下方程式碼運作的原理
- [ ] 比較下方實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀 (可研讀 CS:APP 第 3 章)
- [ ] 研讀 Linux 核心原始程式碼 include/linux/average.h,探討其 Exponentially weighted moving average (EWMA) 實作,以及在 Linux 核心的應用
>移動平均(Moving average),又稱滾動平均值、滑動平均,在統計學中是種藉由建立整個資料集合中不同子集的一系列平均數,來分析資料點的計算方法。
移動平均通常與時間序列資料一起使用,以消除短期波動,突出長期趨勢或周期。短期和長期之間的閾值取決於應用,移動平均的參數將相應地設置。例如,它通常用於對財務數據進行技術分析,如股票價格、收益率或交易量。它也用於經濟學中研究國內生產總值、就業或其他宏觀經濟時間序列。
:::
這個直覺的解法會有 overflow 的問題,若我們已知 a, b 數值的大小,可用下方程式避免 overflow:
```c
#include <stdint.h>
uint32_t average(uint32_t low, uint32_t high)
{
return low + (high - low) / 2;
}
```
接著我們可改寫為以下等價的實作:
下列程式碼 a 與 b 分別先除 2 再相加,當兩個數值皆為奇數時平均數會少加 1,故透過 `a & b & 1` 來判斷是否把少加的 1 加回去。
```c
#include <stdint.h>
uint32_t average(uint32_t a, uint32_t b)
{
// EXP1
return (a >> 1) + (b >> 1) + (a & b & 1);
// a = 6, b = 4, avg = 3 + 2 + 0 = 5
// a = 6, b = 3, avg = 3 + 1 + 0 = 4
// a = 5, b = 3, avg = 2 + 1 + 1 = 4
}
```
根據[數值系統的 bit-wise operator 筆記](https://hackmd.io/@sysprog/c-numerics#%E9%81%8B%E7%94%A8-bit-wise-operator),運用加法器的邏輯思考。兩數的和為 `a ^ b`,兩數的進位為 `(a & b) << 1`,可得 `a + b = a ^ b + (a & b) << 1`。
```c
(a + b) / 2 = (a ^ b + (a & b) << 1) >> 1
= (a & b) + ((a ^ b) >> 1)
```
`EXP2` 為 `a & b`,`EXP3` 為 `a ^ b`。
```c
uint32_t average(uint32_t a, uint32_t b)
{
// EXP2 EXP3
return (a & b) + ((a ^ b) >> 1);
}
```
### 編譯器輸出結果分析
環境:Ubuntu 20.04.3 LTS
編譯器版本:gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
```c
// avg.c
#include <stdio.h>
#include <stdint.h>
uint32_t average(uint32_t low, uint32_t high)
{
return low + (high - low) / 2;
}
uint32_t average_bitwise(uint32_t a, uint32_t b)
{
return (a >> 1) + (b >> 1) + (a & b & 1);
}
uint32_t average_bitadder(uint32_t a, uint32_t b)
{
return (a & b) + ((a ^ b) >> 1);
}
int main()
{
int low = 3;
int high = 5;
printf("average is %u", average(low, high));
return 0;
}
```
gdb 反組譯結果,或是可以透過 `gcc -Og -S avg.c` 輸出組合語言。gcc 預設是匯出 ATT 格式, Intel 格式的則是可以透過 `-masm=intel` 來產生。
```shell
$ gcc -g -o avg -O2 avg.c
$ gdb -q avg
Reading symbols from avg...
(No debugging symbols found in avg)
(gdb) break average
Breakpoint 1 at 0x1149
(gdb) run
Starting program: /home/lee/dev/playground/average/avg
Breakpoint 1, average (low=21845, high=1431654832) at avg.c:5
5 {
(gdb) disassemble
Dump of assembler code for function average:
=> 0x0000555555555149 <+0>: endbr64
...
0x0000555555555167 <+30>: retq
End of assembler dump.
```
閱讀 [Intel IA-32 規格書](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html) 以及 CS:APP 第三章並試著解讀以下的組合語言指令。首先分析 `avg.s` 的內容,`LFB` 以及 `LFE` 分別是 [gcc](https://github.com/gcc-mirror/gcc/blob/releases/gcc-9/gcc/dwarf2out.c#L298) 編譯時根據 [DWARF(除錯輸出格式)](http://eagercon.com/dwarf/dwarf-2.0.0.pdf) 所產生的 label (`.L` 為 [local label](https://sourceware.org/binutils/docs/as/Symbol-Names.html#Symbol-Names)),分別代表函式的開始以及結束。
> Local symbols are defined and used within the assembler, but they are normally not saved in object files. Thus, they are not visible when debugging.
因為 `gcc -S` 只會編譯 (compile) 程式碼並輸出組合語言,尚未進行組譯 (assemble) 以及連結 (link),所以可以直接看到組合語言用到的 local label。
```shell
# avg.s
average:
.LFB23:
...
.LFE23:
```
`.cfi` 代表 DWARF 所定義的 [`Call Frame Instruction(CFI)`](http://eagercon.com/dwarf/dwarf-2.0.0.pdf):
> Debuggers often need to be able to view and modify the state of any subroutine activation that is on the call stack. An activation consists of:
• A code location that is within the subroutine. This location is either the place where the
program stopped when the debugger got control (e.g. a breakpoint), or is a place where a
subroutine made a call or was interrupted by an asynchronous event (e.g. a signal).
• An area of memory that is allocated on a stack called a ‘‘call frame.’’ The call frame is
identified by an address on the stack. We refer to this address as the Canonical Frame
Address or CFA.
• A set of registers that are in use by the subroutine at the code location.
根據以上的描述,可以理解成 CFI 為一個子程序 (subroutine) 的起始或是結束觸發點。一個配置於 stack memory 的記憶體區塊稱作 `call frame`,而此 `call frame` 的記憶體位址我們稱作 `Canonical Frame Address(CFA)`。回到下方的組合語言,[`.cfi-startproc`](https://sourceware.org/binutils/docs/as/CFI-directives.html#CFI-directives) 以及 [`.cfi-endproc`](https://sourceware.org/binutils/docs/as/CFI-directives.html#CFI-directives) 分別是組合語言中定義的 CFI directives (指令),代表著函式的起始以及結束。
> .cfi_startproc is used at the beginning of each function that should have an entry in .eh_frame. It initializes some internal data structures. Don’t forget to close the function by .cfi_endproc.
```shell
# avg.s
average:
.LFB23:
.cfi_startproc
endbr64
...
.cfi_endproc
.LFE23:
```
在了解 `endbr64` 指令以前,需要先知道 [ROP](https://en.wikipedia.org/wiki/Return-oriented_programming) 以及 [COP/JOP](https://www.comp.nus.edu.sg/~liangzk/papers/asiaccs11.pdf) 這兩項攻擊手法。黑客可以藉由上述提到的手法竄改原先的指令(e.g. RET, CALL, JUMP)來操作既有的指令。而 Intel 處理器提供的 Control-Flow Enforcement Technology (CET) 則是用來防範上述提到的攻擊,以下是 CET 所提供的兩項主要功能:
- Shadow stack: Return address protection to defend against ROP
- Indirect branch tracking: Free branch protection to defend against COP/JOP
`endbr64` 則是隸屬於 [Indirect branch tracking](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html) 的指令。
> 18.1.2 Indirect Branch Tracking
>
> The `ENDBRANCH` instruction is a new instruction that is used to mark valid jump target addresses of indirect calls and jumps in the program. ... The CPU implements a state machine that tracks indirect `JMP` and `CALL` instructions. When one of these instructions is executed, the state machine moves from `IDLE` to `WAIT_FOR_ENDBRANCH` state. In `WAIT_FOR_ENDBRANCH` state the next instruction in the program stream must be an `ENDBRANCH`. If the next instruction is not an `ENDBRANCH`, the processor causes a control protection exception (#CP); otherwise, the state machine moves back to `IDLE` state.
接著我們看 average 函式當中的指令。`mov`
```shell
0000000000000000 <average>:
0: f3 0f 1e fa endbr64
4: 89 f0 mov %esi,%eax
6: 29 f8 sub %edi,%eax
8: d1 e8 shr %eax
a: 01 f8 add %edi,%eax
c: c3 retq
d: 0f 1f 00 nopl (%rax)
```
```shell
0000000000000010 <average_bitwise>:
10: f3 0f 1e fa endbr64
14: 89 f8 mov %edi,%eax
16: 89 f2 mov %esi,%edx
18: 21 f7 and %esi,%edi
1a: d1 e8 shr %eax
1c: d1 ea shr %edx
1e: 83 e7 01 and $0x1,%edi
21: 01 d0 add %edx,%eax
23: 01 f8 add %edi,%eax
25: c3 retq
26: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
2d: 00 00 00
0000000000000030 <average_bitadder>:
30: f3 0f 1e fa endbr64
34: 89 f8 mov %edi,%eax
36: 21 f7 and %esi,%edi
38: 31 f0 xor %esi,%eax
3a: d1 e8 shr %eax
3c: 01 f8 add %edi,%eax
3e: c3 retq
```
---
## [測驗 2](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-2)
:::info
延伸問題:
- [x] 解釋下方程式碼運作的原理
- [ ] 針對 32 位元無號/有號整數,撰寫同樣 branchless 的實作
- [ ] Linux 核心也有若干 branchless / branch-free 的實作,例如 lib/sort.c:
```c
/*
* Logically, we're doing "if (i & lsbit) i -= size;", but since the
* branch is unpredictable, it's done with a bit of clever branch-free
* code instead.
*/
__attribute_const__ __always_inline
static size_t parent(size_t i, unsigned int lsbit, size_t size)
{
i -= size;
i -= size & -(i & lsbit);
return i / 2;
}
```
請在 Linux 核心原始程式碼中,找出更多類似的實作手法。請善用 git log 檢索。
:::
```c
int32_t min(int32_t a, int32_t b) {
int32_t diff = (a - b);
return b + (diff & (diff >> 31));
}
```
改寫〈解讀計算機編碼〉一文的「不需要分支的設計」一節提供的程式碼 min,我們得到以下實作 (max):
> EXP4 為 a 和 b 進行某種特別 bitwise 操作,限定用 OR, AND, XOR 這三個運算子
> EXP5 為 a 和 b 的比較操作,亦即 logical operator,限定用 >, <, ==, >=, <=, !=
```c
#include <stdint.h>
uint32_t max(uint32_t a, uint32_t b)
{
return a ^ ((EXP4) & -(EXP5));
}
```
- 假設 a 是比較大的值,右邊的 `((EXP4) & -(EXP5))` 會需要等於 0,因為 `a ^ 0 = a`
- 假設 b 是比較大的值,右邊的 `((EXP4) & -(EXP5))` 會需要是 `b - a`
> 6.5.8 Relational operators
> Each of the operators < (less than), > (greater than), <= (less than or equal to), and >=
(greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is false.92)
The result has type int.
根據上述規格書的內容,`EXP5` 只會返回 1 或是 0。
- 假設 `EXP5 == 1`,則可以得到 `((EXP4) & -1)`,亦等於 `((EXP4) & 0xFFFF)`
- 假設 `EXP5 == 0`,則可以得到 `((EXP4) & 0x0000)`。從第二個條件回推,任何數跟 0 做 AND 運算會等於 0,條件二即符合 `max = a`。
根據現有條件,可以推測出 `EXP5` 為 `a <= b`,如果 a 等於 b 那直接回傳 a 也合理。但題目有說到以最精簡的方式撰寫程式碼,那其實 `a < b` 即可。`EXP4` 為 `a ^ b` 也等同於 `b - a`,任何數跟 `0xFFFF` 做 AND 運算都會返回自己。
```c
#include <stdint.h>
uint32_t max(uint32_t a, uint32_t b)
{
return a ^ ((a ^ b) & -(a < b));
}
```
### 32 位元無號/有號整數實作
---
## [測驗 3](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-3)
:::info
- [x] 解釋下方程式運作原理;
- [x] 在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升;
- [ ] Linux 核心中也內建 GCD (而且還不只一種實作),例如 lib/math/gcd.c,請解釋其實作手法和探討在核心內的應用場景。
:::
完整程式碼
```c
#include <stdint.h>
uint64_t gcd64(uint64_t u, uint64_t v)
{
if (!u || !v) return u | v;
int shift;
for (shift = 0; !((u | v) & 1); shift++) {
u /= 2, v /= 2;
}
while (!(u & 1))
u /= 2;
do {
while (!(v & 1))
v /= 2;
if (u < v) {
v -= u;
} else {
uint64_t t = u - v;
u = v;
v = t;
}
} while (COND);
return RET;
}
```
任何一個變數為 0 則返回另一個非零的變數。如果兩個變數皆為偶數則兩個變數皆除以二,並且把除以二的次數記錄到 `shift`。
```c
if (!u || !v) return u | v;
int shift;
for (shift = 0; !((u | v) & 1); shift++) {
u /= 2, v /= 2;
}
```
如果 `u` 為偶數則除以二。 `do ... while` 的迴圈裡頭則是實作 [Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm#:~:text=In%20mathematics%2C%20the%20Euclidean%20algorithm,them%20both%20without%20a%20remainder.)。根據 Euclid's algorithm 的描述可以推斷`COND` 為 `u != v && u && v`,`RET` 為 `u << shift`。最後需要把一開始兩邊皆除以二的次數乘回 `u` 才會是最大公因數。
```c
while (!(u & 1))
u /= 2;
do {
while (!(v & 1))
v /= 2;
if (u < v) {
v -= u;
} else {
uint64_t t = u - v;
u = v;
v = t;
}
} while (u != v && u && v);
return u << shift;
```
:::danger
COND = v
不需要 u != v 因為如果 u == v,COND = v 會直接跳出 while loop。而 COND = v 是因為在做輾轉相除法時,只要當 u - v 為 0 時就會跳出 while loop。
:::
### 透過 `__builtin_ctz` 改寫 GCD
> Built-in Function: int __builtin_ctz (unsigned int x)
Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.
> Built-in Function: int __builtin_ctzll (unsigned long long)
Similar to __builtin_ctz, except the argument type is unsigned long long.
根據 gcc 官方文件的描述,可以透過 `__builtin_ctz` 來取得無號數 `x` 的 trailing 0 bits,換句話說即等於 `shift` 的值。而原本題目給定的變數是 `unsigned long long`,即替換 `__builtin_ctz` 為 `__builtin_ctzll`。
```c
#include <stdint.h>
#include <stdio.h>
#define MIN(a,b) (((a)<(b))?(a):(b))
uint64_t gcd64(uint64_t u, uint64_t v)
{
if (!u || !v) return u | v;
int u_ctz = __builtin_ctzll(u);
int v_ctz = __builtin_ctzll(v);
u >>= u_ctz;
v >>= v_ctz;
do {
if (u < v) {
v -= u;
} else {
uint64_t t = u - v;
u = v;
v = t;
}
} while (v);
return u << MIN(u_ctz, v_ctz);
}
```
---
## [測驗 4](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-4)
:::info
- [x] 解釋上述程式運作原理,並舉出這樣的程式碼用在哪些真實案例中;
- [ ] 設計實驗,檢驗 ctz/clz 改寫的程式碼相較原本的實作有多少改進?應考慮到不同的 bitmap density;
- [ ] 思考進一步的改進空間;
- [ ] 閱讀 Data Structures in the Linux Kernel 並舉出 Linux 核心使用 bitmap 的案例;
:::
其中第 9 行的作用是找出目前最低位元的 1,並紀錄到 t 變數。舉例來說,若目前 bitset 為 $000101_b$,那 t 就會變成 $000001_b$,接著就可以透過 XOR 把該位的 1 清掉,其他保留 (此為 XOR 的特性)。
請補完程式碼。書寫規範:
- bitwise 運算子和運算元之間用一個半形空白區隔,如: bitset | 0x1
- 考慮到 -bitwise 的特性
```c=
#include <stddef.h>
size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
size_t pos = 0;
uint64_t bitset;
for (size_t k = 0; k < bitmapsize; ++k) {
bitset = bitmap[k];
while (bitset != 0) {
uint64_t t = EXP6;
int r = __builtin_ctzll(bitset);
out[pos++] = k * 64 + r;
bitset ^= t;
}
}
return pos;
}
```
考慮到 `-bitset` 會對 `bitset` 做二的補數,透過下列程式範例理解:
```c
#include <stdio.h>
#include <stdint.h>
int main()
{
uint8_t biset = 5;
printf("%u\n", (int8_t)biset);
printf("%u\n", (uint8_t)-biset);
printf("%u\n", (uint8_t)-biset&biset);
return 0;
}
//5
//251
//1
```
:::danger
EXP6
bitset & -bitset
-bitset & bitset
:::
真實案例:Bloomfilter, Compression
## [測驗 5](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-5)
:::info
- [ ] 解釋上述程式碼運作原理,指出其中不足,並予以改進
例如,判斷負號只要寫作 bool isNegative = numerator < 0 ^ denominator < 0;
搭配研讀 The simple math behind decimal-binary conversion algorithms
- [ ] 在 Linux 核心原始程式碼的 mm/ 目錄 (即記憶體管理) 中,找到特定整數比值 (即 fraction) 和 bitwise 操作的程式碼案例,予以探討和解說其應用場景
:::
[LeetCode 166. Fraction to Recurring Decimal](https://leetcode.com/problems/fraction-to-recurring-decimal/)
給定 numerator (分子)以及 denominator (分母),返回小數的字串。如果遇到重複的小數迴圈,則把重複的部分以括號 `()` 表示。
從建立的 Hash table 中找尋特定的 key。
```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "list.h"
struct rem_node {
int key;
int index;
struct list_head link;
};
static int find(struct list_head *heads, int size, int key)
{
struct rem_node *node;
int hash = key % size;
list_for_each_entry (node, &heads[hash], link) {
if (key == node->key)
return node->index;
}
return -1;
}
char *fractionToDecimal(int numerator, int denominator)
{
```
分母為零,則直接返回 `\0` (null terminated string);分子為零,則返回 `0`。雖然在 Leetcode 題目中有提到輸入的分母並不會為 0,但保險起見,還是檢查分母是否為零以免出現非預期行為。
> Constraints
> - -2^31 <= numerator, denominator <= 2^31 - 1
> - denominator != 0
根據 C 語言規格書,第二個運算元如果為 0 則此行為是 `undefined`。
> The result of the / operator is the quotient from the division of the first operand by the
second; the result of the % operator is the remainder. In both operations, if the value of
the second operand is zero, the behavior is undefined.
```c
int size = 1024;
char *result = malloc(size);
char *p = result;
if (denominator == 0) {
result[0] = '\0';
return result;
}
if (numerator == 0) {
result[0] = '0';
result[1] = '\0';
return result;
}
/* using long long type make sure there has no integer overflow */
long long n = numerator;
long long d = denominator;
```
> 6.5.3.3 Unary arithmetic operators
> The result of the unary - operator is the negative of its (promoted) operand. The integer
promotions are performed on the operand, and the result has the promoted type.
Promote 在此解釋為 **型別提升**。例如 `float, double, long double` 皆為浮點數,假設 `float` 跟 `double` 做運算,需要先對 `float` 做型別提升,否則運算的結果會不如預期,因為 `float` 跟 `double` 所佔有的位元組空間不同。
```c
/* deal with negtive cases */
if (n < 0)
n = -n;
if (d < 0)
d = -d;
```
> 6.5.3.4 The sizeof and _Alignof operators
> The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type
判別小數為正數或是負數。`p` 為 pointer to char,在 x84_64 位元的儲存空間為 8 bytes。
> 6.5.3.2 Address and indirection operators
> The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’.
根據以上敘述,`p` 為 pointer to char type 則 `*p` 為 char type。如果 `sign` 為負數,則把 `'-'` 寫入至第一個 char slot,即 `result[0]`,並且把 `p` 指標指向 `result[1]`。
```c
bool sign = (float) numerator / denominator >= 0;
if (!sign)
*p++ = '-';
```
可以透過下方的小程式了解其中的關係:
```c
#include <stdio.h>
#include <stdlib.h>
int main()
{
int size = 1024;
char *result = malloc(size);
char *p = result;
printf("result[0]'s value is %c \n", result[0]);
printf("result[0]'s address %p \n", &result[0]);
printf("*p's value is %c \n", *p);
printf("*p's address %p \n", &(*p));
*p++ = 'a';
printf("result[0]'s value is %c \n", result[0]);
printf("*p's address %p \n", &(*p));
printf("result[1]'s address %p \n", &result[1]);
return 0;
}
```
可以看到輸出結果如我們預期,一開始 `p` 是指向 `result[0]`。`*p++` 是指派 `a` 至 `result[0]`,再指派 `p` 的指標至下一個 `char` 的起始位置。
```shell
result[0]'s value is
result[0]'s address 0x55bda8fab2a0
*p's value is
*p's address 0x55bda8fab2a0
result[0]'s value is a
*p's address 0x55bda8fab2a1
result[1]'s address 0x55bda8fab2a1
```
計算除完的結果 (division) 以及餘數 (remainder),如果整除則直接返回 division。
```c
long long remainder = n % d;
long long division = n / d;
```
> 7.19.6.6 The sprintf function
```c
#include <stdio.h>
int sprintf(char * restrict s,
const char * restrict format, ...);
```
> 2 The sprintf function is equivalent to fprintf, except that the output is written into
an array (specified by the argument s) rather than to a stream. A null character is written
at the end of the characters written; it is not counted as part of the returned value. If
copying takes place between objects that overlap, the behavior is undefined.
3 The sprintf function returns the number of characters written in the array, not
counting the terminating null character, or a negative value if an encoding error occurred.
> 6.5.2.1 Array subscripting
> A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently,apointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
`sprintf` 將會把 `division` 寫入至 `p (pointer to char)` 現在的位置,也就是 `result[0]` (正數)或是 `result[1]` (負數),並再最後加入 `.` 字元。
```c
sprintf(p, "%ld", division > 0 ? (long) division : (long) -division);
if (remainder == 0)
return result;
p = result + strlen(result);
*p++ = '.';
```
初始化 hash table 的各個 hash slots。
```c
/* Using a map to record all of reminders and their position.
* if the reminder appeared before, which means the repeated loop begin,
*/
char *decimal = malloc(size);
memset(decimal, 0, size);
char *q = decimal;
size = 1333;
struct list_head *heads = malloc(size * sizeof(*heads));
for (int i = 0; i < size; i++)
INIT_LIST_HEAD(&heads[i]);
```
可以判斷如果 `pos` 小於 0 則表示沒有在 hash table 找到 `remainder`,所以需要把目前 `remainder` 的值加入至目前 hash slot 所處的 linked list 中。而 `find()` 中的 hash function 為 `int hash = key % size;`
根據上述的描述,可以推測 `MMM` 為 `list_add`,而 `EEE` 為 `&heads[remainder % size]`。
```c
int pos = find(heads, size, remainder);
if (pos >= 0) {
...
}
struct rem_node *node = malloc(sizeof(*node));
node->key = remainder;
node->index = i;
MMM(&node->link, EEE);
*q++ = (remainder * 10) / d + '0';
remainder = (remainder * 10) % d;
```
如果 pos 有值則表示為有循環小數,所以需要先把循環小數前的數寫到 `p`,故 `PPP = pos--`。
```c
for (int i = 0; remainder; i++) {
int pos = find(heads, size, remainder);
if (pos >= 0) {
while (PPP > 0)
*p++ = *decimal++;
*p++ = '(';
while (*decimal != '\0')
*p++ = *decimal++;
*p++ = ')';
*p = '\0';
return result;
}
...
}
strcpy(p, decimal);
return result;
}
```
---
## [測驗 6](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-6)
:::info
- [x] 請寫出 M & X 並解釋上述程式碼運作原理
- [ ] 在 Linux 核心原始程式碼中找出 __alignof__ 的使用案例 2 則,並針對其場景進行解說
- [ ] 在 Linux 核心源使程式碼找出 ALIGN, ALIGN_DOWN, ALIGN_UP 等巨集,探討其實作機制和用途,並舉例探討 (可和上述第二點的案例重複)。思索能否對 Linux 核心提交貢獻,儘量共用相同的巨集
:::
```c
/*
* ALIGNOF - get the alignment of a type
* @t: the type to test
*
* This returns a safe alignment for the given type.
*/
#define ALIGNOF(t) \
((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X)
```
`ALIGNOF(t)` 即為計算出 type t 的 alignment。
> 6.3.2.3 Pointers
> An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.5
```c
((struct { char c; t _h; } *)0 // null pointer contant
&((struct { char c; t _h; } *)0->M // dereference M(_h)
(char *)&((struct { char c; t _h; } *)0->M // cast to pointer to char(_h)
```
下列相減要得出 type t 的 alignment。兩項相減需要為 type t 的 alignment,故 X 為 0。
```c
((char *)(...) - (char *)X)
```
可以透過下列實驗得證,為什麼 dereference type t 的記憶體位置相減會得到 type t 的 alignment。
```c
#include <stdio.h>
#define ALIGNOF(t) \
((char *)(&((struct { char c; t _h; } *)0)->_h) - (char *)0)
int main()
{
struct foo {
char c; // 1 byte
// 1 byte padding
short b; // 2 byte
};
struct foo1 {
char c; // 1 byte
// 7 bytes padding
long b; // 8 byte
};
struct foo f;
struct foo1 f1;
printf("size of struct foo: %lu\n", sizeof(f));
printf("%p\n", &f.c);
printf("%p\n", &f.b);
printf("gcc built-in alignof: %lu\n", __alignof__(short));
printf("ALIGNOF: %lu\n", ALIGNOF(short));
printf("size of struct foo1: %lu\n", sizeof(f1));
printf("%p\n", &f1.c);
printf("%p\n", &f1.b);
printf("gcc built-in alignof: %lu\n", __alignof__(long));
printf("ALIGNOF: %lu\n", ALIGNOF(long));
return 0;
}
/*
size of struct foo: 4
0x7ffdfd970c4c
0x7ffdfd970c4e
gcc built-in alignof: 2
ALIGNOF: 2
size of struct foo1: 16
0x7ffdfd970c50
0x7ffdfd970c58
gcc built-in alignof: 8
ALIGNOF: 8
*/
```
--
## [測驗 7](https://hackmd.io/@sysprog/linux2022-quiz2#%E6%B8%AC%E9%A9%97-7)
:::info
- [ ] 解釋上述程式運作原理並評估 naive.c 和 bitwise.c 效能落差
- 避免 stream I/O 帶來的影響,可將 printf 更換為 sprintf
- [ ] 分析 Faster remainders when the divisor is a constant: beating compilers and libdivide 一文的想法 (可參照同一篇網誌下方的評論),並設計另一種 bitmask,如「可被 3 整除則末位設為 1」「可被 5 整除則倒數第二位設定為 1」,然後再改寫 bitwise.c 程式碼,試圖運用更少的指令來實作出 branchless;
- 參照 fastmod: A header file for fast 32-bit division remainders on 64-bit hardware
- [ ] 研讀 The Fastest FizzBuzz Implementation 及其 Hacker News 討論,提出 throughput (吞吐量) 更高的 Fizzbuzz 實作
- [ ] 解析 Linux 核心原始程式碼 kernel/time/timekeeping.c 裡頭涉及到除法運算的機制,探討其快速除法的實作 (注意: 你可能要對照研讀 kernel/time/ 目錄的標頭檔和程式碼)過程中,你可能會發現可貢獻到 Linux 核心的空間,請充分討論
:::
以下是利用 bitwise 和上述技巧實作的 FizzBuzz 程式碼: (bitwise.c)
`KK1 = div5`, `KK2 = div3`, `KK3 = div3 << 2`
```c
static inline bool is_divisible(uint32_t n, uint64_t M)
{
return n * M <= M - 1;
}
static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1;
static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1;
int main(int argc, char **argv)
{
for (size_t i = 1; i <= 100; i++) {
uint8_t div3 = is_divisible(i, M3);
uint8_t div5 = is_divisible(i, M5);
unsigned int length = (2 << KK1) << KK2;
char fmt[9];
strncpy(fmt, &"FizzBuzz%u"[(8 >> div5) >> (KK3)], length);
fmt[length] = '\0';
printf(fmt, i);
printf("\n");
}
return 0;
}
```