Try   HackMD

2022q1 Homework2 (quiz2)

contributed by < leewei05 >

tags: linux2022

測驗 1

  • 解釋下方程式碼運作的原理
  • 比較下方實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀 (可研讀 CS:APP 第 3 章)
  • 研讀 Linux 核心原始程式碼 include/linux/average.h,探討其 Exponentially weighted moving average (EWMA) 實作,以及在 Linux 核心的應用

移動平均(Moving average),又稱滾動平均值、滑動平均,在統計學中是種藉由建立整個資料集合中不同子集的一系列平均數,來分析資料點的計算方法。
移動平均通常與時間序列資料一起使用,以消除短期波動,突出長期趨勢或周期。短期和長期之間的閾值取決於應用,移動平均的參數將相應地設置。例如,它通常用於對財務數據進行技術分析,如股票價格、收益率或交易量。它也用於經濟學中研究國內生產總值、就業或其他宏觀經濟時間序列。

這個直覺的解法會有 overflow 的問題,若我們已知 a, b 數值的大小,可用下方程式避免 overflow:

#include <stdint.h>
uint32_t average(uint32_t low, uint32_t high)
{
    return low + (high - low) / 2;
}

接著我們可改寫為以下等價的實作:

下列程式碼 a 與 b 分別先除 2 再相加,當兩個數值皆為奇數時平均數會少加 1,故透過 a & b & 1 來判斷是否把少加的 1 加回去。

#include <stdint.h>
uint32_t average(uint32_t a, uint32_t b)
{
    //                            EXP1
    return (a >> 1) + (b >> 1) + (a & b & 1);
    // a = 6, b = 4, avg = 3 + 2 + 0 = 5
    // a = 6, b = 3, avg = 3 + 1 + 0 = 4
    // a = 5, b = 3, avg = 2 + 1 + 1 = 4
}

根據數值系統的 bit-wise operator 筆記,運用加法器的邏輯思考。兩數的和為 a ^ b,兩數的進位為 (a & b) << 1,可得 a + b = a ^ b + (a & b) << 1

(a + b) / 2 = (a ^ b + (a & b) << 1) >> 1
            = (a & b) + ((a ^ b) >> 1)

EXP2a & bEXP3a ^ b

uint32_t average(uint32_t a, uint32_t b)
{
    //      EXP2       EXP3
    return (a & b) + ((a ^ b) >> 1);
}

編譯器輸出結果分析

環境:Ubuntu 20.04.3 LTS
編譯器版本:gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

// avg.c
#include <stdio.h>
#include <stdint.h>

uint32_t average(uint32_t low, uint32_t high)
{
    return low + (high - low) / 2;
}

uint32_t average_bitwise(uint32_t a, uint32_t b)
{
    return (a >> 1) + (b >> 1) + (a & b & 1);
}

uint32_t average_bitadder(uint32_t a, uint32_t b)
{
    return (a & b) + ((a ^ b) >> 1);
}

int main()
{
    int low = 3;
    int high = 5;
    printf("average is %u", average(low, high));

    return 0;
}

gdb 反組譯結果,或是可以透過 gcc -Og -S avg.c 輸出組合語言。gcc 預設是匯出 ATT 格式, Intel 格式的則是可以透過 -masm=intel 來產生。

$ gcc -g -o avg -O2 avg.c

$ gdb -q avg
Reading symbols from avg...
(No debugging symbols found in avg)
(gdb) break average
Breakpoint 1 at 0x1149
(gdb) run
Starting program: /home/lee/dev/playground/average/avg

Breakpoint 1, average (low=21845, high=1431654832) at avg.c:5
5       {
(gdb) disassemble
Dump of assembler code for function average:
=> 0x0000555555555149 <+0>:     endbr64
   ...
   0x0000555555555167 <+30>:    retq
End of assembler dump.

閱讀 Intel IA-32 規格書 以及 CS:APP 第三章並試著解讀以下的組合語言指令。首先分析 avg.s 的內容,LFB 以及 LFE 分別是 gcc 編譯時根據 DWARF(除錯輸出格式) 所產生的 label (.Llocal label),分別代表函式的開始以及結束。

Local symbols are defined and used within the assembler, but they are normally not saved in object files. Thus, they are not visible when debugging.

因為 gcc -S 只會編譯 (compile) 程式碼並輸出組合語言,尚未進行組譯 (assemble) 以及連結 (link),所以可以直接看到組合語言用到的 local label。

# avg.s
average:
.LFB23:
        ...
.LFE23:

.cfi 代表 DWARF 所定義的 Call Frame Instruction(CFI)

Debuggers often need to be able to view and modify the state of any subroutine activation that is on the call stack. An activation consists of:
• A code location that is within the subroutine. This location is either the place where the
program stopped when the debugger got control (e.g. a breakpoint), or is a place where a
subroutine made a call or was interrupted by an asynchronous event (e.g. a signal).
• An area of memory that is allocated on a stack called a ‘‘call frame.’’ The call frame is
identified by an address on the stack. We refer to this address as the Canonical Frame
Address or CFA.
• A set of registers that are in use by the subroutine at the code location.

根據以上的描述,可以理解成 CFI 為一個子程序 (subroutine) 的起始或是結束觸發點。一個配置於 stack memory 的記憶體區塊稱作 call frame,而此 call frame 的記憶體位址我們稱作 Canonical Frame Address(CFA)。回到下方的組合語言,.cfi-startproc 以及 .cfi-endproc 分別是組合語言中定義的 CFI directives (指令),代表著函式的起始以及結束。

.cfi_startproc is used at the beginning of each function that should have an entry in .eh_frame. It initializes some internal data structures. Don’t forget to close the function by .cfi_endproc.

# avg.s
average:
.LFB23:
        .cfi_startproc
        endbr64
        ...
        .cfi_endproc
.LFE23:

在了解 endbr64 指令以前,需要先知道 ROP 以及 COP/JOP 這兩項攻擊手法。黑客可以藉由上述提到的手法竄改原先的指令(e.g. RET, CALL, JUMP)來操作既有的指令。而 Intel 處理器提供的 Control-Flow Enforcement Technology (CET) 則是用來防範上述提到的攻擊,以下是 CET 所提供的兩項主要功能:

  • Shadow stack: Return address protection to defend against ROP
  • Indirect branch tracking: Free branch protection to defend against COP/JOP

endbr64 則是隸屬於 Indirect branch tracking 的指令。

18.1.2 Indirect Branch Tracking

The ENDBRANCH instruction is a new instruction that is used to mark valid jump target addresses of indirect calls and jumps in the program. The CPU implements a state machine that tracks indirect JMP and CALL instructions. When one of these instructions is executed, the state machine moves from IDLE to WAIT_FOR_ENDBRANCH state. In WAIT_FOR_ENDBRANCH state the next instruction in the program stream must be an ENDBRANCH. If the next instruction is not an ENDBRANCH, the processor causes a control protection exception (#CP); otherwise, the state machine moves back to IDLE state.

接著我們看 average 函式當中的指令。mov

0000000000000000 <average>:
   0:   f3 0f 1e fa             endbr64
   4:   89 f0                   mov    %esi,%eax
   6:   29 f8                   sub    %edi,%eax
   8:   d1 e8                   shr    %eax
   a:   01 f8                   add    %edi,%eax
   c:   c3                      retq
   d:   0f 1f 00                nopl   (%rax)
0000000000000010 <average_bitwise>:
  10:   f3 0f 1e fa             endbr64
  14:   89 f8                   mov    %edi,%eax
  16:   89 f2                   mov    %esi,%edx
  18:   21 f7                   and    %esi,%edi
  1a:   d1 e8                   shr    %eax
  1c:   d1 ea                   shr    %edx
  1e:   83 e7 01                and    $0x1,%edi
  21:   01 d0                   add    %edx,%eax
  23:   01 f8                   add    %edi,%eax
  25:   c3                      retq
  26:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  2d:   00 00 00

0000000000000030 <average_bitadder>:
  30:   f3 0f 1e fa             endbr64
  34:   89 f8                   mov    %edi,%eax
  36:   21 f7                   and    %esi,%edi
  38:   31 f0                   xor    %esi,%eax
  3a:   d1 e8                   shr    %eax
  3c:   01 f8                   add    %edi,%eax
  3e:   c3                      retq

測驗 2

延伸問題:

  • 解釋下方程式碼運作的原理
  • 針對 32 位元無號/有號整數,撰寫同樣 branchless 的實作
  • Linux 核心也有若干 branchless / branch-free 的實作,例如 lib/sort.c:
/*
 * Logically, we're doing "if (i & lsbit) i -= size;", but since the
 * branch is unpredictable, it's done with a bit of clever branch-free
 * code instead.
 */
__attribute_const__ __always_inline
static size_t parent(size_t i, unsigned int lsbit, size_t size)
{
    i -= size;
    i -= size & -(i & lsbit);
    return i / 2;
}

請在 Linux 核心原始程式碼中,找出更多類似的實作手法。請善用 git log 檢索。

int32_t min(int32_t a, int32_t b) {
    int32_t diff = (a - b);
    return b + (diff & (diff >> 31));
}

改寫〈解讀計算機編碼〉一文的「不需要分支的設計」一節提供的程式碼 min,我們得到以下實作 (max):

EXP4 為 a 和 b 進行某種特別 bitwise 操作,限定用 OR, AND, XOR 這三個運算子
EXP5 為 a 和 b 的比較操作,亦即 logical operator,限定用 >, <, ==, >=, <=, !=

#include <stdint.h>
uint32_t max(uint32_t a, uint32_t b)
{
    return a ^ ((EXP4) & -(EXP5));
}
  • 假設 a 是比較大的值,右邊的 ((EXP4) & -(EXP5)) 會需要等於 0,因為 a ^ 0 = a
  • 假設 b 是比較大的值,右邊的 ((EXP4) & -(EXP5)) 會需要是 b - a

6.5.8 Relational operators
Each of the operators < (less than), > (greater than), <= (less than or equal to), and >=
(greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is false.92)
The result has type int.

根據上述規格書的內容,EXP5 只會返回 1 或是 0。

  • 假設 EXP5 == 1,則可以得到 ((EXP4) & -1),亦等於 ((EXP4) & 0xFFFF)
  • 假設 EXP5 == 0,則可以得到 ((EXP4) & 0x0000)。從第二個條件回推,任何數跟 0 做 AND 運算會等於 0,條件二即符合 max = a

根據現有條件,可以推測出 EXP5a <= b,如果 a 等於 b 那直接回傳 a 也合理。但題目有說到以最精簡的方式撰寫程式碼,那其實 a < b 即可。EXP4a ^ b 也等同於 b - a,任何數跟 0xFFFF 做 AND 運算都會返回自己。

#include <stdint.h>
uint32_t max(uint32_t a, uint32_t b)
{
    return a ^ ((a ^ b) & -(a < b));
}

32 位元無號/有號整數實作


測驗 3

  • 解釋下方程式運作原理;
  • 在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升;
  • Linux 核心中也內建 GCD (而且還不只一種實作),例如 lib/math/gcd.c,請解釋其實作手法和探討在核心內的應用場景。

完整程式碼

#include <stdint.h>
uint64_t gcd64(uint64_t u, uint64_t v)
{
    if (!u || !v) return u | v;
    int shift;
    for (shift = 0; !((u | v) & 1); shift++) {
        u /= 2, v /= 2;
    }
    while (!(u & 1))
        u /= 2;
    do {
        while (!(v & 1))
            v /= 2;
        if (u < v) {
            v -= u;
        } else {
            uint64_t t = u - v;
            u = v;
            v = t;
        }
    } while (COND);
    return RET;
}

任何一個變數為 0 則返回另一個非零的變數。如果兩個變數皆為偶數則兩個變數皆除以二,並且把除以二的次數記錄到 shift

    if (!u || !v) return u | v;
    int shift;
    for (shift = 0; !((u | v) & 1); shift++) {
        u /= 2, v /= 2;
    }

如果 u 為偶數則除以二。 do ... while 的迴圈裡頭則是實作 Euclid's algorithm。根據 Euclid's algorithm 的描述可以推斷CONDu != v && u && vRETu << shift。最後需要把一開始兩邊皆除以二的次數乘回 u 才會是最大公因數。

    while (!(u & 1))
        u /= 2;
    do {
        while (!(v & 1))
            v /= 2;
        if (u < v) {
            v -= u;
        } else {
            uint64_t t = u - v;
            u = v;
            v = t;
        }
    } while (u != v && u && v);
    return u << shift;

COND = v

不需要 u != v 因為如果 u == v,COND = v 會直接跳出 while loop。而 COND = v 是因為在做輾轉相除法時,只要當 u - v 為 0 時就會跳出 while loop。

透過 __builtin_ctz 改寫 GCD

Built-in Function: int __builtin_ctz (unsigned int x)
Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.
Built-in Function: int __builtin_ctzll (unsigned long long)
Similar to __builtin_ctz, except the argument type is unsigned long long.

根據 gcc 官方文件的描述,可以透過 __builtin_ctz 來取得無號數 x 的 trailing 0 bits,換句話說即等於 shift 的值。而原本題目給定的變數是 unsigned long long,即替換 __builtin_ctz__builtin_ctzll

#include <stdint.h>
#include <stdio.h>
#define MIN(a,b) (((a)<(b))?(a):(b))

uint64_t gcd64(uint64_t u, uint64_t v)
{
    if (!u || !v) return u | v;
    int u_ctz = __builtin_ctzll(u);
    int v_ctz = __builtin_ctzll(v);
    u >>= u_ctz; 
    v >>= v_ctz;
    do {
        if (u < v) {
            v -= u;
        } else {
            uint64_t t = u - v;
            u = v;
            v = t;
        }
    } while (v);
    return u << MIN(u_ctz, v_ctz);
}

測驗 4

  • 解釋上述程式運作原理,並舉出這樣的程式碼用在哪些真實案例中;
  • 設計實驗,檢驗 ctz/clz 改寫的程式碼相較原本的實作有多少改進?應考慮到不同的 bitmap density;
  • 思考進一步的改進空間;
  • 閱讀 Data Structures in the Linux Kernel 並舉出 Linux 核心使用 bitmap 的案例;

其中第 9 行的作用是找出目前最低位元的 1,並紀錄到 t 變數。舉例來說,若目前 bitset 為

000101b,那 t 就會變成
000001b
,接著就可以透過 XOR 把該位的 1 清掉,其他保留 (此為 XOR 的特性)。

請補完程式碼。書寫規範:

  • bitwise 運算子和運算元之間用一個半形空白區隔,如: bitset | 0x1
  • 考慮到 -bitwise 的特性
#include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = EXP6; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; }

考慮到 -bitset 會對 bitset 做二的補數,透過下列程式範例理解:

#include <stdio.h>
#include <stdint.h>

int main()
{
    uint8_t biset = 5;
    printf("%u\n", (int8_t)biset);
    printf("%u\n", (uint8_t)-biset);
    printf("%u\n", (uint8_t)-biset&biset);
    return 0;
}

//5
//251
//1

EXP6
bitset & -bitset
-bitset & bitset

真實案例:Bloomfilter, Compression

測驗 5

  • 解釋上述程式碼運作原理,指出其中不足,並予以改進
    例如,判斷負號只要寫作 bool isNegative = numerator < 0 ^ denominator < 0;
    搭配研讀 The simple math behind decimal-binary conversion algorithms
  • 在 Linux 核心原始程式碼的 mm/ 目錄 (即記憶體管理) 中,找到特定整數比值 (即 fraction) 和 bitwise 操作的程式碼案例,予以探討和解說其應用場景

LeetCode 166. Fraction to Recurring Decimal

給定 numerator (分子)以及 denominator (分母),返回小數的字串。如果遇到重複的小數迴圈,則把重複的部分以括號 () 表示。

從建立的 Hash table 中找尋特定的 key。

#include <stdbool.h>                       
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "list.h"
    
struct rem_node {
    int key;
    int index;
    struct list_head link;
};  
    
static int find(struct list_head *heads, int size, int key)
{
    struct rem_node *node;
    int hash = key % size;
    list_for_each_entry (node, &heads[hash], link) {
        if (key == node->key)
            return node->index;
    }
    return -1;
}
char *fractionToDecimal(int numerator, int denominator)
{

分母為零,則直接返回 \0 (null terminated string);分子為零,則返回 0。雖然在 Leetcode 題目中有提到輸入的分母並不會為 0,但保險起見,還是檢查分母是否為零以免出現非預期行為。

Constraints

  • -2^31 <= numerator, denominator <= 2^31 - 1
  • denominator != 0

根據 C 語言規格書,第二個運算元如果為 0 則此行為是 undefined

The result of the / operator is the quotient from the division of the first operand by the
second; the result of the % operator is the remainder. In both operations, if the value of
the second operand is zero, the behavior is undefined.

    int size = 1024;
    char *result = malloc(size);
    char *p = result;

    if (denominator == 0) {
        result[0] = '\0';
        return result;
    }

    if (numerator == 0) {
        result[0] = '0';
        result[1] = '\0';
        return result;
    }

    /* using long long type make sure there has no integer overflow */
    long long n = numerator;
    long long d = denominator;

6.5.3.3 Unary arithmetic operators
The result of the unary - operator is the negative of its (promoted) operand. The integer
promotions are performed on the operand, and the result has the promoted type.

Promote 在此解釋為 型別提升。例如 float, double, long double 皆為浮點數,假設 floatdouble 做運算,需要先對 float 做型別提升,否則運算的結果會不如預期,因為 floatdouble 所佔有的位元組空間不同。

    /* deal with negtive cases */
    if (n < 0)
        n = -n;
    if (d < 0)
        d = -d;

6.5.3.4 The sizeof and _Alignof operators
The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type

判別小數為正數或是負數。p 為 pointer to char,在 x84_64 位元的儲存空間為 8 bytes。

6.5.3.2 Address and indirection operators
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’.

根據以上敘述,p 為 pointer to char type 則 *p 為 char type。如果 sign 為負數,則把 '-' 寫入至第一個 char slot,即 result[0],並且把 p 指標指向 result[1]

    bool sign = (float) numerator / denominator >= 0;
    if (!sign)
        *p++ = '-';

可以透過下方的小程式了解其中的關係:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int size = 1024;
    char *result = malloc(size);
    char *p = result;

    printf("result[0]'s value is %c \n", result[0]);
    printf("result[0]'s address %p \n", &result[0]);
    printf("*p's value is %c \n", *p);
    printf("*p's address %p \n", &(*p));
    *p++ = 'a';
    printf("result[0]'s value is %c \n", result[0]);
    printf("*p's address %p \n", &(*p));
    printf("result[1]'s address %p \n", &result[1]);

    return 0;
}

可以看到輸出結果如我們預期,一開始 p 是指向 result[0]*p++ 是指派 aresult[0],再指派 p 的指標至下一個 char 的起始位置。

result[0]'s value is  
result[0]'s address 0x55bda8fab2a0 
*p's value is  
*p's address 0x55bda8fab2a0 
result[0]'s value is a 
*p's address 0x55bda8fab2a1 
result[1]'s address 0x55bda8fab2a1 

計算除完的結果 (division) 以及餘數 (remainder),如果整除則直接返回 division。

    long long remainder = n % d;
    long long division = n / d;

7.19.6.6 The sprintf function

#include <stdio.h>
int sprintf(char * restrict s,
const char * restrict format, ...);

2 The sprintf function is equivalent to fprintf, except that the output is written into
an array (specified by the argument s) rather than to a stream. A null character is written
at the end of the characters written; it is not counted as part of the returned value. If
copying takes place between objects that overlap, the behavior is undefined.
3 The sprintf function returns the number of characters written in the array, not
counting the terminating null character, or a negative value if an encoding error occurred.

6.5.2.1 Array subscripting
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently,apointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).

sprintf 將會把 division 寫入至 p (pointer to char) 現在的位置,也就是 result[0] (正數)或是 result[1] (負數),並再最後加入 . 字元。

    sprintf(p, "%ld", division > 0 ? (long) division : (long) -division);
    if (remainder == 0)
        return result;

    p = result + strlen(result);
    *p++ = '.';

初始化 hash table 的各個 hash slots。

    /* Using a map to record all of reminders and their position.
     * if the reminder appeared before, which means the repeated loop begin,
     */
    char *decimal = malloc(size);
    memset(decimal, 0, size);
    char *q = decimal;

    size = 1333;
    struct list_head *heads = malloc(size * sizeof(*heads));
    for (int i = 0; i < size; i++)
        INIT_LIST_HEAD(&heads[i]);

可以判斷如果 pos 小於 0 則表示沒有在 hash table 找到 remainder,所以需要把目前 remainder 的值加入至目前 hash slot 所處的 linked list 中。而 find() 中的 hash function 為 int hash = key % size;

根據上述的描述,可以推測 MMMlist_add,而 EEE&heads[remainder % size]

        int pos = find(heads, size, remainder);
        if (pos >= 0) {
            ...
        }
        struct rem_node *node = malloc(sizeof(*node));
        node->key = remainder;
        node->index = i;

        MMM(&node->link, EEE);

        *q++ = (remainder * 10) / d + '0';
        remainder = (remainder * 10) % d;

如果 pos 有值則表示為有循環小數,所以需要先把循環小數前的數寫到 p,故 PPP = pos--

    for (int i = 0; remainder; i++) {
        int pos = find(heads, size, remainder);
        if (pos >= 0) {
            while (PPP > 0)
                *p++ = *decimal++;
            *p++ = '(';
            while (*decimal != '\0')
                *p++ = *decimal++;
            *p++ = ')';
            *p = '\0';
            return result;
        }
        ...
    }

    strcpy(p, decimal);
    return result;
}

測驗 6

  • 請寫出 M & X 並解釋上述程式碼運作原理
  • 在 Linux 核心原始程式碼中找出 alignof 的使用案例 2 則,並針對其場景進行解說
  • 在 Linux 核心源使程式碼找出 ALIGN, ALIGN_DOWN, ALIGN_UP 等巨集,探討其實作機制和用途,並舉例探討 (可和上述第二點的案例重複)。思索能否對 Linux 核心提交貢獻,儘量共用相同的巨集
/*
 * ALIGNOF - get the alignment of a type
 * @t: the type to test
 *
 * This returns a safe alignment for the given type.
 */
#define ALIGNOF(t) \
    ((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X)

ALIGNOF(t) 即為計算出 type t 的 alignment。

6.3.2.3 Pointers
An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.5

        ((struct { char c; t _h; } *)0     // null pointer contant
        &((struct { char c; t _h; } *)0->M // dereference M(_h)
(char *)&((struct { char c; t _h; } *)0->M // cast to pointer to char(_h)

下列相減要得出 type t 的 alignment。兩項相減需要為 type t 的 alignment,故 X 為 0。

((char *)(...) - (char *)X)

可以透過下列實驗得證,為什麼 dereference type t 的記憶體位置相減會得到 type t 的 alignment。

#include <stdio.h>

#define ALIGNOF(t) \
    ((char *)(&((struct { char c; t _h; } *)0)->_h) - (char *)0)

int main()
{
    struct foo {
        char c; // 1 byte
        // 1 byte padding
        short b; // 2 byte
    };
    
    struct foo1 {
        char c; // 1 byte
        // 7 bytes padding
        long b; // 8 byte
    };
    
    struct foo f;
    struct foo1 f1;
    printf("size of struct foo: %lu\n", sizeof(f));
    printf("%p\n", &f.c);
    printf("%p\n", &f.b);
    printf("gcc built-in alignof: %lu\n", __alignof__(short));
    printf("ALIGNOF: %lu\n", ALIGNOF(short));
    
    printf("size of struct foo1: %lu\n", sizeof(f1));
    printf("%p\n", &f1.c);
    printf("%p\n", &f1.b);
    printf("gcc built-in alignof: %lu\n", __alignof__(long));
    printf("ALIGNOF: %lu\n", ALIGNOF(long));

    return 0;
}

/*
size of struct foo: 4
0x7ffdfd970c4c
0x7ffdfd970c4e
gcc built-in alignof: 2
ALIGNOF: 2

size of struct foo1: 16
0x7ffdfd970c50
0x7ffdfd970c58
gcc built-in alignof: 8
ALIGNOF: 8
*/

測驗 7

  • 解釋上述程式運作原理並評估 naive.c 和 bitwise.c 效能落差
    • 避免 stream I/O 帶來的影響,可將 printf 更換為 sprintf
  • 分析 Faster remainders when the divisor is a constant: beating compilers and libdivide 一文的想法 (可參照同一篇網誌下方的評論),並設計另一種 bitmask,如「可被 3 整除則末位設為 1」「可被 5 整除則倒數第二位設定為 1」,然後再改寫 bitwise.c 程式碼,試圖運用更少的指令來實作出 branchless;
    • 參照 fastmod: A header file for fast 32-bit division remainders on 64-bit hardware
  • 研讀 The Fastest FizzBuzz Implementation 及其 Hacker News 討論,提出 throughput (吞吐量) 更高的 Fizzbuzz 實作
  • 解析 Linux 核心原始程式碼 kernel/time/timekeeping.c 裡頭涉及到除法運算的機制,探討其快速除法的實作 (注意: 你可能要對照研讀 kernel/time/ 目錄的標頭檔和程式碼)過程中,你可能會發現可貢獻到 Linux 核心的空間,請充分討論

以下是利用 bitwise 和上述技巧實作的 FizzBuzz 程式碼: (bitwise.c)

KK1 = div5, KK2 = div3, KK3 = div3 << 2

static inline bool is_divisible(uint32_t n, uint64_t M)
{
    return n * M <= M - 1;
}

static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1;
static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1;

int main(int argc, char **argv)
{
    for (size_t i = 1; i <= 100; i++) {
        uint8_t div3 = is_divisible(i, M3);
        uint8_t div5 = is_divisible(i, M5);
        unsigned int length = (2 << KK1) << KK2;

        char fmt[9];
        strncpy(fmt, &"FizzBuzz%u"[(8 >> div5) >> (KK3)], length);
        fmt[length] = '\0';

        printf(fmt, i);
        printf("\n");
    }
    return 0;
}