2022q1 Homework2 (quiz2)

contributed by < robertnthu >

tags: `linux2022`

測驗 1

EXP1 = a & b & 1
EXP2 = a & b
EXP3 = a ^ b

解釋運作原理

EXP1

#include <stdint.h>
uint32_t average(uint32_t a, uint32_t b)
{
    return (a >> 1) + (b >> 1) + (EXP1);
}

因為 a >> 1就是 a / 2，b >> 1 就是 b / 2
只要 a 不是奇數且 b 不是奇數，那(a >> 1) + (b >> 1)都會是(a + b) / 2
如果 a ,b 都是奇數，那我們應該多加一個1在(a >> 1) + (b >> 1)後，其餘情況都是0
這個判斷可以用a & b & 1來實現
- a, b 的第一個 bit 都為1時，a & b & 1 = 1

EXP2, EXP3
參考：你所不知道的 C 語言：數值系統篇

用加法器來思考: x & y 是進位, x ^ y 是位元和, / 2 是向右移一位
位元相加不進位的總和: x ^ y
位元相加產生的進位值: (x & y) << 1
x + y = x ^ y + ( x & y ) << 1
所以 (x + y) / 2
= (x + y) >> 1
= (x ^ y + (x & y) << 1) >> 1
= (x & y) + ((x ^ y) >> 1)

比較上述實作在編譯器最佳化開啟的狀況，對應的組合語言輸出，並嘗試解讀

輸出組合語言

程式

#include <stdio.h>
#include <stdint.h>

unsigned int avg(unsigned int a, unsigned int b) {
   return a + (b - a) / 2;
}

int main(void) 
{
    unsigned int x, y;

    scanf("%u", &x);
    scanf("%u", &y);
    avg(x, y);
    return 0;
}

組合語言
利用 gcc -O2 -S test.c來編譯程式，-O2是開啟最佳化，取出程式片段

b + (a - b) / 2

	movl	%esi, %eax
	subl	%edi, %eax
	shrl	%eax
	addl	%edi, %eax

%esi 存的是 a 的值，把 a 的值放到 %eax, 而 b 放在 %edi
subl %edi %eax 是 %eax = %eax - %edi，也就是 a - b 的運算
shrl %eax shift right，是 (a - b) / 2 的運算
addl %edi, %eax再加上 b ，就完成了

(a >> 1) + (b >> 1) + (a & b & 1)

	movl	%edi, %eax
	movl	%esi, %edx
	andl	%esi, %edi
	shrl	%eax
	shrl	%edx
	andl	$1, %edi
	addl	%edx, %eax
	addl	%edi, %eax

(a & b) + ((a ^ b) >> 1)

	movl	%edi, %eax
	andl	%esi, %edi
	xorl	%esi, %eax
	shrl	%eax
	addl	%edi, %eax

以 instruction 數量來說，b + (a - b) / 2是最少的
但是它有一個addl以及subl指令
而(a & b) + ((a ^ b) >> 1)雖然多一個 instruction，但是只有一個addl指令。

研讀 Linux 核心原始程式碼

BUILD_BUG_ON: 參考linux tricks 之 BUILD_BUG_ON_ZERO.

include/linux/kernel.h
/* Force a compilation error if condition is true */
#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))

這個巨集是為了檢查 condition這個數，如果 condition == 0會回傳 1，若condition != 0則會回傳 -1

WRITE_ONCE

#define WRITE_ONCE(var, val)
(*((volatile typeof(val) *)(&(var))) = (val))

參考Linux内核的WRITE_ONCE函数分析，volatile可以確保這個指令不會因為編譯器最佳化而省略，為了解決編譯時同步問題的方法
average.h

需要三個參數，一個是要被拿來計算的資料結構的名字，一個是浮點數運算的精度，一個是計算 EWMA 時的加權係數

The third argument, the weight reciprocal, determines how the new values will be weighed vs. the old state, new values will get weight 1/weight_rcp and old values 1-1/weight_rcp. Note that this parameter must be a power of two for efficiency.

所以 new values 要乘上 1/weight_rcp，old values 要乘上 1-1/weight_rcp，兩個相加就是 EWMA。
這個程式是最主要的計算，internal 就是 old values

WRITE_ONCE(e->internal, internal ?			\
			(((internal << weight_rcp) - internal) +	\
				(val << precision)) >> weight_rcp :	\
			(val << precision));

為了計算方便，令 weight_rcp 是 2 的冪，這樣乘法運算只需要用 shift right 跟 shift left 就可以完成

測驗 2

EXP4 = a ^ b
EXP5 = a <= b

回顧解讀計算機編碼

int32_t min(int32_t a, int32_t b) {
    int32_t diff = (a - b);
    return b + (diff & (diff >> 31));
}

先計算 a, b 的 difference
(diff >> 31)
- 如果 diff 是正數，則得到 0 ; diff & (diff >> 31) 就是 0
- 如果 diff 是複數，則得到 -1 (0xFFFFFFFF)，diff & (diff >> 31) 就還是 diff
總結
- 如果 a > b ，則回傳 b，也就是 b +０
- 如果 a < b ，則回傳 a，也就是 b + (a - b) = b + diff = a

程式運作

題目要求是以下程式

#include <stdint.h>
uint32_t max(uint32_t a, uint32_t b)
{
    return a ^ ((EXP4) & -(EXP5));
}

接下來根據提示來回答，希望的操作是
max(a, b) = (a > b) ? a : b

EXP5 是 a, b 的比較操作
- 比較操作的回傳值是 0 或 1
- EXP4 & -(0) 等於 EXP4 & 0 = 0
- EXP4 & -(1) 等於 EXP4 & 0xFFFFFFFF = EXP4
  所以可以判斷出 EXP4 & ~(EXP5) 等於 EXP4 或 0
假設 EXP4 & -(EXP5) 等於 0，此時 EXP5 = 0
- 則回傳 a ^ 0 = a，所以是 a > b 的結果
- 我們希望 a > b 時，EXP5 = 0
- EXP5 : a <= b
假設EXP4 & -(EXP5)等於 EXP4，此時 EXP5 = 1
- 則回傳 a ^ EXP4
- 所以我們希望a ^ EXP4等於 b
- 因為 a ^ a = 0，且 0 ^ b = b
- 所以 EXP4 : a ^ b

針對 32 位元無號/有號整數的 branchless 實作

我認為在不考慮「uint32_t 會出現負數」的前提下，兩者的實作是一樣的。Assign 一個負數給 unsigned int，在 "<=" 的判斷會出錯。負數因為 sign bit 的關係，會判斷為「負數 > 正數」

在 Linux 核心原始程式碼中，找出更多類似的實作手法。

如何利用git log檢索

測驗 3

COND = v & (-1) 或 v | 0
RET = u << shift

程式運作

為何 binary 運算可以用減法代替 mod 運算?

我認為這是因為， mod 運算本身可以用減法實現。
如 37 mod 6，如果用%可以直接計算出餘數，但是如果用大減小的方式，37 - 6, 31 - 6, 25 - 6, …，最後一樣可以得到同樣的結果

uint64_t gcd64(uint64_t u, uint64_t v)
{
    if (!u || !v) return u | v; // if there's one 0, return u | v. Simplify the condition statement

    int shift;
    for (shift = 0; !((u | v) & 1); shift++) {
        u /= 2, v /= 2; // right shift until "u | v" has no 0 in 1-bit
    }
    while (!(u & 1)) // shift right until 1-bit in u is not 0
        u /= 2; // Because if u = 2^k, v = 2^h, we can divide 2^min(k, h)
    do {
        while (!(v & 1)) // shift right until 1-bit in v is not 0
            v /= 2;
        if (u < v) {
            v -= u; // if u < v, v = v - u
        } else { // u >= v
            uint64_t t = u - v;
            u = v; // u = v
            v = t; // v = u - v
        }
    } while (COND); // gcd will keep going until v == 0 or v == 1
    return RET; // it should be u, but i can't garantee
}

if (!u || !v) return u | v; 檢查 u, v 是否都非零
如果 u, v 的二進位有連續的 0，那就一起 shift，而最後還必須要把 shift 再乘回來，因為這也是最小公倍數的一部分

u = 1 1 0 0 0 0
v = 1 0 1 0 0 0
    ||
    \/
u = 1 1 0
v = 1 0 1

接著是輾轉相除法

while (!(u & 1))
        u /= 2;

為什麼要把 u, v 各自 shift 到 1-bit 不為 0?

因為一個奇數與一個偶數的最小公倍數，必定不會是偶數，所以可以直接把偶數 shift right，相當於除以2，直到出現奇數。
- 假設經過一開始的 shift 後，u = 10011, v = 11000，此時因為 u 是奇數，gcd(u, v)不可能會是2, 4, 8，所以 v 可以直接 shift right 3 bits

接著是一般的輾轉相除法，如果 u > v，則 gcd(u, v) = gcd(v, u - v)，後續的程式就是這裡的實作

if (u < v) {
    v -= u; // if u < v, v = v - u
} else { // u >= v
    uint64_t t = u - v;
    u = v; // u = v
    v = t; // v = u - v
}

因為上面的條件判斷方式，最後 v 一定是較小的數，所以迴圈的中止條件是 v == 0，用 bitwise 操作就是 v & (-1) 或是 v | 0。而中止迴圈後要回傳 u，再乘上一開始 shift 的 bits 數

uint64_t gcd64(uint64_t u, uint64_t v)
{
    if (!u || !v) return u | v; // if there's one 0, return u | v. Simplify the condition statement

    int shift;
    for (shift = 0; !((u | v) & 1); shift++) {
        u /= 2, v /= 2; // right shift until "u | v" has no 0 in 1-bit
    }
    while (!(u & 1)) // shift right until 1-bit in u is not 0
        u /= 2; // Because if u = 2^k, v = 2^h, we can divide 2^min(k, h)
    do {
        while (!(v & 1)) // shift right until 1-bit in v is not 0
            v /= 2;
        if (u < v) {
            v -= u; // if u < v, v = v - u
        } else { // u >= v
            uint64_t t = u - v;
            u = v; // u = v
            v = t; // v = u - v
        }
    } while (v & (-1)); // gcd will keep going until v == 0
    return u << shift;
}

在 x86_64 上透過 __builtin_ctz 改寫 GCD

int __builtin_ctz(unsigned int)
- This builtin method by GCC determines the count of trailing zero in the binary representation of a number.
- from Builtin GCC Functions - __builtin_clz(); __builtin_ctz(); __builtin_popcount();
所以我們可以利用 __builtin_ctz 來取代計算 trailing zero 的迴圈

計算 shift

for (shift = 0; !((u | v) & 1); shift++) {
    u /= 2, v /= 2; // right shift until "u | v" has no 0 in 1-bit
}

且這裡 u, v 的 right shift 可以移到後面再做，所以改寫為

shift = min(__builtin_ctz(u), __builtin_ctz(v));

使用先前「不需要分支的 min」

int64_t min(int64_t a, int64_t b) {
    int32_t diff = (a - b);
    return b + (diff & (diff >> 31));
}

後續的

while (!(u & 1)) // shift right until 1-bit in u is not 0
        u /= 2;

以及

while (!(v & 1)) // shift right until 1-bit in u is not 0
        v /= 2;

也改寫成

u >>= __builtin_ctz(u);

以及

v >>= __builtin_ctz(v);

簡化條件判斷
由於條件判斷的部份看起來可以更簡潔，配合先前的 min(), max() 改寫
最終程式

int64_t min(uint64_t a, uint64_t b) {
    int32_t diff = (a - b);
    return b + (diff & (diff >> 31));
}

int64_t max(uint64_t a, uint64_t b) {
    int32_t diff = (a - b);
    return a - (diff & (diff >> 31));
}

uint64_t gcd64(uint64_t u, uint64_t v)
{
    if (!u || !v) return u | v;
    
    int shift;
    shift = min(__builtin_ctz(u), __builtin_ctz(v));

    u >>= __builtin_ctz(u);
    do {
        v >>= __builtin_ctz(v);
        
        int64_t tmp = min(u, v);
        v = max(u, v) - tmp;
        u = tmp;
    } while (v & (-1)); // gcd will keep going until v == 0
    return u << shift;
}

測試效能

未完成

lib/math/gcd.c

__ffs 作用跟__builtin_ctz一樣，參考__ffs.h
- 但是定在 <string.h> 的 ffs跟 __ffs.h裡面的作用不一樣，ffs會多計算一位

#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
...gcd()...
#else
...gcd()...

這裡有兩種 gcd 的程式，如果__ffs無法使用，則用第二種 gcd()

gcd() with __ffs available

完整程式

unsigned long gcd(unsigned long a, unsigned long b)
{
unsigned long r = a | b;

if (!a || !b)
	return r;

b >>= __ffs(b);
if (b == 1)
	return r & -r;

for (;;) {
	a >>= __ffs(a);
	if (a == 1)
		return r & -r;
	if (a == b)
		return a << __ffs(r);

	if (a < b)
		swap(a, b);
	a -= b;
}

}

unsigned long r = a | b;

if (!a || !b)
	return r;

如果 a 或 b 有 0 ，則回傳 r ，因為 r = a | b，所以不管 a, b 中有一個 0 還是兩者都是 0 ，都能回傳正確的值

b >>= __ffs(b);
if (b == 1)
	return r & -r;

b >>= __ffs(b); 跟先前使用 __builtin_ctz 的u >>= __builtin_ctz(u)作用一樣，都是針對 trailing zero 做 shift right
若 b == 1，則代表 b 為 2 的冪，b的公因數只有 2 ，所以只需要確認 a 是否也有因數為 2 ，就能回傳，2 以外因數可以不用在乎，因為必定不是公因數
而檢查 a 有因數為 2 的方法，就是看 a | b 有幾個 trailing zero，
r & -r = (a | b) & (~(a | b) + 1)
r = a | b，所以 a, b 的共同 trailing zero 也會是 r 的 trailing zero
而 -r首先會取 compliment，所以本來的 trailing zero 會變成 1 ，再加 1 之後會全部進位變成 0，且除了進位的那個 carry bit 以外的 bit，在 r & -r運算之後都會變成 0 ，也就得到想要的最大公因數

// 舉例
     a = 1 1 0 1 1 0 0 0 
     b = 0 0 1 0 0 0 0 0
     r = 1 1 1 1 1 0 0 0 
    -r = 0 0 0 0 1 0 0 0 
r & -r = 0 0 0 0 1 0 0 0
// 舉例
     a = 1 0 0 1 0 0 1 0 
     b = 0 0 1 0 0 0 0 0
     r = 1 0 1 1 0 0 1 0 
    -r = 0 1 0 0 1 1 1 0
r & -r = 0 0 0 0 0 0 1 0

這個操作等價於

if (b == 1)
    return 1 << min(__ffs(a), __ffs(b));

但這裡選擇的是r & -r的做法

for (;;) {
    a >>= __ffs(a);
    if (a == 1)
        return r & -r;
    
    if (a == b)
        return a << __ffs(r);
    
    if (a < b)
        swap(a, b);
    a -= b;
}

一開始的a >>= __ffs(a);與先前的v >>= __builtin_ctz(v);作用一樣
- 因為 b 已經先 shift right 變成奇數，所以如果 a 是偶數，2 也必定不是公因數，所以把 a 也做 shift right，直到 a 也變成奇數

if (a == 1)
    return r & -r;

若 a shift right 之後等於 1，則此時的 a 為 2 的冪，與前面對 b 的操作一樣，若 a, b 兩數有一數為 2 的冪，則最大公因數必是 2 的冪

if (a == b)
    return a << __ffs(r);

若 a == b，此時就是最大公因數，因為 a, b 兩數的trailing zero 已經被 shift right 消去了，所以回傳時要再 shift left 回來，這裡等價於先前的 return u << shift

if (a < b)
    swap(a, b);
a -= b;

把比較小的數放到 b ，然後 a = a - b

我認為這個程式與一開始使用 int shift紀錄共同 trailing zero 以及__builtin_ctz的程式是一樣的，只是實作方式有些微的不同

gcd() without __ffs

完整程式

unsigned long gcd(unsigned long a, unsigned long b)
{
	unsigned long r = a | b;

	if (!a || !b)
		return r;

	/* Isolate lsbit of r */
	r &= -r;

	while (!(b & r))
		b >>= 1;
	if (b == r)
		return r;

	for (;;) {
		while (!(a & r))
			a >>= 1;
		if (a == r)
			return r;
		if (a == b)
			return a;

		if (a < b)
			swap(a, b);
		a -= b;
		a >>= 1;
		if (a & r)
			a += b;
		a >>= 1;
	}
}

	unsigned long r = a | b;

	if (!a || !b)
		return r;

	/* Isolate lsbit of r */
	r &= -r;

這裡把 r 存成 r & -r，也就是先前說明過的一個公因數且為 2 的冪

while (!(b & r))
    b >>= 1;
if (b == r)
    return r;

此處把 b 向右 shift，直到 b 的 first set bit 跟 r 對在一起

// 本來
r = 0 0 0 0 1 0 0 0
b = 1 0 1 0 0 0 0 0
// 之後
r = 0 0 0 0 1 0 0 0
b = 0 0 1 0 1 0 0 0

如果 b shift right 之後等於 r ，則 b 也是 2 的冪，且我們已經找到一個 r 是 a, b 的公因數是 2 的冪，那 r 就必然是最大公因數，因為 b 既然是 2 的冪，b 就不會有 2 以外的因數

for (;;) {
    while (!(a & r))
        a >>= 1;
    if (a == r)
        return r;
    if (a == b)
        return a;

    if (a < b)
        swap(a, b);
    a -= b;
    a >>= 1;
    if (a & r)
        a += b;
    a >>= 1;
}

首先也是對 a 做一樣的處理，向右 shift 到跟 r 對齊，並檢查 a 是否也是 2 的冪，如果是則回傳 r
若此時a == b， a 就是 a, b 的最大公因數，則回傳
如果 a < b，則 a, b兩者交換，把比較大的數放在 a,較小的數放在 b

接下來的操作較先前兩個較為不同，先a -= b，然後 a 向右 shift 一位

因為經過先前的操作，a, b兩者的 first set bit 都對齊在 r 的 first set bit，所以相減之後，在該位置必定是 0

r = 0 0 0 0 1 0 0 0
b = 0 0 1 0 1 0 0 0
a = 0 1 0 1 1 0 0 0
// 相減之後
r = 0 0 0 0 1 0 0 0
b = 0 0 1 0 1 0 0 0
a = 0 0 1 1 0 0 0 0
// shift right 之後
r = 0 0 0 0 1 0 0 0
b = 0 0 1 0 1 0 0 0
a = 0 0 0 1 1 0 0 0

為何檢查 a & r且相等的話再a += b ?
- 這樣代表 a - b 會是 r 的倍數，因為是 shift right 之後跟 r 相等

測驗 4

EXP6 = bitset & -bitset

程式運作原理

naive()

#include <stddef.h>
size_t naive(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
    size_t pos = 0;
    for (size_t k = 0; k < bitmapsize; ++k) {
        uint64_t bitset = bitmap[k];
        size_t p = k * 64;
        for (int i = 0; i < 64; i++) {
            if ((bitset >> i) & 0x1)
                out[pos++] = p + i;
        }
    }
    return pos;
}

bitmap 是一個 uint64_t 的一維 array，每個 uint64_t 代表了 64 個 bits 的資訊
bitmapsize 是 bitmap 的元素個數
out 是一個 uint32_t 的一維 array
這個程式把整個 bitmap 出現的 1 的位置都存在 out 裡面，並回傳整個 bitmap 的 1 的數量

// e.g
// bitmap
bitmap[0] = 0000 0000 ... 0000 1101
bitmap[1] = 0000 0000 ... 0100 1000
.
.
// naive()
out[0] = 0
out[1] = 2
out[2] = 3
out[3] = 67
out[4] = 70
.
.

improved()

#include <stddef.h>
size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
    size_t pos = 0;
    uint64_t bitset;
    for (size_t k = 0; k < bitmapsize; ++k) {
        bitset = bitmap[k];
        while (bitset != 0) {
            uint64_t t = EXP6;
            int r = __builtin_ctzll(bitset);
            out[pos++] = k * 64 + r;
            bitset ^= t;                           
        }
    }
    return pos;
}

其中第 9 行的作用是找出目前最低位元的 1，並紀錄到 t 變數。舉例來說，若目前 bitset 為 000101，那 t 就會變成 000001，接著就可以透過 XOR 把該位的 1 清掉，其他保留 (此為 XOR 的特性)。

根據測驗 3 lib/math/gcd.c 的程式，我們知道 r & -r可以有留下 first set bit 並把其他 bits 都設為 0的作用，所以這題答案就是bitset & -bitset

真實案例

Linux 核心設計: 不只挑選任務的排程器談到Linux 2.6 O(1) 排程器時，使用 140-bits bitmap 來作為 priority array，並且使用 find first set 來找出目前 priority 最高的 process。
在這樣的結構裡，這個程式可以找到所有需要被 schedule 的所有優先權級別

測驗 5

PPP = pos - -
MMM = list_add
EEE = &heads[remainder % size]

程式說明

struct rem_node {
    int key;
    int index;
    struct list_head link;
};

這個資料結構是為了使用 hash table 而設計的。

static int find(struct list_head *heads, int size, int key)
{
    struct rem_node *node;
    int hash = key % size;
    list_for_each_entry (node, &heads[hash], link) {
        if (key == node->key)
            return node->index;
    }
    return -1;
}

配合 find() ，可以在 hash table 裡面找到相對應的 key ，如果有找到同樣的 key 的話，會回傳 index。

fractionToDecimal()

char *fractionToDecimal(int numerator, int denominator)
{
    int size = 1024;
    char *result = malloc(size);
    char *p = result;

    if (denominator == 0) {
        result[0] = '\0';
        return result;
    }

    if (numerator == 0) {
        result[0] = '0';
        result[1] = '\0';
        return result;
    }

    /* using long long type make sure there has no integer overflow */
    long long n = numerator;
    long long d = denominator;

    /* deal with negtive cases */
    if (n < 0)
        n = -n;
    if (d < 0)
        d = -d;

    bool sign = (float) numerator / denominator >= 0;
    if (!sign)
        *p++ = '-';

    long long remainder = n % d;
    long long division = n / d;

    sprintf(p, "%ld", division > 0 ? (long) division : (long) -division);
    if (remainder == 0)
        return result;

    p = result + strlen(result);
    *p++ = '.';

    /* Using a map to record all of reminders and their position.
     * if the reminder appeared before, which means the repeated loop begin,
     */
    char *decimal = malloc(size);
    memset(decimal, 0, size);
    char *q = decimal;

    size = 1333;
    struct list_head *heads = malloc(size * sizeof(*heads));
    for (int i = 0; i < size; i++)
        INIT_LIST_HEAD(&heads[i]);

    for (int i = 0; remainder; i++) {
        int pos = find(heads, size, remainder);
        if (pos >= 0) {
            while (PPP > 0)
                *p++ = *decimal++;
            *p++ = '(';
            while (*decimal != '\0')
                *p++ = *decimal++;
            *p++ = ')';
            *p = '\0';
            return result;
        }
        struct rem_node *node = malloc(sizeof(*node));
        node->key = remainder;
        node->index = i;

        MMM(&node->link, EEE);

        *q++ = (remainder * 10) / d + '0';
        remainder = (remainder * 10) % d;
    }

    strcpy(p, decimal);
    return result;
}

    int size = 1024;
    char *result = malloc(size);
    char *p = result;

    if (denominator == 0) {
        result[0] = '\0';
        return result;
    }

    if (numerator == 0) {
        result[0] = '0';
        result[1] = '\0';
        return result;
    }

程式的前半段是檢查如果 input 有 0 的情況，並可以直接回傳。

    /* using long long type make sure there has no integer overflow */
    long long n = numerator;
    long long d = denominator;

    /* deal with negtive cases */
    if (n < 0)
        n = -n;
    if (d < 0)
        d = -d;

    bool sign = (float) numerator / denominator >= 0;
    if (!sign)
        *p++ = '-';

處理正負號，把 numerator, denominator 的負號都去掉，並把正確的正負號加在 *p，也就是result裡面，並且用 n, d 取代本來的 numerator, denominator

    long long remainder = n % d;
    long long division = n / d;

    sprintf(p, "%ld", division > 0 ? (long) division : (long) -division);
    if (remainder == 0)
        return result;

    p = result + strlen(result);
    *p++ = '.';

先進行第一次運算，如果餘數是 0 的話，代表可以整除，回傳答案，不然就把 result 加進答案，並且開始小數運算
sprintf()，C 語言規格書的描述

The sprintf function is equivalent to fprintf, except that the output is written into an array (specified by the argument s) rather than to a stream. A null character is written at the end of the characters written; it is not counted as part of the returned value. If copying takes place between objects that overlap, the behavior is undefined.
疑問：為何需要確認 division 的正負號？因為前面處理過 n, d 的正負號，此時 n, d 應該都是正的數字才對，而 n / d 如果應該會大於等於 0 才對

    /* Using a map to record all of reminders and their position.
     * if the reminder appeared before, which means the repeated loop begin,
     */
    char *decimal = malloc(size);
    memset(decimal, 0, size);
    char *q = decimal;

    size = 1333;
    struct list_head *heads = malloc(size * sizeof(*heads));
    for (int i = 0; i < size; i++)
        INIT_LIST_HEAD(&heads[i]);

decimal 是一個 1024 bytes 的位址來紀錄小數部份，並把 q 也指向該位址，q 是除法的商
接下來建立 hash table，選擇 1333 當作 hash table 的大小
- 有點好奇 1333 並非是質數，比起其他質數，發生碰撞的機會比較高

    for (int i = 0; remainder; i++) {
        int pos = find(heads, size, remainder);
        if (pos >= 0) {
            while (PPP > 0)
                *p++ = *decimal++;
            *p++ = '(';
            while (*decimal != '\0')
                *p++ = *decimal++;
            *p++ = ')';
            *p = '\0';
            return result;
        }
        struct rem_node *node = malloc(sizeof(*node));
        node->key = remainder;
        node->index = i;

        MMM(&node->link, EEE);

        *q++ = (remainder * 10) / d + '0';
        remainder = (remainder * 10) % d;
    }

只要 remainder 不是 0 ，這個 loop 就會一直做下去
int pos = find(heads, size, remainder);
- 到 hash table 去找 remainder，如果找到，就代表出現循環小數，則可以回傳答案
  - 小數部份的答案放在 decimal 裡面，按照題目要求處理之後，一個一個放進 *p 裡面
- 如果沒有找到一樣的餘數，就建立一個新的 rem_node，並把 remainder 當作 key、 i 當作 index 存入
  - 此處的 i 是紀錄了這個迴圈執行了第幾次
MMM(&node->link, EEE);是把 rem_node存入 hash table 的巨集，所以 MMM = list_add() 或 list_add_tail() 都可以
EEE 因為是要放進 hash table 的某個 slot，透過前面 find()可以知道，hash 值就是 remainder % size，所以 EEE = &heads[remainder % size]
最後把商存進 q 字串裡，數字 + '0'的用法是把0~9的整數換成 char，然後更新 remainder，就是國高中學的除法

    *q++ = (remainder * 10) / d + '0';
    remainder = (remainder * 10) % d;

所以可以判斷出，每次存進去的 index 值 i ，可以用來判斷在第幾位出現循環小數，那 PPP = pos - -
- 假設在第 6 次出現循環小數，代表先前除過的 decimal 會有 6 位重複，所以是 pos - -

測驗 6

M = _h
X = 0

回顧 container_of

/**
 * container_of() - Calculate address of object that contains address ptr
 * @ptr: pointer to member variable
 * @type: type of the structure containing ptr
 * @member: name of the member variable in struct @type
 *
 * Return: @type pointer of object containing ptr
 */
#ifndef container_of
#ifdef __LIST_HAVE_TYPEOF
#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })
#else
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) -offsetof(type, member)))
#endif
#endif

利用 offsetof 找到 member 與結構起始點的偏移量，再用當前位置扣除，就可以得到整個結構的起始點

/*
 * ALIGNOF - get the alignment of a type
 * @t: the type to test
 *
 * This returns a safe alignment for the given type.
 */
#define ALIGNOF(t) \
    ((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X)

根據提示，alignof() 要回傳 t 的長度，也就是 t 佔了幾個 byte

首先，宣告了一個 struct { } 然後取出個位址，再扣掉(char *)X，所以 (char *)X)應該要是這個結構的起點，而(&((struct { char c; t _h; } *)0)->M)要是這個結構加上 t 的位址，這樣相減，才會得到 t 的長度
M = _h
X = 0

測驗 7

KK1 = div3
KK2 = div5
KK3 = div3 << ２

Faster remainders when the divisor is a constant: beating compilers and libdivide

You have that n/d = n * (2^N/d) / (2^N). The division by a power of two (/ (2^N)) can be implemented as a right shift if we are working with unsigned integers, which compiles to single instruction: that is possible because the underlying hardware uses a base 2. Thus if 2^N/d has been precomputed, you can compute the division n/d as a multiplication and a shift.

How do compilers compute the remainder? They first compute the quotient n/d and then they multiply it by the divisor, and subtract all of that from the original value (using the identity n % d = n - (n/d) * d).

The intuition is as follows. To divide by four, you might choose to multiply by 0.25 instead. Take 5 * 0.25, you get 1.25. The integer part (1) gives you the quotient, and the decimal part (0.25) is indicative of the remainder: multiply 0.25 by 4 and you get 1, which is the remainder. Not only is this more direct and potential useful in itself, it also gives us a way to check quickly whether the remainder is zero. That is, it gives us a way to check that we have an integer that is divisible by another: do x * 0.25, the decimal part is less than 0.25 if and only if x is a multiple of 4.

要檢查一個數會不會被 4 整除，就把那個數乘上 0.25，然後檢查小數部份，如果小於 0.25，那這個數可以被 4 整除

is_divisible會回傳一個 bool ，如果可以整除的話，則會回傳 1，否則回傳 0
由此可知
- div3 = 0, div5 = 0，要印出數字，所以長度為 2
- div3 = 0, div5 = 1，要印出 "Buzz"，長度為 4
- div3 = 1，div5 = 0，要印出 "Fizz"，長度為 4
- div3 = 1，div5 = 1，要印出 "FizzBuzz"，長度為 8

觀察 unsigned int length = (2 << KK1) << KK2;
根據提示，KK1, KK2 都是變數名稱，那就是 div3, div5 了，因為 2 shift left 就是乘以 2 ，再往下看：

        char fmt[9];
        strncpy(fmt, &"FizzBuzz%u"[(9 >> div5) >> (KK3)], length);
        fmt[length] = '\0';

div3 = 0，div5 = 0, 要印出數字，希望操作後為 8
div3 = 1，div5 = 0, 要印出 Fizz ，希望操作後為 0
div3 = 0，div5 = 1, 要印出 Buzz ，希望操作後為 4
div3 = 1，div5 = 1, 要印出 FizzBuzz ，希望操作後為 0

div5 = 1時，9 已經 shift right 變成 4 (0100)

如果 div3 = 1，希望操作後為 0，所以還要在 shift right 至少三個 bits
如果 div3 = 0，希望操作後為 4，所以不需要 shift 了

div5 = 0時，9 還是 9 (1001)

如果 div3 = 1，希望操作後為 0(0000)，所以還要 shift right 至少四個 bit
如果 div3 = 0，希望操作後為 8(1000)，此處如果-1，操作無法維持與前述幾個同樣形式，若把 9 改成 8 ，則不影響前面的 shift 操作，且此處不操作即可滿足條件

KK3 = div3 << 2，因為 1 << 2 = 4，可滿足前述操作

2022q1 Homework2 (quiz2)

tags: linux2022

測驗 1

解釋運作原理

比較上述實作在編譯器最佳化開啟的狀況，對應的組合語言輸出，並嘗試解讀

輸出組合語言

研讀 Linux 核心原始程式碼

測驗 2

程式運作

針對 32 位元無號/有號整數的 branchless 實作

在 Linux 核心原始程式碼中，找出更多類似的實作手法。

測驗 3

程式運作

為何 binary 運算可以用減法代替 mod 運算?

在 x86_64 上透過 __builtin_ctz 改寫 GCD

計算 shift

測試效能

lib/math/gcd.c

gcd() with __ffs available

gcd() without __ffs

測驗 4

程式運作原理

naive()

improved()

真實案例

測驗 5

程式說明

測驗 6

回顧 container_of

測驗 7

Read more

2022q1 Homework1 (quiz1)

2022q1 Homework1 (lab0-c)

tags: `linux2022`