2021q1 Homework2 (quiz2)

contributed by < xxiex123 >

tags: `linux2021`

Quiz 1

Explain how the code above works

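As described in quiz1, list_head is a generic doubly-linked list structure: embed it as a member of any struct, and that struct gains doubly-linked list linkage. What makes this possible is mainly the container_of macro, which recovers a pointer to the enclosing struct from a pointer to its embedded member.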

#ifndef container_of
#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })
#endif

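__extension__ suppresses the warnings the compiler would otherwise emit for GNU extensions (here the statement expression and __typeof__) when compiling in pedantic mode.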
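To make the role of container_of concrete, here is a minimal sketch. The struct item type and the item_from_link helper are hypothetical names used only for this illustration, and the macro requires a compiler with GNU C extensions such as GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Same macro as above; relies on GNU C statement expressions and __typeof__. */
#define container_of(ptr, type, member)                            \
    __extension__({                                                \
        const __typeof__(((type *) 0)->member) *__pmember = (ptr); \
        (type *) ((char *) __pmember - offsetof(type, member));    \
    })

struct list_head {
    struct list_head *prev, *next;
};

/* Hypothetical container type, used only for this illustration. */
struct item {
    int id;
    struct list_head link; /* deliberately not the first member */
};

/* Given a pointer to the embedded link, recover the enclosing item. */
static struct item *item_from_link(struct list_head *node)
{
    assert(node != NULL);
    return container_of(node, struct item, link);
}
```

Because link is not the first member here, the macro has to subtract a nonzero offset, which is exactly the work a plain cast cannot do.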

Placing a list_head as a member inside a struct gives that struct doubly-linked list behavior.

typedef struct __element {
    char *value;
    struct __element *next;
    struct list_head list;
} list_ele_t;

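The member struct __element *next is actually unnecessary here, because the linked-list functionality is provided entirely by list_head.

As I saw in classmate DDerveialm's work, if the list_head list member is placed first in the struct, the address of the member equals the address of the struct. A cast such as (list_ele_t *) list then yields the list_ele_t directly, which saves the trouble of going through container_of. The struct after this change: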

typedef struct __element {
    struct list_head list;
    char *value;
} list_ele_t;
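The cast only works while list stays the first member, so it may be worth guarding that assumption at compile time. The following is a sketch under that assumption; ele_from_node is a hypothetical helper name, and static_assert requires C11.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *prev, *next;
};

typedef struct __element {
    struct list_head list; /* must stay the first member for the cast below */
    char *value;
} list_ele_t;

/* Compile-time guard: the cast is only valid while the offset is zero. */
static_assert(offsetof(list_ele_t, list) == 0, "list must be the first member");

/* Convert a node pointer back to its element without container_of. */
static list_ele_t *ele_from_node(struct list_head *node)
{
    return (list_ele_t *) node;
}
```

If someone later reorders the struct members, the build fails instead of silently producing wrong pointers.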

The queue_t structure is defined as follows:

typedef struct {
    list_ele_t *head; /* Linked list of elements */
    list_ele_t *tail;
    size_t size;
    struct list_head list;
} queue_t;

This exercise does not need head and tail, so they are removed to keep the structure simpler:

typedef struct {
    size_t size;
    struct list_head list;
} queue_t;

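After the changes above, every use of the list_entry macro in the code can be replaced with a (list_ele_t *) cast.

The get_middle function below finds the middle node: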

static list_ele_t *get_middle(struct list_head *list)
{
    struct list_head *fast = list->next, *slow;
    list_for_each (slow, list) {
        if (fast->next == list ||fast->next->next == list)
            break;
        fast = fast->next->next;
    }
    return list_entry(TTT, list_ele_t, list);
}

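It advances one pointer one step at a time and another pointer two steps at a time, which effectively divides the list length by 2. The idea resembles a 100-meter race: if runner A moves at 50 m/s and runner B at 100 m/s, then when B reaches the finish line A is exactly at the halfway point, 50 meters.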
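The slow/fast idea can be sketched on a bare circular list without the quiz's macros. The helpers init_list, add_tail and middle_node are hypothetical names for this illustration; the loop is meant to stop at the same node as the quiz's get_middle, namely index (n - 1) / 2 for a list of n nodes.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *prev, *next;
};

static void init_list(struct list_head *head)
{
    head->prev = head->next = head;
}

static void add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Middle node of a non-empty circular list: slow moves one step
 * per iteration, fast moves two, so slow covers half the distance. */
static struct list_head *middle_node(struct list_head *head)
{
    assert(head->next != head); /* the list must not be empty */
    struct list_head *slow = head->next, *fast = head->next;
    while (fast->next != head && fast->next->next != head) {
        slow = slow->next;
        fast = fast->next->next;
    }
    return slow;
}
```

With five nodes the function returns the third one; with four nodes it returns the second, so the left half is never shorter than the right half.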

The list_merge function below merges the already-sorted left list and the already-sorted right list into a single sorted list.

static void list_merge(struct list_head *lhs,
                       struct list_head *rhs,
                       struct list_head *head)
{
    INIT_LIST_HEAD(head);
    if (list_empty(lhs)) {
        list_splice_tail(lhs, head);
        return;
    }
    if (list_empty(rhs)) {
        list_splice_tail(rhs, head);
        return;
    }

    while (!list_empty(lhs) && !list_empty(rhs)) {
        char *lv = list_entry(lhs->next, list_ele_t, list)->value;
        char *rv = list_entry(rhs->next, list_ele_t, list)->value;
        struct list_head *tmp = strcmp(lv, rv) <= 0 ? lhs->next : rhs->next;
        list_del(tmp);
        list_add_tail(tmp, head);
    }
    list_splice_tail(list_empty(lhs) ? rhs : lhs, head);
}

The checks at the top appear to be written incorrectly:

    if (list_empty(lhs)) {
        list_splice_tail(lhs, head);
        return;
    }

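This says: if the left list is empty, splice the left list into head. As written, head stays empty and the nodes of the non-empty list are lost. The correct behavior should be:
if the left list is empty, splice the right list into head, then return.
Likewise, if the right list is empty, splice the left list into head, then return.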
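Below is a self-contained sketch of the merge with the two checks corrected. It does not use the quiz's list.h; the helpers (init_list, is_empty, add_tail, unlink_node, splice_tail), the node_t type and the entry_of macro are simplified stand-ins written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct list_head {
    struct list_head *prev, *next;
};

typedef struct {
    char *value;
    struct list_head list;
} node_t;

#define entry_of(ptr) ((node_t *) ((char *) (ptr) - offsetof(node_t, list)))

static void init_list(struct list_head *h) { h->prev = h->next = h; }
static int is_empty(const struct list_head *h) { return h->next == h; }

static void add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void unlink_node(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Append all nodes of src to the tail of dst; src itself is left stale. */
static void splice_tail(struct list_head *src, struct list_head *dst)
{
    if (is_empty(src))
        return;
    src->next->prev = dst->prev;
    dst->prev->next = src->next;
    src->prev->next = dst;
    dst->prev = src->prev;
}

static void merge_sorted(struct list_head *lhs,
                         struct list_head *rhs,
                         struct list_head *head)
{
    assert(lhs && rhs && head);
    init_list(head);
    if (is_empty(lhs)) { /* left is empty: the result is the right list */
        splice_tail(rhs, head);
        return;
    }
    if (is_empty(rhs)) { /* right is empty: the result is the left list */
        splice_tail(lhs, head);
        return;
    }
    while (!is_empty(lhs) && !is_empty(rhs)) {
        char *lv = entry_of(lhs->next)->value;
        char *rv = entry_of(rhs->next)->value;
        /* <= keeps equal keys in left-then-right order (stable merge) */
        struct list_head *tmp = strcmp(lv, rv) <= 0 ? lhs->next : rhs->next;
        unlink_node(tmp);
        add_tail(tmp, head);
    }
    splice_tail(is_empty(lhs) ? rhs : lhs, head);
}
```

Strictly speaking the two early returns are an optimization: when one list is empty the loop body never runs and the final splice already moves the other list into head.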

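The merge sort below needs little explanation: it splits the list into a left and a right half, sorts each half, and merges them. It is a recursive algorithm.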

void list_merge_sort(queue_t *q)
{
    if (list_is_singular(&q->list))
        return;

    queue_t left;
    struct list_head sorted;
    INIT_LIST_HEAD(&left.list);
    list_cut_position(&left.list, &q->list, MMM);
    list_merge_sort(&left);
    list_merge_sort(q);
    list_merge(&left.list, &q->list, &sorted);
    INIT_LIST_HEAD(&q->list);
    list_splice_tail(&sorted, &q->list);
}

As the Non-recursive part of quiz1 points out, recursive functions lean heavily on the stack, which hurts efficiency and limits robustness:

they’re really doing is using the stack as their own private array. This is much slower than using a real array, and could cause stack overflow on some systems

Merge sort can therefore be implemented iteratively instead.
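As a rough sketch of what an iterative version might look like, here is a bottom-up merge sort. For brevity it sorts a plain int array with a scratch buffer rather than the quiz's linked list, so it shows the technique, not a drop-in replacement for list_merge_sort; merge_runs and merge_sort_iterative are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge sorted runs a[lo..mid) and a[mid..hi) through tmp, then copy back. */
static void merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof(*a));
}

/* Bottom-up merge sort: no recursion, run width doubles each pass.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
static int merge_sort_iterative(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof(*tmp));
    if (!tmp)
        return -1;
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo + width < n; lo += 2 * width) {
            size_t mid = lo + width;
            size_t hi = (mid + width < n) ? mid + width : n;
            merge_runs(a, tmp, lo, mid, hi);
        }
    }
    free(tmp);
    return 0;
}
```

The call stack depth stays constant no matter how long the input is, which is the property the quoted passage is after.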

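Quiz 2

Explain how the code works

Consider the function func, which takes a 16-bit unsigned integer N and returns the largest power of 2 that is less than or equal to N: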

uint16_t func(uint16_t N) {
    /* change all right side bits to 1 */
    N |= N >> 1;
    N |= N >> X;
    N |= N >> Y;
    N |= N >> Z;

    return (N + 1) >> 1;
}

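Take 21 in binary: 21 = 0000000000010101. The largest power of 2 less than or equal to it is 16 = 0000000000010000. The answer simply keeps the first 1 encountered from the left and clears every other bit. The implementation above first sets every bit to the right of that leading 1, i.e. 00010101 -> 00011111, then adds 1 to get 00100000, and finally shifts right by one to get 00010000, which is the answer.

Now consider an input whose 16th bit is 1, say 1000001000000000. After the right side is filled with 1s it becomes 1111111111111111, and adding 1 in 16-bit arithmetic would wrap around to 0000000000000000. A safer order is to shift 1111111111111111 right first, giving 0111111111111111, and then add 1.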

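Update:
From linD026's assignment I learned that in N + 1 the operand N, a uint16_t, undergoes integer promotion: it is converted to int (which is wider than 16 bits on typical platforms) before the addition is performed, so no overflow actually occurs.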
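The promotion can be observed directly. This is a small sketch that assumes int is wider than 16 bits, which the static_assert checks; the two function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int) > sizeof(uint16_t), "sketch assumes int wider than 16 bits");

/* n is promoted to int before the addition, so 0xFFFF + 1 is 0x10000. */
static int add_one_promoted(uint16_t n)
{
    return n + 1;
}

/* Converting the sum back to 16 bits is what makes it wrap to 0. */
static uint16_t add_one_truncated(uint16_t n)
{
    return (uint16_t) (n + 1);
}
```

In func the shift by one happens on the promoted int value, before the result is narrowed back to uint16_t, which is why the 16-bit version returns 0x8000 for large inputs.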

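Even so, shifting before adding is still the better approach: a uint32 version of rounddown_pow_of_2 written the original way would overflow, because a 32-bit operand is not promoted to a wider type.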
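A 32-bit variant using the shift-then-add order might look like the sketch below. The name rounddown_pow2_u32 and the shift amounts 1, 2, 4, 8, 16 are my own choices for this illustration, not taken from the quiz; the zero check is needed because (0 >> 1) + 1 would otherwise return 1.

```c
#include <stdint.h>

/* Largest power of 2 that is <= n, or 0 when n is 0. */
static uint32_t rounddown_pow2_u32(uint32_t n)
{
    if (n == 0)
        return 0; /* no power of 2 is <= 0; mirrors func(0) == 0 */
    /* Smear the leading 1 into every lower bit position. */
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    /* Shift first, then add: cannot wrap even when n is all ones. */
    return (n >> 1) + 1;
}
```

For an input of 0xFFFFFFFF the intermediate value is 0x7FFFFFFF and the result is 0x80000000, whereas (n + 1) >> 1 would have wrapped to 0.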

Observing the power-of-2 related functions

From Linux's include/linux/log2.h (the fls helpers live in the bitops headers):

static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
	return (n != 0 && ((n & (n - 1)) == 0));
}

/*
 * round up to nearest power of two
 */
static inline __attribute__((const))
unsigned long __roundup_pow_of_two(unsigned long n)
{
	return 1UL << fls_long(n - 1);
}

/*
 * round down to nearest power of two
 */
static inline __attribute__((const))
unsigned long __rounddown_pow_of_two(unsigned long n)
{
	return 1UL << (fls_long(n) - 1);
}

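is_power_of_2 is fairly intuitive.
If x is a power of 2, exactly one bit is 1 and all others are 0. Subtracting 1 from it flips that bit and every bit to its right.
Therefore x & (x - 1) must be 0. The n != 0 check is needed because 0 & (0 - 1) is also 0 even though 0 is not a power of 2.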
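The same reasoning can be phrased as "n & (n - 1) clears the lowest set bit". A minimal sketch, with hypothetical helper names:

```c
#include <stdbool.h>

/* n & (n - 1) clears the lowest set bit, e.g. 10100 -> 10000. */
static unsigned long clear_lowest_set_bit(unsigned long n)
{
    return n & (n - 1);
}

/* A power of 2 has exactly one set bit, so clearing it leaves 0. */
static bool is_pow2(unsigned long n)
{
    return n != 0 && clear_lowest_set_bit(n) == 0;
}
```

For 20 (binary 10100) clearing the lowest set bit leaves 16, which is nonzero, so 20 is rejected; for 16 the result is 0, so it is accepted.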

The roundup and rounddown functions below it depend on fls, so that function has to be traced as well.
1UL denotes the value 1 with type unsigned long.

/**
 * fls - find last (most-significant) bit set
 * @x: the word to search
 *
 * This is defined the same way as ffs.
 * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32.
 */

static __always_inline int fls(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u)) {
		x <<= 1;
		r -= 1;
	}
	return r;
}

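From the comment we learn that this function finds the first 1 bit encountered when scanning from the most-significant bit and returns its 1-based position, or 0 if no bit is set.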

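With this function, roundup returns 1UL << fls_long(n - 1). Ignoring the - 1 for a moment: for an n that is not a power of 2, rounding up gives 2 raised to the power fls(n). For example, 000100100100 has fls = 9, so the result is 2 to the 9th power, which is 001000000000. This can be obtained as 000000000001 << fls. The value passed to fls is n - 1 so that an n that is already a power of 2 is returned unchanged instead of being rounded up further.

rounddown returns 1UL << (fls_long(n) - 1), which can be seen as 1UL << fls_long(n) shifted right by one. Note that the shift count here can become -1.

When n = 0, fls returns 0, so the expression becomes 1UL << -1. Shifting by a negative count is undefined behavior in C, so the result for n = 0 cannot be relied on to be 0; callers should make sure n is nonzero.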
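The following user-space sketch mirrors the three helpers while guarding the inputs that would lead to an out-of-range shift. The names fls_ul, roundup_pow2 and rounddown_pow2 are hypothetical, the loop-based fls is a simplification of the kernel's branch-based version, and returning 0 for unrepresentable results is my own convention.

```c
#include <limits.h>

/* 1-based index of the highest set bit, 0 if x is 0. */
static int fls_ul(unsigned long x)
{
    int r = 0;
    while (x) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Smallest power of 2 >= n; returns 0 when n is 0 or the result
 * does not fit in unsigned long. */
static unsigned long roundup_pow2(unsigned long n)
{
    if (n == 0)
        return 0;
    int shift = fls_ul(n - 1);
    if (shift >= (int) (sizeof(n) * CHAR_BIT))
        return 0; /* would shift by the full width: undefined */
    return 1UL << shift;
}

/* Largest power of 2 <= n; returns 0 for n == 0 instead of
 * evaluating 1UL << -1. */
static unsigned long rounddown_pow2(unsigned long n)
{
    if (n == 0)
        return 0;
    return 1UL << (fls_ul(n) - 1);
}
```

With n = 292 (binary 100100100), fls is 9, so rounding up gives 512 and rounding down gives 256; with n = 256 both directions return 256.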

slab allocation

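According to Wikipedia's definition of slab allocation, the kernel spends a significant amount of time initializing and destroying kernel data objects, which lowers performance. Slab allocation addresses this as follows:
objects of commonly used types and sizes are preallocated in memory, so when the kernel needs one it can take a free slot from a slab instead of initializing memory again.
This approach also reduces memory fragmentation, because a slab is formed from one physically contiguous region of memory.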
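The core idea, a contiguous block of same-sized slots threaded on a free list, can be sketched as a toy user-space object pool. This is only an illustration of the concept and not the kernel's slab API; struct obj, struct pool and the pool_* functions are hypothetical names, and real slab caches also handle constructors, multiple slabs and per-CPU caching.

```c
#include <stddef.h>

#define POOL_SLOTS 4

/* Hypothetical fixed-size object type served by the pool. */
struct obj {
    int id;
    char name[28];
};

/* A free slot stores the link to the next free slot in its own storage. */
union slot {
    union slot *next_free;
    struct obj object;
};

struct pool {
    union slot slots[POOL_SLOTS]; /* one contiguous block, like a slab */
    union slot *free_list;
};

/* Thread every slot onto the free list, lowest index first. */
static void pool_init(struct pool *p)
{
    p->free_list = NULL;
    for (size_t i = POOL_SLOTS; i-- > 0;) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Pop a free slot in O(1); returns NULL when the pool is exhausted. */
static struct obj *pool_alloc(struct pool *p)
{
    union slot *s = p->free_list;
    if (!s)
        return NULL;
    p->free_list = s->next_free;
    return &s->object;
}

/* Push the slot back onto the free list for reuse. */
static void pool_free(struct pool *p, struct obj *o)
{
    union slot *s = (union slot *) o;
    s->next_free = p->free_list;
    p->free_list = s;
}
```

Allocation and release are both constant-time pointer updates, and because every slot has the same size, freeing and reallocating never splits or coalesces memory.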

Chapter 8, Slab Allocator, on the kernel.org website also states:

The slab allocator has three principle aims:

The allocation of small blocks of memory to help eliminate internal fragmentation that would be otherwise caused by the buddy system;
The caching of commonly used objects so that the system does not waste time allocating, initialising and destroying objects. Benchmarks on Solaris showed excellent speed improvements for allocations with the slab allocator in use [Bon94];
The better utilisation of hardware cache by aligning objects to the L1 or L2 caches.

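The buddy system is explained in detail in classmate linD026's assignment. It is a memory allocation method, and as the first aim above indicates, its efficiency for small allocations is limited by internal fragmentation.

The third point states that the slab allocator also improves utilization of the CPU's L1 and L2 caches.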
