2022q1 Homework4 (quiz4)

contributed by < steven1lung >

測驗一

瞭解程式碼

int ceil_log2(uint32_t x)
{
    uint32_t r, shift;

    x--;
    r = (x > 0xFFFF) << 4;
    x >>= r;
    shift = (x > 0xFF) << 3;
    x >>= shift;
    r |= shift;
    shift = (x > 0xF) << 2;
    x >>= shift;
    r |= shift;
    shift = (x > 0x3) << 1;
    x >>= shift;
    return (r | shift | x >> 1) + 1;       
}

從判斷是否是大於某個數去右移 x 看得出來這是一個二分搜尋法，找出一個數在哪個區域中有最靠近 MSB 的 1。找到後就將每次的位移數量 | 起來再加一（取 ceiling ）就可以得到答案。

01101011 檢查有沒有大於 1111，有所以要右移 4 位
00000110 檢查有沒有大於 11，有所以要右移 2 位
00000001 檢查有沒有大於 1，沒有所以不動
答案就是剛剛的 4 | 2 | 1 >> 1 + 1 = 7

$⌈ l o g_{2} (01101011)_{2} ⌉ = ⌈ 6.74 ⌉ = 7$

在 Linux 核心原始程式碼找出 ceil 和 log2 的組合

測驗二

瞭解程式碼

#define BITS_PER_BYTE 8
#define BITS_PER_LONG (sizeof(unsigned long) * BITS_PER_BYTE)

#include <stddef.h>

static inline size_t ffs(unsigned long x)
{
    if (x == 0)
        return 0;

    size_t o = 1;
    unsigned long t = ~0UL;
    size_t shift = BITS_PER_LONG;

    shift >>= 1;
    t >>= shift;

    while (shift) {
        if ((x & t) == 0) {
            x >>= shift;
            o += shift;
        }

        shift >>= 1;
        t >>= shift;
    }

    return o;
}

這個程式是要從最左邊找出第一個 1 的位置，並且回傳在第幾個位元。使用的演算法跟上面那題非常相似，也是用二分搜尋法的思路去找出第一個 1 的位置。x & t 就是判斷出 1 在不在右半邊，如果 x & t 回傳 0 就代表我們要找的 1 在左半邊，所以將 x 位移 shift 位，並且紀錄目前位移的位元數。如果回傳 0 就代表在右半邊，繼續找並且把搜尋的範圍減半。這樣經過六次迭代就可以找出第一個 1 所在的位置了。

測驗三

瞭解程式碼

struct foo_consumer {
    int (*handler)(struct foo_consumer *self, void *);
    struct foo_consumer *next;
};

struct foo {
    struct foo_consumer *consumers;
    unsigned long flags;
};

#include <stdbool.h>

/*
 * For foo @foo, delete the consumer @fc.
 * Return true if the @fc is deleted sfccessfully
 * or return false.
 */
static bool consumer_del(struct foo *foo, struct foo_consumer *fc)
{
    struct foo_consumer **con;
    bool ret = false;

    for (con = &foo->consumers; *con; con = &(*con)->next) {
        if (*con == fc) {
            *con = fc->next;
            ret = true;
            break;
        }
    }

    return ret;
}

可以看到使用的 con 變數是指標的指標，所以在迭代的時候要注意 con 變數的更新。這個程式很簡單就看出來是要刪除給定的 foo_consumer *fc。

在 for 迴圈中每次迭代的更新都應該寫成：con 取值（這時候會是一個指向結構體的指標）的下一個結構體再取址。

TODO: 用巨集包裝更高階的操作，並參照 Moving the kernel to modern C

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

用巨集包裝更高階操作

Moving the kernel to modern C

這篇討論是從一個提交的 patch開始。提交 patch 的作者提到 list_for_each_entry() 會在 pos 達到 HEAD 後結束。但是通常 HEAD 不是迭代的結構體，通常會自己獨立在外或是被宣告在其他 type 的結構體裡。所以這時候對迭代結束後的 HEAD 使用 container_of 來拿 pos 的值會是 invalid 而且或造成 type confusion。

提交的 patch 會確保 pos 不會指向不合法的 HEAD，使用 branchless 邏輯來做到將 pos 達到結束迴圈的條件時，將 pos 指向 NULL。確保迴圈執行完，不會對 pos 進行使用。

但是 Linus Torvalds 不太喜歡這份 patch，雖然在 patch 作者後來多做解釋後，torvalds 也同意「這是一個一般的 bug」。

Linus Torvalds didn't much like the patch and didn't see how it related to speculative-execution vulnerabilities. After Koschel explained the situation further, though, Torvalds agreed that "this is just a regular bug, plain and simple" and said it should be fixed independently of the larger series.

Torvalds 後來找出問題真正的所在：傳進去的 pos 迭代指標必須在迴圈之外進行宣告。

The whole reason this kind of non-speculative bug can happen is that we historically didn't have C99-style "declare variables in loops". So list_for_each_entry() - and all the other ones - fundamentally always leaks the last HEAD entry out of the loop, simply because we couldn't declare the iterator variable in the loop itself.

最後他也提到或許是時候向前邁進到 C11 甚至是 C17。

測驗四

瞭解程式碼

#include <setjmp.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "list.h"

struct task {
    jmp_buf env;
    struct list_head list;
};

static LIST_HEAD(tasklist);
static void (**tasks)(void *);
static int ntasks;
static jmp_buf sched;

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

int main(void)
{
    void (*registered_task[])(void *) = {task0, task1};
    tasks = registered_task;
    ntasks = ARRAY_SIZE(registered_task);

    schedule();

    return 0;
}

從 main() 開始讀，看到一開始宣告了一個函數指標陣列，陣列元素分別指向 task0() 與 task1()。接著將在最上方宣告的函數指標的指標 **tasks 指向剛剛的函數指標陣列的開頭。再將 ntasks 設成任務的數量，使用 ARRAY_SIZE() 巨集來算出陣列元素的數量。最後呼叫 schedule() 函式讓排程器挑選任務執行。

void schedule(void)
{
    static int i;

    srand(0xCAFEBABE ^ (uintptr_t) &schedule); /* Thanks to ASLR */

    setjmp(sched);

    while (ntasks-- > 0) {
        int n = rand() % 5;
        tasks[i++](&n);
        printf("Never reached\n");
    }

    task_join(&tasklist);
}

schedule() 函式中，使用了 srand() 來達到隨機選擇任務的機制，而其中輸入的種子是將一組數字 0xCAFEBABE 去跟 schedule() 函式的位址進行 XOR。而後面註解提到的 ASLR 全名是 Address Space Layout Randomization，ALSR 會將程式隨機載入記憶體中，防止惡意程式直接跳到特定位址使用函式。

Address space layout randomization

接著 setjmp() 這邊是將現在的資訊儲存起來，像是：stack pointer、instruction pointer、registers 等等，這些資料會被儲存到一個 buffer jmp_buf。jmp_buf 的定義可以在 setjmp.h 裡看到，jmp_buf 是一個大小為 23 的 long 形態陣列。

#define JMP_BUF_LEN    23
typedef long jmp_buf[JMP_BUF_LEN];

接著的 while 迴圈就是迭代每一個在 tasks 裡面的任務，隨機指派每個任務要執行的次數。因為 ntasks 只有在這個迴圈裡會被遞減，所以只有第一次初始化任務執行次數的時候會執行到此 while 迴圈，之後再呼叫 schedule() 就會因為 while 的判斷而跳過執行，直接執行下一行的 taskjoin(&tasklist)。

static void task_join(struct list_head *tasklist)
{
    jmp_buf env;

    while (!list_empty(tasklist)) {
        struct task *t = list_first_entry(tasklist, struct task, list);
        list_del(&t->list);
        memcpy(env, t->env, sizeof(jmp_buf));
        free(t);
        longjmp(env, 1);
    }
}

taskjoin() 所做的事情是將 tasklist 中所有的 task 都執行完。當 tasklist 不是空的，就會將 tasklist 第一個節點拿出來執行並且移除。

迴圈最後的 longjmp() 就是跳躍到 env 所儲存的位址，搭配 setjmp() 使用就可以達到 nonlocal 的 goto。

static void task_add(struct list_head *tasklist, jmp_buf env)
{
    struct task *t = malloc(sizeof(*t));
    memcpy(t->env, env, sizeof(jmp_buf));
    INIT_LIST_HEAD(&t->list);
    list_add_tail(&t->list, tasklist);
}

static void task_switch(struct list_head *tasklist)
{
    jmp_buf env;

    if (!list_empty(tasklist)) {
        struct task *t = list_first_entry(tasklist, struct task, list);
        list_del(&t->list);
        memcpy(env, t->env, sizeof(jmp_buf));
        free(t);
        longjmp(env, 1);
    }
}

剛剛有提到第一次執行 schedule() 會初始化各個任務的執行次數，而將任務要放幾個進入 tasklist 會在 task_add() 中做到。

void task0(void *arg)
{
    jmp_buf env;
    static int n;
    n = *(int *) arg;

    printf("Task 0: n = %d\n", n);

    if (setjmp(env) == 0) {
        task_add(&tasklist, env);
        longjmp(sched, 1);
    }

    for (int i = 0; i < n; i++) {
        if (setjmp(env) == 0) {
            task_add(&tasklist, env);
            task_switch(&tasklist);
        }
        printf("Task 0: resume\n");
    }

    printf("Task 0: complete\n");
    longjmp(sched, 1);
}

void task1(void *arg)
{
    jmp_buf env;
    static int n;
    n = *(int *) arg;

    printf("Task 1: n = %d\n", n);

    if (setjmp(env) == 0) {
        task_add(&tasklist, env);
        longjmp(sched, 1);
    }

    for (int i = 0; i < n; i++) {
        if (setjmp(env) == 0) {
            task_add(&tasklist, env);
            task_switch(&tasklist);
        }
        printf("Task 1: resume\n");
    }

    printf("Task 1: complete\n");
    longjmp(sched, 1);
}

task0 跟 task1 其實都在做一樣的事情，模仿排程器裡不同任務互相切換執行。舉 task0 做例子，輸入的參數就是這個任務要做幾次，也就是要加幾個任務進 tasklist。

一開始有一個 if 判斷 setjmp(env) 回傳的值是不是 0，setjmp 可以從官方文件中看到回傳值會依據 setjmp() 是否在一次成功的 longjmp() 後呼叫，如果不是就回傳 0，先前有呼叫過 longjmp() 就會回傳 longjmp() 裡的 val。

接著會使用 for 迴圈將先前設定的執行次數透過 task_add() 加入到 tasklist 並呼叫 task_switch() 將目前狀態處存到 tasklist 的尾端，並切換執行。

測驗五

瞭解程式碼

#include <stdint.h>
#include <string.h>                       
#include <stdlib.h>

#define __READ_ONCE_SIZE                                  \
    ({                                                    \
        switch (size) {                                   \
        case 1:                                           \
            *(uint8_t *) res = *(volatile uint8_t *) p;   \
            break;                                        \
        case 2:                                           \
            *(uint16_t *) res = *(volatile uint16_t *) p; \
            break;                                        \
        case 4:                                           \
            *(uint32_t *) res = *(volatile uint32_t *) p; \
            break;                                        \
        case 8:                                           \
            *(uint64_t *) res = *(volatile uint64_t *) p; \
            break;                                        \
        default:                                          \
            memcpy((void *) res, (const void *) p, size); \
        }                                                 \
    })

static inline void __read_once_size(const volatile void *p, void *res, int size)
{
    __READ_ONCE_SIZE;
}

#define READ_ONCE(x)                                \
    ({                                              \
        union {                                     \
            typeof(x) __val;                        \
            char __c[1];                            \
        } __u;                                      \
        __read_once_size(&(x), __u.__c, sizeof(x)); \
        __u.__val;                                  \
    })

READ_ONCE 作用是確實讀取該記憶體位址的變數一次，並且不會受編譯器最佳化影響造成實際上運作跟程式碼邏輯不同。

因為要針對 1、2、4、8 或是其他位元組大小的資料進行操作，所以使用到 union 的特性，union 結構體的位址大小是由最大的成員決定的，並且所有成員的位址都是對齊的。所以我們要考慮到最小的 1 位元組情況，來達到 union 結構體的大小是以 typeof(x) 決定。

故 union 裡的第二個成員要寫成：char __c[1] 大小為 1 位元組的成員。在接下來的函式就可以使用 __c 指標直接對此 union 操作。

Volatile is unnecessary in Linux Kernel

以 volatile 關鍵字宣告的變數會允許變數在執行緒之外被修改，通常會被使用在共享的資料結構。

以下面程式碼為例：

spin_lock(&the_lock);
do_something_on(&shared_data);
do_something_else_with(&shared_data);
spin_unlock(&the_lock);

可以看到對共享資料 shared_data 進行存取之前，使用了 spin_lock() 確保鎖起來的時候不會有其他執行緒對共享變數進行存取，會被擋在外面。那這時候的 spin_lock() 也會被編譯器當成是 memory barrier，防止編譯器對這段存取做最佳化。

但是當共享變數被鎖起來時，這時候不會有其他執行緒對共享變數進行存取（都被 lock 擋在外面），使得裡面的共享變數並不是 volatile。這樣對共享變數宣告 volatile 就會顯得不必要，甚至有可能拖慢速度。

對共享變數宣告 volatile 也會防止編譯器對其做最佳化，但在 spin_lock() 裡我們已經知道不會有其他人對共享變數進行存取，所以在 critical section 裡的最佳化反而被關掉，所以會拖慢速度。

rwonce.h 中 `READ_ONCE` `WRITE_ONCE` 的實作與演化

從最早的 commit 去看，是將 READ_ONCE 跟 WRITE_ONCE 從 linux/compiler.h 移走，直接寫在一個新檔案 rwonce.h 裡。這樣是為了之後讓不同的架構去定義實作自己的 READ_ONCE 巨集。

/*
 * Yes, this permits 64-bit accesses on 32-bit architectures. These will
 * actually be atomic in some cases (namely Armv7 + LPAE), but for others we
 * rely on the access being split into 2x32-bit accesses for a 32-bit quantity
 * (e.g. a virtual address) and a strong prevailing wind.
 */
#define compiletime_assert_rwonce_type(t)					\
	compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),	\
		"Unsupported access size for {READ,WRITE}_ONCE().")
/*
 * Use __READ_ONCE() instead of READ_ONCE() if you do not require any
 * atomicity or dependency ordering guarantees. Note that this may result
 * in tears!
 */
#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))
#define __READ_ONCE_SCALAR(x)						\
({									\
	__unqual_scalar_typeof(x) __x = __READ_ONCE(x);			\
	smp_read_barrier_depends();					\
	(typeof(x))__x;							\
})
#define READ_ONCE(x)							\
({									\
	compiletime_assert_rwonce_type(x);				\
	__READ_ONCE_SCALAR(x);						\
})

這時的 READ_ONCE 寫法是先確定 (assert) 要讀取的變數 x 的native_word(x) 或是 sizeof(x) 大小等於一個 long long。

在 linux/compiler_types.h 可以看到 native_word() 就只是判斷變數的大小是否為 char、short、int、long。

#define __native_word(t) \
	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
	 sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))

判斷完成後就是對變數進行一次讀取，會展開 __READ_ONCE_SCALAR(x) 這個巨集。巨集開頭的 __unqual_scalar_typeof() 是在 compiler_types.h 定義的巨集。

/*
 * __unqual_scalar_typeof(x) - Declare an unqualified scalar type, leaving
 *			       non-scalar types unchanged.
 */
/*
 * Prefer C11 _Generic for better compile-times and simpler code. Note: 'char'
 * is not type-compatible with 'signed char', and we define a separate case.
 */
#define __scalar_type_to_expr_cases(type)				\
		unsigned type:	(unsigned type)0,			\
		signed type:	(signed type)0

#define __unqual_scalar_typeof(x) typeof(				\
		_Generic((x),						\
			 char:	(char)0,				\
			 __scalar_type_to_expr_cases(char),		\
			 __scalar_type_to_expr_cases(short),		\
			 __scalar_type_to_expr_cases(int),		\
			 __scalar_type_to_expr_cases(long),		\
			 __scalar_type_to_expr_cases(long long),	\
			 default: (x)))

其中的 _Generic 類似 switch 用法，會根據編譯時期的形態來選擇展開的值。而 __unqual_scalar_typeof(x) __x 做的事情就是新宣告一個變數 __x 且變數形態是跟 x 一樣。

接著會將 (*(const volatile __unqual_scalar_typeof(x) *)&(x)) 所讀取到的值賦給 __x，這邊做的事情就是將 &x 轉換成指標，再對指標取值。

2022q1 Homework4 (quiz4)

測驗一

瞭解程式碼

在 Linux 核心原始程式碼找出 ceil 和 log2 的組合

測驗二

瞭解程式碼

測驗三

瞭解程式碼

用巨集包裝更高階操作

Moving the kernel to modern C

測驗四

瞭解程式碼

測驗五

瞭解程式碼

Volatile is unnecessary in Linux Kernel

rwonce.h 中 READ_ONCE WRITE_ONCE 的實作與演化

Read more

Linux 紅黑樹探討

Git 基礎教學

提交第一份 Patch 到 Linux Kernel

2022q1 Homework6 (ktcp)

rwonce.h 中 `READ_ONCE` `WRITE_ONCE` 的實作與演化