2025q1 Homework4 (quiz3+4)

contributed by < salmoniscute >

3-1

解釋程式碼運作原理

Multi-Precision Integer, MPI 實作了處理大數的功能，超越了標準整數類型的大小限制。

typedef struct {
    uint32_t *data;                           
    size_t capacity;
} mpi_t[1];

為什麼是 mpi_t[1] 而不是 mpi_t 就好？

將結構體定義為固定長度為 1 的陣列

mpi_t a;      
mpi_init(a);

當宣告變數 mpi_t a; 時，a 本身就是一個含有一個結構體元素的陣列
因為在 C 中，陣列作為參數時會退化為指標
當傳遞 mpi_t 到函式時，實際上傳遞的會是指向結構體的指標
也就是說 mpi_init 函式會拿到指向 a 的第一個元素的指標

所以後續的程式就可以用像是 rop->capacity = 0; 和 rop->data = NULL; 的指標操作寫法，並且不需要使用 & 來取址。

大數被拆分為多個 31 位的區塊（每個區塊範圍是 0 到

2^{31} - 1

，即 0 到

2147483647

）。
每個 data 元素存儲 31 位（使用 INTMAX = 0x7fffffff 來限制），而非完整的 32 位

如果在 data[0] 為 a, data[1] 為 b 的情況下，大數就會是

b * 2^{31} + a * 1

。以此類推。

static size_t ceil_div(size_t n, size_t d)
向上取整 (n + d - 1) / d
舉例來說，ceil_div(32,31) = 2，代表一個 32 位的整數需要 2 個 31 位的區塊才可以完整表示。
int mpi_set_str(mpi_t rop, const char *str, int base)
將字串形式的十進制數字轉換成整數並儲存到變數 rop 中。
迴圈處理：

for (size_t i = 0; i < len; ++i) {
    mpi_mul_u32(rop, rop, 10U);
    assert(str[i] >= '0' && str[i] <= '9');
    mpi_add_u32(rop, rop, (uint32_t) (str[i] - '0'));
}

先將當前結果乘以 10，然後將當前字元的轉成數字（str[i] - '0'）並加到結果中。

以 mpi_set_str(n, "549755813889", 10); 為例：
可以知道 549755813889 的二進位表示為
1000000000000000000000000000000000000001，一共有 40 個 digits，超過了 31 digits，所以需要 2 個 block 來存（

40 = 31 + 9

）。

data[0] 為

0000000000000000000000000000001

，十進位為

1

data[1] 為

100000000

，十進位為

256

549755813889 = 256 * 2^{31} + 1 * 1

不過現有程式碼因為在 mpi_mul_u32 函式裡面會一直增加 capacity 的值，因為乘數最大可能是

2^{31} - 1

，即

2147483647

，這樣會需要直接多一個 block 來表示。
因此目前測試結過會是以下輸出：

 Block 0: 0x00000001 (decimal: 1)
 Block 1: 0x00000100 (decimal: 256)
 Block 2: 0x00000000 (decimal: 0)
 Block 3: 0x00000000 (decimal: 0)
 Block 4: 0x00000000 (decimal: 0)
 Block 5: 0x00000000 (decimal: 0)
 Block 6: 0x00000000 (decimal: 0)
 Block 7: 0x00000000 (decimal: 0)
 Block 8: 0x00000000 (decimal: 0)
 Block 9: 0x00000000 (decimal: 0)
 Block 10: 0x00000000 (decimal: 0)
 Block 11: 0x00000000 (decimal: 0)
 Block 12: 0x00000000 (decimal: 0)
 Block 13: 0x00000000 (decimal: 0)

但是其實在 mpi_set_str 中，每次只會乘以 10。

實作依賴遞迴呼叫，請避免使用並降低記憶體開銷

在 mpi_gcd 函式是使用遞迴
改成用迭代的方法來做輾轉相除

mpi_t n, d, q, r;
mpi_init(n);
mpi_init(d);
mpi_init(q);
mpi_init(r);
mpi_set(n, op1);
mpi_set(d, op2);

while (mpi_cmp_u32(d, 0)) {
    mpi_fdiv_qr(q, r, n, d);
    mpi_set(n, d);
    mpi_set(d, r);
}
mpi_set(rop, n);

3-2

SWAR 的基本概念是將多個「小型資料」打包進一個大型的暫存器中（例如 32-bit 或 64-bit register），然後透過整數運算一次處理這些資料。

就像考試說明中的例子
原本是要用 (x & 1) 和 (y & 1) 來檢查 x 與 y 是否為奇數。因為二進位中，奇數的最後一位一定是 1，所以 & 1 就能判斷。

使用 SWAR 的概念優化後
會直接把兩個 32-bit 的整數合併成一個 64-bit，然後一次性的跟 0x0000000100000001 做 &。

unsigned long *asrc = (unsigned long *) src;
unsigned long mask = d << 8 | d;
mask |= mask << 16;
for (unsigned int i = 32; i < LBLOCKSIZE * 8; i <<= 1)
    mask |= mask << i;

把字元 d 擴展成一個 long 寬度的 mask，例如：
若 d = 0x2E，則 mask 會變成：
在 32-bit 裝置上：0x2E2E2E2E
這種填充的技巧讓後續的程式能夠同時比對多個 byte。

/* @return nonzero if (long)X contains the byte used to fill MASK. */
#define DETECT_CHAR(X, mask) DETECT_NULL((X) ^ (mask))

XOR 之後只在相同位元處為 0，因此若某個 byte 等於 d，XOR 後該 byte 為 0。接著利用 DETECT_NULL 巨集檢查是否有 0 byte。

src = (unsigned char *) asrc;

while (len--) {
    if (*src == d)
        return (void *) src;
    src++;
}

不知道是哪個 byte，所以一一檢查確切位置。

3-3

4/16 上課剛好有討論到相關內容所以先看第三題
Linux 核心搶佔

typedef struct Env {
    int status;          /* Current status of the thread */
    ucontext_t state;    /* Context for saving and restoring thread execution */
    int state_reentered; /* Indicate if the context has been reentered */
    int ipc_sender;      /* Identifier of the thread that sent an IPC message */
    int ipc_value;       /* Value received from an IPC message */
} Env;

static void enable_preemption(void)

用 Linux 的 SIGRTMIN 加上 POSIX timer
註冊 preempt 函式作為 SIGRTMIN 的 handler。然後建立一個 POSIX timer，設定每 10ms 送出一次 SIGRTMIN。所以是每當時間到時，preempt 就會被呼叫。

preempt 是什麼？

static void preempt(int signum UNUSED,
                    siginfo_t *si UNUSED,
                    void *context UNUSED)
{
    coro_yield(); 
}

就是呼叫 coro_yield 函式，強制當前 coroutine 暫停執行並讓出控制權給排程氣選擇的下一個 coroutine。

void coro_yield(void)
{
    envs[curenv].state_reentered = 0;
    getcontext(&envs[curenv].state);
    if (envs[curenv].state_reentered++ == 0) {
        /* Context successfully saved; schedule the next user-level thread to
         * run.
	 */
        coro_shedule();
    }
    /* Upon re-entry, simply resume execution */
}

但是在 preempt 發生時，切到另一個 coroutine 前有件事要做：保存目前 coroutine 的 context
這樣等它之後要繼續跑時，才能從剛才中斷的地方繼續執行。這是透過 getcontext(&envs[curenv].state); 來做到的。
這邊的中斷不是指 interrupt，interrupt 是硬體層級的，這邊是透過 SIGRTMIN 這種 signal 來模擬。雖然有 timer，但他也不是真正的硬體 timer interrupt。一切都是跑在 user space。

state_reentered 是幹嘛的？

第一次執行，是要暫停、切換 coroutine。
第二次進來（從 setcontext 回來），繼續工作。

排程如何進行?
實作 round-robin 排程策略

static void coro_schedule(void)

while (attempts < NENV) {
    int candidate = (curenv + 1) % NENV;
    if (envs[candidate].status == ENV_RUNNABLE) {
        curenv = candidate;
        /* Request delivery of TIMERSIG after 10 ms */
        timer_settime(timer, 0, &ts, NULL);
        setcontext(&envs[curenv].state);
    }
    attempts++;
}

coro_schedule 函式負責選下一個要執行的 coroutine。從 curenv 的下一個開始找，繞一圈（用 % NENV），找到第一個 ENV_RUNNABLE 的就切過去
在切過去之前，它也會重設 timer，這樣 10ms 後會再進行一次 preempt。確保大家都有機會被執行到，不會被一個 coroutine 壟斷。
非常直觀

2025q1 Homework4 (quiz3+4)

3-1

解釋程式碼運作原理

實作依賴遞迴呼叫，請避免使用並降低記憶體開銷

3-2

3-3

Read more

EEVDF 論文研讀

2025q1 Homework5 (assessment)

2025q1 Homework3 (kxo)

2025q1 Homework1 (lab0)