2022q1 Homework4 (quiz4)

--- tags: linux2022 --- # 2022q1 Homework4 (quiz4) contributed by < [Eddielin0926](https://github.com/Eddielin0926) > ## 測驗 1 已知輸入必為大於 0 的數值 (即 x > 0)，以下程式碼可計算 $\lceil log_2(x) \rceil$，也就是 ceil 和 log2 的組合並轉型為整數: ```c int ceil_log2(uint32_t x) { uint32_t r, shift; x--; r = (x > 0xFFFF) << 4; x >>= r; shift = (x > 0xFF) << 3; x >>= shift; r |= shift; shift = (x > 0xF) << 2; x >>= shift; r |= shift; shift = (x > 0x3) << 1; x >>= shift; return (r | shift | x >> 1) + 1; } ``` ### 解釋上述程式碼運作原理這個程式的概念是將輸入的 `x` 拆解成 $$ (a \times 2^{16}) \times (b \times 2^{8}) \times (c \times 2^{4}) \times (d \times 2^{2}) \times (e \times 2^{1}) \\ ans: abcde_2 + 1 \\ (a,\ b,\ c,\ d\ and\ e\ are\ all\ 0\ or\ 1) $$ 因為要取的是 $log2(x)$ 的上界，所以需要在最後的的結果加一，而 `x--` 是用來避免掉 x 剛好是 2 的指數倍，導致的結果多一。 :::info 我認為這個程式碼有一個 bug，在 `x = 1` 時結果會是 1。但這並不符合 $\lceil log_2(x) \rceil$，正確的結果應為 0。 ::: ### x = 0 的狀況雖然題目要求 `x > 0`，但如果 `x = 0` 時會發生甚麼事情? 結果會回傳 32，這是因為 `x--` 導致溢位讓 `x = 0xFFFFFFFF`。我修改了原本的程式碼，移除掉 `x--`，利用 `__builtin_popcount` 來判斷 `x` 是否為 2 的指數倍，如果不是的話最後就會加 1。 ```c int ceil_log2(uint32_t x) { int cps = __builtin_popcount(x) > 1; uint32_t r, shift; r = (x > 0xFFFF) << 4; x >>= r; shift = (x > 0xFF) << 3; x >>= shift; r |= shift; shift = (x > 0xF) << 2; x >>= shift; r |= shift; shift = (x > 0x3) << 1; x >>= shift; return (r | shift | x >> 1) + cps; } ``` :::warning 因為延伸說明並沒有限制當 `x = 0` 時要回傳多少，我就先當作 0，但其實回傳 -1 是比較佳的選擇，這裡也修正了 `ceil_log2(1)` 使其等於 0。 ::: ### 在 Linux 核心原始程式碼找出 ceil 和 log2 的組合 (類似的形式) 在 [fair.c/get_update_sysctl_factor](https://github.com/torvalds/linux/blob/master/kernel/sched/fair.c#L219) 中有使用到 `1 + ilog2(cpus)`，這邊用來猜測 CPU 的數量。 ```c /* * Increase the granularity value when there are more CPUs, * because with more CPUs the 'effective latency' as visible * to users decreases. But the relationship is not linear, * so pick a second-best guess by going with the log2 of the * number of CPUs. * * This idea comes from the SD scheduler of Con Kolivas: */ static unsigned int get_update_sysctl_factor(void) { unsigned int cpus = min_t(unsigned int, num_online_cpus(), 8); unsigned int factor; switch (sysctl_sched_tunable_scaling) { case SCHED_TUNABLESCALING_NONE: factor = 1; break; case SCHED_TUNABLESCALING_LINEAR: factor = cpus; break; case SCHED_TUNABLESCALING_LOG: default: factor = 1 + ilog2(cpus); break; } return factor; } ``` :::info TODO: 這份程式碼是 [Completely Fair Scheduler](https://en.wikipedia.org/wiki/Completely_Fair_Scheduler) 的實作。 ::: --- ## 測驗 2 改寫[第 3 週測驗題](https://hackmd.io/@sysprog/linux2022-quiz3#%E6%B8%AC%E9%A9%97-11)的測驗 `11` 裡頭的 fls 函式 (fls 意謂 “find last set”)，使其得以計算 [Find first set](https://en.wikipedia.org/wiki/Find_first_set) (與 `1 + __builtin_ctzl` 類似)。 ```c #define BITS_PER_BYTE 8 #define BITS_PER_LONG (sizeof(unsigned long) * BITS_PER_BYTE) #include <stddef.h> static inline size_t ffs(unsigned long x) { if (x == 0) return 0; size_t o = 1; unsigned long t = ~0UL; size_t shift = BITS_PER_LONG; shift >>= 1; t >>= shift; while (shift) { if ((x & t) == 0) { x >>= shift; o += shift; } shift >>= 1; t >>= shift; } return o; } ``` ### 解釋上述程式碼運作原理這個程式是要計算最低位的 1 從低位數過來的位置，這邊利用了類似二分搜尋法的方式來查找 1，`unsigned long` 是 64 位元，因此從 32 位元開始切，利用 x 與 t 做 AND 來判斷 x 的右邊是否都是 0，是的話就記錄到 `o` 並且將找到的 0 都右移掉，最後再將 1 的數量減半繼續檢查。 ### 迴避 while, for, goto 等關鍵字以改寫出功能等價的實作最簡單的方式就完全展開，並且使用常數來做 bit mask (與測驗 1 的概念類似)。 ```c #define BITS_PER_BYTE 8 #define BITS_PER_LONG (sizeof(unsigned long) * BITS_PER_BYTE) #include <stddef.h> static inline size_t ffs(unsigned long x) { if (x == 0) return 0; size_t o = 1; if ((x & 0xFFFFFFFF) == 0) { x >>= 32; o += 32; } if ((x & 0xFFFF) == 0) { x >>= 16; o += 16; } if ((x & 0xFF) == 0) { x >>= 8; o += 8; } if ((x & 0xF) == 0) { x >>= 4; o += 4; } if ((x & 0x3) == 0) { x >>= 2; o += 2; } return o + ((x & 0x1) == 0); } ``` ### include/asm-generic/bitops 探討應用案例 [ffs.h](https://github.com/torvalds/linux/blob/master/include/asm-generic/bitops/ffs.h) 有類似的 `ffs`，差別只在於它是 `int`。另外我們能在 [sched.h](https://github.com/torvalds/linux/blob/5bfc75d92efd494db37f5c4c173d3639d4772966/include/asm-generic/bitops/sched.h) 中看到它的應用，用來查找 100-bit bitmap 中最低位 1 的位置。 ```c /* * Every architecture must define this function. It's the fastest * way of searching a 100-bit bitmap. It's guaranteed that at least * one of the 100 bits is cleared. */ static inline int sched_find_first_bit(const unsigned long *b) { #if BITS_PER_LONG == 64 if (b[0]) return __ffs(b[0]); return __ffs(b[1]) + 64; #elif BITS_PER_LONG == 32 if (b[0]) return __ffs(b[0]); if (b[1]) return __ffs(b[1]) + 32; if (b[2]) return __ffs(b[2]) + 64; return __ffs(b[3]) + 96; #else #error BITS_PER_LONG not defined #endif } ``` :::info 這邊的註解有些奇怪 > It's guaranteed that at least one of the 100 bits is cleared. 反向思考的話，就是不會出現 100 個 1，但這個情況與任何一個最低位元是 1 的資料沒有差別，似乎沒有必要寫在這邊，但這應該要繼續追蹤程式碼才能確定原因。 > 準備提交 patch 到 LKML 嗎？ :notes: jserv ::: --- ## 測驗 3 ```c struct foo_consumer { int (*handler)(struct foo_consumer *self, void *); struct foo_consumer *next; }; struct foo { struct foo_consumer *consumers; unsigned long flags; }; #include <stdbool.h> /* * For foo @foo, delete the consumer @fc. * Return true if the @fc is deleted successfully * or return false. */ static bool consumer_del(struct foo *foo, struct foo_consumer *fc) { struct foo_consumer **con; bool ret = false; for (con = &foo->consumers; *con; con = &(*con)->next) { if (*con == fc) { *con = fc->next; ret = true; break; } } return ret; } ``` ### 解釋上述程式碼運作原理 ### 區分 `foo` 和 `foo_consumer` 的好處 ### 在 Linux 核心原始程式碼相似技巧的程式碼 (例如 [uprobe](https://docs.kernel.org/trace/uprobetracer.html))，並探討應用案例