--- tags: linux kernel --- # 2022q1 Homework2 (quiz2) [linux kernel internal 2022 quiz2](https://hackmd.io/@sysprog/linux2022-quiz2) ## 測驗 `1` ```c #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a >> 1 /*EXP1*/) + (b >> 1) + (a & b & 1); } ``` 主要思路為 `a / 2` + `b / 2` ,但是這裡並沒有考慮兩奇數需要進位的問題。若兩者都是奇數,如 `a = 3, b = 5` ,依照原本的運算方法,會產生 `1 + 2 = 3` ,但是與答案 `4` 仍有落差。因此 `EXP1` 需要填入兩者皆是奇數的判斷式,故 `EXP1 = a & b & 1`,若兩者皆為奇數,則進位 ```c uint32_t average(uint32_t a, uint32_t b) { return (a & b/*EXP2*/) + ((a ^ b/*EXP3*/) >> 1); } ``` 由[你所不知道的C語言:數值系統篇](https://hackmd.io/@sysprog/c-numerics?type=view)可知此題解答 如果沒有看過數值系統篇的話,主要思路須從 `>> 1`著手。我們可以知道加法是由一個`carry`及一個`sum`組成,`>> 1`代表除以`2` 。因此,我們可以先做一個加法器如下: ``` a = 3 0 0 1 1 b = 5 0 1 0 1 ------------------------------------- a + b sum 0 1 1 0 carry 0 0 1 0 ------------------------------------- sum + carry sum 0 1 0 0 carry 0 1 0 0 ------------------------------------- sum + carry sum 0 0 0 0 carry 1 0 0 0 ------------------------------------- sum + carry result 1 0 0 0 ``` 其中,`sum = a ^ b`,`carry = a & b`,上面的過程亦可使用遞迴實做。 由此可知,若 `(a + b)` 其實是可以寫成 `sum + carry << 2` ,此時所求為 `(a + b) / 2` 也就可以寫為 `(sum + carry << 2) >> 2`此時將此式展開為 `sum >> 2 + carry`此時將`sum`與`carry`帶入得到`(a ^ b) >> 2 + (a & b)`,即為所求 加法程式碼,遞迴: ```c= int add(int a, int b){ if(!a) return b; int carry = (a & b) << 1; int sum = a ^ b; return add(carry, sum); } ``` 比較三種加法,這邊開啟`O3`優化,比較其組合語言 原始程式碼 ```c= uint32_t avg_0(uint32_t low, uint32_t high) { return low + (high - low) / 2; } uint32_t avg_1(uint32_t a, uint32_t b) { return (a >> 1 /*EXP1*/) + (b >> 1) + (a & b & 1); } uint32_t avg_2(uint32_t a, uint32_t b) { return (a & b/*EXP2*/) + ((a ^ b/*EXP3*/) >> 1); } ``` 對應組合語言: ```asm= avg_0: .LFB23: .cfi_startproc endbr64 movl %esi, %eax subl %edi, %eax shrl %eax addl %edi, %eax ret .cfi_endproc .LFE23: .size avg_0, .-avg_0 .p2align 4 .globl avg_1 .type avg_1, @function avg_1: .LFB24: .cfi_startproc endbr64 movl %edi, %eax movl %esi, %edx andl %esi, %edi shrl %eax shrl %edx andl $1, %edi addl %edx, %eax addl %edi, %eax ret .cfi_endproc .LFE24: .size avg_1, .-avg_1 .p2align 4 .globl avg_2 .type avg_2, @function avg_2: .LFB25: .cfi_startproc endbr64 movl %edi, %eax andl %esi, %edi xorl %esi, %eax shrl %eax addl %edi, %eax ret .cfi_endproc .LFE25: .size avg_2, .-avg_2 .section .rodata.str1.8,"aMS",@progbits,1 .align 8 ``` 由組合語言我們可以發現,除以 2 在組合語言是 `>> 2`,比較讓人意外的是,`avg_0`使用最少指令,原本會以為利用 `bitwise operator` 轉譯成指令的數量最少,由此可以得到加法的成本還是比位元操作的成本還要低 ----- 研讀 Linux 核心原始程式碼 include/linux/average.h - 理解 `EWMA` - 理解巨集 - `BUILD_BUG_ON` - 由 `linux/build_bug.h`得 > If you have some code which relies on certain constants being equal, or some other compile-time-evaluated condition, you should use BUILD_BUG_ON to detect if someone changes it. - `__builtin_constant_p` - `BUILD_BUG_ON_NOT_POWER_OF_2` - `READ_ONCE` - `ilog2` - 程式架構 - 利用巨集進行物件宣告,其物件包含一個結構與三個函數 - 結構 ```c struct ewma_##name { \ unsigned long internal; \ }; ``` - :::info 問題: - [x] 完成程式碼 - [x] 解釋上述程式碼運作的原理 - [x] 比較上述實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀 (可研讀 CS:APP 第 3 章) - [ ] 研讀 Linux 核心原始程式碼 include/linux/average.h,探討其 Exponentially weighted moving average (EWMA) 實作,以及在 Linux 核心的應用 > 移動平均(Moving average),又稱滾動平均值、滑動平均,在統計學中是種藉由建立整個資料集合中不同子集的一系列平均數,來分析資料點的計算方法。移動平均通常與時間序列資料一起使用,以消除短期波動,突出長期趨勢或周期。短期和長期之間的閾值取決於應用,移動平均的參數將相應地設置。例如,它通常用於對財務數據進行技術分析,如股票價格、收益率或交易量。它也用於經濟學中研究國內生產總值、就業或其他宏觀經濟時間序列。 ::: ---- ## 測驗 `2` Reference: [Bitwise Max](https://bytefreaks.net/programming-2/c-programming-2/a-peculiar-way-to-get-the-biggest-maxmaximum-value-between-two-variables-using-bitwise-operations) 改寫〈解讀計算機編碼〉一文的「不需要分支的設計」一節提供的程式碼 min,我們得到以下實作 (max): ```c #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((EXP4) & -(EXP5)); } ``` :::info 問題: - [x] 完成程式碼 - [x] 解釋上述程式碼運作的原理 - [ ] 針對 32 位元無號/有號整數,撰寫同樣 branchless 的實作 - [ ] Linux 核心也有若干 branchless / branch-free 的實作,例如 lib/sort.c: ```c /* * Logically, we're doing "if (i & lsbit) i -= size;", but since the * branch is unpredictable, it's done with a bit of clever branch-free * code instead. */ __attribute_const__ __always_inline static size_t parent(size_t i, unsigned int lsbit, size_t size) { i -= size; i -= size & -(i & lsbit); return i / 2; } ``` 請在 Linux 核心原始程式碼中,找出更多類似的實作手法。請善用 git log 檢索。 ::: 主要解題思路為 ```c= if a > b: a ^ ((EXP4) & -(EXP5)) = a if a <= b: a ^ ((EXP4) & -(EXP5)) = b ``` 由互斥或 (exclusive or, xor) 的性質得到: ```c= if a > b: a ^ 0 = a if a <= b: a ^ a ^ b = b ``` 此時由提示可知 `EXP4` 為 `bitwise operator`,可以推斷出 `EXP4 = a ^ b`,由提示可知 `EXP5` 屬於 `>`, `<`, `==`, `>=`, `<=`, `!=`中的其中一個,可以推得 `EXP5 = a < b` ## 測驗 `3` ```c= #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (COND); return RET; } ``` :::info 問題 - [x] 完成程式碼 - [x] 解釋上述程式運作原理; - [x] 在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升; - [ ] Linux 核心中也內建 GCD (而且還不只一種實作),例如 lib/math/gcd.c,請解釋其實作手法和探討在核心內的應用場景。 ::: 輾轉相除法如下所示,以 72, 48 為例: ```shell | 72 48 | 24 | 48 48 | | 24 0 | ``` 由上可知,最大公因數為 24,以相同的例子帶入上方程式碼: ```c= if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } ``` `shift` 為右移次數,右移三次時,兩者有其一為奇數: `u = 9, v = 6` ```c= do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (COND); ``` 進入迴圈後: ``` first: v = 3, t = 6, u = 3, v = 6 second: v = v - u = 3, u = 3 third: t = u - v = 0, u = 3, v = 0 ``` 可以推得 `RET = u << shift` , `COND = v` ---- 利用 `__builtin_ctz` 改寫 `GCD`,由 [GCC 6.59 Other Built-in Functions Provided by GCC](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 找到 `__builtin_ctz` 含式的意義 > Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined. ---- 由 [kernel __ffs](https://www.kernel.org/doc/htmldocs/kernel-api/API---ffs.html) 中找到巨集的意義 > __ffs — find first set bit in word ```c= unsigned long gcd(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __ffs(b); if (b == 1) return r & -r; for (;;) { a >>= __ffs(a); if (a == 1) return r & -r; if (a == b) return a << __ffs(r); if (a < b) swap(a, b); a -= b; } } ``` ---- 效能分析: 測試程式碼: ## 測驗 `4` ```c #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = EXP6; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` :::info 問題: - [x] 請補完程式碼 - [ ] 解釋上述程式運作原理,並舉出這樣的程式碼用在哪些真實案例中; - [ ] 設計實驗,檢驗 ctz/clz 改寫的程式碼相較原本的實作有多少改進?應考慮到不同的 bitmap density; - [ ] 思考進一步的改進空間; - [ ] 閱讀 Data Structures in the Linux Kernel 並舉出 Linux 核心使用 bitmap 的案例; ::: ## 測驗 `5` Reference: [Leetcode 166](https://leetcode.com/problems/fraction-to-recurring-decimal/) :::spoiler 原始碼: ```c= #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include "list.h" struct rem_node { int key; int index; struct list_head link; }; static int find(struct list_head *heads, int size, int key) { struct rem_node *node; int hash = key % size; list_for_each_entry (node, &heads[hash], link) { if (key == node->key) return node->index; } return -1; } char *fractionToDecimal(int numerator, int denominator) { int size = 1024; char *result = malloc(size); char *p = result; if (denominator == 0) { result[0] = '\0'; return result; } if (numerator == 0) { result[0] = '0'; result[1] = '\0'; return result; } /* using long long type make sure there has no integer overflow */ long long n = numerator; long long d = denominator; /* deal with negtive cases */ if (n < 0) n = -n; if (d < 0) d = -d; bool sign = (float) numerator / denominator >= 0; if (!sign) *p++ = '-'; long long remainder = n % d; long long division = n / d; sprintf(p, "%ld", division > 0 ? (long) division : (long) -division); if (remainder == 0) return result; p = result + strlen(result); *p++ = '.'; /* Using a map to record all of reminders and their position. * if the reminder appeared before, which means the repeated loop begin, */ char *decimal = malloc(size); memset(decimal, 0, size); char *q = decimal; size = 1333; struct list_head *heads = malloc(size * sizeof(*heads)); for (int i = 0; i < size; i++) INIT_LIST_HEAD(&heads[i]); for (int i = 0; remainder; i++) { int pos = find(heads, size, remainder); if (pos >= 0) { while (PPP > 0) *p++ = *decimal++; *p++ = '('; while (*decimal != '\0') *p++ = *decimal++; *p++ = ')'; *p = '\0'; return result; } struct rem_node *node = malloc(sizeof(*node)); node->key = remainder; node->index = i; MMM(&node->link, EEE); *q++ = (remainder * 10) / d + '0'; remainder = (remainder * 10) % d; } strcpy(p, decimal); return result; } ``` ::: :::info 問題: - [ ] 解釋上述程式碼運作原理,指出其中不足,並予以改進 > 例如,判斷負號只要寫作 `bool isNegative = numerator < 0 ^ denominator < 0;` > 搭配研讀 [The simple math behind decimal-binary conversion algorithms](https://indepth.dev/posts/1019/the-simple-math-behind-decimal-binary-conversion-algorithms) - [ ] 在 Linux 核心原始程式碼的 mm/ 目錄 (即記憶體管理) 中,找到特定整數比值 (即 fraction) 和 bitwise 操作的程式碼案例,予以探討和解說其應用場景 ::: ## 測驗 `6` ```c= #define ALIGNOF(t) \ ((char *)(&((struct { char c; t _h; } *)0)->M) - (char *)X) ``` 解題思路: 由 [`__alignof__`](https://gcc.gnu.org/onlinedocs/gcc/Alignment.html) 可知 遇到上面的問題先上我們抽絲剝繭,一步一步分析,這邊模仿[頭腦體操]()進行解析: ```c= (int*)0; // 0 (struct {int data;}*)0; // 0 &((struct { char c; int data;}*)0); // lvalue error &((struct { char c; int data;}*)0)->data; // 4 (char *)(&((struct { char c; int data;}*)0)->data); // 4 (char *)(&((struct { char c; int data;}*)0)->data) - (char *) 0; // 4 - 0 = 4 ``` --- 利用 `linux command: grep -rn alignof` 搜尋 `alignof`,尋找結果多的驚人,以下僅節錄部分結果: ```shell include/uapi/linux/netfilter_bridge/ebtables.h:132: unsigned char data[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); include/uapi/linux/netfilter_bridge/ebtables.h:145: unsigned char data[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); include/uapi/linux/netfilter_bridge/ebtables.h:158: unsigned char data[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); include/uapi/linux/netfilter_bridge/ebtables.h:191: unsigned char elems[0] __attribute__ ((aligned (__alignof__(struct ebt_replace)))); include/crypto/hash.h:167: __aligned(__alignof__(struct shash_desc)); \ tools/perf/include/bpf/linux/socket.h:9:#define _K_SS_ALIGNSIZE (__alignof__ (struct sockaddr *)) include/uapi/asm-generic/siginfo.h:77:#define __ADDR_BND_PKEY_PAD (__alignof__(void *) < sizeof(short) ? \ include/uapi/asm-generic/siginfo.h:78: sizeof(short) : __alignof__(void *)) ``` --- `ALIGN`, `ALIGN_DOWN`, `ALIGN_UP`... 在 `include/linux/align.h`。然而,在這個標頭檔中的 `ALIGN` 與本題的作法不同,與 `quiz3` 的題目相近 :::info 問題: - [x] 請補完程式碼 - [x] 解釋上述程式碼運作原理 - [ ] 在 Linux 核心原始程式碼中找出 __alignof__ 的使用案例 2 則,並針對其場景進行解說 - [ ] 在 Linux 核心源使程式碼找出 ALIGN, ALIGN_DOWN, ALIGN_UP 等巨集,探討其實作機制和用途,並舉例探討 (可和上述第二點的案例重複)。思索能否對 Linux 核心提交貢獻,儘量共用相同的巨集 ::: ## 測驗 `7` Reference: [Leetcode 412](https://leetcode.com/problems/fizz-buzz/) ```c static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char **argv) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << KK1) << KK2; char fmt[9]; strncpy(fmt, &"FizzBuzz%u"[(9 >> div5) >> (KK3)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` :::info 問題: - [x] 解釋上述程式運作原理並評估 `naive.c` 和 `bitwise.c` 效能落差 - [ ] 避免 `stream I/O` 帶來的影響,可將 `printf` 更換為 `sprintf` - [ ] 分析 Faster remainders when the divisor is a constant: beating compilers and libdivide 一文的想法 (可參照同一篇網誌下方的評論),並設計另一種 bitmask,如「可被 3 整除則末位設為 1」「可被 5 整除則倒數第二位設定為 1」,然後再改寫 bitwise.c 程式碼,試圖運用更少的指令來實作出 branchless; - 參照 fastmod: A header file for fast 32-bit division remainders on 64-bit hardware - [ ] 研讀 The Fastest FizzBuzz Implementation 及其 Hacker News 討論,提出 throughput (吞吐量) 更高的 Fizzbuzz 實作 - [ ] 解析 Linux 核心原始程式碼 kernel/time/timekeeping.c 裡頭涉及到除法運算的機制,探討其快速除法的實作 (注意: 你可能要對照研讀 kernel/time/ 目錄的標頭檔和程式碼) > 過程中,你可能會發現可貢獻到 Linux 核心的空間,請充分討論 ::: 首先需要了解本題的 `mod` 怎麼實做,以下面為例: ```c= static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; ``` 與一般的取餘數不同,本題實做採用溢位 (overflow) 的方式,若為 3 的倍數,以 3 為例, `n * M = 0xFFFFFFFFFFFFFFFF + 3 = 2`;若不是 3 的倍數,以 2 為例,則 `n * M = 0xAAAAAAAAAAAAAAAC > 0x5555555555555555`