2020 年秋季進階電腦系統理論與實作課程作業 —— kcalc

--- title: 2020 年秋季進階電腦系統理論與實作課程作業 —— kcalc image: https://repository-images.githubusercontent.com/173585388/eeb53d80-4a6c-11ea-9508-76db66d33b03 description: 檢驗學員對浮點數和定點數運算的認知，連同 Linux 核心對於浮點數運算的限制， --- # I09: kcalc ###### tags: `sysprog2020` > 主講人: [jserv](http://wiki.csie.ncku.edu.tw/User/jserv) / 課程討論區: [2020 年系統軟體課程](https://www.facebook.com/groups/system.software2020/) :mega: 返回「[進階電腦系統理論與實作](http://wiki.csie.ncku.edu.tw/sysprog/schedule)」課程進度表 ## :memo: 預期目標 * 學習數值系統：浮點運算和定點數 * 撰寫可適用於使用者和核心層級的程式 * 自動測試機制 ## :calling: Linux 核心的浮點數運算 Robert Love 在 [Linux Kernel Development](https://www.amazon.com/Linux-Kernel-Development-Robert-Love/dp/0672329468/) 一書論及浮點運算: > No (Easy) Use of Floating Point > When using floating-point instructions kernel normally catches a trap and then initiates the transition from integer to floating point mode. Unlike user-space, the kernel does not have the luxury of seamless support for floating point because it cannot easily trap itself. Using a floating point inside the kernel requires manually saving and restoring the floating point registers. Except in the rare cases, no floating-point operations are in the kernel. * 解說: [Use of floating point in the Linux kernel](https://stackoverflow.com/questions/13886338/use-of-floating-point-in-the-linux-kernel) Rusty Russell 在 [Unreliable Guide To Hacking The Linux Kernel](http://landley.net/kdocs/htmldocs/kernel-hacking.html) 則說: > The FPU context is not saved; even in user context the FPU state probably won't correspond with the current process: you would mess with some user process' FPU state. If you really want to do this, you would have to explicitly save/restore the full FPU state (and avoid context switches). It is generally a bad idea; use fixed point arithmetic first. #### Lazy FP state restore [CVE-2018-3665](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-3665) 存在於 Intel Core 系列微處理器中，因為 speculative execution（推測執行）技術中的一些缺陷加上特定作業系統中對 FPU（Floating point unit）進行 context switch 所產生的漏洞，允許一個本地端的程式洩漏其他程序的 FPU 暫存器內容。 [Lazy FP state leak](https://en.wikipedia.org/wiki/Lazy_FP_state_restore) 的原理是透過 FPU 的 lazy state switching 機制達成。因為 FPU 和 SIMD 暫存器不一定會在每個任務持續被使用到，因此作業系統排程器可將不需要使用到 FPU 的任務，標註為不可使用 FPU，而不更改裡面的內容。然而，在現今的[亂序執行](https://en.wikipedia.org/wiki/Out-of-order_execution) CPU 中，lazy state switching 裡會設定的 "FPU not available" 可能沒辦法馬上被偵測到，導致我們在 process B，但仍然可以存取到 process A 的 FPU 暫存器內容。基於上述原因，儘管我們在 Linux 核心模式中仍可使用浮點運算，但這不僅會拖慢執行速度，還需要手動儲存/取回相關的 FPU 暫存器，因此核心文件不建議在核心中使用到浮點運算。 ### 浮點運算時間差異比較 [afcidk](https://github.com/afcidk) 透過開發簡單的 Linux 核心模組，來測試在單純的浮點數運算及整數運算花費的時間差異。 > 相關的程式碼可見[這裡](https://gist.github.com/afcidk/7178ef368d98ee4b49c7a1acc3704303) ![](https://i.imgur.com/Rpe3DnR.png) 可見到核心模式中的浮點數運算，時間成本較整數運算高。 ## :slot_machine: 數學式計算器 [Reverse Polish Notation](https://en.wikipedia.org/wiki/Reverse_Polish_notation) * Reverse Polish notation (RPN) 中 operator 會放在所有 operand 後面 * 平常人們慣用 [infix notation](https://en.wikipedia.org/wiki/Infix_notation)，用括號 `( )` 表示順序範例: - Infix notation ![](https://i.imgur.com/Iad0t5O.png) * 運算子標為藍色，數值標為黑色 * human-friendly - Reverse Polish notation (RPN) ![](https://i.imgur.com/lrCnadG.png) * computer-friendly - infix notation 轉 postfix notation - [Shunting-yard algorithm](https://en.wikipedia.org/wiki/Shunting-yard_algorithm) - RPN evaluated from left to right - 遇到數值 push 進 stack - 遇到運算子取出 stack 前兩個運算，再把結果 push 進 stack - 最後 stack 中剩下的該數 (在 `s[0]`) 即為答案 ```cpp a : s[0] = a 1 : s[1] = 1 2 : s[2] = 2 + : s[1] = s[1] + s[2] b : s[2] = b * : s[1] = s[1] * s[2] + : s[0] = s[0] + s[1] 3 : s[1] = 3 - : s[0] = s[0] - s[1] ``` [MathEX](https://github.com/jserv/mathex) 是成功大學師生開發的數學表示式求值工具，接受一組數學表達式，輸出這段數學表達式所對應的答案。 > `Ex` 是 "expression" 的簡稱取得 `MathEX` 原始程式碼並測試: ```shell $ git clone https://github.com/jserv/MathEx $ cd MathEx $ make check ``` 預期可見以下測試輸出: ``` OK: six=6, seven=7, six*seven == 42.000000 OK: 小熊=6, 維尼=7, 小熊*維尼 == 42.000000 OK: τ=1.618, 3*τ == 4.854000 OK: $(τ, 1.618), 3*τ() == 4.854000 ... Execute build/test-bench... BENCH 5: 9.383917 ns/op (106M op/sec) BENCH 5+5+5+5+5+5+5+5+5+5: 137.161016 ns/op (7M op/sec) BENCH 5*5*5*5*5*5*5*5*5*5: 127.733946 ns/op (7M op/sec) BENCH 5,5,5,5,5,5,5,5,5,5: 102.152109 ns/op (9M op/sec) BENCH ((5+5)+(5+5))+((5+5)+(5+5))+(5+5): 126.007080 ns/op (7M op/sec) ... ``` `MathEX` 基本的使用方式： ```cpp #include "expression.h" /* Custom function that returns the sum of its two arguments */ static float add(struct expr_func *f, vec_expr_t args, void *c) { float a = expr_eval(&vec_nth(&args, 0)); float b = expr_eval(&vec_nth(&args, 1)); return a + b; } static struct expr_func user_funcs[] = { {"add", add, 0}, {NULL, NULL, 0}, }; int main() { const char *s = "x = 40, add(2, x)"; struct expr_var_list vars = {0}; struct expr *e = expr_create(s, strlen(s), &vars, user_funcs); if (!e) { printf("Syntax error"); return 1; } printf("result: %f\n", expr_eval(e)); expr_destroy(e, &vars); return 0; } ``` 上面的程式碼為範例，利用 `expr_create()` 將字串 `x = 40, add(2, x)` 輸入，再使用 `expr_eval()` 來產生輸出結果 ### `MathEx` API MathEx 提供 4 個 API： - `struct expr *expr_create(const char *s, size_t len, struct expr_var_list *vars, struct expr_func *funcs)`：將輸入的字串數學表達式編譯成 `struct expr` 型態，並且回傳 - `float expr_eval(struct expr *e)`：計算 `struct expr` 的結果並回傳 - `void expr_destroy(struct expr *e, struct expr_var_list *vars)`：清除記憶體，`vars` 可為 NULL (不清除 `vars` 的記憶體) - `struct expr_var *expr_var(struct expr_var_list *vars, const char *s, size_t len)`：在給定的變數清單 `vars` 中回傳對應名稱 `s` 的變數，如果清單中不存在該變數則會產生一個並回傳 ### 應用情境這類的數學表達式分析工具可以被應用於工程計算機類的軟體，如：[insect](https://github.com/sharkdp/insect) 是個支援物理單位的工程計算機，其中程式碼 [insect/src/Insect/Parser.purs](https://github.com/sharkdp/insect/blob/master/src/Insect/Parser.purs) 就是計算機的數學表達式的解析器 ### 解析 `MathEx` 程式碼透過 `expression.h` 的 macro 以及 struct 可以找到用來表示 expression 的 struct： ```cpp #define vec(T) struct { T *buf; int len; int cap; } typedef vec(struct expr) vec_expr_t; struct expr { int type; union { struct { float value; } num; struct { float *value; } var; struct { vec_expr_t args; } op; struct { struct expr_func *f; vec_expr_t args; void *context; } func; } param; }; ``` 從上面的定義可見在 `struct expr` 中，`param.op.args` 裡面其實包裝著 `struct expr` 的指標，在後續的程式中可發現該指標指向 1 或 2 個 `struct expr`，因此可推測 `struct expr` 是個 tree 的結構。其中 `T *buf` 是存放此 vector 所要儲存的目標資料的起始位置，`len` 為這個 vector 的長度，`cap`（即 "capability" 的意思) 表示此 vector 可容納幾個 vector 項目的空間，故條件 $len \le cap$ 永遠成立。 ![](https://i.imgur.com/G5U7Xhz.png) `expression.h` 為 `vec` 相關的資料結構定義了以下的操作 (巨集) * `vec_nth` * `vec_peek` * `vec_push` * `vec_pop` * `vec_free` * `vec_foreach` 以上這些巨集與 Linux 核心的 [include/linux/list.h](https://github.com/torvalds/linux/blob/master/include/linux/list.h) 手段相似，都隱藏背後的實做，讓撰寫程式碼時更為簡潔，更專注在邏輯上。可見 `vec_push` 的實做 ```cpp #define vec_unpack(v) \ (char **) &(v)->buf, &(v)->len, &(v)->cap, sizeof(*(v)->buf) #define vec_push(v, val) \ vec_expand(vec_unpack(v)) ? -1 : ((v)->buf[(v)->len++] = (val), 0) /* Simple expandable vector implementation */ static inline int vec_expand(char **buf, int *length, int *cap, int memsz) { if (*length + 1 > *cap) { void *ptr; int n = (*cap == 0) ? 1 : *cap << 1; ptr = realloc(*buf, n * memsz); if (ptr == NULL) { return -1; /* allocation failed */ } *buf = (char *) ptr; *cap = n; } return 0; } ``` 實做 `vec_push` 要考慮到許多議題，例如記憶體不足、vector 本身資料結構的更新維護。但都透過巨集定義，隱藏實作細節。這邊 vector 的作法，每當空間不夠就 realloc 兩倍的空間，這樣預先分配多的記憶體，可以減少之後每次需要增加記憶體的次數。 :+1: 改進方案: 在 `vector` 初始化階段，預先透過 kmalloc/krealloc 分配多個結構程式碼修改如下所示 (對應 commit [557fa1ca14e2](https://github.com/AdrianHuang/kcalc/commit/557fa1ca14e250977232ffccbe0ee503e0f348bd)): 延續上述手法，從 `heap` 空間取得記憶體空間，但可省去一開始 `vec_push()` 跟作業系統核心要求分配記憶體空間的成本。 ```diff diff --git a/expression.c b/expression.c index 8fa95bb..355b795 100644 --- a/expression.c +++ b/expression.c @@ -738,15 +738,15 @@ struct expr *expr_create(const char *s, struct expr *result = NULL; - vec_expr_t es = vec_init(); - vec_str_t os = vec_init(); - vec_arg_t as = vec_init(); + define_vec(vec_expr_t, es); + define_vec(vec_str_t, os); + define_vec(vec_arg_t, as); struct macro { char *name; vec_expr_t body; }; - vec(struct macro) macros = vec_init(); + define_vec(vec(struct macro), macros); int flags = EXPR_TDEFAULT; int paren = EXPR_PAREN_ALLOWED; @@ -819,7 +819,8 @@ struct expr *expr_create(const char *s, if (paren == EXPR_PAREN_EXPECTED) { struct expr_string str = {"{", 1}; vec_push(&os, str); - struct expr_arg arg = {vec_len(&os), vec_len(&es), vec_init()}; + struct expr_arg arg = {vec_len(&os), vec_len(&es), + vec_init(arg.args)}; vec_push(&as, arg); } else if (paren == EXPR_PAREN_ALLOWED) { struct expr_string str = {"(", 1}; diff --git a/expression.h b/expression.h index 6d5b57f..bc47678 100644 --- a/expression.h +++ b/expression.h @@ -22,10 +22,18 @@ extern "C" { int len; \ int cap; \ } -#define vec_init() \ - { \ - NULL, 0, 0 \ + +#define PRE_ALLOCATED_VEC_NR 4 + +#define vec_init(var) \ + { \ + (__typeof__(var.buf)) krealloc( \ + NULL, PRE_ALLOCATED_VEC_NR * sizeof(*var.buf), GFP_KERNEL), \ + 0, PRE_ALLOCATED_VEC_NR \ } + +#define define_vec(v, var) v var = vec_init(var); + #define vec_len(v) ((v)->len) #define vec_unpack(v) \ (char **) &(v)->buf, &(v)->len, &(v)->cap, sizeof(*(v)->buf) ``` 接著可透過 `expression.c` 中的 `expr_eval()` 中得知如何計算 expression 結果： ```cpp float expr_eval(struct expr *e) { float n; switch (e->type) { case OP_UNARY_MINUS: return -(expr_eval(&e->param.op.args.buf[0])); case OP_UNARY_LOGICAL_NOT: return !(expr_eval(&e->param.op.args.buf[0])); case OP_UNARY_BITWISE_NOT: return ~(to_int(expr_eval(&e->param.op.args.buf[0]))); case OP_POWER: return powf(expr_eval(&e->param.op.args.buf[0]), expr_eval(&e->param.op.args.buf[1])); case OP_MULTIPLY: return expr_eval(&e->param.op.args.buf[0]) * expr_eval(&e->param.op.args.buf[1]); case OP_DIVIDE: return expr_eval(&e->param.op.args.buf[0]) / expr_eval(&e->param.op.args.buf[1]); case OP_REMAINDER: return fmodf(expr_eval(&e->param.op.args.buf[0]), expr_eval(&e->param.op.args.buf[1])); } ... } ``` 從上面可以知道，它透過遞迴呼叫，計算目前節點的左子節點跟右子節點的結果，最後根據自己的 operator 將左子節點跟右子節點的結果作運算 ### 效能分析利用 `test-bench.c` 來查看結果： ```shell BENCH 5: 5.650043 ns/op (176M op/sec) BENCH 5+5+5+5+5+5+5+5+5+5: 39.499998 ns/op (25M op/sec) BENCH 5*5*5*5*5*5*5*5*5*5: 38.609982 ns/op (25M op/sec) BENCH 5,5,5,5,5,5,5,5,5,5: 42.056084 ns/op (23M op/sec) BENCH ((5+5)+(5+5))+((5+5)+(5+5))+(5+5): 34.670115 ns/op (28M op/sec) BENCH x=5: 3.753185 ns/op (266M op/sec) BENCH x=5,x+x+x+x+x+x+x+x+x+x: 44.126987 ns/op (22M op/sec) BENCH x=5,((x+x)+(x+x))+((x+x)+(x+x))+(x+x): 43.509007 ns/op (22M op/sec) BENCH a=1,b=2,c=3,d=4,e=5,f=6,g=7,h=8,i=9,j=10: 59.928894 ns/op (16M op/sec) BENCH a=1,a=2,a=3,a=4,a=5,a=6,a=7,a=8,a=9,a=10: 58.668852 ns/op (17M op/sec) ``` ## :point_right: 學習浮點運算和定點數操作 * [你所不知道的 C 語言: 浮點數運算](https://hackmd.io/@sysprog/c-floating-point) * [Fixed-point arithmetic](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) * Linux 核心的 CFS 排程器就用到 fixed-point * [linux/include/linux/sched.h](https://github.com/torvalds/linux/blob/master/include/linux/sched.h) ```cpp /* * Integer metrics need fixed point arithmetic, e.g., sched/fair * has a few: load, load_avg, util_avg, freq, and capacity. * * We define a basic fixed point arithmetic range, and then formalize * all these metrics based on that basic range. */ ``` * [linux/kernel/sched/fair.c](https://github.com/torvalds/linux/blob/master/kernel/sched/fair.c) ```clike /* * The exact cpuload calculated at every tick would be: * * load' = (1 - 1/2^i) * load + (1/2^i) * cur_load * * If a CPU misses updates for n ticks (as it was idle) and update gets * called on the n+1-th tick when CPU may be busy, then we have: * * load_n = (1 - 1/2^i)^n * load_0 * load_n+1 = (1 - 1/2^i) * load_n + (1/2^i) * cur_load * * decay_load_missed() below does efficient calculation of * * load' = (1 - 1/2^i)^n * load * * Because x^(n+m) := x^n * x^m we can decompose any x^n in power-of-2 factors. * This allows us to precompute the above in said factors, thereby allowing the * reduction of an arbitrary n in O(log_2 n) steps. (See also * fixed_power_int()) * * The calculation is approximated on a 128 point scale. */ ``` * [Floating Point Numbers and printk()](http://www.kernelthread.com/publications/faq/335.html) ### 計算 Load Average 命令 `uptime` 可顯示過去 1/5/15 分鐘的系統平均負載 ("load average: 0.01, 0.06, 0.02")，如下所示: ```shell $ uptime 11:40:45 up 49 days, 2:47, 3 users, load average: 0.01, 0.06, 0.02 ``` 計算公式使用 [Exponential smoothing](https://en.wikipedia.org/wiki/Exponential_smoothing)，或者可參考 [Linux Kernel Load Average 計算分析](http://brytonlee.github.io/blog/2014/05/07/linux-kernel-load-average-calc/)。正如字面上的意思，Load Average 表示近期的負載平均，近期可分成三種時間間隔，計算的公式如下： $load_t = n \times \alpha + (1- \alpha) \times load_{t-1}$ > α : 平滑因子 (smoothing factor)。顯然，這個公式的計算涉及浮點數運算，但 Linux 核心無法直接使用浮點數運算，於是定點運算就派上用場。在 Linux 的核心中為了計算 load average 的數值，使用到了定點數的概念，首先看到 [`include/linux/sched.h`](https://elixir.bootlin.com/linux/v4.0/source/include/linux/sched.h#L154) 中相關的 macro : ```cpp ... extern unsigned long avenrun[]; /* Load averages */ extern void get_avenrun(unsigned long *loads, unsigned long offset, int shift); #define FSHIFT 11 /* nr of bits of precision */ #define FIXED_1 (1<<FSHIFT) /* 1.0 as fixed-point */ #define LOAD_FREQ (5*HZ+1) /* 5 sec intervals */ #define EXP_1 1884 /* 1/exp(5sec/1min) as fixed-point */ #define EXP_5 2014 /* 1/exp(5sec/5min) */ #define EXP_15 2037 /* 1/exp(5sec/15min) */ #define CALC_LOAD(load,exp,n) \ load *= exp; \ load += n*(FIXED_1-exp); \ load >>= FSHIFT; ... ``` 可以看到在計算時會使用到的精度 `FSHIFT` 與在使用 [Exponential smoothing](https://en.wikipedia.org/wiki/Exponential_smoothing) 會用到的常數 `EXP_1`, `EXP_5` 與 `EXP_15` 而在實作的部分在 [`kernel/sched/proc.c`](https://elixir.bootlin.com/linux/v4.0/source/kernel/sched/proc.c) 中可以看到在 `calc_global_load` 函式中 ```cpp void calc_global_load(unsigned long ticks) { long active, delta; if (time_before(jiffies, calc_load_update + 10)) return; /* * Fold the 'old' idle-delta to include all NO_HZ cpus. */ delta = calc_load_fold_idle(); if (delta) atomic_long_add(delta, &calc_load_tasks); active = atomic_long_read(&calc_load_tasks); active = active > 0 ? active * FIXED_1 : 0; avenrun[0] = calc_load(avenrun[0], EXP_1, active); avenrun[1] = calc_load(avenrun[1], EXP_5, active); avenrun[2] = calc_load(avenrun[2], EXP_15, active); calc_load_update += LOAD_FREQ; /* * In case we idled for multiple LOAD_FREQ intervals, catch up in bulk. */ calc_global_nohz(); } ``` 為了分別計算 1, 5, 15 秒的 load average，在實作上對 `avenrun` 三個元素進行運算，利用 `calc_load` 計算數值而 `calc_load` 的函式如下： ```cpp /* * a1 = a0 * e + a * (1 - e) */ static unsigned long calc_load(unsigned long load, unsigned long exp, unsigned long active) { load *= exp; load += active * (FIXED_1 - exp); load += 1UL << (FSHIFT - 1); return load >> FSHIFT; } ``` 可以看到數值使用是 `unsigned long` 來實作，而為了維持定點數計算的正確性，在結束乘法運算後需要將數值向右 shift 才會是計算的結果當 load average 被算出來之後，接著就是讓其可以正確地顯示出來。在 `fs/proc/loadavg.c` 的實作中 ```cpp static int loadavg_proc_show(struct seq_file *m, void *v) { unsigned long avnrun[3]; get_avenrun(avnrun, FIXED_1/200, 0); seq_printf(m, "%lu.%02lu %lu.%02lu %lu.%02lu %ld/%d %d\n", LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]), LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]), LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]), nr_running(), nr_threads, task_active_pid_ns(current)->last_pid); return 0; } ``` `LOAD_INT` 與 `LOAD_FRAC` 的定義如下 ```cpp #define LOAD_INT(x) ((x) >> FSHIFT) #define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100) ``` `LOAD_INT` 的實作就是將小數部分捨棄，就可以直接使用。而 `LOAD_FRAC` 的話，首先將 `x` 與 `FIXED_1-1` 進行 AND 運算，因為 `FIXED_1 = 1 << FSHIFT` 也就是 $100000000000_2$，減 1 後就變成 $011111111111_2$ 所以透過遮罩的方式就可以將 `x` 的小數部分取出。接著將其乘以 100 再使用 `LOAD_INT` 就可以取出小數部分的前兩位了 ### time 命令的 real time 計算在 `Linux Programmer's Manual` 的 `time(7)` 提到 > Real time is defined as time **measured from some fixed point**, either from a standard point in the past (see the description of the Epoch and calendar time below), or from some point (e.g., the start) in the life of a process (elapsed time). ## kcalc: 將 [MathEx](https://github.com/jserv/mathex) 整合進 Linux 核心模組原本 [MathEx](https://github.com/jserv/mathex) 使用單精度浮點數格式 (即 `float`)，為了將其移植到 Linux 核心，我們將所有會使用到浮點數的程式碼操作都改為 32-bit 整數，並自行規範 fixed-point 用以計算。在現行 [kcalc](https://github.com/sysprog21/kcalc/) 實作中，LSB 4 bit 作為 exponent，MSB 28 bit 則挪作 mantissa。 ### 前期準備 :::info 1. 開發環境預期和 ==[lab0](https://hackmd.io/@sysprog/2020-lab0)== 相似，如果你剛好重新安裝 Ubuntu Linux，請依據指示將必要的開發套件裝好; 2. 從 [Homework2](https://hackmd.io/@sysprog/linux2020-homework2) 起，我們就有分析應用程式和 Linux 核心效能的需求，==不該在虛擬機器環境== 裡頭測試 (但 [container](https://linuxcontainers.org/) 或 [Linux KVM](https://www.linux-kvm.org/page/Main_Page) 可接受)，否則會有效能偏差的問題 $\to$ 及早在實體機器上安裝好 GNU/Linux; ::: * 自從 Linux 核心 4.4 版以來，Ubuntu Linux 預設開啟 `EFI_SECURE_BOOT_SIG_ENFORCE`，這使得核心模組需要適度的簽章才可掛載進入 Linux 核心，為了後續測試的便利，我們需要將 UEFI Secure Boot 的功能**關閉**，請見 [Why do I get “Required key not available” when install 3rd party kernel modules or after a kernel upgrade?](https://askubuntu.com/questions/762254/why-do-i-get-required-key-not-available-when-install-3rd-party-kernel-modules) * 檢查 Linux 核心版本 ```shell $ uname -r ``` 預期是大於等於 `4.15` 的版本，例如 `4.15.0-45-generic`。若在你的開發環境中，核心版本低於 `4.15` 的話，需要更新 Linux 核心，請自行參照相關文件 * 安裝 `linux-headers` 套件 (注意寫法裡頭有 `s`)，以 [Ubuntu Linux 18.04 LTS](https://packages.ubuntu.com/bionic/linux-headers-generic) 為例: ```shell $ sudo apt install linux-headers-`uname -r` ``` * 確認 `linux-headers` 套件已正確安裝於開發環境 ```shell $ dpkg -L linux-headers-4.15.0-45-generic | grep "/lib/modules" ``` 預期得到以下輸出: ``` /lib/modules/4.15.0-45-generic/build ``` * 檢驗目前的使用者身份 ```shell $ whoami ``` 預期為「不是 root 的使用者名稱」，例如 `jserv` (或者你安裝 Ubuntu Linux 指定的登入帳號名稱)。由於測試過程需要用到 sudo，請一併查驗: ```shell $ sudo whoami ``` 預期輸出是 `root` :notes: 在下列操作中，請==避免用 root 帳號輸入命令==，而該善用 `sudo` :::danger 之後的實驗中，我們會重建 root file system，若濫用 root 權限，很可能就把 GNU/Linux 開發環境不小心破壞 (當然，你還是可重新安裝)，現在開始養成好習慣 ::: ### 取得 kcalc 程式碼並測試取得 kcalc 原始程式碼並編譯: (注意到 ==`kcalc-fixed`==) ```shell $ git clone https://github.com/sysprog21/kcalc-fixed $ cd kcalc $ make ``` 預期在 `kcalc` 目錄可見到新檔案: * `calc.ko`: 將 [MathEx](https://github.com/jserv/mathex) 整合進 Linux 核心作為 character device，對使用者層級提供 `/dev/calc` 裝置檔案; 測試方式: ```shell $ sudo insmod calc.ko $ sudo chmod 0666 /dev/calc ``` 接著就可輸入欲計算數學式: ```shell $ echo -ne "3*5\0" > /dev/calc ``` 執行 `$ dmesg` 可傾印 Linux 核心訊息，預期會有相關的訊息，例如: ``` [415096.766537] calc: Initializing the module [415096.766599] calc: registered correctly with major number 236 [415096.766668] calc: device class registered correctly [415096.766782] calc: device class created correctly [415110.111843] calc: Received 4 -> 3*5 [415110.111910] calc: Result: 64424509440 ``` 你一定會納悶，為何 `3 * 5` 的結果竟是 `64424509440` 呢？因為這是我們自行定義的 fixed-point 表示法，我們透過 kcalc 提供的工具來轉換: ```shell $ ./eval 64424509440 ``` 預期會得到 `15.000000` 這個數值，也就是 $3 \times 5$ 的運算結果。 kcalc 提供一套內建的自動測試工具，使用方式: ```shell $ make check ``` 預期輸出: (僅列出最後 15 行) ``` Testing $(number, 1), $(number, 2+3), number() ... 5 Testing nop() ... -.10000000000000000000 Testing nop() ... 0 [417351.062510] calc: Received 6 -> nop() [417351.062641] calc: Result: 0 [417351.062696] calc: Device successfully closed [417351.063787] calc: size: 2 result: 0 [417351.063861] calc: Device successfully closed Complete ``` 關於 `MathEx` 支援的語法範例，可參見 [kcalc/scripts/test.sh](https://github.com/sysprog21/kcalc/blob/master/scripts/test.sh) 內容。 ### fixed-point 結構本實作參考 [`include/linux/sched.h`](https://elixir.bootlin.com/linux/v4.0/source/include/linux/sched.h#L154) 中所使用的定點數，並設計以下結構： ```cpp typedef union __fixedp { struct { uint32_t frac; uint32_t inte; }; uint64_t data; } fixedp; ``` 也就是使用 32 bit 記錄「整數」部分，以及另外 32 bit 記錄「小數」部分。除了擴充可記錄的數值範圍外，更減少在存取整數部分時的計算步驟：在 [`include/linux/sched.h`](https://elixir.bootlin.com/linux/v4.0/source/include/linux/sched.h#L154) 中所使用的結構在存取整數部分時需要做位移，但在此結構中只需取出 `inte` 的 field member 即可；而在進行加減運算時，不需要考慮 exponent 不同的議題，直接運算即可。在進行乘除時，則可以利用 `data` 的 field member 來做位移修正。將字串轉換為 fixed-point 的實作如下： ```cpp static uint64_t expr_parse_number(const char *s, size_t len) { fixedp num = {0}; int frac = 0; int dot = 0; unsigned int digits = 0; for (unsigned int i = 0; i < len; i++) { if (s[i] == '.' && dot == 0) { dot = i + 1; continue; } if (isdigit(s[i])) { if (dot) frac++; else digits++; } else return NAN_INT; } static int pow10[] = { 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, 100000000, 1000000000, }; if (dot) { uint32_t ipt = 0, tmp; uint32_t mask = pow10[frac]; int i = 31; sscanf(s, "%u.%u", &tmp, &ipt); while (ipt && i) { ipt <<= 1; if (ipt >= mask) { num.frac |= 1 << i; ipt %= mask; } i--; } } if (digits) sscanf(s, "%u", &num.inte); return (digits > 0 ? num.data : NAN_INT); } ``` ### 設計特殊值的表示法因為在計算時可能會碰到 `NaN`, `INF` 等情況發生，所以需要定義這些特殊值的表示方法： 1. `INF`: 針對 `fraction` 全為 1 的情況予以記錄 2. `NaN`: 最小的數值來表示 `NaN` ```cpp const uint64_t INF_INT = 9223372036854775808; const uint64_t NAN_INT = 1; ``` 將 `main.c` 中，為了符合新的 fixed-point 的實作，將 `user_func_fib` 進行修改： ```cpp uint64_t user_func_fib(struct expr_func *f, vec_expr_t args, void *c) { uint64_t n = expr_eval(&vec_nth(&args, 0)); return kfib(n >> 32) << 32; } ``` 因為在新的 fixed-point 的定義下，整數部分實際上能運算到的最大範圍還是 $2^{31} - 1$，所以在這樣的實作下 `fib` 實際能正確計算出的最大數值是 `fib(75)`。 ### 實作 fixed-point 轉換程式因為更改了 fixed-point 的實作，所以為了能檢驗運算的結果，需要一個新的 fixed-point 轉換的程式： ```cpp #include <stdio.h> #include <stdlib.h> #include <string.h> #include "fixed-point.h" int main(int argc, char *argv[]) { if (argc < 2) return -1; fixedp ipt = {0}; ipt.data = strtoul(argv[1], NULL, 10); if (ipt.data == NAN_INT) puts("NAN_INT"); else if (ipt.data == INF_INT) puts("INF_INT"); else { double result = (int) ipt.inte + ((double) ipt.frac / UINT32_MAX); printf("%lf\n", result); } return 0; } ``` 使用定義在 `<stdint.h>` 中的 `UINT32_MAX` 將小數部分轉換為浮點數形式，並印出對應的浮點數數值。 ### 核心 API 在給定的 [kcalc/main.c](https://github.com/sysprog21/kcalc/blob/master/main.c) 檔案中，和 [fibdrv](https://github.com/sysprog21/fibdrv) 一樣提供 character device，但註冊用的核心 API 不同 (`register_chrdev` vs. `alloc_chrdev_region`)，差異和適用場合為何呢？ ```cpp int register_chrdev_region(dev_t from, unsigned count, const char *name) ``` > 參數: > * `from`: the first in the desired range of device numbers; must include the major number. > * `count`: the number of consecutive device numbers required > * `name`: the name of the device or driver 搭配 ` MKDEV ` 這個 macro 使用。行為： * 先使用 major number(12 bits) 與 minor number (20 bits) 使用` MKDEV `取得`from` 這個參數，即這個裝置驅動程式獨一無二的識別碼; * 呼叫 `register_chrdev_region`，若回傳值為 `0`，即成功; * 即使上述成功，還要使用 `cdev_init()` 將把 file operations 註冊到 remapping table 裡; * 最後 `cdev_add`，將裝置驅動程式註冊到 Linux 核心; ```cpp int register_chrdev(unsigned int major, const char *name, const struct file_operations *fops); ``` > 參數: > * `major`: major device number or 0 for dynamic allocation > * `name`: name of this range of devices > * `fops`: file operations associated with the devices 行為： * 該函式回傳值就是 major number * 因此只要 `= 0` 或 `< 0` 都是失敗的情況。 ```cpp int alloc_chrdev_region (dev_t *dev, unsigned baseminor, unsigned count, const char *name); ``` > 參數: > * `dev`: output parameter for first assigned number > * `baseminor`: first of the requested range of minor numbers > * `count`: the number of minor numbers required > * `name`: the name of the associated device or driver 行為： * 這邊的 `dev` 這個參數不需要事先指定數值; * 傳入 `baseminor` 到函式，核心會開始找可用的 minor number > For instance, baseminor = 10, your minor number could be any number >= 10! * 此數 major number 都由系統配置 * 回傳值若等於 0，即成功。 * 同樣，後面要做 `cdev_init`, `cdev_add` 比較上述 3 個核心 API: | 核心 API | Major number | Minor number | | --- | --- | --- | | `register_chrdev` | 可自動分配也可開發者指定 | 系統分配 | | `alloc_chrdev_region` | 系統自動分配 | 開發者自行指定基底和範圍 | | `register_chrdev_region` | 開發者指定 | 系統分配 | 簡言之，`register_chrdev` 承包大部分的工作，也可自動分配，不過沒辦法指定 minor number。`alloc_chrdev_region` 與 `register_chrdev_region` 作用相似，只分配 major, minor number，剩下的 file operations remapping 跟後續核心的動作仍要由開發者自行處理。`register_chrdev_region` 可指定 major number。在 [Char Drivers](http://static.lwn.net/images/pdf/LDD3/ch03.pdf) 可看到下面這段描述: > If you dig through much driver code in the 2.6 kernel, you may notice that quite a few char drivers do not use the cdev interface that we have just described. What you are seeing is older code that has not yet been upgraded to the 2.6 interface. Since that code works as it is, this upgrade may not happen for a long time. For completeness, we describe the older char device registration interface, but new code should not use it; this mechanism will likely go away in a future kernel. 大意是說在 Linux 核心 2.6 版以前，只有 `register_chrdev` 可使用，但核心開發者建議我們用新的 API (即 `alloc_chrdev_region`)，因為在日後的版本，原本的 `register_chrdev` 可能會移去。 ## :penguin: 作業要求 * 在 GitHub 上 fork [kcalc-fixed](https://github.com/sysprog21/kcalc-fixed)，目標是擴充數值運算器的功能，例如新增 `sqrt`, `Sigma` ($\sum$), `Pi` ($\Pi$) 等等，可從 [math-expression-evaluator](https://github.com/bugwheels94/math-expression-evaluator) 的 "Supported symbols" 選出幾項來實作。過程中應一併完成以下: * 擴充數值運算器的時候，應一併提供測試案例，且該儘量反映出有效的數值範圍 * 留意乘法和加法的 overflow、減法的 underflow 等處理機制，分析後予以強化 * 確保數值運算的 precision 和 accuracy，甚至要考慮 [Kahan summation algorithm](https://en.wikipedia.org/wiki/Kahan_summation_algorithm) $\to$ 你可以修改原有 fixed-point 的設計和實作方式 * 觀察原始程式碼的 `expression.[ch]` 裡頭 `vec` 相關的程式碼，提出對空間效率更好的實作。考慮到 `kcalc-fixed` 的使用情境往往是持續接受一系列字串輸入，應避免頻繁的記憶體配置和釋放 * 修正現有 `kcalc-fixed` 內部數值運算的錯誤，並留意到不合法的記憶體操作; * 指出原本 fixed-point 實作的限制，觀察 Linux 核心所用的 fixed-point 運算程式碼，提出針對 kcalc 的改進方案 * 原有的測試程式不會檢驗輸出數值，應予以擴充，得以自動和預先輸入的期望數值進行比對; ## 繳交方式編輯 [Homework6 作業區共筆](https://hackmd.io/@sysprog/2020-homework6)，將你的觀察、上述要求的解說、應用場合探討，以及各式效能改善過程，善用 gnuplot 製圖，紀錄於新建立的共筆 ## 截止日期 * Nov 9, 2020 (含) 之前 > 越早在 GitHub 上有動態、越早接受 code review，評分越高