2020q1 Homework2 (fibdrv)

# 2020q1 Homework2 (fibdrv) contributed by < `ambersun1234` > ## 環境 ```shell $ uname -a Linux 4.15.0-74-generic #83~16.04.1-Ubuntu SMP Wed Dec 18 04:56:23 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 ``` ## 作業說明 + [作業說明](https://hackmd.io/@sysprog/linux2020-fibdrv#H03-fibdrv) ## 作業要求 ### 自我檢查清單 - [x] 研讀上述 ==Linux 效能分析的提示== 描述，在自己的實體電腦運作 GNU/Linux，做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [x] 研讀上述費氏數列相關材料 (包含論文)，摘錄關鍵手法，並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise)，思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [x] `lsmod` 的輸出結果有一欄名為 `Used by`，這是 "each module's use count and a list of referring modules"，但如何實作出來呢？模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢？ - [x] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣，什麼場景中會需要呢？撰寫多執行緒的 userspace 程式來測試，觀察 Linux 核心模組若沒用到 mutex，到底會發生什麼問題 ### 作業要求 * 回答上述==自我檢查清單==的所有問題，需要附上對應的參考資料和必要的程式碼，以第一手材料 (包含自己設計的實驗) 為佳 * 在 GitHub 上 fork [fibdrv](https://github.com/sysprog21/fibdrv)，目標是修正 Fibonacci 數計算的正確性 (現有實作存在缺陷，請指出)，隨後改善 `fibdrv` 計算 Fibinacci 數列的執行效率，過程中需要量化執行時間，可在 Linux 核心模組和使用層級去測量 * 原本的程式碼只能列出到 Fibonacci(100) 而且==部分輸出是錯誤的數值==，請修改程式碼，列出後方更多數值 (注意: 檢查正確性和數值系統的使用) * 務必研讀上述 ==Linux 效能分析的提示== 的描述，降低效能分析過程中的干擾; * 在 Linux 核心模組中，可用 ktime 系列的 API; * 在 userspace 可用 [clock_gettime](https://linux.die.net/man/2/clock_gettime) 相關 API; * 分別[用 gnuplot 製圖](https://hackmd.io/@sysprog/Skwp-alOg)，分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷，單位需要用 us 或 ns (自行斟酌) * 嘗試解讀上述實驗的時間分佈，特別是隨著 Fibonacci 數列增長後，對於 Linux 核心的影響。可修改 VFS 函式，允許透過寫入 `/dev/fibonacci` 裝置檔案來變更不同的 Fibonacci 求值的實作程式碼。 ![](https://i.imgur.com/7xyCUVO.png) * 用 printk 固然方便，但其執行時間易受各種因素而不穩定，除了透過統計來觀察，也可改用核心的 sysfs 介面，後者可讓我們自行宣告 show 和 store 介面來提供讀寫操作，可參見 [Sample kobject implementation](https://elixir.bootlin.com/linux/latest/source/samples/kobject/kobject-example.c) (注意: 切換到對應的 Linux 核心版本)。 * 逐步最佳化 Fibonacci 的執行時間，引入[費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number) 提到的策略，並善用 [clz / ffs](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令 (Linux 核心有對應的包裝函式)，過程中都要充分量化 ## 自我檢查清單 + GNU/Linux 效能必要設定 + 以 `isolcpus` & `taskset` 限定 CPU給特定的程式使用 + 關閉 [Address Space Layout Randomization (ASLR)](https://blog.csdn.net/Plus_RE/article/details/79199772) + `$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"` + ASLR 是一個跟安全性有關的東西，一開始我覺的它應該很難跟效能扯上關係。於是我找到了一篇2015的論文 [Performance and Entropy of Various ASLR Implementations](http://pages.cs.wisc.edu/~riccardo/736finalpaper.pdf) 他的結論如下： > Overall, our findings and experiments supported our initial hypothesis that ASLR adds significant security with very minimal performance impact + 也就是說 ASLR 大大的增加了安全性，同時僅僅犧牲少量的效能 + [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 + ### Q-matrix + 考慮費氏數列兔子問題，假設成年兔子為 `a` ，幼年兔子為 `b` ，我們可以得到以下關係 $$a_{n+1} = a_n + b_n \\ b_{n+1} = a_n$$ + 根據上述推導公式，把它寫成矩陣的形式就會變成 $$ \begin{pmatrix} a_{n+1} \\ b_{n+1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_n \\ b_n \end{pmatrix} $$ + 則 $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ 就是所謂的 ==Q-matrix==，那我們可以把它寫成 $$ Q = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} F_2 & F_1 \\ F_1 & F_0 \end{pmatrix} \\ Q^n = \begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix} $$ > [The Fibonacci Q-matrix](https://www.youtube.com/watch?v=lTHVwsHJrG0) + ### fast-doubling $\begin{split} \begin{bmatrix} F(2n+1) \\ F(2n) \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{2n} \begin{bmatrix} F(1) \\ F(0) \end{bmatrix}\\ \\ &= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^n \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^n \begin{bmatrix} F(1) \\ F(0) \end{bmatrix}\\ \\ &= \begin{bmatrix} F(n+1) & F(n) \\ F(n) & F(n-1) \end{bmatrix} \begin{bmatrix} F(n+1) & F(n) \\ F(n) & F(n-1) \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix}\\ \\ &= \begin{bmatrix} F(n+1)^2 + F(n)^2\\ F(n)F(n+1) + F(n-1)F(n) \end{bmatrix} \end{split}$ + 所以得到兩個公式 $\begin{split} F(2k) &= F(k)[2F(k+1) - F(k)] \\ F(2k+1) &= F(k+1)^2+F(k)^2 \end{split}$ + 根據 fibonacci sequence 原始定義 `F(0) = 0, F(1) = 1, F(2) = 1` 我們就可以用這三個值，依據上述公式推導出之後的值 + 而 fast doubling 裡面有用到以2為基礎的乘法運算，而這種運算恰好可以使用 `clz, ctz` 做加速，以當前 bit 為0或1選擇 fast doubling 的公式；而數字前多餘的0做運算是沒有任何作用的，於是可以使用 `clz` 計算有多少 bit 是不需要做運算的，進而達到加速作用 + `lsmod` 的輸出結果有一欄名為 `Used by`，這是 "each module's use count and a list of referring modules"，但如何實作出來呢？模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢？ + [reference counting](https://en.wikipedia.org/wiki/Reference_counting) 可以用在 `garbage collection(gc)`, `file system` 等。 > In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others. + 尤其在效能低落的 real time system 裡面，gc 反應性可為至關重要，透過 reference counting 可以很即時的知道該資源是否有被使用，能否釋放 + [Is there a way to figure out what is using a Linux kernel module? ](https://stackoverflow.com/questions/448999/is-there-a-way-to-figure-out-what-is-using-a-linux-kernel-module) 討論區中有提到可以使用 `try_module_get` 以及 `try_module_put` 新增刪減 reference count + `try_module_get` 定義於 [linux/module.c](https://github.com/torvalds/linux/blob/14cd0bd04907df79b36a31e55f18768172230987/kernel/module.c#L1125) ```c bool try_module_get(struct module *module) { bool ret = true; if (module) { preempt_disable(); /* Note: here, we can fail to get a reference */ if (likely(module_is_live(module) && atomic_inc_not_zero(&module->refcnt) != 0)) trace_module_get(module, _RET_IP_); else ret = false; preempt_enable(); } return ret; } EXPORT_SYMBOL(try_module_get); ``` + 我嘗試搜尋了一下 `lsmod` 的實做，找到了 `query_module` 這個 system call，但是可惜的是這個已經在 kernel 2.6 之後被棄用了詳情：[man 2 query_module](http://man7.org/linux/man-pages/man2/query_module.2.html) + 而 `lsmod` 的實做根據 [man lsmod](http://man7.org/linux/man-pages/man8/lsmod.8.html) 所述也只是 format `/proc/modules` 而已 + 為了更加的了解 `lsmod` 我參考了 [Do both lsmod and /proc/modules use the same mechanism to retrieve modules?](https://superuser.com/a/1211088) 的方法，得出結果如下 ```shell $ sudo strace lsmod |& grep nvme open("/sys/module/nvme/refcnt", O_RDONLY|O_CLOEXEC) = 3 open("/sys/module/nvme", O_RDONLY|O_CLOEXEC) = 3 open("/sys/module/nvme/holders", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 open("/sys/module/nvme_core/refcnt", O_RDONLY|O_CLOEXEC) = 3 open("/sys/module/nvme_core", O_RDONLY|O_CLOEXEC) = 3 open("/sys/module/nvme_core/holders", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3 nvme 36864 0 nvme_core 61440 5 nvme ``` + 可以發現到基本上還是透過取得檔案文件內容輸出做顯示 + 根據 ==refcnt== 這條線索，`refcnt` 是透過這個 function 進行取得的 [linux/module.c](https://github.com/torvalds/linux/blob/14cd0bd04907df79b36a31e55f18768172230987/kernel/module.c#L952) ```c /** * module_refcount - return the refcount or -1 if unloading * * @mod: the module we're checking * * Returns: * -1 if the module is in the process of unloading * otherwise the number of references in the kernel to the module */ int module_refcount(struct module *mod) { return atomic_read(&mod->refcnt) - MODULE_REF_BASE; } EXPORT_SYMBOL(module_refcount); ``` + 而透過 struct module 裡面的 `source_list` & `target_list` 則可以取得被哪些模組使用 ```c /* * Module a uses b * - we add 'a' as a "source", 'b' as a "target" of module use * - the module_use is added to the list of 'b' sources (so * 'b' can walk the list to see who sourced them), and of 'a' * targets (so 'a' can see what modules it targets). */ static int add_module_usage(struct module *a, struct module *b) { struct module_use *use; pr_debug("Allocating new usage for %s.\n", a->name); use = kmalloc(sizeof(*use), GFP_ATOMIC); if (!use) return -ENOMEM; use->source = a; use->target = b; list_add(&use->source_list, &b->source_list); list_add(&use->target_list, &a->target_list); return 0; } ``` <hr> + `struct module` 定義在 [linux/module.h](https://github.com/torvalds/linux/blob/master/include/linux/module.h#L360) 裡面 ```c struct module { enum module_state state; /* Member of list of modules */ struct list_head list; /* Unique handle for this module */ char name[MODULE_NAME_LEN]; /* Sysfs stuff. */ struct module_kobject mkobj; struct module_attribute *modinfo_attrs; const char *version; const char *srcversion; struct kobject *holders_dir; /* Exported symbols */ const struct kernel_symbol *syms; const s32 *crcs; unsigned int num_syms; /* Kernel parameters. */ #ifdef CONFIG_SYSFS struct mutex param_lock; #endif struct kernel_param *kp; unsigned int num_kp; /* GPL-only exported symbols. */ unsigned int num_gpl_syms; const struct kernel_symbol *gpl_syms; const s32 *gpl_crcs; #ifdef CONFIG_UNUSED_SYMBOLS /* unused exported symbols. */ const struct kernel_symbol *unused_syms; const s32 *unused_crcs; unsigned int num_unused_syms; /* GPL-only, unused exported symbols. */ unsigned int num_unused_gpl_syms; const struct kernel_symbol *unused_gpl_syms; const s32 *unused_gpl_crcs; #endif #ifdef CONFIG_MODULE_SIG /* Signature was verified. */ bool sig_ok; #endif bool async_probe_requested; /* symbols that will be GPL-only in the near future. */ const struct kernel_symbol *gpl_future_syms; const s32 *gpl_future_crcs; unsigned int num_gpl_future_syms; /* Exception table */ unsigned int num_exentries; struct exception_table_entry *extable; /* Startup function. */ int (*init)(void); /* Core layout: rbtree is accessed frequently, so keep together. */ struct module_layout core_layout __module_layout_align; struct module_layout init_layout; /* Arch-specific module values */ struct mod_arch_specific arch; unsigned long taints; /* same bits as kernel:taint_flags */ #ifdef CONFIG_GENERIC_BUG /* Support for BUG */ unsigned num_bugs; struct list_head bug_list; struct bug_entry *bug_table; #endif #ifdef CONFIG_KALLSYMS /* Protected by RCU and/or module_mutex: use rcu_dereference() */ struct mod_kallsyms __rcu *kallsyms; struct mod_kallsyms core_kallsyms; /* Section attributes */ struct module_sect_attrs *sect_attrs; /* Notes attributes */ struct module_notes_attrs *notes_attrs; #endif /* The command line arguments (may be mangled). People like keeping pointers to this stuff */ char *args; #ifdef CONFIG_SMP /* Per-cpu data. */ void __percpu *percpu; unsigned int percpu_size; #endif #ifdef CONFIG_TRACEPOINTS unsigned int num_tracepoints; tracepoint_ptr_t *tracepoints_ptrs; #endif #ifdef CONFIG_TREE_SRCU unsigned int num_srcu_structs; struct srcu_struct **srcu_struct_ptrs; #endif #ifdef CONFIG_BPF_EVENTS unsigned int num_bpf_raw_events; struct bpf_raw_event_map *bpf_raw_events; #endif #ifdef CONFIG_JUMP_LABEL struct jump_entry *jump_entries; unsigned int num_jump_entries; #endif #ifdef CONFIG_TRACING unsigned int num_trace_bprintk_fmt; const char **trace_bprintk_fmt_start; #endif #ifdef CONFIG_EVENT_TRACING struct trace_event_call **trace_events; unsigned int num_trace_events; struct trace_eval_map **trace_evals; unsigned int num_trace_evals; #endif #ifdef CONFIG_FTRACE_MCOUNT_RECORD unsigned int num_ftrace_callsites; unsigned long *ftrace_callsites; #endif #ifdef CONFIG_LIVEPATCH bool klp; /* Is this a livepatch module? */ bool klp_alive; /* Elf information */ struct klp_modinfo *klp_info; #endif #ifdef CONFIG_MODULE_UNLOAD /* What modules depend on me? */ struct list_head source_list; /* What modules do I depend on? */ struct list_head target_list; /* Destruction function. */ void (*exit)(void); atomic_t refcnt; #endif #ifdef CONFIG_CONSTRUCTORS /* Constructor functions. */ ctor_fn_t *ctors; unsigned int num_ctors; #endif #ifdef CONFIG_FUNCTION_ERROR_INJECTION struct error_injection_entry *ei_funcs; unsigned int num_ei_funcs; #endif } ____cacheline_aligned __randomize_layout; ``` + `fibdrv.c` 裡面存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣，什麼場景會需要呢？ + fibdrv 本身是一個 `character device driver`，device driver 可以跑在 `kernel space` 以及 `user space` > [Emulating a character device from userspace ](https://unix.stackexchange.com/questions/477117/emulating-a-character-device-from-userspace) + 而本次作業是使用在 kernel space，作業系統當中充斥著許多中斷信號(`interrupt`)，使得 CPU 得要保存當前程式數據，避免資料遺失。 + 為了測試在多執行緒下，`device driver` 執行程式的資料正確性，我做了一個簡單的實驗，程式碼可參考 [\[gist\] ambersun1234](https://gist.github.com/ambersun1234/a003dd264d8ec3fe95f60a4254cf6721)。執行結果如下所示 + ![](https://i.imgur.com/bkn4GVa.png) + 可以看到在大部份的情況下皆正確，但是 highlight 的那一行資料是錯誤的。由此可知在多執行緒的狀況下 ## 分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷 + ![](https://i.imgur.com/j5gWmRT.png) + 可以看到不管在多少數字下，kernel space 傳遞到 user space 的時間基本上是固定的 > 但是這個僅代表 **傳遞時間為固定的** + 以下為實驗程式碼 [ambersun1234 - fibdrv(ee926c5)](https://github.com/ambersun1234/fibdrv/commit/ee926c5c46c3cbfd2c06d1f03676583141e3eec9) ```c static ktime_t curTime; /* calculate the fibonacci number at given offset */ static ssize_t fib_read(struct file *file, char *buf, size_t size, loff_t *offset) { char kbuffer[100] = {0}; curTime = ktime_get(); struct u64 result = fibonacci((int) (*offset)); curTime = ktime_sub(ktime_get(), curTime); sprintf(kbuffer, "%llu*18446744073709551616+%llu\n", result.msl, result.lsl); copy_to_user(buf, kbuffer, 100); return 0; } /* write operation is skipped */ static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { return ktime_to_ns(ktime_get()); } ``` ## fibonacci 執行時間 + ![](https://i.imgur.com/2OIaHjV.png) + 此程式碼為使用 [clz/ctz](https://en.wikipedia.org/wiki/Find_first_set#CLZ) 加速過的版本，可以看到隨著數字的增大，所耗費的時間也隨之增長 + 以下為實驗程式碼 [ambersun1234 - fibdrv(b4e9b70)](https://github.com/ambersun1234/fibdrv/commit/b4e9b7054b9599e20f876186aa29aa0b26e1963b) ```c static ktime_t curTime; /* write operation is skipped */ static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { curTime = ktime_get(); fibonacci((int) (*offset)); curTime = ktime_sub(ktime_get(), curTime); return ktime_to_ns(curTime); } ``` ## 印出更大的數字 + 原本的 fibdrv 只能夠印到 `92`，超過此數字會因為 `unsigned long long int` 的範圍限制而導致錯誤 + 於是我將2個 ull 合併成一個 struct，用來儲存更大的數值 ```c struct u64 { unsigned long long lsl; unsigned long long msl; }; ``` + 為了使 u64 的加減乘皆正確，必須撰寫相對應的大數處理 + 以下程式碼是相對應的實做 + 加法 :arrow_right: 需判斷 lsl 加法後會不會溢位，若會溢位則將 msl + 1 + 減法 :arrow_right: 需考慮是否需要借位進行減法即可 + 乘法 :arrow_right: 採用直式乘法的思考邏輯去做，不過是以 bit 為單位 ```c static struct u64 *adder(struct u64 *input1, struct u64 *input2) { struct u64 *r = (struct u64 *) kmalloc(sizeof(struct u64), GFP_KERNEL); r->lsl = 0; r->msl = 0; unsigned long long diff = 0; r->lsl = input1->lsl; r->msl = input1->msl + input2->msl; diff = ULLONG_MAX - r->lsl; if (input2->lsl >= diff) { r->lsl += input2->lsl; r->msl += 1; } else { r->lsl += input2->lsl; } return r; } static struct u64 *subtracter(struct u64 *input1, struct u64 *input2) { struct u64 *r = (struct u64 *) kmalloc(sizeof(struct u64), GFP_KERNEL); if (input1->lsl < input2->lsl) { unsigned long long mycarry = ULLONG_MAX; r->lsl = mycarry + input1->lsl - input2->lsl + 1; r->msl = input1->msl - input2->msl - 1; } else { r->lsl = input1->lsl - input2->lsl; r->msl = input1->msl - input2->msl; } return r; } static struct u64 *multiplier(struct u64 *input1, struct u64 *input2) { struct u64 *r = (struct u64 *) kmalloc(sizeof(struct u64), GFP_KERNEL); r->lsl = 0; r->msl = 0; size_t width = 8 * sizeof(unsigned long long); // 8 bits * how many bytes for (size_t i = 0; i < width; i++) { if ((input2->lsl >> i) & 0x1) { struct u64 tmp; r->msl += input1->msl << i; tmp.lsl = (input1->lsl << i); tmp.msl = i == 0 ? 0 : (input1->lsl >> (width - i)); r = adder(r, &tmp); } } for (size_t i = 0; i < width; i++) { if ((input2->msl >> i) & 0x1) { r->msl += (input1->lsl << i); } } return r; } ``` + 為了要將 fibonacci 加速，使用 `clz` 搭配 `fast doubling` 進行實做 ```c static int myclz(int input) { // use binary search method to check int count = 0; if ((input & 0xFFFF0000) == 0) { input <<= 16; count += 16; } // 1111 1111 1111 1111 if ((input & 0xFF000000) == 0) { input <<= 8; count += 8; } // 1111 1111 if ((input & 0xF0000000) == 0) { input <<= 4; count += 4; } // 1111 if ((input & 0xC0000000) == 0) { input <<= 2; count += 2; } // 1100 if ((input & 0x80000000) == 0) { input <<= 0; count += 1; } // 1000 return count; } ``` ```c static struct u64 fibonacci(int input) { unsigned int msb = myclz(input); unsigned int mask = (1 << (31 - msb - 1)); struct u64 cur = {.msl = 0, .lsl = 1}, next = {.msl = 0, .lsl = 1}; struct u64 mul = {.msl = 0, .lsl = 2}, zero = {.msl = 0, .lsl = 0}; /* fast doubling formula * f(2k) = f(k)[2f(k + 1) - f(k)] * f(2k + 1) = f(k + 1)^2 + f(k)^2 */ if (input == 0) return zero; if (input >= 1 && input <= 2) return cur; while (mask > 0) { if (mask & input) { // bit = 1: fast doubling then iterate 1 struct u64 *t0, *t1, *t2, *temp, *temp2; // t0 = cur * (2 * next - cur); // t1 = next * next + cur * cur; temp = multiplier(&next, &mul); temp = subtracter(temp, &cur); t0 = multiplier(&cur, temp); t2 = &next; temp = multiplier(&next, t2); t2 = &cur; temp2 = multiplier(&cur, t2); t1 = adder(temp, temp2); cur = *t0; next = *t1; // iterate 1 temp = adder(&cur, &next); cur = next; next = *temp; } else { // bit = 0: fast doubling struct u64 *t0, *t1, *t2, *temp, *temp2; // t0 = cur * (2 * next - cur); // t1 = next * next + cur * cur; temp = multiplier(&next, &mul); temp = subtracter(temp, &cur); t0 = multiplier(&cur, temp); t2 = &next; temp = multiplier(&next, t2); t2 = &cur; temp2 = multiplier(&cur, t2); t1 = adder(temp, temp2); cur = *t0; next = *t1; } mask >>= 1; } return cur; } ``` + 而除了自己實做 `clz` 之外，linux 也有包裝好的 `ffs` 可使用 + 實做結果正確性可使用 `make verify` & `make ktest` 進行驗證 ###### tags: `linux2020`