# 2020q1 Homework2 (fibdrv)
contributed by < `ambersun1234` >
## 環境
```shell
$ uname -a
Linux 4.15.0-74-generic #83~16.04.1-Ubuntu SMP Wed Dec 18 04:56:23 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
```
## 作業說明
+ [作業說明](https://hackmd.io/@sysprog/linux2020-fibdrv#H03-fibdrv)
## 作業要求
### 自我檢查清單
- [x] 研讀上述 ==Linux 效能分析的提示== 描述,在自己的實體電腦運作 GNU/Linux,做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗;
- [x] 研讀上述費氏數列相關材料 (包含論文),摘錄關鍵手法,並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
- [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise),思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本;
- [x] `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢?
- [x] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景中會需要呢?撰寫多執行緒的 userspace 程式來測試,觀察 Linux 核心模組若沒用到 mutex,到底會發生什麼問題
### 作業要求
* 回答上述==自我檢查清單==的所有問題,需要附上對應的參考資料和必要的程式碼,以第一手材料 (包含自己設計的實驗) 為佳
* 在 GitHub 上 fork [fibdrv](https://github.com/sysprog21/fibdrv),目標是修正 Fibonacci 數計算的正確性 (現有實作存在缺陷,請指出),隨後改善 `fibdrv` 計算 Fibinacci 數列的執行效率,過程中需要量化執行時間,可在 Linux 核心模組和使用層級去測量
* 原本的程式碼只能列出到 Fibonacci(100) 而且==部分輸出是錯誤的數值==,請修改程式碼,列出後方更多數值 (注意: 檢查正確性和數值系統的使用)
* 務必研讀上述 ==Linux 效能分析的提示== 的描述,降低效能分析過程中的干擾;
* 在 Linux 核心模組中,可用 ktime 系列的 API;
* 在 userspace 可用 [clock_gettime](https://linux.die.net/man/2/clock_gettime) 相關 API;
* 分別[用 gnuplot 製圖](https://hackmd.io/@sysprog/Skwp-alOg),分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷,單位需要用 us 或 ns (自行斟酌)
* 嘗試解讀上述實驗的時間分佈,特別是隨著 Fibonacci 數列增長後,對於 Linux 核心的影響。可修改 VFS 函式,允許透過寫入 `/dev/fibonacci` 裝置檔案來變更不同的 Fibonacci 求值的實作程式碼。

* 用 printk 固然方便,但其執行時間易受各種因素而不穩定,除了透過統計來觀察,也可改用核心的 sysfs 介面,後者可讓我們自行宣告 show 和 store 介面來提供讀寫操作,可參見 [Sample kobject implementation](https://elixir.bootlin.com/linux/latest/source/samples/kobject/kobject-example.c) (注意: 切換到對應的 Linux 核心版本)。
* 逐步最佳化 Fibonacci 的執行時間,引入[費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number) 提到的策略,並善用 [clz / ffs](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令 (Linux 核心有對應的包裝函式),過程中都要充分量化
## 自我檢查清單
+ GNU/Linux 效能必要設定
+ 以 `isolcpus` & `taskset` 限定 CPU給特定的程式使用
+ 關閉 [Address Space Layout Randomization (ASLR)](https://blog.csdn.net/Plus_RE/article/details/79199772)
+ `$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"`
+ ASLR 是一個跟安全性有關的東西,一開始我覺的它應該很難跟效能扯上關係。於是我找到了一篇2015的論文 [Performance and Entropy of Various ASLR Implementations](http://pages.cs.wisc.edu/~riccardo/736finalpaper.pdf) 他的結論如下:
> Overall, our findings and experiments supported our initial hypothesis that ASLR adds significant security with very minimal performance impact
+ 也就是說 ASLR 大大的增加了安全性,同時僅僅犧牲少量的效能
+ [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說
+ ### Q-matrix
+ 考慮費氏數列兔子問題,假設成年兔子為 `a` ,幼年兔子為 `b` ,我們可以得到以下關係
$$a_{n+1} = a_n + b_n \\ b_{n+1} = a_n$$
+ 根據上述推導公式,把它寫成矩陣的形式就會變成
$$
\begin{pmatrix}
a_{n+1} \\ b_{n+1}
\end{pmatrix}
=
\begin{pmatrix}
1 & 1 \\
1 & 0
\end{pmatrix}
\begin{pmatrix}
a_n \\ b_n
\end{pmatrix}
$$
+ 則 $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ 就是所謂的 ==Q-matrix==,那我們可以把它寫成
$$
Q =
\begin{pmatrix}
1 & 1 \\
1 & 0
\end{pmatrix}
=
\begin{pmatrix}
F_2 & F_1 \\
F_1 & F_0
\end{pmatrix}
\\
Q^n =
\begin{pmatrix}
F_{n+1} & F_n \\
F_n & F_{n-1}
\end{pmatrix}
$$
> [The Fibonacci Q-matrix](https://www.youtube.com/watch?v=lTHVwsHJrG0)
+ ### fast-doubling
$\begin{split}
\begin{bmatrix}
F(2n+1) \\
F(2n)
\end{bmatrix} &=
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}^{2n}
\begin{bmatrix}
F(1) \\
F(0)
\end{bmatrix}\\ \\ &=
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}^n
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}^n
\begin{bmatrix}
F(1) \\
F(0)
\end{bmatrix}\\ \\ &=
\begin{bmatrix}
F(n+1) & F(n) \\
F(n) & F(n-1)
\end{bmatrix}
\begin{bmatrix}
F(n+1) & F(n) \\
F(n) & F(n-1)
\end{bmatrix}
\begin{bmatrix}
1 \\
0
\end{bmatrix}\\ \\ &=
\begin{bmatrix}
F(n+1)^2 + F(n)^2\\
F(n)F(n+1) + F(n-1)F(n)
\end{bmatrix}
\end{split}$
+ 所以得到兩個公式
$\begin{split}
F(2k) &= F(k)[2F(k+1) - F(k)] \\
F(2k+1) &= F(k+1)^2+F(k)^2
\end{split}$
+ 根據 fibonacci sequence 原始定義 `F(0) = 0, F(1) = 1, F(2) = 1` 我們就可以用這三個值,依據上述公式推導出之後的值
+ 而 fast doubling 裡面有用到以2為基礎的乘法運算,而這種運算恰好可以使用 `clz, ctz` 做加速,以當前 bit 為0或1選擇 fast doubling 的公式;而數字前多餘的0做運算是沒有任何作用的,於是可以使用 `clz` 計算有多少 bit 是不需要做運算的,進而達到加速作用
+ `lsmod` 的輸出結果有一欄名為 `Used by`,這是 "each module's use count and a list of referring modules",但如何實作出來呢?模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢?
+ [reference counting](https://en.wikipedia.org/wiki/Reference_counting) 可以用在 `garbage collection(gc)`, `file system` 等。
> In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.
+ 尤其在效能低落的 real time system 裡面,gc 反應性可為至關重要,透過 reference counting 可以很即時的知道該資源是否有被使用,能否釋放
+ [Is there a way to figure out what is using a Linux kernel module?
](https://stackoverflow.com/questions/448999/is-there-a-way-to-figure-out-what-is-using-a-linux-kernel-module) 討論區中有提到可以使用 `try_module_get` 以及 `try_module_put` 新增刪減 reference count
+ `try_module_get` 定義於 [linux/module.c](https://github.com/torvalds/linux/blob/14cd0bd04907df79b36a31e55f18768172230987/kernel/module.c#L1125)
```c
bool try_module_get(struct module *module)
{
bool ret = true;
if (module) {
preempt_disable();
/* Note: here, we can fail to get a reference */
if (likely(module_is_live(module) &&
atomic_inc_not_zero(&module->refcnt) != 0))
trace_module_get(module, _RET_IP_);
else
ret = false;
preempt_enable();
}
return ret;
}
EXPORT_SYMBOL(try_module_get);
```
+ 我嘗試搜尋了一下 `lsmod` 的實做,找到了 `query_module` 這個 system call,但是可惜的是這個已經在 kernel 2.6 之後被棄用了 詳情:[man 2 query_module](http://man7.org/linux/man-pages/man2/query_module.2.html)
+ 而 `lsmod` 的實做根據 [man lsmod](http://man7.org/linux/man-pages/man8/lsmod.8.html) 所述也只是 format `/proc/modules` 而已
+ 為了更加的了解 `lsmod` 我參考了 [Do both lsmod and /proc/modules use the same mechanism to retrieve modules?](https://superuser.com/a/1211088) 的方法,得出結果如下
```shell
$ sudo strace lsmod |& grep nvme
open("/sys/module/nvme/refcnt", O_RDONLY|O_CLOEXEC) = 3
open("/sys/module/nvme", O_RDONLY|O_CLOEXEC) = 3
open("/sys/module/nvme/holders", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
open("/sys/module/nvme_core/refcnt", O_RDONLY|O_CLOEXEC) = 3
open("/sys/module/nvme_core", O_RDONLY|O_CLOEXEC) = 3
open("/sys/module/nvme_core/holders", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
nvme 36864 0
nvme_core 61440 5 nvme
```
+ 可以發現到基本上還是透過取得檔案文件內容輸出做顯示
+ 根據 ==refcnt== 這條線索,`refcnt` 是透過這個 function 進行取得的 [linux/module.c](https://github.com/torvalds/linux/blob/14cd0bd04907df79b36a31e55f18768172230987/kernel/module.c#L952)
```c
/**
* module_refcount - return the refcount or -1 if unloading
*
* @mod: the module we're checking
*
* Returns:
* -1 if the module is in the process of unloading
* otherwise the number of references in the kernel to the module
*/
int module_refcount(struct module *mod)
{
return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
}
EXPORT_SYMBOL(module_refcount);
```
+ 而透過 struct module 裡面的 `source_list` & `target_list` 則可以取得被哪些模組使用
```c
/*
* Module a uses b
* - we add 'a' as a "source", 'b' as a "target" of module use
* - the module_use is added to the list of 'b' sources (so
* 'b' can walk the list to see who sourced them), and of 'a'
* targets (so 'a' can see what modules it targets).
*/
static int add_module_usage(struct module *a, struct module *b)
{
struct module_use *use;
pr_debug("Allocating new usage for %s.\n", a->name);
use = kmalloc(sizeof(*use), GFP_ATOMIC);
if (!use)
return -ENOMEM;
use->source = a;
use->target = b;
list_add(&use->source_list, &b->source_list);
list_add(&use->target_list, &a->target_list);
return 0;
}
```
<hr>
+ `struct module` 定義在 [linux/module.h](https://github.com/torvalds/linux/blob/master/include/linux/module.h#L360) 裡面
```c
struct module {
enum module_state state;
/* Member of list of modules */
struct list_head list;
/* Unique handle for this module */
char name[MODULE_NAME_LEN];
/* Sysfs stuff. */
struct module_kobject mkobj;
struct module_attribute *modinfo_attrs;
const char *version;
const char *srcversion;
struct kobject *holders_dir;
/* Exported symbols */
const struct kernel_symbol *syms;
const s32 *crcs;
unsigned int num_syms;
/* Kernel parameters. */
#ifdef CONFIG_SYSFS
struct mutex param_lock;
#endif
struct kernel_param *kp;
unsigned int num_kp;
/* GPL-only exported symbols. */
unsigned int num_gpl_syms;
const struct kernel_symbol *gpl_syms;
const s32 *gpl_crcs;
#ifdef CONFIG_UNUSED_SYMBOLS
/* unused exported symbols. */
const struct kernel_symbol *unused_syms;
const s32 *unused_crcs;
unsigned int num_unused_syms;
/* GPL-only, unused exported symbols. */
unsigned int num_unused_gpl_syms;
const struct kernel_symbol *unused_gpl_syms;
const s32 *unused_gpl_crcs;
#endif
#ifdef CONFIG_MODULE_SIG
/* Signature was verified. */
bool sig_ok;
#endif
bool async_probe_requested;
/* symbols that will be GPL-only in the near future. */
const struct kernel_symbol *gpl_future_syms;
const s32 *gpl_future_crcs;
unsigned int num_gpl_future_syms;
/* Exception table */
unsigned int num_exentries;
struct exception_table_entry *extable;
/* Startup function. */
int (*init)(void);
/* Core layout: rbtree is accessed frequently, so keep together. */
struct module_layout core_layout __module_layout_align;
struct module_layout init_layout;
/* Arch-specific module values */
struct mod_arch_specific arch;
unsigned long taints; /* same bits as kernel:taint_flags */
#ifdef CONFIG_GENERIC_BUG
/* Support for BUG */
unsigned num_bugs;
struct list_head bug_list;
struct bug_entry *bug_table;
#endif
#ifdef CONFIG_KALLSYMS
/* Protected by RCU and/or module_mutex: use rcu_dereference() */
struct mod_kallsyms __rcu *kallsyms;
struct mod_kallsyms core_kallsyms;
/* Section attributes */
struct module_sect_attrs *sect_attrs;
/* Notes attributes */
struct module_notes_attrs *notes_attrs;
#endif
/* The command line arguments (may be mangled). People like
keeping pointers to this stuff */
char *args;
#ifdef CONFIG_SMP
/* Per-cpu data. */
void __percpu *percpu;
unsigned int percpu_size;
#endif
#ifdef CONFIG_TRACEPOINTS
unsigned int num_tracepoints;
tracepoint_ptr_t *tracepoints_ptrs;
#endif
#ifdef CONFIG_TREE_SRCU
unsigned int num_srcu_structs;
struct srcu_struct **srcu_struct_ptrs;
#endif
#ifdef CONFIG_BPF_EVENTS
unsigned int num_bpf_raw_events;
struct bpf_raw_event_map *bpf_raw_events;
#endif
#ifdef CONFIG_JUMP_LABEL
struct jump_entry *jump_entries;
unsigned int num_jump_entries;
#endif
#ifdef CONFIG_TRACING
unsigned int num_trace_bprintk_fmt;
const char **trace_bprintk_fmt_start;
#endif
#ifdef CONFIG_EVENT_TRACING
struct trace_event_call **trace_events;
unsigned int num_trace_events;
struct trace_eval_map **trace_evals;
unsigned int num_trace_evals;
#endif
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
unsigned int num_ftrace_callsites;
unsigned long *ftrace_callsites;
#endif
#ifdef CONFIG_LIVEPATCH
bool klp; /* Is this a livepatch module? */
bool klp_alive;
/* Elf information */
struct klp_modinfo *klp_info;
#endif
#ifdef CONFIG_MODULE_UNLOAD
/* What modules depend on me? */
struct list_head source_list;
/* What modules do I depend on? */
struct list_head target_list;
/* Destruction function. */
void (*exit)(void);
atomic_t refcnt;
#endif
#ifdef CONFIG_CONSTRUCTORS
/* Constructor functions. */
ctor_fn_t *ctors;
unsigned int num_ctors;
#endif
#ifdef CONFIG_FUNCTION_ERROR_INJECTION
struct error_injection_entry *ei_funcs;
unsigned int num_ei_funcs;
#endif
} ____cacheline_aligned __randomize_layout;
```
+ `fibdrv.c` 裡面存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣,什麼場景會需要呢?
+ fibdrv 本身是一個 `character device driver`,device driver 可以跑在 `kernel space` 以及 `user space`
> [Emulating a character device from userspace
](https://unix.stackexchange.com/questions/477117/emulating-a-character-device-from-userspace)
+ 而本次作業是使用在 kernel space,作業系統當中充斥著許多中斷信號(`interrupt`),使得 CPU 得要保存當前程式數據,避免資料遺失。
+ 為了測試在多執行緒下,`device driver` 執行程式的資料正確性,我做了一個簡單的實驗,程式碼可參考 [\[gist\] ambersun1234](https://gist.github.com/ambersun1234/a003dd264d8ec3fe95f60a4254cf6721)。執行結果如下所示
+ 
+ 可以看到在大部份的情況下皆正確,但是 highlight 的那一行資料是錯誤的。由此可知在多執行緒的狀況下
## 分析 Fibonacci 數列在核心計算和傳遞到 userspace 的時間開銷
+ 
+ 可以看到不管在多少數字下,kernel space 傳遞到 user space 的時間基本上是固定的
> 但是這個僅代表 **傳遞時間為固定的**
+ 以下為實驗程式碼 [ambersun1234 - fibdrv(ee926c5)](https://github.com/ambersun1234/fibdrv/commit/ee926c5c46c3cbfd2c06d1f03676583141e3eec9)
```c
static ktime_t curTime;
/* calculate the fibonacci number at given offset */
static ssize_t fib_read(struct file *file,
char *buf,
size_t size,
loff_t *offset)
{
char kbuffer[100] = {0};
curTime = ktime_get();
struct u64 result = fibonacci((int) (*offset));
curTime = ktime_sub(ktime_get(), curTime);
sprintf(kbuffer, "%llu*18446744073709551616+%llu\n", result.msl,
result.lsl);
copy_to_user(buf, kbuffer, 100);
return 0;
}
/* write operation is skipped */
static ssize_t fib_write(struct file *file,
const char *buf,
size_t size,
loff_t *offset)
{
return ktime_to_ns(ktime_get());
}
```
## fibonacci 執行時間
+ 
+ 此程式碼為使用 [clz/ctz](https://en.wikipedia.org/wiki/Find_first_set#CLZ) 加速過的版本,可以看到隨著數字的增大,所耗費的時間也隨之增長
+ 以下為實驗程式碼 [ambersun1234 - fibdrv(b4e9b70)](https://github.com/ambersun1234/fibdrv/commit/b4e9b7054b9599e20f876186aa29aa0b26e1963b)
```c
static ktime_t curTime;
/* write operation is skipped */
static ssize_t fib_write(struct file *file,
const char *buf,
size_t size,
loff_t *offset)
{
curTime = ktime_get();
fibonacci((int) (*offset));
curTime = ktime_sub(ktime_get(), curTime);
return ktime_to_ns(curTime);
}
```
## 印出更大的數字
+ 原本的 fibdrv 只能夠印到 `92`,超過此數字會因為 `unsigned long long int` 的範圍限制而導致錯誤
+ 於是我將2個 ull 合併成一個 struct,用來儲存更大的數值
```c
struct u64 {
unsigned long long lsl;
unsigned long long msl;
};
```
+ 為了使 u64 的加減乘皆正確,必須撰寫相對應的大數處理
+ 以下程式碼是相對應的實做
+ 加法 :arrow_right: 需判斷 lsl 加法後會不會溢位,若會溢位則將 msl + 1
+ 減法 :arrow_right: 需考慮是否需要借位進行減法即可
+ 乘法 :arrow_right: 採用直式乘法的思考邏輯去做,不過是以 bit 為單位
```c
static struct u64 *adder(struct u64 *input1, struct u64 *input2)
{
struct u64 *r = (struct u64 *) kmalloc(sizeof(struct u64), GFP_KERNEL);
r->lsl = 0;
r->msl = 0;
unsigned long long diff = 0;
r->lsl = input1->lsl;
r->msl = input1->msl + input2->msl;
diff = ULLONG_MAX - r->lsl;
if (input2->lsl >= diff) {
r->lsl += input2->lsl;
r->msl += 1;
} else {
r->lsl += input2->lsl;
}
return r;
}
static struct u64 *subtracter(struct u64 *input1, struct u64 *input2)
{
struct u64 *r = (struct u64 *) kmalloc(sizeof(struct u64), GFP_KERNEL);
if (input1->lsl < input2->lsl) {
unsigned long long mycarry = ULLONG_MAX;
r->lsl = mycarry + input1->lsl - input2->lsl + 1;
r->msl = input1->msl - input2->msl - 1;
} else {
r->lsl = input1->lsl - input2->lsl;
r->msl = input1->msl - input2->msl;
}
return r;
}
static struct u64 *multiplier(struct u64 *input1, struct u64 *input2)
{
struct u64 *r = (struct u64 *) kmalloc(sizeof(struct u64), GFP_KERNEL);
r->lsl = 0;
r->msl = 0;
size_t width = 8 * sizeof(unsigned long long);
// 8 bits * how many bytes
for (size_t i = 0; i < width; i++) {
if ((input2->lsl >> i) & 0x1) {
struct u64 tmp;
r->msl += input1->msl << i;
tmp.lsl = (input1->lsl << i);
tmp.msl = i == 0 ? 0 : (input1->lsl >> (width - i));
r = adder(r, &tmp);
}
}
for (size_t i = 0; i < width; i++) {
if ((input2->msl >> i) & 0x1) {
r->msl += (input1->lsl << i);
}
}
return r;
}
```
+ 為了要將 fibonacci 加速,使用 `clz` 搭配 `fast doubling` 進行實做
```c
static int myclz(int input)
{
// use binary search method to check
int count = 0;
if ((input & 0xFFFF0000) == 0) {
input <<= 16;
count += 16;
} // 1111 1111 1111 1111
if ((input & 0xFF000000) == 0) {
input <<= 8;
count += 8;
} // 1111 1111
if ((input & 0xF0000000) == 0) {
input <<= 4;
count += 4;
} // 1111
if ((input & 0xC0000000) == 0) {
input <<= 2;
count += 2;
} // 1100
if ((input & 0x80000000) == 0) {
input <<= 0;
count += 1;
} // 1000
return count;
}
```
```c
static struct u64 fibonacci(int input)
{
unsigned int msb = myclz(input);
unsigned int mask = (1 << (31 - msb - 1));
struct u64 cur = {.msl = 0, .lsl = 1}, next = {.msl = 0, .lsl = 1};
struct u64 mul = {.msl = 0, .lsl = 2}, zero = {.msl = 0, .lsl = 0};
/* fast doubling formula
* f(2k) = f(k)[2f(k + 1) - f(k)]
* f(2k + 1) = f(k + 1)^2 + f(k)^2
*/
if (input == 0)
return zero;
if (input >= 1 && input <= 2)
return cur;
while (mask > 0) {
if (mask & input) {
// bit = 1: fast doubling then iterate 1
struct u64 *t0, *t1, *t2, *temp, *temp2;
// t0 = cur * (2 * next - cur);
// t1 = next * next + cur * cur;
temp = multiplier(&next, &mul);
temp = subtracter(temp, &cur);
t0 = multiplier(&cur, temp);
t2 = &next;
temp = multiplier(&next, t2);
t2 = &cur;
temp2 = multiplier(&cur, t2);
t1 = adder(temp, temp2);
cur = *t0;
next = *t1;
// iterate 1
temp = adder(&cur, &next);
cur = next;
next = *temp;
} else {
// bit = 0: fast doubling
struct u64 *t0, *t1, *t2, *temp, *temp2;
// t0 = cur * (2 * next - cur);
// t1 = next * next + cur * cur;
temp = multiplier(&next, &mul);
temp = subtracter(temp, &cur);
t0 = multiplier(&cur, temp);
t2 = &next;
temp = multiplier(&next, t2);
t2 = &cur;
temp2 = multiplier(&cur, t2);
t1 = adder(temp, temp2);
cur = *t0;
next = *t1;
}
mask >>= 1;
}
return cur;
}
```
+ 而除了自己實做 `clz` 之外,linux 也有包裝好的 `ffs` 可使用
+ 實做結果正確性可使用 `make verify` & `make ktest` 進行驗證
###### tags: `linux2020`