2022q1 Homework3 (fibdrv)

--- tags: linux2022 --- # 2022q1 Homework3 (fibdrv) contributed by < [ShienF](https://github.com/ShienF/fibdrv) > > [](https://hackmd.io/@sysprog/linux2022-fibdrv) ## 實驗環境 ```shell $ gcc --version gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 158 Model name: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz Stepping: 9 CPU MHz: 2800.000 CPU max MHz: 3800.0000 CPU min MHz: 800.0000 BogoMIPS: 5599.85 Virtualization: VT-x L1d cache: 128 KiB L1i cache: 128 KiB L2 cache: 1 MiB L3 cache: 6 MiB NUMA node0 CPU(s): 0-7 ``` ## 自我檢查清單 - [ ] 研讀 ==Linux 效能分析的提示== 描述，在自己的實體電腦運作 GNU/Linux，做好必要的設定和準備工作 $\to$ 從中也該理解為何不希望在虛擬機器中進行實驗; - [ ] 研讀上述費氏數列相關材料 (包含論文)，摘錄關鍵手法，並思考 [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set) 一類的指令對 Fibonacci 數運算的幫助。請列出關鍵程式碼並解說 - [ ] 複習 C 語言 [數值系統](https://hackmd.io/@sysprog/c-numerics) 和 [bitwise operation](https://hackmd.io/@sysprog/c-bitwise)，思考 Fibonacci 數快速計算演算法的實作中如何減少乘法運算的成本; - [ ] 研讀 [KYG-yaya573142 的報告](https://hackmd.io/@KYWeng/rkGdultSU)，指出針對大數運算，有哪些加速運算和縮減記憶體操作成本的舉措？ - [ ] `lsmod` 的輸出結果有一欄名為 `Used by`，這是 "each module's use count and a list of referring modules"，但如何實作出來呢？模組間的相依性和實際使用次數 ([reference counting](https://en.wikipedia.org/wiki/Reference_counting)) 在 Linux 核心如何追蹤呢？ > 搭配閱讀 [The Linux driver implementer’s API guide » Driver Basics](https://www.kernel.org/doc/html/latest/driver-api/basics.html) - [ ] 注意到 `fibdrv.c` 存在著 `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, `mutex_destroy` 等字樣，什麼場景中會需要呢？撰寫多執行緒的 userspace 程式來測試，觀察 Linux 核心模組若沒用到 mutex，到底會發生什麼問題。嘗試撰寫使用 [POSIX Thread](https://en.wikipedia.org/wiki/POSIX_Threads) 的程式碼來確認。 ## Makefile ```cmake= CONFIG_MODULE_SIG = n TARGET_MODULE := fibdrv obj-m := $(TARGET_MODULE).o ccflags-y := -std=gnu99 -Wno-declaration-after-statement KDIR := /lib/modules/$(shell uname -r)/build PWD := $(shell pwd) GIT_HOOKS := .git/hooks/applied all: $(GIT_HOOKS) client $(MAKE) -C $(KDIR) M=$(PWD) modules $(GIT_HOOKS): @scripts/install-git-hooks @echo clean: $(MAKE) -C $(KDIR) M=$(PWD) clean $(RM) client out load: sudo insmod $(TARGET_MODULE).ko #install module unload: sudo rmmod $(TARGET_MODULE) || true >/dev/null #remove module client: client.c $(CC) -o $@ $^ PRINTF = env printf PASS_COLOR = \e[32;01m NO_COLOR = \e[0m pass = $(PRINTF) "$(PASS_COLOR)$1 Passed [-]$(NO_COLOR)\n" check: all $(MAKE) unload $(MAKE) load sudo ./client > out $(MAKE) unload @diff -u out scripts/expected.txt && $(call pass) @scripts/verify.py ``` - reference [makefile說明](https://hackmd.io/@sysprog/SySTMXPvl) - 行尾不能有空格 - target: prerequisites \<tab>instructions - = recursive assignment (?) - := immediate assignment - $(MAKE) 預設 make, 防止 user 將 make 指令修改 - $(MAKE) -C $(KDIR) M=\$(PWD) modules used to invoke make to build kernel modules located in the current working directory $(PWD) using the kernel source tree specified by $(KDIR) - obj-m, ccflags-y When the make command is invoked with the target modules, the kernel build system will use the value of obj-m to determine which modules to build. In this case, it will attempt to build the module specified by $(TARGET_MODULE).o. the kernel build system will use the value of ccflags-y to pass additional compiler flags to the compiler when building the kernel module ## 設置環境 * 依據作業要求設置環境，在編譯並測試時遇到以下錯誤： ```shell $ cd fibdrv $ make check Git hooks are installed successfully. cc -o client client.c make: cc: Command not found make: *** [Makefile:28: client] Error 127 ``` 檢查錯誤訊息後發現可能是沒有安裝 gcc，但跑完安裝指令後發現沒有安裝與更新任何東西 ```shell $ sudo apt update $ sudo apt install build-essential 0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded. ``` 查看 gcc 版本，發現確實有安裝 gcc ```shell $ gcc --version gcc (Ubuntu 9.4.0-1ubuntu1~20.04) 9.4.0 Copyright (C) 2019 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ``` 所以再查查看 `make: cc: Command not found` 這種錯誤，發現可能沒有建立好 symbolic link ```shell $ make CC=gcc ``` 再編譯並測試就可以了 ```shell $ make check Passed [-] f(93) fail input: 7540113804746346429 expected: 12200160415121876738 ``` * 觀察產生的 `fibdrv.ko` 核心模組 ```shell $ modinfo fibdrv.ko ``` 掛載模組及檢查是否掛載成功 ```shell $ sudo insmod fibdrv.ko ``` - /dev => hardwares and devices, eg. disks, cpu, network, fibonacci(which we load)... ```shell $ ls -l /dev/fibonacci crw------- 1 root root 235, 0 一 5 17:10 /dev/fibonacci ##first c means characters device ##wner, group, others permision ##first root is owner ##second root is group ##235 means size in bytes ``` - the /sys directory is a virtual filesystem that provides an interface to the kernel, allowing userspace programs to interact with various aspects of the system's hardware, drivers, and kernel configuration. ```shell $ cat /sys/class/fibonacci/fibonacci/dev 235:0 ``` 關鍵在 fibdrv.c 中的函式 `register_chrdev()`, 用來註冊 kernel device major and minor number, major number 代表 device type, minor number 代表在此 type 中, fibdrv 這個 device - 查看 module 的版本，和 fibdrv.c 透過 `MODULE_VERSION` 所指定的版本號碼相同 ```shell $ cat /sys/module/fibdrv/version ``` - 計算此 module [reference counting](https://en.wikipedia.org/wiki/Reference_counting) (storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others.), 意指此 module 被多少 device referenced, 若為 0, 代表可以安全釋放此 kernel module, 因為沒有被其他 device 使用 ```shell $ lsmod | grep fibdrv fibdrv 16384 0 ## size $ cat /sys/module/fibdrv/refcnt 0 ``` ## Iterative 方式 * 修正原程式碼 VLA 無法在 linux kernel 的情況原程式碼： ```c static long long fib_sequence(long long k) { /* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */ long long f[k + 2]; f[0] = 0; f[1] = 1; for (int i = 2; i <= k; i++) { f[i] = f[i - 1] + f[i - 2]; } return f[k]; } ``` 迭代 i 時只需記錄前兩項數字，不需要把 $F(0)$ 到 $F(k)$ 都保存起來，因此修改後的程式碼： ```c static long long fib_sequence(long long k) { if(k < 2) { return k; } long long k1 = 0, k2 = 1, sum = 0; for(int i = 2; i <= k; i++) { sum = k1 + k2; k1 = k2; k2 = sum; } return k2; } ``` ## 探討使用者層級程式 `client.c` 與 fibdrv 的互動在` Makefile` 中，`check` 執行了下述： ```cmake check: all $(MAKE) unload #unload kernel $(MAKE) load sudo ./client > out #execute client.c and output to out file $(MAKE) unload @diff -u out scripts/expected.txt && $(call pass) @scripts/verify.py ``` 可以看出 `out`（提供部分內容如下）的內容是由 `client.c` 開始執行的，於是下述將從 `client.c` 開始 trace。 ``` ... Writing to /dev/fibonacci, returned the sequence 1 Writing to /dev/fibonacci, returned the sequence 1 Writing to /dev/fibonacci, returned the sequence 1 Writing to /dev/fibonacci, returned the sequence 1 Reading from /dev/fibonacci at offset 0, returned the sequence 0. Reading from /dev/fibonacci at offset 1, returned the sequence 1. Reading from /dev/fibonacci at offset 2, returned the sequence 1. Reading from /dev/fibonacci at offset 3, returned the sequence 2. Reading from /dev/fibonacci at offset 4, returned the sequence 3. Reading from /dev/fibonacci at offset 5, returned the sequence 5. Reading from /dev/fibonacci at offset 6, returned the sequence 8. ... ``` ### Linux system call in [`client.c`](https://github.com/ShienF/fibdrv/blob/master/client.c) * `open()` 根據[手冊](https://man7.org/linux/man-pages/man2/open.2.html)定義如下： > int open(const char *pathname, int flags); 從 `pathname` 中開啟 `flags` 相對應權限的檔案。若成功開啟檔案，回傳的值為 file descriptor，根據[參考](https://www.computerhope.com/jargon/f/file-descriptor.htm)： > When a process makes a successful request to open a file, the kernel returns a file descriptor which points to an entry in the kernel's global file table. * `write()` 根據[手冊](https://man7.org/linux/man-pages/man2/write.2.html)定義如下： > ssize_t write(int fd, const void *buf, size_t count); 將 `buf` 裡的內容寫入 `count` bytes 到 `fd` 這個 file table 的位置。若成功寫入，回傳的值為成功寫入多少 bytes。 * `lseek()` 根據[手冊](https://man7.org/linux/man-pages/man2/lseek.2.html)定義如下： > off_t lseek(int fd, off_t offset, int whence); 根據 `whence` 相應的設定方式，及 `offset` 去更改 `fd` 裡的讀寫檔案游標位置（read/write file offset）。若成功更改，回傳值為現在的游標位置（從檔案開使到現在這個位置的 byte 數）。 * `read()` 根據[手冊](https://man7.org/linux/man-pages/man2/read.2.html)定義如下： > ssize_t read(int fd, void *buf, size_t count); 從 `fd` 內讀取 `count` bytes 到 `buf` 裡，並且增加相應的 file offset。若成功讀取，回傳值為實際讀取了多少 bytes。顯然地 `out` 的內容並非上述 `client.c` 中的 system call 直接輸出的，下述為閱讀作業說明後得到`out` 來源的解答。 ### [`fibdrv.c`](https://github.com/ShienF/fibdrv/blob/master/fibdrv.c) 如何連動 `client.c` 中的 system call? 根據[作業說明](https://hackmd.io/@sysprog/linux2022-fibdrv#K04-fibdrv)其中《`fibdrv` 核心模組內部》及《Linux Virtual File System 介面》兩節，透過 [Linux Virtual File System 介面](https://www.win.tue.nl/~aeb/linux/lk/lk-8.html)，宣告一個 file_operations 的資料型態（如下程式碼），並設定一些對應到 VFS system call 的函式。當在使用者模式程式（`client.c`）中呼叫到 system call（`read()`,` write()`...）時，藉由 VFS 將其對應到在 file_operations 裡設定的函式（`fib_read()`, `fib_write()`...）。 ```c const struct file_operations fib_fops = { .owner = THIS_MODULE, .read = fib_read, .write = fib_write, .open = fib_open, .release = fib_release, .llseek = fib_device_lseek, }; ``` :::warning TODO: #define FIB_DEV "/dev/fibonacci" 意義？註冊到 kernel? 說明實際 fib 如何輸出 fib_read 為何需要用不到的參數？ lseek接到llseek? offset又是如何傳遞到fib_read裡的？ ::: ## Fast doubling 根據[費氏數列分析](https://hackmd.io/@sysprog/fibonacci-number)一文，探討到當數列來到大數區，公式解當中本來我們假定 $O(1)$ 的加法、乘法都不再是常數時間，而是與數字的位元數有關。因此只需 $O(logn)$ 次遞迴呼叫的演算法：Fast Doubling，在大數區的計算上有其優勢。根據[作業要求](https://hackmd.io/@sysprog/linux2022-fibdrv#-%E8%B2%BB%E6%B0%8F%E6%95%B8%E5%88%97) Fast Doubling 的遞迴式子為下述： > $F(2k) = F(k)[2F(k+1)-F(k)]$ > $F(2k+1) = F(k+1)^2 + F(k)^2$ 由式子可知若知道 $F(k)$ 及 $F(k+1)$，便可求出 $F(2k)$ 及 $F(2k+1)$。舉個例子說明，若要求 F(10)，需要遞迴幾次才能得出呢？ ```graphviz digraph fib{ S [label="F(0)=0"] 0 [label="F(0)"] 1 [label="F(1)=1", style=filled] 2 [label="F(2)=1", style=filled] 3 [label="F(3)=2"] 4 [label="F(4)"] 5 [label="F(5)=5", style=filled] 6 [label="F(6)=8"] 7 [label="F(7)"] 8 [label="F(8)"] 9 [label="F(9)"] 10 [label="F(10)=55", style=filled] 11 [label="F(11)=89"] S->0 S->1 1->2 1->3 2->4 2->5 3->6 3->7 4->8 4->9 5->10 5->11 } ``` 從 $F(0)$ 開始往下找 $F(2*0)=F(0)$ 及 $F(2*0+1)=F(1)$，$F(1)$ 再往下找 $F(2*1)=F(2)$ 及 $F(2*1+1)=F(3)$，$F(2)$ 往下找...$F(5)$ 往下找得到 $F(2*5)=F(10)$ 及 $F(2*5+1)=F(11)$，共找了 4 次。對應的[虛擬碼](https://hackmd.io/@sysprog/linux2020-fibdrv#-fibdrv-%E5%8F%AF%E8%BC%B8%E5%87%BA-Fibonacci-%E6%95%B8%E5%88%97%E7%9A%84-Linux-%E6%A0%B8%E5%BF%83%E6%A8%A1%E7%B5%84)如下: ```python Fast_Fib(n) a = 0; b = 1; // m = 0 for i = (number of binary digit in n) to 1 t1 = a*(2*b - a); t2 = b^2 + a^2; a = t1; b = t2; // m *= 2 if (current binary digit == 1) t1 = a + b; // m++ a = b; b = t1; return a; ``` e.g. 求 $F(10)$，我們知道 $10_{10}=1010_{2}$： | i | start | 4 | 3 | 2 | 1 | result | |---|-------|----------|----------|----------|---------|--------| | n | - | ==1==010 | 1==0==10 | 10==1==0 | 101==0== | - | |F(m) | F(0) | F(0*2+==1==) | F(1*2+==0==) | F(2*2+==1==) | F(5*2+==0==) | F(10) | | a | 0 | 1 | 1 | 5 | 55 | 55 | | b | 1 | 1 | 2 | 8 | 89 | - | 實作如下： ```c long long fib_sequence_fastb(long long k) { if (k < 2) { return k; } int bits = 0; for (unsigned int i = k; i; bits++, i >>= 1); long long k1 = 0, k2 = 1; for (unsigned int i = bits; i; i -= 1) { long long Tk1, Tk2; Tk1 = k1 * (2 * k2 - k1); Tk2 = k2 * k2 + k1 * k1; k1 = Tk1; k2 = Tk2; if (k & (1UL << (i-1))) { Tk1 = k1 + k2; k1 = k2; k2 = Tk1; } } return k1; } ``` ## 效能分析 ### 分析前準備參考作業說明及 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU#%E6%8E%92%E9%99%A4%E5%B9%B2%E6%93%BE%E6%95%88%E8%83%BD%E5%88%86%E6%9E%90%E7%9A%84%E5%9B%A0%E7%B4%A0) <排除干擾效能分析的因素> 一節 * 獨立 CPU 給特定程式使用 ```shell $ cd /etc/default $ sudo vim grub revise: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=1" #core name will be 0~7 if possesses 8 cores reboot $ cd /sys/devices/system/cpu $ cat isolated >> 1 $ taskset -cp 1 #check the affinity of PID=1 >> 0,2-7 or $ taskset -p 1 >> pid 1's current affinity mask: fd #0xfd means 11111101 ``` * 限定程式在特定 CPU 執行 ```shell $ sudo taskset -c 1 ./client ``` 因為 clent 只有在 fibdrv 掛載時才能被執行, 因此這個指令將會寫在一個專門量測的 bash 檔 * 抑制 [address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization) (ASLR) 查看 [Linux randomize_va_space](https://docs.kernel.org/admin-guide/sysctl/kernel.html?highlight=randomize_va_space#randomize-va-space): 0: Turn the process address space randomization off. 1: 2: ...full randomization ```shell $ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" ``` * 設定 scaling_governor 為 "performance" 這個級別(意指最高速運行) ```shell $ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor >> powersave >> powersave >> powersave >> powersave >> powersave >> powersave >> powersave >> powersave ``` 同樣將以下指令放入專門量測的 bash 檔 ```shell for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor do echo performance > ${i} done ``` * 針對 Intel 處理器，關閉 turbo boost ```shell $ cat /sys/devices/system/cpu/intel_pstate/no_turbo >> 0 $ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" ``` ### 時間量測方式參考作業說明及 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU#%E9%87%8F%E6%B8%AC%E6%99%82%E9%96%93%E7%9A%84%E6%96%B9%E5%BC%8F) <量測時間的方式> 一節 * User space - include <time.h> 使用 [`clock_gettime`](https://man7.org/linux/man-pages/man2/clock_getres.2.html) 來取得時間 ```c struct timespec start, end; for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); clock_gettime(CLOCK_MONOTONIC, &start); sz = read(fd, buf, 1); clock_gettime(CLOCK_MONOTONIC, &end); printf("%lld\n", (long long)((end.tv_sec * 1e9 + end.tv_nsec)- \ (start.tv_sec * 1e9 + start.tv_nsec))); } ``` * Kernel space - 使用 [ktime相關API](https://www.kernel.org/doc/html/latest/core-api/timekeeping.html) 因為 fib_write() 沒有用到, 因此寫在此函式裡 ```c static ssize_t fib_write(struct file *file, const char *buf, size_t size, loff_t *offset) { static ktime_t kt; long long res; switch (size) { case 0: kt = ktime_get(); res = fib_sequence_vla(*offset); kt = ktime_sub(ktime_get(), kt); break; case 1: kt = ktime_get(); res = fib_sequence(*offset); kt = ktime_sub(ktime_get(), kt); break; case 2: kt = ktime_get(); res = fib_sequence_fastd(*offset); kt = ktime_sub(ktime_get(), kt); break; default: return 0; } return (ssize_t)ktime_to_ns(kt); } ``` 在另個 client 程式裡把在 kernel 所花時間存下來 ```c long long fib, fib_fastd, fib_vla; char *filename = "performance.txt"; FILE *fp = fopen(filename, "w"); for (int i = 0; i <= offset; i++) { lseek(fd, i, SEEK_SET); fib_vla = write(fd, buf, 0); //fib VLA fib = write(fd, buf, 1); //fib fib_fastd = write(fd, buf, 2); //fib_fastdoubling fprintf(fp, "%d %lld %lld %lld\n", i, fib_vla, fib, fib_fastd); } ``` * 準備 gnuplot, 建立專門量測的 bash 檔及修改 Makefile 透過 [gnuplot](https://hackmd.io/@sysprog/Skwp-alOg) 讀進先前存下來的時間 ```shell set title "Iterative vs Fast-doubling" set xlabel "n-th" set ylabel "ns" set terminal png font " Verdana,12 " set output "plot.png" set xtics 0 ,10 ,92 set key left plot \ "performance.txt" using 1:2 with linespoints linewidth 2 title "VLA", \ "performance.txt" using 1:3 with linespoints linewidth 2 title "iterative", \ "performance.txt" using 1:4 with linespoints linewidth 2 title "fast-doubling", \ ``` 新增 Makefile 內容 ```cmake client_m: client_measurement.c $(CC) -o $@ $^ plot: all sh performance.sh ``` 專門量測的 bash 檔 ```shell #!/bin/sh #This script adjust some system settings to avoid interference while analyzing the performance of fibdrv. #2022/04/19 CPUID=1 ORIG_ASLR=`cat /proc/sys/kernel/randomize_va_space` ORIG_GOV=`cat /sys/devices/system/cpu/cpu$CPUID/cpufreq/scaling_governor` ORIG_TURBO=`cat /sys/devices/system/cpu/intel_pstate/no_turbo` # adjust system setting sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space" sudo sh -c "echo performance > /sys/devices/system/cpu/cpu$CPUID/cpufreq/scaling_governor" sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo" # analyze the performance make client_m make unload make load sudo taskset -c 1 ./client_m gnuplot plot.gp make unload # restore system setting sudo sh -c "echo $ORIG_ASLR > /proc/sys/kernel/randomize_va_space" sudo sh -c "echo $ORIG_GOV > //sys/devices/system/cpu/cpu$CPUID/cpufreq/scaling_governor" sudo sh -c "echo $ORIG_TURBO > /sys/devices/system/cpu/intel_pstate/no_turbo" ``` * VLA vs Iterative vs Fast-Doubling 下指令 $ make plot 繪製 ![](https://i.imgur.com/nVdi9yi.png) 量測出來的時間十分不穩定，因此透過統計手法除去極端值 * 核心計算和傳遞到 userspace 的時間開銷 ![](https://i.imgur.com/CcpGWsL.png) :::warning TODO: 閱讀 affinity 的優點統計手法除去極端值為何不希望在虛擬機器中進行實驗? ::: ## 大數問題 `fib_sequence()` 的參數 `k` 在超過 92 時，輸出的答案皆相同，那是因為 `fibdrv.c` 裡設定最大數僅為 92，如此設置的原因為 `fib_sequence(93)` 的真實數值會 overflow，詳細解說[在此](https://hackmd.io/@KYWeng/rkGdultSU#F92-%E4%BB%A5%E5%BE%8C%E7%9A%84%E6%95%B8%E5%80%BC%E9%8C%AF%E8%AA%A4%E7%9A%84%E5%8E%9F%E5%9B%A0)。作業說明中提到兩種大數相加等其他運算的實作方式，一種是將大數拆成前後半兩個 long long 型別的數，如此可以儲存 16 bytes 的數，最大有號數可為 $2^{127} - 1_{10}$。另一種是將大數$_{10}$ 以 string 的形式儲存，如此就可以表達會造成 overflow 的數值。 * 在此研讀 [AdrianHuang 的大數運算](https://github.com/AdrianHuang/fibdrv) ：將大數$_{10}$ 以 string 的形式儲存首先來看在 `xs.h` 大數的結構： ```c= #define MAX_STR_LEN_BITS (54) #define MAX_STR_LEN ((1UL << MAX_STR_LEN_BITS) - 1) #define LARGE_STRING_LEN 256 #define xs_literal_empty() \ (xs) { .space_left = 15 } #define xs_tmp(x) xs_new(&xs_literal_empty(), x) typedef union { /* allow strings up to 15 bytes to stay on the stack * use the last byte as a null terminator and to store flags * much like fbstring: * https://github.com/facebook/folly/blob/master/folly/docs/FBString.md */ char data[16]; struct { uint8_t filler[15], /* how many free bytes in this stack allocated string * same idea as fbstring */ space_left : 4, //剩多少 bytes /* if it is on heap, set to 1 */ is_ptr : 1, is_large_string : 1, flag2 : 1, flag3 : 1; }; /* heap allocated */ struct { char *ptr; /* supports strings up to 2^MAX_STR_LEN_BITS - 1 bytes */ size_t size : MAX_STR_LEN_BITS,//十進制位數, 一位數代表一個 character = 1 byte /* capacity is always a power of 2 (unsigned)-1 */ capacity : 6; /* the last 4 bits are important flags */ }; } xs; ... static inline char *xs_data(const xs *x) { if (!xs_is_ptr(x))//stack return (char *) x->data; if (xs_is_large_string(x)) {//heap and large string return (char *) (x->ptr + 4); } return (char *) x->ptr;//heap } ... static inline void xs_set_ref_count(const xs *x, int val) { *((int *) ((size_t) x->ptr)) = val; } ``` xs 實際為 union，其內容為其中資料結構佔最多 bytes 的結構。另外可以從註解判斷 xs 裡的第一個 struct 裡處理 stack 有關的資料，第二個stuct處理heap有關的資料。搭配創建大數操作來分析： ```c= #include "xs.h" static void xs_allocate_data(xs *x, size_t len, bool reallocate)//heap { size_t n = 1 << x->capacity;//配置比原 string 還大的空間 /* Medium string */ if (len < LARGE_STRING_LEN) { x->ptr = reallocate ? krealloc(x->ptr, n, GFP_KERNEL) : kmalloc(n, GFP_KERNEL);//address in heap return; } /* * Large string */ x->is_large_string = 1; /* The extra 4 bytes are used to store the reference count */ x->ptr = reallocate ? krealloc(x->ptr, n + 4, GFP_KERNEL) : kmalloc(n + 4, GFP_KERNEL); xs_set_ref_count(x, 1);//set ->ptr int part? } xs *xs_new(xs *x, const void *p) { *x = xs_literal_empty(); size_t len = strlen(p) + 1; if (len > 16) {//heap x->capacity = ilog2(len) + 1;//二進制位數, + 1 為 '\0' 的位數 x->size = len - 1;//十進制位數, 一位數代表一個 character = 1 byte x->is_ptr = true;//in heap xs_allocate_data(x, x->size, 0); memcpy(xs_data(x), p, len); } else {//stack memcpy(x->data, p, len); x->space_left = 15 - (len - 1); } return x; } ``` 第 28 行初始化 xs，設置其在 stack 裡有 15 bytes 可使用。第 30 行開始，如果大數$_{10}$ 超過 16 位，將大數透過 `xs_allocate_data` 存在 heap 裡，且透過log2.h 裡以 macro 形式用 bitwise operation 算出的 logrithem: ilog2 算出 capacity（2 進制佔多少 bits）。在動態配置記憶體時，使用 kernel API-[kmalloc](https://www.kernel.org/doc/htmldocs/kernel-api/API-kmalloc.html)，其中參數 GFP_KERNEL 意義為 Allocate normal kernel ram. May sleep. 若尚未超過 16 位，直接放進 stack 裡。若用 iterative 方式實作 fibonacci 數列，需要有針對大數加法的函式： ```c= #define XOR_SWAP(a, b, type) \ do { \ type *__c = (a); \ type *__d = (b); \ *__c ^= *__d; \ *__d ^= *__c; \ *__c ^= *__d; \ } while (0) static void __swap(void *a, void *b, size_t size) { if (a == b) return; switch (size) { case 1: XOR_SWAP(a, b, char); break; case 2: XOR_SWAP(a, b, short); break; case 4: XOR_SWAP(a, b, unsigned int); break; case 8: XOR_SWAP(a, b, unsigned long); break; default: /* Do nothing */ break; } } static void reverse_str(char *str, size_t n) { for (int i = 0; i < (n >> 1); i++) __swap(&str[i], &str[n - i - 1], sizeof(char)); } static void string_number_add(xs *a, xs *b, xs *out) { char *data_a, *data_b; size_t size_a, size_b; int i, carry = 0; int sum; /* * Make sure the string length of 'a' is always greater than * the one of 'b'. */ if (xs_size(a) < xs_size(b)) __swap((void *) &a, (void *) &b, sizeof(void *)); data_a = xs_data(a); data_b = xs_data(b); size_a = xs_size(a); size_b = xs_size(b); reverse_str(data_a, size_a); reverse_str(data_b, size_b); char buf[size_a + 2]; for (i = 0; i < size_b; i++) { sum = (data_a[i] - '0') + (data_b[i] - '0') + carry; buf[i] = '0' + sum % 10; carry = sum / 10; } for (i = size_b; i < size_a; i++) { sum = (data_a[i] - '0') + carry; buf[i] = '0' + sum % 10; carry = sum / 10; } if (carry) buf[i++] = '0' + carry; buf[i] = 0; reverse_str(buf, i); /* Restore the original string */ reverse_str(data_a, size_a); reverse_str(data_b, size_b); if (out) *out = *xs_tmp(buf);//將buf存入out } ``` 第 1 行開始的部份，若要交換兩個指標的內容，可以透過做 3 次 `xor` 來達成。第 34 行開始，透過迭代一半字串長可以反轉字串。核心即為兩個數字當作 string 來加法，透過將兩數反轉，將 character 平移成真實數值，從最高位開始相加，並處理進位問題，最後再反轉回來。 * 實作 [AdrianHuang 的大數運算](https://github.com/AdrianHuang/fibdrv)：將大數$_{10}$ 以 string 的形式儲存首先如果要把大數傳回 user space，因為在 user space 配置的記憶體位置地址，在 kernel space 裡去讀取有可能會得到不同的地址。因此需要靠 copy_to_user() 將大數傳回 user space。一開始我將 xs 整合進 fib_sequence() 時犯了一個錯誤： ```c= int fib_sequence_bigNum(long long k, char __user *buf) { char *result = (char *)kmalloc(sizeof(char), GFP_KERNEL); int length = 1; if(k == 0) { *result = '0'; } else if(k == 1) { *result = '1'; } else { xs *k1 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL), *k2 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL),\ *sum = (xs *)kmalloc(sizeof(xs), GFP_KERNEL);// *k1 = *xs_tmp("0"); *k2 = *xs_tmp("1"); // sum = xs_tmp("0"); for(int i = 2; i <= k; i++) { string_number_add(k1, k2, sum); *k1 = *xs_new(k1, xs_data(k2)); *k2 = *xs_new(k2, xs_data(sum)); } length = xs_size(k2); result = xs_data(k2); printk(KERN_INFO "result = %s", xs_data(k2)); xs_free(k1); xs_free(k2); xs_free(sum); } copy_to_user(buf, result, length); return length; } ``` 在第 3 行我只配置了一個 byte 的空間給 `result`，但當第 34 行回傳的數值超過 8 個 bytes 時 buffer 看起來裝了其他東西： ``` ... Reading from /dev/fibonacci at offset 81, returned the sequence 3788906237314390�. Reading from /dev/fibonacci at offset 82, returned the sequence 6130579072161159. Reading from /dev/fibonacci at offset 83, returned the sequence 9919485309475549. ... ``` 改寫了一下把 result 這個變數拿掉，`out` 的答案就是對的了： ```c int fib_sequence_bigNum(long long k, char __user *buf) { int length = 1; if(k == 0) { char *tmp = "0"; copy_to_user(buf, tmp, length); } else if(k == 1) { char *tmp = "1"; copy_to_user(buf, tmp, length); } else { xs *k1 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL), *k2 = (xs *)kmalloc(sizeof(xs), GFP_KERNEL),\ *sum = (xs *)kmalloc(sizeof(xs), GFP_KERNEL);// *k1 = *xs_tmp("0"); *k2 = *xs_tmp("1"); for(int i = 2; i <= k; i++) { string_number_add(k1, k2, sum); *k1 = *xs_new(k1, xs_data(k2)); *k2 = *xs_new(k2, xs_data(sum)); } length = xs_size(k2); // printk(KERN_INFO "result = %s", xs_data(k2)); copy_to_user(buf, xs_data(k2), length); xs_free(k1); xs_free(k2); xs_free(sum); } return length; } ``` :::info 此法以十進制思考，若改以二進制執行(eg.GCD)，十進制顯示，可以改善(做實驗) ::: * 大數以二進制儲存，實作時以二進制計算，輸出再以十進制表示。==好處？？== 在此參考 [KYG-yaya573142](https://hackmd.io/@KYWeng/rkGdultSU) 及 [cwl0429](https://hackmd.io/@cwl0429/fibdrv) 的大數運算來實作 * 大數${_2}$ 結構為動態的 array，array[0] 放大數${_2}$ 從 lsb 開始算的前 32 bits，array[1] 放第前 33~64 bits，...以此類推，例如將以 $2^{33} + 1$ 2進制表示為:00000000 00000000 00000000 00000010 00000000 00000000 00000000 00000001 其存放在大數結構如下示意圖： ```graphviz digraph{ node[shape="record"] a0[label = "0|0|0|0|...|0|0|0|1"] a1[label = "0|0|0|0|...|0|0|1|0"] node[shape=none] array0[label = "array[0] total 32 bits"] array0 -> a0 array1[label = "array[1] total 32 bits"] array1 -> a1 } ``` * 若大數為負數，則大數裡的動態 array 存放的為 $-(大數)$，由 sign 為 0 表示負數。 * 大數${_2}$ 結構 ```c /* number[size - 1] = msb 32 bits, number[0] = lsb 32 bits*/ typedef struct _bn { unsigned *int number; /* the length of array: number */ unsigned int size; /* 0 means positive number, 1 means negative number */ int sign; } bn; ``` * 大數${_2}$ 結構轉十進制字串以例子說明轉換方式: bn a 的 a->number[0] 為 0...001101 = $13_{10}$，a->size 為 1，十進制位數為 13，對應結構如下： ```graphviz digraph{ node[shape="record"] a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"] a1[label = "0|0|0|0|0|0|0|0|0|0|0|0|\\0"] node[shape=none] array0[label = "a->number[0]"] array0 -> a0 array1[label = "十進位以 string 表示"] array1 -> a1 } ``` 從 a->number[0] 的 msb 開始往右掃，若為1則carry為1，並更新整個string：對每個string的位元更新s[j] += s[j] - '0' + carry，更新後的 s[j] 若 > '10',則 carry 再為 1,讓輪到 s[j-1] 更新時的 carry 為 1,達到進位的效果, 此時s[j] -= '10' 實際流程： ```graphviz digraph{ node[shape="record"] a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"] a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>0 + 0 + 1 = 1|\\0"] node[shape=none] array0[label = "current bit"] array0 -> a0:1 array1[label = "s[j]"] array1 -> a1:2 } ``` ```graphviz digraph{ node[shape="record"] a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"] a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>1 + 1 + 1 = 3|\\0"] node[shape=none] array0[label = "current bit"] array0 -> a0:2 array1[label = "s[j]"] array1 -> a1:2 } ``` ```graphviz digraph{ node[shape="record"] a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"] a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>3 + 3 + 0 = 6|\\0"] node[shape=none] array0[label = "current bit"] array0 -> a0:3 array1[label = "s[j]"] array1 -> a1:2 } ``` ```graphviz digraph{ node[shape="record"] a0[label = "0|0|0|0|...|0|<1>1|<2>1|<3>0|<4>1"] a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0|<2>6 + 6 + 1 = 13|\\0"] node[shape=none] array0[label = "current bit"] array0 -> a0:4 array1[label = "s[j]"] array1 -> a1:2 } ``` 因為 s[j]溢位: ```graphviz digraph{ node[shape="record"] a1[label = "0|0|0|0|0|0|0|0|0|0|<1>0 + 0 + 1 = 1|<2>13 - 10 = 3|\\0"] node[shape=none] array1[label = "s[j]"] array1 -> a1:2 array2[label = "s[j-1]"] array2 -> a1:1 } ``` 其實 s[j] = s[j] + (s[j] - '0') 便為兩倍 s[j] 當中的數值，所以每更新整個 string 即為乘以二之意,而加上 carry, 代表為下一輪的乘以二做準備,更新 string 次數減一, 即為乘以二的次數, 以 1011 的 msb 來看, 即為 1000_2 = 2 的 3 次方, 更新次數為4-1=3, 也乘以2 3次 ```c char *bnToString(bn *src) { /* total bits in decimal = log10(x) * log10(x) = log2(x) / log2(10) * log2(10) ~= 3.~ * log2(x) = totals bits in binary * so log10(x) ~= totals bits in binary / 3 + 1 * one more bit for '\0' * and another bit for sign */ size_t len = (8 * sizeof(unsigned int) * src->size) / 3 + 2 + src->sign; char *s = (char *) kmalloc(len, GFP_KERNEL); char *p = s; /* initialize decimal string */ memset(s, '0', len - 1); s[len - 1] = '\0'; /* binary to decimal string*/ for (int i = src->size - 1; i >= 0; i--) { /* starts from msb of binary number */ for (unsigned int check = 1U << 31; check; check >>= 1) { int carry = !!(check & src->number[i]); /* starts from 2nd lsb of decimal number, lsb = '\0' */ for (int j = len - 2; j >= 0; j--) { s[j] += s[j] - '0' + carry; //(s[j] + s[j]) equals (2 * s[j]) carry = (s[j] > '9'); if (carry) { s[j] -= 10; } } } } /* skip leading zeros */ while (p[0] == '0' && p[1] != '\0') { p++; } if (src->sign) *(--p) = '-'; //add sign bit memmove(s, p, strlen(p) + 1); return s; } ``` * 大數${_2}$ 加法首先探討加數與被加數的正負號, 若同號則由`bnDoAdd`實作無號數相加, 若異號則為相減之意, 先確保被加數為正數, 再來判斷被加數與加數的絕對值大小, 因為接下來要以`bnDoSub`無號數相減來計算, 不論相加減, 皆以無號數運算, 最後再確認相應的sign bit ```c /* c = a + b */ void bnAdd(bn *a, bn *b, bn *c) { /* a,b > 0 or a,b < 0 */ if (a->sign == b->sign) { bnDoAdd(a, b, c); c->sign = a->sign; } /* a and b have different sign */ else { if (a->sign) __swap(a, b); int cmp = bnCmp(a, b); /* |a| > |b|, a > 0, b < 0, a + b --> |a| - |b| */ if (cmp == 1) { bnDoSub(a, b, c); c->sign = 0; } /* |a| < |b|, a > 0, b < 0, a + b --> -(|b| - |a|) */ else if (cmp == -1) { bnDoSub(b, a, c); c->sign = 1; } /* |a| = |b| */ else { bnResize(c, 1); c->number[0] = 0; c->sign = 0; } } } ``` 舉例a, 與b同號, 計算 a + b = c, 無號數相加過程為先計算c需要多大多大的 number array(32bits) 來儲存大數接下來從a, b 的lsb開始前32bits相加, 再把結果複製到c的前32bits, 若有overflow的部分將由carry承接, 計算a,b下一個32bits的相加時一併加上carry ```c /* the sign of a and b are the same */ void bnDoAdd(bn *a, bn *b, bn *c) { int d = __max(digitsAtmsb(a), digitsAtmsb(b)) + 1; // count a and b's digits in binary; + 1 is sign bit d = divRoundUp(d, 32); bnResize(c, d); unsigned long long int carry = 0; // 64 bits for (int i = 0; i < c->size; i++) { unsigned int tmpA = i < a->size ? a->number[i] : 0; unsigned int tmpB = i < b->size ? b->number[i] : 0; carry += (unsigned long long int) tmpA + tmpB; // operate in unsigned long long int to prevent unsigned int overflow c->number[i] = carry; carry >>= 32; } if (!c->number[c->size - 1] && c->size > 1) bnResize(c, c->size - 1); } ``` 無號數相減 ```c /* |a| > |b| */ void bnDoSub(bn *a, bn *b, bn *c) { bnResize(c, a->size); long long int carry = 0; for (int i = 0; i < c->size; i++) { unsigned int tmpA = i < a->size ? a->number[i] : 0; unsigned int tmpB = i < b->size ? b->number[i] : 0; carry = (long long int) tmpA - tmpB - carry; c->number[i] = carry; if (carry < 0) { carry = 1; } else { carry = 0; } } int d = bnClz(c) / 32; if (d == c->size) d--; bnResize(c, c->size - d); } ``` ## clz / ctz 搭配 Fast doubling :::warning TODO: 並思考 clz / ctz 一類的指令對 Fibonacci 數運算的幫助, 請列出關鍵程式碼並解說實作大數+fast doubling（研讀 KYG-yaya573142 的報告，指出針對大數運算，有哪些加速運算和縮減記憶體操作成本的舉措？）作業要求最後兩點自我檢查清單最後兩點 sysfs 介面 :::