---
tags: linux2022
---
# 2022q1 Homework3 (fibdrv)
contributed by < `oucs638` >
> [Assignment](https://hackmd.io/@sysprog/linux2022-fibdrv)
> [oucs638/fibdrv](https://github.com/oucs638/fibdrv)
## Development Environment
:::spoiler Compiler
```shell
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
```
:::
:::spoiler Hardware
```shell
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Stepping: 13
CPU MHz: 2600.000
CPU max MHz: 4500.0000
CPU min MHz: 800.0000
BogoMIPS: 5199.98
Virtualization: VT-x
L1d cache: 192 KiB
L1i cache: 192 KiB
L2 cache: 1.5 MiB
L3 cache: 12 MiB
NUMA node0 CPU(s): 0-11
```
:::
:::spoiler Linux kernel version
```shell
$ uname -r
5.13.0-30-generic
```
:::
:::spoiler Install `linux-headers` package.
```shell
$ sudo apt install linux-headers-`uname -r`
$ dpkg -L linux-headers-5.13.0-30-generic | grep "/lib/modules"
/lib/modules
/lib/modules/5.13.0-30-generic
/lib/modules/5.13.0-30-generic/build
```
:::
:::spoiler Confirm user identity
```shell
$ whoami
oucs638
$ sudo whoami
root
```
:::
:::spoiler Install tools
```shell
$ sudo apt install util-linux strace gnuplot-nox
```
:::
## Exclude factors that affect performance analysis
- Isolate a CPU from the kernel scheduler.
```shell
$ sudo vim /etc/default/grub
...
// modify GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX="isolcous=11"
```
- Update grub and reboot.
```shell
$ sudo update-grub
$ reboot
```
- Check whether the CPU was successfully isolated.
```shell
$ taskset -cp 1
pid 1's current affinity list: 0-10
```
- Restrain address space layout randomization (ASLR)
```shell
$ sudo sh -c "echo 0 > /proc/sys/kernel/randomize_va_space"
```
- Create a shell script named `performance.sh` to set `scaling_governor` as `performance`.
```shell
// performance.sh
for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
echo performance > ${i}
done
// execute script
$ sudo sh performance.sh
```
- For Intel processors, turn off turbo mode.
```shell
$ sudo sh -c "echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
```
## Use class `clz/ctz` instructions to improve Fibonacci number operations.
- Definition of Fibonacci number:
$$
\left \{
\begin{split}
& F_0 = 0, F_1 = 1 \\
& F_n = F_{n-1} + F_{n-2} , n \ge 2
\end{split}
\right.
$$
- Fibonacci number can be expressed as:
$$
\vec{F_n}
=
\begin{bmatrix}
F_n \\
F_{n-1}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 \\
1 & 0
\end{bmatrix}
\cdot
\begin{bmatrix}
F_{n-1} \\
F_{n-2}
\end{bmatrix}
$$
- Implement in C code:
```c
int F(int n)
{
if (n == 0 || n == 1)
return n;
return F(n - 1) + F(n - 2);
}
```
- Assuming `n = 6`, the result of the program executing the function calls are as follows:
```graphviz
strict digraph G
{
1[label="F(6)"]
2[label="F(4)"]
3[label="F(5)"]
4[label="F(2)"]
5[label="F(3)"]
6[label="F(3)"]
7[label="F(4)"]
8[label="F(0)", style=filled]
9[label="F(1)", style=filled]
10[label="F(1)", style=filled]
11[label="F(2)"]
12[label="F(1)", style=filled]
13[label="F(2)"]
14[label="F(2)"]
15[label="F(3)"]
16[label="F(0)", style=filled]
17[label="F(1)", style=filled]
18[label="F(0)", style=filled]
19[label="F(1)", style=filled]
20[label="F(0)", style=filled]
21[label="F(1)", style=filled]
22[label="F(1)", style=filled]
23[label="F(2)", style=filled]
24[label="F(0)", style=filled]
25[label="F(1)", style=filled]
1 -> {2, 3}
2 -> {4, 5}
3 -> {6, 7}
4 -> {8, 9}
5 -> {10, 11}
6 -> {12, 13}
7 -> {14, 15}
11 -> {16, 17}
13 -> {18, 19}
14 -> {20, 21}
15 -> {22, 23}
23 -> {24, 25}
}
```
- Obviously there are a lot of unnecessary repeated function calls.
- Refer to [Calculating Fibonacci Numbers by Fast Doubling](https://chunminchang.github.io/blog/post/calculating-fibonacci-numbers-by-fast-doubling), the Fibonacci number can be redefined as:
$$
\left \{
\begin{split}
& F_0 = 0, F_1 = 1, F_2 = 1 \\
& F_{2n + 1} = {F_{n+1}}^2 + {F_{n}}^2 \\
& F_{2n} = F_n \cdot (2 \cdot F_{n + 1} - F_n) \\
& n \ge 0
\end{split}
\right.
$$
- Implement in C code:
```c
int F(int n)
{
if (n == 0 || n == 1)
return n;
if (n == 2)
return 1;
int k = 0;
if (n % 2) {
k = (n - 1) / 2;
return F(k) * F(k) + F(k + 1) * F(k + 1);
} else {
k = n / 2;
return F(k) * (2 * F(k + 1) - F(k));
}
}
```
- Assuming `n = 6`, the result of the program executing the function calls are as follows:
```graphviz
strict digraph G
{
1[label="F(6)"]
2[label="F(3)"]
3[label="F(4)"]
4[label="F(1)", style=filled]
5[label="F(2)", style=filled]
6[label="F(2)", style=filled]
7[label="F(3)"]
8[label="F(1)", style=filled]
9[label="F(2)", style=filled]
1 -> {2, 3}
2 -> {4, 5}
3 -> {6, 7}
7 -> {8, 9}
}
```
- The Fast Doubling method can significantly reduce thr function calls required for program execution.
- Because computers represent numbers in binary, we can start from MSB to determine which operation to perform.
- Find `0` : Calculate $F_{2n}$ and $F_{2n + 1}$.
- Find `1` : Calculate $F_{2n}$ and $F_{2n + 1}$ first, then calculate $F_{2n + 2}$.
- But the actual number would not have lead 0s, so we can use class `ctz` instructions to count how many lead 0s and skip them.
## Development Progress Records
- Test.
```shell
$ make check
make -C /lib/modules/5.13.0-30-generic/build M=/home/oucs638/GitHub/fibdrv modules
make[1]: Entering directory '/usr/src/linux-headers-5.13.0-30-generic'
make[1]: Leaving directory '/usr/src/linux-headers-5.13.0-30-generic'
make unload
make[1]: Entering directory '/home/oucs638/GitHub/fibdrv'
sudo rmmod fibdrv || true >/dev/null
rmmod: ERROR: Module fibdrv is not currently loaded
make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv'
make load
make[1]: Entering directory '/home/oucs638/GitHub/fibdrv'
sudo insmod fibdrv.ko
make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv'
sudo ./client > out
make unload
make[1]: Entering directory '/home/oucs638/GitHub/fibdrv'
sudo rmmod fibdrv || true >/dev/null
make[1]: Leaving directory '/home/oucs638/GitHub/fibdrv'
Passed [-]
f(93) fail
input: 7540113804746346429
expected: 12200160415121876738
```
- Observe the resulting `fibdrv.ko` module.
```shell
$ modinfo fibdrv.ko
filename: fibdrv.ko
version: 0.1
description: Fibonacci engine driver
author: National Cheng Kung University, Taiwan
license: Dual MIT/GPL
srcversion: 9A01E3671A116ADA9F2BB0A
depends:
retpoline: Y
name: fibdrv
vermagic: 5.13.0-30-generic SMP mod_unload modversions
```
- Observe the behavior of the `fibdrv.ko` after the linux kernel is mounted.
```shell
$ sudo insmod fibdrv.ko
$ ls -l /dev/fibonacci
crw------- 1 root root 510, 0 Mar 7 01:50 /dev/fibonacci
$ cat /sys/class/fibonacci/fibonacci/dev
510:0
$ cat /sys/module/fibdrv/version
0.1
$ lsmod | grep fibdrv
fibdrv 16384 0
$ cat /sys/module/fibdrv/refcnt
0
```
### Calculate the Fibonacci number after the 93rd item (inclusive)
- Refer to [assignment](https://hackmd.io/PMozqaxqRA-jF2KmgTZcPQ?view#%E8%A8%88%E7%AE%97-F93-%E5%8C%85%E5%90%AB-%E4%B9%8B%E5%BE%8C%E7%9A%84-Fibonacci-%E6%95%B8---%E4%BD%BF%E7%94%A8%E6%95%B8%E5%AD%97%E5%AD%97%E4%B8%B2%E4%B8%A6%E5%A5%97%E7%94%A8-quiz2-SSO-Small-String-Optimization), attempt to introduce big number that operate on string-number.
- Use [Small/Short String Optimization](http://wdv4758h.github.io/notes/cpp/string-optimization.html).
- Refer to the [example code](https://github.com/AdrianHuang/fibdrv/tree/big_number) provided in the assignment.
- The example code only use string-number addition to calculate the Fibonacci number.
- Expect to add a string-number subtraction and multiplication based on the addition-related code of the example code.
- With string-number substraction and multiplication, fast doubling can be intoduced to calculate Fibonacci numbers.
- But in this part, the Fibonacci number will be calculated by addtion first, and fast doubling will be introduced in the later part.
- Because C99 variable-length array(VLA) is not allowed in Linux kernel, so I use `kmalloc` to allocate memory space for the array dynamically.
```shell
...
/home/oucs638/GitHub/fibdrv/fibdrv.c: In function ‘fib_sequence’:
/home/oucs638/GitHub/fibdrv/fibdrv.c:70:5: warning: ISO C90 forbids variable length array ‘f’ [-Wvla]
70 | bn f[k + 2];
| ^~
/home/oucs638/GitHub/fibdrv/fibdrv.c: In function ‘bn_add’:
/home/oucs638/GitHub/fibdrv/fibdrv.c:263:5: warning: ISO C90 forbids variable length array ‘buf’ [-Wvla]
263 | char buf[size_a + 2];
| ^~~~
...
```
- After calculate the Fibonacci number, the allocated memory space should be freed.
```c
static int fib_sequence(long long k, char __user *buf)
{
/* FIXME: C99 variable-length array (VLA) is not allowed in Linux kernel. */
// bn f[k + 2];
bn *f = (bn *) kmalloc((k + 2) * sizeof(bn), GFP_KERNEL);
f[0] = *bn_tmp("0");
f[1] = *bn_tmp("1");
for (int i = 2; i <= k; i++)
bn_add(&f[i - 1], &f[i - 2], &f[i]);
int res = bn_size(&f[k]);
if (copy_to_user(buf, bn_data(&f[k]), res))
return -EFAULT;
for (int i = 0; i <= k; i++)
bn_free(&f[i]);
return res;
}
```
- String-number addition will reverse the string first, then add digit one by one from the least significant digit to the most significant digit.
```c
static void bn_add(bn *a, bn *b, bn *out)
{
char *data_a, *data_b;
size_t size_a, size_b;
int i, s, c = 0;
if (bn_size(a) < bn_size(b))
__swap((void *) &a, (void *) &b, sizeof(void *));
data_a = bn_data(a);
data_b = bn_data(b);
size_a = bn_size(a);
size_b = bn_size(b);
reverse_str(data_a, size_a);
reverse_str(data_b, size_b);
// char buf[size_a + 2];
char *buf = (char *) kmalloc((size_a + 2), GFP_KERNEL);
for (i = 0; i < size_b; i++) {
s = (data_a[i] - '0') + (data_b[i] - '0') + c;
buf[i] = '0' + s % 10;
c = s / 10;
}
for (i = size_b; i < size_a; i++) {
s = (data_a[i] - '0') + c;
buf[i] = '0' + s % 10;
c = s / 10;
}
if (c)
buf[i++] = '0' + c;
buf[i] = 0;
reverse_str(buf, i);
reverse_str(data_a, size_a);
reverse_str(data_b, size_b);
if (out)
*out = *bn_tmp(buf);
}
```
### Calculating Fibonacci Numbers by Fast Doubling (big number)
- I found it challenging to overcome positive-negtive sign problem issues when implementing "fasting doubling" with string numbers.
- It needs a lot of reallocating memory space to implement big-number operations with string number.
- Refer to [eecheng87](https://github.com/eecheng87/fibdrv), rebuilt the big number operation.
- Preset a space large enough for Fibonacci number value.
- As [previous part](https://hackmd.io/4u8VJAWiTEafpMav-6BTuA?both#Use-class-clzctz-instructions-to-improve-Fibonacci-number-operations), the Fibonacci number can be represented as:
$$
\left \{
\begin{split}
& F_0 = 0, F_1 = 1, F_2 = 1 \\
& F_{2n + 1} = {F_{n+1}}^2 + {F_{n}}^2 \\
& F_{2n} = F_n \cdot (2 \cdot F_{n + 1} - F_n) \\
& n \ge 0
\end{split}
\right.
$$
- Expressed as functions:
$$
\left \{
\begin{split}
f(0) &= 0 = f(k) \\
f(1) &= 1 = f(k + 1) \\
f(2k) &= f(k) \cdot [2 \cdot f(k + 1) - f(k)] \\
&= 2 \cdot f(k) \cdot f(k + 1) - f(k) \cdot f(k) \\
f(2k + 1) &= {f(k)}^2 + {f(k + 1)}^2 \\
&= f(k) \cdot f(k) + f(k + 1) \cdot f(k + 1)\\
k &\ge 0 \\
\end{split}
\right.
$$
- Assume:
$$
\left \{
\begin{split}
t(0) &= f(k) \cdot f(k) \\
t(1) &= f(k + 1) \cdot f(k + 1) \\
f(2) &= 2 \cdot f(k) \cdot f(k + 1) \\
\end{split}
\right.
$$
- Because computers represent numbers in binary, we can start from MSB to determine which operation to perform.
- Find `1` : next $k$ is odd.
$$
\left \{
\begin{split}
next\ \ \ \ k &= 2k + 1 \\
f(k) &= f(2k + 1) \\
&= t(0) + t(1) \\
f(k + 1) &= f(2k + 2) \\
&= f(2k) + f(2k + 1) \\
&= t(2) - t(0) + t(0) + t(1)\\
&= t(2) + t(1)
\end{split}
\right.
$$
- Find `0` : next $k$ is even.
$$
\left \{
\begin{split}
next\ \ \ \ k &= 2k \\
f(k) &= f(2k) \\
&= t(2) - t(0) \\
f(k + 1) &= f(2k + 1) \\
&= t(0) + t(1)
\end{split}
\right.
$$
- But the actual number would not have lead 0s, so we can use class `ctz` instructions to count how many lead 0s and skip them.
- Use `__builtin_clzll(k)` for `typeof(k) = unsigned long long`
```c
static bn fib_sequence(unsigned long long k)
{
bn f[2];
bn_init(&f[0], 0); // f(k)
bn_init(&f[1], 1); // f(k + 1)
if (k < 2)
return f[k];
// t(0) = f(k) * f(k)
// t(1) = f(k + 1) * f(k + 1)
// t(2) = 2 * f(k) * f(k + 1)
bn t[3];
// fast doubling
// f(2k) = f(k) * [2 * f(k + 1) - f(k)]
// = 2 * f(k) * f(k + 1) - f(k) * f(k)
// = t(2) - t(0)
// f(2k + 1) = f(k) ^ 2 + f(k + 1) ^ 2
// = f(k) * f(k) + f(k + 1) * f(k + 1)
// = t(0) + t(1)
for (unsigned long long i = 1U << (63 - __builtin_clzll(k)); i; i >>= 1) {
bn_mul(&f[0], &f[0], &t[0]); // t(0)
bn_mul(&f[1], &f[1], &t[1]); // t(1)
bn_mul(&f[0], &f[1], &t[2]);
bn_add(&t[2], &t[2], &t[2]); // t(2)
if (i & k) {
// next k = 2k + 1
// f(k) = f(2k + 1) = t(0) + t(1)
bn_add(&t[0], &t[1], &f[0]);
// f(k + 1) = f(2k + 2) = f(2k) + f(2k + 1)
// = t(2) - t(0) + t(0) + t(1)
// = t(2) + t(1)
bn_add(&t[2], &t[1], &f[1]);
} else {
// next k = 2k
// f(k) = f(2k)
bn_sub(&t[2], &t[0], &f[0]);
// f(k + 1) = f(2k + 1)
bn_add(&t[0], &t[1], &f[1]);
}
}
// return f(k)
return f[0];
}
```
## Analyze
- Use big number.
![](https://i.imgur.com/cyltHbr.png)
- Use builtin `int128`.
![](https://i.imgur.com/wgNBRxw.png)
- Found that if used builtin `int128`, the execution time would have unpredictable peaks.