# 2019q1 Homework2 (fibdrv)
contributed by < `ldotrg` >
###### tags: `linux2019`
## Self-check list
- [ ] What do the `MODULE_LICENSE`, `MODULE_AUTHOR`, `MODULE_DESCRIPTION`, and `MODULE_VERSION` macros in `fibdrv.c` do, and how does the kernel come to know this information?
```clike
#define __stringify_1(x) #x
#define __stringify(x) __stringify_1(x)
#define __MODULE_INFO(tag, name, info) \
static const char __UNIQUE_ID(name)[] \
__used __attribute__((section(".modinfo"), unused, aligned(1))) \
= __stringify(tag) "=" info
```
#### `__attribute__((section(".modinfo")))`
- Puts the variable into the `.modinfo` section.
Use `readelf` to check the `.modinfo` section:
```shell=
$ readelf fibdrv.ko -p .modinfo -s
String dump of section '.modinfo':
[ 0] version=0.1
[ c] description=Fibonacci engine driver
[ 30] author=National Cheng Kung University, Taiwan
[ 5e] license=Dual MIT/GPL
[ 78] srcversion=24DC5FB7E7608AF16B0CC1F
[ a0] depends=
[ a9] name=fibdrv
[ b5] vermagic=4.13.0-45-generic SMP mod_unload
# Hexdump the section
$ readelf -x .modinfo fibdrv.ko
```
The strings shown are the same information that `modinfo fibdrv.ko` reports.
#### `__used` is defined in include/linux/compiler_types.h:
```clike
#define __used __attribute__((__used__))
```
When `__attribute__((__used__))` is applied to a static variable, the compiler must emit the symbol even if the variable is never referenced.
#### [Argument Prescan](https://gcc.gnu.org/onlinedocs/cpp/Argument-Prescan.html)
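Macro arguments are normally fully macro-expanded before they are substituted (the prescan). The rule that matters here:
- If an argument is stringized or concatenated, the prescan does not occur.
An argument that is stringized (`#`) or concatenated (`##`) inside a macro definition is therefore not expanded first.
To stringize the expansion of a macro, it must be wrapped in another macro.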
```clike
#define __stringify_1(x...) #x
#define __stringify(x...) __stringify_1(x)
#define FOO bar
__stringify_1(FOO); // becomes "FOO": the prescan does not occur
__stringify(FOO);   // becomes "bar": FOO is expanded before __stringify_1 sees it
```
#### [Variadic Macros](https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html)
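> I still don't know how the kernel obtains the information in the .modinfo section.
> Moving on to the next question (insmod) first.
- [ ] What operations inside the Linux kernel correspond to the `insmod` command? Cite the relevant Linux kernel source code and explain it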
```shell
sudo strace insmod fibdrv.ko
...
getcwd("/home/jake/Workspace_home/fibdrv", 4096) = 33
stat("/home/jake/Workspace_home/fibdrv/fibdrv.ko", {st_mode=S_IFREG|0664, st_size=8312, ...}) = 0
open("/home/jake/Workspace_home/fibdrv/fibdrv.ko", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=8312, ...}) = 0
mmap(NULL, 8312, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fca3dd82000
finit_module(3, "", 0) = 0
munmap(0x7fca3dd82000, 8312) = 0
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
```
The key is probably to understand `finit_module`.
#### `finit_module` implementation
In `kernel/module.c`:
```clike
SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
{
    ...
    err = may_init_module();
    if (err)
        return err;
    ...
    err = kernel_read_file_from_fd(fd, &hdr, &size, INT_MAX, READING_MODULE);
    ...
    return load_module(&info, uargs, flags);
}
```
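> Why does the function definition look so strange?
##### Let's find out how `SYSCALL_DEFINE3` works
- The article [Add Your Own System Calls to the Linux Kernel](https://williamthegrey.wordpress.com/2014/05/18/add-your-own-system-calls-to-the-linux-kernel/)
- arch/x86/syscalls/syscall_64.tbl lists all system call numbers (since Linux 4.2 the file lives at arch/x86/entry/syscalls/syscall_64.tbl)
- The `3` in `SYSCALL_DEFINE3` is the number of parameters in the function prototype
- include/linux/syscalls.h contains the prototype declaration of `sys_finit_module`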
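- [ ] When we load a kernel module with `insmod`, why does the function registered through `module_init` get executed? What does the Linux kernel do?
- [ ] Run `$ readelf -a fibdrv.ko` and observe how its output relates to the source code and to `modinfo`. Together with the questions above, explain how an ELF file such as `fibdrv.ko` is "implanted" into the Linux kernel
- [ ] The name `fibdrv` is short for Fibonacci driver. Although it clearly exists here for demonstration and teaching, writing a dedicated Linux kernel module still makes sense for certain critical application scenarios. Find such a case in the Linux kernel and explain it. Hint: see [Random numbers from CPU execution time jitter](https://lwn.net/Articles/642166/)
- [ ] Consult the [ktime-related API](https://www.kernel.org/doc/html/latest/core-api/timekeeping.html) and find a use case (explain it with a kernel module and simplified code)
- [ ] What is the relationship between [clock_gettime](https://linux.die.net/man/2/clock_gettime) and [High Resolution Timers (HRT)](https://elinux.org/High_Resolution_Timers)? Consult the POSIX documentation and explain with code
- [ ] How does `fibdrv` use the [Linux Virtual File System](https://www.win.tue.nl/~aeb/linux/lk/lk-8.html) interface so that userspace programs (here, `client.c`) can access the computed Fibonacci numbers? Explain the principle, and write or find a similar Linux kernel module example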
## Linux Virtual File System
The four most important objects are the superblock, inode, dentry, and file objects.
- The superblock object:
- Represents a specific mounted filesystem.
- linux-4.4.60/include/linux/fs.h `struct super_block`
- linux-4.4.60/fs/super.c
- The inode object, which represents a specific file.
- Each object in the filesystem is represented by an inode
- linux-4.4.60/include/linux/fs.h `struct inode`
- [inode structure 影片](https://www.youtube.com/watch?v=tMVj22EWg6A)
- The dentry object:
- linux-4.4.60/include/linux/dcache.h `struct dentry`
- Represents a directory entry, which is a single component of a path.
- The file object:
- Represents an open file as associated with a process.
- linux-4.4.60/include/linux/fs.h `struct file`
See the figure on the [relationships among these four objects](https://www.ibm.com/developerworks/library/l-virtual-filesystem-switch/index.html#fig7).
- Roadmap for opening a file:
inode => cdev => simple_char_dev (a "super-class" of cdev containing device-related data) => file
- [My Sample code](https://github.com/ldotrg/practical_coding/blob/master/kernel_module/char_devices/simple_char.c)

#### Driver `init_fib_dev` setup
- `alloc_chrdev_region()`: asks the kernel to dynamically allocate a range of device numbers (major number + minor numbers)
- The third parameter is the number of minor numbers requested from the kernel
- `class_create()`: creates the class entry in sysfs (see `ls /sys/class`)
- `device_create()`: registers the device so that its device file appears under `/dev/`
- `cdev_alloc()`: allocates a cdev structure
- `cdev_init()`: initializes a cdev and binds it to its `file_operations`
- `cdev_add()`: binds the cdev to its device number
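```clike=
#include <linux/cdev.h>
#include <linux/device.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/module.h>

#define N_MINORS 4 /* example value; not specified in the original note */

/* Placeholder; a real driver also sets .open, .read, .write, ... */
static const struct file_operations fops = { .owner = THIS_MODULE };
static struct class *my_class;
static struct cdev my_cdev[N_MINORS];
static dev_t dev_num;

static int __init my_init(void)
{
    int i, ret;
    dev_t curr_dev;

    /* Request N_MINORS device numbers from the kernel */
    ret = alloc_chrdev_region(&dev_num, 0, N_MINORS, "my_driver");
    if (ret < 0)
        return ret;
    /* Create a class: appears under /sys/class.
     * Kernels >= 6.4 take only the name: class_create("my_driver_class") */
    my_class = class_create(THIS_MODULE, "my_driver_class");
    if (IS_ERR(my_class)) {
        unregister_chrdev_region(dev_num, N_MINORS);
        return PTR_ERR(my_class);
    }
    /* Initialize and create each device (cdev) */
    for (i = 0; i < N_MINORS; i++) {
        /* Associate the cdev with a set of file_operations */
        cdev_init(&my_cdev[i], &fops);
        /* Build up the current device number, to be used below */
        curr_dev = MKDEV(MAJOR(dev_num), MINOR(dev_num) + i);
        /* Create a device node for this device. The same class is
         * shared by all N_MINORS devices. Once the function returns,
         * device nodes appear as /dev/my_dev0, /dev/my_dev1, ... and
         * the devices are listed under /sys/class/my_driver_class.
         */
        device_create(my_class, NULL, curr_dev, NULL, "my_dev%d", i);
        /* Now make the device live for users to access */
        cdev_add(&my_cdev[i], curr_dev, 1);
    }
    return 0;
}
module_init(my_init);
MODULE_LICENSE("GPL");
```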
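#### User-space operations
- When user space opens the device file, the call eventually reaches the kernel function `chrdev_open()`
- `chrdev_open()` uses the device number stored in the inode to look up the corresponding `cdev` and stores it in `inode->i_cdev`
> When is the device number written into inode->i_rdev?
> What is cdev_map?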
```clike
static int chrdev_open(struct inode *inode, struct file *filp)
{
    ...
    p = inode->i_cdev;
    if (!p) {
        struct kobject *kobj;
        int idx;
        spin_unlock(&cdev_lock);
        kobj = kobj_lookup(cdev_map, inode->i_rdev, &idx);
        if (!kobj)
            return -ENXIO;
        new = container_of(kobj, struct cdev, kobj);
        ...
}
```
#### References
- [Anatomy of the Linux virtual file system switch](https://www.ibm.com/developerworks/library/l-virtual-filesystem-switch/index.html)
- [Chapter 13. The Virtual Filesystem](https://notes.shichao.io/lkd/ch13/)
- [Linux Virtual File System example: Proc File System](https://likegeeks.com/linux-virtual-file-system/#proc-File-System)
- [Software structure of a device driver](http://rts.lab.asu.edu/web_438_Fall_2014/ESP_F14_2_Linux_Kernel_driver.pptx)
- [linux-cdev-vs-register-chrdev](https://stackoverflow.com/questions/27174404/linux-cdev-vs-register-chrdev)
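- [ ] Note that `fibdrv.c` contains `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, and `mutex_destroy`. In what scenarios are they needed? Write a multi-threaded userspace program to test this, and observe what goes wrong if the Linux kernel module does not use a mutex
- [ ] Many modern processors provide instructions such as [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set). Do you know how to adjust the algorithm to speed up the computation of the [Fibonacci sequence](https://hackmd.io/s/BJPZlyDSV)? List the key code and explain it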
- [Linux device and driver model](https://bootlin.com/doc/training/linux-kernel/linux-kernel-slides.pdf):P134
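## Assignment requirements
- [ ] Fork [fibdrv](https://github.com/sysprog21/fibdrv) on GitHub. The goal is to improve the efficiency of `fibdrv`'s Fibonacci computation. Execution time must be quantified along the way, and can be measured both inside the Linux kernel module and at user level
* Inside the Linux kernel module, use the ktime family of APIs
* In userspace, use the [clock_gettime](https://linux.die.net/man/2/clock_gettime) family of APIs
* Plot each with gnuplot and analyze the time spent computing the Fibonacci numbers in the kernel and passing them to userspace; use us or ns as the unit (at your discretion)
* Try to interpret the resulting time distribution, in particular the effect on the Linux kernel as the Fibonacci numbers grow
* The original code can only list values up to Fibonacci(100); modify it to list more values beyond that (note: check correctness and the number system in use)
- [ ] Optimize the Fibonacci execution time step by step, introducing the strategies described in [Fibonacci sequence](https://hackmd.io/s/BJPZlyDSV) and making good use of instructions such as [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set); quantify thoroughly throughout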
### Original Assignment info: [F03: fibdrv](https://hackmd.io/s/SJ2DKLZ8E?fbclid=IwAR0xvKkG6G4nTqIUNVtQPyVr1iKe3o6m6kovqEGm60Wf6bl9k9kOY8faUV0#F03-fibdrv)
###### tags: `Linux Kernel Module`