# 2019q1 Homework2 (fibdrv)
contributed by < `ldotrg` >
###### tags: `linux2019`

## Self-check list
- [ ] What do the macros `MODULE_LICENSE`, `MODULE_AUTHOR`, `MODULE_DESCRIPTION`, and `MODULE_VERSION` in `fibdrv.c` do, and how does the kernel learn about them?

```clike
#define __stringify_1(x) #x
#define __stringify(x) __stringify_1(x)
#define __MODULE_INFO(tag, name, info)                                  \
    static const char __UNIQUE_ID(name)[]                               \
    __used __attribute__((section(".modinfo"), unused, aligned(1)))     \
    = __stringify(tag) "=" info
```

#### `__attribute__((section(".modinfo")))`
- Puts the variable into the `.modinfo` section. Use `readelf` to check the `.modinfo` section.

```shell=
$ readelf fibdrv.ko -p .modinfo -s
String dump of section '.modinfo':
  [ 0] version=0.1
  [ c] description=Fibonacci engine driver
  [ 30] author=National Cheng Kung University, Taiwan
  [ 5e] license=Dual MIT/GPL
  [ 78] srcversion=24DC5FB7E7608AF16B0CC1F
  [ a0] depends=
  [ a9] name=fibdrv
  [ b5] vermagic=4.13.0-45-generic SMP mod_unload
# Hexdump the section
$ readelf -x .modinfo fibdrv.ko
```

The string dump shows the same information as `modinfo fibdrv.ko`.

#### `__used`
Defined in include/linux/compiler_types.h:

```clike
#define __used __attribute__((__used__))
```

Applying `__attribute__((__used__))` to a static variable forces the compiler to emit the symbol even if the variable is never referenced.

#### [Argument Prescan](https://gcc.gnu.org/onlinedocs/cpp/Argument-Prescan.html)
Prescan makes a difference in three special cases. The one that matters here:
- If an argument is stringized or concatenated, the prescan does not occur.

An argument that is stringized or concatenated inside a macro is not expanded first, so a second macro must wrap it.

```clike
#define __stringify_1(x...) #x
#define __stringify(x...) __stringify_1(x)
#define FOO bar
__stringify_1(FOO); // becomes "FOO": prescan does not occur
__stringify(FOO);   // becomes "bar"
```

#### [Variadic Macros](https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html)
> I still do not know how the kernel retrieves the information in the .modinfo section.
> Moving on to the next question, about insmod, first.

- [ ] What operations inside the Linux kernel correspond to the `insmod` command? Cite the relevant Linux kernel source code and explain it.

```shell
sudo strace insmod fibdrv.ko
...
getcwd("/home/jake/Workspace_home/fibdrv", 4096) = 33
stat("/home/jake/Workspace_home/fibdrv/fibdrv.ko", {st_mode=S_IFREG|0664, st_size=8312, ...}) = 0
open("/home/jake/Workspace_home/fibdrv/fibdrv.ko", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=8312, ...}) = 0
mmap(NULL, 8312, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fca3dd82000
finit_module(3, "", 0) = 0
munmap(0x7fca3dd82000, 8312) = 0
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
```

The key is most likely how `finit_module` is handled.

#### `finit_module` implementation in module.c

```clike
SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
{
    ...
    err = may_init_module();
    if (err)
        return err;
    ...
    err = kernel_read_file_from_fd(fd, &hdr, &size, INT_MAX, READING_MODULE);
    ...
    return load_module(&info, uargs, flags);
}
```

> Why does the function definition look so odd?

##### How does `SYSCALL_DEFINE3` work?
- The article [Add Your Own System Calls to the Linux Kernel](https://williamthegrey.wordpress.com/2014/05/18/add-your-own-system-calls-to-the-linux-kernel/)
- arch/x86/syscalls/syscall_64.tbl lists all system call numbers
- The trailing 3 in SYSCALL_DEFINE3 is the number of parameters in the function prototype
- include/linux/syscalls.h contains the prototype declaration of sys_finit_module
- [ ] When we load a kernel module with `insmod`, why does the function registered through `module_init` get executed? What does the Linux kernel do?
- [ ] Try running `$ readelf -a fibdrv.ko`, observe how its contents relate to the source code and to `modinfo`, and, together with the questions above, explain how an ELF file such as `fibdrv.ko` is "implanted" into the Linux kernel.
- [ ] The name `fibdrv` is short for Fibonacci driver. Although it clearly exists here for demonstration and teaching, writing a dedicated Linux kernel module still makes sense for certain critical application scenarios. Find such a case in the Linux kernel and explain it. Hint: see [Random numbers from CPU execution time jitter](https://lwn.net/Articles/642166/)
- [ ] Look up the [ktime-related APIs](https://www.kernel.org/doc/html/latest/core-api/timekeeping.html) and find use cases (explain them with a kernel module and simplified code).
- [ ] How are [clock_gettime](https://linux.die.net/man/2/clock_gettime) and [High Resolution Timers (HRT)](https://elinux.org/High_Resolution_Timers) related? Consult the POSIX documentation and explain with code.
- [ ] How does `fibdrv` use the [Linux Virtual File System](https://www.win.tue.nl/~aeb/linux/lk/lk-8.html) interface so that userspace programs (here, `client.c`) can access the computed Fibonacci numbers? Explain the principle, and write or find a similar Linux kernel module example.

## Linux Virtual File System
The most important objects are the superblock, inode, dentry, and file objects.
- The superblock object:
  - Represents a specific mounted filesystem.
  - linux-4.4.60/include/linux/fs.h `struct super_block`
  - linux-4.4.60/fs/super.c
- The inode object, which represents a specific file.
  - Each object in the filesystem is represented by an inode
  - linux-4.4.60/include/linux/fs.h `struct inode`
  - [inode structure video](https://www.youtube.com/watch?v=tMVj22EWg6A)
- The dentry object:
  - linux-4.4.60/include/linux/dcache.h `struct dentry`
  - Represents a directory entry, which is a single component of a path.
- The file object:
  - Represents an open file as associated with a process.
  - linux-4.4.60/include/linux/fs.h `struct file`

The [relationships among the four objects above](https://www.ibm.com/developerworks/library/l-virtual-filesystem-switch/index.html#fig7)
- Roadmap for opening a file: inode => cdev => simple_char_dev (a super-class of cdev containing device-related data) => file
- [My Sample code](https://github.com/ldotrg/practical_coding/blob/master/kernel_module/char_devices/simple_char.c)

![](https://i.imgur.com/nOwjgHh.png)

#### Driver `init_fib_dev` setup
- `alloc_chrdev_region`: registers a range of device numbers (major number + minor number) with the kernel, with the major number chosen dynamically by the kernel
  - The third argument is the number N of minor numbers requested from the kernel
- `class_create()`: populates the sysfs entries (`ls /sys/class`)
- `device_create()`: creates the device file under `/dev/`
- `cdev_alloc()`: creates a cdev structure
- `cdev_init`: initializes the cdev and binds it to its file_operations
- `cdev_add()`: binds the cdev to its device number

```clike=
struct class *my_class;
struct cdev my_cdev[N_MINORS];
dev_t dev_num;

static int __init my_init(void)
{
    int i;
    dev_t curr_dev;

    /* Request the kernel for N_MINOR devices */
    alloc_chrdev_region(&dev_num, 0, N_MINORS, "my_driver");

    /* Create a class : appears at /sys/class */
    my_class = class_create(THIS_MODULE, "my_driver_class");

    /* Initialize and create each of the device(cdev) */
    for (i = 0; i < N_MINORS; i++) {
        /* Associate the cdev with a set of file_operations */
        cdev_init(&my_cdev[i], &fops);

        /* Build up the current device number. To be used further */
        curr_dev = MKDEV(MAJOR(dev_num), MINOR(dev_num) + i);

        /* Create a device node for this device. Look, the class is
         * being used here. The same class is associated with N_MINOR
         * devices. Once the function returns, device nodes will be
         * created as /dev/my_dev0, /dev/my_dev1,... You can also view
         * the devices under /sys/class/my_driver_class.
         */
        device_create(my_class, NULL, curr_dev, NULL, "my_dev%d", i);

        /* Now make the device live for the users to access */
        cdev_add(&my_cdev[i], curr_dev, 1);
    }
    return 0;
}
```

#### User-space operations
- When user space opens the device file, the kernel eventually calls `chrdev_open()`
- `chrdev_open` uses the device number stored in the inode to look up the matching `cdev` and stores it in `inode->i_cdev`

> When is the device number written into inode->i_rdev?
> What is cdev_map?

```clike
static int chrdev_open(struct inode *inode, struct file *filp)
{
    ...
    p = inode->i_cdev;
    if (!p) {
        struct kobject *kobj;
        int idx;
        spin_unlock(&cdev_lock);
        kobj = kobj_lookup(cdev_map, inode->i_rdev, &idx);
        if (!kobj)
            return -ENXIO;
        new = container_of(kobj, struct cdev, kobj);
        ...
}
```

#### Reference
- [Anatomy of the Linux virtual file system switch](https://www.ibm.com/developerworks/library/l-virtual-filesystem-switch/index.html)
- [Chapter 13. The Virtual Filesystem](https://notes.shichao.io/lkd/ch13/)
- [Linux Virtual File System example: Proc File System](https://likegeeks.com/linux-virtual-file-system/#proc-File-System)
- [Software structure of a device driver](http://rts.lab.asu.edu/web_438_Fall_2014/ESP_F14_2_Linux_Kernel_driver.pptx)
- [linux-cdev-vs-register-chrdev](https://stackoverflow.com/questions/27174404/linux-cdev-vs-register-chrdev)

- [ ] Note that `fibdrv.c` uses `DEFINE_MUTEX`, `mutex_trylock`, `mutex_init`, `mutex_unlock`, and `mutex_destroy`. In what scenarios are they needed? Write a multi-threaded userspace program to test this, and observe what goes wrong if the Linux kernel module does not use a mutex.
- [ ] Many modern processors provide instructions such as [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set). Do you know how to adjust the algorithm to speed up computing the [Fibonacci sequence](https://hackmd.io/s/BJPZlyDSV)? List the key code and explain it.
- [Linux device and driver model](https://bootlin.com/doc/training/linux-kernel/linux-kernel-slides.pdf): p. 134

## Assignment requirements
- [ ] Fork [fibdrv](https://github.com/sysprog21/fibdrv) on GitHub. The goal is to improve how efficiently `fibdrv` computes Fibonacci numbers. Execution time must be quantified along the way, measured both in the Linux kernel module and at user level.
  * In the Linux kernel module, use the ktime family of APIs
  * In userspace, use the [clock_gettime](https://linux.die.net/man/2/clock_gettime) family of APIs
  * Plot each measurement with gnuplot and analyze the time overhead of computing Fibonacci numbers in the kernel and of passing them to userspace, in units of us or ns (at your discretion)
  * Try to interpret the timing distribution above, especially the impact on the Linux kernel as the Fibonacci numbers grow
  * The original code can only list values up to Fibonacci(100). Modify it to list more values beyond that (note: check correctness and the choice of number representation)
- [ ] Optimize the Fibonacci execution time step by step, adopting the strategies described in [Fibonacci sequence](https://hackmd.io/s/BJPZlyDSV) and making good use of instructions such as [clz / ctz](https://en.wikipedia.org/wiki/Find_first_set). Quantify thoroughly throughout.

### Original Assignment info:
[F03: fibdrv](https://hackmd.io/s/SJ2DKLZ8E?fbclid=IwAR0xvKkG6G4nTqIUNVtQPyVr1iKe3o6m6kovqEGm60Wf6bl9k9kOY8faUV0#F03-fibdrv)
###### tags: `Linux Kernel Module`
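
As a starting point for the clz / ctz items above, here is a minimal userspace sketch of one well-known strategy, fast doubling. It is not code from `fibdrv`: the function name `fib_fast_doubling` is hypothetical, `__builtin_clz` is assumed to be available (GCC or Clang), and the result is a plain `uint64_t`, so it is only valid up to n = 93. The sketch relies on the identities F(2k) = F(k) * (2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2, and uses clz to skip the leading zero bits of n so that the loop runs once per significant bit.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns F(n), valid for n <= 93 with uint64_t. */
uint64_t fib_fast_doubling(unsigned int n)
{
    if (n == 0)
        return 0; /* __builtin_clz(0) is undefined, so handle 0 first */

    uint64_t a = 0; /* F(k),   starting at k = 0 */
    uint64_t b = 1; /* F(k+1) */

    /* Index of the highest set bit of n; clz skips the leading zeros. */
    int top = (int) (sizeof(n) * CHAR_BIT) - 1 - __builtin_clz(n);

    for (int i = top; i >= 0; i--) {
        uint64_t c = a * (2 * b - a); /* F(2k)   */
        uint64_t d = a * a + b * b;   /* F(2k+1) */
        if ((n >> i) & 1) {
            /* Bit set: move from k to 2k + 1 */
            a = d;
            b = c + d; /* F(2k+2); may wrap on the last step, unused then */
        } else {
            /* Bit clear: move from k to 2k */
            a = c;
            b = d;
        }
    }
    return a;
}
```

For example, `fib_fast_doubling(10)` evaluates to 55. Going past n = 93 needs a wider number representation, which is the point of the "check correctness and the choice of number representation" note above.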