Linux Kernel Project: Improving LKMPG

> Contributor: mesohandsome
> [Project walkthrough video](https://youtu.be/JebyeUpHYGM)

## Reviewed by `Lccgth`

> "Never add `sleep` or other pausing calls inside a tasklet. At best the kernel only logs an error when the module is loaded; at worst the system crashes and the only option is to press the power button and reboot."
>
> "At best" (literally "with luck") does not precisely describe what happens at run time. Are there other factors that lead to these two different failure modes?

## Task overview

Read LKMPG and improve its content. The plan is to add simrupt to LKMPG to illustrate how tasklets, mutexes, and kfifo are used.

## TODO: Read LKMPG and record problems (including possible improvements)

> For errors or outdated information in the book, submit pull requests to fix them.

### Problem 1

The book mentions the major number but never introduces the minor number afterwards. The kernel can assign a major number to a device driver automatically, but how does the driver then use the minor number to tell which device it is currently handling?

### Problem 2

Only some chapters provide reference material. Could the other chapters also offer further references for their key points?

### Problem 3

Running `example_tasklet.c` from section 14.1 Tasklets gives a result that differs from the one printed in the book.

```bash
# Example
tasklet example init
Example tasklet starts
Example tasklet init continues...
Example tasklet ends

# My result
[702163.732017] tasklet example init
[702163.732369] Example tasklet starts
[702168.693030] Example tasklet ends
[702168.890467] Example tasklet init continues...
```

[This article](https://embetronicx.com/tutorials/linux/device-drivers/tasklet-static-method/) notes that each tasklet runs only on the CPU that called `tasklet_schedule` for it:

> Tasklets are used to queue up work to be done at a later time. Tasklets can be run in parallel, but the same tasklet cannot be run on multiple CPUs at the same time. Also, each tasklet will run only on the CPU that schedules it, to optimize cache usage. Since the thread that queued up the tasklet must complete before it can run the tasklet, race conditions are naturally avoided. However, this arrangement can be suboptimal, as other potentially idle CPUs cannot be used to run the tasklet. Therefore workqueues can and should be used instead, and workqueues were already discussed here.

Once the tasklet starts, it therefore always runs to completion before the remaining steps of `module_init` continue. The two do not run concurrently, unlike the multi-threaded-looking sample output in the book.

:::info
Should the output shown under the example_tasklet.c example be changed, or example_tasklet.c itself?
> The latter
:::

A tasklet runs in atomic (softirq) context and is not preempted by process-context code on its CPU, so `tasklet_fn` always finishes completely before anything else continues there. For the same reason, delay or sleep operations do not belong in a tasklet.

```c
static void tasklet_fn(unsigned long data)
{
    pr_info("Example tasklet starts\n");
    mdelay(5000);
    pr_info("Example tasklet ends\n");
}

static int __init example_tasklet_init(void)
{
    pr_info("tasklet example init\n");
    tasklet_schedule(&mytask);
    mdelay(200);
    pr_info("Example tasklet init continues...\n");

    return 0;
}
```

After adjusting the delay values in the example, two results are possible, depending on whether the tasklet runs before or after the rest of the init function:

```bash
# 1
tasklet example init
Example tasklet init continues...
Example tasklet starts
Example tasklet ends

# 2
tasklet example init
Example tasklet starts
Example tasklet ends
Example tasklet init continues...
```

To obtain the output printed in the book, the work would have to run asynchronously, for example with multiple threads or with a workqueue as simrupt does. Only then can `Example tasklet init continues...` appear between `Example tasklet starts` and `Example tasklet ends`.

### Problem 4

`procfile_write` in [procfs2.c](https://github.com/sysprog21/lkmpg/blob/master/examples/procfs2.c) contains the statement below. When `procfs_buffer_size == PROCFS_MAX_SIZE`, it writes `\0` to `procfs_buffer[0]`, which effectively clears the buffer:

```c
procfs_buffer[procfs_buffer_size & (PROCFS_MAX_SIZE - 1)] = '\0';
```

On entry, the function already uses an `if` to check whether `procfs_buffer_size` exceeds `PROCFS_MAX_SIZE` and clamps it to `PROCFS_MAX_SIZE`. Why not clear `procfs_buffer` and return at that point, which would also save one `copy_from_user` call?

```c
if (procfs_buffer_size >= PROCFS_MAX_SIZE) {
    procfs_buffer[0] = '\0';
    procfs_buffer_size = PROCFS_MAX_SIZE;
    *off += procfs_buffer_size;
    return procfs_buffer_size;
}
...
```

Overall, the code clears the buffer when `procfs_buffer_size` goes over the limit, but the way it does so is not very intuitive.

## TODO: Add simrupt to LKMPG to illustrate tasklets, mutexes, and kfifo

> Submit pull requests, as a contribution to the course and a check of one's own understanding.

### Using simrupt

Clone the [simrupt](https://github.com/sysprog21/simrupt) project locally, then build and load the module with the following commands:

```bash
$ git clone https://github.com/sysprog21/simrupt.git
$ cd simrupt
# Compile
$ make
# Insert module
$ sudo insmod simrupt.ko
```

After `insmod`, check whether `simrupt` was loaded successfully. Reading the device with `cat` then starts `simrupt`, and its output appears on the screen. The last two numbers in the load message, `507` and `0`, are the module's major number and minor number:

```bash
$ dmesg
simrupt: registered new simrupt device: 507,0
$ cat /dev/simrupt
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
```

```bash
$ dmesg -w
openm current cnt: 1
simrupt: [CPU#2] enter timer_handler
simrupt: [CPU#2] produce data
simrupt: [CPU#2] scheduling tasklet
simrupt: [CPU#2] timer_handler in_irq: 4 usec
simrupt: [CPU#2] simrupt_tasklet_func in_softirq: 3 usec
simrupt: [CPU#7] simrupt_work_func
simrupt: [CPU#2] enter timer_handler
simrupt: [CPU#2] produce data
simrupt: [CPU#2] scheduling tasklet
simrupt: [CPU#2] timer_handler in_irq: 4 usec
simrupt: [CPU#2] simrupt_tasklet_func in_softirq: 4 usec
simrupt: [CPU#8] simrupt_work_func
...
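```

### tasklet

A `tasklet` is a softirq-based mechanism in the Linux kernel, used to handle the more time-consuming part of interrupt processing while still in interrupt context. The related definitions and functions live in [linux/interrupt.h](https://github.com/torvalds/linux/blame/master/include/linux/interrupt.h).

The definition of `DECLARE_TASKLET` has changed as the kernel evolved. The older macro takes three arguments, the last being a `data` value passed to the handler. Newer kernels (the change landed around v5.9) reduce it to a name plus a callback that receives the `struct tasklet_struct *` itself, and keep the `unsigned long data` style under the name `DECLARE_TASKLET_OLD`. The struct below is the newer layout, which carries both handler styles; the macro after it is the older three-argument form, whose arguments map onto `func` and `data`:

```c
struct tasklet_struct {
    struct tasklet_struct *next;
    unsigned long state;
    atomic_t count;
    bool use_callback;
    union {
        void (*func)(unsigned long data);
        void (*callback)(struct tasklet_struct *t);
    };
    unsigned long data;
};
```

```c
#define DECLARE_TASKLET(name, func, data) \
    struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(0), func, data }
```

:::danger
Mind the terminology!
:::

To stay compatible with old kernel versions, `DECLARE_TASKLET_OLD` has to be defined when it is missing, otherwise compilation fails. Newer versions of [interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) already define `DECLARE_TASKLET_OLD`, so the fallback below only matters on kernels that predate it.

```c
#ifndef DECLARE_TASKLET_OLD
#define DECLARE_TASKLET_OLD(arg1, arg2) DECLARE_TASKLET(arg1, arg2, 0L)
#endif
```

:::info
Why was a new `DECLARE_TASKLET_OLD` defined for compatibility, instead of updating every existing use of `DECLARE_TASKLET`?
:::

The macro above is what declares the tasklet. Later code refers to the tasklet by the first argument given to `DECLARE_TASKLET_OLD`, which is `simrupt_tasklet` in this example. Never add `sleep` or other pausing calls inside a tasklet: at best the kernel only logs an error when the module is loaded, and at worst the system crashes and has to be power-cycled.

```c
static void simrupt_tasklet_func(unsigned long __data)
{
    ...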
}

static DECLARE_TASKLET_OLD(simrupt_tasklet, simrupt_tasklet_func);
```

`timer_handler` fires at intervals to simulate a keyboard interrupt. A tasklet usually serves as the bottom half, taking over the work deferred from the interrupt handler. In `process_data`, the helper shown below, everything else is done first and `tasklet_schedule` is called last to queue the tasklet. After running the module, `dmesg` shows that the tasklet runs on the same CPU that called `tasklet_schedule`.

```c
static void process_data(void)
{
    WARN_ON_ONCE(!irqs_disabled());

    pr_info("simrupt: [CPU#%d] produce data\n", smp_processor_id());
    fast_buf_put(update_simrupt_data());

    pr_info("simrupt: [CPU#%d] scheduling tasklet\n", smp_processor_id());
    tasklet_schedule(&simrupt_tasklet);
}
```

When the module exits, remember to clean up: `tasklet_kill` removes the tasklet declared earlier.

```c
static void __exit simrupt_exit(void)
{
    ...
    tasklet_kill(&simrupt_tasklet);
    ...
}
```

### mutex

A process must hold the mutex before entering the critical section (CS) to access a resource, and releases it afterwards so that other processes can proceed. Using a mutex in a kernel module is convenient, because the functions and macros involved are already defined in [mutex.h](https://github.com/torvalds/linux/blob/6ba59ff4227927d3a8530fc2973b80e94b54d58f/include/linux/mutex.h#L100). Start by declaring the variable with the `DEFINE_MUTEX(mutexname)` macro.

```c
#define DEFINE_MUTEX(mutexname) \
    struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
```

`mutex_lock()` and `mutex_unlock()` then acquire and release the mutex. [simrupt.c](https://github.com/sysprog21/simrupt/blob/6093cddab984286fff2f709f043a0a9bef2f93b6/simrupt.c#L143) provides a single-producer, single-consumer example. Before a consumer accesses data in the buffer (enters the critical section), it must acquire the mutex with `mutex_lock(&consumer_lock)`, and after the access it must release it with `mutex_unlock(&consumer_lock)` so that the next consumer can acquire it. If the mutex is never released, the next consumer falls into a [deadlock](https://zh.wikipedia.org/zh-tw/%E6%AD%BB%E9%94%81) and the module stops working correctly. The same holds for the producer: it may touch the memory only after acquiring its mutex.

```c
while (1) {
    /* Consume data from the circular buffer */
    mutex_lock(&consumer_lock);
    val = fast_buf_get();
    mutex_unlock(&consumer_lock);

    if (val < 0)
        break;

    /* Store data to the kfifo buffer */
    mutex_lock(&producer_lock);
    produce_data(val);
    mutex_unlock(&producer_lock);
}
```

Without a mutex guaranteeing exclusive access to the critical section, a [race condition](https://zh.wikipedia.org/zh-tw/%E7%AB%B6%E7%88%AD%E5%8D%B1%E5%AE%B3) can occur. In the scenario below, Process A and Process B are each expected to read the value and add 1, giving 2. If the read and write are not protected inside a critical section, a race condition can leave the value at 1, which does not match the expectation.

Expected:

| Process A   | Process B   | Data |
| ----------- | ----------- | ---- |
| Read (0)    |             | 0    |
| Write (0+1) |             | 1    |
|             | Read (1)    | 1    |
|             | Write (1+1) | 2    |

Actual:

| Process A   | Process B   | Data |
| ----------- | ----------- | ---- |
| Read (0)    |             | 0    |
|             | Read (0)    | 0    |
| Write (0+1) |             | 1    |
|             | Write (0+1) | 1    |

### kfifo

> [linux/kfifo.h](https://github.com/torvalds/linux/blob/master/include/linux/kfifo.h)
> [linux/kfifo.c](https://github.com/torvalds/linux/blob/master/lib/kfifo.c)

`kfifo` is a ring buffer mechanism in the Linux kernel. With exactly one reader thread and one writer thread, and no control variable that both of them modify, it can be shown that no concurrency control is needed, and kfifo satisfies these conditions. `kfifo` is mainly used to pass data between a producer and a consumer, or between interrupt context and process context. Because it is a circular buffer, a write that reaches the end of the buffer continues from the beginning, so the storage is reused cyclically and frequent allocation and freeing of memory is avoided.

The `in` field of a kfifo can be thought of as the `tail` of a queue, where the producer writes new data. `out` is the `head` of the queue, where the consumer takes data from.

```c
struct __kfifo {
    unsigned int in;    // position of the next write
    unsigned int out;   // position of the next read
    unsigned int mask;  // buffer size - 1
    unsigned int esize; // size of each element in the buffer
    void *data;         // pointer to the actual storage
};
```

kfifo requires the buffer size to be a power of 2. An offset can then be combined with `mask` (buffer size - 1) using `&` to get the modulo result quickly, and that result is the index to use inside the ring. With a buffer size of 16, for example:

```text
buffer size = 16 -> 10000
mask        = 15 -> 01111
off         = 18 -> 10010
off &= mask =  2 -> 00010
```

This operation appears in [kfifo_copy_in](https://github.com/torvalds/linux/blob/e5b3efbe1ab1793bb49ae07d56d0973267e65112/lib/kfifo.c#L90) and [kfifo_copy_out](https://github.com/torvalds/linux/blob/e5b3efbe1ab1793bb49ae07d56d0973267e65112/lib/kfifo.c#L129). At the end of these functions, `smp_wmb()` (Symmetric Multi-Processing Write Memory Barrier) ensures that all writes issued before the barrier complete before any write after it, so that reordering by the CPU or the compiler cannot scramble the sequence. `smp_mb()` and `smp_rmb()` are used in similar ways.

```c
off &= fifo->mask;
if (esize != 1) {
    off *= esize;
    size *= esize;
    len *= esize;
}
l = min(len, size - off);

memcpy(fifo->data + off, src, l);
memcpy(fifo->data, src + l, len - l);
/*
 * make sure that the data in the fifo is up to date before
 * incrementing the fifo->in index counter
 */
smp_wmb();
```

The number of elements in the whole buffer and its size in bytes can be derived from `esize` and `mask`:

```c
unsigned int length = fifo->mask + 1;
unsigned int size = length * fifo->esize;
```

The following sections cover the kfifo APIs used in [simrupt.c](https://github.com/sysprog21/simrupt/blob/main/simrupt.c) and some functions related to them.

### `kfifo_init`

There are two ways to initialize a kfifo, and they differ slightly in the details. simrupt.c uses the `DECLARE_KFIFO_PTR` macro; the other way is `kfifo_init`. Starting with the approach in simrupt.c: after declaring the fifo with `DECLARE_KFIFO_PTR`, its buffer still has to be allocated with `kfifo_alloc`.

```c
/**
 * DECLARE_KFIFO_PTR - macro to declare a fifo pointer object
 * @fifo: name of the declared fifo
 * @type: type of the fifo elements
 */
#define DECLARE_KFIFO_PTR(fifo, type) \
    STRUCT_KFIFO_PTR(type) fifo
```

`kfifo_alloc` takes a pointer to the `kfifo`, the buffer size to allocate, and the [GFP mask](https://github.com/torvalds/linux/blob/master/include/linux/gfp.h) to use. Inside `__kfifo_alloc`, `roundup_pow_of_two` rounds the buffer size up to a power of 2, and `kmalloc_array` performs the actual allocation.

```c
#define kfifo_alloc(fifo, size, gfp_mask) \
__kfifo_int_must_check_helper( \
({ \
    typeof((fifo) + 1) __tmp = (fifo); \
    struct __kfifo *__kfifo = &__tmp->kfifo; \
    __is_kfifo_ptr(__tmp) ? \
    __kfifo_alloc(__kfifo, size, sizeof(*__tmp->type), gfp_mask) : \
    -EINVAL; \
}) \
)
```

The initialization in simrupt.c:

```c
/* Data are stored into a kfifo buffer before passing them to the userspace */
static DECLARE_KFIFO_PTR(rx_fifo, unsigned char);

static int __init simrupt_init(void)
{
    dev_t dev_id;
    int ret;

    if (kfifo_alloc(&rx_fifo, PAGE_SIZE, GFP_KERNEL) < 0)
        return -ENOMEM;
    ...
}
```

To use `kfifo_init` instead, first declare the buffer and allocate it with `kmalloc`, then pass it in as an argument for initialization. A simple example follows:

```c
/**
 * kfifo_init - initialize a fifo using a preallocated buffer
 * @fifo: the fifo to assign the buffer
 * @buffer: the preallocated buffer to be used
 * @size: the size of the internal buffer, this have to be a power of 2
 *
 * This macro initializes a fifo using a preallocated buffer.
 *
 * The number of elements will be rounded-up to a power of 2.
 * Return 0 if no error, otherwise an error code.
 */
#define kfifo_init(fifo, buffer, size)
```

```c
#define FIFO_SIZE 16

static struct kfifo my_kfifo;
static unsigned char *buffer;

/* Named kfifo_example_init because module_init is already a kernel macro. */
static int __init kfifo_example_init(void)
{
    int ret;

    /* Allocate the backing storage ourselves. */
    buffer = kmalloc(FIFO_SIZE, GFP_KERNEL);
    if (!buffer) {
        pr_err("Failed to allocate buffer\n");
        return -ENOMEM;
    }

    /* Hand the preallocated buffer to the fifo. */
    ret = kfifo_init(&my_kfifo, buffer, FIFO_SIZE);
    if (ret) {
        pr_err("Failed to initialize kfifo\n");
        kfree(buffer);
        return ret;
    }
    ...
}
```

### `kfifo_in`

After initialization, the next step is writing data into the `kfifo`, which is done with the `kfifo_in` macro. It takes three arguments:

- `fifo` : pointer to the `kfifo` structure.
- `buf` : the data to write.
- `n` : the length of that data.

```c
#define kfifo_in(fifo, buf, n) \
({ \
    typeof((fifo) + 1) __tmp = (fifo); \
    typeof(__tmp->ptr_const) __buf = (buf); \
    unsigned long __n = (n); \
    const size_t __recsize = sizeof(*__tmp->rectype); \
    struct __kfifo *__kfifo = &__tmp->kfifo; \
    (__recsize) ?\
    __kfifo_in_r(__kfifo, __buf, __n, __recsize) : \
    __kfifo_in(__kfifo, __buf, __n); \
})
```

`kfifo_in` in turn calls `__kfifo_in` to do the actual write. It first uses `kfifo_unused` to get the amount of unused space in the `fifo` and stores it in `l`, then checks whether the length exceeds the available space and, if so, clamps the length to `l` so that data cannot overflow the buffer. Next, `kfifo_copy_in` copies `len` elements into the `fifo` starting at position `fifo->in`. Finally, `fifo->in += len` updates the position of the next write.

```c
unsigned int __kfifo_in(struct __kfifo *fifo,
                        const void *buf, unsigned int len)
{
    unsigned int l;

    l = kfifo_unused(fifo);
    if (len > l)
        len = l;

    kfifo_copy_in(fifo, buf, len, fifo->in);
    fifo->in += len;
    return len;
}
```

### `kfifo_unused`

`__kfifo_in` uses this function to compute the unused space in the `kfifo`. `fifo->mask + 1` is the total size of the buffer, and `kfifo_len(fifo)` returns how much of the buffer is in use.
```c
static inline unsigned int kfifo_unused(struct __kfifo *fifo)
{
    return fifo->mask + 1 - kfifo_len(fifo);
}
```

### `kfifo_copy_in`

`kfifo_copy_in` first works out where in the buffer the data should be copied, using `fifo->mask` with `&` as described earlier to get the modulo result quickly. Because the storage is a ring buffer, a write may be split into two parts: the first part goes from the offset derived from `fifo->in` up to the end of the buffer, and the remainder continues from the beginning. This is why the code below contains two `memcpy` calls.

```c
static void kfifo_copy_in(struct __kfifo *fifo, const void *src,
                          unsigned int len, unsigned int off)
{
    unsigned int size = fifo->mask + 1;
    unsigned int esize = fifo->esize;
    unsigned int l;

    off &= fifo->mask;
    if (esize != 1) {
        off *= esize;
        size *= esize;
        len *= esize;
    }
    l = min(len, size - off);

    memcpy(fifo->data + off, src, l);
    memcpy(fifo->data, src + l, len - l);
    /*
     * make sure that the data in the fifo is up to date before
     * incrementing the fifo->in index counter
     */
    smp_wmb();
}
```

### `kfifo_to_user`

The `kfifo_to_user` macro copies data from the `kfifo` to userspace.

- `fifo` : pointer to the `kfifo` structure.
- `to` : pointer to the buffer in userspace.
- `len` : the length of the data to copy.
- `copied` : pointer to where the length actually copied is stored.

```c
#define kfifo_to_user(fifo, to, len, copied) \
__kfifo_int_must_check_helper( \
({ \
    typeof((fifo) + 1) __tmp = (fifo); \
    void __user *__to = (to); \
    unsigned int __len = (len); \
    unsigned int *__copied = (copied); \
    const size_t __recsize = sizeof(*__tmp->rectype); \
    struct __kfifo *__kfifo = &__tmp->kfifo; \
    (__recsize) ? \
    __kfifo_to_user_r(__kfifo, __to, __len, __copied, __recsize) : \
    __kfifo_to_user(__kfifo, __to, __len, __copied); \
}) \
)
```

`kfifo_to_user` calls `__kfifo_to_user` to copy the data into the userspace buffer, and the main work is done by `kfifo_copy_to_user`. `to` is the userspace buffer, `len` is the length to copy, `fifo->out` is the starting read position, and `copied` is as described above. Once the copy is done, `fifo->out += len` updates the position of the next read.

```c
int __kfifo_to_user(struct __kfifo *fifo, void __user *to,
                    unsigned long len, unsigned int *copied)
{
    ...
    ret = kfifo_copy_to_user(fifo, to, len, fifo->out, copied);
    if (unlikely(ret)) {
        len -= ret;
        err = -EFAULT;
    } else
        err = 0;
    fifo->out += len;
    return err;
}
```

:::danger
Development notes must follow:
* IT vocabulary translation: https://hackmd.io/@sysprog/it-vocabulary
* Glossary: https://hackmd.io/@l10n-tw/glossaries
:::
