# Linux Kernel Project: Research and Applications of simrupt
> Contributor: brianlin314
> [Project presentation video](https://youtu.be/Ge5sDqOIYI4)
:::success
:question: Question list
* ?
:::
## Overview
The name [simrupt](https://github.com/sysprog21/simrupt) combines the words "simulate" and "interrupt". The module simulates [IRQ events](https://www.kernel.org/doc/html/latest/core-api/genericirq.html) and demonstrates the use of the following Linux kernel mechanisms:
- irq
- softirq
- tasklet
- workqueue
- kernel thread
- [kfifo](https://www.kernel.org/doc/htmldocs/kernel-api/kfifo.html)
- [memory barrier](https://www.kernel.org/doc/Documentation/memory-barriers.txt)
Related resources:
* [2021 development log](https://hackmd.io/@linD026/simrupt-vwifi)
* [2022 development log](https://hackmd.io/@Build-A-Moat/BJasWQhOc)
## Development environment
Check that the kernel version is Linux v5.15 or later.
```shell
$ uname -r
5.19.0-43-generic
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i5-12400
CPU family: 6
Model: 151
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 2
CPU max MHz: 5600.0000
CPU min MHz: 800.0000
BogoMIPS: 4992.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr ibt flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 288 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 7.5 MiB (6 instances)
L3: 18 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
```
## TODO: Explain how [simrupt](https://github.com/sysprog21/simrupt) works
Using the [Linux Kernel Module Programming Guide](https://sysprog21.github.io/lkmpg/) and the course materials, explain each kernel mechanism that [simrupt](https://github.com/sysprog21/simrupt) demonstrates, and design experiments to illustrate them.
### kfifo
[kfifo](https://archive.kernel.org/oldlinux/htmldocs/kernel-api/kfifo.html) is a First-In-First-Out structure in the Linux kernel. It is safe in the Single Producer Single Consumer case, meaning that no extra lock is needed, as the comment in the source code also notes.
This project has one kfifo, `rx_fifo`, which stores the data that will be passed to userspace.
```c
/* Data are stored into a kfifo buffer before passing them to the userspace */
static struct kfifo rx_fifo;
/* NOTE: the usage of kfifo is safe (no need for extra locking), until there is
* only one concurrent reader and one concurrent writer. Writes are serialized
* from the interrupt context, readers are serialized using this mutex.
*/
static DEFINE_MUTEX(read_lock);
```
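`produce_data()` inserts data into `rx_fifo`, checks how many bytes were written, and rate-limits the warning so that excessive logging does not hurt performance. `len` is checked because `kfifo_in` returns the number of elements actually inserted, which can be smaller than the number requested.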
```c
/* Insert a value into the kfifo buffer */
static void produce_data(unsigned char val)
{
/* Implement a kind of circular FIFO here (skip oldest element if kfifo
* buffer is full).
*/
unsigned int len = kfifo_in(&rx_fifo, &val, sizeof(val));
if (unlikely(len < sizeof(val)) && printk_ratelimit())
pr_warn("%s: %zu bytes dropped\n", __func__, sizeof(val) - len);
pr_debug("simrupt: %s: in %u/%u bytes\n", __func__, len,
kfifo_len(&rx_fifo));
}
```
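- `kfifo_in(fifo, buf, n);`
- Copies data from buf into the fifo and returns the amount of data actually inserted.
- The implementation first checks whether the requested size exceeds the unused space; if it does, only as much as fits is copied.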
```c
unsigned int __kfifo_in(struct __kfifo *fifo,
const void *buf, unsigned int len)
{
unsigned int l;
l = kfifo_unused(fifo);
if (len > l)
len = l;
kfifo_copy_in(fifo, buf, len, fifo->in);
fifo->in += len;
return len;
}
```
- `kfifo_to_user(fifo, to, len, copied);`
- Moves at most len bytes of data from the fifo to userspace.
- `kfifo_alloc(fifo, size, gfp_mask);`
- Dynamically allocates a fifo buffer and returns 0 on success. The fifo can be released with `kfifo_free(fifo);`.
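The clamping behavior described above can be tried outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: `struct demo_fifo`, `demo_fifo_in()`, `demo_fifo_out()`, and the 8-byte capacity are all hypothetical names and values chosen for illustration. It keeps free-running `in`/`out` indices and masks them on access, and it returns the number of bytes actually stored, in the same spirit as `__kfifo_in()`.

```c
#define DEMO_FIFO_SIZE 8u /* hypothetical capacity; must be a power of two */

/* Hypothetical user-space stand-in for a byte-oriented kfifo. */
struct demo_fifo {
    unsigned int in;  /* free-running write index */
    unsigned int out; /* free-running read index */
    unsigned char data[DEMO_FIFO_SIZE];
};

/* Bytes that can still be written. */
static unsigned int demo_fifo_unused(const struct demo_fifo *f)
{
    return DEMO_FIFO_SIZE - (f->in - f->out);
}

/* Copy up to len bytes in; return how many were actually stored. */
static unsigned int demo_fifo_in(struct demo_fifo *f, const void *buf,
                                 unsigned int len)
{
    const unsigned char *src = buf;
    unsigned int avail = demo_fifo_unused(f);

    if (len > avail)
        len = avail; /* clamp to the free space instead of failing */
    for (unsigned int i = 0; i < len; i++)
        f->data[(f->in + i) & (DEMO_FIFO_SIZE - 1)] = src[i];
    f->in += len;
    return len;
}

/* Copy up to len bytes out; return how many were actually removed. */
static unsigned int demo_fifo_out(struct demo_fifo *f, void *buf,
                                  unsigned int len)
{
    unsigned char *dst = buf;
    unsigned int used = f->in - f->out;

    if (len > used)
        len = used;
    for (unsigned int i = 0; i < len; i++)
        dst[i] = f->data[(f->out + i) & (DEMO_FIFO_SIZE - 1)];
    f->out += len;
    return len;
}
```

Writing "hello" and then "world" into an empty 8-byte fifo stores 5 and then only 3 bytes, which is why a caller such as `produce_data()` has to compare the return value with the requested length.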
### fast circular buffer
First, read the kernel documentation on [Circular Buffers](https://www.kernel.org/doc/Documentation/core-api/circular-buffers.rst).
A circular buffer is a fixed-size buffer with two indices:
- `head index`: the point at which the producer inserts items into the buffer.
- `tail index`: the point at which the consumer finds the next item in the buffer.
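When head and tail are equal, the buffer is empty; conversely, when head is one less than tail, the buffer is full.
Adding an item increments the head index, and removing an item increments the tail index. The tail must never pass the head, and each index wraps back to 0 when it reaches the end of the buffer. Setting both indices to 0 is also how the buffer is cleared.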
```c
/* Allocate fast circular buffer */
fast_buf.buf = vmalloc(PAGE_SIZE);
```
```c
/* Clear all data from the circular buffer fast_buf */
static void fast_buf_clear(void)
{
fast_buf.head = fast_buf.tail = 0;
}
```
Measuring power-of-2 buffers: keeping the buffer size a power of two lets the used and free space be computed with bitwise operations instead of the slower modulus (divide) operation.
- [/include/linux/circ_buf.h](https://elixir.bootlin.com/linux/latest/source/include/linux/circ_buf.h)
```c
#ifndef _LINUX_CIRC_BUF_H
#define _LINUX_CIRC_BUF_H 1
struct circ_buf {
char *buf;
int head;
int tail;
};
/* Return count in buffer. */
#define CIRC_CNT(head,tail,size) (((head) - (tail)) & ((size)-1))
/* Return space available, 0..size-1. We always leave one free char
as a completely full buffer has head == tail, which is the same as
empty. */
#define CIRC_SPACE(head,tail,size) CIRC_CNT((tail),((head)+1),(size))
/* Return count up to the end of the buffer. Carefully avoid
accessing head and tail more than once, so they can change
underneath us without returning inconsistent results. */
#define CIRC_CNT_TO_END(head,tail,size) \
({int end = (size) - (tail); \
int n = ((head) + end) & ((size)-1); \
n < end ? n : end;})
/* Return space available up to the end of the buffer. */
#define CIRC_SPACE_TO_END(head,tail,size) \
({int end = (size) - 1 - (head); \
int n = (end + (tail)) & ((size)-1); \
n <= end ? n : end+1;})
#endif /* _LINUX_CIRC_BUF_H */
```
- The `CIRC_SPACE*()` macros are for the producer, and the `CIRC_CNT*()` macros are for the consumer.
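To see what the two basic macros return, they can be copied into a user-space program. The sketch below reuses `CIRC_CNT` and `CIRC_SPACE` exactly as quoted from `circ_buf.h`; the helper names `demo_count()` and `demo_space()` and the buffer size of 8 are assumptions made for this example.

```c
/* Copied from include/linux/circ_buf.h as quoted above. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

#define DEMO_SIZE 8 /* hypothetical buffer size; must be a power of two */

/* Number of items a consumer may read. */
static int demo_count(int head, int tail)
{
    return CIRC_CNT(head, tail, DEMO_SIZE);
}

/* Number of items a producer may still write (at most DEMO_SIZE - 1). */
static int demo_space(int head, int tail)
{
    return CIRC_SPACE(head, tail, DEMO_SIZE);
}
```

With head = 2 and tail = 6 the indices have wrapped, yet the masked subtraction still yields 4 used slots and 3 free ones. The two values always add up to `DEMO_SIZE - 1` because one slot is kept empty to tell a full buffer from an empty one.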
In simrupt, a "faster" circular buffer holds the data before it is moved into the kfifo.
```c
/* We use an additional "faster" circular buffer to quickly store data from
* interrupt context, before adding them to the kfifo.
*/
static struct circ_buf fast_buf;
```
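`READ_ONCE()` is a relaxed-ordering memory access. It prevents the compiler from merging, refetching, or otherwise optimizing away the load, and for machine-word-sized variables the load is not torn, so a value that another thread updates concurrently is read exactly once. It does not by itself impose any ordering against other accesses.
`smp_rmb()` is a memory barrier that prevents loads from being reordered across it, which ensures that the index is read before the contents at that index. [Lockless patterns: relaxed access and partial memory barriers](https://lwn.net/Articles/846700/) notes that `smp_rmb()` and `smp_wmb()` give weaker ordering guarantees than `smp_load_acquire()` and `smp_store_release()`, but because load-store ordering rarely matters in practice, developers often use `smp_rmb()` and `smp_wmb()` as their memory barriers.
`fast_buf_get` plays the consumer role: it takes one item from the buffer and updates the tail index.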
```c
static int fast_buf_get(void)
{
struct circ_buf *ring = &fast_buf;
/* prevent the compiler from merging or refetching accesses for tail */
unsigned long head = READ_ONCE(ring->head), tail = ring->tail;
int ret;
if (unlikely(!CIRC_CNT(head, tail, PAGE_SIZE)))
return -ENOENT;
/* read index before reading contents at that index */
smp_rmb();
/* extract item from the buffer */
ret = ring->buf[tail];
/* finish reading descriptor before incrementing tail */
smp_mb();
/* increment the tail pointer */
ring->tail = (tail + 1) & (PAGE_SIZE - 1);
return ret;
}
```
`fast_buf_put` plays the producer role: it uses `CIRC_SPACE()` to check whether the buffer has free space, stores the value, and updates the head index.
```C
static int fast_buf_put(unsigned char val)
{
struct circ_buf *ring = &fast_buf;
unsigned long head = ring->head;
/* prevent the compiler from merging or refetching accesses for tail */
unsigned long tail = READ_ONCE(ring->tail);
/* is circular buffer full? */
if (unlikely(!CIRC_SPACE(head, tail, PAGE_SIZE)))
return -ENOMEM;
ring->buf[ring->head] = val;
/* commit the item before incrementing the head */
smp_wmb();
/* update header pointer */
ring->head = (ring->head + 1) & (PAGE_SIZE - 1);
return 0;
}
```
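The same single-producer, single-consumer pattern can be sketched in user space with C11 atomics. This is an analogy rather than a translation of the module: `struct demo_ring`, `demo_ring_put()`, `demo_ring_get()`, and the size of 16 are hypothetical, and acquire/release operations stand in for the `READ_ONCE()` plus `smp_rmb()`/`smp_wmb()`/`smp_mb()` combination used above.

```c
#include <stdatomic.h>

#define RING_SIZE 16u /* hypothetical size; must be a power of two */

struct demo_ring {
    _Atomic unsigned int head; /* written only by the producer */
    _Atomic unsigned int tail; /* written only by the consumer */
    unsigned char buf[RING_SIZE];
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int demo_ring_put(struct demo_ring *r, unsigned char val)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire: the consumer has finished with every slot before tail. */
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (((head + 1) & (RING_SIZE - 1)) == tail)
        return -1; /* one slot always stays empty */
    r->buf[head] = val;
    /* Release: the byte above becomes visible before the new head. */
    atomic_store_explicit(&r->head, (head + 1) & (RING_SIZE - 1),
                          memory_order_release);
    return 0;
}

/* Consumer side: returns the byte, or -1 if the ring is empty. */
static int demo_ring_get(struct demo_ring *r)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire: pairs with the release store in demo_ring_put(). */
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);
    int ret;

    if (head == tail)
        return -1;
    ret = r->buf[tail];
    /* Release: finish reading the slot before handing it back. */
    atomic_store_explicit(&r->tail, (tail + 1) & (RING_SIZE - 1),
                          memory_order_release);
    return ret;
}
```

Each index has exactly one writer, which is what makes the scheme work without a lock. In simrupt the consumer side is additionally serialized with `consumer_lock` because several workqueue workers may call `fast_buf_get()`.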
`process_data` calls `fast_buf_put(update_simrupt_data());`. `update_simrupt_data()` generates data in the range `0x20` to `0x7E`, the printable characters of [ASCII](https://zh.wikipedia.org/zh-tw/ASCII). Each value is put into the circular buffer, and `tasklet_schedule` then schedules the tasklet that continues the processing.
```c
static void process_data(void)
{
WARN_ON_ONCE(!irqs_disabled());
pr_info("simrupt: [CPU#%d] produce data\n", smp_processor_id());
fast_buf_put(update_simrupt_data());
pr_info("simrupt: [CPU#%d] scheduling tasklet\n", smp_processor_id());
tasklet_schedule(&simrupt_tasklet);
}
```
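The body of `update_simrupt_data()` is not quoted in this note, so the following is only a sketch of one way to produce the `0x20` to `0x7E` sequence described above. The names `demo_data` and `demo_next_printable()` are hypothetical and the logic is an assumption, not the module's source.

```c
/* Hypothetical stand-in for the module's generator state. */
static int demo_data;

/* Advance to the next printable ASCII code, wrapping 0x7E -> 0x20. */
static int demo_next_printable(void)
{
    int next = (demo_data + 1) % 0x7f; /* 0x7E + 1 wraps to 0 */

    demo_data = next < 0x20 ? 0x20 : next; /* skip control characters */
    return demo_data;
}
```

Starting from zero, the first call yields `0x20` (space), the 95th call yields `0x7E` (`~`), and the next call starts over at `0x20`.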
### tasklet
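A tasklet is built on top of softirq. The main difference is that tasklets can be allocated dynamically and can therefore be used by device drivers.
Tasklets can be replaced by workqueues, timers, or threaded interrupts, but parts of the kernel still use them. Developers are in the process of changing the tasklet API, and `DECLARE_TASKLET_OLD` exists to preserve compatibility.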
[Modernizing the tasklet API](https://lwn.net/Articles/830964/)
```c
#define DECLARE_TASKLET_OLD(arg1, arg2) DECLARE_TASKLET(arg1, arg2, 0L)
```
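The tasklet handler first checks that it is running in interrupt context and in softirq context, then calls `queue_work` to put the work item on the workqueue and records how long that call takes.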
```c
/**
* queue_work - queue work on a workqueue
* @wq: workqueue to use
* @work: work to queue */
static inline bool queue_work(struct workqueue_struct *wq,
struct work_struct *work)
{
return queue_work_on(WORK_CPU_UNBOUND, wq, work);
}
```
```c
/* Tasklet handler.
*
* NOTE: different tasklets can run concurrently on different processors, but
* two of the same type of tasklet cannot run simultaneously. Moreover, a
* tasklet always runs on the same CPU that schedules it.
*/
static void simrupt_tasklet_func(unsigned long __data)
{
ktime_t tv_start, tv_end;
s64 nsecs;
WARN_ON_ONCE(!in_interrupt());
WARN_ON_ONCE(!in_softirq());
tv_start = ktime_get();
queue_work(simrupt_workqueue, &work);
tv_end = ktime_get();
nsecs = (s64) ktime_to_ns(ktime_sub(tv_end, tv_start));
pr_info("simrupt: [CPU#%d] %s in_softirq: %llu usec\n", smp_processor_id(),
__func__, (unsigned long long) nsecs >> 10);
}
/* Tasklet for asynchronous bottom-half processing in softirq context */
static DECLARE_TASKLET_OLD(simrupt_tasklet, simrupt_tasklet_func);
```
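The comment above tells us that:
- Different tasklets can run on different CPUs at the same time
- Two instances of the same tasklet cannot run simultaneously
- A tasklet runs only on the CPU that scheduled it
| | softirq | tasklet |
|:------------------------:|:-------:|:-------:|
| Can several run on the same CPU at once? | No | No |
| Can the same one run on different CPUs? | Yes | No |
| Runs on the CPU that scheduled it? | Yes | Maybe |
Calling `tasklet_schedule()` marks the tasklet as ready to run on a CPU; see [linux/include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h).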
### workqueue
[linux/include/linux/workqueue.h](https://elixir.bootlin.com/linux/latest/source/include/linux/workqueue.h)
Two mutexes are defined: `producer_lock` and `consumer_lock`.
```c
/* Mutex to serialize kfifo writers within the workqueue handler */
static DEFINE_MUTEX(producer_lock);
/* Mutex to serialize fast_buf consumers: we can use a mutex because consumers
* run in workqueue handler (kernel thread context).
*/
static DEFINE_MUTEX(consumer_lock);
```
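`get_cpu()` returns the current CPU number and disables preemption; a matching `put_cpu()` is required to re-enable preemption.
The `mutex_lock(&consumer_lock)` / `mutex_unlock(&consumer_lock)` pair protects the consumer section, so that no other task takes data from the circular buffer at the same time.
The `mutex_lock(&producer_lock)` / `mutex_unlock(&producer_lock)` pair protects the producer section, so that no other task writes to the kfifo buffer at the same time.
`wake_up_interruptible(&rx_wait)` wakes up the processes sleeping on the wait queue and sets their state to TASK_RUNNING.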
```c=
/* Wait queue to implement blocking I/O from userspace */
static DECLARE_WAIT_QUEUE_HEAD(rx_wait);
/* Workqueue handler: executed by a kernel thread */
static void simrupt_work_func(struct work_struct *w)
{
int val, cpu;
/* This code runs from a kernel thread, so softirqs and hard-irqs must
* be enabled.
*/
WARN_ON_ONCE(in_softirq());
WARN_ON_ONCE(in_interrupt());
/* Pretend to simulate access to per-CPU data, disabling preemption
* during the pr_info().
*/
cpu = get_cpu();
pr_info("simrupt: [CPU#%d] %s\n", cpu, __func__);
put_cpu();
while (1) {
/* Consume data from the circular buffer */
mutex_lock(&consumer_lock);
val = fast_buf_get();
mutex_unlock(&consumer_lock);
if (val < 0)
break;
/* Store data to the kfifo buffer */
mutex_lock(&producer_lock);
produce_data(val);
mutex_unlock(&producer_lock);
}
wake_up_interruptible(&rx_wait);
}
```
A work item that runs on a workqueue can be defined with `DECLARE_WORK()` or `INIT_WORK()`.
- `DECLARE_WORK(name, func)` statically initializes a work item at compile time; `func` has the type `void (*)(struct work_struct *)`.
- `INIT_WORK(work, func)` initializes a work item at run time, where `work` is a `struct work_struct *`.
```c
/* Workqueue for asynchronous bottom-half processing */
static struct workqueue_struct *simrupt_workqueue;
/* Work item: holds a pointer to the function that is going to be executed
* asynchronously.
*/
static DECLARE_WORK(work, simrupt_work_func);
```
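Because a work handler receives only a pointer to its work item, handlers that need more state usually embed the work item in a larger structure and recover it with `container_of()`. The user-space sketch below imitates that idiom. Every name in it (`struct demo_work`, `demo_container_of`, `struct demo_device`, `demo_init_work()`, `demo_run_work()`) is a hypothetical stand-in, not a kernel API.

```c
#include <stddef.h>

/* Minimal stand-in for struct work_struct: just the callback. */
struct demo_work {
    void (*func)(struct demo_work *work);
};

/* Same idea as the kernel's container_of(), without its type checks. */
#define demo_container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))

/* Hypothetical driver state that embeds its work item. */
struct demo_device {
    int processed;
    struct demo_work work;
};

/* Handler: recover the enclosing structure from the work pointer. */
static void demo_handler(struct demo_work *work)
{
    struct demo_device *dev =
        demo_container_of(work, struct demo_device, work);

    dev->processed++;
}

/* Run-time initialization, in the spirit of INIT_WORK(). */
static void demo_init_work(struct demo_work *work,
                           void (*func)(struct demo_work *))
{
    work->func = func;
}

/* Stand-in for a worker thread picking up the item. */
static void demo_run_work(struct demo_work *work)
{
    work->func(work);
}
```

simrupt itself uses a single static work item and global buffers, so it does not need this step; the sketch only shows why the handler signature takes the work item rather than a separate data pointer.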
### timer
The timer is initialized with `timer_setup()`.
```c
/* Setup the timer */
timer_setup(&timer, timer_handler, 0);
void timer_setup(struct timer_list * timer,
void (*function)(struct timer_list *),
unsigned int flags);
```
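The goal is to simulate a hard-irq. A kernel timer callback runs in softirq context, so the handler first checks that this is the case, and then disables interrupts on the local CPU to imitate handling an interrupt in interrupt context.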
```C
/* Timer to simulate a periodic IRQ */
static struct timer_list timer;
static void timer_handler(struct timer_list *__timer)
{
ktime_t tv_start, tv_end;
s64 nsecs;
pr_info("simrupt: [CPU#%d] enter %s\n", smp_processor_id(), __func__);
/* We are using a kernel timer to simulate a hard-irq, so we must expect
* to be in softirq context here.
*/
WARN_ON_ONCE(!in_softirq());
/* Disable interrupts for this CPU to simulate real interrupt context */
local_irq_disable();
tv_start = ktime_get();
process_data();
tv_end = ktime_get();
nsecs = (s64) ktime_to_ns(ktime_sub(tv_end, tv_start));
pr_info("simrupt: [CPU#%d] %s in_irq: %llu usec\n", smp_processor_id(),
__func__, (unsigned long long) nsecs >> 10);
mod_timer(&timer, jiffies + msecs_to_jiffies(delay));
local_irq_enable();
}
```
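`mod_timer` re-arms the timer so that it fires again after `delay` milliseconds.
[Jiffy](https://en.wikipedia.org/wiki/Jiffy_(time)) is an informal term for an unspecified, very short period of time. In the kernel, `jiffies` counts timer ticks and `HZ` is the number of ticks per second, so the two units convert as follows.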
```
jiffies_value = seconds_value * HZ;
seconds_value = jiffies_value / HZ;
```
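As a worked example of the conversion, the sketch below expresses it in milliseconds, the unit that `timer_handler` passes to `msecs_to_jiffies()`. The tick rate `DEMO_HZ` of 250 is an assumed value (the real `HZ` is a kernel configuration option), and `demo_msecs_to_jiffies()` and `demo_jiffies_to_msecs()` are hypothetical helpers, not the kernel functions. Rounding up is a choice made here so that a timeout is never shortened.

```c
#define DEMO_HZ 250UL /* assumed tick rate for this example */

/* Milliseconds to ticks, rounding up so a timeout is never shortened. */
static unsigned long demo_msecs_to_jiffies(unsigned long ms)
{
    return (ms * DEMO_HZ + 999UL) / 1000UL;
}

/* Ticks to whole milliseconds. */
static unsigned long demo_jiffies_to_msecs(unsigned long j)
{
    return j * 1000UL / DEMO_HZ;
}
```

Under this assumption one tick lasts 4 ms, so a 100 ms delay corresponds to 25 ticks and a 1 ms delay still takes one full tick.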
### simrupt_init
This function initializes several data structures:
1. Allocate PAGE_SIZE bytes for the kfifo
```c
kfifo_alloc(&rx_fifo, PAGE_SIZE, GFP_KERNEL)
```
2. Allocate PAGE_SIZE bytes for the circular buffer
```c
fast_buf.buf = vmalloc(PAGE_SIZE);
```
3. Register a device number for simrupt and add the character device to the system
```c
ret = alloc_chrdev_region(&dev_id, 0, NR_SIMRUPT, DEV_NAME);
...
cdev_init(&simrupt_cdev, &simrupt_fops);
ret = cdev_add(&simrupt_cdev, dev_id, NR_SIMRUPT);
...
```
4. Register the device in sysfs, which lets user space access and control it through the device file "/dev/simrupt"
```c
device_create(simrupt_class, NULL, MKDEV(major, 0), NULL, DEV_NAME);
```
5. Allocate a new workqueue
```c
simrupt_workqueue = alloc_workqueue("simruptd", WQ_UNBOUND, WQ_MAX_ACTIVE);
```
6. Set up the timer
```c
timer_setup(&timer, timer_handler, 0);
```
### simrupt flow diagram
```graphviz
digraph simrupt {
node [shape = box]
rankdir = TD
timer_handler [label="timer_handler"]
process_data [label = "process_data"]
update_simrupt_data [label="update_simrupt_data"]
fast_circular_buffer [label="fast_circular_buffer", shape=ellipse]
simrupt_tasklet_func [label = "simrupt_tasklet_func"]
simrupt_work_func [label = "simrupt_work_func"]
kfifo [label = "kfifo", shape=ellipse]
timer_handler -> process_data
process_data -> update_simrupt_data
update_simrupt_data -> fast_circular_buffer [label = " fast_buf_put"]
process_data -> simrupt_tasklet_func [label = " tasklet_schedule"]
simrupt_tasklet_func -> simrupt_work_func [label = " queue_work"]
fast_circular_buffer -> simrupt_work_func [label = "fast_buf_get"]
simrupt_work_func -> kfifo [label = " produce_data"]
{rank=same update_simrupt_data simrupt_tasklet_func}
}
```
### Using simrupt
Load the kernel module.
```shell
sudo insmod simrupt.ko
```
Once the module is loaded, the device file `/dev/simrupt` appears. The following command shows the data it produces.
```shell
sudo cat /dev/simrupt
```
```shell
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
```
`dmesg` shows kernel messages; add `--follow` to watch them in real time.
```shell
sudo dmesg --follow
```
```shell
[882265.813265] simrupt: [CPU#3] enter timer_handler
[882265.813297] simrupt: [CPU#3] produce data
[882265.813299] simrupt: [CPU#3] scheduling tasklet
[882265.813300] simrupt: [CPU#3] timer_handler in_irq: 2 usec
[882265.813350] simrupt: [CPU#3] simrupt_tasklet_func in_softirq: 0 usec
[882265.813383] simrupt: [CPU#5] simrupt_work_func
```
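## TODO: Select a use case of irq/softirq/workqueue in the Linux kernel
From the Linux kernel source code, select a small use case that demonstrates irq/softirq/workqueue, make sure it runs on Linux v5.15+, design an experiment to verify its behavior, and explain how it works.
### Use case
Download the kernel source code: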
```shell
git clone https://github.com/torvalds/linux.git
```
Use find and grep to look for uses of irq in the source code:
```shell
find "linux/" -name "*.c" -exec grep -H "irq" {} \; > irq.txt
```
[linux/samples/trace_printk/trace-printk.c](https://github.com/torvalds/linux/blob/master/samples/trace_printk/trace-printk.c) was selected as the use case.
This example demonstrates trace_printk and irq_work. [Running work in hardware interrupt context](https://lwn.net/Articles/411605/) explains that Linux has many mechanisms for deferring interrupt-driven work so that code does not run in hardware interrupt context, but that some code still needs to run there, which is why this API was added.
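The expected users of the API are non-maskable interrupt (NMI) handlers, which can do very little in NMI context and therefore queue a callback to run in hardware interrupt context shortly afterwards. Code that runs in hardware interrupt context must be written so that it does not harm the rest of the system.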
- init_irq_work
Initializes an irq_work.
```c
#define DEFINE_IRQ_WORK(name, _f) \
struct irq_work name = IRQ_WORK_INIT(_f)
static inline
void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
{
*work = IRQ_WORK_INIT(func);
}
```
- enqueue irq_work
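`irq_work_queue()` enqueues the irq_work on the current CPU. Before enqueuing, it calls `irq_work_claim()`, which atomically marks the work as "claimed" and reports whether it was already pending. If IRQ_WORK_PENDING was already set, the work has been queued before and has not started running yet, so nothing more is enqueued and no [IPI](https://zh.wikipedia.org/zh-tw/%E5%A4%84%E7%90%86%E5%99%A8%E9%97%B4%E4%B8%AD%E6%96%AD) (Inter-Processor Interrupt) needs to be raised.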
```c
/* Enqueue the irq work @work on the current CPU */
bool irq_work_queue(struct irq_work *work)
{
/* Only queue if not already pending */
if (!irq_work_claim(work))
return false;
/* Queue the entry and raise the IPI if needed. */
preempt_disable();
__irq_work_queue_local(work);
preempt_enable();
return true;
}
/*
* Claim the entry so that no one else will poke at it.
*/
static bool irq_work_claim(struct irq_work *work)
{
int oflags;
oflags = atomic_fetch_or(IRQ_WORK_CLAIMED | CSD_TYPE_IRQ_WORK, &work->node.a_flags);
/*
* If the work is already pending, no need to raise the IPI.
* The pairing smp_mb() in irq_work_single() makes sure
* everything we did before is visible.
*/
if (oflags & IRQ_WORK_PENDING)
return false;
return true;
}
```
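The claim step is a small lock-free idiom: one atomic fetch-or both sets the flags and reveals whether somebody else had set them first. A user-space sketch with C11 atomics follows. The flag values and the names `struct demo_irq_work`, `demo_claim()`, and `demo_start_run()` are assumptions made for this example, not the kernel's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag values, chosen for this sketch only. */
#define DEMO_PENDING 0x1
#define DEMO_BUSY    0x2
#define DEMO_CLAIMED (DEMO_PENDING | DEMO_BUSY)

struct demo_irq_work {
    atomic_int flags;
};

/* Return true only for the caller that flips PENDING from 0 to 1. */
static bool demo_claim(struct demo_irq_work *w)
{
    int old = atomic_fetch_or(&w->flags, DEMO_CLAIMED);

    return !(old & DEMO_PENDING);
}

/* Called by the runner just before the callback: the item may be
 * claimed (queued) again from this point on.
 */
static void demo_start_run(struct demo_irq_work *w)
{
    atomic_fetch_and(&w->flags, ~DEMO_PENDING);
}
```

However many callers race on `demo_claim()`, exactly one of them sees the old value without the pending bit, so the item is queued once.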
- irq_work_sync
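It first asserts that interrupts are enabled when the function is called. If the kernel is built with CONFIG_PREEMPT_RT and the work is not marked to run in hard interrupt context, or if the architecture has no irq_work interrupt, it sleeps in `rcuwait_wait_event` until the irq_work is no longer busy. Otherwise it busy-waits with `cpu_relax()`.
In short, `irq_work_sync` synchronizes against an irq_work and ensures that it is not currently in use.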
```c
/*
* Synchronize against the irq_work @entry, ensures the entry is not
* currently in use.
*/
void irq_work_sync(struct irq_work *work)
{
lockdep_assert_irqs_enabled();
might_sleep();
if ((IS_ENABLED(CONFIG_PREEMPT_RT) && !irq_work_is_hard(work)) ||
!arch_irq_work_has_interrupt()) {
rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
TASK_UNINTERRUPTIBLE);
return;
}
while (irq_work_is_busy(work))
cpu_relax();
}
```
This example uses trace_printk in hardware interrupt context to print static and global strings.
```c
static void trace_printk_irq_work(struct irq_work *work)
{
trace_printk("(irq) This is a static string that will use trace_bputs\n");
trace_printk(trace_printk_test_global_str_irq);
trace_printk("(irq) This is a %s that will use trace_bprintk()\n",
"static string");
trace_printk(trace_printk_test_global_str_fmt,
"(irq) ", "dynamic string");
}
static int __init trace_printk_init(void)
{
init_irq_work(&irqwork, trace_printk_irq_work);
trace_printk("This is a static string that will use trace_bputs\n");
trace_printk(trace_printk_test_global_str);
/* Kick off printing in irq context */
irq_work_queue(&irqwork);
irq_work_sync(&irqwork);
trace_printk("This is a %s that will use trace_bprintk()\n",
"static string");
trace_printk(trace_printk_test_global_str_fmt, "", "dynamic string");
return 0;
}
```
#### Using ftrace
First mount debugfs:
```shell
$ sudo mount -t debugfs nodev /sys/kernel/debug
```
As root, change to the tracing directory and turn tracing on:
```shell
$ cd /sys/kernel/debug/tracing
$ echo 1 > tracing_on
```
Stop tracing:
```shell
$ echo 0 > tracing_on
```
To read the ftrace output, refer to [ftrace.txt](https://www.kernel.org/doc/Documentation/trace/ftrace.txt):
- `irqs-off`:
- `d` interrupts are disabled.
- `.` otherwise.
- `need-resched`:
- `N` both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set,
- `n` only TIF_NEED_RESCHED is set,
- `p` only PREEMPT_NEED_RESCHED is set,
- `.` otherwise.
- `hardirq/softirq`:
- `Z` - NMI occurred inside a hardirq
- `z` - NMI is running
- `H` - hard irq occurred inside a softirq.
- `h` - hard irq is running
- `s` - soft irq is running
- `.` - normal context.
- `preempt-depth`:
- The level of preempt_disabled
The trace shows that irq_work ran `trace_printk_irq_work` in interrupt context (the `h` flag), and that interrupts were disabled there (the `d` flag).
```shell
$ cat trace
_-----=> irqs-off/BH-disabled
/ _----=> need-resched
| / _---=> hardirq/softirq
|| / _--=> preempt-depth
||| / _-=> migrate-disable
|||| / delay
TASK-PID CPU# ||||| TIMESTAMP FUNCTION
| | | ||||| | |
insmod-19272 [011] ..... 153925.593562: 0xffffffffc0c67005: This is a static string that will use trace_bputs
insmod-19272 [011] ..... 153925.593563: 0xffffffffc0c67005: This is a dynamic string that will use trace_puts
insmod-19272 [011] d.h1. 153925.593564: 0xffffffffc0c62005: (irq) This is a static string that will use trace_bputs
insmod-19272 [011] d.h1. 153925.593564: 0xffffffffc0c62005: (irq) This is a dynamic string that will use trace_puts
insmod-19272 [011] d.h1. 153925.593564: 0xffffffffc0c62005: (irq) This is a static string that will use trace_bprintk()
insmod-19272 [011] d.h1. 153925.593565: 0xffffffffc0c62005: (irq) This is a dynamic string that will use trace_printk
insmod-19272 [011] ..... 153925.593568: 0xffffffffc0c67005: This is a static string that will use trace_bprintk()
insmod-19272 [011] ..... 153925.593568: 0xffffffffc0c67005: This is a dynamic string that will use trace_printk
```
### Experiment design
A workqueue defers work and hands it to a kernel thread. Whereas a tasklet has to run in interrupt context, a workqueue handler runs in process context, where rescheduling and sleeping are allowed, so workqueues can replace tasklets.
Following the [LKMPG workqueue](https://sysprog21.github.io/lkmpg/#work-queues) example, I wrote my own [workqueue-example.c](https://github.com/brianlin314/workqueue-examples/blob/main/workqueue-example.c). It puts a work and a delayed_work on a workqueue and records the time each call takes; the work runs first, and the delayed_work runs after a 5-second delay.
- `schedule_work()`
Puts the work on the system workqueue and lets the kernel choose which CPU runs it (`WORK_CPU_UNBOUND`).
```c
static inline bool schedule_work(struct work_struct *work)
{
return queue_work(system_wq, work);
}
...
static inline bool queue_work(struct workqueue_struct *wq,
struct work_struct *work)
{
return queue_work_on(WORK_CPU_UNBOUND, wq, work);
}
```
- `schedule_work_on()`
Puts the work on the system workqueue and runs it on the CPU given by the `cpu` argument.
```c
static inline bool schedule_work_on(int cpu, struct work_struct *work)
{
return queue_work_on(cpu, system_wq, work);
}
```
```c
static int __init sched_init(void)
{
ktime_t tv_start, tv_end;
s64 nsecs;
queue = alloc_workqueue("workqueue_example", WQ_UNBOUND, 1);
if (!queue) {
pr_err("Failed to allocate workqueue\n");
return -ENOMEM;
}
INIT_WORK(&work, work_handler);
tv_start = ktime_get();
schedule_work_on(smp_processor_id(), &work);
tv_end = ktime_get();
nsecs = (s64) ktime_to_ns(ktime_sub(tv_end, tv_start));
pr_info("put work in workqueue: %llu usec\n", (unsigned long long) nsecs >> 10);
INIT_DELAYED_WORK(&delayed_work, delayed_work_handler);
tv_start = ktime_get();
schedule_delayed_work_on(smp_processor_id(), &delayed_work, msecs_to_jiffies(5000));
tv_end = ktime_get();
nsecs = (s64) ktime_to_ns(ktime_sub(tv_end, tv_start));
pr_info("put delay_work in workqueue: %llu usec\n", (unsigned long long) nsecs >> 10);
return 0;
}
```
```shell
$ make check
$ sudo dmesg
[ 6870.984578] put work in workqueue: 7 usec
[ 6870.984586] put delay_work in workqueue: 1 usec
[ 6870.984598] [work_handler]: Hello LINUX.
[ 6876.178292] [delayed_work_handler]: Hello NCKU.
[ 6878.000915] sched exit.
```
## TODO: Select a kfifo use case in the Linux kernel
From the Linux kernel source code, select a small use case that demonstrates kfifo, make sure it runs on Linux v5.15+, design an experiment to verify its behavior, and explain how it works.
[kfifo](https://archive.kernel.org/oldlinux/htmldocs/kernel-api/kfifo.html) is a [circular buffer](https://en.wikipedia.org/wiki/Circular_buffer) data structure, and the ring-buffer was implemented with reference to kfifo.
The situations kfifo is suited to are described in [linux/kfifo.h](https://github.com/torvalds/linux/blob/master/include/linux/kfifo.h):
```
/*
* Note about locking: There is no locking required until only one reader
* and one writer is using the fifo and no kfifo_reset() will be called.
* kfifo_reset_out() can be safely used, until it will be only called
* in the reader thread.
* For multiple writer and one reader there is only a need to lock the writer.
* And vice versa for only one writer and multiple reader there is only a need
* to lock the reader.
*/
```
The examples in [linux/samples/kfifo/](https://github.com/torvalds/linux/tree/master/samples/kfifo) were selected as the use case, and the experiments follow [kfifo-examples](https://github.com/sysprog21/kfifo-examples).
### Use case
#### record-example.c
- Defines a 100-element char array buf for temporary storage.
- Defines a struct hello that holds an `unsigned char buf[6]` initialized to "hello".
- `kfifo_in(&test, &hello, sizeof(hello))` writes struct hello into the kfifo buffer, and `kfifo_peek_len(&test)` is used to print the size of the next record in the kfifo buffer.
- In each iteration of the for loop, memset sets the first i+1 elements of buf to 'a'+i, and `kfifo_in(&test, buf, i + 1)` writes them into the kfifo buffer as one record.
- `kfifo_skip(&test)` skips the first record in the kfifo buffer, that is, "hello".
- `kfifo_out_peek(&test, buf, sizeof(buf))` copies out the first record without removing it, and the record is then printed.
- `kfifo_len(&test)` is used to print how much of the kfifo buffer is currently in use.
- The while loop uses `kfifo_out(&test, buf, sizeof(buf))` to take the records out one at a time and compares each with the corresponding entry of expected_result.
```c
static int __init testfunc(void)
{
char buf[100];
unsigned int i;
unsigned int ret;
struct { unsigned char buf[6]; } hello = { "hello" };
printk(KERN_INFO "record fifo test start\n");
kfifo_in(&test, &hello, sizeof(hello));
/* show the size of the next record in the fifo */
printk(KERN_INFO "fifo peek len: %u\n", kfifo_peek_len(&test));
/* put in variable length data */
for (i = 0; i < 10; i++) {
memset(buf, 'a' + i, i + 1);
kfifo_in(&test, buf, i + 1);
}
/* skip first element of the fifo */
printk(KERN_INFO "skip 1st element\n");
kfifo_skip(&test);
printk(KERN_INFO "fifo len: %u\n", kfifo_len(&test));
/* show the first record without removing from the fifo */
ret = kfifo_out_peek(&test, buf, sizeof(buf));
if (ret)
printk(KERN_INFO "%.*s\n", ret, buf);
/* check the correctness of all values in the fifo */
i = 0;
while (!kfifo_is_empty(&test)) {
ret = kfifo_out(&test, buf, sizeof(buf));
buf[ret] = '\0';
printk(KERN_INFO "item = %.*s\n", ret, buf);
if (strcmp(buf, expected_result[i++])) {
printk(KERN_WARNING "value mismatch: test failed\n");
return -EIO;
}
}
if (i != ARRAY_SIZE(expected_result)) {
printk(KERN_WARNING "size mismatch: test failed\n");
return -EIO;
}
printk(KERN_INFO "test passed\n");
return 0;
}
```
Load the kernel module.
```shell
sudo insmod record-example.ko
```
Check the messages with dmesg:
```shell
$ sudo dmesg
[360087.628314] record fifo test start
[360087.628316] fifo peek len: 6
[360087.628317] skip 1st element
[360087.628317] fifo len: 65
[360087.628318] a
[360087.628318] item = a
[360087.628319] item = bb
[360087.628319] item = ccc
[360087.628319] item = dddd
[360087.628319] item = eeeee
[360087.628320] item = ffffff
[360087.628320] item = ggggggg
[360087.628320] item = hhhhhhhh
[360087.628321] item = iiiiiiiii
[360087.628321] item = jjjjjjjjjj
[360087.628321] test passed
```
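The reported `fifo len: 65` is larger than the 55 payload bytes of the ten remaining records (1 + 2 + ... + 10), which is consistent with each record carrying a one-byte length header. The user-space sketch below models that layout to reproduce the arithmetic. It assumes a 128-byte fifo with one-byte headers, and `struct demo_rec_fifo` and the `demo_rec_*()` helpers are hypothetical names, not the kfifo implementation.

```c
#define REC_FIFO_SIZE 128u /* assumed capacity; must be a power of two */

struct demo_rec_fifo {
    unsigned int in, out; /* free-running byte indices */
    unsigned char data[REC_FIFO_SIZE];
};

/* Bytes in use, headers included. */
static unsigned int demo_rec_used(const struct demo_rec_fifo *f)
{
    return f->in - f->out;
}

/* Store one record: a length byte followed by the payload. */
static unsigned int demo_rec_in(struct demo_rec_fifo *f, const void *buf,
                                unsigned char len)
{
    const unsigned char *src = buf;

    if (len + 1u > REC_FIFO_SIZE - demo_rec_used(f))
        return 0; /* a record is stored whole or not at all */
    f->data[f->in++ & (REC_FIFO_SIZE - 1)] = len;
    for (unsigned int i = 0; i < len; i++)
        f->data[f->in++ & (REC_FIFO_SIZE - 1)] = src[i];
    return len;
}

/* Payload size of the next record, or 0 if the fifo is empty. */
static unsigned int demo_rec_peek_len(const struct demo_rec_fifo *f)
{
    return demo_rec_used(f) ? f->data[f->out & (REC_FIFO_SIZE - 1)] : 0;
}

/* Drop the next record, header included. */
static void demo_rec_skip(struct demo_rec_fifo *f)
{
    if (demo_rec_used(f))
        f->out += demo_rec_peek_len(f) + 1;
}
```

Replaying the steps of the example on this model (one 6-byte record, ten records of 1 to 10 bytes, then one skip) leaves 55 + 10 = 65 bytes in use, with a 1-byte record at the front.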
#### bytestream-example.c
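- `kfifo_in` and `kfifo_put` put the string "hello" and the numbers 0-9 into the kfifo buffer, respectively.
- `kfifo_in`: puts n bytes of data into the kfifo buffer in one call.
- `kfifo_put`: similar to `kfifo_in`, but for putting a single value into the kfifo buffer; it returns 0 if the buffer is already full.
- `kfifo_out` first takes the first 5 values out of the kfifo buffer, that is, "hello".
- `kfifo_out` then takes out the next 2 values (0 and 1), `kfifo_in` puts 0 and 1 back at the end of the kfifo buffer, and `kfifo_skip` discards the first value in the buffer (2).
- Starting from 20, values are added one by one until the 32-byte kfifo buffer is full.
- `kfifo_get` then checks each value in the buffer against expected_result; if all of them match, the test passes.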
```c
static int __init testfunc(void)
{
unsigned char buf[6];
unsigned char i, j;
unsigned int ret;
printk(KERN_INFO "byte stream fifo test start\n");
/* put string into the fifo */
kfifo_in(&test, "hello", 5);
/* put values into the fifo */
for (i = 0; i != 10; i++)
kfifo_put(&test, i);
/* show the number of used elements */
printk(KERN_INFO "fifo len: %u\n", kfifo_len(&test));
/* get max of 5 bytes from the fifo */
i = kfifo_out(&test, buf, 5);
printk(KERN_INFO "buf: %.*s\n", i, buf);
/* get max of 2 elements from the fifo */
ret = kfifo_out(&test, buf, 2);
printk(KERN_INFO "ret: %d\n", ret);
/* and put it back to the end of the fifo */
ret = kfifo_in(&test, buf, ret);
printk(KERN_INFO "ret: %d\n", ret);
/* skip first element of the fifo */
printk(KERN_INFO "skip 1st element\n");
kfifo_skip(&test);
/* put values into the fifo until is full */
for (i = 20; kfifo_put(&test, i); i++)
;
printk(KERN_INFO "queue len: %u\n", kfifo_len(&test));
/* show the first value without removing from the fifo */
if (kfifo_peek(&test, &i))
printk(KERN_INFO "%d\n", i);
/* check the correctness of all values in the fifo */
j = 0;
while (kfifo_get(&test, &i)) {
printk(KERN_INFO "item = %d\n", i);
if (i != expected_result[j++]) {
printk(KERN_WARNING "value mismatch: test failed\n");
return -EIO;
}
}
if (j != ARRAY_SIZE(expected_result)) {
printk(KERN_WARNING "size mismatch: test failed\n");
return -EIO;
}
printk(KERN_INFO "test passed\n");
return 0;
}
```
Load the kernel module.
```shell
sudo insmod bytestream-example.ko
```
Check the messages with dmesg:
```shell
$ sudo dmesg
[130567.107610] byte stream fifo test start
[130567.107612] fifo len: 15
[130567.107613] buf: hello
[130567.107614] ret: 2
[130567.107614] ret: 2
[130567.107614] skip 1st element
[130567.107615] queue len: 32
[130567.107615] 3
[130567.107615] item = 3
[130567.107615] item = 4
[130567.107616] item = 5
[130567.107616] item = 6
[130567.107616] item = 7
[130567.107616] item = 8
[130567.107617] item = 9
[130567.107617] item = 0
[130567.107617] item = 1
[130567.107617] item = 20
[130567.107618] item = 21
[130567.107618] item = 22
[130567.107618] item = 23
[130567.107618] item = 24
[130567.107619] item = 25
[130567.107619] item = 26
[130567.107619] item = 27
[130567.107619] item = 28
[130567.107619] item = 29
[130567.107620] item = 30
[130567.107620] item = 31
[130567.107620] item = 32
[130567.107620] item = 33
[130567.107621] item = 34
[130567.107621] item = 35
[130567.107621] item = 36
[130567.107621] item = 37
[130567.107622] item = 38
[130567.107622] item = 39
[130567.107622] item = 40
[130567.107622] item = 41
[130567.107623] item = 42
[130567.107623] test passed
```
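### Experiment design
I designed a kfifo producer-consumer experiment, [producer-consumer.c](https://github.com/brianlin314/kfifo-examples/blob/master/producer-consumer.c), with one producer and one consumer. The producer function puts one value into the kfifo every second, counting up from 1 to 10, and the consumer function removes one value from the kfifo every 2 seconds.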
```c
static int producer(void *data)
{
unsigned char i;
for (i = 1; i <= 10; i++) {
kfifo_put(&test, i);
pr_info("Producer inserted value: %d\n", i);
msleep(1000);
}
/* Returning ends the thread; a thread must not call kthread_stop() on itself. */
return 0;
}
static int consumer(void *data)
{
unsigned char j;
unsigned char buf[10];
unsigned int ret;
for (j = 1; j <= 5; j++) {
ret = kfifo_out(&test, buf, 1);
if (ret) {
pr_info("Consumer removed value: %d\n", buf[0]);
} else {
pr_info("Consumer failed to remove value from kfifo\n");
}
msleep(2000);
}
/* Returning ends the thread; a thread must not call kthread_stop() on itself. */
return 0;
}
```
In example_init, `kthread_run` creates two kernel threads, producer_thread and consumer_thread.
```c
producer_thread = kthread_run(producer, NULL, "producer_thread");
...
consumer_thread = kthread_run(consumer, NULL, "consumer_thread");
```
In example_exit, `kfifo_get` checks each remaining value in the kfifo against expected_result.
```c
static void __exit example_exit(void)
{
    unsigned char i, j;

    /* check the correctness of all values in the fifo */
    j = 0;
    while (kfifo_get(&test, &i)) {
        pr_info("kfifo item = %d\n", i);
        if (i != expected_result[j++]) {
            pr_warn("value mismatch: test failed\n");
            goto out;
        }
    }
    pr_info("test passed\n");
    pr_info("kfifo Example Exit\n");
out:
    /* single release point, reached on both the success and the failure path */
    kfifo_free(&test);
}
```
```shell
$ make check
$ sudo dmesg
[96656.871161] kfifo Example Init
[96656.871280] Producer inserted value: 1
[96656.871364] Consumer removed value: 1
[96657.890006] Producer inserted value: 2
[96658.882042] Consumer removed value: 2
[96658.914025] Producer inserted value: 3
[96659.937999] Producer inserted value: 4
[96660.897975] Consumer removed value: 3
[96660.961976] Producer inserted value: 5
[96661.985950] Producer inserted value: 6
[96662.917915] Consumer removed value: 4
[96663.009917] Producer inserted value: 7
[96664.033895] Producer inserted value: 8
[96664.929874] Consumer removed value: 5
[96665.057866] Producer inserted value: 9
[96666.081860] Producer inserted value: 10
[96801.846529] kfifo item = 6
[96801.846536] kfifo item = 7
[96801.846539] kfifo item = 8
[96801.846540] kfifo item = 9
[96801.846542] kfifo item = 10
[96801.846544] test passed
[96801.846546] kfifo Example Exit
```
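The leftover values 6 to 10 follow directly from the two rates. As a rough illustration, the timeline can be replayed in user space with a minimal ring buffer. This is a sketch, not the kfifo implementation: `struct ring`, `ring_put`, `ring_get`, and `replay` are hypothetical names, and the one-second ticks are modeled as loop iterations instead of `msleep`.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 16 /* power of two, so indices can be masked */

/* Hypothetical user-space stand-in for a byte kfifo. */
struct ring {
    unsigned char data[RING_SIZE];
    unsigned int in;  /* total number of elements ever inserted */
    unsigned int out; /* total number of elements ever removed */
};

static unsigned int ring_len(const struct ring *r)
{
    return r->in - r->out;
}

/* Returns 1 on success, 0 if the ring is full. */
static int ring_put(struct ring *r, unsigned char v)
{
    if (ring_len(r) == RING_SIZE)
        return 0;
    r->data[r->in & (RING_SIZE - 1)] = v;
    r->in++;
    return 1;
}

/* Returns 1 on success, 0 if the ring is empty. */
static int ring_get(struct ring *r, unsigned char *v)
{
    if (ring_len(r) == 0)
        return 0;
    *v = r->data[r->out & (RING_SIZE - 1)];
    r->out++;
    return 1;
}

/* Replays the experiment and copies the leftover values into left[].
 * Returns how many values were left in the ring. */
static size_t replay(unsigned char *left, size_t cap)
{
    struct ring r = {0};
    unsigned char v;
    size_t n = 0;

    assert(left != NULL);
    for (int t = 0; t < 10; t++) {
        ring_put(&r, (unsigned char)(t + 1)); /* producer: every second */
        if (t % 2 == 0)
            ring_get(&r, &v); /* consumer: every 2 seconds, 5 times */
    }
    while (n < cap && ring_get(&r, &v))
        left[n++] = v;
    return n;
}
```

Calling `replay(left, 8)` fills `left` with 6, 7, 8, 9, 10 and returns 5, which matches the `kfifo item` lines in the log above.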
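## TODO: Pick a [memory barrier](https://www.kernel.org/doc/Documentation/memory-barriers.txt) use case from the Linux kernel
Pick a small use case from the Linux kernel source that clearly demonstrates [memory barrier](https://www.kernel.org/doc/Documentation/memory-barriers.txt) usage, make sure it runs on Linux v5.15+, design an experiment to verify its behavior, and explain the underlying principle. Pay particular attention to the impact on multi-core processors.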
### Memory Barriers
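A memory barrier controls the order of memory accesses. Compiler and hardware optimizations can cause memory accesses to happen in an order different from the one the developer expects. A memory barrier constrains the order in which memory access instructions execute and the point at which they complete.
A memory barrier that affects both the compiler and the processor is called a hardware memory barrier; one that affects only the compiler is called a software memory barrier. A barrier that affects both reads and writes is called a full memory barrier.
There is also a class of barriers specific to multiprocessor environments, whose names carry the `smp` prefix. On multiprocessor systems they are hardware memory barriers; on uniprocessor systems they are software memory barriers.
Among the barrier interfaces discussed in this section, `barrier()` is the only software memory barrier, and it is also a full memory barrier. All the others are hardware memory barriers.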
### Memory Barrier Interfaces
- `mb`/`rmb`/`wmb`
These functions insert a hardware memory barrier that prevents memory accesses from being moved to the other side of the barrier. `mb()` orders both reads and writes, `rmb()` orders only reads, and `wmb()` orders only writes. Any access of the covered kind issued before the barrier completes before the barrier is passed, and all subsequent accesses of that kind execute after it.
- `smp_mb`/`smp_rmb`/`smp_wmb`
These functions behave like `mb()`/`rmb()`/`wmb()`, respectively, on multiprocessor systems, and like `barrier()` on uniprocessor systems.
- `barrier`
This function inserts a software memory barrier that affects the compiler code generation, but it does not affect the hardware's execution of instructions. The compiler will save to memory any modified values that it has loaded in registers, and it will reread all values from memory the next time they are needed.
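To make the pairing of a write barrier with a read barrier concrete, here is a user-space sketch in C11. It is not kernel code: `atomic_thread_fence(memory_order_release)` and `atomic_thread_fence(memory_order_acquire)` stand in for `smp_wmb()` and `smp_rmb()` (the C11 fences are somewhat stronger), `compiler_barrier()` assumes GCC or Clang inline assembly, and `struct mailbox` with its helpers is a hypothetical example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Compiler-only barrier (GCC/Clang): the idiom behind the kernel's barrier(). */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical one-slot mailbox shared by a writer and a reader. */
struct mailbox {
    int payload;       /* plain data published by the writer */
    atomic_bool ready; /* tells the reader that payload is valid */
};

/* Writer side: store the data first, then the flag. */
static void mailbox_publish(struct mailbox *m, int value)
{
    m->payload = value;
    /* Plays the role of smp_wmb(): the payload store is ordered
     * before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}

/* Reader side: check the flag first, then load the data. */
static bool mailbox_try_read(struct mailbox *m, int *value)
{
    assert(value != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_relaxed))
        return false;
    /* Plays the role of smp_rmb(): the flag load is ordered
     * before the payload load. */
    atomic_thread_fence(memory_order_acquire);
    *value = m->payload;
    return true;
}

/* Without the barrier, the compiler may reuse the first load
 * for the second one instead of reading memory again. */
static int sum_two_loads(const int *p)
{
    int first = *p;
    compiler_barrier(); /* forces *p to be re-read from memory */
    int second = *p;
    return first + second;
}
```

In a real program `mailbox_publish` and `mailbox_try_read` would run on different CPUs. Once the reader sees `ready` set, the fence pairing guarantees that it also sees the payload written before the flag.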
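### Experiment design
## Development log
### Reading [The Linux Kernel Module Programming Guide](https://sysprog21.github.io/lkmpg/)
[Chapter 14](https://sysprog21.github.io/lkmpg/#scheduling-tasks) of the book covers scheduling tasks. There are two ways to run a task: tasklets and work queues. A tasklet is a quick and simple way to run a single function, typically triggered from an interrupt. Work queues are more complex but better suited to running multiple tasks in sequence.
Tasklets are easy to use, but they run in software interrupt context, so they cannot sleep or access user space. In the Linux kernel, a tasklet can be replaced by a workqueue.
To hand a task over to the scheduler, use a workqueue; the kernel then schedules the queued work under the Completely Fair Scheduler (CFS).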
| | tasklet | workqueue |
|:----------------------------------- |:-------:|:---------:|
| Can access user space ? | No | No |
| Can sleep ? | No | Yes |
| More than one can run on same CPU ? | No | Yes |
| Same one can run on multiple CPUs ? | No | Yes |
:::warning
Discuss the considerations behind Linux kernel developers gradually phasing out tasklets.
:notes: jserv
:::
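### Reading [Linux 核心設計: 中斷處理和現代架構考量](https://hackmd.io/@sysprog/linux-interrupt)
#### What is Interrupt?
An interrupt is the mechanism that notifies the CPU when an I/O event occurs. It demands a response from the CPU regardless of whether the CPU is busy.
When an interrupt occurs, it forcibly changes the CPU's flow of execution, similar to a context switch. The hardware saves the state of the interrupted program and switches to interrupt mode. The kernel then runs the matching interrupt handler to process the event, and finally executes an interrupt return to resume the program at the point where it was interrupted. Note that the details differ across processors.
- Interrupt Handler: also called an Interrupt Service Routine (ISR); it performs the operations required for a specific interrupt.
Taken from [Interrupts in Linux](http://www.cs.columbia.edu/~krj/os/lectures/L07-LinuxEvents.pdf) in [COMS W4118: Operating Systems](http://www.cs.columbia.edu/~krj/os/index.html):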
![](https://hackmd.io/_uploads/Sy_vIJ5S2.png)
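- Process: an abstraction of a unit of execution.
- Syscall: provided by the operating system; a process uses a syscall to ask the kernel to perform a service on its behalf.
- I/O Device: covers all peripheral operations; it uses interrupts to get the CPU to respond.
- Timer: keeps the system running.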
![](https://hackmd.io/_uploads/BkAupkcrn.png)
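- Each I/O device has its own Interrupt Request Lines (IRQs). A device may have more than one IRQ, and an IRQ line can be shared.
- The Programmable Interrupt Controller (PIC) maps IRQs to interrupt vectors. The PIC passes the vector to the CPU, which uses it to locate the ISR address.
- Interrupt Controller: helps the CPU handle IRQs from many peripherals, which may arrive at the same time. It evaluates priorities, orders the IRQs, and hands them to the CPU; a mask determines which IRQs are allowed through.
#### Interrupt flow in brief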
![](https://hackmd.io/_uploads/HkKRybqBh.png)
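1. An I/O device raises an IRQ.
2. The PIC collects all IRQs and orders them by priority.
3. The PIC passes the interrupt vector to the CPU.
4. The hardware saves the program state and switches to interrupt mode.
5. The CPU looks up the matching ISR in the Interrupt Descriptor Table (IDT).
6. The ISR runs. (Note: what a table entry holds differs by architecture: an address on Intel, an instruction on ARM.)
7. When the ISR finishes, the `iret` instruction restores the saved state of the interrupted program.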
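The flow above can be modeled in user space as a toy simulation. This is only a sketch under stated assumptions: `struct pic`, `pic_raise`, `pic_next_vector`, `cpu_service`, the two handlers, and the rule "IRQ n maps to vector 32 + n, lowest IRQ number has highest priority" are hypothetical choices for illustration, not how any particular controller is programmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_IRQS 8
#define VECTOR_BASE 32 /* assumption: IRQ n maps to vector 32 + n */
#define LOG_SIZE 16

typedef void (*isr_t)(void);

struct pic {
    uint8_t pending; /* bit n set: IRQ n has been raised */
    uint8_t mask;    /* bit n set: IRQ n is blocked */
};

static isr_t idt[VECTOR_BASE + NR_IRQS]; /* vector -> ISR address */
static int handled_log[LOG_SIZE];        /* IRQ numbers in service order */
static int handled_count;

static void record(int irq)
{
    if (handled_count < LOG_SIZE)
        handled_log[handled_count++] = irq;
}

static void timer_isr(void) { record(0); }
static void keyboard_isr(void) { record(1); }

/* Step 1: a device raises its IRQ line. */
static void pic_raise(struct pic *p, int irq)
{
    assert(irq >= 0 && irq < NR_IRQS);
    p->pending |= (uint8_t)(1u << irq);
}

/* Steps 2-3: pick the highest-priority unmasked pending IRQ
 * (lowest number wins) and return its vector, or -1 if none. */
static int pic_next_vector(struct pic *p)
{
    uint8_t ready = p->pending & (uint8_t)~p->mask;

    for (int irq = 0; irq < NR_IRQS; irq++) {
        if (ready & (1u << irq)) {
            p->pending &= (uint8_t)~(1u << irq);
            return VECTOR_BASE + irq;
        }
    }
    return -1;
}

/* Steps 4-7: look up the ISR for the vector and run it. Saving and
 * restoring the interrupted state is implicit in the function call. */
static int cpu_service(struct pic *p)
{
    int vector = pic_next_vector(p);

    if (vector < 0 || idt[vector] == NULL)
        return -1;
    idt[vector]();
    return vector;
}
```

With `timer_isr` installed at vector 32 and `keyboard_isr` at vector 33, raising IRQ 1 and then IRQ 0 still services the timer first, because the priority order, not the arrival order, decides what the CPU sees.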
#### Nested Interrupt
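When a new interrupt arrives while an ISR is still being handled, it is called a nested interrupt.
Peripherals behave in many different ways, and the system wants to service as many interrupts as possible at the same time, which quickly becomes complicated. Early systems avoided the problem by disabling interrupts, but Linux tries to keep interrupts disabled as little as possible, because every window in which interrupts are disabled before being re-enabled increases latency.
#### Top half and bottom half
In Linux, an interrupt handler sometimes has to respond promptly yet also has a large amount of work to do, so Linux splits interrupt handling into two parts: the top half and the bottom half. The top half does the time-critical work so that it is not delayed, and defers the more complex, less urgent work to the bottom half, which keeps interrupt latency low. The biggest difference between the two is that the bottom half runs with interrupts enabled, so the CPU can still accept interrupt requests.
Linux offers three mechanisms for deferred interrupt work: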
- softirqs
- tasklets
- workqueues
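The split between the two halves can be sketched in user space as follows. This is a simplified model, not kernel code: `struct deferred`, `top_half`, and `bottom_half` are hypothetical names, and the `pending` flag only mimics the idea of marking deferred work as pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8

/* Hypothetical state shared by the two halves. */
struct deferred {
    int events[QUEUE_SIZE];
    int head;           /* next event to process */
    int tail;           /* next free slot */
    bool pending;       /* set by the top half, cleared by the bottom half */
    long processed_sum; /* stands in for the expensive work */
};

/* Top half: grab the data, queue the rest, and return quickly. */
static bool top_half(struct deferred *d, int event)
{
    assert(d != NULL);
    if (d->tail - d->head == QUEUE_SIZE)
        return false; /* queue full: the event is dropped */
    d->events[d->tail % QUEUE_SIZE] = event;
    d->tail++;
    d->pending = true;
    return true;
}

/* Bottom half: runs later and does the heavy lifting.
 * Returns the number of events it processed. */
static int bottom_half(struct deferred *d)
{
    int n = 0;

    assert(d != NULL);
    if (!d->pending)
        return 0;
    d->pending = false;
    while (d->head != d->tail) {
        d->processed_sum += d->events[d->head % QUEUE_SIZE];
        d->head++;
        n++;
    }
    return n;
}
```

Three calls to `top_half` followed by one call to `bottom_half` process all three events in a single batch, which is the point of deferring: the time spent per interrupt stays short while the total work is done later.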
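### Questions and notes
#### 1. Why do we need [Simrupt](https://github.com/sysprog21/simrupt)?
The lecturer used a phone as an example in class: pressing the power button raises an interrupt that wakes the screen, but handling it involves more than waking the screen, since the clock and message notifications must be updated as well. Shaking the phone to search for nearby friends is also a kind of interrupt. All of these need to be simulated.
#### 2. Difference between softirq and irq
The top half is the irq (hardirq); softirq is one of the bottom-half mechanisms.
A softirq can be rescheduled, whereas an irq cannot.
#### 3. Edge trigger and level trigger
Edge trigger and level trigger both describe how an interrupt or I/O event is signaled.
- Edge trigger: a notification is generated once per transition, that is, each time a new I/O event arrives.
- Level trigger: the notification stays asserted for as long as the condition holds, that is, until the event has been handled.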
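One way to see the difference is to treat the interrupt line as a sequence of samples and count how often each mode would signal. The sketch below assumes an active-high line sampled at discrete steps; `count_edge` and `count_level` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Edge triggered: one notification per 0 -> 1 transition. */
static int count_edge(const int *line, size_t n)
{
    int count = 0;
    int prev = 0; /* the line is assumed to start inactive */

    assert(line != NULL);
    for (size_t i = 0; i < n; i++) {
        if (line[i] && !prev)
            count++;
        prev = line[i];
    }
    return count;
}

/* Level triggered: signaled on every sample where the line is active. */
static int count_level(const int *line, size_t n)
{
    int count = 0;

    assert(line != NULL);
    for (size_t i = 0; i < n; i++) {
        if (line[i])
            count++;
    }
    return count;
}
```

For the samples `{0, 1, 1, 1, 0, 1, 0}`, `count_edge` reports 2 (two rising edges) while `count_level` reports 4 (four samples with the line held active).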
#### 4. Process context and interrupt context
[Difference between interrupt context and process context?](https://stackoverflow.com/questions/57987140/difference-between-interrupt-context-and-process-context)
While executing an interrupt handler or a softirq/tasklet bottom half, the kernel is in interrupt context; while executing on behalf of an ordinary process, it is in process context.
softirqs and tasklets run in interrupt context. A workqueue can sleep, so it does not run in interrupt context; it runs in process context.
| Context | Can sleep? | Can be preempted? |
| ----------------- |:----------:|:-----------------:|
| interrupt context | No | No |
| process context | Yes | Yes |
### References
- [Introduction to deferred interrupts (Softirq, Tasklets and Workqueues)](https://0xax.gitbooks.io/linux-insides/content/Interrupts/linux-interrupts-9.html)
- [linux kernel的中斷子系统之(八):softirq](http://www.wowotech.net/irq_subsystem/soft-irq.html)
- [linux kernel的中斷子系統之(九):tasklet](http://www.wowotech.net/irq_subsystem/tasklet.html)
- [softirq, tasklet和workqueue的區别](https://blog.csdn.net/jusang486/article/details/51155277)