# Linux Kernel Deferred Work Note
Traditionally, interrupt handling is split into two parts:
- Top half: receives the hardware interrupt
    - handles the immediate, time-critical part of interrupt processing
    - runs with some (or all) interrupts disabled
    - is time-critical and deals directly with the hardware
    - does not run in process context and cannot block
- Bottom half (the main topic of this note)
    - handles the less time-critical part of interrupt processing
    - types of bottom halves:
        - Softirq
        - Tasklets
        - Workqueues
## softirq
- Statically allocated at compile time
- Fixed number of softirq vectors: current Linux defines 10 (`NR_SOFTIRQS`)
```c
enum
{
	HI_SOFTIRQ=0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	IRQ_POLL_SOFTIRQ,	/* named BLOCK_IOPOLL_SOFTIRQ in older kernels */
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,

	NR_SOFTIRQS
};
```
- [ ] [IRQs: the Hard, the Soft, the Threaded and the Preemptible](http://she-devel.com/Chaiken_ELCE2016.pdf)

The per-CPU count of each softirq can be checked in `/proc/softirqs`:
```
# cat /proc/softirqs
                    CPU0       CPU1       CPU2       CPU3
          HI:         34          0          0          0
       TIMER:        494      26428      52205      54076
      NET_TX:          0          0          0          0
      NET_RX:          0          0          0          0
       BLOCK:          0          0          0          0
    IRQ_POLL:          0          0          0          0
     TASKLET:         68          1          0          0
       SCHED:       3099      27294     100455      62819
     HRTIMER:          0          0          0          0
         RCU:       3799      10153      81440      45881
```
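A minimal userspace model can make the "statically allocated, fixed number of vectors" design concrete. It assumes a single CPU and uses hypothetical names (`model_open_softirq`, `model_raise_softirq`, `model_do_softirq`) that only mimic the roles of the kernel's `open_softirq()`, `raise_softirq()` and `__do_softirq()`: a fixed handler table indexed by the enum, a pending bitmask that "raising" sets, and a dispatch loop that clears the mask and runs pending vectors from the lowest number to the highest. Preemption, restart limits and `ksoftirqd` are ignored.

```c
#include <assert.h>

/* Same vector order as the kernel's enum (model only). */
enum {
	HI_SOFTIRQ = 0, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

typedef void (*softirq_fn)(void);

static softirq_fn softirq_vec[NR_SOFTIRQS];   /* fixed-size handler table */
static unsigned int pending;                  /* one pending bit per vector */
static unsigned long counts[NR_SOFTIRQS];     /* like one /proc/softirqs column */

static int run_log[NR_SOFTIRQS];              /* order in which handlers ran */
static int run_len;

static void log_run(int nr)
{
	if (run_len < NR_SOFTIRQS)
		run_log[run_len++] = nr;
}

/* Hypothetical analogue of open_softirq(): fill a slot at init time. */
void model_open_softirq(int nr, softirq_fn fn)
{
	softirq_vec[nr] = fn;
}

/* Hypothetical analogue of raise_softirq(): only sets the pending bit. */
void model_raise_softirq(int nr)
{
	pending |= 1u << nr;
}

/* Hypothetical analogue of __do_softirq(): clear the mask, run lowest first. */
void model_do_softirq(void)
{
	unsigned int todo = pending;

	pending = 0;
	for (int nr = 0; nr < NR_SOFTIRQS; nr++) {
		if ((todo & (1u << nr)) && softirq_vec[nr]) {
			counts[nr]++;
			softirq_vec[nr]();
		}
	}
}

unsigned long model_softirq_count(int nr)
{
	return counts[nr];
}

/* Two demo handlers that record when they ran. */
static void hi_demo(void)     { log_run(HI_SOFTIRQ); }
static void net_rx_demo(void) { log_run(NET_RX_SOFTIRQ); }
```

For example, registering `hi_demo` and `net_rx_demo`, raising `NET_RX_SOFTIRQ` twice and `HI_SOFTIRQ` once, and then calling `model_do_softirq()` runs each handler exactly once, `HI_SOFTIRQ` first. The pending state is one bit per vector, not a counter, so repeated raises before dispatch coalesce.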
### Example in kernel - Network Packet Processing
- When the network interface card receives a packet, it triggers a hardware interrupt. Handling is then split into the urgent top half and the bottom half that can be deferred (using softirqs)
- The deferred processing is handled by `NET_RX_SOFTIRQ` softirq
- Top half (Interrupt handler)
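```c
/*
 * Simplified, non-NAPI style handler. ack_interrupt() and
 * nic_read_packet() are hypothetical device-specific helpers.
 */
static irqreturn_t nic_interrupt_handler(int irq, void *dev_id)
{
	struct net_device *dev = dev_id;
	struct sk_buff *skb;

	/* Acknowledge the interrupt to the NIC */
	ack_interrupt(dev);

	/* Allocate an skb (socket buffer) large enough for a full frame */
	skb = dev_alloc_skb(dev->mtu + ETH_HLEN + NET_IP_ALIGN);
	if (!skb) {
		dev->stats.rx_dropped++;
		return IRQ_HANDLED;	/* Drop the packet if there is no memory */
	}
	skb_reserve(skb, NET_IP_ALIGN);
	nic_read_packet(dev, skb);	/* copy the frame into the skb */
	skb->protocol = eth_type_trans(skb, dev);

	/* Queue the skb on the per-CPU backlog and raise NET_RX_SOFTIRQ */
	netif_rx(skb);
	return IRQ_HANDLED;
}
```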
+ `netif_rx`: queues the skb on the per-CPU backlog and schedules the softirq for further processing of the packet (`netif_rx` --> `netif_rx_internal`)
+ The call chain eventually reaches `____napi_schedule`, which adds the NAPI instance to the per-CPU `sd->poll_list` and raises `NET_RX_SOFTIRQ`
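```c
static inline void ____napi_schedule(struct softnet_data *sd,
				     struct napi_struct *napi)
{
	...
	/* add this NAPI instance to the per-CPU poll list */
	list_add_tail(&napi->poll_list, &sd->poll_list);
	/* no need to raise the softirq if net_rx_action() is already running */
	if (!sd->in_net_rx_action)
		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
}
```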
- Bottom half (Softirq handler)
*linux/net/core/dev.c*
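```c
static int __init net_dev_init(void)
{
	...
	open_softirq(NET_RX_SOFTIRQ, net_rx_action);
	...
}

static __latent_entropy void net_rx_action(struct softirq_action *h)
{
	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
	/* time and packet budget for one run (simplified) */
	unsigned long time_limit = jiffies +
		usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
	int budget = READ_ONCE(netdev_budget);
	LIST_HEAD(list);
	LIST_HEAD(repoll);
	...
start:
	sd->in_net_rx_action = true;
	/* move this CPU's pending NAPI instances onto a private list */
	local_irq_disable();
	list_splice_init(&sd->poll_list, &list);
	local_irq_enable();

	for (;;) {
		struct napi_struct *n;
		...
		/* poll each device's NAPI instance until all are handled */
		if (list_empty(&list)) {
			if (list_empty(&repoll)) {
				sd->in_net_rx_action = false;
				barrier();
				if (!list_empty(&sd->poll_list))
					goto start;
				if (!sd_has_rps_ipi_waiting(sd))
					goto end;
			}
			break;
		}

		n = list_first_entry(&list, struct napi_struct, poll_list);
		budget -= napi_poll(n, &repoll);

		/* budget or time window exhausted: stop for now */
		if (unlikely(budget <= 0 ||
			     time_after_eq(jiffies, time_limit))) {
			sd->time_squeeze++;
			break;
		}
	}

	local_irq_disable();
	/* put unfinished NAPI instances back on sd->poll_list */
	list_splice_tail_init(&sd->poll_list, &list);
	list_splice_tail(&repoll, &list);
	list_splice(&list, &sd->poll_list);
	if (!list_empty(&sd->poll_list))
		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
	else
		sd->in_net_rx_action = false;

	net_rps_action_and_irq_enable(sd);
end:
	...
}
```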
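+ `net_rx_action` is the handler of the `NET_RX_SOFTIRQ` softirq. It takes every NAPI instance off `sd->poll_list` and polls them one by one
+ If the time limit or the packet budget is exhausted, the unfinished `napi` instances are put back on `sd->poll_list` and `NET_RX_SOFTIRQ` is raised again, so they are processed on the next run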
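The budget logic of `net_rx_action()` can be illustrated with a small single-CPU userspace model. Everything in it is hypothetical (`model_napi`, `model_net_rx_action`, a fixed-size array standing in for `sd->poll_list`), and only the packet budget is modeled, not the time limit. Each poll processes at most `weight` packets; a device that used its full weight is assumed to have more work and goes on the repoll list; when the list or the shared budget runs out, the unvisited devices and then the re-polled ones are put back, following the splice order at the end of the real function.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 8

/* Hypothetical model device: packets waiting in its RX ring. */
struct model_napi {
	int backlog;   /* packets waiting */
	int weight;    /* max packets per poll (NAPI weight, 64 is typical) */
	int polled;    /* packets processed so far */
};

/* FIFO of device indices, standing in for sd->poll_list. */
struct poll_list {
	int idx[MAX_DEVS];
	int n;
};

static void list_push(struct poll_list *l, int dev)
{
	l->idx[l->n++] = dev;
}

/* Poll one device: process up to `weight` packets, return the work done. */
static int model_napi_poll(struct model_napi *d)
{
	int work = d->backlog < d->weight ? d->backlog : d->weight;

	d->backlog -= work;
	d->polled += work;
	return work;
}

/*
 * Model of net_rx_action(): poll devices in order and charge the work to
 * the shared budget. Returns true if devices remain on the poll list
 * (the kernel would re-raise NET_RX_SOFTIRQ).
 */
bool model_net_rx_action(struct model_napi *devs, struct poll_list *pl,
			 int budget)
{
	struct poll_list list = *pl, repoll = { .n = 0 };
	int i = 0;

	pl->n = 0;                      /* like list_splice_init() */
	while (i < list.n) {
		struct model_napi *d = &devs[list.idx[i]];
		int work = model_napi_poll(d);

		if (work == d->weight)  /* full weight used: poll again later */
			list_push(&repoll, list.idx[i]);
		i++;
		budget -= work;
		if (budget <= 0)
			break;
	}
	/* Unvisited devices first, then the re-polled ones. */
	for (; i < list.n; i++)
		list_push(pl, list.idx[i]);
	for (int j = 0; j < repoll.n; j++)
		list_push(pl, repoll.idx[j]);
	return pl->n > 0;
}
```

For instance, with two devices holding 200 and 10 packets, weight 64 and budget 100, one call processes 64 packets from the first device and all 10 from the second, leaves only the first device on the poll list, and returns true.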
## tasklets
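- Built on top of softirqs (`HI_SOFTIRQ` and `TASKLET_SOFTIRQ`)
- ***Because tasklets are implemented on top of softirqs, they are softirqs.***
- Used more often in the kernel than raw softirqs, since they can be created dynamically
- Run in software interrupt context, so they cannot sleep; a given tasklet never runs on two CPUs at the same time
- Support two priorities through two per-CPU lists: `tasklet_hi_vec` (run from `HI_SOFTIRQ`) and `tasklet_vec` (run from `TASKLET_SOFTIRQ`)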

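- Create tasklets
    + `DECLARE_TASKLET(name, func, data)` (static)
    + `void tasklet_init(struct tasklet_struct *t, void (*func)(unsigned long), unsigned long data)` (runtime)
    + These are the legacy forms used in this note; since Linux 5.9, `DECLARE_TASKLET(name, callback)` and `tasklet_setup(t, callback)` take a callback that receives the `struct tasklet_struct *` instead of `unsigned long data`
- Schedule tasklets
    + `void tasklet_schedule(struct tasklet_struct *t);`
    + `void tasklet_hi_schedule(struct tasklet_struct *t);` (uses the high-priority list)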
- [ ] tasklet schedule flow
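Two per-tasklet state bits are behind the usual tasklet guarantees: scheduling an already-pending tasklet is a no-op, and a tasklet that is running on one CPU is not started on another but left queued. The sketch below is a single-threaded userspace model with hypothetical names (`model_tasklet`, `model_tasklet_schedule`, `model_tasklet_action`). The bit names mirror the kernel's `TASKLET_STATE_SCHED` and `TASKLET_STATE_RUN`, but there is no real concurrency, no disable-count handling, and the list is pushed at the head for brevity where the kernel appends at the tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASKLET_STATE_SCHED (1u << 0)  /* queued, waiting to run */
#define TASKLET_STATE_RUN   (1u << 1)  /* currently running on some CPU */

struct model_tasklet {
	unsigned int state;
	void (*func)(struct model_tasklet *t);
	int runs;                       /* demo counter */
	struct model_tasklet *next;     /* link in the pending list */
};

static struct model_tasklet *pending_head;   /* one CPU's tasklet list */

/* Like tasklet_schedule(): enqueue only if not already scheduled. */
bool model_tasklet_schedule(struct model_tasklet *t)
{
	if (t->state & TASKLET_STATE_SCHED)
		return false;           /* already pending: coalesced */
	t->state |= TASKLET_STATE_SCHED;
	t->next = pending_head;
	pending_head = t;
	return true;
}

/* Like the TASKLET_SOFTIRQ handler: take the list and run each entry. */
void model_tasklet_action(void)
{
	struct model_tasklet *list = pending_head;

	pending_head = NULL;
	while (list) {
		struct model_tasklet *t = list;

		list = list->next;
		if (t->state & TASKLET_STATE_RUN) {
			/* running elsewhere: keep it queued for a later pass */
			t->next = pending_head;
			pending_head = t;
			continue;
		}
		t->state |= TASKLET_STATE_RUN;
		t->state &= ~TASKLET_STATE_SCHED;  /* func may reschedule it */
		t->func(t);
		t->state &= ~TASKLET_STATE_RUN;
	}
}

static void count_run(struct model_tasklet *t)
{
	t->runs++;
}
```

Scheduling twice before the softirq runs yields a single execution. Setting `TASKLET_STATE_RUN` by hand, as if another CPU were running the tasklet, makes `model_tasklet_action()` leave it queued instead of running it a second time in parallel.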

### Example in kernel - ASPEED Cryptographic Engine
* Structure definitions
*linux/drivers/crypto/aspeed/aspeed-rsss.h*
```c
struct aspeed_rsss_dev;

typedef int (*aspeed_rsss_fn_t)(struct aspeed_rsss_dev *);

struct aspeed_engine_rsa {
	struct tasklet_struct done_task;
	...
	/* callback func */
	aspeed_rsss_fn_t resume;
	...
};

struct aspeed_rsss_dev {
	...
	struct aspeed_engine_rsa rsa_engine;
	...
};
```
- Initialize the tasklet
*linux/drivers/crypto/aspeed/aspeed-rsss-rsa.c*
```c
int aspeed_rsss_rsa_init(struct aspeed_rsss_dev *rsss_dev)
{
	struct aspeed_engine_rsa *rsa_engine = &rsss_dev->rsa_engine;
	...
	tasklet_init(&rsa_engine->done_task, aspeed_rsa_done_task,
		     (unsigned long)rsss_dev);
	...
}
```
- Schedule the tasklet from the IRQ handler
*linux/drivers/crypto/aspeed/aspeed-rsss.c*
```c
static irqreturn_t aspeed_rsss_irq(int irq, void *dev)
{
	struct aspeed_rsss_dev *rsss_dev = (struct aspeed_rsss_dev *)dev;
	struct aspeed_engine_rsa *rsa_engine = &rsss_dev->rsa_engine;
	...
	if (rsa_engine->flags & CRYPTO_FLAGS_BUSY)
		tasklet_schedule(&rsa_engine->done_task);
	else
		dev_err(rsss_dev->dev, "RSA no active requests.\n");
	...
}
```
- Done task: the trigger path stores the next step in the `resume` callback, which the tasklet calls later
*linux/drivers/crypto/aspeed/aspeed-rsss-rsa.c*
```c
// The resume callback is set to aspeed_rsa_transfer
static int aspeed_rsa_trigger(struct aspeed_rsss_dev *rsss_dev)
{
	...
	rsa_engine->resume = aspeed_rsa_transfer;
	...
}
```
```c
// Process the completed cryptographic operation
static void aspeed_rsa_done_task(unsigned long data)
{
	struct aspeed_rsss_dev *rsss_dev = (struct aspeed_rsss_dev *)data;
	struct aspeed_engine_rsa *rsa_engine = &rsss_dev->rsa_engine;

	(void)rsa_engine->resume(rsss_dev);
}
```
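The ASPEED code passes the device pointer through the legacy `unsigned long data` argument and then dispatches through the per-request `resume` function pointer. The userspace model below reproduces that pattern with hypothetical names (`model_rsss_dev`, `model_rsa_transfer`, `model_run_tasklet`, and so on) and a userspace `container_of`. It also shows the alternative used by the newer tasklet API (`tasklet_setup()` with `from_tasklet()`, a `container_of` wrapper), where the callback recovers the device from the embedded `tasklet_struct` instead of casting `data`. Like the legacy kernel API, it assumes a pointer fits in `unsigned long`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_tasklet {
	void (*func)(unsigned long data);   /* legacy callback style */
	unsigned long data;
};

struct model_rsss_dev;
typedef int (*model_fn_t)(struct model_rsss_dev *);

struct model_engine_rsa {
	struct model_tasklet done_task;
	model_fn_t resume;          /* next step, chosen per request */
};

struct model_rsss_dev {
	struct model_engine_rsa rsa_engine;
	int transfers;              /* demo counter */
};

/* Hypothetical next step, standing in for aspeed_rsa_transfer(). */
static int model_rsa_transfer(struct model_rsss_dev *dev)
{
	dev->transfers++;
	return 0;
}

/* Done task: turn the opaque data back into the device, call resume. */
static void model_rsa_done_task(unsigned long data)
{
	struct model_rsss_dev *dev = (struct model_rsss_dev *)data;

	(void)dev->rsa_engine.resume(dev);
}

/* Like tasklet_init(): remember func and data. */
void model_init(struct model_rsss_dev *dev)
{
	dev->rsa_engine.done_task.func = model_rsa_done_task;
	dev->rsa_engine.done_task.data = (unsigned long)dev;
	dev->transfers = 0;
}

/* Like the trigger path: decide what the done task should do next. */
void model_trigger(struct model_rsss_dev *dev)
{
	dev->rsa_engine.resume = model_rsa_transfer;
}

/* What the softirq does when the tasklet runs. */
void model_run_tasklet(struct model_tasklet *t)
{
	t->func(t->data);
}

/* container_of style (what from_tasklet() expands to): no cast of data. */
struct model_rsss_dev *model_dev_from_tasklet(struct model_tasklet *t)
{
	struct model_engine_rsa *eng =
		container_of(t, struct model_engine_rsa, done_task);

	return container_of(eng, struct model_rsss_dev, rsa_engine);
}
```

Calling `model_init()`, `model_trigger()` and then `model_run_tasklet(&dev.rsa_engine.done_task)` runs `model_rsa_transfer()` once. Keeping `resume` as data lets one tasklet drive whichever step of the request state machine comes next.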
## workqueue
- Runs in process context, in kernel worker threads (`kworker/*`, executing `worker_thread()`)
- Not atomic context, so the work function may sleep
- Each piece of deferred work is described by a `struct work_struct`
```c
struct work_struct {
	atomic_long_t data;
	struct list_head entry;
	work_func_t func;
#ifdef CONFIG_LOCKDEP
	struct lockdep_map lockdep_map;
#endif
};
```
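+ `func`: the function the worker calls; it receives the `struct work_struct *` itself and typically uses `container_of()` to reach the structure that embeds it
+ `data`: not an argument to `func`; it packs the work item's state bits (such as "pending") together with a pointer to the pool or pool_workqueue the item is associated with
+ `entry`: list node that links the item into a worklist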
- create work
* static creation
```c
#define DECLARE_WORK(n, f)					\
	struct work_struct n = __WORK_INITIALIZER(n, f)
```
+ `n`: name of the `work_struct` variable to define
+ `f`: work function
* runtime creation
```c
#define INIT_WORK(_work, _func)					\
	__INIT_WORK((_work), (_func), 0)

#define __INIT_WORK(_work, _func, _onstack)			\
	do {							\
		__init_work((_work), _onstack);			\
		(_work)->data = (atomic_long_t) WORK_DATA_INIT();	\
		INIT_LIST_HEAD(&(_work)->entry);		\
		(_work)->func = (_func);			\
	} while (0)

/* delayed version */
#define INIT_DELAYED_WORK(_work, _func)				\
	__INIT_DELAYED_WORK(_work, _func, 0)

#define __INIT_DELAYED_WORK(_work, _func, _tflags)		\
	do {							\
		INIT_WORK(&(_work)->work, (_func));		\
		__init_timer(&(_work)->timer,			\
			     delayed_work_timer_fn,		\
			     (_tflags) | TIMER_IRQSAFE);	\
	} while (0)
```
+ `_work`: pointer to the `work_struct` (or `delayed_work`) to initialize
+ `_func`: work function
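- Create a workqueue
    + `create_workqueue(name)`: legacy API. Before concurrency-managed workqueues (Linux 2.6.36) it created one `worker_thread` per CPU; today it is a wrapper around `alloc_workqueue()`, and work items run on shared per-CPU worker pools
    + `create_singlethread_workqueue(name)`: legacy API that originally created a single `worker_thread`; today it creates an ordered workqueue (`alloc_ordered_workqueue()`), which executes at most one work item at a time, in queueing order
    + Linux also provides default system workqueues, for example:
        + `system_wq` (used by `schedule_work()`)
        + `system_highpri_wq`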
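- Schedule work on the default workqueue (`system_wq`); both return `false` if the work was already pending
    + `bool schedule_work(struct work_struct *work);`
    + `bool schedule_delayed_work(struct delayed_work *dwork, unsigned long delay);` (`delay` is in jiffies)
- After a work item is initialized, it can also be put on a specific workqueue with `queue_work()` or `queue_delayed_work()`: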
```c
static inline bool queue_work(struct workqueue_struct *wq,
			      struct work_struct *work)
{
	return queue_work_on(WORK_CPU_UNBOUND, wq, work);
}
```
+ `queue_work` -> `queue_work_on` -> `__queue_work`
- Summary table
| Feature | `INIT_WORK` | `INIT_DELAYED_WORK` |
| --------------------------| ---------------------| -------------------------|
| **Execution Timing** | As soon as scheduled |After a specified delay |
| **Queue Function** | `schedule_work()` |`schedule_delayed_work()` |
| **Purpose** | Immediate tasks |Deferred or periodic tasks|
| **Struct Used** | `struct work_struct` |`struct delayed_work` |
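The behaviour summarized in the table can be sketched as a single-threaded userspace model: a pending bit that makes queueing idempotent, a FIFO worklist drained by a "worker", and a delayed variant whose timer only puts the item on the worklist once it expires. All names (`model_work`, `model_queue_work`, `model_timer_tick`, `model_worker_run`) are hypothetical; the pending bit only imitates `WORK_STRUCT_PENDING` in `work_struct.data`, time is a plain tick counter rather than jiffies, and there are no per-CPU pools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WORK_PENDING 1ul  /* imitates WORK_STRUCT_PENDING in data */

struct model_work;
typedef void (*model_work_func_t)(struct model_work *work);

struct model_work {
	unsigned long data;        /* state bits, not an argument to func */
	struct model_work *next;   /* stands in for the list_head entry */
	model_work_func_t func;
};

struct model_delayed_work {
	struct model_work work;
	unsigned long expires;     /* tick at which the timer fires */
	bool timer_armed;
};

/* One FIFO worklist, standing in for a worker pool's list. */
static struct model_work *head, **tail = &head;

static void enqueue(struct model_work *w)
{
	w->next = NULL;
	*tail = w;
	tail = &w->next;
}

/* Like queue_work(): false if the item is already pending. */
bool model_queue_work(struct model_work *w)
{
	if (w->data & MODEL_WORK_PENDING)
		return false;
	w->data |= MODEL_WORK_PENDING;
	enqueue(w);
	return true;
}

/* Like queue_delayed_work(): pending now, queued when the timer fires. */
bool model_queue_delayed_work(struct model_delayed_work *dw,
			      unsigned long now, unsigned long delay)
{
	if (dw->work.data & MODEL_WORK_PENDING)
		return false;
	dw->work.data |= MODEL_WORK_PENDING;
	dw->expires = now + delay;
	dw->timer_armed = true;
	return true;
}

/* Timer expiry: put the (still pending) item on the worklist. */
void model_timer_tick(struct model_delayed_work *dw, unsigned long now)
{
	if (dw->timer_armed && now >= dw->expires) {
		dw->timer_armed = false;
		enqueue(&dw->work);
	}
}

/* Worker thread body: pop items in FIFO order and run them. */
void model_worker_run(void)
{
	while (head) {
		struct model_work *w = head;

		head = w->next;
		if (!head)
			tail = &head;
		w->data &= ~MODEL_WORK_PENDING;  /* cleared before func runs */
		w->func(w);
	}
}

/* Demo work function: counts how often it ran. */
static int demo_runs;
static void demo_fn(struct model_work *w)
{
	(void)w;
	demo_runs++;
}
```

Queueing the same item twice before the worker runs returns `false` the second time, and the function runs once. A delayed item queued at tick 0 with a delay of 10 is not run after tick 5 and runs once the tick-10 timer has fired. As in the kernel's `process_one_work()`, the pending bit is cleared just before the function is called, so a work function may requeue itself.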
### Example in kernel - Ethernet MAC driver
- Structure
```c
struct ftgmac100 {
	...
	struct work_struct reset_task;
	...
};
```
- Initialize the work item
*linux/drivers/net/ethernet/faraday/ftgmac100.c*
```c
static int ftgmac100_probe(struct platform_device *pdev)
{
	struct net_device *netdev;
	struct ftgmac100 *priv;
	...
	/* setup private data */
	priv = netdev_priv(netdev);
	priv->netdev = netdev;
	priv->dev = &pdev->dev;
	INIT_WORK(&priv->reset_task, ftgmac100_reset_task);
	...
}
```
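- Schedule the work from the IRQ handler
*linux/drivers/net/ethernet/faraday/ftgmac100.c*
```c
static irqreturn_t ftgmac100_interrupt(int irq, void *dev_id)
{
	struct net_device *netdev = dev_id;
	struct ftgmac100 *priv = netdev_priv(netdev);
	unsigned int status;

	/* read the interrupt status register */
	status = ioread32(priv->base + FTGMAC100_OFFSET_ISR);
	...
	/* AHB error -> Reset the chip */
	if (status & FTGMAC100_INT_AHB_ERR) {
		if (net_ratelimit())
			netdev_warn(netdev,
				    "AHB bus error ! Resetting chip.\n");
		/* mask all interrupts until the reset work has run */
		iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
		schedule_work(&priv->reset_task);
		return IRQ_HANDLED;
	}
	...
}
```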
- Reset work function (runs later in a kworker thread)
*linux/drivers/net/ethernet/faraday/ftgmac100.c*
```c
static void ftgmac100_reset_task(struct work_struct *work)
{
	struct ftgmac100 *priv = container_of(work, struct ftgmac100,
					      reset_task);

	ftgmac100_reset(priv);
}
```
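`ftgmac100_reset_task()` only receives the `struct work_struct *`, and finds its driver data with `container_of()` because the work item is embedded in `struct ftgmac100`. The userspace model below shows why that works; `container_of` is re-implemented with `offsetof`, and `model_priv`, `model_reset_task` and the counter standing in for `ftgmac100_reset()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct_model {
	void (*func)(struct work_struct_model *work);
};

/* Hypothetical driver private data embedding the work item. */
struct model_priv {
	int resets;                           /* stands in for chip state */
	struct work_struct_model reset_task;  /* deliberately not the first member */
};

/* Work function: recover the driver data from the embedded member. */
static void model_reset_task(struct work_struct_model *work)
{
	struct model_priv *priv = container_of(work, struct model_priv,
					       reset_task);

	priv->resets++;     /* stands in for ftgmac100_reset(priv) */
}

/* Like INIT_WORK(): bind the function to the embedded work item. */
void model_init_work(struct model_priv *priv)
{
	priv->reset_task.func = model_reset_task;
}

/* What a worker does: it only ever sees the work item pointer. */
void model_run_work(struct work_struct_model *work)
{
	work->func(work);
}
```

Running `model_run_work(&p.reset_task)` increments `p.resets` even though `reset_task` is not the first member, because subtracting `offsetof` recovers the start of the containing structure. Embedding the work item avoids a separate allocation and an extra data pointer; deferring the reset to a workqueue presumably also lets the reset path sleep, which the IRQ handler cannot do.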
## Conclusion
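| Context      | Can Sleep? | Typical Use                                                                      | Example API                         |
| ------------ | ---------- | -------------------------------------------------------------------------------- | ----------------------------------- |
| Hard IRQ     | No         | Immediate, time-critical interrupt work                                          | IRQ handler (`request_irq`)         |
| Softirq      | No         | Deferred, fast, atomic; fixed set of vectors                                     | `open_softirq`, `raise_softirq`     |
| Tasklet      | No         | Softirq-based deferral; a given tasklet never runs on two CPUs at once           | `tasklet_init`, `tasklet_schedule`  |
| Workqueue    | Yes        | Work that cannot be done in IRQ context but does not need a dedicated kernel thread | `schedule_work`, `queue_work`    |
| Threaded IRQ | Yes        | Interrupt handling deferred to a per-IRQ kernel thread                           | `devm_request_threaded_irq`         |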