# Signal trace code
###### tags: `learning`
## Signals in the Linux Kernel
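- POSIX specifies the signal semantics that Linux implements
- All threads in a process share the same signal handlers (dispositions); the blocked mask is per thread
- The Linux kernel splits signal transmission into two phases
1. Signal generation: the kernel updates the signal state in the target process's `task_struct`
2. Signal delivery: the kernel hands the process's control flow to the signal handler
- A signal that has been generated but not yet delivered is called a <font color='red'>pending signal</font>
- A standard signal can be generated several times while it is pending, but it is delivered only once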
- A signal can stay pending when
1. the process blocks the signal
`Note: SIGKILL and SIGSTOP cannot be blocked or ignored`
2. the process is not running at that moment
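The "generated many times, delivered once" rule can be observed from user space. The sketch below is only an illustration, not kernel code: the helper name `deliveries_after_blocked_raises` is made up for this note, and it assumes a single-threaded caller. It blocks `SIGUSR1`, raises it several times, checks `sigpending`, then unblocks and counts how often the handler ran.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_deliveries;

static void count_usr1(int sig)
{
    (void)sig;
    usr1_deliveries++;
}

/*
 * Generate SIGUSR1 `times` times while it is blocked, then unblock it.
 * Stores 1 in *was_pending if the signal was pending before unblocking.
 * Returns the number of handler invocations, or -1 on error.
 * (Hypothetical helper, written only for this note.)
 */
static int deliveries_after_blocked_raises(int times, int *was_pending)
{
    struct sigaction sa, old_sa;
    sigset_t usr1, old_mask, pending;

    assert(times > 0 && was_pending != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &usr1, &old_mask) != 0)
        return -1;

    usr1_deliveries = 0;
    for (int i = 0; i < times; i++)
        raise(SIGUSR1);                    /* generated, but delivery is blocked */

    sigpending(&pending);
    *was_pending = sigismember(&pending, SIGUSR1);

    sigprocmask(SIG_UNBLOCK, &usr1, NULL); /* the pending signal is delivered here */

    /* Put the caller's mask and disposition back. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return usr1_deliveries;
}
```

Calling `deliveries_after_blocked_raises(3, &p)` should report one delivery even though the signal was generated three times, because a standard signal has a single pending bit.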
- Signal-related data in `task_struct` and how it is structured
![](https://i.imgur.com/qsKwPxu.png =400x)
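A process has two kinds of `sigpending`. One is shared by all threads and lives in `signal->shared_pending`; it holds process-directed signals, such as those sent by `kill`, which target the whole thread group. The other is private to each thread and lives in `pending`; it holds thread-directed signals, such as those sent by `tkill`/`tgkill`.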
## Signal generation
- The following functions can all generate a signal
```
send_sig
send_sig_info
force_sig
force_sig_info
sys_kill
sys_tkill
sys_tgkill
```
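- Taking `send_sig` as an example: the function lives in `kernel/signal.c`, and the call flow in `v5.10.5` is roughly as shown in the figure below. All of the functions listed above eventually reach `send_signal`, which in turn calls `__send_signal`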
![](https://i.imgur.com/Yzyazd3.png)
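From user space, `sys_kill` and `sys_tgkill` are reached through `kill()` and the raw `tgkill` system call. As a rough illustration (the helper `observe_si_code` and the `enum sender` are invented for this note, and the sketch assumes a single-threaded process with `SIGUSR1` unblocked), a `SA_SIGINFO` handler can tell which path generated the signal from `si_code`: `SI_USER` for `kill`, `SI_TKILL` for `tgkill`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t last_si_code;
static volatile sig_atomic_t handler_runs;

static void record_origin(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    last_si_code = info->si_code;   /* filled in by the kernel at generation time */
    handler_runs++;
}

enum sender { SEND_WITH_KILL, SEND_WITH_TGKILL };

/*
 * Send SIGUSR1 to ourselves through the chosen system call and store the
 * si_code seen by the handler in *code. Returns 0 on success, -1 on error.
 * (Hypothetical helper, written only for this note.)
 */
static int observe_si_code(enum sender how, int *code)
{
    struct sigaction sa, old_sa;
    long rc;

    assert(code != NULL);

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = record_origin;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    handler_runs = 0;
    if (how == SEND_WITH_KILL)
        rc = kill(getpid(), SIGUSR1);                 /* process-directed */
    else
        rc = syscall(SYS_tgkill, getpid(),
                     (pid_t)syscall(SYS_gettid), SIGUSR1); /* thread-directed */

    sigaction(SIGUSR1, &old_sa, NULL);
    if (rc != 0 || handler_runs != 1)
        return -1;

    *code = last_si_code;
    return 0;
}
```

`observe_si_code(SEND_WITH_KILL, &code)` is expected to leave `SI_USER` in `code`, and `SEND_WITH_TGKILL` to leave `SI_TKILL`.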
- `__send_signal`
1. Use `pid_type` to decide whether the signal goes into `signal->shared_pending` or `pending`
2. If the signal is `SIGKILL` or the target is a kernel thread, skip the `sigqueue` allocation and jump straight to `out_set`
3. Otherwise `__sigqueue_alloc` allocates a `sigqueue` entry, which is appended to `pending->list`
- At `out_set`, `signalfd_notify` tells `signalfd` that a signal has arrived
- `sigaddset` sets the bit that represents `sig` in `pending->signal` to 1
`Note: sigpending->signal is a 64-bit structure in which each bit corresponds to one signal`
- Finally `complete_signal` is called; it uses `signal_wake_up` to set `TIF_SIGPENDING` in the thread that will receive the signal, which completes signal generation
`Note: for a fatal signal such as SIGKILL, TIF_SIGPENDING is set in every thread`
```c=
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
			enum pid_type type, bool force)
{
	...
	//1.
	pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
	...
	//2.
	if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
		goto out_set;
	...
	//3.
	q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit);
	if (q) {
		list_add_tail(&q->list, &pending->list);
		...
	}
	...
out_set:
	signalfd_notify(t, sig);
	sigaddset(&pending->signal, sig);
	...
	complete_signal(sig, t, type);
	...
}
```
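To make the bookkeeping easier to follow, here is a toy user-space model of those steps. Every `toy_` name is invented for this note and none of it is the kernel's real definition: the queue is reduced to a counter, and the "already pending" check mirrors what `legacy_queue` does in the v5.10 source, which the excerpt above elides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for kernel types; NOT the real definitions. */
enum toy_pid_type { TOY_PIDTYPE_PID, TOY_PIDTYPE_TGID };

#define TOY_SIGKILL 9
#define TOY_SIGUSR1 10

struct toy_sigpending {
    uint64_t signal;  /* bit (sig - 1) set => sig is pending */
    int queued;       /* number of queued entries (stands in for the list) */
};

struct toy_task {
    struct toy_sigpending pending;          /* thread-private */
    struct toy_sigpending *shared_pending;  /* shared by the thread group */
    bool tif_sigpending;                    /* models TIF_SIGPENDING */
};

static bool toy_is_pending(const struct toy_sigpending *p, int sig)
{
    return (p->signal >> (sig - 1)) & 1u;
}

/* Returns 1 if the signal was newly marked pending, 0 if it already was. */
static int toy_send_signal(int sig, struct toy_task *t, enum toy_pid_type type)
{
    struct toy_sigpending *pending;

    assert(sig >= 1 && sig <= 64);

    /* 1. pick the shared or the private pending set */
    pending = (type != TOY_PIDTYPE_PID) ? t->shared_pending : &t->pending;

    /* a standard signal that is already pending is not queued again */
    if (toy_is_pending(pending, sig))
        return 0;

    /* 2./3. SIGKILL skips the queue entry; everything else gets one */
    if (sig != TOY_SIGKILL)
        pending->queued++;

    /* out_set: mark the bit, then flag the task (complete_signal's job) */
    pending->signal |= (uint64_t)1 << (sig - 1);
    t->tif_sigpending = true;
    return 1;
}
```

Sending `TOY_SIGUSR1` twice with `TOY_PIDTYPE_TGID` sets the bit in the shared set once and queues a single entry; sending `TOY_SIGKILL` with `TOY_PIDTYPE_PID` sets the private bit without queuing anything.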
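## Signal delivery
- Whenever the kernel switches from kernel mode back to user mode, it checks `TIF_SIGPENDING`; if the flag is set, this thread has a signal to handle
- On arm64, Linux returns to user space through `ret_to_user`
- Lines 19-20 mask the thread flags with `_TIF_WORK_MASK` (which includes `TIF_SIGPENDING`) and branch to `work_pending` if any bit is set
- Line 6 calls `do_notify_resume`, which calls `do_signal`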
```c=
/*
* Ok, we need to do extra processing, enter the slow path.
*/
work_pending:
mov x0, sp // 'regs'
bl do_notify_resume //call do_signal
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_on // enabled while in userspace
#endif
ldr x1, [tsk, #TSK_TI_FLAGS] // reload the thread flags (re-check for single-step)
b finish_ret_to_user
/*
* "slow" syscall return path.
*/
ret_to_user:
disable_daif
gic_prio_kentry_setup tmp=x3
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK // keep only the work bits, including TIF_SIGPENDING
cbnz x2, work_pending // if any bit is set, branch to work_pending
finish_ret_to_user:
enable_step_tsk x1, x2
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
bl stackleak_erase
#endif
kernel_exit 0
```
- `do_signal`
![](https://i.imgur.com/GG6LJ7n.png)
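The kernel's handling of a signal centres on two functions
1. `get_signal` on line 10: takes the signal to be handled off the queue and stores it in a `struct ksignal`
2. `handle_signal` on line 25: prepares the environment that user space needs to handle the signal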
```c=
static void do_signal(struct pt_regs *regs)
{
...
struct ksignal ksig;
...
/*
* Get the signal to deliver. When running under ptrace, at this point
* the debugger may change all of our registers.
*/
if (get_signal(&ksig)) {
/*
* Depending on the signal settings, we may need to revert the
* decision to restart the system call, but skip this if a
* debugger has chosen to restart at a different PC.
*/
if (regs->pc == restart_addr &&
(retval == -ERESTARTNOHAND ||
retval == -ERESTART_RESTARTBLOCK ||
(retval == -ERESTARTSYS &&
!(ksig.ka.sa.sa_flags & SA_RESTART))))
{
regs->regs[0] = -EINTR;
regs->pc = continue_addr;
}
handle_signal(&ksig, regs);
return;
}
...
}
```
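The restart check in `do_signal` is visible from user space: a blocking `read` interrupted by a handler fails with `EINTR` unless the handler was installed with `SA_RESTART`. The following is a sketch under assumptions, not part of the kernel: `interrupted_read` is a hypothetical helper, `SIGALRM` is assumed to be unblocked, and the 100 ms timer is assumed to fire after `read` has started blocking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;

static void on_alarm(int sig)
{
    (void)sig;
    /* write() is async-signal-safe; the byte lets a restarted read() finish */
    ssize_t n = write(wake_fd, "x", 1);
    (void)n;
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * Returns read()'s result (or -2 if setup failed); *err receives errno
 * when the result is negative, 0 otherwise.
 * (Hypothetical helper, written only for this note.)
 */
static long interrupted_read(int sa_flags, int *err)
{
    struct sigaction sa, old_sa;
    struct itimerval timer;
    int fds[2];
    char byte;
    long ret;

    assert(err != NULL);
    *err = 0;
    if (pipe(fds) != 0) {
        *err = errno;
        return -2;
    }
    wake_fd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sa.sa_flags = sa_flags;              /* 0 or SA_RESTART */
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100 * 1000; /* one-shot, 100 ms from now */
    setitimer(ITIMER_REAL, &timer, NULL);

    ret = read(fds[0], &byte, 1);        /* blocks until the signal arrives */
    if (ret < 0)
        *err = errno;

    sigaction(SIGALRM, &old_sa, NULL);
    close(fds[0]);
    close(fds[1]);
    return ret;
}
```

With `sa_flags == 0` the call is expected to return -1 with `EINTR`; with `SA_RESTART` the system call is restarted after the handler returns and picks up the byte the handler wrote, so it returns 1.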
- `get_signal`
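1. Line 13 first tries to dequeue a synchronous signal; `signr` ends up holding the number of the signal to handle
2. If there is none, `dequeue_signal` on line 15 runs and eventually calls `sigdelset` and `__sigqueue_free`
- `sigdelset` in `collect_signal` clears the corresponding bit to 0
- `__sigqueue_free` in `collect_signal` frees the `sigqueue` entry once it has been unlinked from the queue
- Finally `recalc_sigpending` checks whether any signal is still waiting to be delivered; if not, it clears `TIF_SIGPENDING`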
```c=
bool get_signal(struct ksignal *ksig)
{
struct sighand_struct *sighand = current->sighand;
struct signal_struct *signal = current->signal;
int signr; // ends up holding the signal number
...
/*
* Signals generated by the execution of an instruction
* need to be delivered before any other pending signals
* so that the instruction pointer in the signal stack
* frame points to the faulting instruction.
*/
signr = dequeue_synchronous_signal(&ksig->info);
if (!signr)
signr = dequeue_signal(current, &current->blocked, &ksig->info);
...
ksig->sig = signr;
return ksig->sig > 0;
}
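/* ksignal carries everything needed to handle the signal */
struct ksignal {
struct k_sigaction ka; // the action registered for this signal
kernel_siginfo_t info; // additional information
int sig; // signal number
};
// Linux lets a process set the action for a given signal with `sigaction`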
```
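User space cannot call `dequeue_signal` directly, but `sigwaitinfo` takes a pending signal off the queue in a comparable way, without running a handler. The sketch below (the `dequeue_report` struct and `dequeue_pending_usr1` are names made up for this note) shows the pending bit being set by generation and cleared again once the signal has been dequeued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

struct dequeue_report {
    int pending_before;  /* 1 if SIGUSR1 was pending before the dequeue */
    int dequeued_signo;  /* return value of sigwaitinfo() */
    int info_signo;      /* si_signo from the dequeued siginfo */
    int pending_after;   /* 1 if SIGUSR1 is still pending afterwards */
};

/*
 * Make SIGUSR1 pending, dequeue it synchronously and record what happened.
 * Returns 0 on success, -1 on error.
 * (Hypothetical helper, written only for this note.)
 */
static int dequeue_pending_usr1(struct dequeue_report *r)
{
    sigset_t usr1, old_mask, pending;
    siginfo_t info;

    assert(r != NULL);

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &usr1, &old_mask) != 0)
        return -1;

    raise(SIGUSR1);                      /* generated; stays pending while blocked */

    sigpending(&pending);
    r->pending_before = sigismember(&pending, SIGUSR1);

    /* Returns at once because the signal is already pending. */
    r->dequeued_signo = sigwaitinfo(&usr1, &info);
    r->info_signo = (r->dequeued_signo > 0) ? info.si_signo : 0;

    sigpending(&pending);
    r->pending_after = sigismember(&pending, SIGUSR1);

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return (r->dequeued_signo > 0) ? 0 : -1;
}
```

After a successful call, `pending_before` should be 1, `dequeued_signo` and `info_signo` should both be `SIGUSR1`, and `pending_after` should be 0.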
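- `handle_signal` prepares the environment needed to handle the signal
1. Save the current user-space context
2. Transfer control to the signal handler
3. When the signal handler finishes, return to the kernel so that it can hand control back to the interrupted user-space code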
```c=
/*
* OK, we're invoking a handler
*/
static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
sigset_t *oldset = sigmask_to_save();
int ret;
/*
* Perform fixup for the pre-signal frame.
*/
rseq_signal_deliver(ksig, regs);
/*
* Set up the stack frame
*/
if (ksig->ka.sa.sa_flags & SA_SIGINFO)
ret = setup_rt_frame(ksig, oldset, regs);
else
ret = setup_frame(ksig, oldset, regs);
/*
* Check that the resulting registers are actually sane.
*/
ret |= !valid_user_regs(regs);
signal_setup_done(ret, ksig, 0);
}
```
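- `setup_rt_frame` on line 16 above calls `get_sigframe` (line 4 below), which takes care of step `1.` Both excerpts come from the 32-bit ARM port (`arch/arm/kernel/signal.c`), as the `ARM_sp` and `ARM_r1` accessors show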
:::info
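- When a process enters kernel space from user space, the user-space context at that moment is saved on the kernel stack, in a struct named `pt_regs` that holds the user-space registers as they were before entering the kernel
- When execution returns from kernel space to user space, `pt_regs` is lost, because the kernel stack is always empty on every entry into kernel space. The interrupted context therefore cannot be kept in the kernel while the handler runs; <font color='red'>it must be saved on the user-space stack</font>
- The layout of the new stack frame is planned by `get_sigframe`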
![](https://i.imgur.com/xnYqmIE.png =300x)
:::
```c=
static int
setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
{
struct rt_sigframe __user *frame = get_sigframe(ksig, regs, sizeof(*frame));
int err = 0;
if (!frame)
return 1;
err |= copy_siginfo_to_user(&frame->info, &ksig->info);
err |= __put_user(0, &frame->sig.uc.uc_flags);
err |= __put_user(NULL, &frame->sig.uc.uc_link);
err |= __save_altstack(&frame->sig.uc.uc_stack, regs->ARM_sp);
err |= setup_sigframe(&frame->sig, regs, set);
if (err == 0)
err = setup_return(regs, ksig, frame->sig.retcode, frame);
if (err == 0) {
/*
* For realtime signals we must also set the second and third
* arguments for the signal handler.
* -- Peter Maydell <pmaydell@chiark.greenend.org.uk> 2000-12-06
*/
regs->ARM_r1 = (unsigned long)&frame->info;
regs->ARM_r2 = (unsigned long)&frame->sig.uc;
}
return err;
}
```
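That the signal frame really lives in user memory can be checked with `sigaltstack`: when a handler is installed with `SA_ONSTACK`, the frame is placed on the alternate stack, so the `siginfo_t` and `ucontext` pointers passed to the handler (the second and third arguments set up above) fall inside that buffer. This is a sketch only; `frame_lands_on_alt_stack`, the 64 KiB buffer size, and the assumption that `SIGUSR1` is unblocked are all choices made for this note.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)  /* assumption: comfortably above MINSIGSTKSZ */

static char alt_stack[ALT_STACK_SIZE];
static volatile sig_atomic_t info_on_alt, uctx_on_alt, local_on_alt;

static int inside_alt_stack(const volatile void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)alt_stack;

    return addr >= lo && addr < lo + ALT_STACK_SIZE;
}

static void probe_frame(int sig, siginfo_t *info, void *ucontext)
{
    volatile int local = sig;            /* lives in the handler's own stack frame */

    info_on_alt = inside_alt_stack(info);
    uctx_on_alt = inside_alt_stack(ucontext);
    local_on_alt = inside_alt_stack(&local);
}

/*
 * Returns 1 if siginfo, ucontext and the handler's locals were all on the
 * alternate stack, 0 if not, -1 on error.
 * (Hypothetical helper, written only for this note.)
 */
static int frame_lands_on_alt_stack(void)
{
    stack_t ss, old_ss;
    struct sigaction sa, old_sa;

    assert(ALT_STACK_SIZE >= MINSIGSTKSZ);

    memset(&ss, 0, sizeof(ss));
    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_SIZE;
    if (sigaltstack(&ss, &old_ss) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = probe_frame;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    info_on_alt = uctx_on_alt = local_on_alt = 0;
    raise(SIGUSR1);                      /* handler runs before raise() returns */

    sigaction(SIGUSR1, &old_sa, NULL);
    sigaltstack(&old_ss, NULL);
    return info_on_alt && uctx_on_alt && local_on_alt;
}
```

A return value of 1 means the kernel wrote the frame into the user-supplied buffer and ran the handler on it.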
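- Steps `2.` and `3.` are done by `setup_return`, called on line 14 above (the arm64 version is shown below)
1. Line 8: `setup_sigframe` has already saved the user-space context into `user`, so `pt_regs` can be modified directly here to point the PC at the signal handler
2. Following the arm64 ABI, `regs[0]` carries the signal number as the first argument, `regs[29]` is the frame pointer, and `regs[30]` is the link register (which holds the function's return address); the stack pointer is set separately through `regs->sp`
3. Line 18 points `sigtramp` at the vDSO's `__kernel_rt_sigreturn` (defined in `arch/arm64/kernel/vdso/sigreturn.S`), and line 19 makes `sigtramp` the return address, so once the signal handler returns, execution continues in that trampoline, which issues the `rt_sigreturn` system call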
```c=
static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
struct rt_sigframe_user_layout *user, int usig)
{
__sigrestore_t sigtramp;
regs->regs[0] = usig;
regs->sp = (unsigned long)user->sigframe;
regs->regs[29] = (unsigned long)&user->next_frame->fp;
regs->pc = (unsigned long)ka->sa.sa_handler;
if (system_supports_bti()) {
regs->pstate &= ~PSR_BTYPE_MASK;
regs->pstate |= PSR_BTYPE_C;
}
/* TCO (Tag Check Override) always cleared for signal handlers */
regs->pstate &= ~PSR_TCO_BIT;
if (ka->sa.sa_flags & SA_RESTORER)
sigtramp = ka->sa.sa_restorer;
else
sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp);
regs->regs[30] = (unsigned long)sigtramp;
}
```
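- Making a system call means switching to kernel mode once more: the kernel runs `sys_rt_sigreturn`, which calls `restore_sigframe` to resume the interrupted user-space execution (the excerpt below is the 32-bit ARM version)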
![](https://i.imgur.com/xLrD5jz.png =400x)
```c=
asmlinkage int sys_rt_sigreturn(struct pt_regs *regs)
{
struct rt_sigframe __user *frame;
/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;
/*
* Since we stacked the signal on a 64-bit boundary,
* then 'sp' should be word aligned here. If it's
* not, then the user is trying to mess with us.
*/
if (regs->ARM_sp & 7)
goto badframe;
frame = (struct rt_sigframe __user *)regs->ARM_sp;
if (!access_ok(frame, sizeof (*frame)))
goto badframe;
if (restore_sigframe(regs, &frame->sig))
goto badframe;
if (restore_altstack(&frame->sig.uc.uc_stack))
goto badframe;
return regs->ARM_r0;
badframe:
force_sig(SIGSEGV);
return 0;
}
```
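One effect of the restore step that is easy to observe is the signal mask. While the handler runs, the signal itself and everything in `sa_mask` are blocked; the saved mask in the frame is put back when the handler returns through `rt_sigreturn`. The sketch below uses invented names (`mask_report`, `mask_around_handler`) and assumes a single-threaded caller with no `SIGUSR1`/`SIGUSR2` already pending.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_blocked_in_handler, usr2_blocked_in_handler;

static void inspect_mask(int sig)
{
    sigset_t cur;

    (void)sig;
    sigprocmask(SIG_BLOCK, NULL, &cur);  /* NULL set: query only */
    usr1_blocked_in_handler = sigismember(&cur, SIGUSR1);
    usr2_blocked_in_handler = sigismember(&cur, SIGUSR2);
}

struct mask_report {
    int usr1_in, usr2_in;        /* blocked while the handler ran? */
    int usr1_after, usr2_after;  /* blocked after the handler returned? */
};

/*
 * Run a SIGUSR1 handler whose sa_mask contains SIGUSR2 and record the
 * signal mask inside and after the handler. Returns 0 on success.
 * (Hypothetical helper, written only for this note.)
 */
static int mask_around_handler(struct mask_report *r)
{
    struct sigaction sa, old_sa;
    sigset_t both, old_mask, cur;

    assert(r != NULL);

    /* Start with both signals unblocked so the comparison is meaningful. */
    sigemptyset(&both);
    sigaddset(&both, SIGUSR1);
    sigaddset(&both, SIGUSR2);
    if (sigprocmask(SIG_UNBLOCK, &both, &old_mask) != 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = inspect_mask;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);     /* extra signal blocked during the handler */
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    raise(SIGUSR1);                      /* handler runs, then returns via sigreturn */

    sigprocmask(SIG_BLOCK, NULL, &cur);
    r->usr1_in = usr1_blocked_in_handler;
    r->usr2_in = usr2_blocked_in_handler;
    r->usr1_after = sigismember(&cur, SIGUSR1);
    r->usr2_after = sigismember(&cur, SIGUSR2);

    sigaction(SIGUSR1, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    return 0;
}
```

The report is expected to show both signals blocked inside the handler and neither blocked afterwards, without the handler ever unblocking them itself.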