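# Signal trace code
###### tags: `learning`

## Signals in the Linux Kernel
- POSIX specifies how signals behave on Linux
- All threads in a process share the same signal dispositions (handlers)
- The Linux kernel splits signal handling into two phases
    1. Signal generation: the kernel updates the signal state in the target process's `task_struct`
    2. Signal delivery: the kernel hands the process's control flow over to the signal handler
- A signal that has been generated but not yet delivered is called a <font color='red'>pending signal</font>
    - A signal can be generated several times while it is pending, but it is delivered only once
    - A signal can stay pending when
        1. the process has blocked the signal `Note: SIGKILL and SIGSTOP cannot be blocked or ignored`
        2. the process is not running at that moment
- Signal-related data in `task_struct` and its structures
![](https://i.imgur.com/qsKwPxu.png =400x)
As the figure shows, a process has two `sigpending` structures. One is shared by all threads and lives in `signal->shared_pending`; it holds process-directed signals such as those sent by `kill`. The other is private to a thread and lives in `pending`; it holds thread-directed signals such as those sent by `tkill` or `tgkill`

## Signal generation
- All of the following functions can generate a signal
```
send_sig
send_sig_info
force_sig
force_sig_info
sys_kill
sys_tkill
sys_tgkill
```
- Taking `send_sig` as an example: the function lives in `kernel/signal.c`, and the figure below sketches its call flow in `v5.10.5`. All of the functions above eventually reach `send_signal`
![](https://i.imgur.com/Yzyazd3.png)
- `__send_signal`
    1. Uses `pid_type` to decide whether the signal goes into `signal->shared_pending` or `pending`
    2. For `SIGKILL` (or a kernel thread), skips the queue allocation and jumps straight to `out_set`
    3. Otherwise calls `__sigqueue_alloc` to allocate a `sigqueue` entry and appends it to `pending->list`
    4. At `out_set`
        - `signalfd_notify` tells `signalfd` that a signal has arrived
        - `sigaddset` sets the bit that represents `sig` in `pending->signal` to 1 `Note: sigpending->signal is a 64-bit structure in which each bit corresponds to one signal`
        - Finally `complete_signal` is called; it uses `signal_wake_up` to set `TIF_SIGPENDING` in the thread that should receive the signal, which completes signal generation `Note: for SIGKILL, TIF_SIGPENDING is set in every thread`
```c=
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
            enum pid_type type, bool force)
{
    ...
    //1. pick the shared or the per-thread pending set
    pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
    ...
    //2. SIGKILL and kernel threads need no siginfo, skip the allocation
    if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
        goto out_set;
    ...
    //3. queue a sigqueue entry for the signal
    q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit);
    if (q) {
        list_add_tail(&q->list, &pending->list);
        ...
    }
    ...
out_set:
    //4. mark the signal as pending and wake up a receiver
    signalfd_notify(t, sig);
    sigaddset(&pending->signal, sig);
    ...
    complete_signal(sig, t, type);
    ...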
}
```

## Signal delivery
- Whenever the kernel switches from kernel mode back to user mode, it checks `TIF_SIGPENDING`; if the flag is set, the thread has a signal to handle
- On arm64, Linux returns to user space through `ret_to_user`
    - Line 20 tests the pending-work flags, which include `TIF_SIGPENDING`, and branches to `work_pending` if any of them is set
    - Line 6 calls `do_notify_resume`, which in turn calls `do_signal`
```c=
/*
 * Ok, we need to do extra processing, enter the slow path.
 */
work_pending:
    mov     x0, sp                          // 'regs'
    bl      do_notify_resume                // calls do_signal
#ifdef CONFIG_TRACE_IRQFLAGS
    bl      trace_hardirqs_on               // enabled while in userspace
#endif
    ldr     x1, [tsk, #TSK_TI_FLAGS]        // re-read the thread flags
    b       finish_ret_to_user
/*
 * "slow" syscall return path.
 */
ret_to_user:
    disable_daif
    gic_prio_kentry_setup tmp=x3
    ldr     x1, [tsk, #TSK_TI_FLAGS]
    and     x2, x1, #_TIF_WORK_MASK         // mask the work flags, including _TIF_SIGPENDING
    cbnz    x2, work_pending                // if non-zero, branch to work_pending
finish_ret_to_user:
    enable_step_tsk x1, x2
#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
    bl      stackleak_erase
#endif
    kernel_exit 0
```
- `do_signal`
![](https://i.imgur.com/GG6LJ7n.png)
The kernel's signal handling is concentrated in two functions
    1. `get_signal` on line 10: takes the signal to be handled off the queue and stores it in a `ksignal` struct
    2. `handle_signal` on line 25: prepares the environment that user space needs in order to handle the signal
```c=
static void do_signal(struct pt_regs *regs)
{
    ...
    struct ksignal ksig;
    ...
    /*
     * Get the signal to deliver. When running under ptrace, at this point
     * the debugger may change all of our registers.
     */
    if (get_signal(&ksig)) {
        /*
         * Depending on the signal settings, we may need to revert the
         * decision to restart the system call, but skip this if a
         * debugger has chosen to restart at a different PC.
         */
        if (regs->pc == restart_addr &&
            (retval == -ERESTARTNOHAND ||
             retval == -ERESTART_RESTARTBLOCK ||
             (retval == -ERESTARTSYS &&
              !(ksig.ka.sa.sa_flags & SA_RESTART)))) {
            regs->regs[0] = -EINTR;
            regs->pc = continue_addr;
        }

        handle_signal(&ksig, regs);
        return;
    }
    ...
}
```
- `get_signal`
    1. Line 13 sets `signr` to the number of the signal to handle
    2.
       Line 15 calls `dequeue_signal`, which goes on to call `sigdelset` and `__sigqueue_free`
        - `sigdelset` in `collect_signal` clears the corresponding bit to 0
        - `__sigqueue_free` in `collect_signal` releases the entry that was taken off the queue
        - Finally, `recalc_sigpending` checks whether any signal is still waiting to be delivered; if none is left, it clears `TIF_SIGPENDING`
```c=
bool get_signal(struct ksignal *ksig)
{
    struct sighand_struct *sighand = current->sighand;
    struct signal_struct *signal = current->signal;
    int signr; // ends up holding the signal number
    ...
        /*
         * Signals generated by the execution of an instruction
         * need to be delivered before any other pending signals
         * so that the instruction pointer in the signal stack
         * frame points to the faulting instruction.
         */
        signr = dequeue_synchronous_signal(&ksig->info);
        if (!signr)
            signr = dequeue_signal(current, &current->blocked, &ksig->info);
    ...
    ksig->sig = signr;
    return ksig->sig > 0;
}

/* ksignal carries everything needed to handle the signal */
struct ksignal {
    struct k_sigaction ka;  // how the signal is to be handled
    kernel_siginfo_t info;  // additional information
    int sig;                // signal number
};
// On Linux, `sigaction` sets how a particular signal is handled
```
- `handle_signal` prepares the environment needed to handle the signal
    1. Save the current user-space context
    2. Transfer control to the signal handler
    3. After the signal handler finishes, return to the kernel so that the kernel can hand execution back to user space
```c=
/*
 * OK, we're invoking a handler
 */
static void handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
    sigset_t *oldset = sigmask_to_save();
    int ret;

    /*
     * Perform fixup for the pre-signal frame.
     */
    rseq_signal_deliver(ksig, regs);

    /*
     * Set up the stack frame
     */
    if (ksig->ka.sa.sa_flags & SA_SIGINFO)
        ret = setup_rt_frame(ksig, oldset, regs);
    else
        ret = setup_frame(ksig, oldset, regs);

    /*
     * Check that the resulting registers are actually sane.
     */
    ret |= !valid_user_regs(regs);

    signal_setup_done(ret, ksig, 0);
}
```
- `setup_rt_frame` on line 18 above calls `get_sigframe` (line 4 below), which takes care of step `1.` Note that the listing below is the 32-bit ARM version, as the `ARM_sp`, `ARM_r1` and `ARM_r2` fields show
:::info
- When a process enters kernel space from user space, the user-space context at that moment is saved on the kernel stack, in a struct named `pt_regs` that holds the user-space registers as they were before entering the kernel
- When returning from kernel space to user space, `pt_regs` is lost, because the kernel stack is always empty on every entry into kernel space. The user-space context therefore cannot be kept in the kernel and <font color='red'>must be saved on the user-space stack</font>
- The layout of the new stack frame is planned by `get_sigframe`
![](https://i.imgur.com/xnYqmIE.png =300x)
:::
```c=
static int
setup_rt_frame(struct ksignal *ksig, sigset_t *set, struct pt_regs *regs)
{
    struct rt_sigframe __user *frame = get_sigframe(ksig, regs, sizeof(*frame));
    int err = 0;

    if (!frame)
        return 1;

    err |= copy_siginfo_to_user(&frame->info, &ksig->info);

    err |= __put_user(0, &frame->sig.uc.uc_flags);
    err |= __put_user(NULL, &frame->sig.uc.uc_link);

    err |= __save_altstack(&frame->sig.uc.uc_stack, regs->ARM_sp);
    err |= setup_sigframe(&frame->sig, regs, set);
    if (err == 0)
        err = setup_return(regs, ksig, frame->sig.retcode, frame);

    if (err == 0) {
        /*
         * For realtime signals we must also set the second and third
         * arguments for the signal handler.
         *   -- Peter Maydell <pmaydell@chiark.greenend.org.uk> 2000-12-06
         */
        regs->ARM_r1 = (unsigned long)&frame->info;
        regs->ARM_r2 = (unsigned long)&frame->sig.uc;
    }

    return err;
}
```
- Steps `2.` and `3.` are carried out by `setup_return` on line 18 above (shown below for ARM64)
    1. Line 9: since `setup_sigframe` has already saved the user-space context into `user`, `pt_regs` can be modified directly here to point the PC at the signal handler
    2. Following the ARM64 ABI, `regs[0]` holds the signal number (the handler's first argument), `regs[29]` is the frame pointer, and `regs[30]` is the link register (which holds the function's return address)
    3.
       Line 22 points `sigtramp` at the vDSO's `rt_sigreturn` trampoline (defined in `arch/arm64/kernel/vdso/sigreturn.S`) unless the handler was registered with `SA_RESTORER`, and line 24 sets the return address to `sigtramp`, so `rt_sigreturn` runs right after the signal handler returns
```c=
static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
             struct rt_sigframe_user_layout *user, int usig)
{
    __sigrestore_t sigtramp;

    regs->regs[0] = usig;
    regs->sp = (unsigned long)user->sigframe;
    regs->regs[29] = (unsigned long)&user->next_frame->fp;
    regs->pc = (unsigned long)ka->sa.sa_handler;

    if (system_supports_bti()) {
        regs->pstate &= ~PSR_BTYPE_MASK;
        regs->pstate |= PSR_BTYPE_C;
    }

    /* TCO (Tag Check Override) always cleared for signal handlers */
    regs->pstate &= ~PSR_TCO_BIT;

    if (ka->sa.sa_flags & SA_RESTORER)
        sigtramp = ka->sa.sa_restorer;
    else
        sigtramp = VDSO_SYMBOL(current->mm->context.vdso, sigtramp);

    regs->regs[30] = (unsigned long)sigtramp;
}
```
- `rt_sigreturn` is a system call, so execution switches to kernel mode once more: the kernel runs `sys_rt_sigreturn`, which calls `restore_sigframe` to resume the interrupted user-space code. The listing below is again the 32-bit ARM version
![](https://i.imgur.com/xLrD5jz.png =400x)
```c=
asmlinkage int sys_rt_sigreturn(struct pt_regs *regs)
{
    struct rt_sigframe __user *frame;

    /* Always make any pending restarted system calls return -EINTR */
    current->restart_block.fn = do_no_restart_syscall;

    /*
     * Since we stacked the signal on a 64-bit boundary,
     * then 'sp' should be word aligned here. If it's
     * not, then the user is trying to mess with us.
     */
    if (regs->ARM_sp & 7)
        goto badframe;

    frame = (struct rt_sigframe __user *)regs->ARM_sp;

    if (!access_ok(frame, sizeof (*frame)))
        goto badframe;

    if (restore_sigframe(regs, &frame->sig))
        goto badframe;

    if (restore_altstack(&frame->sig.uc.uc_stack))
        goto badframe;

    return regs->ARM_r0;

badframe:
    force_sig(SIGSEGV);
    return 0;
}
```
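
- The split between signal generation and signal delivery can also be observed from user space. The block below is only a minimal sketch and is not taken from the kernel sources: `count_coalesced_deliveries` and `on_sigusr1` are hypothetical names introduced for illustration. It blocks `SIGUSR1`, generates it several times with `raise`, checks through `sigpending` that the signal is pending, then unblocks it and counts how often the handler runs. It assumes a single-threaded process and a standard (non-real-time) signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Number of times the handler ran (hypothetical counter for this sketch). */
static volatile sig_atomic_t deliveries;

/* Hypothetical handler: only counts deliveries. */
static void on_sigusr1(int sig)
{
    (void)sig;
    deliveries++;
}

/*
 * Hypothetical helper: block SIGUSR1, generate it `times` times, then
 * unblock it. Returns how many times the handler ran, or -1 if something
 * failed or the signal was not reported as pending while blocked.
 */
int count_coalesced_deliveries(int times)
{
    struct sigaction sa, old_sa;
    sigset_t usr1, old_mask, pending;
    int was_pending;

    assert(times > 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* While blocked, the signal can be generated but not delivered. */
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &usr1, &old_mask) != 0)
        return -1;

    deliveries = 0;
    for (int i = 0; i < times; i++)
        raise(SIGUSR1);             /* signal generation */

    /* The signal should now show up in the pending set. */
    was_pending = sigpending(&pending) == 0 &&
                  sigismember(&pending, SIGUSR1) == 1;

    /* Unblocking lets the kernel deliver the pending signal. */
    sigprocmask(SIG_UNBLOCK, &usr1, NULL);

    /* Restore the caller's mask and handler. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);

    return was_pending ? (int)deliveries : -1;
}
```

Calling `count_coalesced_deliveries(3)` from `main` should return 1 on Linux: the three generations collapse into a single pending bit, so the handler is delivered once. Real-time signals (`SIGRTMIN` and above) are queued individually and would behave differently.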