# shell lab
> [github](https://github.com/KYG-yaya573142/shell-lab)
:::info
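These notes record my approach to the CS:APP shell lab assignment. They were written step by step while solving the lab rather than afterwards, so they may read as somewhat disorganized.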
:::
:::info
TODO:
* unify the notation style for better readability
* needs further discussion about signal safety (POSIX)
:::
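## Background Review - Exceptions & Process
Completing the shell lab requires a basic understanding of exceptional control flow (ECF). It is recommended to read at least Chapter 8 of CS:APP before starting the assignment.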
### `fork`
called **once** but returns **twice**
```c
#include <sys/types.h>
#include <unistd.h>
pid_t fork(void);
```
#### RETURN
* 0
returned in the child
* PID of the child
returned in the parent
* -1
returned in the parent on error (no child is created)
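A minimal sketch of the "called once, returns twice" behavior, assuming a POSIX system. `fork_returns_twice` is a hypothetical helper, not part of the lab: both branches run after the single `fork()` call, and the parent reads the child's exit status to confirm that the child took the `pid == 0` branch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: calls fork() once and handles both returns.
 * Returns the exit status the parent observed for the child, or -1. */
int fork_returns_twice(void)
{
    pid_t pid = fork();
    if (pid < 0)        /* error: no child exists, only the parent returns */
        return -1;
    if (pid == 0)       /* child: fork() returned 0 */
        _exit(7);       /* leave immediately without touching stdio */

    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```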
### `waitpid`
```c
#include <sys/types.h>
#include <sys/wait.h>
pid_t waitpid(pid_t pid, int *status, int options)
```
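When a parent terminates without reaping its children, `init` (PID 1) adopts and reaps the leftover zombies, so explicit reaping is strictly necessary only for parents that keep running:
* long-running processes (e.g., shells and servers)
* parents that never stop (e.g., an infinite loop)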
#### `pid_t pid`
* < -1
meaning wait for any child process whose process group ID is equal to the absolute value of pid
* -1
meaning wait for any child process
* 0
meaning wait for any child process whose process group ID is equal to that of the calling process
* \> 0
meaning wait for the child whose process ID is equal to the value of pid
#### `int options`
* 0 (default)
* WNOHANG
Return immediately (with a return value of 0) if none of the child processes in the wait set has terminated yet
* WUNTRACED
Return the PID of the terminated **or stopped child** that caused the return (The default behavior returns only for terminated children)
* WNOHANG|WUNTRACED
Return immediately, with a return value of 0, if none of the children in the wait set has stopped or terminated, or with a return value equal to the PID of one of the stopped or terminated children.
#### `int *status`
If the status argument is non-NULL, then waitpid encodes status information about the child that caused the return in the status argument. The `wait.h` include file defines several macros for interpreting the status argument:
* WIFEXITED(status)
Returns true if the child terminated normally, via a call to exit or a return.
* WEXITSTATUS(status)
Returns the exit status of a normally terminated child. This status is only defined if WIFEXITED returned true.
* WIFSIGNALED(status)
Returns true if the child process terminated because of a signal that was not handled.
* WTERMSIG(status)
Returns the number of the signal that caused the child process to terminate. This status is only defined if WIFSIGNALED(status) returned true.
* WIFSTOPPED(status)
Returns true if the child that caused the return is currently stopped.
* WSTOPSIG(status)
Returns the number of the signal that caused the child to stop. This status is only defined if WIFSTOPPED(status) returned true.
#### RETURN
* PID of child
waitpid success
* 0
if WNOHANG was specified and one or more children specified by pid exist, but have not yet changed state
* -1
ERROR
#### ERRORS
The corresponding error message can be obtained with `strerror(errno)`
* ECHILD
The process specified by pid (waitpid()) or idtype and id (waitid()) does not exist or is not a child of the calling process.
* EINTR
WNOHANG was not set and an unblocked signal or a SIGCHLD was caught
```c
#include <sys/types.h>
#include <sys/wait.h>
int wait(int *status)
```
* `wait(&status)` == `waitpid(-1, &status, 0)`
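To see the status macros in action, here is a small sketch (the names `classify_status`, `run_and_classify` and the `end_kind` enum are hypothetical). The child sends itself a signal with `raise()`, and the parent calls `waitpid` with `WUNTRACED`, so a stopped child is reported as well as a terminated one:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum end_kind { END_EXITED, END_SIGNALED, END_STOPPED, END_UNKNOWN };

/* Decode a waitpid() status word with the wait.h macros.
 * *num receives the exit status or the signal number. */
enum end_kind classify_status(int status, int *num)
{
    if (WIFEXITED(status))   { *num = WEXITSTATUS(status); return END_EXITED; }
    if (WIFSIGNALED(status)) { *num = WTERMSIG(status);    return END_SIGNALED; }
    if (WIFSTOPPED(status))  { *num = WSTOPSIG(status);    return END_STOPPED; }
    *num = 0;
    return END_UNKNOWN;
}

/* Hypothetical demo: fork a child that sends itself `sig`, wait for it
 * with WUNTRACED so that a stop is reported too, and classify the result. */
enum end_kind run_and_classify(int sig, int *num)
{
    pid_t pid = fork();
    if (pid < 0) { *num = 0; return END_UNKNOWN; }
    if (pid == 0) {
        raise(sig);   /* e.g. SIGKILL terminates us, SIGSTOP stops us */
        _exit(3);     /* reached only if the signal did neither */
    }
    int status;
    if (waitpid(pid, &status, WUNTRACED) < 0) { *num = 0; return END_UNKNOWN; }
    enum end_kind kind = classify_status(status, num);
    if (kind == END_STOPPED) {   /* don't leave a stopped child behind */
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
    }
    return kind;
}
```

Without `WUNTRACED`, the `SIGSTOP` case would make `waitpid` block until the child terminated, which is why the option matters for a shell that supports ctrl-z.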
### `getpid`
```c
#include <sys/types.h>
#include <unistd.h>
pid_t getpid(void);
pid_t getppid(void);
```
* `getpid` returns the PID of the calling process
* `getppid` returns the PID of its parent
### `execve`
```c
#include <unistd.h>
int execve(const char *filename, char *const argv[], char *const envp[]);
```
The execve function loads and runs the executable object file `filename` with the argument list `argv` and the environment variable list `envp`. While it overwrites the address space of the current process, it does not create a new process. The new program still has the same PID, and it inherits all of the file descriptors that were open at the time of the call to the execve function.
#### RETURN
* does not return
on success (the calling process now runs the new program)
* -1
ERROR
![argv and envp](https://i.imgur.com/BrimUoh.png)
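Putting `fork`, `execve` and `waitpid` together, a shell-style runner might look like the sketch below. `run_program` is a hypothetical name, and exiting with 127 when `execve` fails is borrowed from common shell convention rather than required by `execve`. The code after `execve` runs only on failure, since a successful `execve` never returns:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* pass our own environment to the child */

/* Hypothetical helper: run `path` in a child process and return its exit
 * status. If execve() fails, the child exits with 127 (a shell convention,
 * not something execve itself does). Returns -1 on fork/wait errors. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);  /* does not return on success */
        _exit(127);                   /* only reached if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `char *args[] = {"/bin/sh", "-c", "exit 5", NULL}; run_program("/bin/sh", args)` should return 5.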
### appendix - `envp`
```c
#include <stdlib.h>
char *getenv(const char *name);
int setenv(const char *name, const char *newvalue, int overwrite);
int unsetenv(const char *name);
```
* `getenv` searches envp[] for a string "name=value"
    * Returns: pointer to value if it exists, NULL if no match
* `setenv` searches envp[] for a string "name=oldvalue" and, if `overwrite` is nonzero, replaces oldvalue with newvalue
    * If name does not exist, then `setenv` adds "name=newvalue" to envp[]
    * Returns: 0 on success, −1 on error
* `unsetenv` searches envp[] for a string "name=value" and deletes it
    * Returns: 0 on success, −1 on error
```c
extern char **environ;
```
* a global variable defined by the C library (glibc); programs usually declare it themselves with `extern` as above
[further information](https://www.gnu.org/software/libc/manual/html_node/Environment-Access.html)
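A small sketch of these calls, assuming glibc or another POSIX libc. `find_env_entry` is a hypothetical helper that walks `environ` directly, to show that each entry is a single "name=value" string and that the array ends with a NULL pointer:

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: a simplified getenv() that returns the whole
 * "name=value" entry of environ rather than a pointer to the value. */
const char *find_env_entry(const char *name)
{
    size_t len = strlen(name);
    for (char **p = environ; *p != NULL; p++)   /* environ ends with NULL */
        if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return *p;
    return NULL;
}
```

Calling `setenv` with `overwrite` set to 0 leaves an existing value untouched, which is easy to miss.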
## Background Review - Signals
```c
#include <unistd.h>
pid_t getpgrp(void);
```
* `getpgrp` returns the process group ID of the calling process
* by default, a child process belongs to the same process group as its parent
```c
#include <unistd.h>
int setpgid(pid_t pid, pid_t pgid);
```
* changes the process group of process pid to pgid
* Returns: 0 on success, −1 on error
* If pid is zero, the PID of the current process is used
* If pgid is zero, the PID of the process specified by pid is used for the process group ID
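The sketch below (hypothetical name `setpgid_demo`) checks both points: right after `fork` the child is still in its parent's process group, and after `setpgid(0, 0)` its process group ID equals its own PID, which is what tsh relies on to isolate its jobs:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child first checks that it inherited the parent's
 * process group, then calls setpgid(0, 0) and checks that its new group ID
 * equals its own PID. Returns the child's exit status (0 = both held). */
int setpgid_demo(void)
{
    pid_t parent_pgrp = getpgrp();
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (getpgrp() != parent_pgrp)
            _exit(1);                 /* not expected: same group after fork */
        if (setpgid(0, 0) < 0)        /* pid 0 = me, pgid 0 = use my PID */
            _exit(2);
        _exit(getpgrp() == getpid() ? 0 : 3);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```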
### `kill` - send signal, not kill!
```c
#include <sys/types.h>
#include <signal.h>
int kill(pid_t pid, int sig);
```
* Returns: 0 if OK, −1 on error
```shell
unix> /bin/kill -9 15213
```
* sends signal 9 (SIGKILL) to process 15213
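* A negative PID causes the signal to be sent to every process in the process group whose ID is the absolute value of PID (e.g., `/bin/kill -9 -15213` sends SIGKILL to every process in process group 15213)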
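A sketch of signalling a whole process group with a negative PID, the same pattern tsh uses to forward ctrl-c. `kill_group_demo` is hypothetical and SIGTERM is only an example signal. Both parent and child call `setpgid`, so the group is guaranteed to exist before `kill(-pid, ...)` runs, no matter which process is scheduled first:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: start a child in its own process group and terminate
 * the whole group with a negative PID. Returns the signal that killed the
 * child, or -1 on error. */
int kill_group_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);       /* child: new group whose PGID is its PID */
        for (;;)
            pause();         /* wait until a signal terminates us */
    }
    setpgid(pid, pid);       /* parent sets it too, so the group surely exists */
    if (kill(-pid, SIGTERM) < 0) {   /* -pid: every process in group pid */
        kill(pid, SIGKILL);          /* clean up the child on failure */
        waitpid(pid, NULL, 0);
        return -1;
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```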
### `signal`
```c
#include <signal.h>
typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
```
* Returns: ptr to previous handler if OK, SIG_ERR on error (does not set errno)
* If handler is SIG_IGN, then signals of type signum are ignored
* If handler is SIG_DFL, then the action for signals of type signum reverts to the default action
* Otherwise, handler is the address of a user-defined function, called a signal handler
```c
#include <signal.h>
int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact);
```
* portable signal handling (Posix standard)
* Returns: 0 if OK, −1 on error
* With the `SA_RESTART` flag, interrupted system calls are automatically restarted whenever possible

A cleaner approach is to use a wrapper (CS:APP's `Signal`):
```c=
#include <signal.h>

typedef void handler_t(int);   /* handler type, as in csapp.h */
void unix_error(char *msg);    /* error helper from csapp.c */

handler_t *Signal(int signum, handler_t *handler)
{
    struct sigaction action, old_action;
    action.sa_handler = handler;
    sigemptyset(&action.sa_mask); /* no extra signals blocked; the one being handled still is */
    action.sa_flags = SA_RESTART; /* Restart syscalls if possible */
    if (sigaction(signum, &action, &old_action) < 0)
        unix_error("Signal error");
    return (old_action.sa_handler);
}

/* The sigaction structure is declared in <signal.h> as something like:
 *
 * struct sigaction {
 *     void     (*sa_handler)(int);
 *     void     (*sa_sigaction)(int, siginfo_t *, void *);
 *     sigset_t   sa_mask;
 *     int        sa_flags;
 *     void     (*sa_restorer)(void);
 * };
 */
```
The man page explains the details more clearly:
->[man sigaction](http://man7.org/linux/man-pages/man2/sigaction.2.html)
### `sigprocmask`
```c
#include <signal.h>
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signum);
int sigdelset(sigset_t *set, int signum);
/* Returns: 0 if OK, −1 on error */
int sigismember(const sigset_t *set, int signum);
/* Returns: 1 if member, 0 if not, −1 on error */
```
* `how`
    * SIG_BLOCK: Add the signals in set to blocked (blocked = blocked | set)
    * SIG_UNBLOCK: Remove the signals in set from blocked (blocked = blocked & ~set)
    * SIG_SETMASK: blocked = set
* If `oldset` is non-NULL, the previous value of the blocked bit vector is stored in `oldset`
* Signal sets such as `set` are manipulated using the following functions:
    * `int sigemptyset(sigset_t *set)` initializes set to the empty set
    * `int sigfillset(sigset_t *set)` adds every signal to set
    * `int sigaddset(sigset_t *set, int signum)` adds signum to set
    * `int sigdelset(sigset_t *set, int signum)` deletes signum from set
## Solution Walkthrough
> [CS:APP 8.4.6] A shell performs a sequence of read/evaluate steps, and then terminates. The read step reads a command line from the user. The evaluate step parses the command line and runs programs on behalf of the user.
First, the textbook example:
```c
#include "csapp.h"
#define MAXARGS 128
/* Function prototypes */
void eval(char *cmdline);
int parseline(char *buf, char **argv);
int builtin_command(char **argv);
int main()
{
    char cmdline[MAXLINE]; /* Command line */
    while (1) {
        /* Read */
        printf("> ");
        Fgets(cmdline, MAXLINE, stdin);
        if (feof(stdin))
            exit(0);
        /* Evaluate */
        eval(cmdline);
    }
}
/* eval - Evaluate a command line */
void eval(char *cmdline)
{
    char *argv[MAXARGS]; /* Argument list execve() */
    char buf[MAXLINE];   /* Holds modified command line */
    int bg;              /* Should the job run in bg or fg? */
    pid_t pid;           /* Process id */
    strcpy(buf, cmdline);
    bg = parseline(buf, argv);
    if (argv[0] == NULL)
        return;   /* Ignore empty lines */
    if (!builtin_command(argv)) {
        if ((pid = Fork()) == 0) {   /* Child runs user job */
            if (execve(argv[0], argv, environ) < 0) {
                printf("%s: Command not found.\n", argv[0]);
                exit(0);
            }
        }
        /* Parent waits for foreground job to terminate */
        if (!bg) {
            int status;
            if (waitpid(pid, &status, 0) < 0)
                unix_error("waitfg: waitpid error");
        }
        else
            printf("%d %s", pid, cmdline);
    }
    return;
}
/* If first arg is a builtin command, run it and return true */
int builtin_command(char **argv)
{
    if (!strcmp(argv[0], "quit")) /* quit command */
        exit(0);
    if (!strcmp(argv[0], "&"))    /* Ignore singleton & */
        return 1;
    return 0;                     /* Not a builtin command */
}
/* parseline - Parse the command line and build the argv array */
int parseline(char *buf, char **argv)
{
    char *delim; /* Points to first space delimiter */
    int argc;    /* Number of args */
    int bg;      /* Background job? */
    buf[strlen(buf)-1] = ' ';     /* Replace trailing '\n' with space */
    while (*buf && (*buf == ' ')) /* Ignore leading spaces */
        buf++;
    /* Build the argv list */
    argc = 0;
    while ((delim = strchr(buf, ' '))) {
        argv[argc++] = buf;
        *delim = '\0';
        buf = delim + 1;
        while (*buf && (*buf == ' ')) /* Ignore spaces */
            buf++;
    }
    argv[argc] = NULL;
    if (argc == 0)  /* Ignore blank line */
        return 1;
    /* Should the job run in the background? */
    if ((bg = (*argv[argc-1] == '&')) != 0)
        argv[--argc] = NULL;
    return bg;
}
```
Next, following the writeup, I work through the assignment in the order of the trace files.
### trace01
* trace01.txt - Properly terminate on EOF
This one already works in the starter code =_=
```c=
if (feof(stdin)) { /* End of file (ctrl-d) */
    fflush(stdout);
    exit(0);
}
```
* Typing Ctrl+D signals EOF on `stdin`
### trace02
* trace02.txt - Process builtin quit command
This is simple string matching: `parseline` first splits the command line, then `builtin_cmd` checks for the builtin name.
```c=
/* in eval() */
bg = parseline(cmdline, argv);

int builtin_cmd(char **argv)
{
    if (!strcmp(argv[0], "quit")) { /* quit command */
        exit(0);
    }
    return 0; /* not a builtin command */
}
```
### trace03 & trace04
* trace03.txt - Run a foreground job (FG)
* trace04.txt - Run a background job (BG)
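These two traces can be done together. The logic is:
1. `parseline` parses the command line and decides whether it is an FG or BG job
2. check whether it is a builtin command
3. `fork` a child, which runs the job with `execve`
4. depending on FG/BG, decide whether the parent waits for the child to finish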
```c=
bg = parseline(cmdline, argv);
if (argv[0] == NULL) { /* ignore empty lines */
    return;
}
if (!builtin_cmd(argv)) { /* no need to fork builtin commands */
    if ((pid = fork()) == 0) { /* child runs the job */
        if (execve(argv[0], argv, environ) < 0) {
            unix_error("execve error");
        }
    }
    /* parent waits for fg job to terminate */
    if (!bg) {
        if (waitpid(pid, &status, 0) < 0) {
            unix_error("waitpid error");
        }
    }
}
```
### trace05
* trace05.txt - Process jobs builtin command
The builtin `jobs` is easy: register the command in `builtin_cmd`, and implement it by calling `listjobs` (already provided by the starter code) to print the job list.
```c=
int builtin_cmd(char **argv)
{
    if (!strcmp(argv[0], "quit")) { /* quit command */
        exit(0);
    }
    if (!strcmp(argv[0], "jobs")) { /* jobs command */
        listjobs(jobs);
        return 1;
    }
    return 0; /* not a builtin command */
}
```
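Next we need to keep the job list up to date, which involves the following points:
* after `fork`, call `addjob` to add the child to the job list
* after the child finishes, call `deletejob`; this splits into two cases
    * the child is a foreground job: the parent waits for it to finish and reaps it
    * the child is a background job: `sigchld_handler` reaps it later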
Before implementing this, note the hints given in the writeup:
> One of the tricky parts of the assignment is deciding on the allocation of work between the waitfg and sigchld handler functions. We recommend the following approach:
> * In waitfg, use a busy loop around the sleep function.
> * In sigchld handler, use exactly one call to waitpid.
>
> While other solutions are possible, such as calling waitpid in both waitfg and sigchld handler, these can be very confusing. It is simpler to do all reaping in the handler.
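The writeup says that calling `waitpid` in both `waitfg` and `sigchld_handler` may work, but recommends letting `sigchld_handler` do all the reaping to avoid confusion. So the trace03 version above, where the parent calls `waitpid` directly, does not follow this advice and is revised below.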
> The parent needs to block the SIGCHLD signals in this way in order to avoid the race condition where the child is reaped by sigchld handler (and thus removed from the job list) before the parent calls addjob.
In addition, the parent must block SIGCHLD before calling `fork` and unblock it only after calling `addjob`. Otherwise the child might finish first, and `sigchld_handler` would run `deletejob` before the parent has run `addjob`, which is a race condition.
> When you run your shell from the standard Unix shell, your shell is running in the foreground process group. If your shell then creates a child process, by default that child will also be a member of the foreground process group. Since typing ctrl-c sends a SIGINT to every process in the foreground group, **typing ctrl-c will send a SIGINT to your shell, as well as to every process that your shell created, which obviously isn't correct.**
>
> Here is the workaround: **After the fork, but before the execve, the child process should call setpgid(0, 0), which puts the child in a new process group whose group ID is identical to the child's PID.** This ensures that there will be only one process, your shell, in the foreground process group. When you type ctrl-c, the shell should catch the resulting SIGINT and then forward it to the appropriate foreground job (or more precisely, the process group that contains the foreground job).
To keep ctrl-c from also killing our shell, the child calls `setpgid(0, 0)` to move itself into a new process group.

Putting the discussion above together, the revised trace03 & trace04 code is:
```c=
/*
 * eval - Evaluate the command line that the user has just typed in
 *
 * If the user has requested a built-in command (quit, jobs, bg or fg)
 * then execute it immediately. Otherwise, fork a child process and
 * run the job in the context of the child. If the job is running in
 * the foreground, wait for it to terminate and then return. Note:
 * each child process must have a unique process group ID so that our
 * background children don't receive SIGINT (SIGTSTP) from the kernel
 * when we type ctrl-c (ctrl-z) at the keyboard.
 */
void eval(char *cmdline)
{
    char *argv[MAXARGS]; /* argument list execve() */
    pid_t pid;
    int bg;              /* should the job run in bg or fg? */
    sigset_t mask;
    sigemptyset(&mask);
    bg = parseline(cmdline, argv);
    if (argv[0] == NULL) { /* ignore empty lines */
        return;
    }
    if (!builtin_cmd(argv)) { /* no need to fork builtin commands */
        sigaddset(&mask, SIGCHLD);
        sigprocmask(SIG_BLOCK, &mask, NULL); /* block SIGCHLD */
        if ((pid = fork()) < 0) {
            unix_error("fork error");
        }
        if (pid == 0) { /* child runs the job */
            sigprocmask(SIG_UNBLOCK, &mask, NULL); /* unblock SIGCHLD in child */
            setpgid(0, 0); /* puts the child in a new process group, PGID = PID */
            if (execve(argv[0], argv, environ) < 0) {
                unix_error("execve error");
            }
        }
        /* adds the child to job list */
        addjob(jobs, pid, (bg ? BG : FG), cmdline);
        sigprocmask(SIG_UNBLOCK, &mask, NULL); /* unblock SIGCHLD */
        if (!bg) { /* parent waits for fg job to terminate */
            waitfg(pid);
        }
        else { /* shows information of bg job */
            printf("[%d] (%d) %s", pid2jid(pid), pid, cmdline);
        }
    }
    return;
}
```
```c=
/*
 * waitfg - Block until process pid is no longer the foreground process
 */
void waitfg(pid_t pid)
{
    while (pid == fgpid(jobs)) { /* sigchld_handler removes the job from jobs */
        sleep(0);
    }
    return;
}
```
```c=
/*
 * sigchld_handler - The kernel sends a SIGCHLD to the shell whenever
 *     a child job terminates (becomes a zombie), or stops because it
 *     received a SIGSTOP or SIGTSTP signal. The handler reaps all
 *     available zombie children, but doesn't wait for any other
 *     currently running children to terminate.
 */
void sigchld_handler(int sig)
{
    pid_t pid;
    int status;
    while ((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) > 0) {
        if (WIFEXITED(status)) { /* process terminated normally */
            deletejob(jobs, pid);
        }
    }
    return;
}
```
Note: as the header comment of `eval` points out, SIGINT (ctrl-c) and SIGTSTP (ctrl-z) are not handled yet; they are dealt with starting in trace06.
### trace06 ~ trace08
* trace06.txt - Forward SIGINT to foreground job
* trace07.txt - Forward SIGINT only to foreground job
* trace08.txt - Forward SIGTSTP only to foreground job
When we type ctrl-c or ctrl-z, the OS sends SIGINT or SIGTSTP to our shell, so `sigint_handler` and `sigtstp_handler` must forward the signal to the corresponding FG job.
```c=
/*
 * sigint_handler - The kernel sends a SIGINT to the shell whenever the
 *    user types ctrl-c at the keyboard. Catch it and send it along
 *    to the foreground job.
 */
void sigint_handler(int sig)
{
    pid_t pid = fgpid(jobs);
    if (pid != 0) { /* do nothing if no FG job exists */
        /* send signal to entire foreground process group */
        if (kill(-pid, SIGINT) < 0) {
            unix_error("sigint error");
        }
    }
    return;
}
/*
 * sigtstp_handler - The kernel sends a SIGTSTP to the shell whenever
 *     the user types ctrl-z at the keyboard. Catch it and suspend the
 *     foreground job by sending it a SIGTSTP.
 */
void sigtstp_handler(int sig)
{
    pid_t pid = fgpid(jobs);
    if (pid != 0) { /* do nothing if no FG job exists */
        /* send signal to entire foreground process group */
        if (kill(-pid, SIGTSTP) < 0) {
            unix_error("sigtstp error");
        }
    }
    return;
}
```
`sigchld_handler` also needs to handle jobs terminated by SIGINT (ctrl-c) and stopped by SIGTSTP (ctrl-z):
```c=
/*
 * sigchld_handler - The kernel sends a SIGCHLD to the shell whenever
 *     a child job terminates (becomes a zombie), or stops because it
 *     received a SIGSTOP or SIGTSTP signal. The handler reaps all
 *     available zombie children, but doesn't wait for any other
 *     currently running children to terminate.
 */
void sigchld_handler(int sig)
{
    pid_t pid;
    int status;
    while ((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) > 0) {
        if (WIFEXITED(status)) { /* process terminated normally */
            deletejob(jobs, pid);
        }
        if (WIFSIGNALED(status)) { /* process terminated by a signal, e.g., ctrl-c */
            printf("Job [%d] (%d) terminated by signal %d\n", pid2jid(pid), pid, WTERMSIG(status));
            deletejob(jobs, pid);
        }
        if (WIFSTOPPED(status)) { /* process stopped by a signal, e.g., ctrl-z */
            printf("Job [%d] (%d) stopped by signal %d\n", pid2jid(pid), pid, WSTOPSIG(status));
            struct job_t *job = getjobpid(jobs, pid);
            job->state = ST;
        }
    }
    if (pid < 0 && errno != ECHILD) {
        unix_error("waitpid error");
    }
    return;
}
```
### trace09 & trace10
* trace09.txt - Process bg builtin command
* trace10.txt - Process fg builtin command
First, modify `builtin_cmd` to recognize `bg` and `fg`:
```c=
int builtin_cmd(char **argv)
{
    ...
    ...
    if (!strcmp(argv[0], "bg") || !strcmp(argv[0], "fg")) { /* bg and fg command */
        do_bgfg(argv);
        return 1;
    }
    return 0; /* not a builtin command */
}
```
Before implementing `do_bgfg`, let's look at the hints in the writeup:
> Each job can be identified by either a process ID (PID) or a job ID (JID), which is a positive integer assigned by tsh. JIDs should be denoted on the command line by the prefix '%'. For example, "%5" denotes JID 5, and "5" denotes PID 5.
> The bg \<job> command restarts \<job> by sending it a SIGCONT signal, and then runs it in the background. The \<job> argument can be either a PID or a JID.
>The fg \<job> command restarts \<job> by sending it a SIGCONT signal, and then runs it in the foreground. The \<job> argument can be either a PID or a JID.
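So the overall steps are roughly:
1. parse the user's argument, i.e. argv
1. look up the job pointer based on the parsed result
1. send SIGCONT to the target job with `kill`
1. update the job's state
1. do the FG- or BG-specific follow-up, using the same logic `eval` uses after launching a job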
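Step 1 (parsing the argument) can be isolated into a helper. `parse_job_arg` below is a hypothetical sketch of the same checks `do_bgfg` performs inline, distinguishing "%5" (JID) from "5" (PID):

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper for do_bgfg: parses "%5" as JID 5 and "5" as PID 5.
 * Returns 1 and fills *is_jid / *id on success, 0 if the argument is not
 * of the form "%<digits>" or "<digits>". */
int parse_job_arg(const char *arg, int *is_jid, int *id)
{
    if (arg == NULL)
        return 0;
    *is_jid = (arg[0] == '%');
    const char *digits = *is_jid ? arg + 1 : arg;   /* skip the '%' */
    if (*digits == '\0')                            /* "%" or "" alone */
        return 0;
    for (const char *p = digits; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    *id = atoi(digits);
    return 1;
}
```

Unlike the inline loop in `do_bgfg`, this sketch also rejects a bare "%" (the inline version accepts it and ends up looking up JID 0).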
The final result:
```c=
/*
 * do_bgfg - Execute the builtin bg and fg commands
 */
void do_bgfg(char **argv)
{
    char *id = argv[1];
    struct job_t *job;
    int i;
    int length;
    if (id == NULL) {
        printf("%s command requires PID or %%jobid argument\n", argv[0]);
        return;
    }
    if (id[0] == '%') { /* identified by JID */
        id++; /* skip the '%' */
        length = strlen(id);
        for (i = 0; i < length; i++) { /* check that the ID contains only digits */
            if (!isdigit((unsigned char)id[i])) {
                printf("%s: argument must be a PID or %%jobid\n", argv[0]);
                return;
            }
        }
        job = getjobjid(jobs, atoi(id));
        if (job == NULL) {
            printf("%%%d: No such job\n", atoi(id));
            return;
        }
    }
    else { /* identified by PID */
        length = strlen(id);
        for (i = 0; i < length; i++) { /* check that the ID contains only digits */
            if (!isdigit((unsigned char)id[i])) {
                printf("%s: argument must be a PID or %%jobid\n", argv[0]);
                return;
            }
        }
        job = getjobpid(jobs, atoi(id));
        if (job == NULL) {
            printf("(%d): No such process\n", atoi(id));
            return;
        }
    }
    kill(-(job->pid), SIGCONT); /* send SIGCONT to the job's process group */
    if (!strcmp(argv[0], "fg")) { /* wait until the fg job terminates */
        job->state = FG;
        waitfg(job->pid);
    }
    else { /* shows information of bg job */
        job->state = BG;
        printf("[%d] (%d) %s", job->jid, job->pid, job->cmdline);
    }
    return;
}
```
### trace11 ~ trace16
* trace11.txt - Forward SIGINT to every process in foreground process group
* trace12.txt - Forward SIGTSTP to every process in foreground process group
* trace13.txt - Restart every stopped process in process group
* trace14.txt - Simple error handling
* trace15.txt - Putting it all together
* trace16.txt - Tests whether the shell can handle SIGTSTP and SIGINT signals that come from other processes instead of the terminal
These traces mainly check for gaps in the previous steps, so I won't go into them here.
## Further Question: `exit()` vs. `_exit()`
First, read `man 3 exit` and `man 2 _exit` carefully:
> The exit() function causes normal process termination and the value of status & 0377 is returned to the parent (see wait(2)). **All functions registered with atexit(3) and on_exit(3) are called**, in the reverse order of their registration. **All open stdio(3) streams are flushed and closed. Files created by tmpfile(3) are removed.**
> Note that **a call to execve(2) removes registrations created using atexit(3) and on_exit(3).**
> The function _exit() is like exit(3), but does not call any functions registered with atexit(3) or on_exit(3).
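So if the child's `execve()` fails after `fork()`, the child should call `_exit()` rather than `exit()`. `exit()` would flush the stdio buffers the child inherited from the parent (so buffered output can appear twice), remove files created with `tmpfile(3)`, and run the `atexit(3)` and `on_exit(3)` handlers the parent registered, because a failed `execve()` does not remove those registrations. `_exit()` does none of this.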
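The stdio part is easy to observe. In the hypothetical sketch below (`child_exit_demo` is not from the lab), one byte is still sitting in a fully buffered stream when the process forks. If the child ends with `exit()`, its copy of that buffer is flushed too and the byte reaches the file twice; `_exit()` discards the child's copy:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the final file size, 2 if the child's exit()
 * flushed its copy of the buffer as well, 1 if _exit() discarded it,
 * or -1 on error. */
long child_exit_demo(int use_exit)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);  /* fully buffered */
    fflush(stdout);                     /* don't duplicate unrelated output */
    fflush(stderr);
    fputc('X', fp);                     /* buffered in memory, not yet written */

    pid_t pid = fork();                 /* the child gets a copy of the buffer */
    if (pid < 0) {
        fclose(fp);
        return -1;
    }
    if (pid == 0) {
        if (use_exit)
            exit(0);                    /* flushes the copied 'X' */
        _exit(0);                       /* skips stdio cleanup entirely */
    }
    waitpid(pid, NULL, 0);
    fflush(fp);                         /* the parent writes its own 'X' */

    struct stat st;
    long size = (fstat(fileno(fp), &st) == 0) ? (long)st.st_size : -1;
    fclose(fp);
    return size;
}
```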
```c=
void eval(char *cmdline)
{
    ...
    ...
    if ((pid = fork()) == 0) { /* child runs the job */
        sigprocmask(SIG_UNBLOCK, &mask, NULL); /* unblock SIGCHLD in child */
        setpgid(0, 0); /* puts the child in a new process group, PGID = PID */
        if (execve(argv[0], argv, environ) < 0) {
            fprintf(stderr, "%s: Command not found\n", argv[0]);
            _exit(1);
        }
    }
    ...
    ...
}
```
[reference](https://stackoverflow.com/questions/5422831/what-is-the-difference-between-using-exit-exit-in-a-conventional-linux-fo)
## Further Question: async-signal-safe functions
:::warning
This section will be revisited later; the current version still needs improvement.
:::
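At this point all the traces pass (or at least can all pass), but anyone who paid attention in lecture will notice that **our signal handlers completely ignore signal safety**. According to POSIX, a signal handler may only call async-signal-safe functions, and `printf` is not one of them!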
[man 7 signal-safety](http://man7.org/linux/man-pages/man7/signal-safety.7.html)
> An async-signal-safe function is one that can be safely called from within a signal handler. Many functions are not async-signal-safe. In particular, nonreentrant functions are generally unsafe to call from a signal handler.
> The kinds of issues that render a function unsafe can be quickly understood when one considers the implementation of the *stdio* library, all of whose functions are not async-signal-safe.
> When performing buffered I/O on a file, the stdio functions must maintain a statically allocated data buffer along with associated counters and indexes (or pointers) that record the amount of data and the current position in the buffer. Suppose that the main program is in the middle of a call to a stdio function such as printf(3) where the buffer and associated variables have been partially updated. If, at that moment, the program is interrupted by a signal handler that also calls printf(3), then the second call to printf(3) will operate on inconsistent data, with unpredictable results.
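I did run into this while testing the trace files: a signal arrived while `main` was in the middle of a system call, so the signal handler ran partway through it and the interrupted `waitpid` failed: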
```shell
$ make test07
./sdriver.pl -t trace07.txt -s ./tsh -a "-p"
#
# trace07.txt - Forward SIGINT only to foreground job.
#
tsh> ./myspin 4 &
[1] (2345) ./myspin 4 &
tsh> ./myspin 5
waitpid error: Interrupted system call
```
> To avoid problems with unsafe functions, there are two possible choices:
> 1. Ensure that (a) the signal handler calls only async-signal-safe functions, and (b) the signal handler itself is reentrant with respect to global variables in the main program.
> 2. Block signal delivery in the main program when calling functions that are unsafe or operating on global data that is also accessed by the signal handler.
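So the feasible approach is:
* inside signal handlers, use only async-signal-safe functions and share no global variables with main
* in main, temporarily block signals while operating on global variables shared with the handlers

We split the modification into two steps:
1. make sure only async-signal-safe functions are used
2. make sure global variables are accessed safely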
### Step 1 - Use only async-signal-safe functions
CS:APP provides a helper file, csapp.c, which includes I/O functions that are safe to call inside signal handlers. The parts we need are excerpted below:
```c=
/* signal-safe I/O functions ported from csapp.c */
static size_t sio_strlen(char s[])
{
    int i = 0;
    while (s[i] != '\0')
        ++i;
    return i;
}
ssize_t sio_puts(char s[]) /* Put string */
{
    return write(STDOUT_FILENO, s, sio_strlen(s)); //line:csapp:siostrlen
}
void sio_error(char s[]) /* Put error message and exit */
{
    sio_puts(s);
    _exit(1); //line:csapp:sioexit
}
```
`unix_error` can simply be replaced with `sio_error`, but `printf` is harder: `sio_puts` does not support format specifiers (%d, %s, etc.), so the signal handler instead sets a flag, and `main` later calls `printf` based on that flag.
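Another option is to format integers without stdio at all. csapp.c ships a `sio_putl` for this purpose; the sketch below is a hypothetical re-implementation in the same spirit (not the csapp.c code) that uses only a stack buffer and `write`, which is on the async-signal-safe list:

```c
#include <unistd.h>

/* Hypothetical signal-safe integer formatter: no stdio, no malloc.
 * Writes the decimal form of v into s (at least 24 bytes) and returns
 * its length. Handles LONG_MIN by working on the unsigned magnitude. */
static size_t sio_ltoa(long v, char s[])
{
    char tmp[24];
    unsigned long u = (v < 0) ? 0UL - (unsigned long)v : (unsigned long)v;
    size_t n = 0, i = 0;
    do {
        tmp[n++] = (char)('0' + u % 10);   /* digits in reverse order */
        u /= 10;
    } while (u != 0);
    if (v < 0)
        s[i++] = '-';
    while (n > 0)
        s[i++] = tmp[--n];                 /* reverse into the output */
    s[i] = '\0';
    return i;
}

/* Hypothetical sio_putl-style wrapper: print a long with write() only. */
ssize_t sio_putl(long v)
{
    char buf[24];
    size_t len = sio_ltoa(v, buf);
    return write(STDOUT_FILENO, buf, len);
}
```

With helpers like these, a handler could emit the "Job [%d] (%d) ..." message through a sequence of `sio_puts`/`sio_putl` calls instead of deferring to `main`; the note below keeps the flag approach.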
### Step 2 - Access global variables safely
In this lab the shared global variable is the `jobs` object. The functions that access it are:
```
main
└getjobpid
eval
├addjob
└pid2jid
builtin_cmd
└listjobs
do_bgfg
├getjobjid
└getjobpid
sigchld_handler sigint_handler sigtstp_handler
├deletejob └fgpid └fgpid
└fgpid
```
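There are two possible approaches:
1. move `deletejob` and `fgpid` out of the signal handlers
2. wrap every function that accesses jobs with `sigprocmask`...

Because of the lab's requirements, approach 1 is actually harder to implement (mainly because `waitfg` becomes cumbersome), so I use approach 2, i.e. wrapping every function that accesses jobs with `sigprocmask`.

On second thought, not all of them need wrapping: `pid2jid` in `eval` and `fgpid` in the handlers can be left as they are. For the former, a conflict would at worst print incorrect data; for the latter, since every function in main that accesses jobs is already wrapped with `sigprocmask`, no conflict is possible.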
To keep things readable, only the key parts are shown below; changes that merely add `sigprocmask` calls are omitted.
```c=
/* Global flag variables for signal handlers */
volatile sig_atomic_t sigint_flag = 0;
volatile sig_atomic_t sigstp_flag = 0;
volatile pid_t sigint_pid = 0;
volatile pid_t sigstp_pid = 0;
volatile pid_t sigint_jid = 0;
volatile pid_t sigstp_jid = 0;
volatile int sigint_WIF;
volatile int sigstp_WIF;
volatile sig_atomic_t fgjob_flag = 0;

int main(int argc, char **argv)
{
    ...
    ...
    ...
    while (1) {
        ...
        ...
        /* Evaluate the command line */
        eval(cmdline);
        /* signal handling */
        if (sigint_flag) {
            printf("Job [%d] (%d) terminated by signal %d\n", sigint_jid, sigint_pid, sigint_WIF);
            sigint_flag = 0;
        }
        if (sigstp_flag) {
            printf("Job [%d] (%d) stopped by signal %d\n", sigstp_jid, sigstp_pid, sigstp_WIF);
            struct job_t *job = getjobpid(jobs, sigstp_pid);
            job->state = ST;
            sigstp_flag = 0;
        }
        fflush(stdout);
    }
    exit(0); /* control never reaches here */
}

void sigchld_handler(int sig)
{
    pid_t pid;
    int status;
    int olderrno = errno; /* save errno so the handler doesn't clobber main's value */
    sigset_t mask_all, prev_all;
    sigfillset(&mask_all);
    while ((pid = waitpid(-1, &status, WNOHANG|WUNTRACED)) > 0) {
        /* process terminated normally */
        if (WIFEXITED(status)) {
            if (pid == fgpid(jobs)) {
                fgjob_flag = 0;
            }
            /* block all signals while running critical code */
            sigprocmask(SIG_BLOCK, &mask_all, &prev_all);
            deletejob(jobs, pid);
            sigprocmask(SIG_SETMASK, &prev_all, NULL);
        }
        /* process terminated by a signal, e.g., ctrl-c */
        if (WIFSIGNALED(status)) {
            if (pid == fgpid(jobs)) {
                fgjob_flag = 0;
            }
            sigint_flag = 1;
            sigint_pid = pid;
            sigint_jid = pid2jid(sigint_pid);
            sigint_WIF = WTERMSIG(status);
            /* block all signals while running critical code */
            sigprocmask(SIG_BLOCK, &mask_all, &prev_all);
            deletejob(jobs, pid);
            sigprocmask(SIG_SETMASK, &prev_all, NULL);
        }
        /* process stopped by a signal, e.g., ctrl-z */
        if (WIFSTOPPED(status)) {
            if (pid == fgpid(jobs)) {
                fgjob_flag = 0;
            }
            sigstp_flag = 1;
            sigstp_pid = pid;
            sigstp_jid = pid2jid(sigstp_pid);
            sigstp_WIF = WSTOPSIG(status);
        }
    }
    if (pid < 0 && errno != ECHILD) {
        sio_error("waitpid error\n");
    }
    errno = olderrno;
    return;
}
```
###### tags: `csapp`