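These notes record my approach to the CS:APP shell lab assignment. Because they were written step by step as I worked, rather than after finishing, they are somewhat disorganized.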
TODO:
Completing the shell lab requires a basic understanding of exceptional control flow (ECF). I recommend reading at least Chapter 8 of CS:APP before starting the assignment.
fork
called once but returns twice (0 in the child, the child's PID in the parent)
waitpid
init (pid = 1)
reaps the zombie children of a parent that has terminated, so explicit reaping is only necessary for long-running programs such as shells and servers
pid_t pid
int options
int *status
If the status argument is non-NULL, then waitpid encodes status information about the child that caused the return in the status argument. The wait.h include file defines several macros for interpreting the status argument:
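The macros are easiest to see on real children. Below is a minimal sketch (the helper names exit_code_of_child and termsig_of_child are hypothetical) that forks a child, reaps it with waitpid, and decodes the status with WIFEXITED/WEXITSTATUS and WIFSIGNALED/WTERMSIG. It also shows fork returning twice.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, reap it, and
 * decode the status. Returns -1 on failure or abnormal termination. */
int exit_code_of_child(int code)
{
    int status;
    pid_t pid = fork();            /* called once, returns twice */

    if (pid < 0)
        return -1;
    if (pid == 0)                  /* child: fork returned 0 */
        _exit(code);
    /* parent: fork returned the child's PID */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: fork a child that raises `sig`, reap it, and return
 * the signal that terminated it (0 if it was not terminated by a signal). */
int termsig_of_child(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(sig);                /* e.g. SIGTERM: default action terminates */
        _exit(0);                  /* reached only if sig did not terminate us */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```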
The corresponding error message can be obtained with strerror(errno).
wait(&status) is equivalent to waitpid(-1, &status, 0)
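A common pattern combining these points: reap every child with wait(&status) until it fails, then use errno to tell "no children left" (ECHILD) apart from a real error, reporting the latter via strerror(errno). The helper name spawn_and_reap_all is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children, then reap them all.
 * Returns how many were reaped, or -1 on an unexpected error. */
int spawn_and_reap_all(int n)
{
    int status, reaped = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(i);
    }
    while (wait(&status) > 0)      /* same as waitpid(-1, &status, 0) */
        reaped++;
    /* With no children left, wait/waitpid return -1 and set errno to ECHILD. */
    if (errno != ECHILD) {
        fprintf(stderr, "wait error: %s\n", strerror(errno));
        return -1;
    }
    return reaped;
}
```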
getpid
- getpid returns the PID of the calling process
- getppid returns the PID of its parent
execve
The execve function loads and runs the executable object file filename with the argument list argv and the environment variable list envp. While it overwrites the address space of the current process, it does not create a new process. The new program still has the same PID, and it inherits all of the file descriptors that were open at the time of the call to the execve function.
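The usual fork + execve + waitpid combination can be sketched as follows. This assumes the child simply passes the parent's environment (the global environ) as envp; spawn_wait is a hypothetical name. Exit status 127 for a failed execve is only a convention borrowed from Unix shells.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the current process's environment, used as envp */

/* Hypothetical helper: run argv[0] in a child and return its exit status
 * (127 if execve itself failed, -1 on other errors). */
int spawn_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(argv[0], argv, environ);   /* on success this never returns */
        _exit(127);                       /* _exit, not exit: skip the parent's atexit handlers */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because execve keeps the PID, the value waitpid returns in the parent is the same PID that fork returned, even though a different program is now running.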
envp
- getenv searches envp[] for a string "name=value" and, if found, returns a pointer to value
- setenv searches envp[] for a string "name=oldvalue" and replaces oldvalue with newvalue (only if overwrite is nonzero)
- if name does not exist, setenv adds "name=newvalue" to envp[]
- unsetenv searches envp[] for a string "name=value" and deletes it
kill
- sends a signal; despite the name, it does not necessarily kill anything!
signal
a cleaner approach is to use a wrapper:
Reading the man page directly is clearer!
-> man sigaction
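A sigaction-based wrapper in the spirit of CS:APP's Signal() might look like this. The name install_handler and the demo handler are hypothetical; the design choice is to keep signal()'s convenient "set handler, get old handler back" interface while getting sigaction's well-defined semantics.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

typedef void handler_t(int);

/* Hypothetical wrapper: install `handler` for `signum` via sigaction and
 * return the previous handler, like signal() does. */
handler_t *install_handler(int signum, handler_t *handler)
{
    struct sigaction action, old_action;

    action.sa_handler = handler;
    sigemptyset(&action.sa_mask);  /* block nothing extra; signum itself is
                                      still blocked while the handler runs */
    action.sa_flags = SA_RESTART;  /* restart interrupted system calls if possible */
    if (sigaction(signum, &action, &old_action) < 0) {
        perror("sigaction");
        exit(1);
    }
    return old_action.sa_handler;
}

/* Demo handler: only sets a flag, which is async-signal-safe. */
volatile sig_atomic_t got_usr1 = 0;
void usr1_handler(int sig) { (void)sig; got_usr1 = 1; }
```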
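sigprocmask
- how: SIG_BLOCK adds the signals in set to blocked, SIG_UNBLOCK removes the signals in set from blocked, SIG_SETMASK sets blocked = set
- if oldset is non-NULL, the previous value of the blocked bit vector is stored in oldset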
- int sigemptyset(sigset_t *set): initializes set to the empty set
- int sigfillset(sigset_t *set): adds every signal to set
- int sigaddset(sigset_t *set, int signum): adds signum to set
- int sigdelset(sigset_t *set, int signum): deletes signum from set
[CS:APP 8.4.6] A shell performs a sequence of read/evaluate steps, and then terminates. The read step reads a command line from the user. The evaluate step parses the command line and runs programs on behalf of the user.
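First, look at the textbook example:
Next, following the writeup, I complete the assignment in the order of the trace files.
This one is basically already done from the start =_=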
stdin
This is just simple string matching: the command line is first handed to parseline for parsing, then to builtin_cmd to detect built-in commands.
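To make the parsing step concrete, here is a hypothetical, simplified stand-in for parseline (simple_parseline and MAXARGS = 128 are assumptions; the real tsh parseline also handles quotes). It splits the buffer in place, fills a NULL-terminated argv, and returns 1 for a BG job (trailing "&") and 0 for a FG job.

```c
#include <string.h>

#define MAXARGS 128

/* Hypothetical simplified parseline: returns 1 for BG (or a blank line,
 * which the caller should ignore), 0 for FG. Modifies buf in place. */
int simple_parseline(char *buf, char **argv)
{
    int argc = 0;
    char *tok = strtok(buf, " \t\n");

    while (tok != NULL && argc < MAXARGS - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    if (argc == 0)
        return 1;                        /* blank line */
    if (strcmp(argv[argc - 1], "&") == 0) {
        argv[--argc] = NULL;             /* drop the "&" token */
        return 1;
    }
    return 0;
}
```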
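These two traces can be done together. The logic, in order, is:
- parseline: parse the command line and determine whether it is a FG or BG job
- fork: create a child and have it run the job with execve
The built-in command jobs is easy to implement: register the command in builtin_cmd and implement it by simply calling listjobs to print the contents of the job list (already provided by the assignment).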
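A sketch of the builtin_cmd dispatch described above. The stubs listjobs_stub and do_bgfg_stub are hypothetical placeholders for tsh's listjobs and do_bgfg so that the example compiles on its own; the return convention (1 = handled, 0 = let eval fork a job) follows the tsh skeleton.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stubs standing in for tsh's listjobs(jobs) and do_bgfg(argv). */
int listjobs_calls = 0, bgfg_calls = 0;
static void listjobs_stub(void) { listjobs_calls++; }
static void do_bgfg_stub(char **argv) { (void)argv; bgfg_calls++; }

/* Returns 1 if argv[0] is a built-in (and runs it), 0 otherwise. */
int builtin_cmd_sketch(char **argv)
{
    if (argv[0] == NULL)                      /* blank line: nothing to run */
        return 1;
    if (strcmp(argv[0], "quit") == 0)
        exit(0);
    if (strcmp(argv[0], "jobs") == 0) {
        listjobs_stub();                      /* tsh: listjobs(jobs) */
        return 1;
    }
    if (strcmp(argv[0], "bg") == 0 || strcmp(argv[0], "fg") == 0) {
        do_bgfg_stub(argv);                   /* tsh: do_bgfg(argv) */
        return 1;
    }
    return 0;                                 /* not a built-in: eval forks it */
}
```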
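Next, we need to keep the job list up to date, which involves the following points:
- after fork, call addjob to register the child in the job list
- deletejob, which further splits into two cases
- sigchld_handler is responsible for reaping
Before implementing this, note the hint given in the writeup: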
One of the tricky parts of the assignment is deciding on the allocation of work between the waitfg and sigchld handler functions. We recommend the following approach:
– In waitfg, use a busy loop around the sleep function.
– In sigchld handler, use exactly one call to waitpid.
While other solutions are possible, such as calling waitpid in both waitfg and sigchld handler,
these can be very confusing. It is simpler to do all reaping in the handler.
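The writeup notes that calling waitpid in both waitfg and sigchld_handler may be workable, but recommends letting sigchld_handler do all of the reaping to avoid confusion. So the approach I first used for trace03 does not work and has to be revised later.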
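A minimal sketch of that division of labor, assuming a single FG job tracked by a hypothetical fg_pid variable instead of tsh's job list (it also assumes a PID fits in sig_atomic_t, which holds on Linux where sig_atomic_t is int). The handler is the only place that calls waitpid; waitfg just sleeps until the handler clears fg_pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the job list: PID of the FG job, 0 if none. */
static volatile sig_atomic_t fg_pid = 0;

/* All reaping happens here: one waitpid call site, looped with WNOHANG
 * because one SIGCHLD may stand for several terminated children. */
static void sigchld_handler(int sig)
{
    int olderrno = errno;
    pid_t pid;
    (void)sig;
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
        if (pid == fg_pid)
            fg_pid = 0;                      /* plays the role of deletejob */
    errno = olderrno;
}

/* Busy loop around sleep, as the writeup recommends; waitfg never reaps. */
static void waitfg(pid_t pid)
{
    while (fg_pid == pid)
        sleep(1);
}

/* Hypothetical driver: run a short FG child and wait for it.
 * Returns 0 once the handler has reaped it, -1 on error. */
int run_fg_child(void)
{
    struct sigaction sa;
    sigset_t mask, prev;
    pid_t pid;

    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGCHLD, &sa, NULL);

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &prev);    /* handler must not run before fg_pid is set */
    if ((pid = fork()) < 0) {
        sigprocmask(SIG_SETMASK, &prev, NULL);
        return -1;
    }
    if (pid == 0)
        _exit(0);
    fg_pid = pid;
    sigprocmask(SIG_SETMASK, &prev, NULL);
    waitfg(pid);
    return fg_pid;
}
```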
The parent needs to block the SIGCHLD signals in this way in order to avoid the race condition where the child is reaped by sigchld handler (and thus removed from the job list) before the parent calls addjob.
In addition, the parent must block SIGCHLD before calling fork and unblock it only after calling addjob. Otherwise the child may finish first, and sigchld_handler would call deletejob before the parent has called addjob, which is a race condition.
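The ordering can be sketched with a tiny hypothetical job list (a plain PID array; addjob, deletejob and njobs here are simplified stand-ins, not tsh's versions). The numbered comments mark the block, addjob, unblock sequence; the child restores the mask it inherited before execve.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

#define MAXJOBS 16
static pid_t jobs[MAXJOBS];                  /* hypothetical mini job list, 0 = free */

static void addjob(pid_t pid)
{
    for (int i = 0; i < MAXJOBS; i++)
        if (jobs[i] == 0) { jobs[i] = pid; return; }
}

static void deletejob(pid_t pid)
{
    for (int i = 0; i < MAXJOBS; i++)
        if (jobs[i] == pid) { jobs[i] = 0; return; }
}

int njobs(void)
{
    int n = 0;
    for (int i = 0; i < MAXJOBS; i++)
        n += jobs[i] != 0;
    return n;
}

static void sigchld_handler(int sig)
{
    int olderrno = errno;
    pid_t pid;
    (void)sig;
    while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
        deletejob(pid);
    errno = olderrno;
}

void install_sigchld(void)
{
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGCHLD, &sa, NULL);
}

/* Start argv as a job without the "deletejob before addjob" race. */
pid_t start_job(char *const argv[])
{
    sigset_t mask, prev;
    pid_t pid;

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &prev);      /* 1. block SIGCHLD before fork */
    if ((pid = fork()) < 0) {
        sigprocmask(SIG_SETMASK, &prev, NULL);
        return -1;
    }
    if (pid == 0) {
        sigprocmask(SIG_SETMASK, &prev, NULL); /* child inherited the mask: undo it */
        setpgid(0, 0);                         /* put the child in its own process group */
        execve(argv[0], argv, environ);
        _exit(127);
    }
    addjob(pid);                               /* 2. handler cannot run yet */
    sigprocmask(SIG_SETMASK, &prev, NULL);     /* 3. now deletejob may run */
    return pid;
}

/* Test helper: sleep in sigsuspend until the handler has emptied the list. */
void wait_until_no_jobs(void)
{
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &prev);
    while (njobs() > 0)
        sigsuspend(&prev);                     /* atomically unblock and wait */
    sigprocmask(SIG_SETMASK, &prev, NULL);
}
```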
When you run your shell from the standard Unix shell, your shell is running in the foreground process group. If your shell then creates a child process, by default that child will also be a member of the foreground process group. Since typing ctrl-c sends a SIGINT to every process in the foreground group, typing ctrl-c will send a SIGINT to your shell, as well as to every process that your shell created, which obviously isn’t correct.
Here is the workaround: After the fork, but before the execve, the child process should call setpgid(0, 0), which puts the child in a new process group whose group ID is identical to the child’s PID. This ensures that there will be only one process, your shell, in the foreground process group. When you type ctrl-c, the shell should catch the resulting SIGINT and then forward it to the appropriate foreground job (or more precisely, the process group that contains the foreground job).
To keep ctrl-c from killing our own shell along with the job, the child calls setpgid(0, 0) to move itself into a different process group.
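A small check of what setpgid(0, 0) does: the child ends up in a process group whose ID equals its own PID, so it no longer shares the shell's group. The helper child_gets_own_group is hypothetical and reports the result through the child's exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child's PGID equals its PID after setpgid(0, 0),
 * 0 if not, -1 on error. */
int child_gets_own_group(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);                         /* new group, PGID = own PID */
        _exit(getpgrp() == getpid() ? 0 : 1);  /* report the result via exit status */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```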
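Combining the discussion above, the revised trace03 & trace04 code is as follows:
Note: the comment at the beginning of eval shows that SIGINT (ctrl-c) and SIGTSTP (ctrl-z) are not handled yet; these are solved together in trace06.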
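When we press ctrl-c or ctrl-z, the kernel sends SIGINT or SIGTSTP to our shell (now the only process in the foreground process group), so we also need sigint_handler (and likewise sigtstp_handler) to forward the signal to the corresponding FG job.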
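The forwarding itself is one kill call with a negative PID, which targets the whole process group. Below is a minimal sketch, assuming a hypothetical fg_pid variable in place of fgpid(jobs); demo_forward_sigint is a hypothetical driver in which raise(SIGINT) stands in for ctrl-c reaching the shell, and a pipe makes sure the child is ready before the signal is sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for fgpid(jobs): PID (= PGID) of the FG job, 0 if none. */
static volatile sig_atomic_t fg_pid = 0;

/* Forward the signal to the whole FG process group; kill(-pid, sig)
 * addresses process group |pid|. The same body works for SIGTSTP. */
static void sigint_handler(int sig)
{
    int olderrno = errno;
    pid_t pid = fg_pid;
    if (pid != 0)
        kill(-pid, sig);
    errno = olderrno;
}

/* Returns the signal that terminated the child, 0 if none, -1 on error. */
int demo_forward_sigint(void)
{
    struct sigaction sa, old;
    int fds[2], status;
    char c;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        signal(SIGINT, SIG_DFL);             /* default action: terminate */
        setpgid(0, 0);
        close(fds[0]);
        if (write(fds[1], "r", 1) != 1)      /* tell the parent we are ready */
            _exit(1);
        for (;;)
            pause();
    }
    close(fds[1]);
    setpgid(pid, pid);                       /* also set in the parent to avoid a race */
    if (read(fds[0], &c, 1) != 1) {
        close(fds[0]);
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    close(fds[0]);

    sa.sa_handler = sigint_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGINT, &sa, &old);
    fg_pid = pid;
    raise(SIGINT);                           /* the "shell" receives ctrl-c */
    fg_pid = 0;
    sigaction(SIGINT, &old, NULL);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```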
sigchld_handler also needs new code to handle children that were terminated by SIGINT (ctrl-c) or stopped by SIGTSTP (ctrl-z):
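One way to organize that logic is to classify each status returned by waitpid(-1, &status, WNOHANG | WUNTRACED) and act on the category. The enum, classify, and the demo below are hypothetical; the demo uses SIGSTOP (which cannot be caught or ignored) to produce a stop, then SIGKILL to produce a signal termination.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_event { CHILD_EXITED, CHILD_SIGNALED, CHILD_STOPPED, CHILD_OTHER };

/* In a sigchld_handler loop: EXITED or SIGNALED (ctrl-c: WTERMSIG == SIGINT)
 * means deletejob; STOPPED (ctrl-z: WSTOPSIG == SIGTSTP) means mark the job ST. */
enum child_event classify(int status)
{
    if (WIFEXITED(status))
        return CHILD_EXITED;
    if (WIFSIGNALED(status))
        return CHILD_SIGNALED;
    if (WIFSTOPPED(status))
        return CHILD_STOPPED;
    return CHILD_OTHER;
}

/* Hypothetical demo: observe a stop, then a kill, of the same child. */
int observe_stop_then_kill(enum child_event *first, enum child_event *second)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);
        _exit(0);
    }
    if (waitpid(pid, &status, WUNTRACED) != pid)  /* without WUNTRACED, stops are not reported */
        return -1;
    *first = classify(status);
    kill(pid, SIGKILL);                           /* SIGKILL also ends a stopped process */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    *second = classify(status);
    return 0;
}
```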
First, modify builtin_cmd to also recognize bg and fg.
do_bgfg
Before implementing it, let's look at the hint in the writeup:
Each job can be identified by either a process ID (PID) or a job ID (JID), which is a positive integer assigned by tsh. JIDs should be denoted on the command line by the prefix ’%’. For example, “%5” denotes JID 5, and “5” denotes PID 5.
The bg <job> command restarts <job> by sending it a SIGCONT signal, and then runs it in the background. The <job> argument can be either a PID or a JID.
The fg <job> command restarts <job> by sending it a SIGCONT signal, and then runs it in the foreground. The <job> argument can be either a PID or a JID.
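The two pieces of do_bgfg can be sketched separately: parsing the job argument ("%5" versus "5") and restarting the job with SIGCONT sent to its process group. Everything here is hypothetical: parse_job_arg, the minimal struct job (loosely mirroring tsh's FG/BG/ST states), resume_job, and the demo_resume driver, which stops a child and then resumes it.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* "%5" -> 5 with *is_jid = 1; "5" -> 5 with *is_jid = 0; -1 if malformed. */
long parse_job_arg(const char *arg, int *is_jid)
{
    char *end;
    long n;

    if (arg == NULL)
        return -1;
    *is_jid = (arg[0] == '%');
    if (*is_jid)
        arg++;
    if (!isdigit((unsigned char)arg[0]))
        return -1;
    errno = 0;
    n = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || n <= 0)
        return -1;
    return n;
}

enum job_state { JOB_FG, JOB_BG, JOB_ST };                 /* hypothetical */
struct job { pid_t pid; int jid; enum job_state state; };  /* hypothetical */

/* bg/fg: SIGCONT to the job's whole process group, then update its state.
 * For fg, the caller would then waitfg(job->pid). */
int resume_job(struct job *job, int to_fg)
{
    if (kill(-job->pid, SIGCONT) < 0)
        return -1;
    job->state = to_fg ? JOB_FG : JOB_BG;
    return 0;
}

/* Hypothetical demo: returns the exit status of a child resumed by resume_job. */
int demo_resume(void)
{
    int status;
    struct job job;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);
        raise(SIGSTOP);                      /* stop until someone sends SIGCONT */
        _exit(5);
    }
    setpgid(pid, pid);
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
        return -1;
    job.pid = pid;
    job.jid = 1;
    job.state = JOB_ST;
    if (resume_job(&job, 0) < 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```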
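So the overall steps are roughly:
- use kill to send SIGCONT to the target job
- afterwards, apply the same handling logic that eval uses after execve
The final result is as follows: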
These traces mainly check whether the earlier steps missed anything, so I won't go into detail here.
exit() vs. _exit()
First, carefully read the contents of man 3 exit and man 2 _exit:
The exit() function causes normal process termination and the value of status & 0377 is returned to the parent (see wait(2)).
All functions registered with atexit(3) and on_exit(3) are called, in the reverse order of their registration. All open stdio(3) streams are flushed and closed. Files created by tmpfile(3) are removed.
Note that a call to execve(2) removes registrations created using atexit(3) and on_exit(3).
The function _exit() is like exit(3), but does not call any functions registered with atexit(3) or on_exit(3).
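Therefore, if the child's execve() fails after fork(), the child should call _exit() rather than exit(). exit() would run the normal cleanup inside the child: it flushes the stdio buffers copied from the parent (so buffered output can appear twice), removes files created by tmpfile(3), and calls the atexit(3) and on_exit(3) handlers that the parent registered and that were only meant to run in the parent.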
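The difference can be observed directly. In this hypothetical demo, the "parent" registers an atexit handler that writes one byte to a pipe; the child then ends with either exit() or _exit(), and the parent counts how many bytes arrive. Only exit() runs the inherited handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;                   /* write end of a pipe, or -1 */

/* Hypothetical atexit handler registered by the "parent" program. */
static void parent_cleanup(void)
{
    if (report_fd >= 0) {
        ssize_t r = write(report_fd, "x", 1);  /* one byte per run */
        (void)r;
    }
}

/* Returns how many times parent_cleanup ran inside the child, which ends
 * with exit() if use_exit != 0 and with _exit() otherwise. */
int cleanup_runs_in_child(int use_exit)
{
    static int registered = 0;
    int fds[2], n = 0;
    char c;
    pid_t pid;

    if (!registered) {
        atexit(parent_cleanup);
        registered = 1;
    }
    if (pipe(fds) < 0)
        return -1;
    report_fd = fds[1];
    fflush(stdout);                  /* otherwise exit() in the child would flush a copy */
    if ((pid = fork()) < 0) {
        close(fds[0]);
        close(fds[1]);
        report_fd = -1;
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (use_exit)
            exit(0);                 /* runs atexit handlers, flushes stdio */
        _exit(0);                    /* skips both */
    }
    close(fds[1]);                   /* read() sees EOF once the child is gone */
    report_fd = -1;
    while (read(fds[0], &c, 1) == 1)
        n++;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```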
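This section will be revisited later; the current version still needs improvement.
At this point all the traces show correct results (or at least have a chance to), but if you paid attention in class, you will notice that our signal handlers completely ignore signal safety. Under POSIX, a signal handler should only call async-signal-safe functions, and printf is not one of them!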
An async-signal-safe function is one that can be safely called from within a signal handler. Many functions are not async-signal-safe. In particular, nonreentrant functions are generally unsafe to call from a signal handler.
The kinds of issues that render a function unsafe can be quickly understood when one considers the implementation of the stdio library, all of whose functions are not async-signal-safe.
When performing buffered I/O on a file, the stdio functions must maintain a statically allocated data buffer along with associated counters and indexes (or pointers) that record the amount of data and the current position in the buffer. Suppose that the main program is in the middle of a call to a stdio function such as printf(3) where the buffer and associated variables have been partially updated. If, at that moment, the program is interrupted by a signal handler that also calls printf(3), then the second call to printf(3) will operate on inconsistent data, with unpredictable results.
I actually ran into this while testing the trace files: a signal arrived while main was in the middle of a system call, so the signal handler ran partway through it.
To avoid problems with unsafe functions, there are two possible choices:
- Ensure that (a) the signal handler calls only async-signal-safe functions, and (b) the signal handler itself is reentrant with respect to global variables in the main program.
- Block signal delivery in the main program when calling functions that are unsafe or operating on global data that is also accessed by the signal handler.
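Choice (a) means building output only from async-signal-safe pieces. A minimal sketch in the spirit of csapp.c's sio_* functions (the names safe_strlen, safe_ltoa, safe_puts and safe_putl are hypothetical, not the csapp.c code): they use only write(2) and keep no static state.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

size_t safe_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Write v in decimal into buf (at least 21 bytes); return the length. */
size_t safe_ltoa(long v, char *buf)
{
    char tmp[20];
    size_t n = 0, i = 0;
    unsigned long u = v < 0 ? 0UL - (unsigned long)v : (unsigned long)v;

    do {                              /* least significant digit first */
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (v < 0)
        buf[i++] = '-';
    while (n > 0)                     /* reverse into buf */
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return i;
}

ssize_t safe_puts(int fd, const char *s)
{
    return write(fd, s, safe_strlen(s));
}

ssize_t safe_putl(int fd, long v)
{
    char buf[21];
    size_t len = safe_ltoa(v, buf);
    return write(fd, buf, len);
}
```

Inside a handler, a message such as "Job (1234) stopped" would then be assembled from several calls: safe_puts(STDOUT_FILENO, "Job ("), safe_putl(STDOUT_FILENO, (long)pid), safe_puts(STDOUT_FILENO, ") stopped\n").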
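So the feasible approach is:
We split the revision into two steps:
CS:APP actually provides a helper file, csapp.c, which contains functions that can be used inside signal handlers. The parts we need are excerpted below: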
unix_error can simply be replaced with sio_error, but printf is more troublesome: sio_puts does not support format specifiers (%d, %s, etc.), so the signal handler only sets a flag, and main later calls printf based on that flag.
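A sketch of that flag pattern, assuming a single pending report at a time (record_handler, install_record_handler and drain_report are hypothetical). The handler only stores sig_atomic_t values; the formatted printing happens in main, with signals blocked while the flags are read and cleared so the two fields stay consistent.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Written by the handler, read by main. */
static volatile sig_atomic_t pending = 0;
static volatile sig_atomic_t pending_sig = 0;

static void record_handler(int sig)
{
    pending_sig = sig;       /* no printf here: not async-signal-safe */
    pending = 1;
}

void install_record_handler(int signum)
{
    struct sigaction sa;
    sa.sa_handler = record_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(signum, &sa, NULL);
}

/* Called from main (e.g. at the top of the read/eval loop). Formats the
 * pending report into out; returns its length, or 0 if nothing was pending. */
int drain_report(char *out, size_t len)
{
    sigset_t all, prev;
    int n = 0;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &prev);     /* read and clear the flags atomically */
    if (pending) {
        n = snprintf(out, len, "caught signal %d\n", (int)pending_sig);
        pending = 0;
    }
    sigprocmask(SIG_SETMASK, &prev, NULL);
    return n;
}
```

In main, a non-zero return would simply be followed by printf("%s", out), which is safe there because main is not a signal handler.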
In this assignment, the shared global variable is the jobs object. The functions that access it are listed below:
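There are two possible approaches:
- approach 1: move deletejob and fgpid out of the signal handler
- approach 2: block signals around every function that accesses jobs
Because of the assignment's requirements, approach 1 is actually harder to implement (mainly because implementing waitfg becomes very cumbersome), so I use approach 2 and wrap every function that accesses jobs with sigprocmask.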
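On closer thought, though, not everything needs to be wrapped: pid2jid in eval and fgpid in the handler can be left as they are. For the former, a conflict would at worst print incorrect data; for the latter, every function in main that accesses jobs is already wrapped with sigprocmask, so no conflict is possible.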
To keep things uncluttered, only the key parts are listed below; changes that merely add sigprocmask are omitted.
csapp