quiz 7A - shell

# quiz 7A - shell contributed by < `limaoxd` > ## 專題說明本專題是 [Unix Shell](https://en.wikipedia.org/wiki/Unix_shell) 的實作 , 並能夠使用一部份 **shell** 常見的指令 . **shell** 簡單的迴圈概念如以下步驟, * **Read :** 從 stdin 標準輸入流讀取命令. * **Parse :** 分析並拆解成程式名和參數. * **Execute :** 執行拆解後的命令. 先跳過輸入的字串拆解成程式名和參數,會留到**程式解析**. :::warning 改進漢語表達，力求通順和簡潔 :notes: jserv ::: * 如果需要將呼叫的程式拿來執行，就要分成不同的行程. 使用 `fork()` 系統呼叫，建立目前行程的副本，以`fork` 的傳回值進行判斷: 父行程 (即原本的行程) 得到的回傳值為 **PID(Process ID)** , 而子行程為 0 * 接下來就需要將子行程替換成命令要求的程式 , 這時候透過 `execvp()` 將行程做替換 , 與此同時父行程能夠繼續的執行其他事情,如果使用 `wait()` 的話,則繼續**監視**子行程. * 有些功能不能將功能交給子行程做 , 常見的如 `cd` , `exit` 等等的必須由父行程執行 , 比如說使用子行程做 `cd` 的話,並沒有改變 **shell** 本身, 所以需要自己寫個函式去執行這些功能 . ::: info 更加詳盡的說明 , 可以參考 [Tutorial: Write a Shell in C](https://brennan.io/2015/01/16/write-a-shell-in-c/) ::: --- ## 程式解析 #### ++prompt 命令列起始符號++ ``` cpp static void prompt() { write(2, "$ ", 2); } ``` * `prompt()` 為 **command_line** 的開頭 , 而將 `$` 命令列起始符號印出來 #### ++fatal 報錯處理++ ``` cpp /* Display error message, optionally - exit */ static void fatal(int retval, int leave) { if (retval >= 0) return; write(2, "?\n", 2); if (leave) exit(1); } ``` * 報錯處理 * `fatal` 目的為判斷行程執行是否出錯 , 其參數有 : `retval` , `leave` * `retval` 為判斷是否出錯 , `leave` 為是否終止行程的參數 * 行程出錯的話,顯示 `?` 來做為錯誤訊息 :::warning 或許能夠做到改造 `fatal` 函式 , 善用 **stderr** 的話 , 使其錯誤訊息更加詳盡 . ::: #### ++字符判斷函式++ ```cpp static inline int is_delim(int c) { return c == 0 || c == '|'; } static inline int is_redir(int c) { return c == '>' || c == '<'; } static inline int is_blank(int c) { return c == ' ' || c == '\t' || c == '\n'; } static int is_special(int c) { return is_delim(c) || is_redir(c) || is_blank(c); } ``` * 這些函式是判斷字串是否為特殊字符而建立的 . 之後在 `run()` 做**字串解析**時會用到 ### 主程式分析 #### ++main 主函式++ ``` cpp int main() { while (1) { prompt(); char buf[512] = {0}; /* input buffer */ char *c = buf; if (!fgets(c + 1, sizeof(buf) - 1, stdin)) exit(0); for (; *++c;) /* skip to end of line */ ; run(c, 0); } return 0; } ``` * 為了獲得輸入的命令 , 一開始 `main()` 透過 `fgets()` 將 **buffer** 抓下來, ``` graphviz digraph array1 { node [shape=plaintext, fontcolor=black, fontsize=18]; "Pointers:" -> "Array:" [color=white]; node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; pointers [label="<f0> c | <f1> c+1 | <f2> c+2 | <f3> c+3 | <f4> ... | <f5> c+511", color=white]; array [label="<f0> buf[0] | <f1> buf[1] | <f2> buf[2] | <f3> buf[3] | <f4> ... | <f5> buf[511]", fillcolor=white, style=filled]; { rank=same; "Pointers:"; pointers } { rank=same; "Array:"; array } } ``` * 不過 `fgets()` 將 `c+1` 也就是 `buf[1]` 的位置 ,設定為字串開頭有一個重要的原因 , 在字串拆解後會一併解釋 . 在判斷完是否成功的 `fgets()` 後 , 開始用 **for** 迴圈跑到字串的結尾 (該處為 `\0` ) ``` graphviz digraph array2 { node [shape=plaintext, fontcolor=black, fontsize=18]; "Pointers:" -> "Values:" [color=white]; node [shape=record, fontcolor=black, fontsize=14, width=4.75, fixedsize=true]; pointers [label="<f0> c-17| <f1> c-16| <f2> c-15| <f3> c-14| <f4> | <f5> c-1| <f6> c", color=white]; array [label="<f0> 0 | <f1> e | <f2> c | <f3> h | <f4> ... | <f5> d| <f6> 0", fillcolor=white, style=filled]; { rank=same; "Pointers:"; pointers } { rank=same; "Values:"; array } } ``` :::warning 該處可以使用 [Binary search algorithm](https://en.wikipedia.org/wiki/Binary_search_algorithm) 將**時間複雜度**從 ***O(n)*** 變成 ***O(log n)*** 不過長度 511 要省也省不到哪裡去. > 尤其是字串長度小於 8 的話 , 不用二分搜會比較快將遍歷改良成 [Binary search algorithm](https://en.wikipedia.org/wiki/Binary_search_algorithm) 👇 ``` cpp int lwBound = 1 , upBound = 511 , p; while(lwBound < upBound){ /* if lwBound >= upBound then break */ p = (lwBound + upBound) / 2; if (!*(c + p)) /* if pivot value == NULL */ upBound = p; else lwBound = p+1; } c = c+p; ``` > 需設計實驗來證明該方法的成效 ::: ### 父子行程解析 > 我先將 `run()` 切成 2 個部分 , 並拆成許多小部分做講解. #### ++**run() - part 1**++ ``` cpp static void run(char *c, int t) { char *redir_stdin = NULL, *redir_stdout = NULL; int pipefds[2] = {0, 0}, outfd = 0; char *v[99] = {0}; char **u = &v[98]; /* end of words */ ... ``` * 函式參數 * `c` 為紀錄 **buf** 位置的指標 (從 `main()` 那邊傳遞過來的) * `t` **filedescriptor** 的數值 , 若為 `0` , 代表沒有使用 **pipe** 將 **stdout** 標準輸出流做替換 * 宣告變數 * `redir_stdin` 為紀錄 **infile** 字串 * `redir_stdout` 則記錄 **outfile** * `v` 是紀錄要 **split** 後的指標字串陣列 (可視為 **2** 維字元陣列), 其 **row** 數量為 **99** * `u` 則是類似 `main()` 裡的 `c` 一樣 ,　但是它記錄的是 `v` 的 **row** 的位置 > 接下來 , 便開始將字串拆解 . > 由字串結尾 ( 一開始的 c ) 開始往字串前遍歷 , 並且分別遇到 `is_delim()` , `!is_special()` 和 `is_redir()` 後,分別作出不同的處理. ```cpp if (is_delim(*c)) /* if NULL (start of string) or pipe: break */ break; ``` * 遍歷時 , 遇到了 `\0` 或是 **pipe 字元** * 該情形是為了判斷是否到達了 **0** 的位置又或者是遇到 **pipe** * 判斷到達了頭的原因是因為在使用 `fgets()` 時 , 將 `buf[0]` 保留 \0, 這就是為甚麼要將 `*fgets(char *s, int size, FILE *stream)` 的 `s` , 位置設定在 `c+1` 的原因了 . * 最後直接 **break** 來停止讀取字串 :::danger 如果沒有做保留一個 \0 的話,讀到 -1 位置的值時 , 不一定是 \0. ::: ```cpp if (!is_special(*c)) { /* Copy word of regular chars into previous u */ char *delim = " |<>\n\t"; u--; while(!is_special(*c)) c--; *u = strtok(c+1,delim); } ``` * 遍歷到字串的結尾 * before ``` graphviz digraph structs { rankdir = LR; node [shape=record]; hashTable [label="<f0>0|<f1>...|<f2>95|<f3>96|<f4>97|<f5>u[0] 98"]; hashTable:f5 -> hashTable:f4 [label = "u--"]; } ``` * * after (repeat twice) ``` graphviz digraph structs { rankdir = LR; node [shape=record]; hashTable [label="<f0>0|<f1>...|<f2>95|<f3>u[0] 96|<f4>u[1] 97|<f5>u[2] 98"]; str2 [label="<f0> echo"] str1 [label="<f0> hello"]; str [label="<f0> world"]; hashTable:f3 -> str2:f0; hashTable:f4 -> str1:f0; hashTable:f5 -> str:f0; } ``` * * 開始讀取到字符後便對其進行遍歷 , 直到碰到特殊字符 * 這樣可以抵達到要==拆解字串的開頭前== * 之後進行字串分割 , 這裡我採用了 `strtok()` , 相較於 `strncpy()` 和 `strndup()` , 不需要在程式裡進行 **count** . :::warning 直接使用 `strncpy()` 會導致 [Segmentation fault](https://en.wikipedia.org/wiki/Segmentation_fault) , 必須要在結尾處補上 `\0` 而使用 `strndup()` 則是動態記憶體配置 , 最後需要用 free()　來進行釋放 , 避免 [Memory leak](https://en.wikipedia.org/wiki/Memory_leak) ::: ```cpp if (is_redir(*c)) { /* If < or > */ if (*c == '<') redir_stdin = *u; else redir_stdout = *u; if ((u - v) != 98) u++; } ``` * 遍歷到重新導向字符 * 判斷是重新導向的字符後 , 便對其記錄其為標準輸入亦或輸出 * `redir_stdin` 會記錄下 **infile** 的檔名 , `redir_stdout` 也是如此紀錄 **outfile** * 紀錄完 file 後 , 之後再讀取到字串的話 , 因為 u++ 而將紀錄 file 的字串取代掉 , 不需要保留 file 在 u 裡 ```cpp if ((u - v) == 98) /* empty input */ return; ``` * **part1** 的最後 * 確認 `u` 的位子沒有從 **98** 改動的話 , 則為沒有輸入 , 結束主行程就好. #### ++**run()-part2**++ > 暨剛剛在專題簡介有介紹到主程式有需要自己執行的功能 , 不能交由子行程執行的程式. > 底下會解析該功能如何實現 ```cpp if (!strcmp(*u, "cd")) { /* built-in command: cd */ fatal(chdir(u[1]), 0); return; /* actually, should run() again */ } ``` * 在主行程完成 * 因為由 **shell** 執行 , 並不需要 `fork()` 直接使用 `chdir()` 後 , 即可直接結束行程. * 因為沒叫出子行程 , 如果報錯並不需要進行 **exit** 的動作 . ```cpp if (*c) { pipe(pipefds); outfd = pipefds[1]; /* write end of the pipe */ } pid_t pid = fork(); if (pid) { /* Parent or error */ fatal(pid, 1); if (outfd) { run(c, outfd); /* parse the rest of the cmdline */ close(outfd); /* close output fd */ close(pipefds[0]); /* close read end of the pipe */ } wait(0); return; } ``` ```graphviz digraph pipe { rankdir=LR ranksep=1 a [label = "pipefds[0]"] b [label = "pipe" , shape=record] c [label = "pipefds[1]"] a -> b b -> c } ``` ```graphviz digraph pipe { rankdir=LR ranksep=1 a [label = "front"] c [label = "behind"] a -> c } ``` * 接下來就是檢查 `c` 位置是否不在 `buf[0]` 的位置 , 如果不在代表遇到了 `|` * 這個時候 , 使用 **pipe** 會建立管道從 `pipefds [0]` 送到 `piprfds [1]` , 後面只要用 `dup2()` 取代主程式的 **stdout** 即可達到行程的溝通 . * 雖然說一開始 `u` 是從==c 後面開始遍歷== , 照理來說會使 `|` 後面的程式先執行 , 但是使用 **pipe** 後 , 如果 `|` 前的程式沒有先執行的話 , `|` 後的程式會無法執行 ```cpp if (outfd) { dup2(pipefds[0], 0); /* dup read fd to stdin */ close(pipefds[0]); /* close read fd */ close(outfd); /* close output */ } if (t) { dup2(t, 1); /* replace stdout with t */ close(t); } ``` >這邊就很好的解釋為甚麼 **pipe** 會先等待前面的程式執行完 * 底下的列表為說明預設的 **fd** , 在進行 `dup2()` 或 `close()` 對輸入或輸出流進行操作 | Handle | Name | Description | | -------- | -------- | -------- | | 0 | stdin | Standard input | | 1 | stdout | Standard output | | 2 | stderr | Standard error | * 判斷 `outfd` 的屬於 **pipe** 後面的程式 , 使用 `dup2()` 將 **stdin** 變成 **pipefds [0]** , 如果輸入為空的話 , 便會繼續等待 . * 這個時候前面的程式需要將 **stdout** 替換成 `pipefds [1]` 後 , 透過==管道輸出到後面==的程式.這樣就可以達成前面的程式先跑 , 後面的程式繼續 . ```cpp if (redir_stdin) { close(0); /* replace stdin with redir_stdin */ fatal(open(redir_stdin, 0), 1); } if (redir_stdout) { close(1); fatal(creat(redir_stdout, 438), 1); /* replace stdout with redir */ } ``` * 剛剛在 `!is_special()` 那邊做字串拆解時 , 有將 **infile** , **outfile** 做儲存 , 這時候將輸入或輸出流替換成 **file** 後即可完成重新導向 :::info 1. `int open(const char *pathname, int flags)` * 第二個參數為 flags 分別有 `O_CREAT` , `O_EXCL` , `O_NPCTTY` 等等 , 對於文件的讀取以不同方式打開亦能創建檔案 . 就算文件開啟後 , 依然能以 **fcntl** 進行修改 . 可以參考 : https://linux.die.net/man/2/open 2. `int creat(const char *path, mode_t mode); ` * 等同於部分 `open()` 功能的函式 > A call to creat() is equivalent to calling open() with flags equal to O_CREAT|O_WRONLY|O_TRUNC. >來源 : https://man7.org/linux/man-pages/man2/open.2.html * 不同於 `flags` , `mode` 代表了使用者權限 (**permission**) * 第一周提供的 [鳥哥的 linux 私房菜第五章](http://linux.vbird.org/linux_basic/0210filepermission.php) 中有提到使用 `ls -al` 可以列出檔案的權限如 **-rwxrwx---** , * 比對參數 **S_IRWXU(700)** **S_IRUSR(400)** **S_IWUSR(200)** 得知參數為八進位 * 由此得知將 `438` 轉成八進位 `666` , 為 **-rw-rw-rw-** **User , Group , World** 皆可讀可寫 ::: --- ## 延伸問題 ### 導向到 **stderr** 上述程式並未含有 `2>` 的延伸功能 , 故對 `run()` 和 `fatal()` 做小幅度的改變 * 改動部分 * 在 `!is_special()` 判斷中 , 對 `is_redirect()` 額外新增對 c[-1] 是否為 **2** 做了判斷 . * 使用 `dup2()` 將 **stderr** 導入到 `redir_stdout` ``` cpp int redir_stderr = 0; ``` * 新增一個 **int** 變數 , 來確認是否使用 **stderr** 導入到 `redir_stdout` ```cpp if (is_redir(*c)) { /* If < or > */ if (*c == '<') redir_stdin = *u; else{ redir_stdout = *u; if (c[-1] == '2') { redir_stderr = 2; c--; } } if ((u - v) != 98) u++; } ``` * 更改 `redir_stderr` 來確認是有導向到 **stderr** * 如改動部分所式 , 確認 `c[-1]` 是否有 **2** , 如果有將 `redir_stderr` 更改成 2 * 並且再做一次 `c--` 避免被紀錄在 `u` 裡 ```cpp static int fatal(int retval, int leave) { if (retval >= 0) return retval; write(2, "?\n", 2); if (leave) exit(1); } ``` ``` cpp if (redir_stdout) { close(1); outfd = fatal(creat(redir_stdout, 438),1); /* replace stdout with redir */ if(redir_stderr){ dup2(outfd, redir_stderr); close(outfd); } } ``` * 將 `fatal()` 改為 **int** 函式 , 在沒有出錯的情況回傳 `retavl` * 當確認要導向到 **stderr** , 使用 `dup2()` 實作出此功能輸入以下 **command** 來確認是否執行成功 ```shell $ command1 2> x $ cat x ``` 輸出結果 ⬇ ```shell ? ``` 最後 `cat` 出來的內容為 `?` 符合 **fatal()** 裡的實作的報錯訊息 --- >還剩下 `>>` (append) 功能和 `&` (背景執行) , 和搭配 User - Mode Linux 用 `picosh.c` 取代 **init** 行程