
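# Process: a familiar stranger

## Background

Most of us have heard of processes in an operating systems course, or have observed process behavior in practice with commands such as `strace` and `ps`, and have come across related terms such as thread and coroutine. But what exactly is a process? This note takes a closer look at what a process really is and at the techniques built around it.

## What is a process?

Put simply, a process is a **running program**: an abstraction of the processor (e.g. the CPU). As a definition of the term, that may be enough. But why do we need processes, and what can they do?

Before going deeper, let us fix some terminology. An OS implementation is split into low-level, hardware-oriented **mechanisms** and high-level **policies**, which lean toward abstract algorithms.

On a modern computer, users want to run many programs at once, and many running programs means many processes. But if a process is an abstraction of the processor, isn't the number of runnable processes limited by the number of processors? An ordinary laptop might have only eight cores; can it really run only eight programs at a time? To solve this problem, the OS creates the **illusion of many CPUs**, ideally a **nearly endless supply of CPUs**. The most basic way to do this is **time sharing**: run one process for a short while, then switch to another.

More concretely, a process can be described as **the set of resources it can use or affect**:

* Memory **address space**: the range of memory the process can access
* Registers
* I/O information

Next, a brief overview of the operations a process API offers:

* Create
  Creating a process means loading the executable program stored on disk into the process's address space. Modern operating systems load **lazily**: using techniques such as paging and swapping, they bring in only the code and data needed at the moment. The OS then allocates memory for regions such as the stack and the heap and performs I/O setup (on UNIX systems, every process starts with three open file descriptors: standard input, standard output, and standard error).
* Destroy
* Wait
* Miscellaneous Control
* Status
  The OS records each process's status in data structures such as the process list and the process control block (PCB).

## Process API

### fork()

The `fork()` system call creates a new process. Consider the following code:

```clike
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main () {
    printf("Hello I am pid:%d\n", (int) getpid());
    int rc = fork();
    if (rc < 0) {
        /* fork failed: no child was created */
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        /* child: fork() returned 0 */
        printf("hello, I am (pid:%d), my rc is %d\n", (int) getpid(), rc);
    } else {
        /* parent: fork() returned the child's pid */
        printf("hello, I am (pid:%d), my rc is %d\n", (int) getpid(), rc);
    }
    return 0;
}
```

```bash
$ ./main
Hello I am pid:12301
hello, I am (pid:12301), my rc is 12302
hello, I am (pid:12302), my rc is 0
```

In the parent, `fork()` returns the child's pid. The child behaves as if it had just returned from `fork()` itself and carries on from there, but the return value it sees is 0.

Note also that this program is nondeterministic: which process prints first depends on the CPU scheduler.

For details, read [fork(2)](https://man7.org/linux/man-pages/man2/fork.2.html). A few key points:

* The child process and the parent process run in separate memory spaces.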
  At the time of fork() both memory spaces have the same content. Memory writes, file mappings (mmap(2)), and unmappings (munmap(2)) performed by one of the processes do not affect the other.
* The parent and the child still differ in several ways:
  * The child does not inherit its parent's memory locks
  * Process resource utilizations (getrusage(2)) and CPU time counters (times(2)) are reset to zero in the child.
  * The child does not inherit semaphore adjustments from its parent (semop(2)).
  * The child does not inherit timers from its parent
  * The child does not inherit outstanding asynchronous I/O operations from its parent

### wait()

The output of the program above can change with CPU scheduling; the child may well print first. How can we guarantee the order of execution? By using [wait(2)](https://man7.org/linux/man-pages/man2/waitpid.2.html).

Many people think of `wait()` as simply waiting for a child process to end. That is not wrong, but the man page defines it as follows:

> All of these system calls are used to wait for state changes in a child of the calling process, and obtain information about the child whose state has changed.

The wording is **state change**, which covers more than going from running to terminated: a child being stopped or resumed by a signal counts as well. Note that plain `wait()` returns only when a child terminates; to be told about stops and resumes, the parent has to call `waitpid()` with the `WUNTRACED` or `WCONTINUED` option.

Suppose we rewrite the parent branch as follows (and add `#include <sys/wait.h>` for the declaration of `wait()`):

```diff
     } else {
-        printf("hello, I am (pid:%d), my rc is %d\n", (int) getpid(), rc);
+        int wc = wait(NULL);
+        printf("hello, I am (pid:%d), my wc is %d, my rc is %d\n", (int) getpid(), wc, rc);
     }
     return 0;
 }
```

```bash
$ ./main
Hello I am pid:13726
hello, I am (pid:13727), my rc is 0
hello, I am (pid:13726), my wc is 13727, my rc is 13727
```

Now the child process is guaranteed to finish before the parent prints its line.

### exec()

If you want a process to run a program different from itself, use [exec(3)](https://man7.org/linux/man-pages/man3/exec.3.html). Pay attention to this part of the description:

> The exec() family of functions replaces the current process image with a new process image.

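
This API does not create a new process. It replaces the old process image with a new one, and, importantly, **a successful call to `exec()` never returns**.

```clike
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main () {
    printf("Hello I am pid:%d\n", (int) getpid());
    int rc = fork();
    if (rc < 0) {
        fprintf(stderr, "fork failed\n");
        exit(1);
    } else if (rc == 0) {
        printf("hello, I am (pid:%d), my rc is %d\n", (int) getpid(), rc);
        /* argv for the new program; the list must end with NULL */
        char *const myargs[3] = {"wc", "count.c", NULL};
        execvp(myargs[0], myargs);
        /* reached only if execvp() fails */
        printf("This shouldn't print out\n");
    } else {
        /* parent: block until the child terminates */
        int wc = wait(NULL);
        printf("hello, I am (pid:%d), my wc is %d, my rc is %d\n", (int) getpid(), wc, rc);
    }
    return 0;
}
```

This program still has both a parent process and a child process, because it calls `fork()`. What `execvp()` overwrites is the child process.

:::info
A very common use of these three APIs together is in a shell.
The shell is itself a process.
When the user types a command, the shell calls `fork()` to create a child process that is a nearly identical copy of itself.
The child then passes the command and its arguments to `exec()`, and the program that `exec()` loads replaces the child process.
Meanwhile the original parent process `wait()`s until the child process finishes.
:::

## Limited Direct Execution

### Direct execution

Direct execution means just running the program directly on the CPU. It is the simplest and most straightforward approach, and it is also very fast. But how can time sharing be built on top of such a mechanism? Several problems have to be considered:

1. **Restricted Operation**
   When a process wants to perform an operation such as I/O, or needs more privileges, how can the OS let it do so without handing it control of the whole system? The most familiar approach is to distinguish modes of execution, namely kernel mode and user mode. A user-mode program that wants to perform a more privileged action has to make a **system call**.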