--- title: Systems Programming 筆記 part 1 tags: [2024_Fall] --- Systems Programming 筆記 part 1 === 2024 Fall 台大資工系必修 老師:鄭卜壬 1. OS Concept & Intro. to UNIX 2. UNIX History, Standardization & Implementation 3. File I/O 4. Standard I/O Library 5. Files and Directories 6. System Data Files and Information 7. Environment of a Unix Process 8. Process Control 9. Signals 10. Inter-process Communication 11. Thread Programming 12. Networking # ch1 UNIX System Overview `/etc/passwd` contains login names, encrypted passwd, uid, gid, home dir, **shell program** # ch3 File I/O P.14~ ## File Descriptor ### Reference Count system file table只會在`fork`、`dup`或`dup2`才會增加,否則就算同個process開同個檔案很多次也會開很多個entry ## open, openat ### openat: `int openat(int fd, const char *path, int oflag, ... /* mode_t mode */ );` 開啟時相對fd 如果path是絕對路徑,那就和open一模一樣。但若是相對路徑,比如要創建並開啟`/tmp/abc/def.txt`,先開好`/tmp/abc`的fd,此時若別人把`/tmp/abc`刪掉並重新創建一個symbolic link指向一個不安全的路徑(比如他的資料夾),用fd存取就能避免寫入到錯誤的資料夾內。但未必是真正的atomic,因為別人還是可能在中途刪除`/tmp/abc`,並使fd變成dangling file descriptor。 >Time-of-check-to-time-of-use (TOCTTOU): >atomic,在運作時若工作目錄改變不會受影響,對安全性或效能有幫助,尤其是多執行緒。 ### mode: 只有在檔案不存在而且O_CREAT有開的時候會新建一個符合mode的檔案 ### (non-)block mode * block mode: 比如要讀10 bytes,buffer cache不夠,process就會被suspend。 * non-block mode: 盡可能讀,有可能不完整,或有`EAGAIN`或`EWOULDBLOCK`的錯誤。 在pipe、terminal特別明顯,read write時可能會比較麻煩。 在檔案操作中,**資料同步(data synchronization)** 涉及將檔案的數據從暫存區(buffer cache)寫入到磁碟或其他永久儲存設備,以確保數據的持久性。`open` 的系統呼叫中有與同步相關的選項,可以用來控制數據何時真正地被寫入磁碟,這主要影響檔案系統的可靠性和效能。 ### Data Synchronization 確保資料在每次寫入後立即存到磁碟以防遺失 system calls: * fsync(filedes): data+attr * fdatasync(filedes): data * sync * queues all kernel modified blocks(return immediately) * called by daemon & command `sync` open options: - **`O_DSYNC`**: 每次write都等I/O,只寫會影響read的元數據,比 `O_SYNC` 快一些。 - **`O_RSYNC`**: 讀之前確保要讀的部分的寫入都寫完 - **`O_SYNC`**: 每次寫入後資料會馬上寫入磁碟,降低效能但更安全。 ### Misc open always returns the lowest unopened descriptor: 回收已經close的descriptor ## lseek 移動目前的current file offset(讀寫頭),儲存在open file table * `SEEK_SET` 開頭 * `SEEK_CUR` 目前 * `SEEK_END` 結尾 ### weird things * `lseek(fd,0,SEEK_CUR)`: 取得目前current file offset * seek past the end of the file: 產生hole ## Atomic Operations 避免Time-of-check-to-time-of-use (TOCTTOU)的問題:比如兩個process都先lseek到SEEK_END再寫入,可能會出錯。 ### pread =`lseek`+`read` ### pwrite =`lseek`+`write` ## dup&dup2 ### dup Return the lowest available file descriptor. ### dup2 atomic ## fork fork後,child和parent一起跑,child可能會exec,exit後回到parent wait的地方 ## Efficiency 為什麼RealTime-(UserCPU+SysCPU)會隨著buffer size增加而增加?從buffered 1:2s, unbuffered 8192:6s **read-ahead效果**:一次一個byte UNIX會先讀後面的block,一次太多的話可能來不及。 Here's a refined and improved version of your notes with minor corrections and added clarifications: --- ![](https://hackmd.io/_uploads/B1o1LDixkx.jpg) :::spoiler AI generated notes ### Chapter 3, p.62 ~ FILE I/O - **O_SYNC** → `fsync` (file descriptors) - Forces all modified data and metadata of a file to be written to storage immediately. Equivalent to using `sync()` which also flushes the buffer cache. - **O_DSYNC** → `fdatasync` (file descriptors) - Similar to `O_SYNC`, but only data (not metadata) is synchronized, which can be more efficient. Commonly used by daemons or the `sync` command to update file changes on disk. - **daemon** | Background process that runs independently of terminal sessions. - **fcntl** | Modify file descriptor properties, duplicate descriptors, or set/get file flags. - **Flags**: - `FD_CLOEXEC` | Close file descriptor on `exec` calls, preventing leakage of descriptors to child processes. - File status flags (e.g., `O_APPEND`) modify behavior at the file level. - `procID`, `groupID`, and `file lock` allow control over process ownership and access. - **Access modes**: `O_RDONLY`, `O_WRONLY`, `O_RDWR` control read/write permissions. - **dup** | Duplicates a file descriptor. - Example: `dup2(fd, F_DUPFD)` creates a duplicate starting at a specified index. - **ioctl** | Input/Output control, often used to manage device settings, such as terminal size, or to apply additional security measures. --- ### Examples - **`/dev/fd`** | Special device file representing open file descriptors. - Example: `open("/dev/fd/0", mode) = dup(0)` replicates the standard input. --- ### Chapter 4, p.1 ~ Advanced I/O - **Slow System Calls** | Can block a process indefinitely if resources are unavailable. - Examples: Pipes, terminal input, and network operations may wait for data or user actions. - **Disk I/O** | Processes can block waiting for disk I/O indefinitely unless interrupted or the disk becomes unresponsive. - **Terminal Mode** - **Canonical Mode**: Input is buffered and processed line-by-line. - **Non-canonical Mode**: Also known as raw mode, allows character-by-character processing, commonly used for real-time applications. - **R/W from/to disk is never a slow system call** --- ### I/O Multiplexing - **Purpose**: Monitor multiple file descriptors simultaneously without being blocked by any specific one. This is essential for applications handling multiple I/O sources, like servers. - **`select` System Call** - Used to monitor multiple descriptors for readability, writability, or exceptional conditions. - Parameters: - `readfds`, `writefds`, `exceptfds` | Specify the sets of file descriptors to monitor. - Network applications often rely on `select` to handle connections or socket data. - **Macros**: - `FD_ZERO`, `FD_SET`, `FD_ISSET`, `FD_CLR` are used to manage file descriptor sets. - Example Usage: ```c ready_count = select(nfds, &readfds, &writefds, &exceptfds, &timeout); ``` - `ready_count` represents the number of ready descriptors, which the program can process sequentially. - **Timeouts**: - `select(0, NULL, NULL, &timeout)` | Sleeps for the specified time (in microseconds) without checking descriptors. - Alternative: `usleep()` and `sleep()` can also provide delays (in seconds or microseconds). - **`poll` System Call** - An alternative to `select`, where descriptors are defined in an array of `struct pollfd`. - Allows specifying event types (`POLLIN` for readable, `POLLOUT` for writable). - Example: `poll(array[], length, timeout)`. - **Variants**: - **epoll**: More efficient for a large number of descriptors, commonly used in Linux. - **pselect**: A variation of `select` with an additional signal mask for handling interruptions. ::: # ch14 Advanced I/O ## blocking vs nonblocking * blocking: not return until done * non-blocking: return immediately, return what has been done Slow system calls: may block forever * R/W on pipes(input/output may not be ready forever) * terminal devices * network devices ## I/O multiplexing 同時處理多file,可能會block(不用non-blocking 因為太耗CPU) **system call** * `select`: specify readfds, writefds, errorfds, timeout struct fd_set: FD_SET, FD_ZERO, FD_ISSET `ready_count=select(length, readfds, writefds, errorfds, timeout)` ready的東西會變成1,其他0 * `poll`: use array of pollfd(fd,events,revents). Better then `select`(reuse input, better with few high fds, more types) ## File Lock ### Ways to Lock * flock: lock整個file * fcntl: 可以lock檔案的某些bytes * lockf: built on fcntl ### Types of Lock * shared read lock * any read lock exists, deny write lock * any read lock exists, allow read lock 實務上會看有沒有 write lock,避免 starving(生動比喻: 如果遊樂園的快速通關是絕對優先的話,一般遊客都不用玩了) * exclusive write lock ### Release of lock **只要(process, file)對了就會release**,像是 ```c fd1 = open(filename,...) read_lock(fd1,...) fd2 = dup(fd1) close(fd2) ``` ```c fd1 = open(filename,...) read_lock(fd1,...) fd2 = open(filename,...) close(fd2) ``` 都會把lock release 因為實現方法是把lock記在某檔案的i-node的一個linked list,每個lock有對應的process **所以一個process開同檔案很多個fd要小心** ### Advisory vs. Mandatory * Advisory:只是建議,別人不一定要管 * Mandatory:每次read write都先檢查有沒有鎖,很耗時間 Linux and SVR3: set-group-id on, group x off # ch5 I/O Library ## buffered vs. unbuffered * File Pointers vs File Descriptors * stream * FILE object: fd, pointer, buffer, buffer type, buffer size, error/EOF flag, etc. * stdin, stdout, stderr vs STDIO_FILENO, STDOUT_FILENO, STDERR_FILENO ## system calls * `FILE *open(path,type)`: normal open * `FILE *freopen(path,type,FILE *fp)` * close fp, clear orientation(multibyte words) * `freopen(file,type,stdout)` $\approx$ `int fd=open(file,mode);dup2(fd,1);close(fd);` * `FILE *fdopen(filedes, type)`: for existing files, no truncate for w. Usually for pipes, network channels * `fileno(FILE *fp)`: get file descriptor * `fclose(FILE *fp)` * flush output * discard input * all fps are closed after the process exits * **buffer must be valid**(don't set buf to local variable) ### Buffering * fully-buffered * line-buffered * 例外:scanf after printf(without \n), will trigger I/O * unbuffered depends on the output type * stdout and stdin are fully buffered unless interactive devices(terminal): line-buffered * stderr: unbuffered * File: fully-buffered ### Synchronize Data `int fflush(FILE *fp)`: send buffer in user space to buffer cache ### Set buffer Any operations on streams will have OS allocate a buffer. * `void setbuf(*fp,*buf)` * size=BUFSIZE(defined in stdio.h) * not specify mode(depend on fp,buf) * fp=terminal: line-buffered * buf=NULL: unbuffered * `int setvbuf(*fp,*buf,mode,size)` * buf=NULL: auto * mode: \_IOFBF, \_IOLBF, \_IONBF ### Positioning * `long ftell(*fp)`: get current file offset (review=`lseek(fd,0,SEEK_CUR)`) * `int fseek(*fp,offset,whence)`: =`lseek` * `void rewind(*fp)`: =`fseek(*fp,0,SEEK_SET)` * For different types: * `off_t`: `fello`, `fseeko` * `fpos_t`: `fgetpos`, `fsetpos` ### Read/Write #### Unformatted ##### Character-at-a-time * `int getc(*fp)`: maybe a macro * `int fgetc(*fp)` * `int getchar(void)`=`getc(stdin)` -1 for EOF or error(call some function to check it) > `char` may be signed or unsigned(system dependent) > So do NOT write `char c; while((c=getchar())!=EOF)...`, just use `int` as instructed. About error and EOF: error and EOF flags are stored in `FILE` * `int ferror(*fp)` and * `int feof(*fp)`: non-zero=true, zero=false * `void clearerr(*fp)`: clear both - `ungetc(c,*fp)`: c cannot be EOF, clear EOF flag, stored in the buffer Ex: skip spaces before character: By reading and checking if is space until is not and put it back. `putc(int c,*fp)`, `fputc(int c,*fp)`, `putchar(int c)`: same as reading(all `int`) ##### Line-at-a-Time * `char *fgets(*buf,n,*fp)`: **include `\n`**, return partial line(n-1 bytes) if the line is too long * `char *gets(*buf)`: from stdin, not include `\n`, no buf size$\Rightarrow$may overflow ***unsafe*** 差別:有沒有讀**換行**、是否指定大小 write is not really "Line-at-a-Time" * `fputs(char *str,*fp)`: Output includes `\n` in the string. * `puts(char *str)`: **terminated by null**, output a `\n` at the end. ##### Direct/Binary I/O R/W some objects of a specified size alias: object-at-a-time I/O, record/structure-oriented I/O * `size_t fread(*ptr,size,nobj,*fp)`: 讀取nobj個size為bytes的object到ptr * `size_t fwrite(*ptr,size,nobj,*fp)`: 同上 缺點:Not portable,implement depends on系統,只保證同系統可以讀寫。像是可能有padding,也可能同型別具體儲存方式不一樣。 #### Formatted * `int scanf(const char *format,...)` * `int fscanf(FILE *fp, const char *format,...)` * `int sscanf(char *buf, const char *format,...)` - `int printf(const char *format,...)` - `int fprintf(FILE *fp, const char *format,...)` - `int sprintf(char *buf, const char *format,...)`: may overflow, use `snprintf` instead - `int vprintf(const char *format, va_list arg)` - `int vfprintf(FILE *fp, const char *format, va_list arg)` - `int vsprintf(char *buf, const char *format, va_list arg)` * v版本: va_list,變動參數的實作方式 * 無印: stdin, stdout * f: fp * d: fd * s: to/from `char* buf` * n: 指定長度 :::spoiler output format details `%[flags][fldwidth][precision][lenmodifier]convtype` * flag * `,`: 千位一撇 * `-`: 置左 * `+`: 顯示+ * ` `: 沒有符號則加一個空格在前面 * `#`: alternative form。如16進位加入0x、浮點數無小數部分則加入小數點 * `0`: 用0 padding * fldwidth 總寬度,會用space padding,用`*`表示在參數前指定 * precision `.number`,用`.*`表示在參數前指定(`printf("%.*f",3,a);`) * lenmodifier 略過 ::: > some functions are with **no range-checking problems**, DO NOT USE > * strcpy(char *dest, const char *src) > * strcat(char *dest, const char *src) > * gets(char *s) > * sprintf(char *buf, const char *format, …); ### Interleaved R&W restrictions Need to make R/W pointers consistent, touch current offset Output [ `fseek` | `fsetpos` | `rewind`| `fflush`] Input Intput [`fseek` | `fsetpos` | `rewind` | `EOF`] Output ```c fread(&c,1,1,fp); //fseek(fp,0,SEEK_CUR); fwrite(&c,1,1,fp); ``` Without `fseek`, the result may depend on OS. One bad solution: open 2 fp for the same fd(fdopen), where each for read and write. # ch4 Files and Directories ## information stored in i-node dump with stat, fstat, lstat `lstat` returns info about the symbolic link, rather than the reference file. | type | name | description | | --------------- | ---------- | -------------------------------- | | mode_t | st_mode | file type & mode (permissions) | | ino_t | st_ino | i-node number (serial number) | | dev_t | st_dev | device number (file system) | | dev_t | st_rdev | device number for special files | | nlink_t | st_nlink | number of links | | uid_t | st_uid | user ID of owner | | gid_t | st_gid | group ID of owner | | off_t | st_size | size in bytes, for regular files | | struct timespec | st_atim | time of last access | | struct timespec | st_mtim | time of last modification | | struct timespec | st_ctim | time of last file status change | | blksize_t | st_blksize | best I/O block size | | blkcnt_t | st_blocks | number of disk blocks allocated | ## file types * regular: binary, text * directory: only kernel can update ({filename, pointer}) * character special files: tty, audio * block special files: disks * FIFO * sockets * symbolic links check with `S_ISREG()`, `S_ISDIR`,etc. ## permission & UID/GID directory * R: list files(can only dump info) * W: update(delete, create a file(X is also required)) * X: pass through(search bit, can check each entry) file * R: read * W: write * X: execute ## set-user-id, set-group-id 執行檔把擁有者的權限借給執行者,跑起來effective user id會是file owner User/group id: * real: 真正的執行者(access是看real user) * effective: 檢查file permission是看effective * Saved set-user/group id: stored by exec (ch8 part 2) - set-user-id: 04000, `S_ISUID` in st_mode, `--s --- ---`, `S` if X is not set - set-group-id: 02000, `S_ISGID` in st_mode, `--- --s ---`, `S` if X is not set ## check permission > Q: How to share files without setting up a group? > A: Set directories `rwx--x--x`, only my friend knows the file name, other users cannot list files. > Q: 老師發現成績檔案的others write權限沒關,關完確定內容沒錯,但還是被改了?! > A: 只會在open時確認permission,所以壞學生開好後不關process,等到老師確認完再改。 1. Is effective UID is superuser? -> yes 2. Is effective UID the UID of the file? -> Check owner permission 3. Is effective GID or any of the supplementary groups the GID of the file? -> Check group permission 4. check others permission ## owner of a new file * UID=effective UID * GID= * effective GID * GID of directory * Some OS always do, some need the set-group-id of the directory, ex: `/var/spool/mail` SVR3 & Linux provide mandatory locks by **turning on set-group-id** but **turning off group-X**. Therefore, the superuser checks if some files have set-user/group-id,**especially those owned by root!** (by `find`) ## access `access(path,mode)` mode:`R_OK`,`W_OK`,`X_OK`,`F_OK`(existence) Check the real UID/GID. Ex: A set-user-id program wants to check if the real user can access the file. ## umask user file creation mask 預設新file是666,新directory是777,在process跑`umask(x)`後之後跑都會變成666&~x, 777&~x。也就是把某些mode**關掉** mask從parent inherit,改了自己的不影響parent的 **return**: previous mask ## chmod, fchmod `chmod(path, mode)` ### sticky bit 以前硬碟很慢,為了讓電腦一直執行同一個執行檔,把它sticky bit設為1,把執行檔存在swap area。現在被virtual memory取代。 **現在的用處**:如果一個directory的sticky bit被set,裡面的檔案只有在process有**寫入權限**並且符合以下其一才能**刪除或rename** * 擁有該file * 擁有該directory * 是superuser > 因此雖然`/tmp`是777,但它有sticky bit,所以不能亂刪別的user的檔案 ### security issues * Only the superuser can set sticky bit, otherwise will be turned off. * If GID of new file $\neq$ effec. GID (and is not superuser), clear set-group-id. (所以owner不會亂借權限) * Clear set-user/group-id if a normal user(non-superuser) writes to a file. #### 表示方法 `--- --- --t` 有s、t出現要特別小心 #### swap area 放進去就不會刻意刪掉,因為會盡量連續儲存,讀取會比較快。 #### fchmod 已經被open的版本,傳入file descriptor(f開頭都是這個意思) #### lchmod 不follow symbolic link,修改link檔案本身(l開頭都是這個意思) ## chown, fchown, lchown ## Limits * Compiler-time: ex: range of short * Run-time: Can be queried by process * Related to file/dir, `pathconf(path, name)` ex: maximum bytes in a filename (or `fpathconf` for filedes) * Others, `sysconf(name)` ex: maximum # of opened files per process ## truncate 把檔案的後面砍掉 `truncate(const char* pathname, off_t length)`:只留下前length bytes ## UNIX file and directory ### structure 根目錄下有各種資料夾,裝置也會是一個file,disk partition可以mount到directory Hard Drive: * partition:每個是一個file system * boot blocks * superblock: metadata * cylinder groups * superblock copy * cg info * i-node map * i-nodes * i-node * (Data block map) * Data blocks map概念:每個bit記i-node/data block是否被用 i-node要記很多東西,比如owner, type, file size, 佔了哪些block dir block: link to i-node, filename > * 邏輯大小>data block size:用lseek到很後面寫入會造成hole > * 邏輯大小<data block size:占不完整的block >root的parent是誰?implementation issue,可能是自己,可能是特殊值 #### 檔案操作 * 刪除檔案 unlink,Link count-1,在那個path不見,但相同的檔案還有可能有其他hard links * 移動檔案 把directory的data block中entry改一改就好 #### example 4.4BSD i-node * mode * owner * timestamp * size * direct blocks,每個直接指向一個block * single indirect,指向一個blcok,其中都是direct blocks * double indirect,指向一個blcok,其中都是single indirect blocks * triple indirect,指向一個blcok,其中都是double indirect blocks * **lock**: review: releasing lock only check if (proc, file). * ... > du的大小包含indirect blocks > 會把indirect cache起來,所以未必每次access block都會多走幾層 i-node number: st_ino ### hard link vs. soft link * hard link:把i-node number抄過去,不可跨fs,只有superuser可以建directory的hard link * soft link:在捷徑檔中記住絕對路徑, 可跨fs 如果dir中有個soft link連到dir,為了避免traverse中一直重複走下去,可以**記i-node**或**限制走soft link的次數(4.3BSD 8次)** 各種function不一定follow soft link,像是remove就會刪捷徑(不follow),open會開到指向的檔案(follow) ### link/unlink `int link(existing_path, new_path)` `int unlink(path)` 指hard link 把i-node hard link counts -1 常用skill:要使用暫存檔可以open後馬上unlink,雖然可以讀寫、會佔系統硬碟大小,但無法用ls找到 > 老師的反制:把原本權限沒設好的檔案unlink,創一個新的檔案同樣名稱。更謹慎的話也把修改時間竄改,讓學生看不出來。 > 不同的reference count: > * i-node hard link counts:有多少filename指向它 > * system open file table reference count:有多少file descriptor指向它(複習:dup或fork時增加,開同個檔案不變) ### remove, rename * `remove`: unlink for file, rmdir for dir * `rename`: dir to dir, file to file ### symbolic link * `int symlink(const char *actual_path, const char *sym_path)` * `int readlink(const char *pathname, char *buf, int buf_size)` Get the actual path in buf. Involves open, read, close. No `\0` at last. ### File times * `st_atime`:last access * `st_mtime`:last modify * `st_ctime`:last modify i-node(chmod, chown),沒有system call可竄改 影響時間的操作 * create、remove、rename、(un)link file會設a、m、c影響包含的directory的m、c * 原則:要改dir的dir block,因此i-node也要改大小 * 改permission、owner只會影響i-node -> c * open file: * 還沒讀寫不會有a、m * `O_CREAT`才會有a * `O_TRUNC`不需要a,只有m、c system call:`utime`改變時間 ### Functions for directory * `int mkdir(path, mode)` * `int rmdir(path)` * `DIR *opendir(path)` * `struct dirent *readdir(DIR *dp)`,dirent包含很多東西 * `void rewinddir(DIR *dp)` * `int closedir(DIR *dp)` ### ftw file tree walk `int ftw(char* dir_path, *fn(char* f_path,stat *sb,type_flag), n_openfd)` * dirpath要開的資料夾 * fn找到新檔案會呼叫的函數 * nopenfd允許開啟的fd數量,太少tree又太深的話就會只留下面幾層,每次要換目錄就得要從dirpath重新開始,浪費時間 ### chdir fchdir getcwd process有current walking directory(CWD),就是記device id和i-node number 和`umask`一樣要是built-in command,因為current walking directory和user file creation mask一樣是independently inherit from parent。所以如果是一個執行檔,執行起來只會改child process的這兩個變數。 > `chdir` follows soft link:因為讀的是檔名,從根目錄開始找i-node,「輸出」是i-node > `getcwd` doesn't follow soft link:因為只能從現在目錄的i-node traverse上去,輸出是檔名 # ch8 Process Control part 1 ```mermaid graph LR; A[fork] --> E; A --> C[child]; C --> D[exec]; D --> E[exit]; ``` ## pid 可能會開一個檔名為pid的檔案用來存tmp資料 Special process * 0: swapper * 1: user process * 2: virtual memory管理 ## Booting Bootstrapping * 關機時開機:cold boot * ROM存的code讀取要開的系統的boot program * 讀取kernel,把主控權給他 * kernel開始configure devices * 初始化系統 * 開始single-user mode(root),做一些必要的事 * 跑初始script * 開始multi-user mode ## Process Control Block (PCB) 會有個table存所有process的資訊,每個entry是一個PCB: * process state: * ready:資料在memory * waiting/block: I/O etc * Running * program counter:跑到哪個指令 * CPU register:CPU狀態(context switch 回去要能繼續跑) * CPU scheduling info\:process的優先度etc * memory-management info\:OS會教 * Accounting info:跑多久etc * I/O status info:開了哪些檔案etc sys/proc.h有一個struct proc ## Process Scheduling Queues 把PCB串起來 * job queue: 所有proc * ready queue: * device queue: 等某I/O的proc ## Process Metadata & Content 每個proc都要有 * metadata: PCB * content: virtual memory(VM) ## system call ```c #include <sys/types.h> #include <unistd.h> pid_t getpid(void) pid_t getppid(void) uid_t getuid(void) uid_t geteuid(void) gid_t getgid(void) gid_t getegid(void) ``` no error return ## Process Creation & Termination 要開process都得要fork fork()->(parent)->wait->(resume)-> \>(child)->exec->exit->把wait叫醒 wait:等小孩死掉,可能等到死 PCB有放return status,所以小孩死了PCB還不能馬上刪,如果parent死了會變殭屍孤兒 ## fork return pid call一次,在parent return一次,在child return一次 * pid 0不會出現,所以當作特別作用:此proc是小孩(可以用`ppid`得到parent id) * \>0:parent,是小孩的pid,只會在這出現一次 PCB會改所以不能直接繼承,但VM差不多所以可以 child會從fork return開始跑 process的VM會依照page table(ch7)對應到physical memory,如果fork時全都複製,cost太高。所以一個proc要改東西時才複製(copy on write) 馮諾伊曼架構:運行時不會改instruction,所以共用VM中的text會很有用 大部分東西會繼承,但有些顯然不能繼承: * return value * pid * parent pid * running time * file locks * ... ## vfork 避免fork把所有東西複製一份浪費時間,直接把parent的memory與child的共用,但現在有copy on write所以沒用了。 算是殘廢的shared memory 如果`exit`,parent的file descriptor會被偷偷關掉,所以要`_exit` > train: select->read中間要搞成non-blocking ## Deadlock 死結 兩個以上的process在等對方結束,但因此都不會結束 ## process termination C程式隨時可以`_exit`、`_Exit`(分別是ANSI、POSIX,沒什麼差)自殺,而`exit`先額外: * 會倒序呼叫exit handler * flush, close I/O > Core=memory, eg core dumped ### exit handler ```c int atexit(void (*func)(void)) ``` register一個exit handler ### 正常死亡 * exit * _exit,_Exit * return from main() ### 不正常死亡 都是signal,像是abort,system會送一個signal給proc。要自殺只能叫UNIX殺自己。 ### wait, waitpid * `pid_t wait(int *statloc)`:等其中一個小孩死 * `pid_t waitpid(pid_t pid, int *statloc, int op)`:等自己的某個小孩死 error when 沒小孩、pid非小孩 Actually, wait child process 狀態改變(正常死亡termination status),(比如suspend:一個scanf程式被跑到background,不可能讀取到) `wait3` `wait4`:讀取更多資訊(user/CPU time etc) ## Zombie & Orphan * Zombie:一個proc死掉,VM可以丟掉,但PCB要留著,等parent wait把他狀態拿走的期間。exit前parent wait,小孩短期內會是zombie * Orphan:parent死掉,會被init領養 ### How to create a background process without making a zombie? Double fork 因為UNIX希望要wait,所以直接拿child會有問題。 開一個child,專門拿來開grandchild,child生完就趕快死,這樣grandchild就被init領養,與原本parent就無關了。 (open source)應用:main立刻 ```c if(fork()>0)exit(0); ``` 則會直接在background執行 related: session group(ch9) ## Race Conditions 很多process要用同個資源,結果會depend on執行順序 Ex: parent child都每次輸出一個字元,或是child等2秒再print但parent還沒print完。 解決辦法1: ```c while(getppid()!=1)sleep(1); ``` 不太好,算busy waiting,所以需要IPC ## exec l,v選一個 * l: pass args as list(ends with `NULL(char*)0`) * v: `argv[]` as input of main 額外加e,p * e: includes environment variables(HOME, PATH, etc),其實main也可以輸入,否則要用`extern char **environ` * p: 會去PATH找執行檔的位置,允許filename是shell script `execl, execlp, execle, execv, execvp, execve` 通常只有execve是system call,其他最終都是呼叫它 ### Environment Variables shell會有個.bashrc,先設好Environment Variables則shell跑的程式都會共用 ### FD_CLOEXEC/close-on-exec Flag `exec`後,file descriptor預設是會繼承的,可以用`fcntl`用`F_GETFD` `F_SETFD`修改`FD_CLOEXEC`