# MIT6.s081 Lab: Xv6 and Unix utilities ###### tags: `mit6.s081` `作業系統 Operating System` 1. 編譯並透過 `qemu` 啟動 `xv6` 2. 配合系統呼叫如 `fork`、`pipe` 等,實作一些有趣的函式 ## sleep (easy) 第一個題目非常簡單,讓學生熟悉一下系統呼叫 ```cpp #include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" void main(int argc, char *argv[]) { if (argc != 2) { fprintf(2, "Usage: sleep <number>\n"); exit(1); } int n; n = atoi(argv[1]); sleep(n); exit(0); } ``` ## pingpong (easy) 要求使用 `fork` 及兩個 `pipe` ,讓父、子程序互相傳送文字,且需符合以下輸出 ``` $ pingpong 4: received ping 3: received pong $ ``` `pipe` 的架構如下,兩個程序分別從兩個 `pipe` 進行寫入及讀取,兩者的處理流程非常類似: 1. 關閉標準輸入 2. 關閉未使用到寫入端及讀取端 3. 寫入 `pong/ping` 至 `pipe` 4. 關閉寫入端發送 `EOF`,讓讀取端知道寫入結束 5. 從讀取端將寫入 `pipe` 的文字讀出 6. 輸出題目要求的文字 ![](https://i.imgur.com/zDdVOeA.png) 唯一不同的是,因為題目要求依序輸出,父程序在 `printf` 之前必須等待子程序完成,才能列印輸出文字,所以父程序在關閉寫入端之後,呼叫 `wait(&status)` 強制等待子程序結束工作,再繼續執行 ```cpp #include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" void main(int argc, char *argv[]) { int fpid, cpid, status, p1[2], p2[2]; char *readbuf = malloc(5); /* buffer to read from pipe */ pipe(p1); pipe(p2); fpid = getpid(); if ((cpid = fork()) < 0) { fprintf(2, "fork error\n"); exit(1); } if (cpid == 0) { /* child process */ close(0); close(p1[1]); close(p2[0]); write(p2[1], "pong", 4); close(p2[1]); read(p1[0], readbuf, 4); close(p1[0]); printf("%d: received %s\n", cpid, readbuf); } else { /* parent process */ close(0); close(p1[0]); close(p2[1]); write(p1[1], "ping", 4); close(p1[1]); wait(&status); read(p2[0], readbuf, 4); close(p2[0]); printf("%d: received %s\n", fpid, readbuf); } free(readbuf); exit(0); } ``` ## primes (moderate)/(hard) 要求實作一個並行的質數篩選器,範圍從 2~35,MIT 提供 [Bell Labs and CSP Threads](https://swtch.com/~rsc/thread/) 連結給我們參考,其概念簡介如下: - 每個 block 的第一個數字 `N` 一定是質數,因為除了 1 跟自己,沒有其他因數 - 剩餘的數字除上 `N`,若不可整除代表為質數的候選者,將數字向後傳遞 - 重複上述步驟,直到 block 沒有數字可以向後傳為止 ![](https://i.imgur.com/TiNYqH6.png) 我們先以 `python` 寫個簡單的實作幫助思考,用遞迴來處理 ```python def primes(nums): if not nums: return prime = nums[0] print("prime %s"%(prime)) next_nums = [num for num in nums if num % prime != 0] primes(next_nums) nums = [n for n in range(2, 36)] primes(nums) ``` 以數字範圍 2~9 為例,整個函式架構可參考下圖: - 第一個程序 `fork` 第二個子程序,將 2~9 透過 `pipe` 傳遞給第二個程序 - 第二個程序 `fork` 第三個子程序,接收第一個程序傳進來的資料,列印第一個接收到的數字,並將可能的質數候選者透過 `pipe` 傳給第三個子程序 - 第三、第四個子程序步驟同上 - 因為第一個程序必需等待所有程序完成工作,所以每一個父程序必需等待其子程序結束後,才能呼叫 `exit` 另外,如果用字串來傳遞數字,後處理會比較麻煩,mit 建議可以直接將位元寫入整數型態的 buffer >It's simplest to directly write 32-bit (4-byte) ints to the pipes, rather than using formatted ASCII I/O. ![](https://i.imgur.com/NUR4XDC.png) ```cpp #include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" void main(int argc, char *argv[]) { int fd[2]; int n, pid, status; pipe(fd); if ((pid=fork()) == 0) { close(fd[1]); pipeline(fd[0]); close(fd[0]); exit(0); } else { close(fd[0]); for (n = 2; n <= 35; n++) { write(fd[1], &n, sizeof(int)); } close(fd[1]); } wait(&status); exit(0); } ``` 但本題的實作無法像 `python` 一樣,可以直接判斷傳進來的陣列是否為空,那我們該怎麼決定遞迴的終止條件呢?參考上面的函式架構圖,當走到 `Block 4` 時,因為只有 `7` 會傳進來,沒有其他數字可以向後傳遞。MIT 的 hints 中提到 : >Hint: read returns zero when the write-side of a pipe is closed.` 假如我們嘗試接收第一個數字時,`read` 就回傳 `0`,代表前一個 block 沒有任何數字往下傳遞,函式遞迴終止。剩下的就是小心處理 file descriptor,將不需要或使用完的 `pipe` 端關閉避免洩漏 ```cpp void pipeline(int lfd) { int rfd[2]; int nb, num, pid, prime, status; nb = read(lfd, &prime, sizeof(int)); if (!nb) /* last block */ return; printf("prime %d\n", prime); pipe(rfd); if ((pid=fork()) == 0) { close(rfd[1]); pipeline(rfd[0]); close(rfd[0]); exit(0); } else { close(rfd[0]); while ((nb=read(lfd, &num, sizeof(int)))) { if (num % prime) write(rfd[1], &num, sizeof(int)); } close(rfd[1]); } wait(&status); } ``` ## find (moderate) 實作類 Unix 的 `find` 函式,其目的為找到 directory tree 中特定名稱的檔案。因為尚未講述 file system,mit 提示可以參考 `user/ls.c` 了解目錄讀取的方式,以下為 `user/ls.c` 的實作 ```cpp user/ls.c void ls(char *path) { char buf[512], *p; int fd; struct dirent de; struct stat st; if((fd = open(path, 0)) < 0){ fprintf(2, "ls: cannot open %s\n", path); return; } if(fstat(fd, &st) < 0){ fprintf(2, "ls: cannot stat %s\n", path); close(fd); return; } switch(st.type){ case T_FILE: printf("%s %d %d %l\n", fmtname(path), st.type, st.ino, st.size); break; case T_DIR: if(strlen(path) + 1 + DIRSIZ + 1 > sizeof buf){ printf("ls: path too long\n"); break; } strcpy(buf, path); p = buf+strlen(buf); *p++ = '/'; while(read(fd, &de, sizeof(de)) == sizeof(de)){ if(de.inum == 0) continue; memmove(p, de.name, DIRSIZ); p[DIRSIZ] = 0; if(stat(buf, &st) < 0){ printf("ls: cannot stat %s\n", buf); continue; } printf("%s %d %d %d\n", fmtname(buf), st.type, st.ino, st.size); } break; } close(fd); } ``` 在 `kernel/stat.h` /`kernel/fs.h` 可以找到 `struct stat` / `struct dirent` 的實作,需要用到的資訊為 `stat.type` 及 `dirent.name`,分別用來判斷檔案類型以及取得子目錄的名稱。透過呼叫 `fstat` 我們可以得知檔案的各類訊息 ```cpp kernel/stat.h struct stat { int dev; // File system's disk device uint ino; // Inode number short type; // Type of file short nlink; // Number of links to file uint64 size; // Size of file in bytes }; kernel/fs.h struct dirent { ushort inum; char name[DIRSIZ]; }; ``` 其中最重要的是以下這段程式碼,原來目錄也是 file descriptor,而其所有子目錄為一個 `stream` ```cpp while(read(fd, &de, sizeof(de)) == sizeof(de)){ if(de.inum == 0) continue; memmove(p, de.name, DIRSIZ); p[DIRSIZ] = 0; if(stat(buf, &st) < 0){ printf("ls: cannot stat %s\n", buf); continue; } printf("%s %d %d %d\n", fmtname(buf), st.type, st.ino, st.size); } ``` 有了以上的資訊,我們可以著手開始實作。這題只要搞懂如何取得子目錄,剩下的就是遞迴呼叫與路徑處理,特別注意 `.` & `..` 為每個目錄皆存在的子目錄,分別指向自己及父目錄,不可遞迴傳入 `find` ,否則將導致無窮迴圈。路徑字串處理的概念可參考下圖 ![](https://i.imgur.com/YsPomPy.png) ```cpp #include "kernel/types.h" #include "kernel/stat.h" #include "user/user.h" #include "kernel/fs.h" char* get_filename(char *path) { char *p; for (p=path+strlen(path); p>=path && *p != '/'; p--); p++; return p; } void find(char *path, char *name) { char buf[512], *p; int fd; struct stat st; struct dirent de; if ((fd=open(path, 0)) < 0) { fprintf(2, "find: cannot open %s\n", path); return; } if (fstat(fd, &st) < 0) { fprintf(2, "find: cannot stat %s\n", path); close(fd); return; } switch (st.type) { case T_FILE: p = get_filename(path); if (!strcmp(p, name)) printf("%s\n", path); break; case T_DIR: while(read(fd, &de, sizeof(de)) == sizeof(de)) { if(strlen(path) + 1 + DIRSIZ + 1 > sizeof buf){ printf("find: path too long\n"); break; } if (de.inum == 0) continue; if (!strcmp(de.name, ".") || !strcmp(de.name, "..")) continue; strcpy(buf, path); p = buf + strlen(buf); *p++ = '/'; memmove(p, de.name, DIRSIZ); p[DIRSIZ] = '\0'; find(buf, name); } break; } close(fd); return; } void main(int argc, char *argv[]) { if (argc != 3) { fprintf(2, "Usage: find <path> <name>\n"); exit(1); } find(argv[1], argv[2]); exit(0); } ``` ## xargs (moderate) 最後一題要求我們實作一個簡化的 `xargs` 函式,以下節錄自 man >xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored. 這個函式的功能是從標準輸入讀取參數,並以空白或換行符號切割,將這些參數 append 到 `xargs` 命令所帶的參數之後,再執行命令列,mit 的實驗頁面給出以下範例 ``` $ echo hello too | xargs echo bye bye hello too $ ``` 因為實驗要求僅以換行符切割參數,所以這個例子等同於執行 `echo "bye" "hello too"` 此外我們無須實作 -n 旗標的功能,每次從標準輸入讀取一行之後馬上執行,可參考 mit 提供的範例 ``` $ echo "1\n2" | xargs -n 1 echo line (永遠為 -n 1,無須實作) line 1 line 2 $ ``` 所以這題只要處理好參數解析的工作,`fork` 一個子程序來執行命令列就可以了 - 注意 buffer 及擺放參數的陣列的初始化 - 依照實驗要求忽略空白行 - `argv[argc]` 必須為空指標,以下截自 C99 5.1.2.2.1 ``` 5.1.2.2.1 Program startup ... -- argv[argc] shall be a null pointer. ``` ```cpp #include "kernel/types.h" #include "kernel/stat.h" #include "kernel/param.h" #include "user/user.h" void xargs(char *argv[]) { int pid, status; if ((pid=fork()) == 0) { exec(argv[0], argv); exit(1); } wait(&status); return; } void main(int argc, char *argv[]) { int i, j; char c, buf[512], *xargv[MAXARG]; if (argc < 2 || argc - 1 > MAXARG) { fprintf(2, "usage: xargs <cmd> {args}, # of args should be less than 32\n"); exit(1); } memset(buf, 0, sizeof(buf)); for (i = 1; i < argc; i++) xargv[i - 1] = argv[i]; for (; i < MAXARG ; i++) xargv[i] = 0; j = 0; while (read(0, &c, 1) > 0) { if (c != '\n') buf[j++] = c; else { if (j != 0) { buf[j] = '\0'; xargv[argc - 1] = buf; xargs(xargv); j = 0; } } } if (j != 0) { buf[j] = '\0'; xargv[argc - 1] = buf; xargs(xargv); } exit(0); } ``` :::success == Test sleep, no arguments == $ make qemu-gdb sleep, no arguments: OK (5.0s) == Test sleep, returns == $ make qemu-gdb sleep, returns: OK (1.2s) == Test sleep, makes syscall == $ make qemu-gdb sleep, makes syscall: OK (1.8s) == Test pingpong == $ make qemu-gdb pingpong: OK (1.7s) == Test primes == $ make qemu-gdb primes: OK (2.0s) == Test find, in current directory == $ make qemu-gdb find, in current directory: OK (1.9s) == Test find, recursive == $ make qemu-gdb find, recursive: OK (2.9s) == Test xargs == $ make qemu-gdb xargs: OK (2.8s) == Test time == time: OK Score: 100/100 ::: 完整程式碼 [github](https://github.com/Chang-Chia-Chi/MIT6.s081/tree/main/Lab1:Xv6_and_Unix_utilities) 另外寫這個實驗的過程中,踩了一個字串的坑,考慮以下程式碼,其功用為刪除字串中的特定字元 ```cpp /* This is a buggy code! */ #include <stdio.h> #include <string.h> #include <stdlib.h> void squeeze(char s[], int n) { int i, j; for (i = j = 0; s[i] != '\0'; i++) { if (s[i] != n) s[j++] = s[i]; } s[j] = '\0'; } int main() { char *a = "1122332211"; squeeze(a, '3'); printf("%s\n", a); } ``` 執行時會報 `Segmentation fault`,但如果把 `char *a` 改成 `char a[]` 就可以正常執行。在 K&R 2/e 5.5 p.94 中說明了使用 `char *` 與 `char []` 初始化字串的不同處 > There is an important difference between these definitions: ``` char amessage[] = "now is the time"; /* an array */ ``` ``` char *pmessage = "now is the time"; /* a pointer */ ``` amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents. `char *` 為指標,初始化時指向儲存於 `read-only memory` 的字串(字串常量)。因此 `pmessage` 將指向一個沒有名稱的字串陣列的第一個元素,我們可以改變 `pmessage` 的指向,但無法改變指向字串的內容 ## Reference 1. [xv6: a simple, Unix-like teaching operating system](https://pdos.csail.mit.edu/6.828/2020/xv6/book-riscv-rev1.pdf) 2. [Lab: Xv6 and Unix utilities](https://pdos.csail.mit.edu/6.828/2020/labs/util.html)