2024 Fall, required course of the NTU CSIE department (National Taiwan University). Instructor: 鄭卜壬
/etc/passwd
contains login names, encrypted passwords, uid, gid, home dir, shell program
P.14~
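The reference count of a system file table entry only increases on fork, dup, or dup2. Otherwise, even if the same process opens the same file many times, each open creates its own entry.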
int openat(int fd, const char *path, int oflag, ... /* mode_t mode */ );
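Opens path relative to the directory that fd refers to.
If path is absolute, openat behaves exactly like open. With a relative path it differs: for example, to create and open /tmp/abc/def.txt, first open /tmp/abc to get an fd. If someone then deletes /tmp/abc and recreates it as a symbolic link to an unsafe location (say, their own directory), accessing through the fd avoids writing into the wrong directory. This is not necessarily truly atomic, though: someone can still delete /tmp/abc midway, leaving the fd as a dangling file descriptor that refers to a directory with no name.
Time-of-check-to-time-of-use (TOCTTOU): openat closes the gap between checking a path and using it. It is atomic and unaffected if the working directory changes while the program runs, which helps security and performance, especially with multiple threads.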
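A minimal sketch of the openat pattern, assuming a POSIX.1-2008 system. `create_in_dir` is a hypothetical helper name, and the flags and the 0600 mode are choices made for this illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) `name` inside directory `dir` and return an fd for
 * it, or -1 on error. The directory is opened once; the file is then
 * resolved relative to that descriptor instead of re-walking the path. */
int create_in_dir(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    /* O_DIRECTORY: fail unless dir is a directory.
     * O_NOFOLLOW: fail if the last path component is a symbolic link. */
    int dirfd = open(dir, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (dirfd < 0)
        return -1;

    int fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    close(dirfd);
    return fd;
}
```

Usage: `int fd = create_in_dir("/tmp/abc", "def.txt");`. Even if /tmp/abc is swapped for a symbolic link after the directory has been opened, the file is still created in the directory that dirfd refers to.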
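A new file with the given mode is created only when the file does not exist and O_CREAT is set.
With O_NONBLOCK, an operation that would block fails with an EAGAIN or EWOULDBLOCK error instead.
In file operations, data synchronization means writing a file's data from the buffer cache to disk or other permanent storage so that the data is durable. The open system call has synchronization-related options that control when data is actually written to disk, which mainly affects file system reliability and performance.
Goal: make sure data is stored on disk right after each write so it is not lost.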
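system calls:
sync
open options:
O_DSYNC: each write waits for the physical I/O, but metadata is written only when it affects reading the data back; somewhat faster than O_SYNC.
O_RSYNC: a read waits until pending writes to the part being read have completed.
O_SYNC: each write waits until the data (and metadata) has reached the disk; slower but safer.
open always returns the lowest unopened descriptor: descriptors that have been closed are reused.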
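lseek moves the current file offset (the read/write head), which is stored in the open file table.
whence: SEEK_SET (start), SEEK_CUR (current position), SEEK_END (end)
lseek(fd,0,SEEK_CUR): get the current file offset
Avoid the Time-of-check-to-time-of-use (TOCTTOU) problem: for example, if two processes each lseek to SEEK_END and then write, one write can overwrite the other.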
pread = lseek + read
pwrite = lseek + write
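A small sketch of a positioned read with pread, assuming a POSIX system. `pread_demo` is a hypothetical function written for this note; it shows that the read at an explicit offset leaves the descriptor's current file offset untouched, unlike a separate lseek followed by read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes "hello world" to a fresh temporary file, then reads 5 bytes at
 * offset 6 with pread. Stores them (null-terminated) in out and returns
 * the file offset observed after the pread, or -1 on error. */
off_t pread_demo(char out[6])
{
    assert(out != NULL);
    FILE *fp = tmpfile();                 /* anonymous temporary file */
    if (fp == NULL)
        return -1;
    int fd = fileno(fp);

    if (write(fd, "hello world", 11) != 11) {
        fclose(fp);
        return -1;
    }
    /* the current file offset is now 11 */

    if (pread(fd, out, 5, 6) != 5) {      /* read at offset 6, no seek */
        fclose(fp);
        return -1;
    }
    out[5] = '\0';

    off_t after = lseek(fd, 0, SEEK_CUR); /* unchanged by pread */
    fclose(fp);
    return after;
}
```

Because position and transfer happen in one call, no other process or thread sharing the same open file table entry can move the offset in between.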
Return the lowest available file descriptor.
atomic
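After fork, the child and the parent run concurrently. The child may exec; once it exits, control returns to the point where the parent called wait.
Why does RealTime-(UserCPU+SysCPU) grow as the buffer size increases? From buffered 1: 2s to unbuffered 8192: 6s.
Read-ahead effect: when reading one byte at a time, UNIX prefetches the following blocks; when each read is too large, the prefetch may not keep up.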
fsync(fd) | Per-descriptor counterpart of O_SYNC: flushes the file's data and metadata to disk.
fdatasync(fd) | Per-descriptor counterpart of O_DSYNC: like fsync, but only data (not metadata) is synchronized, which can be more efficient.
sync() | Also flushes the buffer cache, for all files. Commonly used by daemons or the sync command to update file changes on disk.
daemon | Background process that runs independently of terminal sessions.
fcntl | Modify file descriptor properties, duplicate descriptors, or set/get file flags.
FD_CLOEXEC | Close the file descriptor on exec calls, preventing leakage of descriptors to the new program.
File status flags (e.g. O_APPEND) | Modify behavior at the file level.
procID, groupID, and file lock | Allow control over process ownership and access.
O_RDONLY, O_WRONLY, O_RDWR | Control read/write permissions.
dup | Duplicates a file descriptor.
fcntl(fd, F_DUPFD, minfd) | Creates a duplicate at the lowest available descriptor number greater than or equal to minfd.
ioctl | Input/Output control, often used to manage device settings, such as terminal size, or to apply additional security measures.
/dev/fd | Special device file representing open file descriptors.
open("/dev/fd/0", mode) = dup(0) | Replicates the standard input.
Slow System Calls | Can block a process indefinitely if resources are unavailable.
Disk I/O | Not counted as a slow system call: it can block the caller temporarily, but normally completes unless a hardware error occurs.
Terminal Mode
Purpose: Monitor multiple file descriptors simultaneously without being blocked by any specific one. This is essential for applications handling multiple I/O sources, like servers.
select System Call
readfds, writefds, exceptfds | Specify the sets of file descriptors to monitor.
Typical use | Servers call select to handle connections or socket data.
Macros | FD_ZERO, FD_SET, FD_ISSET, FD_CLR are used to manage file descriptor sets.
Return value | ready_count represents the number of ready descriptors, which the program can process sequentially.
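A minimal sketch of waiting on one descriptor with select and a timeout, assuming a POSIX system. `wait_readable` is a hypothetical helper; a real server would put several descriptors into the set and loop over them with FD_ISSET.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms milliseconds elapse.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* the first argument is the highest descriptor number plus one */
    int ready_count = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready_count < 0)
        return -1;
    return (ready_count > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

The sets are modified in place by select, so they have to be rebuilt with FD_ZERO and FD_SET before every call.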
Timeouts:
select(0, NULL, NULL, NULL, &timeout) | Sleeps for the specified time (microsecond resolution) without checking descriptors.
usleep() and sleep() can also provide delays (in microseconds and seconds respectively).
poll System Call
Similar to select, where descriptors are defined in an array of struct pollfd.
Each entry names the events of interest (POLLIN for readable, POLLOUT for writable).
Call form: poll(array[], length, timeout).
Variants:
pselect | select with an additional signal mask for handling interruptions.
Slow system calls: may block forever
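Handling several files at once: any one of them may block (non-blocking polling is not used because it wastes CPU).
system call
select: specify readfds, writefds, errorfds, timeout
ready_count=select(length, readfds, writefds, errorfds, timeout)
poll: use array of pollfd(fd,events,revents). Better than select (reuse input, better with few high fds, more types)
Record locks: a lock is released as soon as the (process, file) pair matches, so closing any descriptor the process has for that file releases the lock.
This is because the locks are recorded in a linked list hanging off the file's i-node, and each lock entry names its owning process.
So be careful when one process opens the same file through several fds.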
FILE *fopen(path,type): normal open
FILE *freopen(path,type,FILE *fp): reopen fp on another file, typically to redirect a standard stream
freopen(file,type,stdout) is roughly equivalent to
int fd=open(file,mode);dup2(fd,1);close(fd);
FILE *fdopen(filedes, type): for existing files, no truncate for w. Usually for pipes, network channels
fileno(FILE *fp): get file descriptor
fclose(FILE *fp)
depends on the output type
int fflush(FILE *fp): send buffer in user space to buffer cache
Any operations on streams will have OS allocate a buffer.
void setbuf(*fp,*buf)
int setvbuf(*fp,*buf,mode,size)
long ftell(*fp): get current file offset (review: = lseek(fd,0,SEEK_CUR))
int fseek(*fp,offset,whence): = lseek
void rewind(*fp): = fseek(*fp,0,SEEK_SET)
off_t versions: ftello, fseeko
fpos_t versions: fgetpos, fsetpos
int getc(*fp): may be a macro
int fgetc(*fp)
int getchar(void) = getc(stdin)
char may be signed or unsigned (system dependent).
So do NOT write char c; while((c=getchar())!=EOF)...; just use int as instructed.
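A short sketch of the correct loop, in standard C. `count_bytes` is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes remaining in fp. c must be an int: getc returns either
 * an unsigned char value (0..255) or EOF, and a plain char cannot
 * represent both ranges at once. */
long count_bytes(FILE *fp)
{
    assert(fp != NULL);
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

With `char c`, a data byte 0xFF can compare equal to EOF where char is signed (the loop stops early), and the comparison can never be true where char is unsigned (the loop never ends).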
About error and EOF:
error and EOF flags are stored in FILE
int ferror(*fp) and int feof(*fp): non-zero=true, zero=false
void clearerr(*fp): clear both
ungetc(c,*fp): c cannot be EOF, clear EOF flag, stored in the buffer
putc(int c,*fp), fputc(int c,*fp), putchar(int c): same as reading (all int)
char *fgets(*buf,n,*fp): includes \n, returns a partial line (n-1 bytes) if the line is too long
char *gets(*buf): from stdin, does not include \n, takes no buf size: may overflow, unsafe
Writing is not really "Line-at-a-Time"
fputs(char *str,*fp): outputs a \n only if the string itself contains one.
puts(char *str): writes the null-terminated string, then outputs a \n at the end.
R/W some objects of a specified size
alias: object-at-a-time I/O, record/structure-oriented I/O
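size_t fread(*ptr,size,nobj,*fp): read nobj objects, each size bytes, into ptr
size_t fwrite(*ptr,size,nobj,*fp): same, for writing
Drawback: not portable. The layout is implementation-dependent, so data is only guaranteed to be readable on the same system that wrote it. There may be padding, and the same type may be stored differently.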
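A minimal sketch of object-at-a-time I/O in standard C. `struct record` and `roundtrip_records` are hypothetical names made up for this note; the stream is assumed to be open for update (for example one returned by tmpfile).

```c
#include <assert.h>
#include <stdio.h>

struct record {            /* hypothetical record type for illustration */
    int    id;
    char   tag;            /* the compiler may insert padding after this */
    double score;
};

/* Write n records to fp, rewind, and read them back into out.
 * Returns the number of records read back. */
size_t roundtrip_records(FILE *fp, const struct record *in,
                         struct record *out, size_t n)
{
    assert(fp != NULL && in != NULL && out != NULL);
    if (fwrite(in, sizeof in[0], n, fp) != n)
        return 0;
    rewind(fp);            /* repositioning also flushes pending output */
    return fread(out, sizeof out[0], n, fp);
}
```

The bytes on disk are exactly the in-memory representation, padding included, which is why such a file is only safe to read back with the same compiler, ABI, and machine type.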
int scanf(const char *format,...)
int fscanf(FILE *fp, const char *format,...)
int sscanf(const char *buf, const char *format,...)
int printf(const char *format,...)
int fprintf(FILE *fp, const char *format,...)
int sprintf(char *buf, const char *format,...): may overflow, use snprintf instead
int vprintf(const char *format, va_list arg)
int vfprintf(FILE *fp, const char *format, va_list arg)
int vsprintf(char *buf, const char *format, va_list arg)
char* buf
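%[flags][fldwidth][precision][lenmodifier]convtype
flags:
' : thousands grouping
- : left-justify
+ : always show the sign
(space) : prefix a space if no sign is shown
# : alternative form, e.g. add 0x for hexadecimal, or add the decimal point for a floating-point value with no fractional part
0 : pad with 0
fldwidth: a number, or * to take the width from an argument placed before the value
precision: .number, or .* to take it from an argument placed before the value (printf("%.*f",3,a);)
Some functions do no range checking and can overflow their buffer. DO NOT USE: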
- strcpy(char *dest, const char *src)
- strcat(char *dest, const char *src)
- gets(char *s)
- sprintf(char *buf, const char *format, …);
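A sketch of the bounded replacements in standard C: snprintf in place of sprintf, and fgets in place of gets. `format_pair` and `read_line` are hypothetical helper names for this note.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "name=value" into buf (capacity cap). Returns 0 on success, or
 * -1 if the result did not fit; buf then holds a truncated but still
 * null-terminated string. */
int format_pair(char *buf, size_t cap, const char *name, int value)
{
    assert(buf != NULL && cap > 0 && name != NULL);
    int n = snprintf(buf, cap, "%s=%d", name, value);
    /* snprintf returns the length it wanted to write, excluding '\0' */
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Read one line with fgets and strip the trailing newline, if any.
 * Returns buf, or NULL on EOF or error. */
char *read_line(char *buf, int cap, FILE *fp)
{
    assert(buf != NULL && cap > 0 && fp != NULL);
    if (fgets(buf, cap, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Both calls take the buffer size, so a long input is truncated instead of written past the end of buf. strcpy and strcat can be handled the same way with snprintf.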
Need to make R/W pointers consistent, touch current offset
Output → [fseek | fsetpos | rewind | fflush] → Input
Input → [fseek | fsetpos | rewind | EOF] → Output
Without fseek, the result may depend on OS.
One bad solution: open 2 fp for the same fd(fdopen), where each for read and write.
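A sketch of switching direction on one update-mode stream in standard C. `write_then_read_first` is a hypothetical function; the stream is assumed to be opened with a "+" mode (for example by tmpfile).

```c
#include <assert.h>
#include <stdio.h>

/* Write msg to an update-mode stream, then read the first byte back.
 * The fseek calls are the repositioning required when switching from
 * output to input and from input back to output.
 * Returns the first byte of the file, or EOF on error. */
int write_then_read_first(FILE *fp, const char *msg)
{
    assert(fp != NULL && msg != NULL);

    if (fputs(msg, fp) == EOF)
        return EOF;
    if (fseek(fp, 0L, SEEK_SET) != 0)   /* output -> input */
        return EOF;
    int c = getc(fp);
    if (fseek(fp, 0L, SEEK_END) != 0)   /* input -> output */
        return EOF;
    return c;
}
```

After the call the stream is positioned at the end of the file, so a following fputs appends instead of overwriting.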
dump with stat, fstat, lstat
lstat returns info about the symbolic link itself, rather than the file it references.
type | name | description |
---|---|---|
mode_t | st_mode | file type & mode (permissions) |
ino_t | st_ino | i-node number (serial number) |
dev_t | st_dev | device number (file system) |
dev_t | st_rdev | device number for special files |
nlink_t | st_nlink | number of links |
uid_t | st_uid | user ID of owner |
gid_t | st_gid | group ID of owner |
off_t | st_size | size in bytes, for regular files |
struct timespec | st_atim | time of last access |
struct timespec | st_mtim | time of last modification |
struct timespec | st_ctim | time of last file status change |
blksize_t | st_blksize | best I/O block size |
blkcnt_t | st_blocks | number of disk blocks allocated |
Check the file type with S_ISREG(), S_ISDIR(), etc.
directory
file
A set-user-ID executable lends its owner's privileges to whoever runs it: the effective user id of the running process becomes the file owner.
User/group id:
S_ISUID in st_mode, --s --- ---, S if X is not set
S_ISGID in st_mode, --- --s ---, S if X is not set
Q: How to share files without setting up a group?
A: Set directories rwx--x--x, only my friend knows the file name, other users cannot list files.
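Q: The teacher found that the others-write permission on the grade file had been left on. After turning it off and confirming the contents were correct, the file was still modified?!
A: Permissions are checked only at open time, so a bad student can open the file, keep the process alive, and modify the file after the teacher has finished checking.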
/var/spool/mail
SVR3 & Linux provide mandatory locks by turning on set-group-id but turning off group-X.
Therefore, the superuser checks whether any files have set-user/group-id, especially those owned by root! (by find)
access(path,mode)
mode: R_OK, W_OK, X_OK, F_OK (existence)
Check the real UID/GID. Ex: A set-user-id program wants to check if the real user can access the file.
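user file creation mask
By default a new file is requested with mode 666 and a new directory with 777. After the process calls umask(x), later creations get 666&~x and 777&~x, i.e. the masked mode bits are turned off.
The mask is inherited from the parent; changing your own does not affect the parent's.
return: previous mask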
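A sketch of the masking rule, assuming a POSIX system and a writable path with no default ACLs in effect. `mode_after_umask` is a hypothetical helper that creates a throwaway file and reports the permission bits it ended up with.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path requesting mode 0666 while the process umask is `mask`, and
 * return the permission bits the file actually got (or -1 on error).
 * The previous umask is restored and the file is removed afterwards. */
int mode_after_umask(const char *path, mode_t mask)
{
    assert(path != NULL);
    mode_t old = umask(mask);          /* umask returns the previous mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
    umask(old);
    if (fd < 0)
        return -1;

    struct stat sb;
    int rc = fstat(fd, &sb);
    close(fd);
    unlink(path);                      /* clean up the demo file */
    return rc == 0 ? (int)(sb.st_mode & 0777) : -1;
}
```

With mask 022 the requested 0666 becomes 0644, and with mask 077 it becomes 0600, matching 666&~x.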
chmod(path, mode)
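In the past disks were slow. For an executable that the machine ran all the time, the sticky bit was set to 1 so that the executable was kept in the swap area.
Once placed there it was not deliberately evicted, and because the swap area is stored as contiguously as possible, loading was faster. Virtual memory has since replaced this use.
Current use: if a directory's sticky bit is set, a file inside it can be deleted or renamed only if the process has write permission on the directory and one of the following holds: it owns the file, it owns the directory, or it is the superuser.
So although /tmp is 777, it has the sticky bit, and users cannot freely delete other users' files.
--- --- --t
Be extra careful whenever s or t shows up.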
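Version for a file that is already open: takes a file descriptor (this is what the f prefix always means).
Version that does not follow symbolic links and modifies the link itself (this is what the l prefix always means).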
pathconf(path, name) (fpathconf for filedes)
sysconf(name)
Chops off the tail of a file.
truncate(const char* pathname, off_t length): keeps only the first length bytes
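Under the root directory there are various directories; a device is also a file, and a disk partition can be mounted onto a directory.
Hard Drive:
Bitmap idea: each bit records whether an i-node/data block is in use.
An i-node records many things, such as owner, type, file size, and which blocks the file occupies.
dir block: link to i-node, filename
- logical size > allocated data blocks: seeking far past the end with lseek and then writing creates a hole
- logical size < allocated data blocks: the last block is only partly used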
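A sketch of creating a file with a hole, assuming a POSIX system. `make_hole` is a hypothetical helper; whether the hole is really left unallocated depends on the file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Make a sparse file: seek `gap` bytes into an empty temporary file and
 * write one byte. Fills *sb with the resulting fstat data.
 * Returns 0 on success, -1 on error. */
int make_hole(off_t gap, struct stat *sb)
{
    assert(gap >= 0 && sb != NULL);
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    int fd = fileno(fp);

    int rc = -1;
    if (lseek(fd, gap, SEEK_SET) == gap &&   /* seeking past EOF is allowed */
        write(fd, "x", 1) == 1 &&
        fstat(fd, sb) == 0)
        rc = 0;
    fclose(fp);
    return rc;
}
```

After `make_hole(1 << 20, &sb)`, st_size is the logical size (gap + 1 bytes), while st_blocks shows how much was actually allocated; on file systems that support holes the latter is much smaller.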
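Who is the parent of root? An implementation issue: it may be root itself or a special value.
The size reported by du includes indirect blocks.
Indirect blocks get cached, so accessing a block does not necessarily walk the extra levels every time.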
i-node number: st_ino
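If a directory contains a soft link that points to a directory, a traversal could loop forever. To avoid that, record visited i-nodes or limit the number of soft links followed (8 in 4.3BSD).
Functions do not all follow soft links: remove deletes the link itself (no follow), while open opens the file it points to (follow).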
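int link(existing_path, new_path)
int unlink(path)
Both refer to hard links.
unlink decrements the i-node's hard link count by 1.
Common trick: for a temporary file, open it and unlink it right away. It can still be read and written and still takes up disk space, but ls cannot find it.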
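A sketch of the open-then-unlink trick, assuming a POSIX system. `open_unlinked_temp` is a hypothetical helper built on mkstemp; the template path is the caller's choice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a temporary file and unlink it immediately. The returned fd
 * stays usable until it is closed; the name is gone, so ls will not show
 * the file. tmpl must be a writable string ending in "XXXXXX", as
 * required by mkstemp. Returns the fd, or -1 on error. */
int open_unlinked_temp(char *tmpl)
{
    assert(tmpl != NULL);
    int fd = mkstemp(tmpl);        /* creates and opens a unique file */
    if (fd < 0)
        return -1;
    if (unlink(tmpl) != 0) {       /* removes the only name of the file */
        close(fd);
        return -1;
    }
    return fd;
}
```

The disk space is released when the last descriptor is closed, including when the process crashes, so no stale temporary file is left behind.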
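The teacher's countermeasure: unlink the file whose permissions were wrong and create a new file with the same name. To be more careful, also forge the modification time so the student cannot tell.
Two different reference counts:
- i-node hard link count: how many filenames point to the i-node
- system open file table reference count: how many file descriptors point to the entry (review: increases on dup or fork; opening the same file again does not change it)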
remove: unlink for file, rmdir for dir
rename: dir to dir, file to file
int symlink(const char *actual_path, const char *sym_path)
ssize_t readlink(const char *pathname, char *buf, size_t buf_size): the result in buf has no \0 at the end
st_atime: last access
st_mtime: last modification
st_ctime: last i-node change (chmod, chown); no system call can set it directly
Effect of open on the times: a is updated only with O_CREAT; O_TRUNC does not touch a, only m and c
system call: utime changes the times
int mkdir(path, mode)
int rmdir(path)
DIR *opendir(path)
struct dirent *readdir(DIR *dp): dirent contains many fields
void rewinddir(DIR *dp)
int closedir(DIR *dp)
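file tree walk
int ftw(char* dir_path, *fn(char* f_path,stat *sb,type_flag), n_openfd)
A process has a current working directory (CWD), recorded as a device id and an i-node number.
Like umask, cd has to be a shell built-in command, because the current working directory, like the user file creation mask, is inherited from the parent and then independent of it. If cd were a separate executable, running it would only change these two values in the child process.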
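chdir follows soft links: it takes a name and looks up the i-node starting from the root, so its "output" is an i-node.
getcwd doesn't follow soft links: it can only traverse upward from the current directory's i-node, and its output is a name.
A program may create a file named after its pid to store temporary data.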
Special process
Bootstrapping
A table stores the information of all processes; each entry is a PCB:
sys/proc.h has a struct proc
which links the PCBs together.
Every process has one.
no error return
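Every new process has to be created with fork.
fork()->(parent)->wait->(resume)->
>(child)->exec->exit-> wakes up the parent's wait
wait: waits for a child to die; may wait forever.
The PCB holds the return status, so a dead child's PCB cannot be removed right away: the child stays a zombie until the parent waits for it. If the parent dies first, the child becomes an orphan.
return pid
fork is called once but returns twice: once in the parent (the child's pid) and once in the child (0).
(The child calls getppid to get the parent's id.) The PCB changes, so it cannot simply be inherited; the VM is almost the same, so it can.
The child starts running from the return of fork.
A process's VM is mapped to physical memory through the page table (ch7). Copying everything at fork would cost too much, so a page is copied only when a process modifies it (copy on write).
Von Neumann architecture: instructions are not modified at run time, so sharing the text segment of the VM is very effective.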
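A minimal sketch of fork returning twice and of parent and child having separate copies of data, assuming a POSIX system. `fork_and_modify` is a hypothetical function written for this note.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that changes its own copy of *value and exits with the new
 * value as its status. The parent waits and returns the child's exit
 * status (or -1 on error); the parent's *value is left unchanged. */
int fork_and_modify(int *value, int new_value)
{
    assert(value != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child: fork returned 0 */
        *value = new_value;         /* private copy (made on write under COW) */
        _exit(*value);              /* do not flush stdio buffers copied from the parent */
    }

    int status;                     /* parent: fork returned the child's pid */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `int v = 7; fork_and_modify(&v, 42);` returns 42 while v is still 7 in the parent. Only the low 8 bits of the exit status reach the parent, so new_value should stay in 0..255.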
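Most things are inherited, but some obviously cannot be:
vfork: avoids the time fork wastes copying everything by letting the child share the parent's memory directly. With copy on write this is no longer useful.
It is a crippled form of shared memory.
If the child calls exit, the parent's standard I/O streams (and their file descriptors) can be closed behind its back, so the child has to call _exit.
train: between select and read, the descriptor has to be made non-blocking
Deadlock: two or more processes each wait for the other to finish, so none of them ever finishes.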
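A C program can kill itself at any time with _exit or _Exit (from POSIX and ISO C respectively; practically the same), whereas exit first does extra work:
Core=memory, e.g. core dumped
atexit registers an exit handler
Abnormal terminations are all signals: with abort, for example, the system sends a signal to the process. A process that wants to kill itself can only ask UNIX to do it.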
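pid_t wait(int *statloc): wait for any one child to die
pid_t waitpid(pid_t pid, int *statloc, int op): wait for one specific child to die
Actually, wait waits for a change of state in a child process (a normal death gives a termination status). For example suspend: a program calling scanf that is moved to the background cannot read its input and gets stopped.
wait3, wait4: also return more information (resource usage such as user/system CPU time)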
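Double fork
UNIX expects a parent to wait for its children, so using a child directly causes problems.
Create a child whose only purpose is to create a grandchild. The child dies right after the fork, so the grandchild is adopted by init and has nothing more to do with the original parent.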
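A sketch of the double fork, assuming a POSIX system. `spawn_detached` and `sample_job` are hypothetical names; the pipe that reports the grandchild's pid is an addition for the sake of the example and is not part of the technique itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job: a real background task would do its work here. */
void sample_job(void)
{
}

/* Run job() in a grandchild that is not a child of the caller.
 * Returns the grandchild's pid (reported through a pipe), or -1. */
pid_t spawn_detached(void (*job)(void))
{
    assert(job != NULL);
    int p[2];
    if (pipe(p) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    if (child == 0) {                        /* first child */
        close(p[0]);
        pid_t grandchild = fork();
        if (grandchild == 0) {               /* grandchild: does the work */
            close(p[1]);
            job();
            _exit(0);
        }
        /* report the grandchild's pid (or -1), then die right away */
        ssize_t w = write(p[1], &grandchild, sizeof grandchild);
        _exit((grandchild < 0 || w != (ssize_t)sizeof grandchild) ? 1 : 0);
    }

    /* original parent: reap the short-lived child so it is not a zombie */
    close(p[1]);
    pid_t grandchild = -1;
    ssize_t n = read(p[0], &grandchild, sizeof grandchild);
    close(p[0]);
    if (waitpid(child, NULL, 0) != child || n != (ssize_t)sizeof grandchild)
        return -1;
    return grandchild;
}
```

The caller only ever waits for the first child, which exits immediately. Calling waitpid on the returned pid fails, because the grandchild is not a child of the caller.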
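(open source) Application: main immediately
and the program then runs directly in the background
related: session group(ch9)
Race condition: many processes want the same resource, and the result depends on execution order.
Ex: parent and child each output one character at a time, or the child waits 2 seconds before printing but the parent has not finished printing.
Solution 1:
Not great: it amounts to busy waiting, so IPC is needed.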
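Pick one of l, v:
l: arguments as a list ending with NULL ((char*)0)
v: arguments as an argv[] vector, the same form main takes as input
Optionally add e, p:
e: pass the environment explicitly instead of using extern char **environ
p: search PATH for the program file
execl, execlp, execle, execv, execvp, execve
Usually only execve is a system call; the others all end up calling it.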
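A minimal fork + exec + wait sketch using execvp (vector arguments, PATH search), assuming a POSIX system. `run_and_wait` is a hypothetical helper, and using 127 for "exec failed" follows the shell convention rather than anything required by exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program argv[0] (searched on PATH) with the NULL-terminated
 * argument vector argv and wait for it. Returns its exit status, 127 if
 * exec failed, or -1 on other errors. */
int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);      /* returns only on failure */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Usage: `char *cmd[] = {"sh", "-c", "exit 3", NULL}; run_and_wait(cmd);` returns 3. The l variant of the same call would be `execlp("sh", "sh", "-c", "exit 3", (char *)0)`.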
The shell has a .bashrc; environment variables set there are shared by every program the shell runs.
After exec, file descriptors are inherited by default. fcntl with F_GETFD and F_SETFD can change FD_CLOEXEC.
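A sketch of toggling the close-on-exec flag with fcntl, assuming a POSIX system. `set_cloexec` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn the close-on-exec flag of fd on (enable != 0) or off.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd, int enable)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);     /* read the descriptor flags */
    if (flags < 0)
        return -1;
    if (enable)
        flags |= FD_CLOEXEC;
    else
        flags &= ~FD_CLOEXEC;
    return fcntl(fd, F_SETFD, flags) < 0 ? -1 : 0;
}
```

Reading the flags first and writing them back keeps any other descriptor flags intact. The flag belongs to the descriptor, not to the open file table entry, so a descriptor duplicated with dup starts with it cleared.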