# System Programming - 2

## Files and Directories

* Properties
    * ownership
    * access permission
    * time attributes, type, management, etc

---

### stat(), fstat(), lstat()

![](https://i.imgur.com/ocQRnsF.png =500x)

```c
#include <sys/types.h>
#include <sys/stat.h>

int stat(const char *pathname, struct stat *buf);
int fstat(int fd, struct stat *buf);
int lstat(const char *pathname, struct stat *buf);

struct stat {
    mode_t    st_mode;    // file type & mode (permissions)
    ino_t     st_ino;     // i-node number (serial number)
    dev_t     st_dev;     // device number (file system)
    dev_t     st_rdev;    // device number for special files
    nlink_t   st_nlink;   // number of links
    uid_t     st_uid;     // user ID of owner
    gid_t     st_gid;     // group ID of owner
    off_t     st_size;    // size in bytes, for regular files
    time_t    st_atime;   // time of last access
    time_t    st_mtime;   // time of last modification
    time_t    st_ctime;   // time of last file status change
    blksize_t st_blksize; // best I/O block size
    blkcnt_t  st_blocks;  // number of disk blocks allocated
};
```

* stat(), lstat()
    * `lstat()` returns information about the symbolic link itself
    * `stat()` returns information about the file referenced by the symbolic link

---

### Mask, File Types, Access Permissions

1. **st_mode mask**
    ```
    S_IFMT   0170000  bit mask: file type
    S_IFSOCK 0140000  socket
    S_IFLNK  0120000  symbolic link
    S_IFREG  0100000  regular file
    S_IFBLK  0060000  block device
    S_IFDIR  0040000  directory
    S_IFCHR  0020000  character device
    S_IFIFO  0010000  FIFO
    S_ISUID  04000    set-user-ID on execution
    S_ISGID  02000    set-group-ID on execution
    S_ISVTX  01000    sticky bit
    S_IRWXU  00700    owner: r, w, x
    S_IRUSR  00400    owner: r
    S_IWUSR  00200    owner: w
    S_IXUSR  00100    owner: x
    S_IRWXG  00070    group: r, w, x
    S_IRGRP  00040    group: r
    S_IWGRP  00020    group: w
    S_IXGRP  00010    group: x
    S_IRWXO  00007    other: r, w, x
    S_IROTH  00004    other: r
    S_IWOTH  00002    other: w
    S_IXOTH  00001    other: x
    ```
2. 
**File Types**
    * Types
        * Regular Files : text, binary, etc
        * Directory Files : { (filename, pointer) }, only the kernel can update the info
        * Character Special Files : tty, audio
        * Block Special Files : disk
        * FIFO : named pipes
        * Sockets : the type of file used for network communication between processes
        * Symbolic Links : the type of file that points to another file
    * Macros in `<sys/stat.h>`
        * The argument to the macros : st_mode
        ```
        S_ISREG()   // regular file
        S_ISDIR()   // directory file
        S_ISCHR()   // character special file
        S_ISBLK()   // block special file
        S_ISFIFO()  // pipe or FIFO
        S_ISLNK()   // symbolic link
        S_ISSOCK()  // socket

        example:
        #define S_IFMT  0xF000  // type of file
        #define S_IFDIR 0x4000  // directory
        #define S_ISDIR(mode) (((mode) & S_IFMT) == S_IFDIR)
        ```
3. **Access Permissions**
    * Operations
        * Directory
            * X: access the i-node of files in a dir
                * e.g., open a file (`/usr/include/stdio.h`)
                * need `x` of `/`, `/usr`, `/usr/include`
            * R: read a dir and list all filenames in a dir
            * W: create or delete a file
                * create a file : need `w+x` (dir)
                * delete a file : need `w+x` (dir), the file's own permission doesn't matter
                * e.g., everyone has `w+x` for /tmp, but cannot delete other users' files in /tmp (sticky bit, [here](#Sticky-bit-S_ISVTX-01000))
        * File
            * X : execute a file, `exec()`
            * R : `O_RDONLY`, `O_RDWR` for `open()`
            * W : `O_WRONLY`, `O_RDWR`, `O_TRUNC` for `open()`

### UID/GID

![](https://i.imgur.com/Gm5rtXf.png =500x)

* A process can have more than one ID.
* ID type
    * Real UID/GID
        * from /etc/passwd
    * Effective UID/GID, Supplementary GIDs
        * determine file access permissions for processes
        * whether effective UID/GID == st_uid/st_gid (at least one applies)
        * when a program/file is executed (no set-uid/set-gid)
            * the process's effective UID/GID = real UID/GID
        * each time a process executes, creates, opens, or deletes a file, the kernel performs the File Access Test. Tests 1~4 are tried in order, and the first one that applies decides whether access is granted.
            1. effective UID == 0 : superuser
            2. 
effective UID == st_uid (UID of the file): check the owner permission bits
            3. effective GID or any of the supplementary group IDs == st_gid (GID of the file): check the group permission bits
            4. otherwise, check the permission bits for other
    * Saved Set-User/Group-ID
        * copies of the effective UID/GID
        * saved by exec()
* Set-User-ID (setuid), Set-Group-ID (setgid), sticky bit
    * setuid, setgid: if the bit is set, then when this file is executed
        * the process's effective UID/GID = the owner of the file (st_uid/st_gid)
    * S_ISUID/S_ISGID is set
        * S_ISUID (04000) on st_mode
            * Symbolic : `--s --- ---`
        * S_ISGID (02000) on st_mode
            * Symbolic : `--- --s ---`
        * e.g., a setuid/setgid program allows normal users to have root permission to update /etc/passwd
    * Sticky bit (01000) on st_mode
        * Symbolic : `--- --- --t`
        * will be mentioned [here](#Sticky-bit-S_ISVTX-01000)
    * Example 1 :
        ```bash
        $ ls -alt /usr/bin/passwd
        -rwsr-xr-x 1 root root 25692 May 24 ...
        ```
    * Example 2 :
        ![](https://hackmd.io/_uploads/rJelduiwG6.png =500x)
        * if permission of File A1 = 0755
            * All of the IDs are B; the process cannot read File A2.
            * If File A2's permission is 0644, the process can read it.
        * if permission of File A1 = 4755
            * Real User/Group ID is B.
            * Effective User ID is A.
            * The process can read File A2
* Create a new file
    * UID of a file (st_uid) = the effective UID of the process
    * GID of a file (st_gid) (POSIX.1 allows one of these two) =
        1. the effective GID of the process
        2. 
the GID of the directory in which the file is being created (on Linux, only when the directory's set-group-ID bit is set)
        * Linux supports 1 and 2; FreeBSD and Mac OS X support 2
        * If 2 is used, all files and dirs in a dir are guaranteed to have the same GID as the dir

---

### access(), umask()

```c
#include <unistd.h>

// 0: OK, -1: on error
int access(const char *pathname, int mode);
```

* mode :
    * R_OK : test for read permission
    * W_OK : test for write permission
    * X_OK : test for execute permission
    * F_OK : test for existence of file
* checks whether the real UID/GID (not the effective UID/GID) has access to a file

```c
#include <sys/stat.h>

mode_t umask(mode_t cmask);
```

* return : the previous value of the mask
* Turns **off** the given bits in the file mode (st_mode) of newly created files
    * cmask = bitwise-or of S_IRUSR, ...
* file mode creation mask: per process
    * the child's mask is inherited from the parent ([here](#fork-vs-exec))
    * changing the mask has no effect on the mask of other processes
* A built-in command in a shell
* Example :
    ```bash
    # example 1 (values are octal)
    $ umask 022
    # a file created with mode 0777 ends up with mode 0755

    # example 2: a.out source code (C)
    # RWRWRW is assumed to be a macro for S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH
    umask(0);
    creat("foo", RWRWRW);   // creates "foo" as -rw-rw-rw-
    umask(S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH);
    creat("bar", RWRWRW);   // creates "bar" as -rw-------
    ------------
    $ umask
    002
    $ ./a.out
    $ ls -l foo bar
    -rw-rw-rw- 1 sar 0 Dec 7 21:20 foo
    -rw------- 1 sar 0 Dec 7 21:20 bar
    $ umask
    002    # no effect on the mask of the parent (the shell)
    ```

---

### chmod(), fchmod()

```c
#include <sys/types.h>
#include <sys/stat.h>

// 0: OK, -1: on error
int chmod(const char *pathname, mode_t mode);
int fchmod(int fd, mode_t mode);
```

* mode : bitwise-or of S_IRUSR, ..., S_ISVTX (sticky bit), S_IS[UG]ID, etc
* chmod() updates the i-node
* The caller must be a superuser or have effective UID = file UID
* chmod() automatically clears two permission bits under the following conditions
    * Setting the sticky bit (S_ISVTX) on a regular file without superuser privileges
        * the sticky bit in the mode is automatically turned off
    * GID of a newly created file != the effective GID (or any supplementary GID) of the calling
process, and the process is not a superuser
        * the set-group-ID bit is cleared
* a non-superuser process writes to a set-uid/gid file
    * the set-user/group-ID bits are cleared

---

### Sticky bit (S_ISVTX, 01000)

* executable file with S_ISVTX :
    * Used to keep a copy of the S_ISVTX executable in the swap area after the process terminates, to speed up the next execution
    * Not needed on a system with virtual memory and a fast file system
        * no effect on modern systems
* directory with S_ISVTX :
    * A user needs both of the following to remove or rename a file in the directory
        * has `w` for the directory
        * And one of the following
            * User is superuser (root)
            * Owns the file
            * Owns the directory
    * Symbolic : `--- --- --t`
    * Example :
        ```bash
        $ ls -ld /tmp
        drwxrwxrwt 4 root sys 485 Nov 10 06:01 /tmp
        ```

---

### chown(), fchown(), lchown()

```c
#include <sys/types.h>
#include <unistd.h>

// 0: OK, -1: on error
int chown(const char *pathname, uid_t owner, gid_t group);
int fchown(int filedes, uid_t owner, gid_t group);
int lchown(const char *pathname, uid_t owner, gid_t group);
```

* change a file's UID/GID
* if pathname is a symbolic link
    * lchown(): changes the symbolic link file itself
    * chown(): dereferences the symbolic link
* owner or group = -1: that ID is not changed.
* ability to change
    * UID: with the CAP_CHOWN capability, e.g., superuser (root)
    * GID:
        * with the CAP_CHOWN capability
        * the owner of the file may change the group to any group of which it is a member
* the set-user/group-ID bits are cleared if chown is called by a non-superuser.

---

### sysconf(), pathconf(), fpathconf(): Run-Time Limits

```c
#include <unistd.h>

long sysconf(int name);
long pathconf(const char *pathname, int name);
long fpathconf(int filedes, int name);
```

* sysconf()
    * name : _SC_CHILD_MAX, _SC_OPEN_MAX, etc.
* pathconf(), fpathconf()
    * name : _PC_LINK_MAX, _PC_PATH_MAX, _PC_PIPE_BUF, _PC_NAME_MAX, _PC_SYMLINK_MAX, etc.
* Return -1 and set errno if any error occurs
    * EINVAL if the name is incorrect.
---

### File Size

* File Sizes: st_size is only meaningful for the following
    * Regular files: 0~max (off_t)
    * Directory files: multiples of 16 or 512
    * Symbolic links: pathname length
* File Holes

    Disk space is allocated to a file in multiples of blocks (a file may not use up its last block if `st_size` is not a multiple of the block size)
    * st_blksize: preferred block size for efficient filesystem I/O
    * st_blocks: number of blocks allocated to the file (counted in 512-byte units on Linux)
    * st_size: file size in bytes

    If a file has a hole (e.g., created by lseek() past the end of the file followed by a write), `st_size` can be larger than the space actually allocated (`st_blocks * 512` on Linux); reading the hole returns null bytes

---

### truncate(), ftruncate()

```c
#include <sys/types.h>
#include <unistd.h>

int truncate(const char *pathname, off_t length);
int ftruncate(int fd, off_t length);
```

* the file must be writable
* if the size is not equal to length:
    * file size > length: the file is truncated
    * file size < length: the file size increases, creating a hole

---

### File System

![](https://hackmd.io/_uploads/S11SVg6zT.png)
![](https://i.imgur.com/gJcnH3Z.png)
![](https://hackmd.io/_uploads/SkbqVe6fa.png =350x)

### i-node and data blocks

![](https://hackmd.io/_uploads/ByBLHe6z6.png)
![](https://i.imgur.com/1lvKMef.png =500x)
![](https://i.imgur.com/Iqu9twM.png =300x)
![](https://i.imgur.com/5U5B12L.png =500x)

* i-node (fixed size) :
    * i-node size:
        * Version 7: 64B, 4.3+BSD: 128B, S5: 64B, UFS: 128B
    * File type, access permission, file size, data blocks, link count, etc.
    * The number of files that can be created is predefined
        * It can happen that there is enough disk space, but the i-node table is full.
    * ls -i filename (show i-node number)
* Link count (hard links)
    * How many directory entries point to a specific i-node
    * the file is only deleted if the link count = 0
    * contained in st_nlink in stat
    * LINK_MAX: maximum value for the link count
* Moving files among directories
    * Move within the partition :
        * only the directory blocks are updated (add the new entry and unlink the old entry)
        * link count remains the same
    * Move between partitions :
        * the file is moved physically (the data blocks are moved)

### Hard Link vs. Soft Link

![](https://hackmd.io/_uploads/H1-AkSpzT.png =300x)
![](https://hackmd.io/_uploads/Syb1mSTMa.png =300x)

* Hard Link
    * Filesize = data size
    * Is a different name for the same set of data blocks
    * limitation
        * requires both the existing pathname and the link to be on the same file system
        * only the superuser can link/unlink a directory
* Soft Link (Symbolic Link)
    * Filesize = pathname length
    * Can point to a directory or a file
    * Is a pointer to a set of data blocks
    * File type is S_IFLNK
    * relative path:
        * the path name is relative to the directory containing the symbolic link
        ![](https://hackmd.io/_uploads/SJGkfH6GT.png =500x)
    * Problem 1 : dangling pointers
        * Example : if `/usr/joe/foo` was deleted, `/usr/sue/bar` becomes a dangling pointer
    * Problem 2 : infinite loop
        ![](https://i.imgur.com/o8HpHPl.png =100x)
    * functions follow symbolic link or not:
        ![](https://hackmd.io/_uploads/BkKEgHTza.png =300x)

---

### link(), unlink(), rename(), remove()

```c
int link(const char *existingpath, const char *newpath);
```

* return
    * 0: OK, -1: on error (e.g., if newpath already exists)
* need permission `w+x` for the directory
* creates a new hard link
* atomic operation : creation of the new dir entry and increment of the link count

```c
int unlink(const char *pathname);
```

* need permission `w+x` for the directory
* if the sticky bit is set on the containing dir:
    * need the same permissions as deleting files in the dir (see sticky bit)
* pathname is a symbolic link:
    * only unlinks the symbolic link itself
* Checking if any
process has the file open
    * If open, the link is removed, but deletion of the file is delayed until all references to it have been closed

```c
int remove(const char *pathname);
```

* pathname is a file : unlink
* pathname is a dir : rmdir (ANSI C)

```c
int rename(const char *oldname, const char *newname);
```

* need `w+x` for the directories containing oldname and newname
* condition:
    * oldname: file
        * newname exists and is not a dir: the file newname is removed
        * newname exists and is a dir: error
    * oldname: dir
        * newname exists and is an empty dir: the dir newname is removed
        * newname exists and is a file, or oldname is a prefix of newname: error
            * e.g., `"/usr/foo"` is a prefix of `"/usr/foo/testdir"`

---

### symlink(), readlink()

```c
int symlink(const char *actualpath, const char *sympath);
```

* actualpath does not need to exist
* actualpath and sympath can be in different file systems

```c
ssize_t readlink(const char *pathname, char *buf, size_t bufsize);
```

* reads the contents of the symbolic link into buf and returns the number of bytes read
* readlink is an action consisting of open, read, and close
    * the contents put in buf are not null-terminated
* `readlinkat()`, `symlinkat()`
    * analogous to `open()` v.s. `openat()`

---

### File Times, utime(), utimes()

1. **File Times**

    | Field    | Description                    | Example      | ls-option |
    | -------- | ------------------------------ | ------------ | --------- |
    | st_atim  | last-access-time               | read         | -lu       |
    | st_mtim  | last-content-modification-time | write        | -l        |
    | st_ctim  | last-i-node-change-time        | chmod, chown | -lc       |

    * `access()` and `stat()` don't change the file times
    * changing the access permissions, user ID, link count, etc, only affects the i-node (st_ctim)
    * ctime is modified automatically! (stat & access only read the i-node)
    * effect of functions on file times:
        ![](https://hackmd.io/_uploads/SyVfFBafT.png)
2. 
**utime()**
    ```c
    #include <sys/types.h>
    #include <utime.h>
    #include <sys/time.h>   // for utimes()

    int utime(const char *pathname, const struct utimbuf *times);
    int utimes(const char *pathname, const struct timeval times[2]);

    struct utimbuf {
        time_t actime;
        time_t modtime;
    };
    // time_t: number of seconds since 1970 Jan 1 00:00:00

    struct timeval {
        time_t tv_sec;
        long   tv_usec;
    };
    ```
    * utimes()
        * finer-grained resolution than utime()
        * `times[0]` = access time
        * `times[1]` = modification time
    * time values are in seconds since the Epoch
    * times :
        * == NULL : set to the current time
            * requires (effective UID = file UID) or `w` permission on the file
        * != NULL : set as requested
            * requires effective UID = file UID or superuser; `w` permission on the file alone is not enough

---

### mkdir(), rmdir(), opendir(), readdir(), rewinddir(), closedir()

1. **mkdir()**
    ```c
    #include <sys/stat.h>

    // 0: OK, -1: on error
    int mkdir(const char *pathname, mode_t mode);
    ```
    * umask, UID/GID setup (works the same as for a file)
    * . and .. are automatically created
2. **rmdir()**
    ```c
    #include <unistd.h>

    // 0: OK, -1: on error
    int rmdir(const char *pathname);
    ```
    * deletes an empty directory; the space is freed if
        * the link count of the dir = 0
        * no other process has the dir open
    * if the directory is not empty
        * error: ENOTEMPTY or EEXIST
3. **opendir(), readdir(), rewinddir(), closedir()**
    ```c
    #include <sys/types.h>
    #include <dirent.h>

    // returns pointer if OK, NULL on error
    DIR *opendir(const char *pathname);
    // returns pointer if OK, NULL at the end of the dir or on error
    struct dirent *readdir(DIR *dp);
    void rewinddir(DIR *dp);
    // 0: OK, -1: on error
    int closedir(DIR *dp);

    struct dirent {
        ino_t d_ino;                /* not in POSIX.1 */
        char  d_name[NAME_MAX+1];
    };  // implementation dependent
    ```
    * Only the kernel can write to a dir
        * `w+x` for creating/deleting a file
    * ftw()/nftw()
        * recursively traverse the file system

---

### chdir(), fchdir(), getcwd()

a process has a current working directory, where all relative pathnames begin (per process)

1. 
**chdir(), fchdir()**
    ```c
    #include <unistd.h>

    // 0: OK, -1: on error
    int chdir(const char *pathname);
    int fchdir(int fd);
    ```
    * `cd` must be built into shells, because chdir() only affects the calling process
    * The kernel only maintains the i-node number and dev ID for the current working directory
2. **getcwd()**
    ```c
    #include <unistd.h>

    char *getcwd(char *buf, size_t size);
    ```
    * The buffer must be large enough, or an error is returned
    * chdir follows symbolic links, and getcwd has no idea of symbolic links

---

### st_ino (I-node number)

* Each file has a unique i-node number (index number)
* The i-node number can be used to look up a file's information (i-node) in a system table (the i-list)
* A file's i-node contains :
    * user and group IDs of its owner
    * permission bits
    * etc

---

## Process Control

- user process v.s. kernel process
    ![](https://hackmd.io/_uploads/S1GT_SCzT.png =300x)
- process memory layout
    ![](https://hackmd.io/_uploads/Bk5sv9AGT.png)
- PID: non-negative integer
    - 0: swapper or scheduler/idle process
        * kernel process
        * no program on the disk corresponds to this process
    - 1: init process
        * user process with superuser privileges
        * brings up the Unix system after the kernel has been bootstrapped
        * initializes system services, login processes, etc
        * runs init scripts
    - 2: pagedaemon or kthreadd
        * kernel process
        * pages the virtual memory system
- bootstrap
    - computer powers on
    - CPU executes the firmware from ROM
    - firmware (BIOS/UEFI) initializes hardware (e.g., RAM)
    - loads software from the storage device (boot partition in the hard drive) to RAM
        ![](https://hackmd.io/_uploads/S1bFBYAfa.png =300x)
    - CPU executes the loaded program (bootloader, e.g., grub, u-boot)
    - bootloader loads the OS to RAM and continues the bootstrap
    - OS bootstraps more hardware
    - OS creates pid 1, 2
    - the system is brought up for multi-user operation
        ![](https://hackmd.io/_uploads/S1f3SY0M6.png =170x)
- process control block (PCB)
    - each process is represented in the OS kernel by a PCB
        ![](https://hackmd.io/_uploads/r109OFRfp.png =100x)
- process scheduling
![](https://hackmd.io/_uploads/rymLKYCM6.png)
![](https://hackmd.io/_uploads/rJIuKFRza.png)

### getpid(), getppid()

```c
#include <unistd.h>

pid_t getpid(void);
pid_t getppid(void);
```

* return:
    - getpid(): PID of the calling process
    - getppid(): PID of the parent process of the calling process
    - the OS kernel tracks the parent process of each process except for pid 0, 1, 2

### fork(), vfork()

![](https://hackmd.io/_uploads/Bk_ljt0M6.png)
![](https://hackmd.io/_uploads/SyOZoY0MT.png =500x)

```c
#include <unistd.h>

pid_t fork(void);
```

![](https://hackmd.io/_uploads/HkFl0t0fT.png)

- return:
    - 0: if in the child
    - pid of child: if in the parent
    - -1: on error
- a process can create many children, but has only a single parent
- parent v.s. child memory
    - parent and child share the same memory address layout (discussed later)
    - child gets a copy of the parent's data, text segment, heap, stack
    - a copy or share
        - copy: data, heap, stack
        - share: text segment
- share the same file offset
    ![](https://hackmd.io/_uploads/r1Y-X5Czp.png)
    - when the child terminates, the offsets of any shared fds have been updated accordingly
    - close() in the parent or the child does not interfere with the other's open fds
- cannot predict whether the child or the parent runs first
    - decided by the kernel scheduler
    - if synchronization is needed, use `sleep()`, etc.
- reasons for fork() to fail
    - too many processes in the system
    - total number of processes for a real UID > `CHILD_MAX` (system's limit)

```c
#include <unistd.h>

pid_t vfork(void);
```

- return:
    - same as `fork()`
- optimization of `fork()`
    - share the address space until changes are required
- the same as `fork()` except
    - the child runs in the same address space as the parent until the child calls `exec()` or `exit()`
    - the child always runs first
        - the parent is blocked until the child calls `exec()` or `exit()`
- modern Unix systems employ copy-on-write (COW)
    - if `fork()` already uses COW, `vfork()` does not add much performance gain
    ![](https://hackmd.io/_uploads/BkYqp5Cza.png =300x)
- the behavior of `vfork()` is undefined if the
    - child modifies any data
except the variable used for the return value from `vfork()`
    - makes function calls
    - returns without calling `exec()` or `exit()`
- example:
    ```c
    // fork()
    if ((pid = fork()) < 0) {
        err_sys("fork error");
    } else if (pid == 0) {  /* child */
        globvar++;          /* modify variables */
        var++;
    } else {                /* parent */
        sleep(2);           /* let the child run and print first */
    }
    printf("pid = %ld, glob = %d, var = %d\n", (long)getpid(), globvar, var);

    /* output: (copy of memory)
     * pid = 430, glob = 7, var = 89
     * pid = 429, glob = 6, var = 88
     */

    // vfork()
    if ((pid = vfork()) < 0) {
        err_sys("vfork error");
    } else if (pid == 0) {  /* child */
        globvar++;
        var++;
        _exit(0);
    }
    printf("pid = %ld, glob = %d, var = %d\n", (long)getpid(), globvar, var);

    /* output: (shared memory address)
     * pid = 29039, glob = 7, var = 89
     */
    ```

### exit(), atexit()

![](https://hackmd.io/_uploads/rJt3tE17a.png)

- process termination
    - normal termination
        - return from main()
        - call `exit()`, `_exit()` or `_Exit()`
        - return of the last thread from its start routine (later)
        - calling pthread_exit from the last thread (later)
    - abnormal termination
        - call `abort()` (generating the `SIGABRT` signal) ([here](#abort-sleep))
        - terminated by a signal ([here](#Signal))
        - response of the last thread to a cancellation request (later)
    - the kernel eventually deallocates all of the process's memory, closes the process's open fds, etc.
    - the parent calls `wait()` to reap the resources of the child
        - the kernel notifies the parent by sending `SIGCHLD`

```c
#include <stdlib.h>   // specified in ISO C
void exit(int status);
void _Exit(int status);

#include <unistd.h>   // specified in POSIX.1
void _exit(int status);
```

- status
    - transmitted by the OS kernel to the parent to tell it how the child terminated
    - normal exits: by convention, status is set to 0
- cleanup tasks
    - `_Exit()` and `_exit()` do not perform them
    - `exit()` performs them
        - calls all exit handlers
        - clean shutdown of the standard I/O library: `fclose()` all open I/O streams

```c
#include <stdlib.h>

// Returns 0 if OK, nonzero on error
int atexit(void (*func)(void));
```

- registers `func` as an exit handler
    - the same exit func can be registered several times
    - handlers are called in reverse order of their registration
- the child inherits copies of the parent's registrations
    - removed on `exec()`

### wait(), waitpid()

```c
#include <sys/wait.h>

// Both return: process ID if OK, 0, or -1 on error
pid_t wait(int *statloc);
pid_t waitpid(pid_t pid, int *statloc, int options);
```

- pid:
    - pid < -1: wait for any child whose process group ID == |pid|
    - pid = -1: wait for any child, `wait(&status) == waitpid(-1, &status, 0)`
    - pid > 0: wait for the child whose PID == pid
    - pid = 0: wait for any child whose process group ID == the caller's process group ID
- options
    - `WCONTINUED`, `WNOHANG` (nonblocking), `WUNTRACED`
- wait for state changes of a child
    - child terminated, child stopped by a signal, child resumed by a signal
- pass a non-NULL `statloc` to retrieve the status value
    - set `statloc = NULL`: ignore the status
- a process that calls `wait()`, `waitpid()` will
    - block: if all children are still running
    - return immediately with the status value: if any child has changed state
    - return immediately with an error: if it has no children
- `wait()` v.s.
`waitpid()`
    - `wait()`: blocks until any child changes state
    - `waitpid()`: blocks until a specific child changes state (modifiable, e.g., nonblocking, wait for any child, set via `pid` and `options`)

### zombie and orphan process

![](https://hackmd.io/_uploads/r1KYFHk7p.png)

- zombie
    - the child has terminated, but the parent has not yet waited for it (zombie state)
    - zombies do not use much memory, but they consume PIDs
        - PIDs are a limited resource
- orphan
    - the child remains running, while the parent has terminated
    - the OS kernel sets the parent of an orphan = init process (pid == 1)
    - a child of init never becomes a zombie
        - init calls one of the `wait()` functions to fetch the status

### exec()

```c
#include <unistd.h>

// All return -1 on error; no return on success
int execl(const char *pathname, const char *arg0, ... /* (char *)0 */ );
int execv(const char *pathname, char *const argv[]);
int execle(const char *pathname, const char *arg0, ... /* (char *)0, char *const envp[] */ );
int execve(const char *pathname, char *const argv[], char *const envp[]);
int execlp(const char *filename, const char *arg0, ... /* (char *)0 */ );
int execvp(const char *filename, char *const argv[]);
```

- call `exec()` to execute a program
- `exec()` replaces the process's text (code), data segments (global variables), heap, and stack
    - the PID does not change
    - the new program starts executing at `main()`
- `l, v, p, e` after the function name:
    - l: list, a list of arguments terminated by a null pointer
        - e.g., `execl("/bin/ls", "ls", "-l", NULL);`
    - v: vector, arguments passed via an array which is terminated by a null pointer
        - e.g.,
        ```c
        char *cmd = "/bin/ls";
        char *args[] = {cmd, "-l", NULL};
        execv(cmd, args);
        ```
    - e: environment, the environment variables for the new process
        - e.g.,
        ```c
        char *cmd = "/bin/ls";
        char *args[] = {cmd, "-l", NULL};
        char *env[] = {"PATH=/bin", "HOME=/home/user", NULL};
        execve(cmd, args, env);
        ```
    - p: path, if `filename` has no slash (/), search the directories in the `PATH` environment variable; if `filename` has a slash (/), `PATH` is ignored
        - e.g.,
        ```c
        char *cmd = "ls";
        char
*args[] = {"ls", "-l", NULL};
        execvp(cmd, args);
        // if the file found is not a machine executable, it is run as a script via /bin/sh
        ```

### fork() v.s. exec()

![](https://hackmd.io/_uploads/r1ouupyma.png)
![](https://hackmd.io/_uploads/SypoO6Jm6.png)

### pipe

- Inter-Process Communication (IPC): lets multiple processes exchange data and synchronize (e.g., to avoid race conditions)
    ![](https://hackmd.io/_uploads/H1Z5e0kX6.png)

```c
#include <unistd.h>

// Returns 0 if OK, -1 on error
int pipe(int fd[2]);
```

- a pipe has a read end and a write end
    - write to the write end -> buffered by the kernel -> read from the read end
    ![](https://hackmd.io/_uploads/BkueLAyQT.png =150x)
- returned `fd[2]`:
    - `fd[0]`: read end
    - `fd[1]`: write end
    - output of `fd[1]` = input for `fd[0]`
    - the file type of both is FIFO
- accessing closed fds of pipes:
    - write end closed: a read from the pipe sees EOF (returns 0)
    - read end closed: a write to the pipe triggers a `SIGPIPE` signal sent to the process
        - if `SIGPIPE` is ignored, `write()` returns -1, errno = `EPIPE`
- pipe capacity:
    - `PIPE_BUF`: maximum size of a write that is guaranteed to be atomic, 4096 bytes in Linux (the pipe's total buffer capacity is a separate limit)
    - write data size <= `PIPE_BUF` bytes: data is contiguous (atomic)
    - write data size > `PIPE_BUF` bytes: data may be interleaved with other writes
- `pipe()` and then `fork()`
    ![](https://hackmd.io/_uploads/HkgESCkm6.png =500x)
- cause blocking:
    - read from an empty pipe: blocks until data is available
    - write to a full pipe: blocks until sufficient data has been read from the pipe
    - I/Os on pipes are slow system calls: cannot tell when the pipe is ready for reading or writing
- two limitations:
    - half duplex: data flows in one direction
    - can only be used between processes that have a common ancestor: `fork()` from the same parent who created the pipe
    - solution:
        - socket (stream pipes): addresses both
        - FIFOs: address the second

### FIFO

- a kind of file
    - also called named pipes
    - `S_ISFIFO`: used to check the type with stat()
- data is kept internally in the kernel (nothing is written to the file system): a FIFO file has no contents

```c
#include
<sys/types.h>
#include <sys/stat.h>

// Returns 0 if OK, -1 on error
int mkfifo(const char *path, mode_t mode);
```

- path:
    - name of the FIFO
- mode:
    - the FIFO's permissions, same as `open()`
- can use file system-related I/O (e.g., `write()`, ...) on a FIFO
- must be opened at both the read end and the write end; an open blocks until the other end is opened
    - read end: open with `O_RDONLY`
    - write end: open with `O_WRONLY`
- if a process writes to a FIFO which has no readers, the signal `SIGPIPE` is sent to the process
- FIFO and `O_NONBLOCK`
    - open without `O_NONBLOCK`:
        - open for read blocks until another process opens for write
        - open for write blocks until another process opens for read
    - open with `O_NONBLOCK`:
        - open for read returns immediately
        - open for write returns an error, errno = `ENXIO`, if no process has the FIFO open for read

## Signal

![image](https://hackmd.io/_uploads/S1_JSV4La.png =500x)

- defined by a name (begins with `SIG`) and a number (positive integer constant)
- signal types
    ![image](https://hackmd.io/_uploads/SyRdHN4Up.png)
    - terminal-generated
    - hardware exceptions
    - software
- the process can tell the kernel to use one of these dispositions (actions): ignore, catch with its own handler, perform the default action (default handler)
    ![image](https://hackmd.io/_uploads/BkYKH4VIp.png)
    - the process can receive a signal at any point in the program
    - `SIGKILL`, `SIGSTOP` cannot be ignored or caught by the program's own handler
- core dump file
    - generated when the program crashes or exits abnormally, used for debugging
    - not generated if
        - set-UID/GID process: real UID/GID != program file's UID/GID
        - no write permission to the directory
        - the generated core dump file is too big and the file system has no space left
- signal inheritance
    ![image](https://hackmd.io/_uploads/Byf5HNNUa.png)
- interrupted system calls
    - a signal can be delivered while a process is blocked in a system call; then either
        - the system call returns an error, errno = `EINTR`
        - the system call is automatically restarted after the signal handler returns if the SA_RESTART flag is
set (`sigaction()`)
- pending and blocking
    ![image](https://hackmd.io/_uploads/Bkq5SVNIa.png =450x)
    - pending, delivered:
        - pending: the signal has been generated but not yet delivered
        - delivered: the action for the signal has been taken (e.g., it has been caught)
    - a process has the option of blocking signal delivery; if the signal is blocked and the disposition is not ignore, the signal remains pending until
        - the process unblocks the signal
        - the disposition becomes ignore
    - `sigpending()` can determine which signals are blocked and pending
    - if a signal is generated more than once before being unblocked
        - POSIX.1 allows it to be delivered more than once (signals are queued)
        - Linux does not queue multiple instances of the same standard signal
    - if more than one signal is ready to be delivered to a process
        - POSIX.1 does not specify the order of signal delivery
        - POSIX.1 suggests that signals related to the current state of a process (e.g., `SIGSEGV`) should be delivered first

### signal()

```c
#include <signal.h>

// typical definitions found in <signal.h>
#define SIG_ERR (void (*)())-1
#define SIG_DFL (void (*)())0
#define SIG_IGN (void (*)())1

typedef void (*sighandler_t)(int);  // data type of signal handlers
sighandler_t signal(int signum, sighandler_t handler);
```

- return:
    - old disposition of `signum` if OK
    - `SIG_ERR` on error
- `handler` (disposition of `signum`):
    - function pointer
    - `SIG_IGN`: ignore the signal
    - `SIG_DFL`: signal handled with the default action
- `SIGKILL` and `SIGSTOP` cannot be caught or ignored

### Reentrant Functions

- the signal handler cannot tell where the process was executing before the signal was caught
    - a function could be called twice, once before the signal occurs, and once by the signal handler
- reentrant functions: functions that can be safely interrupted and called again before the first call has finished
    - make sure the signal handler is reentrant
- non-reentrant function cases:
    - uses static or global variables & data structures
    - calls `malloc()` or `free()`: both functions use a static global data structure to track memory blocks
    - modifies errno without saving and restoring it
    - calls
non-reentrant functions
- reentrant functions specified by the Single Unix Specification
    ![image](https://hackmd.io/_uploads/rymjS4NLT.png)

### Signal Sets

```c
#include <signal.h>

// All return: 0 if OK, -1 on error
int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signo);
int sigdelset(sigset_t *set, int signo);
// Return 1 if true, 0 if false, -1 on error
int sigismember(const sigset_t *set, int signo);

typedef struct {
    unsigned long sig[_NSIG_WORDS];
} sigset_t;
```

- functions defined to manipulate signal sets (do not manipulate the sets directly)
    - initialize a signal set: `sigemptyset()`, `sigfillset()`
    - add or delete a signal to an existing set: `sigaddset()`, `sigdelset()`
    - test if a signal is in a set: `sigismember()`
- POSIX.1 defines the data type `sigset_t` to contain a signal set

### Signal Mask

- defines the set of signals currently blocked from delivery to the process

```c
#include <signal.h>

// Returns 0 if OK, -1 on error
int sigprocmask(int how, const sigset_t *set, sigset_t *oset);
```

- a process can get and change its signal mask with `sigprocmask()`
- `how`:
    - `SIG_BLOCK`: new mask = current mask ∪ `set`
    - `SIG_UNBLOCK`: new mask = current mask - `set`
    - `SIG_SETMASK`: new mask = `set`
- `set`, `oset`
    - if `oset` != NULL: the previous value of the signal mask is stored in `oset`
    - if `set` == NULL: the signal mask is unchanged and `how` is ignored
    - if `set` != NULL: `how` indicates how the current signal mask is modified
- if any unblocked signals are pending, at least one of these signals is delivered to the process before `sigprocmask()` returns

### sigpending(), kill(), raise()

```c
#include <signal.h>

// Returns 0 if OK, -1 on error
int sigpending(sigset_t *set);
```

- returns the signals that are currently pending for the calling process
    - stores the set of pending signals in `set`
    - returns an error if set points to an invalid memory address

```c
#include <signal.h>

// Returns 0 if OK, -1 on error
int kill(pid_t pid, int signo);
int raise(int signo);
```

- 
  `kill()`: sends a signal to a process or a group of processes
    - `pid`:
        - `pid` > 0: to the process whose process ID is `pid`
        - `pid` == 0: to all processes in the same process group as the calling process (excluding an implementation-defined set of system processes, e.g., PIDs 0, 1, 2)
        - `pid` < -1: to all processes whose process group ID == |pid|
        - `pid` == -1: to all processes the caller has permission to signal (broadcast)
    - `signo`:
        - `signo` == 0 (POSIX null signal): error checking is performed, but no actual signal is sent
            - can be used to test whether a specific process exists
            - if it does not exist: returns -1, `errno` = `ESRCH`
            - not atomic: by the time `kill()` returns, the process might have exited
    - permission
        - the superuser can send a signal to any process
        - otherwise, the caller's real or effective UID must equal the receiver's real or effective UID
            - if `_POSIX_SAVED_IDS` is supported: the receiver's saved set-user-ID is checked instead of its effective UID
- `raise()`: sends a signal to the calling process itself
    - `raise(signo)` is equivalent to `kill(getpid(), signo)`

### alarm(), pause()
```c
#include <unistd.h>
unsigned int alarm(unsigned int seconds);
```
- sets a timer that expires after `seconds` seconds
    - when the timer expires, `SIGALRM` is generated and sent to the calling process
    - `SIGALRM`: default action is to terminate the process
- return:
    - number of seconds left for the previously scheduled alarm
    - 0: no previously scheduled alarm
- only one alarm clock per process
    - `seconds` == 0: any pending alarm is canceled
    - `seconds` != 0: if the previous alarm has not yet expired, the alarm clock is reset to the new value
```c
#include <unistd.h>
int pause(void);
```
- suspends the calling process until a signal is caught
- returns only after a signal handler has executed and returned; in this case `pause()` returns -1 with `errno` set to `EINTR`

### sigaction()
```c
#include <signal.h>
// Returns 0 if OK, -1 on error
int sigaction(int signo, const struct sigaction *act, struct sigaction *oact);

struct sigaction {
    void (*sa_handler)(int);
    sigset_t sa_mask;
    int sa_flags;
    /* alternate handler */
    void (*sa_sigaction)(int, siginfo_t *, void *);
};

struct siginfo {
    int si_signo; /*
signal number */
    int si_errno; /* error number */
    int si_code; /* signal code */
    pid_t si_pid; /* sending process ID */
    uid_t si_uid; /* sending process's real user ID */
    void *si_addr; /* address of faulting instruction */
    int si_status; /* exit value or signal */
    union sigval si_value; /* signal value */
    /* some other fields */
};
```
- examine or modify (or both) the action associated with a particular signal
- `signo`: signal number being examined or modified
    - cannot change the action for `SIGKILL` and `SIGSTOP`
- `act`, `oact`:
    - `act` != NULL: the new action for `signo` is installed from `act`
    - `oact` != NULL: the previous action for `signo` is saved in `oact` (if `act` is NULL, this is the current action)
- `sigaction()` supersedes `signal()` and should be used in preference
- `struct sigaction`:
    - `sa_handler`: address of the signal handler, or `SIG_DFL`, `SIG_IGN`
    - `sa_mask`: a set of signals to be blocked during the execution of the signal handler (added to the signal mask of the process) ![image](https://hackmd.io/_uploads/HJCirNVU6.png)
        - (A) the OS kernel adds the following signals to the signal mask of the process before invoking its signal handler
            - the signal being delivered (by default), unless `SA_NODEFER` is set in `sa_flags`
            - the signals specified in `sa_mask`
        - (B) when the signal handler returns, the signal mask of the process is reset to its previous value (before (A)) ![image](https://hackmd.io/_uploads/r18hHNVIp.png)
    - `sa_flags`: signal options
    - `sa_sigaction`: if `SA_SIGINFO` is specified in `sa_flags`, `sa_sigaction` is used instead of `sa_handler`

### Nonlocal jumps
- cannot `goto` a label that is in another function
- nonlocal jumps transfer program control across function boundaries (nonlocal gotos)
```c
#include <setjmp.h>
int setjmp(jmp_buf env);
void longjmp(jmp_buf env, int val);
```
- `setjmp()`: establishes the target to which control will later be transferred
    - saves the calling environment (stack
and CPU registers) in `env` (`jmp_buf`: an array type of some form)
    - `env` (usually a global variable): set first by `setjmp()` and later used by `longjmp()`
        - multiple `setjmp()` calls may share the same `env`, each overwriting the previous contents (a better practice: each `setjmp()` should use its own `env`)
    - return:
        - 0: if called directly
        - nonzero (== `val` in `longjmp()`): if returning from a call to `longjmp()`
- `longjmp()`: performs the transfer of execution
    - uses `env` to transfer control back to the point where `setjmp()` was called and to restore the environment to its state at the time of the `setjmp()` call
    - `val`: fake return value for `setjmp()` (if `val` is 0, `setjmp()` returns 1 instead)
- variable value:
    - stored in memory: = values at the time of `longjmp()`
    - stored in CPU and floating-point registers: = values at the time of `setjmp()`
    - values of automatic (local) variables and register variables are indeterminate
        - compiler optimization could keep local and register variables in CPU registers (rolled back)
        - use `volatile` if you do not want the value rolled back
    - global, `volatile`, and `static` variables are unchanged after the fake return
- can only `longjmp()` to a `setjmp()` point in a function that has not yet returned

![image](https://hackmd.io/_uploads/SJ-prNEU6.png)
![image](https://hackmd.io/_uploads/SkuaBEEUa.png)

### sigsetjmp(), siglongjmp()
```c
#include <setjmp.h>
// Returns: 0 if called directly, nonzero if returning from a call to siglongjmp
int sigsetjmp(sigjmp_buf env, int savemask);
void siglongjmp(sigjmp_buf env, int val);
```
- POSIX does not specify whether `setjmp()` and `longjmp()` save and restore signal masks
    - in FreeBSD 8.0 and Mac OS X 10.6.8: yes
    - in Linux: no
- POSIX provides these two functions to support saving and restoring signal masks (otherwise they behave the same as `setjmp()` and `longjmp()`)
- `savemask`:
    - `savemask` != 0: `sigsetjmp()` saves the current signal mask of the calling process in `env`, and `siglongjmp()` restores it

### sigsuspend()
```c
#include <signal.h>
// Returns -1 with errno set to EINTR (if it returns to the caller)
int sigsuspend(const sigset_t
*sigmask);
```
- `sigprocmask()` + `pause()` in a single atomic operation: replaces the signal mask with `sigmask` and then suspends the process until a signal is caught or a signal terminates the process
    - returns after the signal handler returns
    - does not return if the process is terminated
    - restores the signal mask to its value before the `sigsuspend()` call when it returns

### abort(), sleep()
```c
#include <stdlib.h>
// The function never returns
void abort(void);
```
- causes abnormal program termination
- unblocks `SIGABRT` and raises it for the calling process
    - default disposition: terminate the process
    - if the signal is ignored, or caught by a handler that returns: the process still terminates
        - `abort()` restores the default disposition and raises the signal a second time
```c
#include <unistd.h>
// Returns: 0 or the number of unslept seconds
unsigned int sleep(unsigned int seconds);
```
- causes the calling process to be suspended until either:
    - `seconds` have elapsed: returns 0
    - a signal is caught by the process and the signal handler returns: returns the number of unslept seconds
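
The mask-related calls from the sections above (`sigaction()`, `sigprocmask()`, `sigpending()`, `raise()`, `sigsuspend()`) can be tied together in one small sketch. It is only an illustration under stated assumptions: the names `block_then_wait_demo`, `on_usr1`, and `got_usr1` are hypothetical, `SIGUSR1` is an arbitrary choice, and the process is assumed to be single-threaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Flag set by the handler; sig_atomic_t keeps the store signal-safe. */
static volatile sig_atomic_t got_usr1 = 0;

/* Hypothetical handler: it only records that the signal was delivered. */
static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/*
 * Hypothetical helper: returns 0 if every step behaved as described in
 * the notes, or the number of the first step that did not.
 */
int block_then_wait_demo(void)
{
    struct sigaction act, oact;
    sigset_t block, old, pending, wait_mask, now;
    int result = 0;

    got_usr1 = 0;

    /* Step 1: install the handler with sigaction(). */
    memset(&act, 0, sizeof act);
    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, &oact) == -1)
        return 1;

    /* Step 2: block SIGUSR1 and remember the previous mask. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    assert(sigismember(&block, SIGUSR1) == 1);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1) {
        sigaction(SIGUSR1, &oact, NULL);
        return 2;
    }

    /* Step 3: generate the signal; it is blocked, so no handler runs yet. */
    if (raise(SIGUSR1) != 0 || got_usr1 != 0) {
        result = 3;
    /* Step 4: the signal should now be reported as pending. */
    } else if (sigpending(&pending) == -1 ||
               sigismember(&pending, SIGUSR1) != 1) {
        result = 4;
    } else {
        /* Step 5: atomically unblock SIGUSR1 and wait for the handler. */
        wait_mask = old;
        sigdelset(&wait_mask, SIGUSR1);
        if (sigsuspend(&wait_mask) != -1 || errno != EINTR || got_usr1 != 1)
            result = 5;
        /* Step 6: sigsuspend() restored the mask, so SIGUSR1 is blocked again. */
        else if (sigprocmask(SIG_BLOCK, NULL, &now) == -1 ||
                 sigismember(&now, SIGUSR1) != 1)
            result = 6;
    }

    /* Put back the caller's original mask and disposition. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGUSR1, &oact, NULL);
    return result;
}
```

Calling `block_then_wait_demo()` from `main()` and checking for a return value of 0 is enough to exercise it. Using `sigsuspend()` in step 5, rather than unblocking with `sigprocmask()` and then calling `pause()`, avoids the window in which the signal could be delivered between the two calls and leave `pause()` waiting indefinitely.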