# System Programming (textbook: APUE)

This is the class note for the NTU CSIE System Programming course. The content covers:
- Introduction
- File I/O (unbuffered I/O)
- Advanced I/O
- Standard I/O
- Files and Directories
- Process Control
- Signal
- Thread & Process
- Process Environment

Series of the articles:
- System Programming
    - Introduction, File I/O (unbuffered I/O), Advanced I/O, Standard I/O
- [System Programming - 2](https://hackmd.io/@ACaccel/System-Programming-2)
    - Files and Directories, Process Control, Signal
- [System Programming - 3](https://hackmd.io/@ACaccel/System-Programming-3)
    - Thread & Process, Process Environment

## Introduction

* Architecture
    ![](https://hackmd.io/_uploads/Hkgn8G3kp.png =300x)
    * kernel: machine bootstrap, system initialization, interrupt and exception handling, process scheduling, memory management, and I/O
        * exposes its services via "system calls"
    * Applications: written and compiled by programmers into binaries
    * Library routines: provide pre-compiled binaries
    * Shell: interactive interface for users to issue commands
* User/kernel mode
    ![](https://hackmd.io/_uploads/rJvDtf2JT.png =300x)
    ![](https://hackmd.io/_uploads/HJRYKM3ka.png =500x)
    * applications run in user mode
    * the OS kernel runs in kernel mode
* Process Scheduling
    ![](https://hackmd.io/_uploads/H1RSKm3Ja.png)
* Process memory layout
    ![](https://hackmd.io/_uploads/r1hdtX2kp.png)

## File I/O (unbuffered I/O)

### File Descriptor

![](https://i.imgur.com/kgGz1QP.png)

* File Descriptor
    * per-process: different processes may use the same fd number
    * non-negative integer
    * POSIX.1
        * 0 : STDIN_FILENO
        * 1 : STDOUT_FILENO
        * 2 : STDERR_FILENO
* Open File Descriptor Table
    * one table per process
    * A file descriptor entry contains :
        1. file descriptor flags
        2. pointer to a system open file table entry
        3. child inherits from parents ([here](#fork-vs-exec))
* Open File Table
    * shared by all open files in the system
    * keeps a reference count of the file descriptors pointing to each entry
    * Each entry contains :
        1. file status flags for the file
        2. the current file offset
        3. a pointer to the v-node table entry for the file
    * Each open file corresponds to an entry
        * a disk file may be opened multiple times
* V-node table (Linux uses a generic i-node, conceptually the same as the v-node)
    * shared by all open files in the system
    * an in-memory structure for each open file
    * each entry contains:
        * pointer to the i-node structure of the file in memory (read from disk when the file is opened)
        * v-node information
            * type of the file
            * pointers to functions that operate on the file
* i-node
    * stored both on disk and in memory
    * contains:
        * file owner, file size, residing device, protection information, location of the data blocks
    * the OS kernel reads the i-node from disk into memory when the file is opened
* some open file cases :
    1. two processes open the same file
        ![](https://hackmd.io/_uploads/SypD1VkeT.png)
    2. one process opens the same file twice
        ![](https://hackmd.io/_uploads/S1qjkV1l6.png)
    3. child inherits from parents ([here](#fork-vs-exec))
        ![](https://i.imgur.com/9OZVOob.png)

---

### Block/Synchronized

![](https://i.imgur.com/cpX3Icf.png)

* Blocking / Non-blocking : the behavior of a function call
    * A blocking call returns after the requested operation completes
    * A non-blocking call returns immediately with
        * Ack if the system receives and starts to process the request
        * Error if the system cannot process the request
* Synchronous / Asynchronous I/O : the behavior of data movement
    * Synchronous I/O moves the data to the target device and then returns
    * Asynchronous I/O buffers the data and moves it to the target device later

---

### O_DSYNC, O_RSYNC, O_SYNC --- open()'s options

* Delayed Write:
    ![](https://hackmd.io/_uploads/ByYqPglga.png =200x)
    * written data is buffered and queued in the kernel buffer cache, and is written to disk at some later time
* O_DSYNC :
    * **write()** blocks until **the data and only the metadata needed to retrieve it** have been written to the physical disk
* O_SYNC :
    * **write()** blocks until **all data and all metadata** have been written to the physical disk
* O_RSYNC :
    * **read()** blocks until pending writes to the data being read (and maybe its metadata) have been written to the physical disk
    * must be used in combination with O_SYNC or O_DSYNC

![](https://i.imgur.com/lAolFbk.png =x250)
![](https://i.imgur.com/oCjmUGc.png =x250)
![](https://i.imgur.com/WuJdldU.png =x250)

* notes :
    * Linux ignores O_SYNC when it is set with fcntl(F_SETFL); the flag is honored only when it is passed to open()

---

### sync(), fsync(), fdatasync()

```c
#include <unistd.h>
int fsync(int fd);
int fdatasync(int fd);
void sync(void);
```

* fsync() :
    * (data + metadata) sync
    * affects only the file referred to by fd, and returns after the writes complete
* fdatasync() :
    * data sync
    * affects only the file referred to by fd, and returns after the writes complete
* sync() :
    * queues all modified block buffers in the kernel for writing and returns immediately
    * called periodically (typically once every 30 seconds) by the update daemon, and by the sync(1) command

---

### open()

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open(const char *path, int oflag, ... /* mode_t mode */);
```

* return:
    * fd: always the lowest-numbered unused descriptor (-1 on error)
* path:
    * absolute path (/xxx)
    * relative path (./xxx)
* path length limits:
    * PATH_MAX, NAME_MAX
    * with _POSIX_NO_TRUNC in effect, a name that is too long fails with ENAMETOOLONG instead of being silently truncated
* oflag:
    * access modes
        * O_RDONLY, O_WRONLY, O_RDWR, O_EXEC
    * status flags
        * O_CREAT, O_TRUNC, O_EXCL (file creation)
        * O_APPEND (append to the end of the file on each write)
        * O_NONBLOCK (non-blocking)
        * O_DSYNC, O_RSYNC, O_SYNC (data synchronization)
* mode:
    * applied only when a new file is created, otherwise omitted
* example :
    ```c
    open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
    ```

---

### openat()

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int openat(int dirfd, const char *path, int oflag, ... /* mode_t mode */);
```

* path:
    * absolute path -> openat() == open()
    * relative path
        * dirfd == AT_FDCWD -> openat() == open()
        * otherwise, relative to the directory referred to by dirfd
* Time-of-check-to-time-of-use (TOCTTOU) error
    * a program is vulnerable if it makes two operations (e.g., function calls) where the second depends on the result of the first, because the file can change between the two
    * open() resolves relative paths against the current working directory (all threads share the same working directory)
    * openat() is atomic and guarantees that the opened file is located relative to the desired directory
* example :
    ```c
    int dirfd = open("..", O_RDONLY);
    int fd = openat(dirfd, "test", O_RDWR | O_CREAT, 0644);
    // is equivalent to
    int fd2 = openat(AT_FDCWD, "../test", O_RDWR | O_CREAT, 0644);
    ```

---

### creat(), close()

1. **creat()**
    ```c
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    int creat(const char *path, mode_t mode);
    ```
    * the file is opened write-only
    * is equivalent to `open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);`
    * mode: (defined in <sys/stat.h>)
        * S_IRUSR : owner read permission
        * S_IWUSR : owner write permission
        * S_IXUSR : owner execute permission
        * S_IRGRP : group read permission
        * S_IWGRP : group write permission
        * S_IXGRP : group execute permission
        * S_IROTH : others read permission
        * S_IWOTH : others write permission
        * S_IXOTH : others execute permission
    * example :
        ```c
        open("myfile", O_CREAT, S_IRUSR | S_IXOTH);
        ```
    * The umask sets the default file permissions: its bits are cleared from the requested mode
        ```bash
        $ umask 0022
        # create: 0777
        # result: 0755 (0777 & ~0022)
        ```

2. **close()**
    ```c
    #include <unistd.h>
    int close(int fd);
    ```
    * All open files are automatically closed by the kernel when a process terminates
    * Closing a file descriptor releases any record locks the process holds on that file

---

### lseek()

![](https://i.imgur.com/lBmTUfd.png)

```c
#include <sys/types.h>
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
```

* return:
    * new file offset
    * -1: error
    * if fd refers to a pipe, FIFO, or socket
        * returns -1 and sets errno (ESPIPE)
* off_t :
    * `typedef long off_t; /* 2^31 bytes */`
    * or `typedef longlong_t off_t; /* 2^63 bytes */`
* current file offset :
    * number of bytes from the beginning of the file
    * by default, initialized to 0 when a file is opened, unless O_APPEND is specified
    * read/write cause the offset to be incremented
* whence : SEEK_SET, SEEK_CUR, SEEK_END
* No actual I/O takes place
* the offset can be greater than the file's current size
    * the next write() will extend the file
* example :
    ```c
    currpos = lseek(fd, 0, SEEK_CUR);
    ```
* Things that lseek can do :
    * seek to a negative offset (only on certain devices; for a regular file the resulting offset must be non-negative)
    * seek 0 bytes from the current position
    * seek past the end of the file

---

### read(), write()

1. **read()**
    ```c
    #include <unistd.h>
    ssize_t read(int fd, void *buf, size_t nbytes);
    ```
    * return:
        * number of bytes read
        * 0: EOF
        * -1: error
    * the offset is incremented by the number of bytes read

2. **write()**
    ```c
    #include <unistd.h>
    ssize_t write(int fd, const void *buf, size_t nbytes);
    ```
    * return:
        * number of bytes written
        * -1: error
    * Common write errors: disk full or file size limit exceeded
    * When O_APPEND is set, the file offset is set to the end of the file before each write operation

---

### I/O Efficiency

| Delayed Write | Read Ahead |
| ------------- | ---------- |
| multiple writes do not require multiple disk accesses | User CPU Time + Sys CPU Time <= Clock Time |
| the buffer is queued for writing to disk later | as the buffer size increases and sequential reads are detected, the system tries to read more data than the application requests, so future reads do not have to go to the disk |
| ![](https://i.imgur.com/6RR4mfA.png =x250) | ![](https://i.imgur.com/VxCQ5E4.png =x250) |

---

### pread(), pwrite() --- Atomic Operation

1. **pread()**
    ```c
    #include <unistd.h>
    ssize_t pread(int fd, void *buf, size_t nbytes, off_t offset);
    ```
    * return:
        * same as read()
    * Same as `lseek(... , SEEK_SET); read();`, except that
        * the seek and the read cannot be interrupted (they form one atomic operation)
        * the file offset is not affected by pread()

2. **pwrite()**
    ```c
    #include <unistd.h>
    ssize_t pwrite(int fd, const void *buf, size_t nbytes, off_t offset);
    ```
    * return:
        * same as write()
    * Same as `lseek(... , SEEK_SET); write();`, except that
        * the seek and the write cannot be interrupted (they form one atomic operation)
        * the file offset is not affected by pwrite()

---

### dup(), dup2() --- Atomic Operation

```c
#include <unistd.h>
int dup(int fd);
int dup2(int fd, int newfd);
```

* Create a copy of an existing fd
* dup() copies fd to a newly allocated fd
    * returns the lowest available fd
    * `dup(fd);` == `fcntl(fd, F_DUPFD, 0);`
* dup2() copies fd to newfd
    * if newfd is open, it is closed before copying
    * `dup2(fd, newfd);` == `close(newfd); fcntl(fd, F_DUPFD, newfd);`, but performed atomically
    * `dup2(fd, 1)` redirects stdout to fd
* `/dev/fd`
    * directory whose entries are files named 0, 1, 2...
    * `int fd = open("/dev/fd/0", mode);` == `int fd = dup(0);`

![](https://i.imgur.com/Ca0ypHT.png)

---

### fcntl()

```c
#include <fcntl.h>
int fcntl(int fd, int cmd, ... /* int arg */);
```

* Changes the properties of open files
* cmd :
    * F_DUPFD : duplicate an existing file descriptor; the new descriptor is the lowest available one that is >= arg
        * FD_CLOEXEC (close-on-exec) is cleared in the new descriptor (so it stays open across exec())
    * F_GETFD, F_SETFD : get/set the file descriptor flags
        * e.g., FD_CLOEXEC
    * F_GETFL, F_SETFL : get/set the file status flags
        * e.g., O_APPEND, O_NONBLOCK, O_SYNC, O_ASYNC, O_RDONLY, O_WRONLY, O_RDWR (the access mode can be read but not changed)
    * F_GETOWN, F_SETOWN : get/set asynchronous I/O ownership
        * get/set the process ID or process group ID that receives the SIGIO and SIGURG signals for events on file descriptor fd
        * SIGIO, SIGURG : I/O possible on a filedes / urgent condition on an I/O channel
    * F_GETLK, F_SETLK, F_SETLKW : get/set file locks
* Example 1 :
    ```c
    int val = fcntl(atoi(argv[1]), F_GETFL, 0);
    int accmode = val & O_ACCMODE;
    if (accmode == O_RDONLY)
        printf("read only");
    else if (accmode == O_WRONLY)
        printf("write only");
    else if (accmode == O_RDWR)
        printf("read write");
    ```
* Example 2 : turn on one or more flags
    ```c
    // val &= ~flags; // turn off flags
    // val |= flags;  // turn on flags
    void set_fl(int fd, int flags) {
        int val;
        if ((val = fcntl(fd, F_GETFL, 0)) < 0)
            err_sys("fcntl F_GETFL error");
        val |= flags; // turn on flags
        if (fcntl(fd, F_SETFL, val) < 0)
            err_sys("fcntl F_SETFL error");
    }
    ```
* example.2 :
    ![](https://i.imgur.com/FViY2dD.png)

---

### ioctl()

```c
#include <unistd.h>    /* System V */
#include <sys/ioctl.h> /* BSD and Linux */
int ioctl(int fd, int request, ...);
```

* Catchall for I/O operations
    * each device driver can define its own set of ioctl commands
* More headers could be required
    * Disk labels (<disklabel.h>)
    * file I/O, socket I/O, terminal I/O (<ioctl.h>)
    * mag tape (<mtio.h>)

---

### Error Handling

* errno in <errno.h> (sys/errno.h)
    * e.g., 15 error numbers for open()
    * `#define ENOTTY 25` : Inappropriate ioctl for device
    * No value 0 for any
error number
* Functions
    * `char *strerror(int errnum)` (<string.h>)
    * `void perror(const char *msg)` (<stdio.h>)
    * example :
        ```c
        fprintf(stderr, "EACCES: %s\n", strerror(EACCES));
        errno = ENOENT;
        perror(argv[0]);
        /* Output:
         * $ a.out
         * EACCES: Permission denied
         * a.out: No such file or directory
         */
        ```
* Error Recovery
    * Fatal error : no recovery action
    * Nonfatal error : delay and retry
        * Examples : EAGAIN, ENFILE, ENOBUFS, ENOLCK, ENOSPC, ENOSR, EWOULDBLOCK, ENOMEM

---

## Advanced I/O

### Blocking vs. Nonblocking I/O

* Blocking
    * The function doesn't return until its action is completed (i.e., all bytes in the count field are read or written)
* Nonblocking
    * Lets us issue I/O operations (such as open, read, and write) that return as quickly as possible without waiting
* Two ways to specify nonblocking I/O
    * open() or fcntl() (cmd: F_SETFL) with the O_NONBLOCK flag

![](https://i.imgur.com/R4ZjP5A.png =500x)

* Terminal device
    * input queue, output queue
    * The shell redirects standard input to the terminal (in canonical mode), and each read returns at most one line
    * When the output queue starts to fill up
        * Blocking : put the process to sleep until room is available
        * Non-blocking : polling (a waste of CPU time on a multiuser system)

![](https://i.imgur.com/LofbOKI.png)

---

### I/O Multiplexing --- select(), poll()

* When an application needs to handle multiple I/O descriptors at the same time
    * E.g., file and socket descriptors, multiple socket descriptors
* When I/O on any one descriptor can result in blocking

![](https://i.imgur.com/oAOEukp.png =300x)

1. **select**
    ```c
    #include <sys/select.h>
    int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);
    totalFds = select(nfds, readfds, writefds, errorfds, timeout);
    ```
    * input info
        * nfds : the range (0 ... nfds-1) of file descriptors to check
            * getdtablesize() : get the file descriptor table size
            * FD_SETSIZE: max number of descriptors (1024 in Linux)
        * readfds : bit map of filedes.
            * the kernel checks whether each descriptor in readfds is ready for reading
            * on return, a bit is 1 if that descriptor is ready
        * writefds : same as readfds but for writing
        * errorfds : bit map to check for errors
        * timeout : how long to wait (block)
            * The call will block until either :
                1. a file descriptor becomes ready
                2. the call is interrupted by a signal handler
                3. the timeout expires
            * if NULL, select() blocks indefinitely waiting for a file descriptor to become ready
    * output info
        * updates readfds, writefds, errorfds
            * to show which descriptors are ready
        * totalFds : number of ready descriptors
            * 0 : timeout, positive : number of ready descriptors, -1 : error
            * if the same descriptor is ready to be both read and written
                * it is counted twice

    ![](https://i.imgur.com/Mz4X3Y2.png =500x)

    * Example :
        ```c
        #include <sys/select.h>
        // helper macros for fd_set
        // Return: nonzero if fd is in set, 0 otherwise
        int FD_ISSET(int fd, fd_set *fdset);
        void FD_CLR(int fd, fd_set *fdset);
        void FD_SET(int fd, fd_set *fdset);
        void FD_ZERO(fd_set *fdset);
        ```
        ```c
        fd_set readset, writeset;
        FD_ZERO(&readset);
        FD_ZERO(&writeset);
        FD_SET(0, &readset);
        FD_SET(3, &readset);
        FD_SET(1, &writeset);
        FD_SET(2, &writeset);
        select(4, &readset, &writeset, NULL, NULL); // nfds = highest fd (3) + 1
        ```
        ![](https://i.imgur.com/8O4YbLo.png =500x)
        ```c
        // with no descriptors, select() acts like a sleep with microsecond resolution
        struct timeval tv = { .tv_sec = 0, .tv_usec = 500000 }; // example value: 0.5 s
        int totalfds = select(0, NULL, NULL, NULL, &tv);
        ```

2. **poll**
    ```c
    #include <poll.h>
    int poll(struct pollfd *fds, nfds_t nfds, int timeout);
    totalFds = poll(fdarray, nfds, timeout);

    struct pollfd {
        int fd;        // fd to check, < 0 to ignore
        short events;  // bit mask: each bit indicates an event of interest on fd
        short revents; // bit mask: each bit indicates an event that occurred on fd
    };
    ```
    * nfds : the number of elements in fdarray[]
    * fdarray:
        * the caller sets the **events** of interest
        * the kernel sets the **revents** that have occurred
        * events : POLLIN, POLLOUT, POLLERR, POLLHUP, POLLPRI, ...
    * timeout : how long poll() should block waiting for a file descriptor to become ready
        * -1 : wait forever
        * 0 : don't wait
        * positive : wait (ms)
    * totalFds : number of ready descriptors
        * 0 : timeout
        * negative : error

3. **select vs. poll**
    ![](https://hackmd.io/_uploads/BkEzQk4gp.png =450x)
    * Alternatives
        * pselect() (nanoseconds, signal mask)
        * epoll (good for speed & scalability)

---

### File Lock

Prevents data from being overwritten while it is still being read by other processes (atomic operation)

* When a lock is granted, the process has the right to read/write the file
* When a lock is denied, the process has to wait until the lock is released by the lock holder
* Three ways to lock
    * flock() : lock an entire file
    * fcntl() : lock arbitrary byte ranges in a file (record locking)
    * lockf() : built on top of fcntl, simplified interface
* flock(), fcntl(), and lockf() support advisory locking

1. **flock()**
    ```c
    #include <sys/file.h>
    // 0: OK, -1: on error
    int flock(int fd, int operation);
    ```
    * operation:
        * LOCK_SH: shared lock
            * more than one process may hold the same lock
        * LOCK_EX: exclusive lock
            * only one process may hold the lock
        * LOCK_UN: remove an existing lock
    * applies or removes a lock on an open file

2. **fcntl**
    ```c
    int fcntl(int filedes, int cmd, ... /* struct flock *flockptr */ );
    ```
    * cmd :
        * F_GETLK : Determine whether the lock described by flockptr can be placed on the file
            * yes: return F_UNLCK in l_type
            * no (the file is locked): return details about one of those locks (updating l_type, l_whence, l_start, l_pid, l_len)
        * F_SETLK : Set the lock described by flockptr (nonblocking)
            * returns -1 and sets errno (EACCES or EAGAIN) if it fails to set the lock
            * acquire a lock by specifying l_type (F_RDLCK, F_WRLCK) on the bytes (l_whence, l_start, l_len)
            * release a lock by specifying l_type (F_UNLCK)
        * F_SETLKW : Same as F_SETLK, except that if the lock is blocked by other locks, the caller blocks until the lock is released
    * flockptr
        ```c
        struct flock {
            short l_type;   // F_RDLCK, F_WRLCK, or F_UNLCK
            short l_whence; // SEEK_SET, SEEK_CUR, or SEEK_END
            off_t l_start;  // offset in bytes, relative to l_whence
            off_t l_len;    // length, in bytes; 0 means lock to EOF
            pid_t l_pid;    // returned with F_GETLK
        };
        ```
    * l_type
        ![](https://hackmd.io/_uploads/rkU9ERPga.png =400x)
        * F_RDLCK: shared read lock
            * any number of processes can hold one
            * one or more read locks -> no write locks on that byte(s)
            * i.e., multiple readers but no writers
        * F_WRLCK: exclusive write lock
            * only one single process
            * an exclusive write lock -> no other read/write locks on that byte(s)
            * i.e., a single writer but no reader
        * F_UNLCK: unlock a region

* Advisory vs. 
Mandatory lock
    * Advisory lock
        * fcntl(), flock(), lockf()
        * cooperating processes must access the shared files through the file lock functions
        * Problem: it cannot prevent a process that does not hold an advisory lock from accessing the shared files
        ![](https://i.imgur.com/tQ0JCmi.png =500x)
    * Mandatory lock
        * The kernel checks every open, read, and write to verify that the calling process is not violating a lock
        * Mandatory locking involves system overhead
        ![](https://i.imgur.com/z0y0VbM.png =500x)
* Remark
    * when setting or releasing a lock, the system combines or splits adjacent areas as required
        ![](https://hackmd.io/_uploads/S1g7TCwgT.png =300x)
    * File Access: To obtain a read/write lock, the descriptor must be open for reading/writing
    * Non-atomic operations: F_GETLK followed by F_SETLK/F_SETLKW is not an atomic operation
    * Implied inheritance and release of locks ([here](#fork-vs-exec))
        * a process terminates -> all its locks are released
        * a descriptor is closed -> any locks on the file referenced by the descriptor for that process are released
        * Locks are never inherited across a `fork()`
        * Locks are inherited across an `exec()` unless close-on-exec is set on the descriptor
        ![](https://i.imgur.com/rjA1jZF.png =500x)

---

## Standard I/O

![](https://i.imgur.com/GdG6TdL.png)

* Difference from File I/O (unbuffered I/O)
    * File Pointers vs. File Descriptor
    * fopen vs. open
        * When a file is opened/created, a *stream* is associated with the file
    * File object
        * File descriptor, pointer to buffer, buffer size, # of remaining chars, an error flag, an end-of-file flag, and the like
        * stdin, stdout, stderr (<stdio.h>)
    * Unbuffered I/O vs. buffered I/O
        ![](https://i.imgur.com/3N1NYS1.png =x200)![](https://i.imgur.com/Ht9d6bS.png =x200)

---

### fopen(), freopen(), fdopen(), fileno(), fclose()

1. **fopen()**
    ```c
    #include <stdio.h>
    FILE *fopen(const char *pathname, const char *type);
    ```
    * text file vs. binary file : b (rb, wb, ab, r+b, ...) 
stands for binary file --- no effect for Unix kernel
    * Append mode supports multiple concurrent writers (each write is appended atomically at the end of the file)
    ![](https://i.imgur.com/lNbf7jH.png)

2. **freopen()**
    ```c
    FILE *freopen(const char *pathname, const char *type, FILE *fp);
    ```
    * closes the fp stream first and clears the stream's orientation
    * typically used for stdin/stdout/stderr
    * example :
        ```c
        // logs all standard output to /tmp/logfile
        FILE *fp;
        fp = freopen("/tmp/logfile", "a+", stdout);
        printf("Sent to stdout and redirected to /tmp/logfile");
        fclose(stdout);
        ```

3. **fdopen()**
    ```c
    FILE *fdopen(int filedes, const char *type);
    ```
    * Associates a (standard) I/O stream with an existing filedes --- POSIX.1
        * Pipes, network channels
    * The file is not truncated for "w"
    * type : must be a subset of the access mode of the open descriptor

4. **fileno()**
    ```c
    int fileno(FILE *fp);
    ```
    * Get the filedes for fcntl, dup, etc.

5. **fclose()**
    ```c
    int fclose(FILE *fp);
    ```
    * Flushes buffered output
    * Discards buffered input
    * All I/O streams are closed when the process exits
    * A buffer supplied with setbuf()/setvbuf() must remain valid until the stream is closed :arrow_down:
        ```c
        FILE *open_data(void) {
            FILE *fp;
            char databuf[BUFSIZ]; // automatic storage: lives only inside open_data()
            if ((fp = fopen(DATAFILE, "r")) == NULL)
                return(NULL);
            if (setvbuf(fp, databuf, _IOLBF, BUFSIZ) != 0)
                return(NULL);
            return(fp); // error! when someone calls open_data(),
                        // databuf becomes invalid once the function returns
        }
        ```

---

### Buffering

* Allocation
    * Automatically allocated the first time I/O is performed on a stream
    * Or call setbuf(), setvbuf()
* Types
    * Fully Buffered
        * Perform I/O when the buffer is filled up
        * Disk files, pipes, and sockets are normally fully buffered
    * Line Buffered
        * Perform I/O when a newline char is encountered --- usually for terminals (e.g., stdin, stdout)
        * Caveats
            * a full buffer (a line that is too long) can also trigger I/O
            * if input is requested (from the kernel), all line-buffered output streams are flushed :arrow_down:
                ```c
                char buf[100];
                printf("$ ");      // "prompt" in a shell
                scanf("%99s", buf); // requesting input triggers the output of "$ "
                ```
    * Unbuffered
        * Output is expected ASAP, e.g., using write()
        * E.g., stderr
* ANSI C Requirements
    * stdin and stdout are fully buffered unless they refer to interactive devices (terminals)
    * For interactive devices (terminals)
        * stdin, stdout : SVR4/4.3+BSD --- line buffered
        * stderr : SVR4/4.3+BSD --- unbuffered

---

### fflush()

```c
int fflush(FILE *fp);
```

* Any unwritten data in the stream is passed to the kernel
* All output streams are flushed if `fp == NULL`
* fflush() only hands the data to the kernel; call fsync() after fflush() if the data must reach the disk

![](https://i.imgur.com/9L7pohI.png)

---

### setbuf(), setvbuf()

* Must be called before any I/O is performed on the stream!

1. **setbuf()**
    ```c
    void setbuf(FILE *fp, char *buf);
    ```
    * Full/line buffering if buf is not NULL (buf must be BUFSIZ bytes)
        * e.g., line buffered for terminals
    * if buf is NULL, unbuffered
    * e.g., `#define BUFSIZ 1024` (<stdio.h>)

2. **setvbuf()**
    ```c
    int setvbuf(FILE *fp, char *buf, int mode, size_t size);
    ```
    * Can determine the size and the buffer type
    * mode : _IOFBF, _IOLBF, _IONBF (<stdio.h>)
    * A good choice for size : st_blksize (from stat())

![](https://i.imgur.com/44CQRl0.png)

---

### ftell(), rewind(), fseek()

```c
long ftell(FILE *fp);
void rewind(FILE *fp);
int fseek(FILE *fp, long offset, int whence);
```

1. **ftell()**
    * Get the current file offset in bytes

2. **rewind()**
    * Move the offset to the beginning

3. **fseek()**
    * whence : same as lseek
        * Binary files : ANSI C does not require SEEK_END to be supported (fine under Unix, possible padding on other systems)
        * Text files : SEEK_SET only --- offset must be 0 or a value returned by ftell
    * Other versions : (the difference is the type of the offset, which is long for fseek)
        * ftello, fseeko (off_t)
        * fgetpos, fsetpos (fpos_t) (ANSI C Standard)

---

### Unformatted I/O --- Character-at-a-Time I/O

1. **getc(), fgetc(), getchar()**
    ```c
    int getc(FILE *fp);
    int fgetc(FILE *fp);
    int getchar(void);
    ```
    * `getchar() == getc(stdin)`
    * getc() could be a macro (possible side effects, less execution time, cannot take its address as a function), e.g.,
        ```c
        // macro example
        #define WRONG(A) A*A*A
        #define CUBE(A) (A)*(A)*(A)
        int main() {
            int num = 3;
            int wrong = WRONG(num+1); // num+1 * num+1 * num+1
            int cube = CUBE(num+1);   // (num+1) * (num+1) * (num+1)
        }
        ```
    * the character is returned as an unsigned char converted to int (so that EOF and errors can be told apart from valid characters)
    * Error value : EOF (-1) for end of file or error
    * example :
        ```c
        char c; // bug: should be int
        while ((c = getchar()) != EOF) // EOF == -1
            putchar(c);
        ```
        * A system can choose "unsigned" or "signed" for the "char" type
            * unsigned char : c never equals -1 -> infinite loop
            * signed char : when reading char 255 ( = -1), the loop terminates before EOF

2. **ferror(), feof(), clearerr(), ungetc()**
    ```c
    int ferror(FILE *fp);
    int feof(FILE *fp);
    void clearerr(FILE *fp); // clears both the error and EOF flags
    int ungetc(int c, FILE *fp);
    ```
    * ungetc()
        * used for pushing back a char
        * EOF (i.e., -1) cannot be pushed back
        * can push back an arbitrary char (it need not be the char that was read)
        * Clears the EOF flag
        * The character is stored in the I/O buffer, not in the file

3. **putc(), fputc(), putchar()**
    ```c
    int putc(int c, FILE *fp);
    int fputc(int c, FILE *fp);
    int putchar(int c);
    ```
    * `putchar(c) == putc(c, stdout)`
    * putc() could be a macro

---

### Unformatted I/O --- Line-at-a-Time I/O

1. **fgets(), gets()**
    ```c
    char *fgets(char *buf, int n, FILE *fp);
    ```
    * The stored line includes the '\n' and is terminated by a null byte
    * Could return a partial line of (n-1) bytes if the line is too long
    ```c
    char *gets(char *buf);
    ```
    * Reads from stdin
    * No buffer size is specified -> may overflow (unsafe; removed in C11)
    * The stored line does not include the '\n' and is terminated by a null byte

2. **fputs(), puts()**
    ```c
    int fputs(const char *str, FILE *fp);
    ```
    * Writes the null-terminated string; a '\n' is written only if the string includes it (the null byte is not written)
    * A newline is not required, so the output need not be line-at-a-time
    ```c
    int puts(const char *str);
    ```
    * Writes the null-terminated string, which need not include '\n' (the null byte is not written)
    * puts() then writes '\n' to stdout

---

### Unformatted I/O --- Direct I/O

1. **fread(), fwrite()**
    ```c
    size_t fread(void *ptr, size_t size, size_t nobj, FILE *fp);
    size_t fwrite(const void *ptr, size_t size, size_t nobj, FILE *fp);
    ```
    * Programs using fread and fwrite on binary data are not portable
        * The offset of a member in a structure can differ between compilers and systems (due to alignment)
        * The binary formats of various data types, such as integers, can differ between machine architectures

        ![](https://i.imgur.com/WvI3pqu.png =x200)![](https://i.imgur.com/J8UqoCo.png =x200)![](https://i.imgur.com/x7Aadf9.jpg =x200)
    * Reading fewer than the specified number of objects -> error or EOF -> check with ferror, feof
    * It is a write error if fewer than the specified number of objects are written
    * Example 1 :
        ```c
        float data[10];
        if (fwrite(&data[2], sizeof(float), 4, fp) != 4)
            err_sys("fwrite error");
        ```
    * Example 2 :
        ```c
        struct {
            short count;
            long total;
        } item;
        if (fwrite(&item, sizeof(item), 1, fp) != 1)
            err_sys("fwrite error");
        ```

---

### Formatted I/O

1. Input Functions
    ```c
    int scanf(const char *format, ...);
    int fscanf(FILE *fp, const char *format, ...);
    int sscanf(const char *buf, const char *format, ...);
    ```

2. Output Functions
    ```c
    int printf(const char *format, ...);
    int fprintf(FILE *fp, const char *format, ...);
    int sprintf(char *buf, const char *format, ...);
    int vprintf(const char *format, va_list arg);
    int vfprintf(FILE *fp, const char *format, va_list arg);
    int vsprintf(char *buf, const char *format, va_list arg);
    ```
    * sprintf()
        * overflow is possible for sprintf() ('\0' is appended at the end of the string)
        * substitute : snprintf()

---

### Warning : No buffer range checking

* `strcpy(char *dest, const char *src)`
* `strcat(char *dest, const char *src)`
* `gets(char *s)`
* `sprintf(char *buf, const char *format, ...)`

---

### Interleaved R&W restrictions

```
Output [fflush | fseek | fsetpos | rewind] Input
Input [fseek | fsetpos | rewind | EOF] Output
```

* Standard I/O reads and writes data through the buffer, directed by separate R/W pointers
* Inconsistency between the R/W pointers leads to errors
* Example :
    ```c
    char c;
    FILE *fp = fopen("./test.txt", "r+"); // test.txt: 12345
    fread(&c, 1, 1, fp);
    // fseek(fp, 0, SEEK_CUR); // this line is needed
    fwrite(&c, 1, 1, fp);
    ```
    expect : 11345
    result : 11345 or 12345, depending on the OS

---

### Multibyte Files (Orientation)

```c
#include <wchar.h>
int fwide(FILE *fp, int mode);
```

Must be set before the first I/O operation is performed on the stream! Once the orientation of the stream has been determined, it is fixed and cannot be changed until the stream is closed

* Standard I/O file streams can be used with single-byte or multibyte character sets, e.g., 'a' : 1 byte, '臺' : maybe 2 or 3 bytes
* By default, there is no orientation
* mode :
    * Negative : set byte-oriented
    * Positive : set wide-oriented
    * 0 : No change to the orientation
* Return values :
    * Negative : byte oriented
    * Positive : wide oriented
    * 0 : no orientation
* Related functions : fwprintf, fwscanf, fgetwc, fputwc, wmemcpy (wide oriented)