owned this note
owned this note
Published
Linked with GitHub
# XV6 A simple, Unix-like Teaching Operating System Chapter 6
---
> [TOC]
---
## Chapter 6 File system
- 文件系統的目的是組織和存儲數據,典型的文件系統支持用戶和程序間的數據共享,並提供數據持久化的支持(即重啟之後數據仍然可用)。
- xv6 的文件系統中使用了類似Unix 的文件,文件描述符,目錄和路經名(請參閱第零章),並且把數據存儲到一塊IDE 磁盤上(請參閱第三章)。這個文件系統解決了幾大難題:
- 該文件系統需要磁盤上數據結構來表示目錄樹和文件,記錄每個文件用於存儲數據的塊,以及磁盤上哪些區域是空閒的。
- 該文件系統必須支持崩潰恢復,也就是說,如果系統崩潰了(比如斷電了),文件系統必須保證在重啟後仍能正常工作。問題在於一次系統崩潰可能打斷一連串的更新操作,從而使得磁盤上的數據結構變得不一致(例如:有的塊同時被標記為 使用中 和 空閒 )。
- 不同的進程可能同時操作文件系統,要保證這種並行不會破壞文件系統的正常工作。
- 訪問磁盤比訪問內存要慢幾個數量級,所以文件系統必須要維護一個內存內的cache 用於緩存常被訪問的塊。
---
### Overview

- xv6 的文件系統分6層實現
- **Buffer cache** : 通過塊緩衝讀寫 IDE 硬盤,它**同步**了對磁盤的訪問,保證同時只有一個內核進程可以修改磁盤塊。
- **Logging** : 使得更高層的接口可以在 transaction 將對磁盤更新,並保證這些操作是原子操作(要嘛都被應用,要嘛都不被應用)。
- **Inodes and block allocator** : 提供無名文件,每一個這樣的文件由一個i 節點和一連串的數據塊組成。
- **Directory inodes** : 將目錄實現為一種特殊的i 節點,它的內容是一連串的目錄項,每一個目錄項包含一個文件名和對應的i 節點。
- **Recursive lookup** : 提供了層次路經名(如/usr/rtm/xv6/fs.c這樣的),這一層通過遞歸的方式來查詢路徑對應的文件。
- **File Descriptors** : 將許多 UNIX 的資源(如管道,設備,文件等)抽象為文件系統的接口,極大地簡化了程序員的工作。

- 文件系統必須設計好在磁盤上的什麼地方放置 i 節點和 data block 。xv6 把磁盤劃分為幾個區塊。文件系統不使用第 0 塊(第 0 塊存有 bootloader)。第 1 塊叫做 super block;它包含了文件系統的元信息(如文件系統的總塊數,data block 塊數,i節點數,以及 log 的塊數)。從第 2 塊開始存放 i 節點,每一塊能夠存放多個 i 節點。接下來的塊存放空閒塊位圖。剩下的大部分是 data block,它們保存了文件和目錄的內容。在磁盤的最後是 log block,將在後面詳述。
---
### Buffer cache Layer
- Buffer Cache 有兩個工作
- 使得 Disk Block 的存取同步化,同一時間只有一份資料被拷貝,並且同時只有一個 kernel thread 使用這份資料。 ( 方法 : 狀態 )
- Cache Popular Blocks 以提升性能。( bio.c )
- Buffer cache 有固定數量的緩衝區,這意味著如果文件系統請求一個不在緩衝中的塊,必須換出一個已經使用的緩衝區。這裡的置換策略是LRU,因為我們假設假設最近未使用的塊近期內最不可能再被使用。
---
### Code: Buffer cache
- File
- bio.c
- Buffer 有三種狀態 (b->flags)
- B_VALID (緩衝區擁有磁盤塊的有效內容)
- B_DIRTY (緩衝區的內容已經被改變並且需要寫回磁盤)
- B_BUSY (某個內核線程持有這個緩衝區且尚未釋放)

### - Structure
```c=
struct {
struct spinlock lock;
struct buf buf[NBUF];
// Linked list of all buffers, through prev/next.
// head.next is most recently used.
struct buf head;
} bcache;
struct buf {
int flags;
uint dev;
uint sector;
struct buf *prev; // LRU cache list
struct buf *next;
struct buf *qnext; // disk queue
uchar data[512];
};
// 補充:
// #define NBUF 10 // size of disk block cache
#define B_BUSY 0x1 // buffer is locked by some process
#define B_VALID 0x2 // buffer has been read from disk
#define B_DIRTY 0x4 // buffer needs to be written to disk
```
### - binit()
1. 初始化 bcache 的 lock。
2. 初始化 bcache 中的 buf 陣列,並串在 head 上變成 Link List。
```c=
void
binit(void)
{
struct buf *b;
initlock(&bcache.lock, "bcache");
//PAGEBREAK!
// Create linked list of buffers
bcache.head.prev = &bcache.head;
bcache.head.next = &bcache.head;
for(b = bcache.buf; b < bcache.buf+NBUF; b++){
b->next = bcache.head.next;
b->prev = &bcache.head;
b->dev = -1;
bcache.head.next->prev = b;
bcache.head.next = b;
}
}
```
### - bwrite() / bread()
- 利用 iderw 對磁盤執行操作。 ( IDE Driver 中的 API )
```c=
// Write b's contents to disk. Must be B_BUSY.
void
bwrite(struct buf *b)
{
if((b->flags & B_BUSY) == 0)
panic("bwrite");
b->flags |= B_DIRTY;
iderw(b);
}
// Return a B_BUSY buf with the contents of the indicated disk sector.
struct buf*
bread(uint dev, uint sector)
{
struct buf *b;
b = bget(dev, sector);
if(!(b->flags & B_VALID))
iderw(b);
return b;
}
```
### - bget()
- 掃描緩衝區鍊錶,通過給定的設備號和扇區號找到對應的緩衝區。如果存在這樣一個緩衝區,並且它還不是處於 B_BUSY 狀態, bget 就會設置它的 B_BUSY 位並且返回。如果找到的緩衝區已經在使用中,bget就會睡眠等待它被釋放。當sleep返回的時候, bget 並不能假設這塊緩衝區現在可用了,事實上,sleep 時釋放了 buf_table_lock ,醒來後重新獲取了它,這就不能保證 b 仍然是可用的緩衝區:它有可能被用來緩衝另外一個扇區。bget 非常無奈,只能重新掃描一次,希望這次能夠找到可用的緩衝區。
```c=
// Look through buffer cache for sector on device dev.
// If not found, allocate fresh block.
// In either case, return B_BUSY buffer.
static struct buf*
bget(uint dev, uint sector)
{
struct buf *b;
acquire(&bcache.lock);
loop:
// Is the sector already cached?
for(b = bcache.head.next; b != &bcache.head; b = b->next){
if(b->dev == dev && b->sector == sector){
if(!(b->flags & B_BUSY)){
b->flags |= B_BUSY;
release(&bcache.lock);
return b;
}
sleep(b, &bcache.lock);
goto loop;
}
}
// Not cached; recycle some non-busy and clean buffer.
for(b = bcache.head.prev; b != &bcache.head; b = b->prev){
if((b->flags & B_BUSY) == 0 && (b->flags & B_DIRTY) == 0){
b->dev = dev;
b->sector = sector;
b->flags = B_BUSY;
release(&bcache.lock);
return b;
}
}
panic("bget: no buffers");
}
```
### - brelse()
```java=
// Release a B_BUSY buffer.
// Move to the head of the MRU list.
void
brelse(struct buf *b)
{
if((b->flags & B_BUSY) == 0)
panic("brelse");
acquire(&bcache.lock);
b->next->prev = b->prev;
b->prev->next = b->next;
b->next = bcache.head.next;
b->prev = &bcache.head;
bcache.head.next->prev = b;
bcache.head.next = b;
b->flags &= ~B_BUSY;
wakeup(b);
release(&bcache.lock);
}
```
---
### Logging layer
- 文件系統設計中最有趣的問題之一就是錯誤恢復,產生這樣的問題是因為大多數的文件系統都涉及到對磁盤多次的寫操作,如果在寫操作的過程當中系統崩潰了,就會使得磁盤上的文件系統處於不一致的狀態中。
- 舉例來說,根據寫的順序的不同,上述錯誤可能會導致一個目錄項指向一個空閒的 inode ,或者產生一個已被分配但是未被引用的塊。後一種情況相對來說好一些,但在前一種情況中,目錄項指向了一個空閒的 inode,重啟之後就會導致非常嚴重的問題。
- 解法 : Logging layer
- 為什麼 Logging layer 可以解決文件系統操作中出現的崩潰呢?如果崩潰發生在操作提交之前,那麼磁盤上的 Log 就不會被標記為已完成,恢復系統的代碼就會忽視它,磁盤的狀態正如這個操作從未執行過一樣。如果是在操作提交之後崩潰的,恢復程序會重演所有的寫操作,可能會重複之前已經進行了的對磁盤文件系統的寫操作。在任何一種情況下,Log 都使得磁盤操作對於系統崩潰來說是原子操作:在恢復之後,要麼所有的寫操作都完成了,要麼一個寫操作都沒有完成。
---
### Log design
- Log 存在於磁盤末端已知的固定區域。它包含了一個起始塊,緊接著一連串的數據塊。起始塊包含了一個扇區號的數組,每一個對應於 Log 中的數據塊,起始塊還包含了 Log 數據塊的計數。xv6 在提交後修改 Log 的起始塊而不是之前,並且在將 Log 中的數據塊都拷貝到文件系統之後將數據塊計數清 0。提交之後,清 0 之前的崩潰就會導致一個非 0 的計數。
- 每一個 System Call 都可能包含一個必須從頭到尾原子完成的寫操作序列,我們稱這樣的一個序列為一個 Transaction ,雖然他比數據庫中的會話要簡單得多。任何時候只能有一個進程在一個 Transaction 之中,其他進程必須等待當前 Transaction 中的進程結束。因此同一時刻 Log 最多只記錄一次 Transaction。
---
### Code: logging
### - Structure
```c=
// Contents of the header block, used for both the on-disk header block
// and to keep track in memory of logged sector #s before commit.
struct logheader {
int n;
int sector[LOGSIZE];
};
struct log {
struct spinlock lock;
int start;
int size;
int busy; // a transaction is active
int dev;
struct logheader lh;
};
struct log log;
```
- 一個使用 Log 的程式範例
```c=
begin_trans();
...
bp = bread(...);
bp->data[...] = ...;
log_write(bp);
...
commit_trans();
```
### - begin_trans()
```c=
void
begin_trans(void)
{
acquire(&log.lock);
while (log.busy) {
sleep(&log, &log.lock);
}
log.busy = 1;
release(&log.lock);
}
```
### - commit_trans()
```c=
void
commit_trans(void)
{
if (log.lh.n > 0) {
write_head(); // Write header to disk -- the real commit
install_trans(); // Now install writes to home locations
log.lh.n = 0;
write_head(); // Erase the transaction from the log
}
acquire(&log.lock);
log.busy = 0;
wakeup(&log);
release(&log.lock);
}
```
### - log_write()
```c=
// Caller has modified b->data and is done with the buffer.
// Append the block to the log and record the block number,
// but don't write the log header (which would commit the write).
// log_write() replaces bwrite(); a typical use is:
// bp = bread(...)
// modify bp->data[]
// log_write(bp)
// brelse(bp)
void
log_write(struct buf *b)
{
int i;
if (log.lh.n >= LOGSIZE || log.lh.n >= log.size - 1)
panic("too big a transaction");
if (!log.busy)
panic("write outside of trans");
for (i = 0; i < log.lh.n; i++) {
if (log.lh.sector[i] == b->sector) // log absorbtion?
break;
}
log.lh.sector[i] = b->sector;
struct buf *lbuf = bread(b->dev, log.start+i+1);
memmove(lbuf->data, b->data, BSIZE);
bwrite(lbuf);
brelse(lbuf);
if (i == log.lh.n)
log.lh.n++;
b->flags |= B_DIRTY; // XXX prevent eviction
}
```
- XV6 中的實例 ( 因為 inode 中包含一連串的 Data Block 所以需要用 Transaction )
```c=
begin_trans();
ilock(f->ip);
r = writei(f->ip, ...);
iunlock(f->ip);
commit_trans();
```
---
### Code: Block allocator
- File 和 directory 的內容都存在 Disk 中, 所以必須從 Free Pool 中要一塊記憶體。
- Block allocator 裡面包含了 free bitmap,每個 Block 是一個 Bit,用來標是哪個 Block 是有效的。 ( boot sector, superblock,log blocks, inode blocks, and bitmap blocks 永遠都有效 )
- Block allocator 有兩個功能
- balloc 分配一個新的磁盤塊
- bfree 釋放一個塊磁盤塊
### - balloc()
1. readsb 從磁盤中讀出 Super Block(或者從塊緩衝中)到sb 中。
2. 從 0 開始跑把所有Block都跑過
```c=
// Allocate a zeroed disk block.
static uint
balloc(uint dev)
{
int b, bi, m;
struct buf *bp;
struct superblock sb;
bp = 0;
readsb(dev, &sb);
for(b = 0; b < sb.size; b += BPB){
bp = bread(dev, BBLOCK(b, sb.ninodes));
for(bi = 0; bi < BPB && b + bi < sb.size; bi++){
m = 1 << (bi % 8);
if((bp->data[bi/8] & m) == 0){ // Is block free?
bp->data[bi/8] |= m; // Mark block in use.
log_write(bp);
brelse(bp);
bzero(dev, b + bi);
return b + bi;
}
}
brelse(bp);
}
panic("balloc: out of blocks");
}
```
### - bfree()
```c=
// Free a disk block.
static void
bfree(int dev, uint b)
{
struct buf *bp;
struct superblock sb;
int bi, m;
readsb(dev, &sb);
bp = bread(dev, BBLOCK(b, sb.ninodes));
bi = b % BPB;
m = 1 << (bi % 8);
if((bp->data[bi/8] & m) == 0)
panic("freeing free block");
bp->data[bi/8] &= ~m;
log_write(bp);
brelse(bp);
}
```
---
### Inode layer
- Inode : Two related meanings
- The on-disk data structure containing a file’s size and list of data block numbers. ( dinode )
- An in-memory inode, which contains a copy of the on-disk inode as well as extra information needed within the kernel. ( inode )
- All of the on-disk inodes are packed into a contiguous area of disk called the inode blocks. ( like an array )
- Because of the same size of inode, we can give a number n to easily find the nth inode on the disk.
- Struct DInode
```c=
// On-disk inode structure
struct dinode {
short type; // File type
short major; // Major device number (T_DEV only)
short minor; // Minor device number (T_DEV only)
short nlink; // Number of links to inode in file system
uint size; // Size of file (bytes)
uint addrs[NDIRECT+1]; // Data block addresses
};
```
- DInode :
- The main function : The type field distinguishes between files, directories, and special files (devices).
- Nlink : Counts the number of directory entries that refer to this inode, in order to recognize when the on-disk inode and its data blocks should be freed.
- Size : Records the number of bytes of content in the file.
- Addrs : Array records the block numbers of the disk blocks holding the file’s content.
- Struct Inode
```c=
// in-memory copy of an inode
struct inode {
uint dev; // Device number
uint inum; // Inode number
int ref; // Reference count
int flags; // I_BUSY, I_VALID
short type; // copy of disk inode
short major;
short minor;
short nlink;
uint size;
uint addrs[NDIRECT+1];
};
#define I_BUSY 0x1
#define I_VALID 0x2
```
- Inode :
- The main function : The kernel keeps the set of active inodes in memory.
- **struct inode** is the in-memory copy of a **struct dinode** on disk.
- Timing : The kernel stores an inode in memory only if there are C pointers referring to that inode.
- Ref : Counts the number of C pointers referring to the in-memory inode, and the kernel discards the inode from memory if the reference count drops to zero.
- Iget / Iput : Acquire and release pointers to an inode, modifying the reference count.
---
### Code: Inodes
### - ialloc()
```c=
//PAGEBREAK!
// Allocate a new inode with the given type on device dev.
// A free inode has a type of zero.
struct inode*
ialloc(uint dev, short type)
{
int inum;
struct buf *bp;
struct dinode *dip;
struct superblock sb;
readsb(dev, &sb);
for(inum = 1; inum < sb.ninodes; inum++){
bp = bread(dev, IBLOCK(inum));
dip = (struct dinode*)bp->data + inum%IPB;
if(dip->type == 0){ // a free inode
memset(dip, 0, sizeof(*dip));
dip->type = type;
log_write(bp); // mark it allocated on the disk
brelse(bp);
return iget(dev, inum);
}
brelse(bp);
}
panic("ialloc: no inodes");
}
```
### - iget()
```c=
// Find the inode with number inum on device dev
// and return the in-memory copy. Does not lock
// the inode and does not read it from disk.
static struct inode*
iget(uint dev, uint inum)
{
struct inode *ip, *empty;
acquire(&icache.lock);
// Is the inode already cached?
empty = 0;
for(ip = &icache.inode[0]; ip < &icache.inode[NINODE]; ip++){
if(ip->ref > 0 && ip->dev == dev && ip->inum == inum){
ip->ref++;
release(&icache.lock);
return ip;
}
if(empty == 0 && ip->ref == 0) // Remember empty slot.
empty = ip;
}
// Recycle an inode cache entry.
if(empty == 0)
panic("iget: no inodes");
ip = empty;
ip->dev = dev;
ip->inum = inum;
ip->ref = 1;
ip->flags = 0;
release(&icache.lock);
return ip;
}
```
### - ilock() / iunlock()
```c=
// Lock the given inode.
// Reads the inode from disk if necessary.
void
ilock(struct inode *ip)
{
struct buf *bp;
struct dinode *dip;
if(ip == 0 || ip->ref < 1)
panic("ilock");
acquire(&icache.lock);
while(ip->flags & I_BUSY)
sleep(ip, &icache.lock);
ip->flags |= I_BUSY;
release(&icache.lock);
if(!(ip->flags & I_VALID)){
bp = bread(ip->dev, IBLOCK(ip->inum));
dip = (struct dinode*)bp->data + ip->inum%IPB;
ip->type = dip->type;
ip->major = dip->major;
ip->minor = dip->minor;
ip->nlink = dip->nlink;
ip->size = dip->size;
memmove(ip->addrs, dip->addrs, sizeof(ip->addrs));
brelse(bp);
ip->flags |= I_VALID;
if(ip->type == 0)
panic("ilock: no type");
}
}
// Unlock the given inode.
void
iunlock(struct inode *ip)
{
if(ip == 0 || !(ip->flags & I_BUSY) || ip->ref < 1)
panic("iunlock");
acquire(&icache.lock);
ip->flags &= ~I_BUSY;
wakeup(ip);
release(&icache.lock);
}
```
### - iput()
```c=
// Drop a reference to an in-memory inode.
// If that was the last reference, the inode cache entry can
// be recycled.
// If that was the last reference and the inode has no links
// to it, free the inode (and its content) on disk.
void
iput(struct inode *ip)
{
acquire(&icache.lock);
if(ip->ref == 1 && (ip->flags & I_VALID) && ip->nlink == 0){
// inode has no links: truncate and free inode.
if(ip->flags & I_BUSY)
panic("iput busy");
ip->flags |= I_BUSY;
release(&icache.lock);
itrunc(ip);
ip->type = 0;
iupdate(ip);
acquire(&icache.lock);
ip->flags = 0;
wakeup(ip);
}
ip->ref--;
release(&icache.lock);
}
```
---
### Code: Inode content

### - bmap()
```c=
//PAGEBREAK!
// Inode content
//
// The content (data) associated with each inode is stored
// in blocks on the disk. The first NDIRECT block numbers
// are listed in ip->addrs[]. The next NINDIRECT blocks are
// listed in block ip->addrs[NDIRECT].
// Return the disk block address of the nth block in inode ip.
// If there is no such block, bmap allocates one.
static uint
bmap(struct inode *ip, uint bn)
{
uint addr, *a;
struct buf *bp;
if(bn < NDIRECT){
if((addr = ip->addrs[bn]) == 0)
ip->addrs[bn] = addr = balloc(ip->dev);
return addr;
}
bn -= NDIRECT;
if(bn < NINDIRECT){
// Load indirect block, allocating if necessary.
if((addr = ip->addrs[NDIRECT]) == 0)
ip->addrs[NDIRECT] = addr = balloc(ip->dev);
bp = bread(ip->dev, addr);
a = (uint*)bp->data;
if((addr = a[bn]) == 0){
a[bn] = addr = balloc(ip->dev);
log_write(bp);
}
brelse(bp);
return addr;
}
panic("bmap: out of range");
}
```
---
### Code: directory layer
### - Strucrure
```c=
// Directory is a file containing a sequence of dirent structures.
#define DIRSIZ 14
struct dirent {
ushort inum;
char name[DIRSIZ];
};
```
### - dirlookup()
```c=
// Look for a directory entry in a directory.
// If found, set *poff to byte offset of entry.
struct inode*
dirlookup(struct inode *dp, char *name, uint *poff)
{
uint off, inum;
struct dirent de;
if(dp->type != T_DIR)
panic("dirlookup not DIR");
for(off = 0; off < dp->size; off += sizeof(de)){
if(readi(dp, (char*)&de, off, sizeof(de)) != sizeof(de))
panic("dirlink read");
if(de.inum == 0)
continue;
if(namecmp(name, de.name) == 0){
// entry matches path element
if(poff)
*poff = off;
inum = de.inum;
return iget(dp->dev, inum);
}
}
return 0;
}
```
### - dirlink()
```c=
// Write a new directory entry (name, inum) into the directory dp.
int
dirlink(struct inode *dp, char *name, uint inum)
{
int off;
struct dirent de;
struct inode *ip;
// Check that name is not present.
if((ip = dirlookup(dp, name, 0)) != 0){
iput(ip);
return -1;
}
// Look for an empty dirent.
for(off = 0; off < dp->size; off += sizeof(de)){
if(readi(dp, (char*)&de, off, sizeof(de)) != sizeof(de))
panic("dirlink read");
if(de.inum == 0)
break;
}
strncpy(de.name, name, DIRSIZ);
de.inum = inum;
if(writei(dp, (char*)&de, off, sizeof(de)) != sizeof(de))
panic("dirlink");
return 0;
}
```
---
### Code: Path names
- namei()
- Evaluates path and returns the corresponding inode.
- nameiparent()
- It stops before the last element, returning the inode of the parent directory and copying the final element into name.
### - namei() / nameiparent()
```c=
struct inode*
namei(char *path)
{
char name[DIRSIZ];
return namex(path, 0, name);
}
struct inode*
nameiparent(char *path, char *name)
{
return namex(path, 1, name);
}
// Look up and return the inode for a path name.
// If parent != 0, return the inode for the parent and copy the final
// path element into name, which must have room for DIRSIZ bytes.
static struct inode*
namex(char *path, int nameiparent, char *name)
{
struct inode *ip, *next;
if(*path == '/')
ip = iget(ROOTDEV, ROOTINO);
else
ip = idup(proc->cwd);
while((path = skipelem(path, name)) != 0){
ilock(ip);
if(ip->type != T_DIR){
iunlockput(ip);
return 0;
}
if(nameiparent && *path == '\0'){
// Stop one level early.
iunlock(ip);
return ip;
}
if((next = dirlookup(ip, name, 0)) == 0){
iunlockput(ip);
return 0;
}
iunlockput(ip);
ip = next;
}
if(nameiparent){
iput(ip);
return 0;
}
return ip;
}
```
---
### File descriptor layer
- One of the cool aspect of the Unix interface is that most resources in Unix are represented as a **file**, including **devices** such as the **console**, **pipes**, and of course, **real files**. The file descriptor layer is the layer that achieves this uniformity.
- Structure
```c=
struct file {
enum { FD_NONE, FD_PIPE, FD_INODE } type;
int ref; // reference count
char readable;
char writable;
struct pipe *pipe;
struct inode *ip;
uint off;
};
```
---
### Code: System calls
- 最上一層就是系統調用,讀(),寫()之類,以及相關的實現。
### - sys_open()
```c=
int
sys_open(void)
{
char *path;
int fd, omode;
struct file *f;
struct inode *ip;
if(argstr(0, &path) < 0 || argint(1, &omode) < 0)
return -1;
if(omode & O_CREATE){
begin_trans();
ip = create(path, T_FILE, 0, 0);
commit_trans();
if(ip == 0)
return -1;
} else {
if((ip = namei(path)) == 0)
return -1;
ilock(ip);
if(ip->type == T_DIR && omode != O_RDONLY){
iunlockput(ip);
return -1;
}
}
if((f = filealloc()) == 0 || (fd = fdalloc(f)) < 0){
if(f)
fileclose(f);
iunlockput(ip);
return -1;
}
iunlock(ip);
f->type = FD_INODE;
f->ip = ip;
f->off = 0;
f->readable = !(omode & O_WRONLY);
f->writable = (omode & O_WRONLY) || (omode & O_RDWR);
return fd;
}
```

### - sys_mkdir()
```c=
int
sys_mkdir(void)
{
char *path;
struct inode *ip;
begin_trans();
if(argstr(0, &path) < 0 || (ip = create(path, T_DIR, 0, 0)) == 0){
commit_trans();
return -1;
}
iunlockput(ip);
commit_trans();
return 0;
}
```
### - sys_mknod()
```c=
int
sys_mknod(void)
{
struct inode *ip;
char *path;
int len;
int major, minor;
begin_trans();
if((len=argstr(0, &path)) < 0 ||
argint(1, &major) < 0 ||
argint(2, &minor) < 0 ||
(ip = create(path, T_DEV, major, minor)) == 0){
commit_trans();
return -1;
}
iunlockput(ip);
commit_trans();
return 0;
}
```
### - sys_chdir()
```c=
int
sys_chdir(void)
{
char *path;
struct inode *ip;
if(argstr(0, &path) < 0 || (ip = namei(path)) == 0)
return -1;
ilock(ip);
if(ip->type != T_DIR){
iunlockput(ip);
return -1;
}
iunlock(ip);
iput(proc->cwd);
proc->cwd = ip;
return 0;
}
```
### - sys_exec()
```c=
int
sys_exec(void)
{
char *path, *argv[MAXARG];
int i;
uint uargv, uarg;
if(argstr(0, &path) < 0 || argint(1, (int*)&uargv) < 0){
return -1;
}
memset(argv, 0, sizeof(argv));
for(i=0;; i++){
if(i >= NELEM(argv))
return -1;
if(fetchint(uargv+4*i, (int*)&uarg) < 0)
return -1;
if(uarg == 0){
argv[i] = 0;
break;
}
if(fetchstr(uarg, &argv[i]) < 0)
return -1;
}
return exec(path, argv);
}
```
### - sys_pipe()
```c=
int
sys_pipe(void)
{
int *fd;
struct file *rf, *wf;
int fd0, fd1;
if(argptr(0, (void*)&fd, 2*sizeof(fd[0])) < 0)
return -1;
if(pipealloc(&rf, &wf) < 0)
return -1;
fd0 = -1;
if((fd0 = fdalloc(rf)) < 0 || (fd1 = fdalloc(wf)) < 0){
if(fd0 >= 0)
proc->ofile[fd0] = 0;
fileclose(rf);
fileclose(wf);
return -1;
}
fd[0] = fd0;
fd[1] = fd1;
return 0;
}
```
### - create()
```c=
static struct inode*
create(char *path, short type, short major, short minor)
{
uint off;
struct inode *ip, *dp;
char name[DIRSIZ];
if((dp = nameiparent(path, name)) == 0)
return 0;
ilock(dp);
if((ip = dirlookup(dp, name, &off)) != 0){
iunlockput(dp);
ilock(ip);
if(type == T_FILE && ip->type == T_FILE)
return ip;
iunlockput(ip);
return 0;
}
if((ip = ialloc(dp->dev, type)) == 0)
panic("create: ialloc");
ilock(ip);
ip->major = major;
ip->minor = minor;
ip->nlink = 1;
iupdate(ip);
if(type == T_DIR){ // Create . and .. entries.
dp->nlink++; // for ".."
iupdate(dp);
// No ip->nlink++ for ".": avoid cyclic ref count.
if(dirlink(ip, ".", ip->inum) < 0 || dirlink(ip, "..", dp->inum) < 0)
panic("create dots");
}
if(dirlink(dp, name, ip->inum) < 0)
panic("create: dirlink");
iunlockput(dp);
return ip;
}
```
---