Bin
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee
    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee
  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # XV6 A simple, Unix-like Teaching Operating System Chapter 6 --- > [TOC] --- ## Chapter 6 File system - 文件系統的目的是組織和存儲數據,典型的文件系統支持用戶和程序間的數據共享,並提供數據持久化的支持(即重啟之後數據仍然可用)。 - xv6 的文件系統中使用了類似Unix 的文件,文件描述符,目錄和路經名(請參閱第零章),並且把數據存儲到一塊IDE 磁盤上(請參閱第三章)。這個文件系統解決了幾大難題: - 該文件系統需要磁盤上數據結構來表示目錄樹和文件,記錄每個文件用於存儲數據的塊,以及磁盤上哪些區域是空閒的。 - 該文件系統必須支持崩潰恢復,也就是說,如果系統崩潰了(比如斷電了),文件系統必須保證在重啟後仍能正常工作。問題在於一次系統崩潰可能打斷一連串的更新操作,從而使得磁盤上的數據結構變得不一致(例如:有的塊同時被標記為 使用中 和 空閒 )。 - 不同的進程可能同時操作文件系統,要保證這種並行不會破壞文件系統的正常工作。 - 訪問磁盤比訪問內存要慢幾個數量級,所以文件系統必須要維護一個內存內的cache 用於緩存常被訪問的塊。 --- ### Overview ![](https://i.imgur.com/3CFlz0W.png) - xv6 的文件系統分6層實現 - **Buffer cache** : 通過塊緩衝讀寫 IDE 硬盤,它**同步**了對磁盤的訪問,保證同時只有一個內核進程可以修改磁盤塊。 - **Logging** : 使得更高層的接口可以在 transaction 將對磁盤更新,並保證這些操作是原子操作(要嘛都被應用,要嘛都不被應用)。 - **Inodes and block allocator** : 提供無名文件,每一個這樣的文件由一個i 節點和一連串的數據塊組成。 - **Directory inodes** : 將目錄實現為一種特殊的i 節點,它的內容是一連串的目錄項,每一個目錄項包含一個文件名和對應的i 節點。 - **Recursive lookup** : 提供了層次路經名(如/usr/rtm/xv6/fs.c這樣的),這一層通過遞歸的方式來查詢路徑對應的文件。 - **File Descriptors** : 將許多 UNIX 的資源(如管道,設備,文件等)抽象為文件系統的接口,極大地簡化了程序員的工作。 ![](https://i.imgur.com/WmEd9yu.png) - 文件系統必須設計好在磁盤上的什麼地方放置 i 節點和 data block 。xv6 把磁盤劃分為幾個區塊。文件系統不使用第 0 塊(第 0 塊存有 bootloader)。第 1 塊叫做 super block;它包含了文件系統的元信息(如文件系統的總塊數,data block 塊數,i節點數,以及 log 的塊數)。從第 2 塊開始存放 i 節點,每一塊能夠存放多個 i 節點。接下來的塊存放空閒塊位圖。剩下的大部分是 data block,它們保存了文件和目錄的內容。在磁盤的最後是 log block,將在後面詳述。 --- ### Buffer cache Layer - Buffer Cache 有兩個工作 - 使得 Disk Block 的存取同步化,同一時間只有一份資料被拷貝,並且同時只有一個 kernel thread 使用這份資料。 ( 方法 : 狀態 ) - Cache Popular Blocks 以提升性能。( bio.c ) - Buffer cache 有固定數量的緩衝區,這意味著如果文件系統請求一個不在緩衝中的塊,必須換出一個已經使用的緩衝區。這裡的置換策略是LRU,因為我們假設假設最近未使用的塊近期內最不可能再被使用。 --- ### Code: Buffer cache - File - bio.c - Buffer 有三種狀態 (b->flags) - B_VALID (緩衝區擁有磁盤塊的有效內容) - B_DIRTY (緩衝區的內容已經被改變並且需要寫回磁盤) - B_BUSY (某個內核線程持有這個緩衝區且尚未釋放) ![](https://i.imgur.com/MzHGMhs.png) ### - Structure ```c= struct { struct spinlock lock; struct buf buf[NBUF]; // Linked list of all buffers, through prev/next. // head.next is most recently used. struct buf head; } bcache; struct buf { int flags; uint dev; uint sector; struct buf *prev; // LRU cache list struct buf *next; struct buf *qnext; // disk queue uchar data[512]; }; // 補充: // #define NBUF 10 // size of disk block cache #define B_BUSY 0x1 // buffer is locked by some process #define B_VALID 0x2 // buffer has been read from disk #define B_DIRTY 0x4 // buffer needs to be written to disk ``` ### - binit() 1. 初始化 bcache 的 lock。 2. 初始化 bcache 中的 buf 陣列,並串在 head 上變成 Link List。 ```c= void binit(void) { struct buf *b; initlock(&bcache.lock, "bcache"); //PAGEBREAK! // Create linked list of buffers bcache.head.prev = &bcache.head; bcache.head.next = &bcache.head; for(b = bcache.buf; b < bcache.buf+NBUF; b++){ b->next = bcache.head.next; b->prev = &bcache.head; b->dev = -1; bcache.head.next->prev = b; bcache.head.next = b; } } ``` ### - bwrite() / bread() - 利用 iderw 對磁盤執行操作。 ( IDE Driver 中的 API ) ```c= // Write b's contents to disk. Must be B_BUSY. void bwrite(struct buf *b) { if((b->flags & B_BUSY) == 0) panic("bwrite"); b->flags |= B_DIRTY; iderw(b); } // Return a B_BUSY buf with the contents of the indicated disk sector. struct buf* bread(uint dev, uint sector) { struct buf *b; b = bget(dev, sector); if(!(b->flags & B_VALID)) iderw(b); return b; } ``` ### - bget() - 掃描緩衝區鍊錶,通過給定的設備號和扇區號找到對應的緩衝區。如果存在這樣一個緩衝區,並且它還不是處於 B_BUSY 狀態, bget 就會設置它的 B_BUSY 位並且返回。如果找到的緩衝區已經在使用中,bget就會睡眠等待它被釋放。當sleep返回的時候, bget 並不能假設這塊緩衝區現在可用了,事實上,sleep 時釋放了 buf_table_lock ,醒來後重新獲取了它,這就不能保證 b 仍然是可用的緩衝區:它有可能被用來緩衝另外一個扇區。bget 非常無奈,只能重新掃描一次,希望這次能夠找到可用的緩衝區。 ```c= // Look through buffer cache for sector on device dev. // If not found, allocate fresh block. // In either case, return B_BUSY buffer. static struct buf* bget(uint dev, uint sector) { struct buf *b; acquire(&bcache.lock); loop: // Is the sector already cached? for(b = bcache.head.next; b != &bcache.head; b = b->next){ if(b->dev == dev && b->sector == sector){ if(!(b->flags & B_BUSY)){ b->flags |= B_BUSY; release(&bcache.lock); return b; } sleep(b, &bcache.lock); goto loop; } } // Not cached; recycle some non-busy and clean buffer. for(b = bcache.head.prev; b != &bcache.head; b = b->prev){ if((b->flags & B_BUSY) == 0 && (b->flags & B_DIRTY) == 0){ b->dev = dev; b->sector = sector; b->flags = B_BUSY; release(&bcache.lock); return b; } } panic("bget: no buffers"); } ``` ### - brelse() ```java= // Release a B_BUSY buffer. // Move to the head of the MRU list. void brelse(struct buf *b) { if((b->flags & B_BUSY) == 0) panic("brelse"); acquire(&bcache.lock); b->next->prev = b->prev; b->prev->next = b->next; b->next = bcache.head.next; b->prev = &bcache.head; bcache.head.next->prev = b; bcache.head.next = b; b->flags &= ~B_BUSY; wakeup(b); release(&bcache.lock); } ``` --- ### Logging layer - 文件系統設計中最有趣的問題之一就是錯誤恢復,產生這樣的問題是因為大多數的文件系統都涉及到對磁盤多次的寫操作,如果在寫操作的過程當中系統崩潰了,就會使得磁盤上的文件系統處於不一致的狀態中。 - 舉例來說,根據寫的順序的不同,上述錯誤可能會導致一個目錄項指向一個空閒的 inode ,或者產生一個已被分配但是未被引用的塊。後一種情況相對來說好一些,但在前一種情況中,目錄項指向了一個空閒的 inode,重啟之後就會導致非常嚴重的問題。 - 解法 : Logging layer - 為什麼 Logging layer 可以解決文件系統操作中出現的崩潰呢?如果崩潰發生在操作提交之前,那麼磁盤上的 Log 就不會被標記為已完成,恢復系統的代碼就會忽視它,磁盤的狀態正如這個操作從未執行過一樣。如果是在操作提交之後崩潰的,恢復程序會重演所有的寫操作,可能會重複之前已經進行了的對磁盤文件系統的寫操作。在任何一種情況下,Log 都使得磁盤操作對於系統崩潰來說是原子操作:在恢復之後,要麼所有的寫操作都完成了,要麼一個寫操作都沒有完成。 --- ### Log design - Log 存在於磁盤末端已知的固定區域。它包含了一個起始塊,緊接著一連串的數據塊。起始塊包含了一個扇區號的數組,每一個對應於 Log 中的數據塊,起始塊還包含了 Log 數據塊的計數。xv6 在提交後修改 Log 的起始塊而不是之前,並且在將 Log 中的數據塊都拷貝到文件系統之後將數據塊計數清 0。提交之後,清 0 之前的崩潰就會導致一個非 0 的計數。 - 每一個 System Call 都可能包含一個必須從頭到尾原子完成的寫操作序列,我們稱這樣的一個序列為一個 Transaction ,雖然他比數據庫中的會話要簡單得多。任何時候只能有一個進程在一個 Transaction 之中,其他進程必須等待當前 Transaction 中的進程結束。因此同一時刻 Log 最多只記錄一次 Transaction。 --- ### Code: logging ### - Structure ```c= // Contents of the header block, used for both the on-disk header block // and to keep track in memory of logged sector #s before commit. struct logheader { int n; int sector[LOGSIZE]; }; struct log { struct spinlock lock; int start; int size; int busy; // a transaction is active int dev; struct logheader lh; }; struct log log; ``` - 一個使用 Log 的程式範例 ```c= begin_trans(); ... bp = bread(...); bp->data[...] = ...; log_write(bp); ... commit_trans(); ``` ### - begin_trans() ```c= void begin_trans(void) { acquire(&log.lock); while (log.busy) { sleep(&log, &log.lock); } log.busy = 1; release(&log.lock); } ``` ### - commit_trans() ```c= void commit_trans(void) { if (log.lh.n > 0) { write_head(); // Write header to disk -- the real commit install_trans(); // Now install writes to home locations log.lh.n = 0; write_head(); // Erase the transaction from the log } acquire(&log.lock); log.busy = 0; wakeup(&log); release(&log.lock); } ``` ### - log_write() ```c= // Caller has modified b->data and is done with the buffer. // Append the block to the log and record the block number, // but don't write the log header (which would commit the write). // log_write() replaces bwrite(); a typical use is: // bp = bread(...) // modify bp->data[] // log_write(bp) // brelse(bp) void log_write(struct buf *b) { int i; if (log.lh.n >= LOGSIZE || log.lh.n >= log.size - 1) panic("too big a transaction"); if (!log.busy) panic("write outside of trans"); for (i = 0; i < log.lh.n; i++) { if (log.lh.sector[i] == b->sector) // log absorbtion? break; } log.lh.sector[i] = b->sector; struct buf *lbuf = bread(b->dev, log.start+i+1); memmove(lbuf->data, b->data, BSIZE); bwrite(lbuf); brelse(lbuf); if (i == log.lh.n) log.lh.n++; b->flags |= B_DIRTY; // XXX prevent eviction } ``` - XV6 中的實例 ( 因為 inode 中包含一連串的 Data Block 所以需要用 Transaction ) ```c= begin_trans(); ilock(f->ip); r = writei(f->ip, ...); iunlock(f->ip); commit_trans(); ``` --- ### Code: Block allocator - File 和 directory 的內容都存在 Disk 中, 所以必須從 Free Pool 中要一塊記憶體。 - Block allocator 裡面包含了 free bitmap,每個 Block 是一個 Bit,用來標是哪個 Block 是有效的。 ( boot sector, superblock,log blocks, inode blocks, and bitmap blocks 永遠都有效 ) - Block allocator 有兩個功能 - balloc 分配一個新的磁盤塊 - bfree 釋放一個塊磁盤塊 ### - balloc() 1. readsb 從磁盤中讀出 Super Block(或者從塊緩衝中)到sb 中。 2. 從 0 開始跑把所有Block都跑過 ```c= // Allocate a zeroed disk block. static uint balloc(uint dev) { int b, bi, m; struct buf *bp; struct superblock sb; bp = 0; readsb(dev, &sb); for(b = 0; b < sb.size; b += BPB){ bp = bread(dev, BBLOCK(b, sb.ninodes)); for(bi = 0; bi < BPB && b + bi < sb.size; bi++){ m = 1 << (bi % 8); if((bp->data[bi/8] & m) == 0){ // Is block free? bp->data[bi/8] |= m; // Mark block in use. log_write(bp); brelse(bp); bzero(dev, b + bi); return b + bi; } } brelse(bp); } panic("balloc: out of blocks"); } ``` ### - bfree() ```c= // Free a disk block. static void bfree(int dev, uint b) { struct buf *bp; struct superblock sb; int bi, m; readsb(dev, &sb); bp = bread(dev, BBLOCK(b, sb.ninodes)); bi = b % BPB; m = 1 << (bi % 8); if((bp->data[bi/8] & m) == 0) panic("freeing free block"); bp->data[bi/8] &= ~m; log_write(bp); brelse(bp); } ``` --- ### Inode layer - Inode : Two related meanings - The on-disk data structure containing a file’s size and list of data block numbers. ( dinode ) - An in-memory inode, which contains a copy of the on-disk inode as well as extra information needed within the kernel. ( inode ) - All of the on-disk inodes are packed into a contiguous area of disk called the inode blocks. ( like an array ) - Because of the same size of inode, we can give a number n to easily find the nth inode on the disk. - Struct DInode ```c= // On-disk inode structure struct dinode { short type; // File type short major; // Major device number (T_DEV only) short minor; // Minor device number (T_DEV only) short nlink; // Number of links to inode in file system uint size; // Size of file (bytes) uint addrs[NDIRECT+1]; // Data block addresses }; ``` - DInode : - The main function : The type field distinguishes between files, directories, and special files (devices). - Nlink : Counts the number of directory entries that refer to this inode, in order to recognize when the on-disk inode and its data blocks should be freed. - Size : Records the number of bytes of content in the file. - Addrs : Array records the block numbers of the disk blocks holding the file’s content. - Struct Inode ```c= // in-memory copy of an inode struct inode { uint dev; // Device number uint inum; // Inode number int ref; // Reference count int flags; // I_BUSY, I_VALID short type; // copy of disk inode short major; short minor; short nlink; uint size; uint addrs[NDIRECT+1]; }; #define I_BUSY 0x1 #define I_VALID 0x2 ``` - Inode : - The main function : The kernel keeps the set of active inodes in memory. - **struct inode** is the in-memory copy of a **struct dinode** on disk. - Timing : The kernel stores an inode in memory only if there are C pointers referring to that inode. - Ref : Counts the number of C pointers referring to the in-memory inode, and the kernel discards the inode from memory if the reference count drops to zero. - Iget / Iput : Acquire and release pointers to an inode, modifying the reference count. --- ### Code: Inodes ### - ialloc() ```c= //PAGEBREAK! // Allocate a new inode with the given type on device dev. // A free inode has a type of zero. struct inode* ialloc(uint dev, short type) { int inum; struct buf *bp; struct dinode *dip; struct superblock sb; readsb(dev, &sb); for(inum = 1; inum < sb.ninodes; inum++){ bp = bread(dev, IBLOCK(inum)); dip = (struct dinode*)bp->data + inum%IPB; if(dip->type == 0){ // a free inode memset(dip, 0, sizeof(*dip)); dip->type = type; log_write(bp); // mark it allocated on the disk brelse(bp); return iget(dev, inum); } brelse(bp); } panic("ialloc: no inodes"); } ``` ### - iget() ```c= // Find the inode with number inum on device dev // and return the in-memory copy. Does not lock // the inode and does not read it from disk. static struct inode* iget(uint dev, uint inum) { struct inode *ip, *empty; acquire(&icache.lock); // Is the inode already cached? empty = 0; for(ip = &icache.inode[0]; ip < &icache.inode[NINODE]; ip++){ if(ip->ref > 0 && ip->dev == dev && ip->inum == inum){ ip->ref++; release(&icache.lock); return ip; } if(empty == 0 && ip->ref == 0) // Remember empty slot. empty = ip; } // Recycle an inode cache entry. if(empty == 0) panic("iget: no inodes"); ip = empty; ip->dev = dev; ip->inum = inum; ip->ref = 1; ip->flags = 0; release(&icache.lock); return ip; } ``` ### - ilock() / iunlock() ```c= // Lock the given inode. // Reads the inode from disk if necessary. void ilock(struct inode *ip) { struct buf *bp; struct dinode *dip; if(ip == 0 || ip->ref < 1) panic("ilock"); acquire(&icache.lock); while(ip->flags & I_BUSY) sleep(ip, &icache.lock); ip->flags |= I_BUSY; release(&icache.lock); if(!(ip->flags & I_VALID)){ bp = bread(ip->dev, IBLOCK(ip->inum)); dip = (struct dinode*)bp->data + ip->inum%IPB; ip->type = dip->type; ip->major = dip->major; ip->minor = dip->minor; ip->nlink = dip->nlink; ip->size = dip->size; memmove(ip->addrs, dip->addrs, sizeof(ip->addrs)); brelse(bp); ip->flags |= I_VALID; if(ip->type == 0) panic("ilock: no type"); } } // Unlock the given inode. void iunlock(struct inode *ip) { if(ip == 0 || !(ip->flags & I_BUSY) || ip->ref < 1) panic("iunlock"); acquire(&icache.lock); ip->flags &= ~I_BUSY; wakeup(ip); release(&icache.lock); } ``` ### - iput() ```c= // Drop a reference to an in-memory inode. // If that was the last reference, the inode cache entry can // be recycled. // If that was the last reference and the inode has no links // to it, free the inode (and its content) on disk. void iput(struct inode *ip) { acquire(&icache.lock); if(ip->ref == 1 && (ip->flags & I_VALID) && ip->nlink == 0){ // inode has no links: truncate and free inode. if(ip->flags & I_BUSY) panic("iput busy"); ip->flags |= I_BUSY; release(&icache.lock); itrunc(ip); ip->type = 0; iupdate(ip); acquire(&icache.lock); ip->flags = 0; wakeup(ip); } ip->ref--; release(&icache.lock); } ``` --- ### Code: Inode content ![](https://i.imgur.com/PwrdlyI.png) ### - bmap() ```c= //PAGEBREAK! // Inode content // // The content (data) associated with each inode is stored // in blocks on the disk. The first NDIRECT block numbers // are listed in ip->addrs[]. The next NINDIRECT blocks are // listed in block ip->addrs[NDIRECT]. // Return the disk block address of the nth block in inode ip. // If there is no such block, bmap allocates one. static uint bmap(struct inode *ip, uint bn) { uint addr, *a; struct buf *bp; if(bn < NDIRECT){ if((addr = ip->addrs[bn]) == 0) ip->addrs[bn] = addr = balloc(ip->dev); return addr; } bn -= NDIRECT; if(bn < NINDIRECT){ // Load indirect block, allocating if necessary. if((addr = ip->addrs[NDIRECT]) == 0) ip->addrs[NDIRECT] = addr = balloc(ip->dev); bp = bread(ip->dev, addr); a = (uint*)bp->data; if((addr = a[bn]) == 0){ a[bn] = addr = balloc(ip->dev); log_write(bp); } brelse(bp); return addr; } panic("bmap: out of range"); } ``` --- ### Code: directory layer ### - Strucrure ```c= // Directory is a file containing a sequence of dirent structures. #define DIRSIZ 14 struct dirent { ushort inum; char name[DIRSIZ]; }; ``` ### - dirlookup() ```c= // Look for a directory entry in a directory. // If found, set *poff to byte offset of entry. struct inode* dirlookup(struct inode *dp, char *name, uint *poff) { uint off, inum; struct dirent de; if(dp->type != T_DIR) panic("dirlookup not DIR"); for(off = 0; off < dp->size; off += sizeof(de)){ if(readi(dp, (char*)&de, off, sizeof(de)) != sizeof(de)) panic("dirlink read"); if(de.inum == 0) continue; if(namecmp(name, de.name) == 0){ // entry matches path element if(poff) *poff = off; inum = de.inum; return iget(dp->dev, inum); } } return 0; } ``` ### - dirlink() ```c= // Write a new directory entry (name, inum) into the directory dp. int dirlink(struct inode *dp, char *name, uint inum) { int off; struct dirent de; struct inode *ip; // Check that name is not present. if((ip = dirlookup(dp, name, 0)) != 0){ iput(ip); return -1; } // Look for an empty dirent. for(off = 0; off < dp->size; off += sizeof(de)){ if(readi(dp, (char*)&de, off, sizeof(de)) != sizeof(de)) panic("dirlink read"); if(de.inum == 0) break; } strncpy(de.name, name, DIRSIZ); de.inum = inum; if(writei(dp, (char*)&de, off, sizeof(de)) != sizeof(de)) panic("dirlink"); return 0; } ``` --- ### Code: Path names - namei() - Evaluates path and returns the corresponding inode. - nameiparent() - It stops before the last element, returning the inode of the parent directory and copying the final element into name. ### - namei() / nameiparent() ```c= struct inode* namei(char *path) { char name[DIRSIZ]; return namex(path, 0, name); } struct inode* nameiparent(char *path, char *name) { return namex(path, 1, name); } // Look up and return the inode for a path name. // If parent != 0, return the inode for the parent and copy the final // path element into name, which must have room for DIRSIZ bytes. static struct inode* namex(char *path, int nameiparent, char *name) { struct inode *ip, *next; if(*path == '/') ip = iget(ROOTDEV, ROOTINO); else ip = idup(proc->cwd); while((path = skipelem(path, name)) != 0){ ilock(ip); if(ip->type != T_DIR){ iunlockput(ip); return 0; } if(nameiparent && *path == '\0'){ // Stop one level early. iunlock(ip); return ip; } if((next = dirlookup(ip, name, 0)) == 0){ iunlockput(ip); return 0; } iunlockput(ip); ip = next; } if(nameiparent){ iput(ip); return 0; } return ip; } ``` --- ### File descriptor layer - One of the cool aspect of the Unix interface is that most resources in Unix are represented as a **file**, including **devices** such as the **console**, **pipes**, and of course, **real files**. The file descriptor layer is the layer that achieves this uniformity. - Structure ```c= struct file { enum { FD_NONE, FD_PIPE, FD_INODE } type; int ref; // reference count char readable; char writable; struct pipe *pipe; struct inode *ip; uint off; }; ``` --- ### Code: System calls - 最上一層就是系統調用,讀(),寫()之類,以及相關的實現。 ### - sys_open() ```c= int sys_open(void) { char *path; int fd, omode; struct file *f; struct inode *ip; if(argstr(0, &path) < 0 || argint(1, &omode) < 0) return -1; if(omode & O_CREATE){ begin_trans(); ip = create(path, T_FILE, 0, 0); commit_trans(); if(ip == 0) return -1; } else { if((ip = namei(path)) == 0) return -1; ilock(ip); if(ip->type == T_DIR && omode != O_RDONLY){ iunlockput(ip); return -1; } } if((f = filealloc()) == 0 || (fd = fdalloc(f)) < 0){ if(f) fileclose(f); iunlockput(ip); return -1; } iunlock(ip); f->type = FD_INODE; f->ip = ip; f->off = 0; f->readable = !(omode & O_WRONLY); f->writable = (omode & O_WRONLY) || (omode & O_RDWR); return fd; } ``` ![](https://i.imgur.com/dPxwUt0.png) ### - sys_mkdir() ```c= int sys_mkdir(void) { char *path; struct inode *ip; begin_trans(); if(argstr(0, &path) < 0 || (ip = create(path, T_DIR, 0, 0)) == 0){ commit_trans(); return -1; } iunlockput(ip); commit_trans(); return 0; } ``` ### - sys_mknod() ```c= int sys_mknod(void) { struct inode *ip; char *path; int len; int major, minor; begin_trans(); if((len=argstr(0, &path)) < 0 || argint(1, &major) < 0 || argint(2, &minor) < 0 || (ip = create(path, T_DEV, major, minor)) == 0){ commit_trans(); return -1; } iunlockput(ip); commit_trans(); return 0; } ``` ### - sys_chdir() ```c= int sys_chdir(void) { char *path; struct inode *ip; if(argstr(0, &path) < 0 || (ip = namei(path)) == 0) return -1; ilock(ip); if(ip->type != T_DIR){ iunlockput(ip); return -1; } iunlock(ip); iput(proc->cwd); proc->cwd = ip; return 0; } ``` ### - sys_exec() ```c= int sys_exec(void) { char *path, *argv[MAXARG]; int i; uint uargv, uarg; if(argstr(0, &path) < 0 || argint(1, (int*)&uargv) < 0){ return -1; } memset(argv, 0, sizeof(argv)); for(i=0;; i++){ if(i >= NELEM(argv)) return -1; if(fetchint(uargv+4*i, (int*)&uarg) < 0) return -1; if(uarg == 0){ argv[i] = 0; break; } if(fetchstr(uarg, &argv[i]) < 0) return -1; } return exec(path, argv); } ``` ### - sys_pipe() ```c= int sys_pipe(void) { int *fd; struct file *rf, *wf; int fd0, fd1; if(argptr(0, (void*)&fd, 2*sizeof(fd[0])) < 0) return -1; if(pipealloc(&rf, &wf) < 0) return -1; fd0 = -1; if((fd0 = fdalloc(rf)) < 0 || (fd1 = fdalloc(wf)) < 0){ if(fd0 >= 0) proc->ofile[fd0] = 0; fileclose(rf); fileclose(wf); return -1; } fd[0] = fd0; fd[1] = fd1; return 0; } ``` ### - create() ```c= static struct inode* create(char *path, short type, short major, short minor) { uint off; struct inode *ip, *dp; char name[DIRSIZ]; if((dp = nameiparent(path, name)) == 0) return 0; ilock(dp); if((ip = dirlookup(dp, name, &off)) != 0){ iunlockput(dp); ilock(ip); if(type == T_FILE && ip->type == T_FILE) return ip; iunlockput(ip); return 0; } if((ip = ialloc(dp->dev, type)) == 0) panic("create: ialloc"); ilock(ip); ip->major = major; ip->minor = minor; ip->nlink = 1; iupdate(ip); if(type == T_DIR){ // Create . and .. entries. dp->nlink++; // for ".." iupdate(dp); // No ip->nlink++ for ".": avoid cyclic ref count. if(dirlink(ip, ".", ip->inum) < 0 || dirlink(ip, "..", dp->inum) < 0) panic("create dots"); } if(dirlink(dp, name, ip->inum) < 0) panic("create: dirlink"); iunlockput(dp); return ip; } ``` ---

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully