# The Virtual Filesystem

The file interface that the kernel exposes to user space.

![](https://i.imgur.com/IspF6tF.png)

Downward, it abstracts over the different filesystems and the underlying physical media.

Upward, it offers users system calls such as open(), read(), and write(). Users do not notice differences in the underlying hardware; they simply use the upper-level wrappers to read and write files.

* No matter how the underlying filesystems are updated or changed, programs written against the upper layer do not need to change.

![](https://i.imgur.com/QJj6HgW.png)

# Unix Filesystems

* files (file_system_type)
  * an ordered string of bytes
  * the first byte marks the beginning of the file
  * the last byte marks the end of the file
  * operations
    * read
    * write
    * create
    * delete
    * mount
* inodes (index node) (file metadata)
  * access permissions
  * size
  * owner
  * creation time
  * ......
* directory
  * holds a set of files
  * subdirectories
  * a directory is itself a file
* mount points (vfsmount struct)
  * Unix mounts filesystems into a single namespace
  * Windows splits the namespace into drive letters
    * C:
    * D: (where the adult videos go) :)
* superblock
  * a data structure containing information about the filesystem as a whole

# VFS Objects and Their Data Structures

(This is an OOP-style architecture written in C, which is why many of the things below are called objects.)

The four primary object types of the VFS are:

* superblock
  * a mounted filesystem
* inode
  * a specific file
* dentry
  * a directory entry
  * which is a single component of a path
* file
  * an open file as associated with a process

Each of these has an operations object:

* super_operations
  * invoked on a specific filesystem
    * write_inode()
    * sync_fs()
* inode_operations
  * invoked on a specific file
    * create()
    * link()
* dentry_operations
  * invoked on a specific directory entry
    * d_compare()
    * d_delete()
* file_operations
  * invoked on an open file
    * read()
    * write()

# The Superblock Object

(filesystem control block)

The superblock is stored in a particular sector on disk.

Filesystems that are not disk-based (virtual, memory-based filesystems) generate the superblock on the fly and keep it in memory.

* e.g. sysfs

The super_block struct:

![](https://i.imgur.com/TW3UwAR.png)

![](https://i.imgur.com/Z5rNFYi.png)

![](https://i.imgur.com/Gxw1mFf.png)

The most important member is s_op, the object that holds all the operations on the superblock.

# Superblock Operations

Every member is a function pointer.

![](https://i.imgur.com/Uov2MT9.png)

![](https://i.imgur.com/O7ZhPcb.png)

For example, when a filesystem wants to write to its superblock, it invokes:
![](https://i.imgur.com/KqcPRNz.png)

sb: a pointer to the filesystem's superblock

This looks clumsy because C does not support OOP (it has no notion of inheritance or overriding).

In C++ you could simply write:

![](https://i.imgur.com/kV8bnsH.png)

(because a C++ method receives its object implicitly)

In C, there is no way for the method to easily obtain its parent, so you have to pass it.

What each super_operations function actually does:

![](https://i.imgur.com/7d09MhH.png)

![](https://i.imgur.com/AQBx1s3.png)

![](https://i.imgur.com/kj0boXh.png)

# The Inode Object

The inode holds all the information the kernel needs to manipulate a file or directory.

(If a filesystem has no inode structure of its own, it has to describe its files to the kernel some other way, defining a conversion from its own format to inodes.)

An inode is the description of a file, but it is constructed in memory only when the file is accessed. This includes special files, such as device files or pipes.

![](https://i.imgur.com/qrquYVT.png)

![](https://i.imgur.com/e1o7o8b.png)

![](https://i.imgur.com/iTKhiUc.png)

The following three members are stored in a union, because a given file (inode) can be only one of these types at a time:

* i_pipe:
  * points to a named pipe data structure
* i_bdev:
  * points to a block device structure
* i_cdev:
  * points to a character device structure

# Inode Operations

Just like superblock operations, inode operations are invoked like this:

![](https://i.imgur.com/1hQchYo.png)

![](https://i.imgur.com/LpUpCg3.png)

![](https://i.imgur.com/JfqKaks.png)

The operations in detail:

![](https://i.imgur.com/LVVxAfp.png)

![](https://i.imgur.com/cWc2Bi7.png)

![](https://i.imgur.com/BlUNIWm.png)

![](https://i.imgur.com/HMdDnNV.png)

![](https://i.imgur.com/wkOrJHq.png)

# The Dentry Object

Directories are also a kind of file.

Example: in the path /bin/vi, both bin and vi are files.

* bin is a special directory file
* vi is a regular file

Linux uses a dentry to describe each component of a path.

Example: /bin/vi breaks into 3 components:

* /
* bin
* vi

![](https://i.imgur.com/nZb7y9S.png)

![](https://i.imgur.com/cgwqykD.png)

![](https://i.imgur.com/9UHLEgz.png)

Dentry State

* used
  * corresponds to a valid inode
  * d_inode points to it
  * d_count indicates how many users are using it
* unused
  * also corresponds to a valid inode
  * d_count = 0
  * currently not in use
* negative
  * is not associated with a valid inode
  * d_inode = NULL
  * example:
    * Imagine a daemon that keeps trying to open and read/write a config file that does not exist. The open() system call keeps returning ENOENT. With a cached negative dentry, the kernel does not have to construct the path, walk the on-disk directory structure, and verify the file's nonexistence every time. That failed lookup is expensive, so caching the negative result is worthwhile.

# The File Object

The file object is used to represent a file opened by a process.

* Processes deal directly with file objects, not with inodes, dentries, or superblocks.
* It is the in-memory representation of an open file.
  * created when open() is called
  * destroyed when close() is called
* Several processes may access the same file at the same time.
  * So several file objects can exist for the same file at once.
  * A file object simply represents a process's view of an open file.

![](https://i.imgur.com/15J3GP1.png)

![](https://i.imgur.com/Bq9FSB0.png)

![](https://i.imgur.com/lTSU2PN.png)

![](https://i.imgur.com/ZePFmCf.png)

![](https://i.imgur.com/LrWrEN7.png)

Here are the individual operations:

![](https://i.imgur.com/15cX7b8.png)

![](https://i.imgur.com/wP5woc3.png)

![](https://i.imgur.com/VhfrKVl.png)

![](https://i.imgur.com/lAhYIyR.png)

![](https://i.imgur.com/6wFynM2.png)

![](https://i.imgur.com/Jg48uce.png)

![](https://i.imgur.com/ci2M700.png)

ioctl:

* unlocked_ioctl()
  * same as ioctl()
  * but it is the version without the Big Kernel Lock (BKL)
  * so the author has to implement proper locking
  * ioctl() under the BKL was inefficient, so code is now required to use unlocked_ioctl()
* compat_ioctl()
  * also does not use the BKL
  * provides a 32-bit compatible ioctl method for 64-bit systems
  * watch out for portability issues

# Data Structures Associated with Filesystems

Because Linux supports many different filesystems, the kernel needs a special structure to describe the capabilities and behavior of each one.

This structure describes how filesystems differ from one another:

![](https://i.imgur.com/wqONu09.png)

This structure describes a mounted instance of a filesystem:

![](https://i.imgur.com/R6QUyxb.png)

![](https://i.imgur.com/r2lYNce.png)

It tracks the relationship between the filesystem and all of its mount points.

![](https://i.imgur.com/GNgUngq.png)

# Data Structures Associated with a Process

Each process has its own:

* list of open files
* root filesystem
* current working directory
* mount points
* ...
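
As a user-space illustration of the file object and the per-process list of open files, here is a minimal sketch, assuming a POSIX system with a writable /tmp. The names `probe_offsets`, `check_offsets`, `struct offsets`, and the scratch path are made up for this example. It opens the same file twice and also duplicates one descriptor: two separate open() calls should yield two independent open files (each with its own offset), while dup() yields a second descriptor for the same open file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Offsets observed after reading 4 bytes through fd_a only. */
struct offsets {
    off_t a;      /* the descriptor that performed the read */
    off_t b;      /* a second, independent open() of the same file */
    off_t a_dup;  /* a dup() of fd_a, which shares fd_a's open file */
};

/* Hypothetical helper: runs the experiment; all fields are -1 on failure. */
struct offsets probe_offsets(void)
{
    char path[] = "/tmp/vfs_demo_XXXXXX";  /* scratch file, name is arbitrary */
    char buf[4];
    struct offsets r = { -1, -1, -1 };

    int fd_a = mkstemp(path);              /* creates and opens read/write */
    if (fd_a < 0)
        return r;

    int fd_b = open(path, O_RDONLY);       /* second open(): a new open file */
    int fd_dup = dup(fd_a);                /* new descriptor, same open file */

    if (fd_b >= 0 && fd_dup >= 0 &&
        write(fd_a, "0123456789", 10) == 10 &&
        lseek(fd_a, 0, SEEK_SET) == 0 &&
        read(fd_a, buf, sizeof buf) == (ssize_t)sizeof buf) {
        r.a = lseek(fd_a, 0, SEEK_CUR);
        r.b = lseek(fd_b, 0, SEEK_CUR);
        r.a_dup = lseek(fd_dup, 0, SEEK_CUR);
    }

    if (fd_b >= 0)
        close(fd_b);
    if (fd_dup >= 0)
        close(fd_dup);
    close(fd_a);
    unlink(path);
    return r;
}

/* Usage: states the expectation as assertions. */
void check_offsets(void)
{
    struct offsets r = probe_offsets();
    assert(r.a == 4);      /* fd_a advanced by the 4-byte read */
    assert(r.b == 0);      /* separate open(): offset untouched */
    assert(r.a_dup == 4);  /* dup(): offset is shared with fd_a */
}
```

Calling `check_offsets()` from a `main` is enough to try it. The point is only the observable behavior from user space; it says nothing about how the kernel lays out these objects internally.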

Three structures tie the VFS and a process together:

* files_struct
  * the files member of task_struct points here
  * ![](https://i.imgur.com/m6csvbK.png)
  * ![](https://i.imgur.com/jGF7Ish.png)
* fs_struct
  * ![](https://i.imgur.com/sVdc6h9.png)
* namespace
  * ![](https://i.imgur.com/NPPQz2M.png)
  * ![](https://i.imgur.com/RSWsPAn.png)

![](https://i.imgur.com/M66a0j4.png)
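
To make the pattern that runs through these notes concrete (an object carrying a table of function pointers, with the object passed explicitly to each method, and a per-process table of open files), here is a toy user-space model. It is a sketch under loud assumptions: every `toy_` and `memfs_` name is invented for illustration, the signatures are simplified, and none of it matches the kernel's real definitions of files_struct, file, or file_operations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;  /* forward declaration */

/* Operations table: every member is a function pointer, and each method
 * takes the object it acts on as an explicit first argument. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

/* Toy "open file": per-open state (the offset) plus its operations. */
struct toy_file {
    const struct toy_file_operations *f_op;
    const char *data;  /* stand-in for the file's contents */
    size_t size;
    size_t f_pos;      /* current offset of this open file */
};

#define TOY_NR_OPEN 8

/* Toy per-process table: descriptor number -> open file object. */
struct toy_files_struct {
    struct toy_file *fd[TOY_NR_OPEN];
};

/* One possible read method, for a pretend in-memory filesystem. */
static long memfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->f_pos;
    if (len > left)
        len = left;
    memcpy(buf, f->data + f->f_pos, len);
    f->f_pos += len;
    return (long)len;
}

static const struct toy_file_operations memfs_fops = { .read = memfs_read };

/* Store a file in the lowest free slot; returns the descriptor or -1. */
int toy_fd_install(struct toy_files_struct *files, struct toy_file *f)
{
    for (int i = 0; i < TOY_NR_OPEN; i++) {
        if (files->fd[i] == NULL) {
            files->fd[i] = f;
            return i;
        }
    }
    return -1;
}

/* Generic entry point: look up the file, then dispatch through f_op. */
long toy_read(struct toy_files_struct *files, int fd, char *buf, size_t len)
{
    if (fd < 0 || fd >= TOY_NR_OPEN || files->fd[fd] == NULL)
        return -1;
    struct toy_file *f = files->fd[fd];
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -1;
    return f->f_op->read(f, buf, len);  /* the object is passed explicitly */
}

/* Usage example. */
void toy_demo(void)
{
    struct toy_files_struct files = { { NULL } };
    struct toy_file f = { &memfs_fops, "hello", 5, 0 };
    char buf[8] = { 0 };

    int fd = toy_fd_install(&files, &f);
    assert(fd == 0);
    assert(toy_read(&files, fd, buf, 3) == 3);  /* reads "hel" */
    assert(memcmp(buf, "hel", 3) == 0);
    assert(toy_read(&files, fd, buf, 8) == 2);  /* only "lo" is left */
    assert(toy_read(&files, 5, buf, 1) == -1);  /* unused descriptor slot */
}
```

The generic `toy_read()` never needs to know which "filesystem" a file belongs to; swapping in a different operations table changes the behavior without touching the caller, which is the same reason the VFS lets upper-layer programs stay unchanged.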