freshLiver
---
tags: Linux, FileSystem, Kernel Module
---

# VFS - Virtual File System

![](https://i.imgur.com/zzbzY7P.png)
![](https://i.imgur.com/D0ykfoN.png)

## Overview

According to [the description of the Virtual File System in the Linux kernel documentation](https://www.kernel.org/doc/html/latest/filesystems/vfs.html#introduction):

> The Virtual File System (also known as the Virtual Filesystem Switch) is the software layer in the kernel that provides the **filesystem interface to userspace programs**. It also provides an abstraction within the kernel which **allows different filesystem implementations to coexist**.

In other words, VFS is the interface between system calls and filesystems, and it allows several different filesystems to coexist in one system.

Each filesystem is described by a `file_system_type` struct. The objects of all registered filesystems are chained together in a linked list, `file_systems`, which is declared as a static global variable in `fs/filesystems.c`:

```c=34
static struct file_system_type *file_systems;
```

The file that corresponds to this list is `/proc/filesystems`:

```shell
$ cat /proc/filesystems
nodev   sysfs
nodev   tmpfs
nodev   bdev
nodev   proc
nodev   cgroup
nodev   cgroup2
nodev   cpuset
nodev   devtmpfs
nodev   configfs
nodev   debugfs
nodev   tracefs
nodev   securityfs
nodev   sockfs
...
```

:::warning
For the meaning of `nodev`, see the description of the `fill_super` parameter in [Mounting](#Mounting).
:::

Filesystem operations go through structs such as `super_operations`, `inode_operations`, and `file_operations`, which serve as the VFS interface. When an operation has to be performed on a filesystem, VFS calls that filesystem's own implementation of the corresponding function.

![](https://i.imgur.com/AotaTxI.png)

:::danger
TODO: add superblock, cache, dentry, and the overall flow
:::

## Inode and Data Block

In a filesystem, the basic unit for storing file contents is the **data block**. Each data block corresponds to a region of the physical storage device and has its own number. Which data blocks a file occupies, together with information such as its permissions and size, is written to certain dedicated data blocks, and this information is described by the **inode** struct.

:::warning
When a single data block cannot hold the whole file, the contents may have to be split across several data blocks, depending on the filesystem's design. A file can therefore correspond to multiple data blocks.
:::

The [inode struct](https://github.com/torvalds/linux/blob/143a6252e1b8ab424b4b293512a97cca7295c182/include/linux/fs.h#L585) is defined in `include/linux/fs.h`. Besides recording information about the file, it contains two important members, `i_op` and `i_fop`:

- `i_op` points to an `inode_operations` struct, defined in `include/linux/fs.h`:
  ```c
  struct inode_operations {
      int (*create) (struct user_namespace *, struct inode *,struct dentry *, umode_t, bool);
      struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
      int (*link) (struct dentry *,struct inode *,struct dentry *);
      int (*unlink) (struct inode *,struct dentry *);
      int (*symlink) (struct inode *,struct dentry *,const char *);
      int (*mkdir) (struct inode *,struct dentry *,umode_t);
      int (*rmdir) (struct inode *,struct dentry *);
      int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
      int (*rename) (struct inode *, struct dentry *,
      ...
  };
  ```
  This struct is a VFS interface: system calls that operate on inodes, such as rename or link, reach the corresponding function of the actual filesystem through it.
- `i_fop` points to a `file_operations` struct, also defined in `include/linux/fs.h`:
  ```c
  struct file_operations {
      struct module *owner;
      loff_t (*llseek) (struct file *, loff_t, int);
      ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
      ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
      ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
      ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
      int (*mmap) (struct file *, struct vm_area_struct *);
      unsigned long mmap_supported_flags;
      int (*open) (struct inode *, struct file *);
      int (*flush) (struct file *, fl_owner_t id);
      ...
  };
  ```
  Like `inode_operations`, this struct is a VFS interface, but it mainly maps system calls that operate on data blocks, such as `read` and `write`, to the target filesystem's implementation.

:::info
inode ctime
:::

## Directory Entry

Just as the data blocks that hold the actual data have to be recorded by an inode, there has to be a mechanism that records which inode a file name refers to. Many strategies are possible. For example, a single table could record the mapping between file names and inodes, but the table would grow with the number of files, and memory usage would grow with it.

Instead of a single table, the Linux kernel manages the mapping between file names and inodes with a tree. Each mapping is managed by a **Directory Entry**, which corresponds to the [`dentry` struct defined in `include/linux/dcache.h`](https://github.com/torvalds/linux/blob/4b0986a3613c92f4ec1bdc7f60ec66fea135991f/include/linux/dcache.h#L81):

```c
struct dentry {
    ...
    struct hlist_bl_node d_hash;
    struct dentry *d_parent;
    struct qstr d_name;
    struct inode *d_inode;
    unsigned char d_iname[DNAME_INLINE_LEN];
    ...
    const struct dentry_operations *d_op;
    struct super_block *d_sb;
    ...
    struct list_head d_child;
    struct list_head d_subdirs;
    ...
};
```

As the name Directory Entry suggests, this struct records the mapping for the name `d_name` inside the directory `d_parent`. It mainly contains the following information:

- `d_parent`, the `dentry` object of the containing directory
- `d_name`, the file name, and `d_inode`, the `inode` object it maps to
- `d_child`, this dentry's node in its parent's list of children, and `d_subdirs`, the list of this dentry's own children (if it is a directory)
- `d_hash`, the node used with a hash table to find the dentry quickly
- `d_op`, the struct of `dentry`-related operations

When a filesystem feature has to resolve a file path, it can use this information to find the corresponding `inode` object. The rough flow is as follows (assuming the file exists):

1. Split the path on `/` into path components and choose the starting `dentry` accordingly
    - Absolute path: start from the `dentry` of the `/` directory
    - Relative path: start from the `dentry` of the current working directory
2. Has the `inode` of the last path component been found?
    - Yes: go to 5.
    - No: go to 3.
3. Search the current dentry's `d_subdirs` list (whose entries are linked through their `d_child` nodes) for the `dentry` of the next path component
4. Go back to 2.
5. Return the `inode` object that corresponds to the path

:::warning
Checking files and subdirectories one by one is inefficient, so in practice the lookup goes through the [Dentry Cache](#Inode-Cache-and-Dentry-Cache) described later. The detailed flow is explained in [File System Operations](/6apkyfhlQBGvi4f9aDLH8g#Pathname-Lookup).
:::

As with inodes, `dentry` has its own struct that serves as the interface for dentry-related operations. It is defined in `include/linux/dcache.h`:

```c
struct dentry_operations {
    int (*d_revalidate)(struct dentry *, unsigned int);
    int (*d_weak_revalidate)(struct dentry *, unsigned int);
    int (*d_hash)(const struct dentry *, struct qstr *);
    int (*d_compare)(const struct dentry *,
            unsigned int, const char *, const struct qstr *);
    int (*d_delete)(const struct dentry *);
    int (*d_init)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_prune)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
    char *(*d_dname)(struct dentry *, char *, int);
    struct vfsmount *(*d_automount)(struct path *);
    int (*d_manage)(const struct path *, bool);
    struct dentry *(*d_real)(struct dentry *, const struct inode *);
};
```

## Superblock

Data blocks, inodes, and directory entries are roughly enough to describe a file, but a working filesystem needs more information, such as the size of a data block and the number of inodes and data blocks still available. This filesystem-level information is recorded in the **Superblock**, whose data structure is the `super_block` struct defined in `include/linux/fs.h`:

```c
struct super_block {
    ...
    unsigned char s_blocksize_bits;
    unsigned long s_blocksize;
    loff_t s_maxbytes; /* Max file size */
    struct file_system_type *s_type;
    const struct super_operations *s_op;
    unsigned long s_magic;
    struct dentry *s_root;
    void *s_fs_info; /* Filesystem private info */
    char s_id[32]; /* Informational name */
    uuid_t s_uuid; /* UUID */
    fmode_t s_mode;
    const struct dentry_operations *s_d_op; /* default d_op for dentries */
    struct list_head s_inodes; /* all inodes */
    struct list_head s_inodes_wb; /* writeback inodes */
    ...
};
```

:::danger
TODO: explain the important members
:::

As the highest-level object in a filesystem, the superblock not only records filesystem-level information but also handles filesystem-level operations such as mounting the filesystem and allocating `inode` objects. As with `inode` and `dentry`, these operations go through a struct (`s_op`) that serves as the VFS interface to each filesystem:

```c
struct super_operations {
    struct inode *(*alloc_inode)(struct super_block *sb);
    void (*destroy_inode)(struct inode *);
    void (*dirty_inode) (struct inode *, int flags);
    int (*write_inode) (struct inode *, int);
    void (*drop_inode) (struct inode *);
    void (*delete_inode) (struct inode *);
    void (*put_super) (struct super_block *);
    int (*sync_fs)(struct super_block *sb, int wait);
    int (*freeze_fs) (struct super_block *);
    ...
};
```

## Inode Cache and Dentry Cache

Inodes and dentries are data that must not disappear when power is lost, so they have to be stored on the storage device. Reading them from the device on every file access would clearly hurt performance, yet keeping all of them in memory whenever a filesystem is mounted would waste memory. As a compromise, VFS provides a cache so that recently accessed inodes and dentries stay in memory for fast access.

This mechanism is managed through two hash tables, `inode_hashtable` and `dentry_hashtable`, defined in `fs/inode.c` and `fs/dcache.c` respectively. Taking the dcache as an example, `dentry->d_hash` is the hash table node, and functions such as `d_add` and `d_drop` add it to or remove it from the hash table. The function `__d_lookup`, defined in `fs/dcache.c`, looks up the dentry named `name` under `parent`. Its implementation is as follows:

```c=
struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
{
    unsigned int hash = name->hash;
    struct hlist_bl_head *b = d_hash(hash);
    struct hlist_bl_node *node;
    struct dentry *found = NULL;
    struct dentry *dentry;

    rcu_read_lock();

    hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {

        if (dentry->d_name.hash != hash)
            continue;

        spin_lock(&dentry->d_lock);
        if (dentry->d_parent != parent)
            goto next;
        if (d_unhashed(dentry))
            goto next;

        if (!d_same_name(dentry, parent, name))
            goto next;

        dentry->d_lockref.count++;
        found = dentry;
        spin_unlock(&dentry->d_lock);
        break;
next:
        spin_unlock(&dentry->d_lock);
    }
    rcu_read_unlock();

    return found;
}
```

Line 4 uses `d_hash(hash)` to find the corresponding bucket in the hash table, and the `hlist_bl_for_each_entry_rcu` macro on line 11 then checks the nodes in that bucket one by one.

:::info
TODO: Is the inode cache similar? Is there a size limit?
:::

---

## Filesystem

The `file_system_type` struct is defined in [`include/linux/fs.h`](https://github.com/torvalds/linux/blob/eaea45fc0e7b6ae439526b4a41d91230c8517336/include/linux/fs.h#L2385). It stores information about a filesystem and the objects needed to operate on it:

```c
struct file_system_type {
    const char *name;
    int fs_flags;
#define FS_REQUIRES_DEV         1
#define FS_BINARY_MOUNTDATA     2
#define FS_HAS_SUBTYPE          4
#define FS_USERNS_MOUNT         8       /* Can be mounted by userns root */
#define FS_DISALLOW_NOTIFY_PERM 16      /* Disable fanotify permission events */
#define FS_ALLOW_IDMAP          32      /* FS has been updated to handle vfs idmappings. */
#define FS_RENAME_DOES_D_MOVE   32768   /* FS will handle d_move() during rename() internally. */
    int (*init_fs_context)(struct fs_context *);
    const struct fs_parameter_spec *parameters;
    struct dentry *(*mount) (struct file_system_type *, int,
               const char *, void *);
    void (*kill_sb) (struct super_block *);
    struct module *owner;
    struct file_system_type * next;
    struct hlist_head fs_supers;

    struct lock_class_key s_lock_key;
    struct lock_class_key s_umount_key;
    struct lock_class_key s_vfs_rename_key;
    struct lock_class_key s_writers_key[SB_FREEZE_LEVELS];

    struct lock_class_key i_lock_key;
    struct lock_class_key i_mutex_key;
    struct lock_class_key invalidate_lock_key;
    struct lock_class_key i_mutex_dir_key;
};
```

The members can roughly be divided into filesystem information and lock-related members. The filesystem information includes:

- `const char *name`
    The name of the filesystem, for example [`ext4`](https://github.com/torvalds/linux/blob/master/fs/ext4/super.c#L7118) or [`ntfs`](https://github.com/torvalds/linux/blob/master/fs/ntfs/super.c#L3053).
- `struct module *owner`
    A pointer to the kernel module that implements the filesystem. In most cases this is the module's `THIS_MODULE`.
- `struct dentry *(*mount)`
    A pointer to the function used to mount the filesystem. The target function takes four parameters:
    - `struct file_system_type *fs_type`: the type of the filesystem being mounted
    - `int flags`: the mount flags
    - `const char * dev_name`: the name of the device being mounted
    - `void *data`: the mount option string (for the list of options, see [Mount Options](https://www.kernel.org/doc/html/latest/filesystems/vfs.html#mount-options))

    The return value is the root dentry of the mounted filesystem.
- `void (*kill_sb) (struct super_block *)`
    :::danger
    TODO: the function called when the filesystem is unmounted
    :::
- `struct hlist_head fs_supers`
    :::danger
    TODO: the list made up of superblocks
    :::
- `const struct fs_parameter_spec *parameters`
- `struct file_system_type * next`
    A pointer to the next filesystem in the `file_systems` list.

## Registering and Mounting a Filesystem

### Initializing Module

To make the system aware of a filesystem, use the functions declared in [`include/linux/fs.h`](https://github.com/torvalds/linux/blob/eaea45fc0e7b6ae439526b4a41d91230c8517336/include/linux/fs.h#L2462):

```c
#include <linux/fs.h>

extern int register_filesystem(struct file_system_type *);
extern int unregister_filesystem(struct file_system_type *);
```

These two functions are not called at mount time. They are called when the kernel module is loaded and unloaded, through the `module_init` and `module_exit` macros. In ext4, for example, this is handled by `ext4_init_fs` and `ext4_exit_fs`, defined in [`fs/ext4/super.c`](https://github.com/torvalds/linux/blob/4b0986a3613c92f4ec1bdc7f60ec66fea135991f/fs/ext4/super.c#L7229):

```c=7229
module_init(ext4_init_fs)
module_exit(ext4_exit_fs)
```

These two functions in turn call `register_filesystem` and `unregister_filesystem`:

```c
static int __init ext4_init_fs(void)
{
    ...
    err = register_filesystem(&ext4_fs_type);
    if (err)
        goto out;

    return 0;
    ...
}

static void __exit ext4_exit_fs(void)
{
    ...
    unregister_filesystem(&ext4_fs_type);
    ...
}
```

### Registering

`register_filesystem`, defined in [`fs/filesystems.c`](https://github.com/torvalds/linux/blob/master/fs/filesystems.c#L72), adds the object of a filesystem that has not been registered yet to the `file_systems` list. Its implementation is as follows:

```c=
int register_filesystem(struct file_system_type * fs)
{
    int res = 0;
    struct file_system_type ** p;

    if (fs->parameters &&
        !fs_validate_description(fs->name, fs->parameters))
        return -EINVAL;

    BUG_ON(strchr(fs->name, '.'));
    if (fs->next)
        return -EBUSY;
    write_lock(&file_systems_lock);
    p = find_filesystem(fs->name, strlen(fs->name));
    if (*p)
        res = -EBUSY;
    else
        *p = fs;
    write_unlock(&file_systems_lock);
    return res;
}
```

:::warning
Note that a filesystem name must not contain the `.` character. Otherwise the `BUG_ON` on line 10, which checks the name, is triggered.
:::

:::info
Why is this restriction needed? `get_fs_type()` in the same file treats `.` as the separator between a filesystem type and its subtype and only looks up the part of the name before the first `.`, so a registered name that contains `.` could never be matched.
:::

The function first calls `fs_validate_description`, defined in `fs/fs_parser.c`, to check for duplicate parameters:

```c
bool fs_validate_description(const char *name,
                 const struct fs_parameter_spec *desc)
{
    const struct fs_parameter_spec *param, *p2;
    bool good = true;

    for (param = desc; param->name; param++) {
        /* Check for duplicate parameter names */
        for (p2 = desc; p2 < param; p2++) {
            if (strcmp(param->name, p2->name) == 0) {
                if (is_flag(param) != is_flag(p2))
                    continue;
                pr_err("VALIDATE %s: PARAM[%s]: Duplicate\n",
                       name, param->name);
                good = false;
            }
        }
    }
    return good;
}
```

Only then does it call [`find_filesystem()`](https://github.com/torvalds/linux/blob/eaea45fc0e7b6ae439526b4a41d91230c8517336/fs/filesystems.c#L49), which walks the `file_systems` list to check whether a filesystem with the same name has already been registered:

```c
static struct file_system_type **find_filesystem(const char *name, unsigned len)
{
    struct file_system_type **p;
    for (p = &file_systems; *p; p = &(*p)->next)
        if (strncmp((*p)->name, name, len) == 0 &&
            !(*p)->name[len])
            break;
    return p;
}
```

Because `file_systems` is a singly-linked list, `*p` is `NULL` if no filesystem named `name` has been registered yet. If a filesystem with the same name already exists, `*p` points to that filesystem's object, that is, it is not `NULL`, which `register_filesystem` treats as an error.

:::warning
Simply walking `file_systems` to check for duplicates does not require a pointer to a pointer. `find_filesystem` uses one deliberately: if no filesystem with the same name is found, `p` ends up holding the address of the `next` member of the last node (or of `file_systems` itself when the list is empty) and is returned. Line 18 of `register_filesystem` then appends the target filesystem's object to the tail of the `file_systems` list through it.
:::

Note that the comparison uses `strncmp` with an explicit length instead of `strcmp`, because a caller may pass only a prefix of `name`. Since matching the first `len` characters does not guarantee that the two strings are identical, the function additionally checks that `(*p)->name` also ends at that position.

:::danger
When `register_filesystem` calls `find_filesystem`, it obtains the second argument with `strlen`. Calling `strlen(name)` inside `find_filesystem` might look simpler than passing the length as a parameter, but other callers such as `get_fs_type` pass a length shorter than `strlen(name)` (only the part before a `.`), so the length has to remain a parameter.
:::

### Unregistering

In contrast to `register_filesystem`, which adds a filesystem to the `file_systems` list, `unregister_filesystem` removes the corresponding filesystem object from the `file_systems` list when the filesystem's kernel module is unloaded. Its implementation is as follows:

```c
int unregister_filesystem(struct file_system_type * fs)
{
    struct file_system_type ** tmp;

    write_lock(&file_systems_lock);
    tmp = &file_systems;
    while (*tmp) {
        if (fs == *tmp) {
            *tmp = fs->next;
            fs->next = NULL;
            write_unlock(&file_systems_lock);
            synchronize_rcu();
            return 0;
        }
        tmp = &(*tmp)->next;
    }
    write_unlock(&file_systems_lock);

    return -EINVAL;
}
```

Because `file_systems` is a singly-linked list, the loop has to start from the head and compare the entries one by one.

### Mounting

```c
struct file_system_type {
    const char *name;
    int fs_flags;
    struct dentry *(*mount) (struct file_system_type *, int,
               const char *, void *);
    void (*kill_sb) (struct super_block *);
    struct module *owner;
    struct file_system_type * next;
    struct hlist_head fs_supers;
    struct lock_class_key s_lock_key;
    struct lock_class_key s_umount_key;
    ...
};
```

When a filesystem is mounted, the function that `mount` points to is called. Mounting a filesystem mainly involves the following steps:

1. Find the superblock
    :::danger
    The `mount` function is usually one of three generic mount helpers, [`mount_bdev`](https://github.com/torvalds/linux/blob/143a6252e1b8ab424b4b293512a97cca7295c182/fs/super.c#L1313), [`mount_nodev`](https://github.com/torvalds/linux/blob/143a6252e1b8ab424b4b293512a97cca7295c182/fs/super.c#L1403), or [`mount_single`](https://github.com/torvalds/linux/blob/143a6252e1b8ab424b4b293512a97cca7295c182/fs/super.c#L1453), which are described in step 2.

    Inside the mount helper, [`sget`](https://github.com/torvalds/linux/blob/143a6252e1b8ab424b4b293512a97cca7295c182/fs/super.c#L576) is used to find an existing superblock or create a new one.
    :::
2. Initialize the superblock through `fill_super`
    The mount helpers take a function pointer parameter, `fill_super`, so that each filesystem can supply its own initialization function. Once the superblock has been found, it is initialized through this function.
    * `mount_bdev()` mounts a filesystem that resides on a physical device, such as a hard disk.
    * `mount_single()` lets multiple mount operations share the same filesystem instance.
    * `mount_nodev()` mounts a filesystem that does not reside on a physical device.

    The `fill_super()` of each filesystem differs slightly, but its main job is always to initialize the superblock. Take the [ramfs filesystem](https://www.kernel.org/doc/html/latest/filesystems/ramfs-rootfs-initramfs.html) as an example:
    ```c
    static int ramfs_fill_super(struct super_block *sb, void *data, int silent)
    {
        struct ramfs_fs_info *fsi;
        struct inode *inode;
        int err;

        save_mount_options(sb, data);

        fsi = kzalloc(sizeof(struct ramfs_fs_info), GFP_KERNEL);
        sb->s_fs_info = fsi;
        if (!fsi)
            return -ENOMEM;

        err = ramfs_parse_options(data, &fsi->mount_opts);
        if (err)
            return err;

        sb->s_maxbytes = MAX_LFS_FILESIZE;
        sb->s_blocksize = PAGE_SIZE;
        sb->s_blocksize_bits = PAGE_SHIFT;
        sb->s_magic = RAMFS_MAGIC;
        sb->s_op = &ramfs_ops;
        sb->s_time_gran = 1;

        inode = ramfs_get_inode(sb, NULL, S_IFDIR | fsi->mount_opts.mode, 0);
        sb->s_root = d_make_root(inode);
        if (!sb->s_root)
            return -ENOMEM;

        return 0;
    }
    ```
3. Return the root dentry
    The return value of `mount` is the root dentry of the mounted filesystem, as in `mount_nodev()` in [`fs/super.c`](https://github.com/torvalds/linux/blob/master/fs/super.c#L1416):
    ```c
    struct dentry *mount_nodev(struct file_system_type *fs_type,
        int flags, void *data,
        int (*fill_super)(struct super_block *, void *, int))
    {
        int error;
        struct super_block *s = sget(fs_type, NULL, set_anon_super, flags, NULL);

        if (IS_ERR(s))
            return ERR_CAST(s);

        error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
        if (error) {
            deactivate_locked_super(s);
            return ERR_PTR(error);
        }
        s->s_flags |= SB_ACTIVE;
        return dget(s->s_root);
    }
    EXPORT_SYMBOL(mount_nodev);
    ```

### Unmount

Unmounting a filesystem means tearing down the filesystem's superblock. The main functions involved are `kill_block_super`, `kill_anon_super`, and `kill_litter_super`.

* `kill_block_super` unmounts a filesystem mounted on a block device
* `kill_anon_super` unmounts a virtual filesystem
* `kill_litter_super` unmounts a filesystem that does not reside on a physical device (for example, one kept in memory)

In all three, the function that actually tears down the superblock is `generic_shutdown_super` in [`fs/super.c`](https://github.com/torvalds/linux/blob/master/fs/super.c#L467). Its job is to release the memory held by the members of the `super_block` struct.

```
/**
 * generic_shutdown_super - common helper for ->kill_sb()
 * @sb: superblock to kill
 *
 * generic_shutdown_super() does all fs-independent work on superblock
 * shutdown. Typical ->kill_sb() should pick all fs-specific objects
 * that need destruction out of superblock, call generic_shutdown_super()
 * and release aforementioned objects. Note: dentries and inodes _are_
 * taken care of and do not need specific handling.
 */
```

## Buffer Cache

The buffer cache provides buffer space (buffer blocks) for reading from and writing to a storage device (block device). It can be seen as the bridge between the page cache and the storage device: it transfers data by mapping addresses between the two, and its unit of storage and I/O is the block. In Linux, the buffer cache is built mainly around the `buffer_head` struct in [`include/linux/buffer_head.h`](https://github.com/torvalds/linux/blob/master/include/linux/buffer_head.h):

```c
struct buffer_head {
    unsigned long b_state;              /* buffer state bitmap (see above) */
    struct buffer_head *b_this_page;    /* circular list of page's buffers */
    union {
        struct page *b_page;            /* the page this bh is mapped to */
        struct folio *b_folio;          /* the folio this bh is mapped to */
    };

    sector_t b_blocknr;                 /* start block number */
    size_t b_size;                      /* size of mapping */
    char *b_data;                       /* pointer to data within the page */

    struct block_device *b_bdev;
    bh_end_io_t *b_end_io;              /* I/O completion */
    void *b_private;                    /* reserved for b_end_io */
    struct list_head b_assoc_buffers;   /* associated with another mapping */
    struct address_space *b_assoc_map;  /* mapping this buffer is
                                           associated with */
    atomic_t b_count;                   /* users using this buffer_head */
    spinlock_t b_uptodate_lock;         /* Used by the first bh in a page, to
                                         * serialise IO completion of other
                                         * buffers in the page */
};
```

Some of the important members are explained below:

* `char *b_data` is the address of the data inside the page. The start of this address usually serves as the starting point of the mapping.
* `sector_t b_blocknr` is the number of the block being mapped
* `struct block_device *b_bdev` is the device to read from or write to
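The relationship between `b_blocknr`, `b_size`, `b_data`, and `b_this_page` can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct toy_buffer_head`, `toy_map_page`, `toy_device_offset`, and the 4096-byte `TOY_PAGE_SIZE` are hypothetical names and values chosen for illustration, and the model assumes a page is split into equally sized buffers that map consecutive blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   4096u /* assumed page size for this sketch */
#define TOY_MAX_BUFFERS 8u    /* assumed upper bound on buffers per page */

/* Simplified, userspace-only stand-in for struct buffer_head (hypothetical). */
struct toy_buffer_head {
    struct toy_buffer_head *b_this_page; /* circular list of the page's buffers */
    uint64_t b_blocknr;                  /* block number on the device */
    size_t b_size;                       /* size of the mapping in bytes */
    char *b_data;                        /* where the block lives inside the page */
};

/*
 * Split one page into TOY_PAGE_SIZE / block_size buffers that map
 * consecutive blocks starting at first_block.
 * Returns the number of buffers that were initialised.
 */
static size_t toy_map_page(struct toy_buffer_head *bhs, char *page,
                           size_t block_size, uint64_t first_block)
{
    size_t n, i;

    assert(block_size > 0 && TOY_PAGE_SIZE % block_size == 0);
    n = TOY_PAGE_SIZE / block_size;
    assert(n <= TOY_MAX_BUFFERS);

    for (i = 0; i < n; i++) {
        bhs[i].b_blocknr = first_block + i;
        bhs[i].b_size = block_size;
        bhs[i].b_data = page + i * block_size;
        /* The last buffer links back to the first one. */
        bhs[i].b_this_page = &bhs[(i + 1) % n];
    }
    return n;
}

/* Byte offset on the device at which this buffer's block begins. */
static uint64_t toy_device_offset(const struct toy_buffer_head *bh)
{
    return bh->b_blocknr * (uint64_t)bh->b_size;
}
```

With a 1024-byte block size, calling `toy_map_page(bhs, page, 1024, 100)` on a 4096-byte page yields four buffers for blocks 100 to 103. The third buffer's `b_data` points 2048 bytes into the page, and `toy_device_offset` places its block at byte 102 × 1024 on the device.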
