Sean20405
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # OS Storage Management ## Ch. 10 File-System Interface - File concept - file 是個抽象的概念,實際上是一塊連續的記憶體空間 - UNIX file type - regular file - device file (device node) `/dev/*` - directory - link - File attr. - name - identifier:每個檔案的 ID(e.g. inode number in UNIX) - type - regular, directory, link, ...(給系統看的) - .c, .exe, .bat(給 user 看的,UNIX 不管) - location - size - protection (permission) - timestamp, date, user identification(檔案擁有者和其 group) ### Regular File - open / close file - opened file 會需要存一些資訊 - file ptr: ptr to last read / write location - file open cnt:當前有幾個 process 在讀他 - 用途之一:管理何時能把檔案從 open file table 移掉 - e.g. 「安全移除 USB 裝置」前會檢查是否有檔案 cnt > 0 - disk location of the file - access right - file system 在讀檔時會 cache 這些 metadata,而關檔時會把有修改的資料 flush 回 disk - binary vs. text - `fopen("filename", "r+t")` or `fopen("filename", "r+b")` - binary mode - byte stream - OS 不會對檔案做任何解釋,直接 byte by byte 原封不動讀進來 - text mode - 控制字元 - OS 會針對一些特殊字元作解釋 - e.g. EOF: Ctrl + Z (`1A`) - e.g. newline: `\r\n` (`0D 0A`, Windows) or `\n` (`0A`, UNIX) - string 切割 - 用 `\n` 切割 - 不同 OS 有不同行為 ### Device Node - 通常在 /dev 底下 - 用 `mknode` 建新的 node,需要提供 device major-minor number - major number 是給設備的 - minor 是給裝置內部的邏輯單元 e.g. partition - device driver 在驅動的時候會向 kernel 去註冊一組他們的 major-minor number - `open()`、`close()` 一個 device node 就會用 driver 先前 register 的 major-minor number 去找 - 但一般 user 不會這樣做,會破壞檔案系統的資料結構 - 一些系統工具如 `fdisk`、`mkfs`(格式化)才會用到 ### Link (File Aliasing) - soft link - just string substitution - 跟 file system 本身無關 - 把輸入代換成 link 到的字串,再開一次 - Linux: `ln -s`; Windows: `junction.exe` - hard link - file system internal function,需要 FS 有支援 - A link file that refers to the target file using file system internal location information (inode) - inode 上會存一個 cnt 紀錄當前的 link 數 - Use `link()` to create a hard link file - Use `unlink()` to delete (link) file - 直到 cnt 降到 0 才能把 inode 真的刪掉 - Linux: `ln`; Windows: `fsutil hardlink create` - hardlink cnt:資料夾的 hard link cnt 為 $2+n$ - 一個從 parent 來,一個是自己指向自己 `.`,$n$ 個是 sub-directory 的 `..` - problem - Back up - hard link 在備份的時候檔案會被複製很多份 - 可以用 `cp -a` 或 `rsync` 可以盡可能解決這些問題(他們不能判斷太深的檔案) - Loop - link 構成一個環 - Cause by hard link - Sol 1: 直接禁止 hard link 到資料夾(in recent UNIX) - Sol 2: 每次創建的時候都去檢查會不會產生 cycle(較少用) - Caused by soft link - Sol 1 (Linux):keep a time-to-live conter,最多只能走 40 層 - Sol 2 (Windows):限制最大路徑長度 ~ 260 chars - Dangling pointer - Link 的目標已不存在 - soft link:不處理,反正字串代換完檔案系統去開檔案的時候就會發現不存在了 - hard link:Ref count,無法刪除只能 unlink ### Create a hard drive 1. partition - `fdisk` 或其他 GUI utils. - partition 可以被格式化為不同檔案系統,也可以保留給 swap - 硬碟前有個 partition table 紀錄分割資訊 - BIOS 再開機的時候會首先載入 MBR,是一個 boot code + partition table,僅 512 bytes,跑完開機程式後就會去讀 boot partition ![image](https://hackmd.io/_uploads/H19fHUomJx.png) - 近期常被 GPT(GUID Partition Table)取代,其可允許更大的 partiotion size,也不會限制分割數量 - ![image](https://hackmd.io/_uploads/rkMhS8jXkg.png) - GPT 還是會有個 protective MBR 先佔用後續的空間 - 接著才由 Primary 紀錄分割資訊 2. 格式化(Format) - logical formatting - 又叫 high-level formatting - 有點像是創建一個 file system - `mkfs.ext4 /dev/hda1` - 把檔案系統需要的 metadata 寫進去 - low-level formatting (or physical formatting) 3. 掛載(Mount) - `mount –t ext4 /users /dev/hda1` - 之後檔案系統的目錄樹就會出現在 `/users` 底下 ## Ch. 11 File-System Implementation ### File System Structure ![螢幕擷取畫面 2024-12-03 140226](https://hackmd.io/_uploads/SJC_kXhQyg.png) ![image](https://hackmd.io/_uploads/H1tHl7nX1x.png) - 一個系統可能有很多檔案系統,因此會統一一個抽象介面(VFS surface, virtual file system surface) - VFS 利用一組定義好的資料結構(in-memory object)與 local file system 溝通 - local file system 會負責查看 disk 的 metadata,把 in-memory object(包含一些檔案的內容及資訊)填好後回傳給 VFS - Linux in-memory object 包含 - Superblock:整個 file system 的資訊(如所剩容量、File system 類型) - Inode:每個檔案都有一個獨一無二的 inode - File object:代表被打開的檔案,每個 fopen instance 都有一個(所以如果有多個 process 開啟同一份檔案,就會產生多個 file object) - Dentry object (D for Directory):描述目錄樹 - ![image](https://hackmd.io/_uploads/HkNGfX2m1x.png) - Disk metadata - 每個檔案系統不一樣 - e.g. ext: super block, inodes, allocation bitmaps - e.g. FAT: file allocaiton tables, dir. - ![image](https://hackmd.io/_uploads/rJgC47nXJg.png) - bitmap 表示後面哪些地方是空的,哪些地方已經有資料 - ![image](https://hackmd.io/_uploads/r1y7S7hQJx.png) - 用 [Linked Allocation](#Linked-Allocation) - 把所有 ptr 獨立出來放在 FAT 表,每個檔案是一個內部元素為 block 的 linked list,會有個進入點。而 FAT 表的每一格就代表了該 ptr 指到的 block 和下一個 ptr ### File System Key Design Issues - Directory implementation:如何列出資料夾的檔案 + 如何找到檔案 - Allocation (index) methods:如何配置空間給檔案 + 如何找到檔案的資料 - Free-space management:如何找到空間配置給檔案 ### Directory Implementation - 重點:list files in dir. and find a file in a dir. - Method 1: Linear list - simple but time-consuming - e.g. FAT file system(目標是簡單,如 SD 卡就使用此 File system) - Method 2: B-trees - efficient search $\log n$ - e.g. XFS, NTFS, ext4 (H-tree, a variant of B-tree) ### Allocation (index) methods - 如何配置空間給檔案 + 如何找到檔案的資料 #### Contiguous Allocation - ![image](https://hackmd.io/_uploads/rkEJdH6mye.png) - 每個檔案會佔據硬碟中連續的 block(block size 多為 4 KB) - 簡單:只需要紀錄開頭位置和長度 - Efficient access - 因資料都是連續的,能減少 IO overhead(硬碟的 cursor 不用一直移動) - 但檔案無法擴大到超過被配置的區域大小,除非要將其移到更大的區域(需花費額外移動成本) - 浪費空間(跟動態記憶體配置遇到的問題一樣),需要 compaction,將資料整理移動一下 - mapping - 將 Logical Address (LA, or called file offset) 映射到 Physical Disk Block - 區塊編號 $Q=\text{LA}/512$:看第幾個 block - 區塊內的位一輛 $Q=\text{LA}%512$:看他在 block 的那一個 byte - 所以要讀的 block 即為 Q + starting block #### Linked Allocation ![image](https://hackmd.io/_uploads/ByDMOS6Xyg.png) - Each file is a linked list of disk blocks - 優點 - 更簡單:只要 start pointer - No external fragmentation - 缺點 - No random access (due to linked list) - e.g. FAT - 把所有 ptr 獨立出來,不要嵌入進 block 中 - 每個 block 的大小回到 $2^n$,好管理 - traverse 時不需要真的讀每個 block 才能拿到下一個 ptr #### Indexed Allocation - ![image](https://hackmd.io/_uploads/rJ8CEBp71e.png) - Directory 擁有 file index block 的 address - 每個檔案皆會多配置一些額外的 blocks 作為 index blocks,內部存儲存檔案資料的 block 的 ptr - mapping - $Q=LA/512$:displacement(位移) into index table - 其實就是 index table 中的第 Q 個 block ptr - $R=LA%512$:displacement into the referred block - multi-level index - 檔案尺寸會受到 index block 的大小(假設 index block 只能放 64 個 ptr,則檔案大小只能是 64 * block size) - Sol: multi-level index - mapping - ![image](https://hackmd.io/_uploads/HJh0c2CVJl.png) - $Q_1=\text{LA}/(512\times 512)$:第一層 index block - $R_1=\text{LA}%(512\times 512)$ - $Q_2=R_1/512$:在第二層的 index block 的偏移量 - $R_2=R_1/512$:在 disk block 的偏移量 - 優點 - direct & random access 非常有效率 - 沒有 external fragmentation - 很容易去建立檔案(只要有 free list,就知道哪裡有空間放檔案) - 建檔時無需事先宣告大小 - 缺點 - index blocks 佔用額外空間 - index block 太小不夠存放一個檔案的所有 block 編號,太大又形成浪費 - Sol 1:每個 file 用多個 index block 存,blocks 之間用 linked list 串接 → 無法 random access - Sol 2:Multilevel index → 大檔案較適合,小檔案可能導致 index 大小甚至大於檔案本身 - e.g. UNIX inode - ![image](https://hackmd.io/_uploads/SyRuDra7Jg.png) - 特別針對小檔案優化 - 容許的檔案大小很大,並且若檔案很小的話可以很快就指到 data block,不用看很多層 - e.g. ext2 & 3 #### Extent-Based Allocation - 融合 contiguous allocation 和 linked / indexed allocation - 每個檔案是一塊一塊連續的記憶體,並利用 linked / indexed allocation 的想法將每塊記憶體串起來 - Allocate disk blocks in **extents** - Extent: A set of contiguous disk blocks - 一個 extent 若用完了,系統會給檔案另一個 extent,利用 linked list 串起來 - extent block 會有一個 ptr 存下個 extent 的位置 - e.g. ext4 in Linux - ![image](https://hackmd.io/_uploads/S1Qa9rT7yl.png) - dir -> inode -> index node -> data blocks ### Free-space management - Bit vector - `bit[i]` = 0 if `block[i]` is free else 1 - 優點 - 可以很快找到連續的記憶體空間 - 缺點 - 需要額外空間 - e.g. 1 TB 硬碟 w/ 4 KB blocks block size = $2^{12}$ bytes disk size = $2^{40}$ bytes number of blocks = $2^{40} / 2^{12} = 2^{28}$ number of bits in bitmap = $2^{28}$ bits = $32$ MB - Used in UNIX FFS, Linux Ext* - Linked Free Space list - 不用額外空間 - 但沒辦法快速獲得連續的空記憶體 - 現今檔案系統少用 ### Case study - ext4 - ![image](https://hackmd.io/_uploads/r1bwpr6XJe.png) - 為了拉近 metadata 和資料本身的距離,才將 partition 再拆成多個 cylinder group(HDD inode 和資料本身的距離會影響讀寫速度,越遠 cursor 要移越多,越慢) - Directory: H-tree - Allocation: indexed (ext123), extent (ext4) - Free memory: bitmap - FAT (File Allocation Table) - ![image](https://hackmd.io/_uploads/SylkRra7kg.png) - Directory: linear table - Allocation: linked - Free memory: Scan 0 in FAT ### Efficiency and Performance - OS 本身已經有一些可以改善 file system 的機制,並且可適用到所有檔案系統 - page cache (utilize temporal locality) - File prefetching (utilize spatial locality) - 跟 pre-paging 一樣 - 並設有獎勵機制,若 prefetch 的資料被用到了,則下一次會 prefetch 兩倍的資料 - 可以用 `fadvise()` 或 `madvise` 告訴 kernel 要如何 prefetching // TODO: 有空看一下 `fadvise()` - Defragmentation - 讓 file data blocks 變得連續 (by copying) #### Page caching - Regular file pages (non-mapped) - ![image](https://hackmd.io/_uploads/Sk-LxIpmJg.png) - 這些 pages 同時就是 file pages - Regular file pages (but using `mmap()`) - ![image](https://hackmd.io/_uploads/B1gKoeL6mJx.png) - Anonymous Pages - ![image](https://hackmd.io/_uploads/rklTl8aQke.png) #### File Fragmentation - Degree of Fragmentation of a file $$\text{DoF}=\frac{\text{# of extents of the file}}{\text{the ideal # if extents for the file}}$$ - 假設一個檔案 100 bytes,在硬碟中被分為四個 extents 然而在 ext4 中一個 extent 最大可以到 128 bytes 因此 ideal # of extents = 1,$\text{DoF}=4/1=4$ #### FS-Specific Optimization (e.g. ext4) - 把一個 partition 拆成多個 cylinder group,讓 inode 和其對應的 data blocks 盡量靠近 - inline file:直接把一些小檔案(< 60 bytes)放進 inode 內 - 原先:dir -> inode -> index node -> data blocks -> got data! - 改良:dir -> inode -> got data! ### Consistency and Recovery #### Inconsistency - e.g. 在 ext4 新增檔案需要改多個地方:allocation bitmap, inode, directory, data block - Issue:如果在途中斷電 - bitmap 說空間已配置但 inode 還沒寫 (ext4) - hard link 已經創好但 ref cnt 還沒增加 (ext4) - 一些 blocks 已經被釋放並分配給其他檔案,但 list table 還沒改(cross-linked list) (FAT) #### Recovery Utilities - 在檔案系統中每個 volume 有個 dirty bit,標示他有沒有正常被 unmount - 放在 super block,這個地方負責存檔案系統的資訊,其他內容如所剩容量、檔案系統類型等 - Run file system consistency check on dirty volumes (`fsck` in UNIX, `scandisk` in Windows) - 檢查 File system 的 meta,若有不符則試著去修復 - 超慢 -> 衍伸出 Journaling File Systems #### Journaling File Systems - 目的:保護檔案系統避免 structure inconsistency,可以很快的恢復檔案系統 - Transactions:把修改檔案系統所需要的操作包成一個 transaction - Atomicity:要不都做完,要不都不做 - 可以透過 Write-Ahead Logging (WAL) 來保證 transactions 的 atomicity - Write-Ahead Logging (WAL) - 需要一些空間來做 journal - ![image](https://hackmd.io/_uploads/r1whD18Eyx.png) - 先把 transaction 寫進 journal,將要改的點紀錄進去,並在最後加個 flag 標示 transaction 的結束 - 再進 file system 把內容改掉 - Crash recovery - 先去看 journal - 找到一個完整的 transaction(檢查 flag) -> 重作 - 找到一個不完整的 transaction -> 直接丟掉 - 為什麼這樣的機制可以保證 Atomicity 1. transaction 還在記憶體就 crash -> 還沒改到 file system 沒差 2. transaction 在寫進 journal 的時候 crash -> 此時的 transaction 是不完整的,所以會直接被丟掉,不會寫進 file system 3. 寫進 file system 時 crash -> 此時的 transaction 是完整的,所以會重做 - Problem:需要寫兩次 - 有些檔案系統會只將 metadata 的改動寫進 journal,避免檔案系統本體的 crash ### Log-Structured File Systems - Sequential writing always - 動機:read 的速度會因 memory 變大變便宜而解決,因此 read 的需求會減少,但 write 一定得做。在 write 中,sequential write 很快,不用特別處理,因此硬碟的效能瓶頸在 random write - Key idea: out-of-place update - 可以被視做很大的 journal space,而無 "file system" - e.g. NILFS2 (servers), F2FS (Android devices), NOVA (NVRAM) - ![image](https://hackmd.io/_uploads/BJMf0JL4Je.png) - 寫入時就一直連續往右寫 - 原先的 a 會被標成 invalid(garbage) - 會造成 disk 產生很多洞 - ![image](https://hackmd.io/_uploads/HJgdCk8Nkx.png) - Known as compaction, garbage collection or cleaning - 為此類型 file system 附帶的成本 - Recovery - 有一個 checkpoint 紀錄當前處理完的點,直至一個 transaction 完成後才能將 checkpoint 往後移 - 當有個 transaction 在寫入時 crash,檔案系統就可以直接將 checkpoint 以後的資料全部清除已達到 consistency 的效果 - ![image](https://hackmd.io/_uploads/rJooylIVye.png) ## Ch. 12 Mass-Storage Systems ### Magnetic Tape and Disk - Magnetic Tape(磁帶) - 讀寫頻寬高 (~400 MB/s) - sequential access 很快,但 random access 很慢 - 因為磁帶是一整條的,所以 random access 仍要慢慢移動磁帶 - 容量很大 (TB) - 現今很常被用來做資料備份 - Magnetic Disk(磁碟機) - ![image](https://hackmd.io/_uploads/HyFLsdU4kx.png) - head 不會真的接觸到磁碟,否則會刮傷 - 一個 sector 512 B,每次讀寫的最小單位為一個 sector - 找特定資料的過程:CHS - C: find Cylinder - H: which Head - S: Sector - transfer rate - 跟傳遞介面有關 - e.g. SATA3: 600 MB/s; typical HDD ~100 MB/s - positioning time (random access time) - 與 seek time 和 rotational latency 有關 - rotational latency - 到了特定的 cylinder,尋找想要的 sector - Typically 5400 rpm (laptop) or 7200 rpm (desktop) (~ 5 ms) - seek time - 移動 head 到想要的 sector 所在的 cylinder - 5 ~ 7 ms - attachment // TODO - ATA/IDE interface - 當今個人電腦主流的介面 - e.g. Parallel ATA (PATA) and Serial ATA (SATA) - SCSI bus ### Disk Scheduling - transfer time 無法改變,但可以透過改變資料的儲存方式提升 access time -> OS 負責 - disk scheduling optimizes seek time, which is proportional to the total seek distance - 為什麼需要 disk scheduling:因為 multiprogramming 及 write buffer,OS 上會有很多 disk request,因此就可以進行安排,決定要先做哪個 request- [補] Disk schedule 可以被化簡成 TSP (NP-hard) #### Algo. - FCFS (First Come First Serve) - slow but fair - SSTF (Shortest Seek Time First) - ![image](https://hackmd.io/_uploads/ryG_ez2Eyx.png) - Problem: starvation - 可透過 promotion 改善 - Greedy 不代表最優解(可從圖看出) - SCAN - also called elevator algo. - 先往一個方向做到底再回頭處理剩下的 - ![image](https://hackmd.io/_uploads/BJoClt84yl.png) - 尾端的 waiting time 會特別長 - C-SCAN - 比起 SCAN 提供了更平均的 waiting time - ![image](https://hackmd.io/_uploads/HygTZFI41l.png) - 把 cylinder 當作是 circular,一端到底之後(199)直接從另一端開始(0) - 過程中不會處理任何 request - C-LOOK - C-SCAN 的變形,不會走到底而是走到 request 最遠的那個 cylinder - ![image](https://hackmd.io/_uploads/S1Z-IF8Nkl.png) - 當今較長使用 SCAN 或 C-SCAN - Seek time 也會受 file allocation method 影響,利用 cylinder group (ext4) 或 inline file 可以減少花費時間 #### Disk Seek Optimization - Disk scheduling 可以被化簡成 TSP (NP-hard),不必要求最佳解 - OS 不負責 optimize rotational latency,當今的 HDD firmware 自己會處理 - e.g. SATA NCQ (Native Command Queuing) - 支援接收多個 request,由 HDD 自己排序 - 優點:可以考慮到當前 head 旋轉的角度(HDD 自己知道) - 缺點:增加 request latency #### Linux CFQ (Completely Fair Queueing) Scheduler - 負責 time slicing,讓每個 process 共用 IO processing 的時間(sync. IO) - 另外有個 global queue 儲存 async. IO,當沒有 sync. IO 時就去處理 async. IO(用 SCAN) ### RAID Structure - RAID = Redundant Array of Inexpensive Disks - 用途 - 利用平行化提升效能 - 提升 reliability,可用多餘的 disk 進行自我修復 - RAID 0 - ![image](https://hackmd.io/_uploads/r1b_5KLVJx.png) - striping - 完全為了平行化,0% redundancy -> 無法提供 reliability - RAID 1 - ![image](https://hackmd.io/_uploads/Bk1Y5K8V1x.png) - mirroring - 100% redundancy - 其中一個 disk 壞掉另一個可以直接上線 - RAID 2 (less used) - ![image](https://hackmd.io/_uploads/ByIjctLNke.png) - RAID 3 (less used) - ![image](https://hackmd.io/_uploads/SJlncFIV1g.png) - RAID 4 (less used) - ![image](https://hackmd.io/_uploads/SyRn5K8Vye.png) - 新增一個 parity disk 用來做一定程度的修復 - $A_p=A_1\oplus A_2\oplus A_3\oplus\cdots$ - 如果只有一個 disk 壞掉都還能復原 - 相較 RAID 1,redundancy 較低,但若硬碟壞掉需要花時間進行復原 - read 需要 access 一個硬碟,但 write 需要同時修改兩個 disk($A_i,\ A_p$)-> bottleneck 在 parity disk 上,所有 write 都會卡在這邊 - RAID 5 - ![image](https://hackmd.io/_uploads/Bk3p9KIV1x.png) - 把 parity 分到每個 disk 上,避免 parity disk 成為 bottleneck - RAID 6 - ![image](https://hackmd.io/_uploads/B1809KUNJx.png) - 有兩個 parity,利用其他方式(Reed-Solomon code, EVENODD parity)進行復原,可容許更多錯誤 - 但這些編碼方式就比 XOR 慢 - RAID 01 or 10 - ![image](https://hackmd.io/_uploads/ryoz0FI4ke.png) - 10 存活率較高,01 當壞掉的 disk 分屬在兩個 stripe 中就掛了 ### Network-Attached Storage (NAS) & Storage-Area Network (SAN) - NAS - ![image](https://hackmd.io/_uploads/SyY5axnV1l.png) - network 中的裝置可以透過網路連接到 NAS,達成檔案共用等功能 - Common protocols: NFS, CIFS, SAMBA - 利用 host 和 server 間的 remote procedure calls (RPC) 實作 - [補] RPC:讓一個裝置可以遠端執行另一個裝置的特定程式 [Ref](https://aws.amazon.com/tw/compare/the-difference-between-rpc-and-rest/) - SAN - ![image](https://hackmd.io/_uploads/r15zgb34kg.png) - 用在更大型的使用場景(因其更高效率,但初次架設較難) - 分離跟 storage 相關的網路 - 實體的 storage resource 並不會被 client 所知道,client 看到的一個 volume 可能分散在好幾個 storage devices ### Solid State Disks (SSD) - 用非揮發性(non-volatile)的 memory 來模擬原本的 block devices(如硬碟,以 block 為單位做讀寫) - Using Flash memory(現今主流) or battery-backed RAM(RAM + 電池) - 對 OS 來說,不用改任何架構 or 溝通方式,直接把 SSD 視做一般的硬碟 - ![image](https://hackmd.io/_uploads/S1pI4W3Nkl.png) - 利用一個介面(此圖為 SATA)跟 OS 溝通 - SSD controller:就是一顆小處理器 - 黃色的是給 controller 用的記憶體 - 紅色的都是 flash memory,可以想成一堆硬碟,因此建成磁碟陣列時平行化的程度很高 - Non-volatile memory - ![image](https://hackmd.io/_uploads/Hyd6UZhVJx.png) - Working memory:可以直接跟 CPU 對接 - Storage memory:需要用 IO 的方式讀寫 #### Flash Memory - cell:flash memory 中最小儲存的單位 - 讀寫的實際運作原理 - ![image](https://hackmd.io/_uploads/rJkWOZn4Jl.png) - Tunnel oxid 是一個絕緣層 - write 的時候,施加一個高電壓給他,使得電子可以穿過絕緣層進入 floating gate,結束後也會因絕緣層的關係使電子不會跑出來 - erase:寫資料前需要先清空電荷(只有 flash memory 才需要) - multilevel cell - 透過將電壓分的更細,可以讓一個記憶包存更多資料 - ![image](https://hackmd.io/_uploads/B16eKWhEJg.png) - SLC (1 bit) → MLC (2 bits) → TLC (3 bits) → QLC (4) - NAND flash 架構圖 - ![image](https://hackmd.io/_uploads/SyA5KZh41g.png) - block 是 erase 的最小單位;page 是 read / write 的最小單位 - 可以隨機讀但無法隨機寫,因為 flash memory 的特性,在寫之前必須 erase,但 erase 是以 block 為單位,會把所有東西都清掉 - block: 2 MB; page: 16 KB - Flash Translation Layer (FTL) - SSD 的 firmware:隱藏 SSD 內部的資訊,將其包成像硬碟一樣的內容回給 OS - Flash memory management tasks - Logical-to-physical address translation - Garbage collection - Wear leveling - Logical-to-physical address translation - write 必須把整個 block 清掉 -> 沒效率 - 利用 out-of-space update + 把就資料標為 invalid - 需要原位置(logical)和更新位置(physical)間的對應關係 -> L2P mapping table - ![image](https://hackmd.io/_uploads/SJ1F0b3Vkl.png) - mapping table 放在 controller 的 DRAM > [!Note] > - page table > - out-of-place update in log file system - Garbage collection - 會留一個 block 當作 garbage collection 的 working block - 挑一個 block,把其中 valid 的資料複製進 working block,再將其清掉成為新的 working block - 如何挑選要清掉的 block? Victim collection > [!Note] > - log file system cleaning > - defragmentation in file system design issue - Wear leveling - 平均耗損,因為一個 storage cell 有其壽命,大約 3000 P/E cycles (Program erase cycle),之後絕緣層可能會損壞,使電子漏出 - 以及還有 locality 的問題 - ![image](https://hackmd.io/_uploads/SkLdzGhVJg.png) - 因為 spatial locality,有些區域會一直被讀到,導致該 block 一直沒有 dirty 的 page,因此 garbage collection 會傾向不去動他,進而使其 erase count 極低(如圖二) - Issue:偶爾讓存在很久的 block 移到另一個 block,讓原先的 block 可以開始被 erase,以達成平均耗損

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully