
Flash based SSD

Introduction

SSD (Solid-state Storage Device)

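  • Unlike a traditional hard disk drive, an SSD has no mechanical parts; it is built purely from transistors, much like DRAM (Dynamic Random Access Memory).
  • Unlike DRAM, however, an SSD retains its data when power is removed.
  • It is based on flash technology (more precisely, NAND-based flash), invented by Fujio Masuoka (舛岡富士雄) in the 1980s.
  • A peculiarity of flash is that before one unit of data (a flash page) can be written, a larger unit (a flash block) must first be erased, which is an expensive operation. In addition, writing the same flash page too often wears it out. These are the challenges an SSD has to deal with.
  • Designing a flash-based SSD means taking both performance and reliability into account.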

How Flash Works

A flash cell adds a floating gate to the ordinary MOSFET structure. The floating gate can hold electric charge for a long time, even without power.


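  • When the floating gate holds no charge, applying a voltage $V_{Ref} > V_{TH}$ to the control gate forms a conducting channel. The cell conducts, and the sensing logic reads it as 1.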


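  • When the floating gate holds charge, the threshold voltage $V_{TH}$ shifts to the right (it increases). If the control-gate voltage $V_{Ref}$ is now below $V_{TH}$, the cell does not conduct and is read as 0; otherwise it is read as 1.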


Erase

Grounding the control gate and applying 20 V to the substrate drains the charge from the floating gate, so the cell conducts again. After an erase, therefore, every cell in flash memory is reset to 1.


Flash Cell

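A flash cell is essentially a single transistor that stores one or more bits. Cells come in two kinds:

  • Single-Level Cell (SLC): stores one bit, 0 or 1
  • Multi-Level Cell (MLC): distinguishes several charge levels and stores two bits: 00, 01, 10, or 11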
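
The read rule described earlier (compare a reference voltage against the cell's threshold voltage) extends from SLC to MLC by using more than one reference voltage. The sketch below is only an illustration: the names `read_slc` and `read_mlc`, the voltage values, and the assignment of bit patterns to charge levels are all assumptions, not values from a real device.

```python
def read_slc(v_th, v_ref=2.0):
    """Return the bit stored in a single-level cell.

    The cell conducts (reads as 1) when the reference voltage applied to
    the control gate exceeds the cell's threshold voltage.
    """
    return 1 if v_ref > v_th else 0


# Assumed bit pattern for each charge level, lowest threshold (erased) first.
MLC_PATTERNS = ["11", "10", "00", "01"]


def read_mlc(v_th, v_refs=(1.0, 2.0, 3.0)):
    """Return the two bits stored in a multi-level cell.

    Three reference voltages split the threshold range into four levels;
    the level is the number of references the threshold meets or exceeds.
    """
    level = sum(1 for v_ref in v_refs if v_th >= v_ref)
    return MLC_PATTERNS[level]
```

An erased cell (no stored charge, lowest threshold) reads as all ones in both functions, which matches the behavior of Erase described above.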

Basic Flash Operations

Flash memory supports the following basic operations:

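  • Read (a page):
    A flash driver or other client software (hereafter the host, or client) issues a read command to fetch a specific page (e.g., 2 KB or 4 KB). A read typically completes in tens of microseconds regardless of where the page sits on the device, so flash can serve as a random-access device.

  • Erase (a block):
    Before a page can be written, the entire block containing it must be erased, which resets every bit to 1. Any data in the block that is still needed must therefore be copied elsewhere before the erase.

  • Program (a page):
    Once a block has been erased, the program command can write data to a page inside it. Programming is cheaper than erasing and typically completes in hundreds of microseconds.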

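Over the course of these operations, a page moves through the following states:

  • INVALID: the initial state of a page
  • ERASED: the state of a page after an erase. Only a page in this state can be programmed
  • VALID: the state of a page after it has been programmed. Only VALID pages can be read (with the read command), and a read does not change the state.
    Once a page has been programmed, its contents can only be changed by erasing the block that contains it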

The following example walks a block of 4 pages through these operations:

                i i i i    Initial: pages in block are invalid (i)
Erase()     →   E E E E    State of pages in block set to erased (E)
Program(0)  →   V E E E    Program page 0; state set to valid (V)
Program(0)  →   error      Cannot re-program page after programming
Program(1)  →   V V E E    Program page 1
Erase()     →   E E E E    Contents erased; all pages programmable

A Detailed Example

Consider four 8-bit pages that together make up one block of 4 pages (no real device has pages this small; the sizes are for illustration only).

  1. Before the write, every page is in the VALID state left by the previous write.


  2. To update page 0, every page in the block must first be erased.


  3. Only once the pages are in the ERASED state can the new contents be programmed.


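Note that the steps above (updating page 0) destroy the contents of pages 1, 2, and 3. Before anything in a block is overwritten, any data in that block that is still needed must be copied elsewhere first.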

Summary

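  • Flash offers fast reads, outperforming conventional hard disk drives
  • A whole block must be erased before it can be written, which is expensive, and frequent writes wear the flash out

How to improve write performance and reliability is therefore a central question when designing a flash-based SSD.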

Flash translation layer (FTL)

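With the basics of flash memory covered, we can look at what a flash-based SSD consists of:

  1. Several flash chips (flash packages), plus volatile memory such as SRAM, used for caching and for the mapping tables
  2. A control logic unit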


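One of the key jobs of the control logic is to turn the read and write requests issued by the host into the read, erase, and program commands of the underlying flash memory.
The flash translation layer (FTL) provides this function. The units the host operates on are called logical blocks; after translation by the FTL, the units are physical blocks and physical pages.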

For good performance, the FTL needs to:

  1. Exploit parallelism: an SSD contains more than one flash chip, so using them in parallel is an important concern

  2. Reduce write amplification (WAF)

    WAF is defined as:

    $$\text{WAF} = \frac{\text{total write traffic (in bytes) issued to the flash chips by the FTL}}{\text{total write traffic (in bytes) issued by the client to the SSD}}$$
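
As a quick illustration of the definition, the ratio can be computed directly. The function name `write_amplification` and the 4 KB / 16 KB figures are assumptions chosen for this example only.

```python
def write_amplification(flash_bytes_written, host_bytes_written):
    """Ratio of bytes the FTL programs into flash to bytes the host asked to write."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


KB = 1024
# Hypothetical case: the host updates one 4 KB page, but the FTL has to
# rewrite the whole 16 KB block that contains it.
waf = write_amplification(flash_bytes_written=16 * KB, host_bytes_written=4 * KB)
```

A WAF of 1.0 would mean the FTL writes exactly what the host requested; anything above that is extra flash traffic generated inside the SSD.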

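The FTL must also provide reliability:

  1. Wear out: a block that is erased and programmed too many times becomes unusable. To avoid this, the FTL should spread writes across blocks as evenly as possible, so that all blocks in the flash wear out at roughly the same time
  2. Wear leveling: the practice of making all blocks wear out at about the same time is called wear leveling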
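
One very simple way to approximate wear leveling is to always reuse the block that has been erased the fewest times. The sketch below assumes the FTL keeps a per-block erase counter; the function name `pick_block_to_write` and the policy itself are illustrative assumptions, not a description of any particular SSD.

```python
def pick_block_to_write(erase_counts, free_blocks):
    """Pick the free block that has been erased the fewest times.

    erase_counts: dict mapping block number -> erases so far
    free_blocks:  iterable of block numbers currently available
    Ties are broken by the lower block number.
    """
    return min(free_blocks, key=lambda b: (erase_counts[b], b))


erase_counts = {0: 0, 1: 0, 2: 0, 3: 0}
for _ in range(8):
    block = pick_block_to_write(erase_counts, erase_counts.keys())
    erase_counts[block] += 1   # the chosen block is erased before reuse
```

With this policy the eight erases end up spread evenly over the four blocks instead of all landing on one of them.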

FTL Design Example: Direct Mapping

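Consider an FTL that uses a one-to-one mapping (direct mapping): every logical page N maps to exactly one physical page N:

  1. Read: under direct mapping, reading a logical page is simply a read of the corresponding physical page
  2. Write: under direct mapping, writing a logical page requires first reading out the entire block that contains it, erasing that block, and then programming the old pages together with the new one

This scheme clearly drives WAF up, and if the host writes one page over and over, the block containing it wears out quickly. Direct mapping is therefore an example of what not to do.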
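
The cost of direct mapping can be made concrete with a toy simulation. Everything here is an assumption for illustration: the `DirectMappedFTL` class, a block of 4 pages, a 4 KB page, and the simplification that every page of the block is reprogrammed after each erase.

```python
PAGES_PER_BLOCK = 4
PAGE_SIZE = 4 * 1024  # bytes; an assumed page size


class DirectMappedFTL:
    """Toy FTL where logical page N always lives in physical page N."""

    def __init__(self, num_blocks):
        self.pages = [None] * (num_blocks * PAGES_PER_BLOCK)
        self.erase_counts = [0] * num_blocks
        self.host_bytes = 0
        self.flash_bytes = 0

    def read(self, logical_page):
        return self.pages[logical_page]

    def write(self, logical_page, data):
        block = logical_page // PAGES_PER_BLOCK
        first = block * PAGES_PER_BLOCK
        # 1. Read the whole block so the other pages survive the erase.
        saved = self.pages[first:first + PAGES_PER_BLOCK]
        saved[logical_page - first] = data
        # 2. Erase the block.
        self.erase_counts[block] += 1
        # 3. Program every page of the block again: old data plus the new page.
        self.pages[first:first + PAGES_PER_BLOCK] = saved
        self.host_bytes += PAGE_SIZE
        self.flash_bytes += PAGES_PER_BLOCK * PAGE_SIZE

    def waf(self):
        return self.flash_bytes / self.host_bytes


ftl = DirectMappedFTL(num_blocks=3)
for i in range(100):
    ftl.write(5, f"v{i}")   # the host keeps overwriting one hot page
```

After the loop, every erase has landed on the single block that holds page 5, and each 4 KB host write has cost a full block of flash writes.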

Log-Structured FTL

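A write request from the host to logical block N is appended to the next free page of the block currently being written, right after the most recently written page. The FTL records the location in a mapping table so that later read requests can find it.

  1. Write: the FTL picks a writable page, normally the one following the last page written, and records in the mapping table the address of logical block N together with the physical page it was written to
  2. Read: the FTL uses the mapping table built at write time to find the physical page that holds logical block N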

A Detailed Example

Consider an SSD made of 16 KB blocks, each holding four 4 KB pages (no real device has a capacity this small; the sizes are for illustration only). The host issues the following requests:

Write(100) with contents a1
Write(101) with contents a2
Write(2000) with contents b1
Write(2001) with contents b2
  • Initially, every physical block and every page in it is INVALID.


  • The SSD receives the host's write to logical block 100. The FTL decides to write it to physical block 0; because the pages of that block are INVALID, the block must be erased first.


  • Block 0 is now in the ERASED state and can be programmed. The FTL writes the contents of logical block 100 to page 00 and records the mapping.


  • When the host later reads logical block 100, the FTL looks it up in the mapping table built earlier and finds that it maps to physical page 00.


  • The host then completes the remaining writes.

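
The write and read paths of this example can be sketched in a few lines. This is a simplified model under stated assumptions: the `LogStructuredFTL` class is hypothetical, the flash is a plain list of 12 pages, and the erase step is left out (every block is assumed to have been erased already).

```python
class LogStructuredFTL:
    """Toy page-mapped FTL that appends every write to the next free page."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # physical pages
        self.mapping = {}                 # logical block -> physical page
        self.next_page = 0                # next free page in the log

    def write(self, logical_block, data):
        if self.next_page >= len(self.flash):
            raise RuntimeError("no free page left; garbage collection needed")
        self.flash[self.next_page] = data
        self.mapping[logical_block] = self.next_page
        self.next_page += 1

    def read(self, logical_block):
        # Follow the mapping table to the physical page.
        return self.flash[self.mapping[logical_block]]


ftl = LogStructuredFTL(num_pages=12)
for lba, data in [(100, "a1"), (101, "a2"), (2000, "b1"), (2001, "b2")]:
    ftl.write(lba, data)
```

After the four writes, `ftl.mapping` sends logical blocks 100, 101, 2000, and 2001 to physical pages 0 through 3, and `ftl.read(100)` follows the table to return `a1`.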

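Advantages of the Log-Structured FTL

  • By writing to the next free page, the FTL avoids many expensive erases and lowers WAF
  • By spreading writes evenly over different pages, the FTL achieves wear leveling

Disadvantages of the Log-Structured FTL

  • Garbage collection must run periodically, which raises WAF and lowers performance
  • Maintaining the mapping table adds overhead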

Garbage Collection

Continuing the example above, the host next issues these writes:

Write(100) with contents c1
Write(101) with contents c2
  • The FTL again writes to the next free pages:

    [Figure: gc]

    After these writes, pages 00 and 01 are still marked VALID, but they hold stale values and are considered garbage

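Because of how a log-structured FTL works, overwriting an address that was written before produces garbage. To keep free space available for new writes, the SSD must clean up periodically. Stale data that has to be reclaimed is called a dead block (here, a page holding an outdated copy of a logical block), and the process of reclaiming it is called garbage collection.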

After the writes above, garbage collection proceeds as follows:

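  • Block 0 contains two dead blocks (pages 00 and 01) and two live blocks (pages 02 and 03, which hold the contents of logical block addresses 2000 and 2001)

  • The live blocks are read out and written to a location chosen by the FTL; in this example, the FTL picks pages 06 and 07 of block 1

    [Figure: gc2]

  • Block 0 is erased

    [Figure: gc3]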
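
The three steps above (find the live pages, copy them to the log tail, erase the victim) can be sketched as follows. The function name `garbage_collect` and the data layout are assumptions made for illustration; a page counts as live only if the mapping table still points at it.

```python
PAGES_PER_BLOCK = 4


def garbage_collect(flash, mapping, victim_block, next_free_page):
    """Reclaim one block of a page-mapped, log-structured FTL.

    flash:          list of page contents (None means erased)
    mapping:        dict logical block -> physical page
    victim_block:   number of the block to reclaim
    next_free_page: first free page at the tail of the log
    Returns the updated next_free_page.
    """
    first = victim_block * PAGES_PER_BLOCK
    victim_pages = range(first, first + PAGES_PER_BLOCK)
    # A page is live only if the mapping table still points at it.
    live = {page: lba for lba, page in mapping.items() if page in victim_pages}
    for page in sorted(live):
        flash[next_free_page] = flash[page]    # copy live data to the log tail
        mapping[live[page]] = next_free_page   # repoint the logical block
        next_free_page += 1
    for page in victim_pages:                  # erase the whole victim block
        flash[page] = None
    return next_free_page


# State after the overwrites of 100 and 101 in the example above.
flash = ["a1", "a2", "b1", "b2", "c1", "c2"] + [None] * 6
mapping = {100: 4, 101: 5, 2000: 2, 2001: 3}
next_free = garbage_collect(flash, mapping, victim_block=0, next_free_page=6)
```

Note that the two live pages had to be rewritten even though the host never asked for them to change, which is why garbage collection raises WAF.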

Mapping Table Size

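Another potential weakness of the log-structured FTL is the cost of keeping the mapping table. Suppose the table uses a 4-byte entry for every 4 KB page: an SSD with 1 TB of capacity then needs 1 GB just to map all of its pages, which makes a purely page-level FTL impractical.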
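
The 1 GB figure follows from simple arithmetic, shown below with binary units. The helper name `map_size` is an assumption, and the 256 KB block size in the second call is used only as a point of comparison.

```python
KB, MB, GB, TB = 2**10, 2**20, 2**30, 2**40


def map_size(capacity_bytes, unit_bytes, entry_bytes=4):
    """Bytes needed for a table with one entry per mapping unit."""
    return (capacity_bytes // unit_bytes) * entry_bytes


# One 4-byte entry per 4 KB page on a 1 TB device.
page_level = map_size(1 * TB, unit_bytes=4 * KB)
# For comparison: one 4-byte entry per 256 KB block on the same device.
block_level = map_size(1 * TB, unit_bytes=256 * KB)
```

Here `page_level` comes to 1 GB (2^28 entries of 4 bytes), while `block_level` is only 16 MB, which is the motivation for mapping at a coarser granularity.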

Block-Based Mapping

Suppose the host issues the following writes:

Write(2000) with contents a1
Write(2001) with contents a2
Write(2002) with contents b1
Write(2003) with contents b2
  • Per-page Translation table


  • Block translation

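Because only one table entry records the mapped location, extra information is needed to map the 4 logical blocks to the physical pages they were written to.
Here the two least significant bits of the logical block address serve as an index offset into 4 consecutive physical pages, and the remaining most significant bits are looked up in the table to find the starting physical page.

[Figure: block3 (1)]

To update address 2002 with c', the FTL must read out addresses 2000, 2001, and 2003, write all four to a nearby free physical block (2), and then erase the original physical block (1).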
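
The address split described above can be sketched with two bit operations. The table contents (chunk 500 starting at physical page 4) are a hypothetical example, and `translate` is an illustrative name.

```python
PAGES_PER_BLOCK = 4
OFFSET_BITS = 2                      # log2(PAGES_PER_BLOCK)
OFFSET_MASK = PAGES_PER_BLOCK - 1    # 0b11


def translate(block_table, logical_address):
    """Map a logical block address to a physical page with a block-level table."""
    chunk = logical_address >> OFFSET_BITS    # high bits select the table entry
    offset = logical_address & OFFSET_MASK    # low two bits select the page
    return block_table[chunk] + offset


# Hypothetical table: chunk 500 (logical addresses 2000-2003) starts at physical page 4.
block_table = {500: 4}
```

With this table, logical addresses 2000 through 2003 share the single entry for chunk 500 and resolve to physical pages 4 through 7.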

  • Read:

[Figure: read block]

  • Write:

[Figure: write block 4]

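Drawbacks of block-based mapping:
When the host writes less than a full physical block, the whole block still has to be processed, which is especially costly for overwrites. Blocks of 256 KB are common in modern SSDs, so a better mapping scheme is needed.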

Hybrid Mapping

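  • Hybrid mapping combines page-level mapping (more flexible writes) with block-level mapping (a cheaper mapping table)

    The FTL keeps two tables: a log table with per-page entries and a data table with per-block entries. The FTL first searches the log table for the logical block in question; if it is not there, it searches the data table

    Consider the following example, in which the host has already written logical block addresses 1000, 1001, 1002, and 1003 to physical pages 08, 09, 10, and 11 in physical block 3:

    [Figure: hybrid]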

  • Next, the host overwrites logical block addresses 1000, 1001, 1002, and 1003 with the values a', b', c', and d'. The FTL writes the new data to log block 1:

    [Figure: hybrid2]

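  • Finally, the FTL erases physical block 3 so that it can serve as a log block:

[Figure: hybrid3]

Because the host updated everything that was stored in physical block 3, the FTL can simply repoint the data table at the physical block holding the new data and update the log table. This is called a switch merge.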

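Not every overwrite replaces all the data in a physical block; often part of it has to be kept.

  • In the example above, suppose the host overwrites only logical block addresses 1000 and 1001:

In this case the FTL cannot simply swap the data table entry. The data table still points at the original physical block, while the log table must map the overwritten addresses to their new physical page numbers. To bring the block back together, the FTL reads the remaining pages (1002 and 1003) from the old block, appends them to the log block, and only then switches the data table over. This is called a partial merge, and it costs more than a switch merge.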
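
The two-table lookup and the condition for a switch merge can be sketched as below. This is a simplified model under stated assumptions: the `HybridFTL` class and its method names are hypothetical, the log block is assumed to occupy physical pages 0 to 3, and the partial merge itself (copying the remaining pages) is not implemented; the code only detects that a switch merge is not yet possible.

```python
PAGES_PER_BLOCK = 4


class HybridFTL:
    """Toy hybrid FTL: a per-page log table in front of a per-block data table."""

    def __init__(self):
        self.log_table = {}    # logical address -> physical page
        self.data_table = {}   # chunk number -> first physical page of its block

    def lookup(self, logical_address):
        # The log table holds the newest copy, so it is checked first.
        if logical_address in self.log_table:
            return self.log_table[logical_address]
        chunk, offset = divmod(logical_address, PAGES_PER_BLOCK)
        return self.data_table[chunk] + offset

    def try_switch_merge(self, chunk):
        """Repoint the data table if the log holds the whole chunk, in order."""
        base = chunk * PAGES_PER_BLOCK
        pages = [self.log_table.get(base + i) for i in range(PAGES_PER_BLOCK)]
        first = pages[0]
        aligned = first is not None and first % PAGES_PER_BLOCK == 0
        if not aligned or pages != [first + i for i in range(PAGES_PER_BLOCK)]:
            return False       # a partial or full merge would be needed instead
        self.data_table[chunk] = first
        for i in range(PAGES_PER_BLOCK):
            del self.log_table[base + i]
        return True


ftl = HybridFTL()
ftl.data_table[250] = 8                      # 1000-1003 live in physical pages 8-11
ftl.log_table.update({1000: 0, 1001: 1})     # the host overwrites 1000 and 1001
partial = ftl.try_switch_merge(250)          # only half the chunk is in the log
page_of_1002 = ftl.lookup(1002)              # still served from the data table
ftl.log_table.update({1002: 2, 1003: 3})     # the rest of the chunk reaches the log
switched = ftl.try_switch_merge(250)         # the log block now mirrors the chunk
```

Once the switch merge succeeds, the four per-page log entries are replaced by a single data table entry, which is what keeps the log table small.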