# Introduction to Flash Memory Storage Systems

## 00 Course Introduction

### Why Flash Memory
1. High density
2. Low power consumption
3. High performance
4. Low cost

### Flash-Memory Characteristics
1. Write-once -> out-place update (pages are live, free, or dead)
2. Bulk-writing -> the basic unit of erase is the block (but the unit of read and write is the page)
3. Limited lifetime -> wear leveling
- logical-to-physical mapping
- no free space -> garbage collection
- wear leveling

### Structure
#### Block
A block contains many pages.
#### Page
Each page consists of a main data area and a spare area.
#### Spare area
The spare area holds bad-block identification and the error-correction code.

### Logical view of FTL of NAND flash memory
#### User data
1. reserved blocks for bad-block replacement
2. map blocks (the map also appears in RAM)
3. write buffer blocks: temporarily store the incoming write data
#### Metadata

## 01 Management Schemes

### Architecture of NAND

### Page-Level Mapping (FTL)
LPN to PPN
#### pros
high performance, low write amplification
#### cons
the mapping table requires more space in memory

### Block-Level Mapping (BL)
The page offset within the LBN and the PBN is the same.
#### pros
saves more memory space
#### cons
update overhead is higher (the basic erase unit is the block)

### BLi
Based on BL, but saves even more memory by rebuilding the mapping table from flash memory during read/write requests.

### Hybrid

### Log block scheme
Data writes go to log blocks in flash memory.
The log block is merged with its data block when:
1. the log block has no free page
2. the logical block that contains the requested page changes from the previous logical block

### BAST
- block-level mapping -> data blocks
- page-level mapping -> log blocks
- each data block is paired 1-to-1 with a log block
- data is written into the log block first, then merged with the data block when:
  1. the log block's pages are full
  2. no available log block can be allocated to the data block
#### cons
low log-block utilization under random access

### FAST
Based on BAST, but improves the issue of low utilization.
#### difference from BAST
- two kinds of log blocks: SW (sequential write, only one) and RW (random write)
- sector-level mapping for both SW and RW
#### write operation
First check whether this write operation can lead to a switch operation:
- yes -> append it to the SW log block
  1. the lsn mod the block size must be 0 (the write starts at a block boundary)
  2. writes proceed sequentially from the beginning to the end of the block
- no -> write it into an RW log block
#### pros
better log-block utilization, avoids unnecessary merges
#### cons
reads are slower than BAST because of the architecture; merge operations need more time

### LAST
#### architecture
Extract sequential writes -> more switch and partial merges, fewer costly full merges.
##### sequential log buffer
- handles sequential writes
- consists of several log blocks -> fewer replacements than FAST
- block-associative mapping
##### random log buffer
- high temporal locality -> hot partition
- low temporal locality -> cold partition
- fully-associative mapping
#### Detecting the Locality Type
- small write: high temporal locality
- long write: high sequential locality
- -> a size threshold is used to determine the locality type (see the sketch below)
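To make the threshold idea concrete, here is a minimal sketch of the locality detector described above. The constant `SIZE_THRESHOLD_SECTORS`, its concrete value, and the function name `classify_request` are illustrative assumptions, not taken from the LAST paper; only the rule "long writes are sequential, small writes are random" comes from the notes.

```python
# Hypothetical sketch of LAST's locality detection: requests larger than a
# size threshold are treated as sequential (high sequential locality) and
# routed to the sequential log buffer; small requests are treated as random
# (high temporal locality) and routed to the random log buffer.
# The threshold value and the names below are assumptions for illustration.

SIZE_THRESHOLD_SECTORS = 8  # assumed threshold (e.g. 4 KiB with 512 B sectors)

def classify_request(num_sectors: int) -> str:
    """Return which log buffer a write request of this size should go to."""
    if num_sectors >= SIZE_THRESHOLD_SECTORS:
        return "sequential_log_buffer"   # long write -> sequential locality
    return "random_log_buffer"           # small write -> temporal locality

if __name__ == "__main__":
    print(classify_request(64))  # -> sequential_log_buffer
    print(classify_request(2))   # -> random_log_buffer
```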
#### Exploiting Sequential Locality
- more than one SW log block -> fewer replacements than FAST
- when the sequential log blocks are exhausted:
  1. search for a log block whose pages are all valid -> switch merge
  2. otherwise select a victim using LRU -> partial merge
#### Exploiting Temporal Locality
Reduce the associativity degree:
1. cluster pages with high temporal locality (hot pages) into the same log block (random log buffer partitioning policy)
2. wait until the associativity degree of a log block has decreased, if that is likely -> select the victim among the cold blocks whose associativity degree will rarely change (random log buffer replacement policy)
The goal is to make invalid pages be generated mainly in a small portion of the log blocks; the update interval is used to classify pages as hot or cold.
#### Random Log Buffer Partitioning Policy
LAST distinguishes the hotness of each page based on its update interval.
#### Random Log Buffer Replacement Policy
1. Determine a victim partition: if there is a dead block in the hot partition, select the hot partition as the victim partition.
2. Choose a victim log block from the victim partition:
   - If the victim partition is the hot partition, select a dead block as the victim log block. If there is no dead block, choose the least recently updated log block.
   - For the cold partition, choose the log block with the lowest merge cost as the victim.
If the victim is a dead block from the hot partition, there are enough dead blocks, so the freed block is given to the cold partition. If not, the freed block is given to the hot partition.

### SAST
#### idea
The basic idea of flexible group mapping is to configure the degree of sharing of log blocks among data blocks using the block-level spatial locality parameter.
#### architecture
- N data blocks per data block group; the N parameter indicates the associativity among neighboring blocks
- K: the maximum number of log blocks per log block group
1. the data-block-mapping table (DBMT): LBN -> PBN
2. the log-block-mapping table (LBMT): DGN (data block group number), PBN
3. the log-page-mapping table (LPMT): DGN, LPN, PPN
DGN = LPN div (N × the number of pages per block)
#### merge
LRU; checks the N data blocks and the K log blocks

### KAST
#### idea
- minimize the associativity degree -> distribute writes over different log buffers
- bound the worst-case block merge -> the associativity degree is at most K
- more than one SLB

### CNFTL
- Cluster: manages the logical space
- Region: mapping (restricts the overhead of GC)
- Segment, frame: manage the physical space; the number of frames restricts the linear search time
- the cluster is the basic read/write unit
- virtual block -> physical block; cluster -> frame
#### Mapping tables
- CT: clusters -> segments (segment number, cluster number)
- BT: virtual block -> physical block (region number, virtual block number -> physical block)
- FST: provides free segments in a region
- BST: records the state of each physical block (free, used, reserved)
#### read
cluster -> segment via CT and BT; search the segment to get the valid data of the requested cluster, then read from the frame
#### From LBA to Segment
1. Cluster number = floor(LBA / # sectors per cluster)
2. Region number = Cluster number mod the total number of regions; look up the segment number in CT
3. Virtual block number = floor(segment number / # segments per block)
4. use BT to obtain the physical block
5. search the segments sequentially to get the sector
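The five steps above reduce to a few lines of arithmetic and table lookups. Below is a minimal sketch under assumed parameters; the table layouts (`cluster_table`, `block_table`), the parameter values, and the function name `translate` are illustrative assumptions, while the formulas follow the notes.

```python
# Sketch of CNFTL's address translation following the five steps in the notes.
# All data-structure layouts and parameter values are assumptions made for
# illustration; only the arithmetic mirrors the notes.

SECTORS_PER_CLUSTER = 8
NUM_REGIONS = 4
SEGMENTS_PER_BLOCK = 16

# CT: per region, cluster number -> segment number (step 2)
cluster_table = {0: {0: 5}}
# BT: (region number, virtual block number) -> physical block number (step 4)
block_table = {(0, 0): 42}

def translate(lba: int):
    cluster = lba // SECTORS_PER_CLUSTER                    # step 1
    region = cluster % NUM_REGIONS                          # step 2 (region)
    segment = cluster_table[region][cluster]                # step 2 (CT lookup)
    virtual_block = segment // SEGMENTS_PER_BLOCK           # step 3
    physical_block = block_table[(region, virtual_block)]   # step 4 (BT lookup)
    # step 5: the segments are then searched sequentially for the sector's frame
    return region, segment, physical_block

if __name__ == "__main__":
    print(translate(lba=3))   # -> (0, 5, 42)
```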
### DFTL
Enhanced page-level mapping.
#### arch
- the entire mapping is stored in flash -> GMT
- the cached portion of the GMT in RAM -> CMT
- GTD: tracks where the translation pages are located (a sketch of this demand-based lookup appears at the end of this section)

### CFTL
Converts between block-level and page-level mapping.
- CPMT (cached page mapping table) and CBMT (cached block mapping table); tier-1 and tier-2 tables
- a per-logical-page access counter is used to classify data as hot or cold
- turning cold -> collect the scattered physical pages into a block
- turning hot -> remove the mapping entry in the CBMT

### DAC
#### idea
- reduce the RAM footprint without adding too much response time -> block-level base
- better cache hit ratio -> exploit temporal locality, spatial locality, and access frequency with a two-level cache
#### architecture
- RAM: first-level cache (DBMTC); second-level caches (TPRLC, TPAFC); Translation Page Mapping Table (TPMT)
- Flash: translation blocks (block-level address mapping table)
#### Translation Page
Contains DVBA, DPPBA, DRPBA; one DVBA maps to one DPPBA and one DRPBA.
#### Translation Page Mapping Table
Contains TVPA, FREQ, LOCA, TPPA; one TVPA maps to one TPPA.
- TVPA = DVBA / (# of mappings a translation page can store)
- LOCA: location of the physical translation page
- FREQ: access frequency
#### First-level Cache
- makes use of temporal locality
- n-way set-associative mapping -> n placement options per set
- victim selection: 1. the entry is also in the second level 2. the entry has not been updated since it was last fetched
#### Second-level Cache (TPRLC)
LRU is used as the replacement algorithm.
#### Second-level Cache (TPAFC)
- a fetch-in happens when the access frequency is higher than the lowest frequency in the cache
- LFU is used as the replacement algorithm

### MNFTL
#### goal
Reduce the average system response time -> reduce the overhead of GC.

### WAFTL
#### Data blocks
BMB, PMB
#### Buffer zone
Data is first written into the buffer zone; when the buffer zone is full, the data is flushed into the data blocks, so the same data is written twice (the WAF is high), because at first it is not known whether the data is big or small.
#### Mapping table blocks
```
for each block
    if GMT
        if BZMT
        end if
    else if GMT
        for each
            if BZMT
            end if
        end for
    else
```
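Looking back at the DFTL subsection above, here is a minimal sketch of its demand-based lookup: hit in the CMT, otherwise consult the GTD, fetch the translation entry from flash, and cache it with LRU eviction. The helper `read_translation_page_from_flash`, the entry counts, and the cache size are illustrative assumptions, and write-back of dirty evicted entries is omitted.

```python
# Sketch of DFTL-style demand-paged address translation, as described in the
# notes: the full page-level mapping (GMT) lives in flash, a small cached
# mapping table (CMT) lives in RAM, and the GTD locates translation pages in
# flash. All concrete numbers and helper names here are assumptions.
from collections import OrderedDict

ENTRIES_PER_TRANSLATION_PAGE = 512
CMT_CAPACITY = 4  # deliberately tiny for demonstration

cmt = OrderedDict()         # LPN -> PPN, kept in LRU order
gtd = {0: 1000, 1: 1001}    # translation-page number -> its physical page in flash

def read_translation_page_from_flash(physical_page: int) -> dict:
    """Stand-in for a flash read; returns the LPN -> PPN entries of that page."""
    base_lpn = (physical_page - 1000) * ENTRIES_PER_TRANSLATION_PAGE
    return {base_lpn + i: 5000 + base_lpn + i
            for i in range(ENTRIES_PER_TRANSLATION_PAGE)}

def lookup(lpn: int) -> int:
    if lpn in cmt:                                     # CMT hit
        cmt.move_to_end(lpn)
        return cmt[lpn]
    tp_num = lpn // ENTRIES_PER_TRANSLATION_PAGE       # CMT miss: ask the GTD
    entries = read_translation_page_from_flash(gtd[tp_num])
    ppn = entries[lpn]
    if len(cmt) >= CMT_CAPACITY:                       # evict the LRU entry
        cmt.popitem(last=False)                        # (write-back omitted)
    cmt[lpn] = ppn
    return ppn

if __name__ == "__main__":
    print(lookup(3))    # miss -> fetched via the GTD
    print(lookup(3))    # hit in the CMT
```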