---
title: An Efficient Page-level FTL to Optimize Address Translation in Flash Memory
tags: SSD
description: 2023/05/24 reading group
---

# An Efficient Page-level FTL to Optimize Address Translation in Flash Memory

##### Link: [paper](https://dl.acm.org/doi/pdf/10.1145/2741948.2741949)
###### paper origin: EuroSys '15

## Introduction

- Problem:
    - With the increasing capacity of SSDs, the **mapping table grows too large to be cached** entirely in RAM
    - Goal: reduce the extra flash operations caused by address translation while keeping the mapping cache small
- Proposal:
    - Use a relatively **small mapping cache**
    - **Cluster** the cached mapping entries that belong to the same translation page

## Background

An SSD mainly consists of:

- a software layer, the FTL (Flash Translation Layer)
- an internal RAM
- flash memory

## Experiment

1. Distribution of entries in the mapping cache
    - a PPN takes 4 B
    - a flash page is 4 KB, so one translation page holds 4 KB / 4 B = 1024 entries

This result shows that **only a small fraction** (less than 15%, i.e., at most about 150 of the 1024 entries) of the entries in a cached translation page **are recently used**.

![](https://hackmd.io/_uploads/SJcmirfB2.png)

We can also see that 53%-71% of cached translation pages have more than one dirty entry cached, and **the average number of dirty entries per page is above 15**.

![](https://hackmd.io/_uploads/BkiusrMBn.png)

2. Spatial locality in workloads

Although Financial1 is a random-dominant workload, it is evident that **sequential accesses**, denoted by the diagonal lines, **are very common**.

![](https://hackmd.io/_uploads/Sy533HGH3.png)

**The decline is because sequential accesses** require consecutive mapping entries, which are concentrated in a few translation pages.

![](https://hackmd.io/_uploads/S1ceprzBh.png)

## Design of TPFTL

### Overview

- cache
    - a small set of TP (translation page) nodes
    - each TP node maintains a cluster of L2P entry nodes
    - a counter records changes in the number of TP nodes
- flash memory
    - data blocks
    - translation blocks

![](https://hackmd.io/_uploads/SJdJe8zBh.png)

### Page-level LRU

- A TP node usually holds **multiple entry nodes with different hotness**
- With a purely page-level LRU, the hotness of each entry node is obscured by the page-level hotness, which is less efficient at exploiting temporal locality, so TPFTL also keeps an entry-level LRU list inside each TP node

### Loading Policy

**Sequential accesses are very common**, so TPFTL prefetches mapping entries:

- Request-level prefetching
    - **splits a request into one or more page accesses** according to its start address and length, then loads the needed entries together
    - efficient for large requests
- Selective prefetching
    - prefetches a dynamically adjusted number of consecutive entries

![](https://hackmd.io/_uploads/HJQNnUMH3.png)

1. If the **number of TP nodes keeps decreasing** by a threshold, TPFTL assumes sequential accesses are happening
    - it performs selective prefetching when a cache miss occurs
2. If the number keeps increasing by the threshold, it stops selective prefetching (see the sketch below)
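To make the design concrete, below is a minimal Python sketch (my reconstruction, not the paper's code) of the two-level LRU mapping cache and the TP-node-counter heuristic that toggles selective prefetching. `ENTRIES_PER_TP` follows the 4 KB page / 4 B PPN geometry above; `PREFETCH_THRESHOLD` and all class and method names are illustrative assumptions.

```python
from collections import OrderedDict

ENTRIES_PER_TP = 1024       # 4 KB translation page / 4 B per PPN
PREFETCH_THRESHOLD = 4      # assumed threshold on the TP-node count change

class TPFTLCache:
    def __init__(self, capacity_entries):
        self.capacity = capacity_entries
        self.size = 0
        # Outer OrderedDict = page-level LRU over TP nodes;
        # inner OrderedDict = entry-level LRU: offset -> (ppn, dirty).
        self.tp_nodes = OrderedDict()
        self.last_tp_count = 0
        self.prefetching = False

    def _touch(self, tvpn, offset):
        self.tp_nodes.move_to_end(tvpn)           # page-level MRU
        self.tp_nodes[tvpn].move_to_end(offset)   # entry-level MRU

    def lookup(self, lpn):
        tvpn, offset = divmod(lpn, ENTRIES_PER_TP)
        node = self.tp_nodes.get(tvpn)
        if node is not None and offset in node:
            self._touch(tvpn, offset)
            return node[offset][0]                # hit: return the PPN
        return None                               # miss: load from flash

    def insert(self, lpn, ppn, dirty=False):
        tvpn, offset = divmod(lpn, ENTRIES_PER_TP)
        node = self.tp_nodes.setdefault(tvpn, OrderedDict())
        if offset not in node:
            self.size += 1                        # eviction elided here
        node[offset] = (ppn, dirty)
        self._touch(tvpn, offset)

    def update_prefetch_mode(self):
        # Called periodically by the FTL. Sequential accesses concentrate
        # on few translation pages, so the TP-node count drops; random
        # accesses scatter across pages, so it grows.
        delta = len(self.tp_nodes) - self.last_tp_count
        if delta <= -PREFETCH_THRESHOLD:
            self.prefetching = True
        elif delta >= PREFETCH_THRESHOLD:
            self.prefetching = False
        self.last_tp_count = len(self.tp_nodes)
```

A real FTL would evict on insert once `size` exceeds `capacity` (see the replacement sketch at the end of these notes) and would issue the flash reads for the prefetched consecutive entries; both are elided here.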
### Replacement Policy

- Batch-update replacement
    - when a dirty entry node becomes the victim, TPFTL **writes back all the dirty entry nodes of its TP node** in a single translation-page update, and only the victim is evicted
    - during GC, if the affected translation page is cached, all of its cached dirty entries are written back as well
- Clean-first replacement
    - TPFTL first chooses the LRU **clean entry node as the victim**, since evicting a clean entry costs no flash write
- a sketch of this policy is given at the end of these notes

![](https://hackmd.io/_uploads/HJkc0LfS3.png)

## Evaluation

- FlashSim platform
- workloads:

![](https://hackmd.io/_uploads/HyedyL7B2.png)

### Results

![](https://hackmd.io/_uploads/HJjXMUXH3.png)

![](https://hackmd.io/_uploads/rJQfO8XBh.png)

![](https://hackmd.io/_uploads/Byj_587B2.png)

![](https://hackmd.io/_uploads/ryTDsUQS3.png)

![](https://hackmd.io/_uploads/S1BLtoNrn.png)

## Conclusion

- **Extra flash operations caused by address translation degrade both performance and lifetime**
- Both a high **hit ratio** and a **low probability of replacing a dirty entry** in the mapping cache are crucial for reducing system response time as well as overall write amplification
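As referenced in the Replacement Policy section, here is a minimal sketch of clean-first victim selection combined with batch-update write-back, built on the `TPFTLCache` sketch above. `write_translation_page` is an assumed callback standing in for a translation-page update in flash; it is not an API from the paper.

```python
def _find_clean_victim(cache):
    # Clean-first: scan from the LRU end at both levels, approximating
    # the LRU clean entry.
    for tvpn, node in cache.tp_nodes.items():       # LRU -> MRU order
        for offset, (_ppn, dirty) in node.items():
            if not dirty:
                return tvpn, offset
    return None                                     # every entry is dirty

def evict_one(cache, write_translation_page):
    """Evict exactly one entry node to make room in the cache."""
    if not cache.tp_nodes:
        return
    victim = _find_clean_victim(cache)
    if victim is None:
        # All entries are dirty: take the global LRU entry as the victim
        # and batch-update, flushing every dirty entry of its TP node with
        # a single translation-page write before evicting only the victim.
        tvpn, node = next(iter(cache.tp_nodes.items()))
        offset = next(iter(node))
        dirty = {off: ppn for off, (ppn, d) in node.items() if d}
        write_translation_page(tvpn, dirty)         # one flash page write
        for off in dirty:                           # entries are now clean
            node[off] = (node[off][0], False)
        victim = (tvpn, offset)
    tvpn, offset = victim
    node = cache.tp_nodes[tvpn]
    del node[offset]
    cache.size -= 1
    if not node:                                    # drop the empty TP node
        del cache.tp_nodes[tvpn]
```

A quick (hypothetical) usage example:

```python
cache = TPFTLCache(capacity_entries=8)
cache.insert(lpn=0, ppn=1000, dirty=True)
cache.insert(lpn=1, ppn=1001, dirty=True)
evict_one(cache, lambda tvpn, entries: print(f"flush TP {tvpn}: {entries}"))
```

The point of batch-update is that one translation-page write flushes every dirty entry clustered in the TP node, whereas entry-at-a-time write-back would pay a page write per dirty entry; that is where the reduction in write amplification comes from.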