# Exploring the Design Space of Page Management for Multi-Tiered Memory Systems. USENIX ATC'21

###### tags: `papernote`

---

## Summary

**Paper Title**: Exploring the Design Space of Page Management for Multi-Tiered Memory Systems
**Link**: https://www.usenix.org/system/files/atc21-kim-jonghyeon.pdf
**Year**: USENIX ATC'21
**Keyword**: Memory Tiering

---

## Problems

1. The way operating systems currently manage pages was designed under the assumption that all memory has the same, DRAM-like capabilities. This oversimplification leads to suboptimal memory usage in tiered memory systems.
    - In such multi-tiered memory systems, the critical factor in performance is not only the **access locality** but also the **access tier** of memory.
2. Data centers typically employ multi-chip NUMA architectures to scale up the performance of commodity servers with high core counts and large memory capacity. Although this increases the number of DIMM slots per server, scaling DRAM density remains a significant obstacle, which makes it hard to construct large memory systems cost-effectively.
3. Meanwhile, since SCM offers byte-addressable and non-volatile properties, it is gaining traction as a bridge for the performance gap between DRAM and SSD.
    - Intel recently released its 3D XPoint non-volatile memory (DCPMM), which can be installed in a DIMM slot without modification.
    - Intel DCPMM supports two types of tiered memory systems, which can be categorized as `hardware-assisted` or `software-managed`.
4. Currently (Linux v5.3), Linux relies on the NUMA framework, which classifies memory nodes as either local or remote in a binary way. In addition, Linux does not support demoting (or reclaiming) pages from the upper-tier to the lower-tier memory.
    - This paper designs a page placement strategy that considers performance characteristics across the access tier as well as the access locality.
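Problem 4 above can be illustrated with a tiny model. This is a minimal sketch, not kernel code: the distance values are illustrative (in the style of ACPI SLIT distances, where `10` means local), and the node names are invented for the example.

```python
# Sketch of the binary local/remote NUMA view (problem 4 above): when nodes
# are classified only as "local" or "remote", a remote DRAM node and a
# CPU-less DCPMM node fall into the same class even though they belong to
# different performance tiers. All distances below are illustrative.

LOCAL_DISTANCE = 10   # SLIT convention: distance 10 means "same node"

# Distances as seen from node 0 (illustrative values, not measurements):
distances_from_node0 = {
    "node0-dram":  10,   # local DRAM
    "node1-dram":  21,   # remote DRAM (fast tier, wrong locality)
    "node2-dcpmm": 17,   # local DCPMM (slow tier, right locality)
    "node3-dcpmm": 28,   # remote DCPMM
}

def classify_binary(distance):
    """The binary view: every node is either 'local' or 'remote'."""
    return "local" if distance <= LOCAL_DISTANCE else "remote"

for node, dist in distances_from_node0.items():
    print(node, "->", classify_binary(dist))
# Remote DRAM and both DCPMM nodes all collapse into "remote",
# so the tier information is lost.
```

The point of the sketch is that the binary classification throws away exactly the signal a tiered system needs: `node1-dram` and `node2-dcpmm` get the same label despite sitting in different tiers.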
:::info
**Default (Linux v5.3) page allocation policy**
In Linux, the default page allocation policy tries to use local memory as much as possible to minimize the performance penalty of accessing remote memory. Only if there is no free space in the local memory does the memory allocator look for free space on a remote memory node, known as the fallback path.
:::

## DCPMM

In hardware-assisted mode, DCPMM is exposed to software as the main memory, while DRAM acts as a hardware-managed cache invisible to software. The memory controller automatically places frequently accessed data in the DRAM cache, while the rest of the data is kept on the large-capacity but slow DCPMM.

On the other hand, with operating system support, both DRAM and DCPMM can be exposed as normal memory and made visible to software, tiering memory into fast and slow [15]. We call this a software-managed tiered memory system. In this environment, the operating system is expected to use both DRAM and DCPMM effectively because full control is given to software.

**This paper focuses on the system software aspects of tiered memory systems by understanding how the hardware is organized.**

## Design

> Based on DCPMM (Intel's 3D XPoint non-volatile memory) as a new tier between DRAM and SSD.

### Current AutoNUMA

AutoNUMA automatically migrates pages at runtime to a memory node closer to the thread accessing them. The operating system examines the access locality to determine whether the accessed page is placed on local or remote memory. If the page is on remote memory, it is migrated to local memory to avoid remote accesses for subsequent requests.

:::warning
This approach improves the performance of applications running on DRAM-only NUMA systems.
:::

### AutoNUMA with memory-tiering

:::warning
**Problems**
1. Pages in the lower tier are not promoted when the upper tier is fully utilized.
2. 
Pages are never migrated to the CPU-less (lower-tier) nodes due to a NUMA policy that does not apply to multi-tiered memory systems.
3. Frequently accessed pages in the lower tier cannot be promoted without demoting less frequently accessed pages from the upper tier.
4. Binary page classification (either active or inactive) is too coarse-grained to be used for tiering.

---

- First, the upper-tier memory is used ineffectively because the most frequently accessed pages (dark red) mainly reside in the lower-tier memory (node-2), while less frequently accessed pages are placed on the upper-tier memory. The primary reason is that current memory management does not allow page promotion or migration to the upper-tier memory when it has no free space. Although such a design decision is reasonable for DRAM-only systems, this assumption needs to be reconsidered for multi-tiered memory systems.
- Even when a page cannot be promoted or migrated to the best memory node in terms of both access tier and access locality, there are effective alternative placements in multi-tiered memory systems. When page promotion from remote DCPMM to local DRAM fails, for example, there are two possible workarounds: placing the page either on the remote DRAM or on the local DCPMM.
- Second, the page distribution across the memory nodes in the lower-tier memory is skewed. **This is because page movement to CPU-less nodes (DCPMM) is not considered in current Linux.** **Since traditional OSes were designed under the assumption that memory access performance is dominated by the access locality between CPU and memory nodes**, moving pages to CPU-less nodes never occurs. Pages residing in a CPU-less node (lower-tier memory) can be promoted through AutoNUMA only if the destination upper-tier memory has free space.
:::

We take advantage of the AutoNUMA facility, which periodically scans memory pages and marks them inaccessible to capture non-local DRAM accesses. Once a page is re-accessed, it incurs a page fault, called a NUMA hinting page fault. We take these NUMA faults as demand signals for page promotion from the DCPMM nodes or migration from the remote DRAM node. We build a per-page access history with this fault-based facility and use it when demoting pages.

## AutoTiering

- NUMA page fault:
    - When a demanded page is to be promoted or migrated from a DCPMM node or the remote DRAM node, current Linux first checks whether the local DRAM has enough free space. If there is no free space, the demanded page stays in its original place.
    - To improve on this, AutoTiering places the demanded page on the next-best memory node in the multi-tiered memory hierarchy.
- ![](https://i.imgur.com/fqhijg2.png)

:::info
**Other information**
- [Intel Memory Latency Checker](https://www.intel.com/content/www/us/en/developer/articles/tool/intelr-memory-latency-checker.html)
:::
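The "next-best memory node" idea described in the AutoTiering section can be sketched as follows. This is a simplified model, not the paper's implementation: the candidate ordering by tier and locality, the node names, and the `free_pages` bookkeeping are all assumptions for illustration.

```python
# Sketch of next-best placement on a NUMA hinting page fault: instead of
# leaving the faulted page in place when the local DRAM is full (stock
# Linux v5.3 behavior), walk a ranked candidate list and pick the best
# node that still has free space. Ordering and names are assumptions.

# Candidates ordered best-first for a task on socket 0 (assumed ranking):
CANDIDATES = ["local-dram", "remote-dram", "local-dcpmm", "remote-dcpmm"]

def place_on_fault(current_node, free_pages):
    """Return the destination node for a faulted page, or its current node."""
    rank = CANDIDATES.index(current_node)
    for node in CANDIDATES[:rank]:       # only nodes strictly better than now
        if free_pages.get(node, 0) > 0:
            return node
    return current_node                  # nothing better is free: stay put

# Local DRAM is full but remote DRAM has room -> take the next-best node:
print(place_on_fault("remote-dcpmm", {"local-dram": 0, "remote-dram": 3}))
# No better node has free space -> page stays, as in stock Linux:
print(place_on_fault("local-dcpmm", {"local-dram": 0, "remote-dram": 0}))
```

The first call returns `remote-dram` (right tier, wrong locality) rather than stalling on the full local DRAM, which is exactly the workaround the paper describes; the second call shows the fall-through to the default stay-put behavior.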