# [Paper] NVMain: An Architectural-Level Main Memory Simulator for Emerging Non-volatile Memories

###### tags: `research-GraphRC`

## What is NVMain
- NVMain is an **architectural-level** simulator that can model main memory designs using both DRAM and emerging non-volatile memory technologies

## Why is DRAM outdated?
- It is hard to keep **shrinking the DRAM half-pitch**, which is needed to increase memory capacity
- Much of the power dissipation is due to the **interconnect**
- It is hard to find reliable charge-storage devices
- The sensing mechanism is difficult to scale
- DRAM is expensive in terms of power (refresh and standby power)

## Why NVM?
- Non-volatility **guarantees** that data is not lost: data does not need to be refreshed, and the peripheral circuitry can be **power gated** without risk of data loss
- However, the **write energy** of non-volatile memories is normally **high**, which may cancel out some of those savings

## Major differences between DRAM and NVM
### Write latency
- The write latency of NVM can be orders of magnitude larger than its read latency

### Endurance

| Endurance | PCM | DRAM |
| ------------- | ------------ | -------- |
| Write cycles | $10^8 - 10^9$ | $10^{15}$ |

- NVM cells can only be written a finite number of times, whereas DRAM is considered to have effectively infinite endurance

## Bank scheduling concerns
### Why does most DRAM have multiple banks?
- Most main memory today is multi-bank because using **a single bank** causes **latency to increase**
- Main memory is normally partitioned into **multiple banks**, which are joined together by a **shared interconnection bus or network**
  - The most common interconnect is a **DDR-based transmission-line bus**

### Multi-bank issues
- Consider the following configuration
  - ![](https://i.imgur.com/MgT4s9m.png)
  - 1 rank = N devices
  - 1 device = many banks + 1 shared row buffer
  - Each device typically outputs 4, 8, or 16 bits
  - N is chosen so that **64 bits** of data are output **in a single clock cycle** (JEDEC standard)

#### Hardware resource sharing
- Multiple banks are grouped together and share the same bus
- The memory controller ensures there is no bus contention between banks

#### Peak current draw (NVM only)
- The power used by a **single device** is limited to bound the **peak current draw** when multiple banks operate at the same time
- For NVM, the write energy is often much higher than the read energy, and writes can take much longer
- Significant overlap of **writes** in several banks, combined with **activation** energy, can cause a large spike in peak current

## Single-bank timing

| Timing | DRAM | NVM |
| -------- | ----------------------- | ------------------- |
| R/W cycle | $W_{cyc} \sim R_{cyc}$ | $W_{cyc} > R_{cyc}$ |
| tRC | $tRCD + tRAS + tRP$ | $tRCD + tBURST + tRP$ |

- **Read/write cycle count**
  - Intuitively, a read should take fewer cycles to complete than a write
  - However, reading a DRAM cell destroys its contents, so the data must be restored before the precharge phase begins
  - In NVM, the data stored in the cells is not destroyed by a read, so the restoration step is unnecessary
- **Row cycle time (tRC)**
  - **tRCD** is the activation time
  - **tRP** is the precharge time
  - **tBURST** is the burst time
  - **tRAS** is the data restoration time (roughly 7x **tBURST**)
  - Because an NVM read is non-destructive, **tRAS** is replaced by the much shorter **tBURST** in the NVM tRC
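As a sanity check on the tRC formulas above, here is a minimal sketch; the numeric timing values are hypothetical illustrations, not taken from any real datasheet:

```python
# Illustrative single-bank tRC (row cycle time) for DRAM vs. NVM.
# All timings are in memory-clock cycles; the values are made up.

def trc_dram(tRCD, tRAS, tRP):
    """DRAM row cycle: activate, restore the destroyed data, precharge."""
    return tRCD + tRAS + tRP

def trc_nvm(tRCD, tBURST, tRP):
    """NVM reads are non-destructive, so the long restore (tRAS)
    is replaced by just the burst transfer (tBURST)."""
    return tRCD + tBURST + tRP

tBURST = 4                 # hypothetical burst time
tRAS = 7 * tBURST          # per the note above, tRAS ~ 7x tBURST
print(trc_dram(tRCD=10, tRAS=tRAS, tRP=10))    # 48 cycles
print(trc_nvm(tRCD=10, tBURST=tBURST, tRP=10)) # 24 cycles
```

With these (hypothetical) numbers, dropping the restoration phase halves the row cycle time, which is where NVM's non-destructive read pays off.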
### Bank-level timing
- ![](https://i.imgur.com/1PHwBix.png)

## Inter-bank timing
- Delays in traditional DRAM
  - **tFAW (four-activation window)**: the minimum window within which at most four activations may be issued (limits power draw)
  - **tRRD (row-to-row activation delay)**: spaces out current-hungry row-activation commands to prevent drawing too much current
- Delays introduced for NVM
  - **tWWD (write-to-write delay)**
  - **tWAD (write-to-activation delay)**
  - **tAWD (activation-to-write delay)**
  - For NVM, a write can take hundreds of nanoseconds, so overlap of writes with row activations or with other writes is very likely; these situations must be accountedted for and spaced out using these parameters

### Rank-level timing
- ![](https://i.imgur.com/7sN2B0v.png)

## How to determine NVM timing
- **NVSim** models the area, timing, dynamic energy, and leakage power of Phase-Change Memory (PCM), Spin-Torque-Transfer RAM (STT-RAM), Resistive RAM (ReRAM)/memristor, Floating-Body Dynamic RAM (FBDRAM), and Single-Level-Cell NAND Flash
- Using this simulator, a **single bank** can be designed under constraints such as **area, latency, energy, or energy-delay product**
- Energy and latency are calculated based on the **word size**
- Timing: (latency reported by NVSim) / (clock period) = number of cycles
- Power: exact read/write energy values taken from NVSim (write energy per bit)

## NVM endurance model
### What is endurance modeling?
- A family of techniques used to increase the overall lifetime of the memory system
- Memory is read and written at cache-line granularity; for example, an NVM update changing a value from 2 to 3 flips only 1 out of 512 bits

### DCW
- In that case, only the 1 changed bit should be written, preserving the lifetime of the remaining bits. The drawback is that the data must be read before it can be written back. This is **DCW (Data-Comparison Write)**
- How this is achieved:
  - Add bank circuitry (i.e., XOR gates) to compare bits
  - Read once per write, then write once using a mask of the changed bits

### FlipNWrite
- Count the number of bits that would change for a particular partition of memory. If more than half of the bits would change, the data is **inverted** and **a flip bit is set**
- Data is written either inverted or non-inverted, whichever has the **lower Hamming distance** to the stored data
- On reads, the flip bit is checked to determine whether the data should be inverted back

### Comparison
- The number of bit-writes with FlipNWrite should be less than or equal to DCW
- ![](https://i.imgur.com/KhTKMGA.png)

## What's new in NVMain 2.0
### Energy modeling
- The traditional DRAM approach uses published current numbers (**IDD values**) measured on actual devices; however, these are not available for NVM
- NVMain 2.0 proposes two device-level energy models:
  - Current mode (based on IDD values)
  - Energy mode (based on an NVM simulator)

### Subarray-level parallelism
- The subarray is defined as the basic building block, supporting various subarray-level parallelism modes
- The subarray contains the MLC write model, the endurance model, and the fault model
- This allows fine-grained refresh:
  - All-bank refresh in DDR
  - Per-bank refresh in LPDDR
  - Bank-group refresh in DDR4

### Memory object hooks
- Hooks are external memory objects that can snoop on requests arriving at, or returning from, particular memory objects

### MLC
- **Data encoding** is a technique that changes the data ultimately written to the memory cells
- Data encoding is common and well studied for NVM, but is normally ignored in DRAM systems
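The DCW and FlipNWrite schemes described above can be sketched as follows. This is a simplified model (words as Python ints, an 8-bit word for readability); the function names are mine, not NVMain's:

```python
# Sketch of DCW vs. FlipNWrite bit-write counts (hypothetical helper names).
WIDTH = 8  # word size in bits, chosen small for illustration

def hamming(a, b):
    """Number of differing bits between two words."""
    return bin(a ^ b).count("1")

def dcw_bits(old, new):
    """DCW: read the old data first, then write only the bits that differ."""
    return hamming(old, new)

def flipnwrite(old, new, old_flip=0):
    """FlipNWrite: store `new` as-is or inverted, whichever changes fewer
    stored bits; a flip bit records which form was stored.
    Returns (stored_word, flip_bit, bits_written)."""
    inv = new ^ ((1 << WIDTH) - 1)
    cost_plain = hamming(old, new) + (1 if old_flip != 0 else 0)
    cost_inv = hamming(old, inv) + (1 if old_flip != 1 else 0)
    if cost_inv < cost_plain:
        return inv, 1, cost_inv
    return new, 0, cost_plain

# Worst case for DCW: the update flips every bit of the word.
old, new = 0b00000010, 0b11111101
print(dcw_bits(old, new))                 # 8 bit-writes under DCW
stored, flip, cost = flipnwrite(old, new)
print(flip, cost)                         # 1 1: store inverted, set flip bit
```

On a read, the flip bit selects whether to invert: `data = stored ^ (2**WIDTH - 1) if flip else stored`. Note the flip bit itself is a stored bit, so its toggle is counted in the write cost, which is why FlipNWrite's bit-writes are bounded by roughly half the word size plus one.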