# Memory Hierarchy
## Introduction
Memory 想要同時兼顧速度快且空間大花費又小幾乎是不可能的,必須在`Speed`與`Space`間`trade off`,因此本篇主要講述如何利用有限的Memory資源,透過Memory Hierarchy 達到Memory Access efficiency
### Principle of Locality
* **Temporal locality**
items accessed recently are likely to be accessed again soon
* **Spatial locality**
items near those accessed recently are likely to be accessed soon
利用`Principle of Locality`優勢,Memory可以採用Hierarchy方式存取
* Store everything on disk
* Copy recently accessed (and nearby) items from disk to smaller DRAM memory (Main Memory)
* Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory (Cache memory attached to CPU)

:::warning
越靠近Processor的Memory存取速度越快,但Size也越小Cost越高
:::
### Memory Hierarchy Levels

* **Block (line**)
minimum unit of information that can be either present or not present in a two-level hierarchy
> ex: two-level hierarchy (cache and main memory)
* **Hit**
if accessed data is present in an upper level (ex: cache)
* Hit rate $=\frac{\# hits}{\# accesses}$
* Hit time
the time to access the upper level including the time needed to determine whether the access is a hit or miss
* **Miss**
if accessed data is absent in an upper level
:::warning
Miss handle
此時對應的block會從lower level coped到upper level,讓upper level存取
:::
* Miss rate $= \frac{\#\ misses}{\# accesses}= 1 - hit\ rate$
* Miss penalty
the time replace a upper level block + the time to deliver block to the processor
## Memory Technologies
### SRAM Technology
SRAMS are integrated circuits that are memory arrays
* Usually with a **single access port** for either a read or a write
* With a **fixed access time** to any datum (though the read and write access time may differ)
* Using six (or less) transistors per bit
::: warning
因SRAM需要更多個transistors儲存bit使得單位面積下能夠儲存的data較DRAM少
:::
### DRAM Technology
Data stored as a charge in a capacitor
* **Single transistor** used to access the charge
* **periodically** be refreshed
* Read contents and write back
* Performed on a DRAM "row"

::: warning
DRAM在讀取時會先讀取Row到Row buffer,再透過Col addr取得我們真正要的data
:::
::: warning
DRAM速度慢(相較SRAM)的原因是因為需要periodically refresh
:::
#### Advanced DRAM Organization
* Bits in a DRAM are organized as a rectangular array
* DRAM accesses an entire row
* Burst mode
Supply successive words from a row with reduced latency
* Double data rate (DDR) SDRAM
* With added clocks: Synchronous DRAM
* Transfer on rising and falling clock edges
::: warning
假如原本只能在rising edge讀取data,現在多了falling edge,所以每個cycle可以做2個讀取動作使得data rate double
:::
#### DRAM Performance Factors
* Row buffer
Allow serveral words to be read and refreshed in parallel
* Synchronous DRAM
Allows for consecutive accesses in bursts without needing to send each address
* DRAM banking
Allows simultaneous access to multiple DRAMs (accessed rotated between banks: address interleveing)
::: warning
假如需要4個banks裡的data,只需要1個access就可以同時讀取4 banks data
:::
::: warning
這些factor可以改善DRAM bandwidth
:::
### Flash
**Nonvolatile** semiconductor storage
* 100x ~ 1000x faster that disk
* smaller, lower power, more robust (compare to disk)
#### Flash type
* **NOR flash**: bit cell like a NOR gate
* Random read/write access
* Used for instruction memory in embedded systems
* **NAND flash**: bit cell like a NAND gate
* Denser (bits/area), but block-at-a-a-time access
* Cheaper per GiB
* Used for USB keys, media storage, ...
:::danger
**Problem**
Flash bits wear out after 1000's of accesses
如果存取同一個位置的data超過1000次以上,那個位置的data會損壞
**Solution**
**Wear leveling**: remap data to less used blocks
:::
### Disk Storage
**Nonvolatile**, rotating magnetic storage
* Serveral **platters**, with information recorded magnetically on both surfaces (usually)
* Bits recorded in **tracks**, which in turn divided into **sectors** (e.g., 512 Bytes). error correction code per sector to find and correct errors


#### Disk Sectors
* each sector records
* Sector ID
* Data (512 to 4096 bytes)
* Error correcting code (ECC)
* ...
* **access to a sector (important)**
1. **Queuing delay** (other accesses are pending)
2. **Seek**
move the head to the desired track
3. **Rotational latency**
wait for the desired sector to rotate under the head
4. **Data transfer**
5. **Controller overhead**
**Example**
::: success
Given 512B sector, 15,0000rpm (revolutions per minute), 4ms average seek time, 100MiB/s transfer rate, 0.2ms controller overhead, idle disk
:::
* Average read time = 4 ms (seek time) + $\frac{15000}{2\ \times 60}$ (rotation latency) + $\frac{512\ B}{100\ MiB/s}$ (Data transfer) + 0.2 ms (controller delay) = 4 ms + 2 ms + 0.005 ms + 0.2 ms = 6.2 ms
* if actual average seek time is 1 ms
* Average read time = 3.2 ms
#### Disk Performance Issues
* Manufacturers quote average seek time
* based on all possible seeks
:::warning
事實上,因為data locality 和 OS scheduling,<span class="red">actual</span> average seek times會比average seek times還要來的小
:::
* Smart disk controller allocates physical sectors on disk
* Presents **logical** sector interface to host
* Disk drives include caches
* **Prefetch** sectors in anticipation of access
* Avoid seek and rotational delay
:::warning
這裡的caches是Disk裡面的caches不是CPU的
:::
---
### Cache Memory
* How to we know if the data is present ?
* Where do we look ?
#### Direct-Mapped Cache
* Location determined by address
* Direct mapped: only one choice
* (Block address) $mod$ (# Blocks in cache)
* \# block is a power of 2 => (use low-order address bits)
### Reference
[1] P.A.Patterson and J.L. Hennessy, Computer Organization and Design RISC-V
###### tags: `Computer Architecture`
<style>
@import url('https://fonts.googleapis.com/css2?family=PT+Serif&display=swap');
@import url('https://fonts.googleapis.com/css?family=Noto+Serif+TC&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Fira+Code&family=Noto+Serif+TC&display=swap');
* {
font-family: 'PT Serif', 'Noto Serif TC', sans-serif;
letter-spacing: 0.02em;
/* font-size: 16px;
letter-spacing: 0.02em; */
}
.easy{
color:green;
}
.medium{
color:#eda35e;
}
.red{
color:red;
}
.proof {
padding:10px;
background-color: #f7f7f7;
}
</style>