###### tags: `computer organization` `note` `thu`
{%hackmd theme-dark %}
# Computer Organization Ch5
## Memory Hierarchy Levels
### Block (= line) : unit of copying
May be multiple words
### If accessed data is present in upper level
* Hit : access satisfied by upper level
* Hit ratio: hits/accesses
### If accessed data is absent
#### Miss: block copied from lower level
* Time taken: miss penalty
* Miss ratio: misses/accesses = 1 – hit ratio
## Memory Technology
### Static RAM (SRAM, mainly used as the cache inside your CPU)
0.5ns – 2.5ns, $2000 – $5000 per GB
### Dynamic RAM (DRAM, i.e. your main memory)
50ns – 70ns, $20 – $75 per GB
### Magnetic disk (your beat-up old traditional hard drive)
5ms – 20ms, $0.20 – $2 per GB
### Ideal memory (go to bed early; you can have everything in your dreams)
Access time of SRAM
Capacity and cost/GB of disk

(caches typically come in level 1 through level 3)
## Cache memory (Made by SRAM)

* Small amount of fast memory
* Sits between normal main memory and CPU
### Cache Read Operation
1. CPU requests contents of memory location
2. Check cache for this data
3. If present, get from cache (fast)
4. If not present, read required block from main memory to cache
5. Then deliver from cache to CPU
6. Cache includes tags to identify which block of main memory is in each cache slot
### Locality of Reference
memory references tend to cluster
#### ***$T_{avg} = T_1\cdot 0.95+(T_1+T_2)\cdot 0.05$***
* 95% hit cache
* 5% cache miss
* T1 is cache access time
* T2 is memory access time
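The formula above can be checked numerically. This sketch uses hypothetical timings (2 ns SRAM, 60 ns DRAM, consistent with the ranges quoted earlier):

```python
# Average memory access time with a 95% hit ratio (timings are assumptions).
T1 = 2e-9    # cache access time: 2 ns (assumed)
T2 = 60e-9   # main memory access time: 60 ns (assumed)
hit_ratio = 0.95

# On a hit we pay T1; on a miss we pay T1 (the failed lookup) plus T2.
t_avg = hit_ratio * T1 + (1 - hit_ratio) * (T1 + T2)
print(f"{t_avg * 1e9:.1f} ns")  # prints 5.0 ns
```

Because the lookup cost T1 is paid on every access, the formula simplifies to T1 + miss_ratio × T2.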
## Direct Mapped (Mapping)
### Mapping : address is modulo the number of blocks in the cache

Compute from the decimal address of the reference. Taking 22 as an example: 22 mod 8 leaves remainder 6, so the upper bits 10 are the tag and the lower bits 110 encode the remainder 6 (the cache line).
| Tag | Line |
|:---:|:----:|
| 2 bits | 3 bits (each line also carries a valid bit) |
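The worked example for address 22 can be reproduced with the modulo rule directly:

```python
# Direct mapping: cache line = address mod (number of blocks in the cache).
# Worked example from the text: decimal address 22 with an 8-block cache.
num_blocks = 8                 # 2^3 blocks -> 3 index bits
address = 22                   # binary 10110

index = address % num_blocks   # 22 mod 8 = 6 -> low 3 bits, 110
tag = address // num_blocks    # remaining high bits, 10 (binary) = 2

print(f"tag={tag:b}, index={index:b}")  # prints tag=10, index=110
```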

The organized data are then placed into the fields below for classification.

### Mapping Function (Direct)
* 64KBytes ($2^{16}$) cache memory
* Cache block of 4 bytes ($2^{2}$) each
i.e. cache is 16K ($2^{14}$) lines (slots) of 4 ($2^{2}$) bytes
* 16MBytes ($2^{24}$) main memory
* 24 bit address
For mapping purpose, we can consider main memory to consist of **4M blocks** of 4 bytes each.
**Horizontal divisions (rows):**
64 KB = $2^{16}$ B; $2^{16}/2^{2}=2^{14}$, so the cache needs 16K lines, numbered 0 to C − 1 = 16K − 1
**Vertical divisions (columns):**
64 KB = $2^{16}$ B, so 16 address lines (bits) are needed

Cutting main memory into cache-sized pieces gives about 256 of them ($2^{24}/2^{16}=2^8$, matching the 8-bit tag)
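The field widths of this example (16 MB main memory, 64 KB cache, 4-byte blocks) follow from powers of two:

```python
import math

# Field widths for the direct-mapped example in the text:
# 16 MB main memory (24-bit address), 64 KB cache, 4-byte blocks.
addr_bits = 24
cache_bytes = 64 * 1024
block_bytes = 4

word_bits = int(math.log2(block_bytes))                  # 2 (byte offset in block)
line_bits = int(math.log2(cache_bytes // block_bytes))   # 14 (16K lines)
tag_bits = addr_bits - line_bits - word_bits             # 24 - 14 - 2 = 8

print(tag_bits, line_bits, word_bits)  # prints 8 14 2
```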

### Block, Line, Address
* Each block of main memory maps to only one cache line
i.e. if a block is in cache, it must be in one specific place
* Address is in two parts
Least Significant w bits identify unique word
Most Significant s bits specify one memory block

* 24 bit address (Main memory)
* 2 bit word identifier (4 bytes per block/line)
* **22** bit block identifier
8 bit tag (=22-14)
14 bit slot or line
* No two blocks in the same line have the same Tag field
* Check contents of cache by finding line and checking Tag
### Bits in a cache
* 32-bit byte address
* $2^n$ blocks with $2^m$ words ($2^{m+2}$ bytes)
#### Direct-mapped (mapping)
Tag field: 32 − (n + m + 2) bits (2 bits are used for the byte offset, m bits for the word offset within the block)
* **n** bits: used for index
* The total number of bits in such a cache:
$2^n$ × (block size + tag size + valid field size)
= $2^n$ × ($2^m\times 32$ + (32 − n − m − 2) + 1)
= $2^n$ × ($2^m\times 32$ + 31 − n − m)
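Plugging assumed values (n = 10, i.e. 1K blocks, and m = 2, i.e. 4 words per block) into the formula:

```python
# Total storage for a direct-mapped cache: 2^n blocks of 2^m 32-bit words,
# 32-bit byte address. n = 10 and m = 2 are assumed example values.
n, m = 10, 2

data_bits = 2**m * 32                      # block size in bits (128)
tag_bits = 32 - n - m - 2                  # 2 bits are the byte offset (18)
total = 2**n * (data_bits + tag_bits + 1)  # +1 valid bit per block

print(total)  # prints 150528, i.e. 1024 * (128 + 18 + 1)
```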
### Set Associative Mapping
* Cache is divided into a number of sets
* Each set contains a number of lines
* A given block maps to any line in a given set
e.g. Block B can be in any line of set i
#### e.g. 2 lines per set
* 2-way set associative mapping
* A given block can be in one of 2 lines in only one set
--------------------------
* Address length = (s + w) bits
* Number of addressable units = $2^{s+w}$ words or bytes
* Block size = line size = $2^w$ words or bytes
* Number of blocks in main memory = $2^s$
* Number of lines in set = k (2-way, 4-way, 8-way)
* Number of sets = v = $2^d$
* Number of lines in cache = kv = k × $2^d$
* Size of tag = (s – d) bits
* s = 22, w = 2, k = 2, d = 13
| Tag | Set | Word |
|:---:|:---:|:----:|
| 9 | 13 | 2 |

#### Taking Advantage of Spatial Locality

### Fully Associative Mapping
* A main memory block can load into any line of cache
* Memory address is interpreted as tag and word
* Tag uniquely identifies block of memory
* Every line’s tag is examined for a match
* Cache searching gets **expensive (linear searching)**
#### Fully Associative Mapping Address structure
* 22 bit tag stored with each 32 bit block of data
* Compare tag field with tag entry in cache to check for hit
* Least significant 2 bits of address identify which 8 bit byte is required from the 32 bit data block
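The "expensive" lookup can be sketched as a linear scan over every line's tag (real hardware compares all tags in parallel; the tags and data here are made up):

```python
# Fully associative lookup: every valid line's tag must be compared.
# Toy cache of 4 lines; (valid, tag, data) triples are hypothetical.
cache = [
    (True, 0x3A, "blk A"),
    (False, 0x00, None),
    (True, 0x1F, "blk B"),
    (True, 0x2C, "blk C"),
]

def lookup(tag):
    for valid, line_tag, data in cache:
        if valid and line_tag == tag:
            return data      # hit
    return None              # miss: block can go into ANY free line

print(lookup(0x1F))  # prints blk B
print(lookup(0x55))  # prints None (miss)
```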
## Cache Replacement Algorithms (deciding which victim to evict when space runs out)
#### 1. Hardware implemented algorithm for Direct Mapped
* No choice, each block only maps to one line
* Replace that line
With direct mapping each block maps to exactly one line, so when replacement is needed that line is simply evicted.
#### 2. Hardware implemented algorithm for Set Associative and Fully Associative
* **Least Recently Used (LRU) --- so you've fallen out of favor lately**
**e.g. in 2-way set associative:
which of the 2 blocks is LRU?**
* **Least Frequently Used (LFU) --- stop hogging the spot if you're not using it**
**replace the block which has had the fewest hits**
* First In First Out (FIFO) --- ***you've been squatting in there forever, hurry up and get out***
replace the block that has been in cache longest
* Random --- ***too bad, the unlucky ones get kicked out***
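LRU for one 2-way set can be sketched with an ordered map: a hit refreshes a block's recency, and a miss on a full set evicts the least recently used block. The class name and accesses are illustrative only:

```python
from collections import OrderedDict

# LRU replacement sketch for ONE set of a 2-way set-associative cache.
class LRUSet:
    def __init__(self, ways=2):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # hit: now most recently used
            return "hit"
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # full set: evict the LRU block
        self.blocks[tag] = "data"            # fetch the missing block
        return "miss"

s = LRUSet()
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# prints ['miss', 'miss', 'hit', 'miss', 'miss']
```

In the trace, accessing tag 3 evicts tag 2 (tag 1 was touched more recently), so the final access to 2 misses again.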
### Hits vs. Misses
* Read hits
this is what we want!
* Read misses
stall the CPU, fetch block from memory, deliver to cache, restart
* Write hits
can replace data in cache and memory (write-through)
write the data only into the cache (write-back the cache later)
* Write misses
read the entire block into the cache, then write the word
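A read hit/miss trace can be simulated with a toy direct-mapped cache (sizes and addresses are made up; note how two addresses mapping to the same line keep evicting each other):

```python
# Toy direct-mapped cache tracing read hits and misses (4 one-word blocks).
NUM_BLOCKS = 4
cache = [None] * NUM_BLOCKS    # tag held by each line, or None (invalid)

def read(addr):
    index = addr % NUM_BLOCKS
    tag = addr // NUM_BLOCKS
    if cache[index] == tag:
        return "hit"
    cache[index] = tag         # miss: fetch block from memory, then restart
    return "miss"

# 22 and 26 both map to line 2, so they conflict.
print([read(a) for a in [22, 22, 26, 22]])
# prints ['miss', 'hit', 'miss', 'miss']
```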
### What happens on a write?
#### Writes work somewhat differently
* Suppose we execute a store instruction
Write the data into only the data cache
Memory would then hold a different value
-->The cache & memory are “inconsistent”
* Keep the main memory & cache consistent
Always write the data into both the memory and the cache
Called **write-through** (direct write)
* Although this design handles writes simply
it does not provide very good performance
#### We need a write buffer between the cache and memory
* A **queue** that holds data while the data are waiting to be written to memory
* Processor:
writes data into the cache and the write buffer
* Memory controller:
write contents of the buffer to memory
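The two roles above can be sketched with a queue: the processor side enqueues stores, and the memory-controller side drains them (all names and addresses are illustrative):

```python
from collections import deque

# Write-through with a write buffer (sketch).
write_buffer = deque()
cache = {}
memory = {}

def store(addr, value):
    """Processor side: write the cache and enqueue the store."""
    cache[addr] = value
    write_buffer.append((addr, value))

def drain():
    """Memory-controller side: write buffered data to memory."""
    while write_buffer:
        addr, value = write_buffer.popleft()
        memory[addr] = value

store(0x10, 42)   # processor continues without waiting for memory
drain()           # memory catches up later
print(memory[0x10])  # prints 42
```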

#### Write back (indirect write: only the fast side is written)
* New value is written only to the block in the cache
* The modified block is written to the lower level of the hierarchy when it is replaced
* Updates initially made in cache only
* Update bit for cache slot is set when update occurs
* If block is to be replaced, write to main memory only if update bit is set
* Other caches get out of sync
* I/O must access main memory through cache
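The update-bit behavior can be sketched with a single cache slot (addresses and values are hypothetical): writes only set the dirty bit, and memory is updated only when a dirty block is replaced:

```python
# Write-back sketch: one cache slot with an update (dirty) bit.
memory = {0xA: 1, 0xB: 2}
slot = {"tag": 0xA, "data": memory[0xA], "dirty": False}

def write(value):
    slot["data"] = value
    slot["dirty"] = True          # update bit set; memory is now stale

def replace(new_tag):
    if slot["dirty"]:             # write back only if the update bit is set
        memory[slot["tag"]] = slot["data"]
    slot.update(tag=new_tag, data=memory[new_tag], dirty=False)

write(99)                         # memory[0xA] still holds the old value 1
replace(0xB)                      # eviction triggers the write-back
print(memory[0xA])  # prints 99
```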
#### Write through (direct write: both sides are written)
* All writes go to main memory as well as cache
* Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
* Lots of traffic
* Slows down writes