## Week-10 Questions

### Topic : Cache

#### Question : Calculate cache size and tag addresses

![image](https://hackmd.io/_uploads/HJrV5StXxl.png)

1. Total cache size

   The total number of cache lines already counts every way; associativity only determines how lines are grouped into sets, so there is no extra factor of 4.

   $$
   \text{Total Cache Size} = 4096 \times 64 = 262144\ \text{bytes} = 256\ \text{KB}
   $$

2. Address range

   Address: `Tag | Index | Offset`

   - Cache line size: 64 B => Offset bits
     $$
     \text{Offset bits} = \log_2(64) = 6\ \text{bits}
     $$
   - Number of sets => Index bits
     $$
     \text{Number of Sets} = \frac{4096}{4} = 1024\ \text{sets}
     $$
     $$
     \text{Index bits} = \log_2(1024) = 10\ \text{bits}
     $$
   - Assume a 32-bit memory address => Tag bits
     $$
     \text{Tag bits} = 32 - 10 - 6 = 16\ \text{bits}
     $$

---

#### Question : MESI protocol exercise

![image](https://hackmd.io/_uploads/HJvScBFQex.png)
![image](https://hackmd.io/_uploads/Skf4ZUFXlg.png)

| Operation      | P1\$ | P2\$ | P3\$ | P4\$ | Memory | Bus Operation        | Which supplies data |
| -------------- | ---- | ---- | ---- | ---- | ------ | -------------------- | ------------------- |
| Initial State  | I    | I    | I    | I    | 0      | —                    | —                   |
| P1 load X      | E    | I    | I    | I    | 0      | Read                 | Memory              |
| P2 load X      | S    | S    | I    | I    | 0      | Read                 | P1\$                |
| P1 store X (1) | M    | I    | I    | I    | 0      | Read-Exclusive (INV) | — (P1's copy is already valid) |
| P3 load X      | S    | I    | S    | I    | 1      | Read                 | P1\$ (writes X = 1 back) |
| P3 store X (2) | I    | I    | M    | I    | 1      | Read-Exclusive (INV) | — (P3's copy is already valid) |
| P2 load X      | I    | S    | S    | I    | 2      | Read                 | P3\$ (writes X = 2 back) |
| P1 load X      | S    | S    | S    | I    | 2      | Read                 | Memory              |

Note that a remote read hitting a line in M forces the owner to supply the data, write it back to memory, and downgrade to S; in MESI, a Modified copy can never coexist with Shared copies. Likewise, a store by a processor already holding the line in S only needs to invalidate the other copies, with no data transfer on the bus.

---

### Topic : Memory

#### Question : Memory system performance improvement techniques

![image](https://hackmd.io/_uploads/ByFIcHtXll.png)

| Design Technique                                 | Category A, B, C       |
| ------------------------------------------------ | ---------------------- |
| Reduce memory access latency                     | A. Reduce hit time     |
| Higher cache associativity                       | B. Increase hit rate   |
| Processor multithreading, OoO execution          | C. Reduce miss penalty |
| Bigger cache                                     | B. Increase hit rate   |
| Virtually indexed physically tagged cache (VIPT) | A. Reduce hit time     |
| Prefetch                                         | C. Reduce miss penalty |
| Non-blocking caches                              | C. Reduce miss penalty |
| Critical word first                              | C. Reduce miss penalty |
| Read bypass write                                | A. Reduce hit time     |

---

#### Question : Access array elements in storage order

![image](https://hackmd.io/_uploads/H1yOqHK7ee.png)

**Snippet A gives better performance.**

- Snippet A: row-major order. With the j loop innermost, consecutive iterations touch adjacent memory locations, so the access pattern has strong spatial locality and benefits from cache-line fills and hardware prefetching.
- Snippet B: column-major order. The inner loop strides through memory one full row at a time, so almost every access misses the cache and performance degrades.

---

#### Question : DRAM address mapping

![image](https://hackmd.io/_uploads/r1TK9BK7el.png)

**1. How many memory channels?**

- Channel bit: 1 bit (bit 7)
  $$
  2^1 = 2\ \text{memory channels}
  $$

**2. How many DIMMs in each channel?**

- DIMM bit: 1 bit (bit 8)
  $$
  2^1 = 2\ \text{DIMMs per channel}
  $$

**3. How many banks in each DIMM?**

- Bank bits: 4 bits
  $$
  2^4 = 16\ \text{banks per DIMM}
  $$

**4. If we have a dataset of 256B sequentially accessed, which CPU address bit should be assigned for channel decoding to maximize bandwidth?**

- Answer: assign **bit 8** for channel decoding.
- Reasoning:
  1. For sequential access, we want consecutive accesses distributed across channels.
  2. Each 256 B ($2^8$-byte) block occupies address bits 0–7, so bit 8 is the lowest address bit that differs between consecutive 256 B blocks.
  3. Decoding the channel from the lowest bit that changes between accesses makes consecutive blocks land on alternating channels, keeping both channels busy and maximizing bandwidth.
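The address-split arithmetic in the cache question above can be sanity-checked with a few lines (a minimal sketch using only the parameters given in the problem: 4096 lines, 4-way, 64 B lines, 32-bit addresses):

```python
# Cache geometry check: 4096 lines, 4-way set associative,
# 64 B lines, 32-bit addresses (values from the problem above).
from math import log2

lines, ways, line_size, addr_bits = 4096, 4, 64, 32

total_size = lines * line_size                    # associativity does not add capacity
sets = lines // ways                              # 4 ways share each set
offset_bits = int(log2(line_size))                # byte within a line
index_bits = int(log2(sets))                      # which set
tag_bits = addr_bits - index_bits - offset_bits   # the rest of the address

print(total_size, offset_bits, index_bits, tag_bits)  # 262144 6 10 16
```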
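The MESI exercise can be replayed with a tiny single-line simulator (a sketch assuming write-back caches on a snooping bus; `load` and `store` are illustrative helpers, not any real API). The key rule it encodes is that a read snoop downgrades an M or E holder to S:

```python
# Minimal single-cache-line MESI sketch for 4 cores, assuming write-back
# caches with a snooping bus.
M, E, S, I = "M", "E", "S", "I"

def load(states, p):
    """Processor p reads the line."""
    if states[p] != I:                      # read hit: no bus traffic
        return states
    holders = [q for q, st in enumerate(states) if st != I]
    if holders:                             # another cache supplies the data;
        for q in holders:                   # an M or E holder downgrades to S
            states[q] = S                   # (an M holder also writes back)
        states[p] = S
    else:
        states[p] = E                       # sole copy -> Exclusive
    return states

def store(states, p):
    """Processor p writes the line (Read-Exclusive / invalidate on the bus)."""
    for q in range(len(states)):
        if q != p:
            states[q] = I                   # invalidate every other copy
    states[p] = M
    return states

states = [I, I, I, I]                       # P1..P4
ops = [("ld", 0), ("ld", 1), ("st", 0), ("ld", 2),
       ("st", 2), ("ld", 1), ("ld", 0)]    # the access sequence from the table
for op, p in ops:
    states = load(states, p) if op == "ld" else store(states, p)
print(states)  # ['S', 'S', 'S', 'I']
```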
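The locality argument for the row-major question can be made concrete by computing byte offsets (a sketch; the array size `N = 1024` and 8-byte element size are assumed for illustration):

```python
# Row-major layout: element (i, j) of an N x N array of 8-byte elements
# lives at byte offset (i*N + j) * 8. N = 1024 is an assumed example size.
N, elem_size = 1024, 8

def offset(i, j):
    return (i * N + j) * elem_size

inner_j_stride = offset(0, 1) - offset(0, 0)   # Snippet A: adjacent bytes
inner_i_stride = offset(1, 0) - offset(0, 0)   # Snippet B: one full row apart
print(inner_j_stride, inner_i_stride)  # 8 8192
```

With 64 B cache lines, the j-sweep gets 8 useful elements per line fill, while the i-sweep touches a different cache line on every single access.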
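The channel-decoding answer in the DRAM question can be checked directly: if bit 8 selects the channel, consecutive 256 B blocks alternate between the two channels (a sketch; `channel` is a hypothetical decode helper):

```python
# With bit 8 as the channel-select bit, consecutive 256 B blocks
# map to alternating channels, so sequential traffic uses both.
def channel(addr, bit=8):
    return (addr >> bit) & 1

channels = [channel(a) for a in range(0, 1024, 256)]
print(channels)  # [0, 1, 0, 1]
```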