## Week-10 Questions
### Topic : Cache
#### Question : Calculate cache size and tag addresses

1. Total cache size
The 4096 cache lines are the total across all ways; associativity only changes how lines are grouped into sets, not the capacity, so there is no extra factor of 4 to apply.
$$
\text{Total Cache Size} = 4096 \times 64 = 262144\ \text{bytes} = 256\ \text{KB}
$$
2. Address range
* Address:`Tag | Index | Offset`
* Cache line size : 64B => Offset bits
$$
\text{Offset bits} = \log_2(64) = 6\ \text{bits}
$$
* Number of Sets => Index bits
$$
\text{Number of Sets} = \frac{4096}{4} = 1024\ \text{sets}
$$

$$
\text{Index bits} = \log_2(1024) = 10\ \text{bits}
$$
* Assume memory address is 32-bit => Tag bits
$$
\text{Tag bits} = 32 - 10 - 6 = 16\ \text{bits}
$$
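The arithmetic above can be sketched in a few lines (the parameters are the ones given in this question; the `split_address` helper is illustrative):

```python
import math

LINE_SIZE = 64      # bytes per cache line
NUM_LINES = 4096    # total lines across all ways
WAYS = 4            # associativity
ADDR_BITS = 32      # assumed address width

offset_bits = int(math.log2(LINE_SIZE))          # 6
num_sets = NUM_LINES // WAYS                     # 1024
index_bits = int(math.log2(num_sets))            # 10
tag_bits = ADDR_BITS - index_bits - offset_bits  # 16

def split_address(addr: int) -> tuple:
    """Split an address into (tag, index, offset) fields."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(offset_bits, index_bits, tag_bits)  # 6 10 16
print(NUM_LINES * LINE_SIZE)              # 262144 bytes = 256 KB
```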
---
#### Question : MESI protocol exercise


| Operation | P1\$ | P2\$ | P3\$ | P4\$ | Memory | Bus Operation | Data supplied by |
| -------------- | ---- | ---- | ---- | ---- | ------ | -------------------- | ----------------- |
| Initial State | I | I | I | I | 0 | — | — |
| P1 load X | E | I | I | I | 0 | Read | Memory |
| P2 load X | S | S | I | I | 0 | Read | P1\$ |
| P1 store X (1) | M | I | I | I | 0 | Upgrade (INV) | — (P1 already holds valid data) |
| P3 load X | S | I | S | I | 1 | Read | P1\$ (write back, M → S) |
| P3 store X (2) | I | I | M | I | 1 | Upgrade (INV) | — (P3 already holds valid data) |
| P2 load X | I | S | S | I | 2 | Read | P3\$ (write back, M → S) |
| P1 load X | S | S | S | I | 2 | Read | P3\$ |

Note: in MESI a line cannot stay in M once another cache reads it — the snooped read forces a write-back (so Memory picks up the stored value) and the owner drops to S. A store that hits in S needs only a bus invalidation; no data is supplied.
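The transitions in the table can be sketched as a tiny state-only MESI model (the function names and dict layout are illustrative assumptions, as is the Illinois-style choice that any valid cache may supply the data; values and write-backs are noted only in comments):

```python
# Minimal state-only MESI tracker: states per processor for one line X.
M, E, S, I = "M", "E", "S", "I"

def load(p, states):
    """p reads X: returns who supplies the data ('Memory', a cache, or None on hit)."""
    if states[p] != I:                   # read hit: no bus operation
        return None
    holders = [q for q in states if states[q] != I]
    for q in holders:                    # an M holder writes back; all drop to S
        states[q] = S
    states[p] = S if holders else E      # alone on the bus -> Exclusive
    return holders[0] if holders else "Memory"

def store(p, states):
    """p writes X: invalidate every other copy, p goes to Modified."""
    for q in states:
        if q != p:
            states[q] = I
    states[p] = M

states = dict.fromkeys(["P1", "P2", "P3", "P4"], I)
print(load("P1", states))   # Memory; P1 -> E
load("P2", states)          # supplied by P1$; both -> S
store("P1", states)         # P1 -> M, P2 invalidated
load("P3", states)          # P1 writes back, M -> S; P3 -> S
store("P3", states)         # P3 -> M, P1 invalidated
load("P2", states)          # P3 writes back, M -> S; P2 -> S
load("P1", states)
print(states)               # {'P1': 'S', 'P2': 'S', 'P3': 'S', 'P4': 'I'}
```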
---
### Topic : Memory
#### Question : Memory system performance improvement techniques

| Design Technique | Category |
| ------------------------------------------------ | ----------------------- |
| Reduce memory access latency | A. Reduce hit time |
| Higher cache associativity | B. Increase hit rate |
| Processor multithreading, OoO execution | C. Reduce miss penalty |
| Bigger cache | B. Increase hit rate |
| Virtually indexed physically tagged cache (VIPT) | A. Reduce hit time |
| Prefetch | C. Reduce miss penalty |
| Non-blocking caches | C. Reduce miss penalty |
| Critical word first | C. Reduce miss penalty |
| Read bypass write | A. Reduce hit time |
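The three categories map onto the three terms of the average memory access time formula, AMAT = hit time + miss rate × miss penalty. A quick numeric sketch (all cycle counts and rates below are made-up illustration values):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time in cycles."""
    return hit_time + miss_rate * miss_penalty

# Baseline: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
base        = amat(1.0, 0.05, 100)
faster_hit  = amat(0.8, 0.05, 100)  # A. reduce hit time
better_rate = amat(1.0, 0.03, 100)  # B. increase hit rate (lower miss rate)
lower_pen   = amat(1.0, 0.05, 60)   # C. reduce miss penalty
for name, v in [("base", base), ("A", faster_hit), ("B", better_rate), ("C", lower_pen)]:
    print(name, round(v, 2))        # base 6.0, A 5.8, B 4.0, C 4.0
```

Any single technique in the table improves exactly one of these terms, which is what the A/B/C classification captures.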
---
#### Question : Access array elements in storage order

**Snippet A gives better performance**
* Snippet A: Row-major order
The inner `j` loop walks memory in storage order, so accesses are spatially (and temporally) local: every byte of each fetched cache line gets used, and the hardware prefetcher can stream the following lines.
* Snippet B: Column-major order
The inner loop strides a full row ahead on each iteration, so successive accesses land on different cache lines; most of each fetched line is wasted, causing frequent cache misses and degraded performance.
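A rough way to observe the effect (a sketch: Python lists store pointers rather than contiguous elements, so the cache effect is weaker than in C and interpreter overhead dominates, but on most machines the storage-order traversal still measures faster):

```python
import time

N = 1000
a = [[1] * N for _ in range(N)]  # N x N matrix of ones

def sum_row_major(m):
    """Snippet A pattern: inner loop walks along a row (storage order)."""
    total = 0
    for i in range(N):
        for j in range(N):
            total += m[i][j]
    return total

def sum_col_major(m):
    """Snippet B pattern: inner loop jumps between rows (strided access)."""
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total

t0 = time.perf_counter(); s1 = sum_row_major(a); t1 = time.perf_counter()
s2 = sum_col_major(a);    t2 = time.perf_counter()
assert s1 == s2 == N * N   # both orders compute the same sum
print(f"row-major {t1 - t0:.3f}s, column-major {t2 - t1:.3f}s")
```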
---
#### Question : DRAM address mapping

**1. How many memory channels?**
* Channel bit: 1-bit (bit 7)
$$
2^1 = 2\ \text{memory channels}
$$
**2. How many DIMMs in each channel?**
* DIMM bit: 1-bit (bit 8)
$$
2^1 = 2\ \text{DIMMs per channel}
$$
**3. How many banks in each DIMM?**
* Bank bit: 4-bit
$$
2^4 = 16\ \text{banks per DIMM}
$$
**4. If we have a dataset of 256B sequentially accessed, which CPU address bit should be assigned for channel decoding to maximize bandwidth?**
* Answer: Assign **bit 8** for channel decoding
* Reasoning:
(1) For sequential access, we want consecutive accesses spread across both channels.
(2) 256B dataset => 2⁸ bytes: bit 8 is the lowest address bit that changes from one 256B block to the next (it toggles exactly once every 256B).
(3) To maximize bandwidth, assign that lowest-changing bit to the channel decoder, so consecutive 256B blocks go to alternate channels and both channels stay busy.
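A few lines confirm that bit 8 alternates between consecutive 256B blocks (the `channel()` helper is illustrative):

```python
CHANNEL_BIT = 8  # proposed channel-select bit

def channel(addr: int) -> int:
    """Extract the channel-select bit from a CPU address."""
    return (addr >> CHANNEL_BIT) & 1

# Bit 8 toggles once per 256 B, so consecutive 256 B chunks of a
# sequential stream land on alternate channels.
chunks = [channel(addr) for addr in range(0, 1024, 256)]
print(chunks)  # [0, 1, 0, 1]
```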