# SoC Laboratory Assignment W10
> [TOC]
## Cache
### a) Calculate cache size and tag addresses
Assume the following cache configuration
- Cache line size = 64 Bytes
- 4-way set-associative
- 4K cache lines
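1. What is the total cache size?
Total size = line size * number of lines = 64B * 4K = 256KB
2. What is the address range in the tag?
Assume that the address width is `n` bits.
Number of sets = 4K / 4 = 1K, so the set index needs log2(1K) = 10 bits.
A 64-byte line needs log2(64) = 6 offset bits.
Tag width = n - 10 - 6 = (n - 16) bits, so the tag is `address[n-1:16]`.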
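
The arithmetic above can be checked with a small sketch. The struct and function names below are hypothetical helpers introduced only for illustration, and the code assumes that the line size and the number of sets are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the cache in this exercise. */
struct cache_geometry {
    uint64_t line_bytes; /* bytes per cache line */
    uint64_t ways;       /* associativity */
    uint64_t lines;      /* total number of cache lines */
};

/* log2 of a power of two (asserts that the input really is one). */
unsigned log2_pow2(uint64_t x)
{
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned bits = 0;
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

/* Total capacity = line size * number of lines. */
uint64_t cache_total_bytes(struct cache_geometry g)
{
    return g.line_bytes * g.lines;
}

/* Number of sets = lines / ways. */
uint64_t cache_num_sets(struct cache_geometry g)
{
    return g.lines / g.ways;
}

/* Tag width = address width - index bits - offset bits. */
unsigned cache_tag_bits(struct cache_geometry g, unsigned addr_bits)
{
    unsigned offset_bits = log2_pow2(g.line_bytes);
    unsigned index_bits = log2_pow2(cache_num_sets(g));
    assert(addr_bits >= offset_bits + index_bits);
    return addr_bits - index_bits - offset_bits;
}
```

For the configuration above (`{64, 4, 4096}`), `cache_total_bytes` gives 262144 bytes (256KB), `cache_num_sets` gives 1024, and with an assumed 32-bit address `cache_tag_bits` gives 16.
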
---
### b) MESI protocol exercise
| Operations         | P1$ | P2$ | P3$ | P4$ | Memory | Bus Operation             | Data supplied by |
| ------------------ | --- | --- | --- | --- | ------ | ------------------------- | ---------------- |
| Initial state of X | I   | I   | I   | I   | 0      |                           |                  |
| P1 load X          | E   | I   | I   | I   | 0      | Read-Exclusive            | Memory           |
| P2 load X          | S   | S   | I   | I   | 0      | Read                      | P1$              |
| P1 store X (1)     | M   | I   | I   | I   | 0      | Read-Exclusive-Invalidate |                  |
| P3 load X          | S   | I   | S   | I   | 1      | Read                      | P1$              |
| P3 store X (2)     | I   | I   | M   | I   | 1      | Read-Exclusive-Invalidate |                  |
| P2 load X          | I   | S   | S   | I   | 2      | Read                      | P3$              |
| P1 load X          | S   | S   | S   | I   | 2      | Read                      | P3$ or P2$       |
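
The state transitions in the table can be replayed with a minimal sketch. It models only a single line X in four private caches and assumes that any cache holding a valid copy may supply data (as the last row of the table does). All names are hypothetical and bus operations are not modeled explicitly.

```c
#include <assert.h>

enum mesi { MESI_I = 0, MESI_S, MESI_E, MESI_M };
#define NCPU 4

struct mesi_system {
    enum mesi state[NCPU]; /* state of line X in each private cache */
    int cache_val[NCPU];   /* cached copy of X (meaningful unless state is I) */
    int mem_val;           /* value of X in main memory */
};

/* Processor p (0-based) loads X and returns the value it reads. */
int mesi_load(struct mesi_system *s, int p)
{
    assert(p >= 0 && p < NCPU);
    if (s->state[p] != MESI_I)
        return s->cache_val[p]; /* hit: no bus traffic */

    int supplier = -1;
    for (int q = 0; q < NCPU; q++) {
        if (q == p || s->state[q] == MESI_I)
            continue;
        if (s->state[q] == MESI_M)
            s->mem_val = s->cache_val[q]; /* dirty copy is written back */
        s->state[q] = MESI_S;             /* M/E/S -> S on a snooped read */
        supplier = q;
    }
    if (supplier >= 0) {
        s->cache_val[p] = s->cache_val[supplier]; /* cache-to-cache transfer */
        s->state[p] = MESI_S;
    } else {
        s->cache_val[p] = s->mem_val; /* no other copy: memory supplies */
        s->state[p] = MESI_E;
    }
    return s->cache_val[p];
}

/* Processor p (0-based) stores value to X. */
void mesi_store(struct mesi_system *s, int p, int value)
{
    assert(p >= 0 && p < NCPU);
    for (int q = 0; q < NCPU; q++) {
        if (q == p)
            continue;
        if (s->state[q] == MESI_M)
            s->mem_val = s->cache_val[q]; /* write back before invalidating */
        s->state[q] = MESI_I;             /* invalidate all other copies */
    }
    s->cache_val[p] = value;
    s->state[p] = MESI_M;
}
```

Replaying the table's sequence (P1 load, P2 load, P1 store 1, P3 load, P3 store 2, P2 load, P1 load) on a zero-initialized `struct mesi_system` ends with states S, S, S, I and `mem_val == 2`, matching the last row.
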
---
## Memory
### c) Memory system performance improvement techniques
Average memory access time (AMAT) is calculated by
```
AMAT = cache hit time + cache miss rate * cache miss penalty
```
The equation suggests three ways to improve AMAT:
> **A**. Reduce hit time
> **B**. Increase hit rate
> **C**. Reduce miss penalty

Identify which category each design technique falls into.
| Design Techniques | Category |
| -------------------------------------------- | -------- |
| Reduce memory access latency | C |
| Higher cache associativity | B |
| Processor multithreading, OoO execution | C |
| Bigger cache | B |
| Virtual indexed physical tagged (VIPT) cache | A |
| Prefetch | C |
| Non-blocking caches | C |
| Critical word first | C |
| Read bypass write | C |
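
To see how much each category can matter, here is a minimal sketch of the standard AMAT formula (hit time plus miss rate times miss penalty). The function name and the example numbers are assumptions chosen for illustration, not measurements.

```c
#include <assert.h>

/* AMAT = hit time + miss rate * miss penalty.
 * Both times must use the same unit (for example, CPU cycles). */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    assert(hit_time >= 0.0 && miss_penalty >= 0.0);
    return hit_time + miss_rate * miss_penalty;
}
```

With an assumed baseline of a 1-cycle hit, a 5% miss rate, and a 100-cycle miss penalty, `amat(1.0, 0.05, 100.0)` is 6 cycles. Halving the miss penalty (category C) gives 3.5 cycles, and lowering the miss rate to 2% (category B) gives 3 cycles.
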
---
### d) Access array elements in storage order
C/C++: 2D arrays are stored in row-major order.
* A
```
for (i = 0; i < 999; i++)
    for (j = 0; j < 999; j++)
        m[i][j] = m[i][j] + (m[i][j] * m[i][j]);
```
* B
```
for (j = 0; j < 999; j++)
    for (i = 0; i < 999; i++)
        m[i][j] = m[i][j] + (m[i][j] * m[i][j]);
```
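A is better. Its inner loop walks along a row, so consecutive accesses (`m[i][j]`, `m[i][j+1]`) are adjacent in memory. Because of spatial locality, the next element is usually already in the cache line fetched by the previous access. B jumps a whole row between consecutive accesses, so each access lands in a different cache line.
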
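
The difference can be quantified with a crude locality proxy: count how often consecutive accesses fall in a different cache line. This is only a sketch, not a cache simulation. The matrix dimension `N`, the 64-byte line size, and a line-aligned matrix are assumptions, and `line_switches` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

#define N 1000        /* matrix dimension, assumed for illustration */
#define LINE_BYTES 64 /* cache line size, assumed to match part a) */

/* Count how often consecutive accesses to an N x N row-major int matrix
 * touch a different cache line than the previous access.
 * row_outer != 0: loop order A (i outer, j inner); otherwise loop order B. */
size_t line_switches(int row_outer)
{
    size_t switches = 0;
    size_t prev_line = (size_t)-1; /* sentinel: no line touched yet */

    for (size_t outer = 0; outer < N; outer++) {
        for (size_t inner = 0; inner < N; inner++) {
            size_t i = row_outer ? outer : inner;
            size_t j = row_outer ? inner : outer;
            /* Row-major layout: m[i][j] lives at element index i * N + j. */
            size_t byte_offset = (i * N + j) * sizeof(int);
            size_t line = byte_offset / LINE_BYTES;
            if (line != prev_line) {
                switches++;
                prev_line = line;
            }
        }
    }
    return switches;
}
```

Assuming a 4-byte `int`, `line_switches(1)` (order A) is 62,500, one per cache line of the matrix, while `line_switches(0)` (order B) is 1,000,000, one per access.
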
---
### e) DRAM address mapping
* Given the memory address mapping
* How many memory channels?
2
* How many DIMMs are in each channel?
2
* How many banks are in each DIMM?
16 (8 banks * 2 ranks)
* Which CPU address bit should be assigned for channel decoding?
`address[8]`
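
As a minimal sketch of what decoding on `address[8]` means, the helper below extracts a bit field from a physical address. Only the channel bit comes from the answer above; `addr_field` and `channel_of` are hypothetical names, and the remaining fields (DIMM, rank, bank, row, column) are left out because their positions depend on the mapping figure.

```c
#include <assert.h>
#include <stdint.h>

#define CHANNEL_BIT 8 /* from the answer above: address[8] selects the channel */

/* Extract `width` bits of `addr`, starting at bit position `lsb`. */
uint64_t addr_field(uint64_t addr, unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb < 64 && lsb + width <= 64);
    uint64_t mask = (width == 64) ? ~(uint64_t)0 : (((uint64_t)1 << width) - 1);
    return (addr >> lsb) & mask;
}

/* Channel index (0 or 1) for a physical address. */
unsigned channel_of(uint64_t addr)
{
    return (unsigned)addr_field(addr, CHANNEL_BIT, 1);
}
```

With this choice, consecutive 256-byte blocks alternate between the two channels: addresses `0x000` to `0x0FF` map to channel 0, `0x100` to `0x1FF` to channel 1, and `0x200` to channel 0 again.
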