# SoC Laboratory Assignment W10

> [TOC]

## Cache

### a) Calculate cache size and tag addresses

Assume the following cache configuration:

- Cache line size = 64 Bytes
- 4-way set-associative
- 4K cache lines

1. What is the total cache size?

   64B * 4K = 256KB

2. What is the address range in the tag?

   Assume that the address width is `n` bits.

   Number of sets = 4K / 4 = 1K, so the set index takes 10 bits. A 64-byte line needs a 6-bit byte offset.

   Tag width = n - 10 - 6 = (n - 16) bits, i.e. the tag is `address[n-1:16]`.

---

### b) MESI protocol exercise

| Operation          | P1$ | P2$ | P3$ | P4$ | Memory | Bus operation             | Data supplied by |
| ------------------ | --- | --- | --- | --- | ------ | ------------------------- | ---------------- |
| Initial state of X | I   | I   | I   | I   | 0      |                           |                  |
| P1 load X          | E   | I   | I   | I   | 0      | Read-Exclusive            | Memory           |
| P2 load X          | S   | S   | I   | I   | 0      | Read                      | P1$              |
| P1 store X (1)     | M   | I   | I   | I   | 0      | Read-Exclusive-Invalidate |                  |
| P3 load X          | S   | I   | S   | I   | 1      | Read                      | P1$              |
| P3 store X (2)     | I   | I   | M   | I   | 1      | Read-Exclusive-Invalidate |                  |
| P2 load X          | I   | S   | S   | I   | 2      | Read                      | P3$              |
| P1 load X          | S   | S   | S   | I   | 2      | Read                      | P3$ or P2$       |

---

## Memory

### c) Memory system performance improvement techniques

Average memory access time (AMAT) is calculated by

```
AMAT = cache hit time + cache miss rate * cache miss penalty
```

From the equation, there are 3 ways to improve AMAT:

> **A**. Reduce hit time
>
> **B**. Increase hit rate (i.e. reduce miss rate)
>
> **C**. Reduce miss penalty

Identify which category each design technique falls into.

| Design technique                                  | Category |
| ------------------------------------------------- | -------- |
| Reduce memory access latency                      | C        |
| Higher cache associativity                        | B        |
| Processor multithreading, OoO execution           | C        |
| Bigger cache                                      | B        |
| Virtually indexed, physically tagged (VIPT) cache | A        |
| Prefetch                                          | C        |
| Non-blocking caches                               | C        |
| Critical word first                               | C        |
| Read bypass write                                 | C        |

---

### d) Access array elements in storage order

C/C++: 2D arrays are stored in row-major order.
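As a small illustration of what row-major order means for addresses, here is a minimal C sketch. The dimensions `ROWS` and `COLS` and both helper names are hypothetical, chosen only for this example. It compares the offset predicted by the row-major formula with the offset measured from a real array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small dimensions, for illustration only. */
#define ROWS 4
#define COLS 5

/* Offset (in elements) of m[i][j] from m[0][0] under row-major layout. */
size_t row_major_offset(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return i * COLS + j;
}

/* The same offset, measured from the layout of an actual 2D array. */
size_t measured_offset(size_t i, size_t j)
{
    static int m[ROWS][COLS];
    assert(i < ROWS && j < COLS);
    /* Byte distance from the start of the array, converted to elements. */
    size_t bytes = (size_t)((const char *)&m[i][j] - (const char *)m);
    return bytes / sizeof(int);
}
```

Incrementing `j` moves one element forward in memory, while incrementing `i` moves a whole row (`COLS` elements) forward.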
* A
  ```
  for (i = 0; i < 999; i++)
      for (j = 0; j < 999; j++)
          m[i][j] = m[i][j] + (m[i][j] * m[i][j]);
  ```
* B
  ```
  for (j = 0; j < 999; j++)
      for (i = 0; i < 999; i++)
          m[i][j] = m[i][j] + (m[i][j] * m[i][j]);
  ```

A is better. Its inner loop runs over `j`, so consecutive accesses touch adjacent addresses within one row. Due to spatial locality, the CPU can usually find the next element in the cache line it has just fetched. B's inner loop runs over `i`, so each access jumps a whole row ahead and tends to land on a different cache line.

---

### e) DRAM address mapping

* Given the memory address mapping
    * How many memory channels? 2
    * How many DIMMs are in each channel? 2
    * How many banks are in each DIMM? 16 (8 banks * 2 ranks)
    * Which CPU address bit should be assigned for channel decoding? `address[8]`
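
To illustrate the channel-decoding answer, here is a minimal C sketch. It assumes only what the answer states: two channels, selected by `address[8]`. The macro and function names are hypothetical, and the rest of the mapping (DIMM, rank, bank bits) is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the answer above: bit 8 selects the channel. */
#define CHANNEL_BIT 8u

/* Hypothetical helper: returns 0 or 1, the channel an address maps to. */
unsigned channel_of(uint64_t addr)
{
    return (unsigned)((addr >> CHANNEL_BIT) & 1u);
}

/* Hypothetical helper: counts how many of n_blocks consecutive blocks of
 * block_size bytes, starting at base, begin on channel ch. */
unsigned blocks_on_channel(uint64_t base, unsigned n_blocks,
                           uint64_t block_size, unsigned ch)
{
    assert(ch < 2u);
    unsigned count = 0;
    for (unsigned k = 0; k < n_blocks; k++) {
        if (channel_of(base + (uint64_t)k * block_size) == ch)
            count++;
    }
    return count;
}
```

With bit 8 as the channel select, consecutive 256-byte regions alternate between the two channels, so a long sequential access stream is spread evenly across both.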