---
tags: CA2020
---
# Assignment4: Cache
contributed by < `kevin30292` >

## Ripes
[Cache-Simulation](https://github.com/mortbopet/Ripes/wiki/Cache-Simulation)
1. Ways: Associativity specification. Specified as a power of two (i.e. a "two-way set associative cache" will have ways=1 (2^1 = 2 ways) whereas a "direct mapped cache" will have ways=0 (2^0 = 1 way)).
2. Lines: Number of cache lines. The number of cache lines defines the size of the index used to index within the cache. Specified as a power of two.
3. Blocks: Number of blocks within each cache line. The number of blocks defines the size of the block index used to select a block within a cache line. Specified as a power of two.

## Exercise 1 - A Couple of Memory Access Scenarios
### cache.s
```
PSEUDOCODE:

int array[];

for (k = 0; k < repcount; k++) {
    /* Step through the selected array segment with the given step size. */
    for (index = 0; index < arraysize; index += stepsize) {
        if (option == 0)
            array[index] = 0;                 // Option 0: One cache access - write
        else
            array[index] = array[index] + 1;  // Option 1: Two cache accesses - read AND write
    }
}
```
### Scenario 1
Program Parameters: (set these by initializing the a registers in the code)
> Array Size (a0): 128 (bytes, known from Venus)
> Step Size (a1): 8
> Rep Count (a2): 4
> Option (a3): 0

Cache Parameters: (set these in the Cache tab)
> Cache Levels: 1
> Block Size: 8
> Number of Blocks: 4
> Enable?: Should be green
> Placement Policy: Direct Mapped
> Associativity: 1 (Venus won’t let you change this, why?)
> Block Replacement Policy: LRU

Setting these parameters and running the code gives the result below: the hit rate in this case is 0.
![](https://i.imgur.com/ZAOL1Rs.png)
#### Tasks
1. What combination of parameters is producing the hit rate you observe?
    * The step size (8 words × 4 bytes = 32 bytes) equals the total cache size (4 blocks × 8 bytes), so every access maps to the same cache index but carries a different tag. Each access therefore evicts the block brought in by the previous one (see the sketch after these tasks).
2. What is our hit rate if we increase Rep Count arbitrarily? Why?
    * No matter how large Rep Count is, the hit rate stays 0: every access still conflicts at the same index, so the block is always replaced before it can be reused.
3. How could we modify one program parameter to increase our hit rate?
    > Note: we don’t care if we access the same array elements. Just give us a program parameter modification that would increase the hit rate.
    * Change the step size to 1. Two consecutive word accesses then fall into the same 8-byte block, so every second access hits and the hit rate rises to 0.5.
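To sanity-check the arithmetic above, here is a minimal sketch of a direct-mapped cache model fed the Scenario 1 access stream. It is not Venus or Ripes; the 4-byte element size, the write-allocate behaviour on a write miss, and all constant names are assumptions. Changing `STEP` to 1 reproduces the 0.5 hit rate from Task 3.

```c
/* Rough model of Scenario 1 (assumptions: 4-byte elements, write-allocate,
 * block size given in bytes). Every access lands on set 0 with a new tag,
 * so the hit rate comes out as 0. */
#include <stdio.h>
#include <stdbool.h>

#define BLOCK_SIZE 8    /* bytes per block               */
#define NUM_BLOCKS 4    /* direct mapped: one way        */
#define ARRAY_SIZE 128  /* bytes                         */
#define STEP       8    /* step size, in 4-byte elements */
#define REPCOUNT   4

int main(void) {
    long tags[NUM_BLOCKS];
    bool valid[NUM_BLOCKS] = { false };
    long hits = 0, accesses = 0;

    for (int rep = 0; rep < REPCOUNT; rep++) {
        for (int index = 0; index < ARRAY_SIZE / 4; index += STEP) {
            long addr  = index * 4L;           /* byte address of array[index] */
            long block = addr / BLOCK_SIZE;
            long set   = block % NUM_BLOCKS;   /* cache index                  */
            long tag   = block / NUM_BLOCKS;
            accesses++;
            if (valid[set] && tags[set] == tag) {
                hits++;                        /* hit: block already resident  */
            } else {
                valid[set] = true;             /* miss: allocate the block     */
                tags[set]  = tag;
            }
        }
    }
    printf("hit rate = %.2f (%ld hits / %ld accesses)\n",
           (double)hits / accesses, hits, accesses);
    return 0;
}
```

With the parameters as given, the model counts 0 hits out of 16 accesses, matching the simulator's 0 hit rate.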
### Scenario 2
Program Parameters: (set these by initializing the a registers in the code)
> Array Size (a0): 256 (bytes)
> Step Size (a1): 2
> Rep Count (a2): 1
> Option (a3): 1

Cache Parameters: (set these in the Cache tab)
> Cache Levels: 1
> Block Size: 16
> Number of Blocks: 16
> Enable?: Should be green
> Placement Policy: N-Way Set Associative
> Associativity: 4
> Block Replacement Policy: LRU

Setting these parameters and running the code gives the result below: the hit rate in this case is 0.75.
![](https://i.imgur.com/tz3YENe.png)
#### Tasks
1. How many memory accesses are there per iteration of the inner loop? (not the one involving repcount). It’s not 1.
    * With option 1 every iteration reads the element once and writes it once, so there are 2 memory accesses per iteration.
2. What is the repeating hit/miss pattern? WHY? (Hint: it repeats every 4 accesses).
    * The first of every 4 accesses misses and the following 3 hit, and the pattern repeats. Each 16-byte block covers two of the strided elements (step size 2 words = 8 bytes), and each element is read and then written, so the compulsory miss on the first read brings in a block that serves the next three accesses.
3. This should follow very straightforwardly from the above question: Explain the hit rate in terms of the hit/miss pattern.
    * Only 1 of every 4 memory accesses misses, so the hit rate is 3/4 = 0.75.
4. Keeping everything else the same, what happens to our hit rate as Rep Count goes to infinity? Why? Try it out by changing the appropriate program parameter and letting the code run!
    * The hit rate approaches 1. The whole array (256 bytes) fits in the cache (16 blocks × 16 bytes = 256 bytes), so after the first repetition every block is already resident and all later accesses hit; only the compulsory misses of the first pass remain.
5. You should have noticed that our hit rate was pretty high for this scenario, and your answer to the previous question should give you a good understanding of why. IF YOU ARE NOT SURE WHY, consider the size of the array and compare it to the size of the cache. Now, consider the following:

### Scenario 3

## Exercise 2 - Loop Ordering and Matrix Multiplication
To multiply two matrices, we can simply use 3 nested loops, assuming that matrices A, B, and C are all n-by-n and stored in one-dimensional column-major arrays:
```c=
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            C[i+j*n] += A[i+k*n] * B[k+j*n];
```
### Tasks
1. Record your results
    * ![](https://i.imgur.com/Xg5TKkA.png)
2. Which ordering(s) perform best for these 1000-by-1000 matrices? Why?
    * The best ordering is ==jki==. Its innermost loop index is i, which walks C and A with stride 1 (and B with stride 0), giving the best locality the cache can exploit; the second-innermost index is k, which still offers better locality than j.
:::info
loop i:
* moves through C with stride 1
* moves through A with stride 1
* moves through B with stride 0

loop k:
* moves through C with stride 0
* moves through A with stride n
* moves through B with stride 1

loop j:
* moves through C with stride n
* moves through A with stride 0
* moves through B with stride n
:::
3. Which ordering(s) perform the worst? Why?
    * The worst ordering is ==ikj==. Its innermost loop index is j, which strides through both C and B by n elements, so almost every access in the innermost loop misses.
4. How does the way we stride through the matrices with respect to the innermost loop affect performance?
    * The smaller the stride in the innermost loop, the better the performance: stride-1 (or stride-0) accesses reuse the rest of each cache block that has already been fetched, while stride-n accesses touch a new block on every iteration and miss almost every time.

## Exercise 3 - Cache Blocking and Matrix Transposition
usage:
```
./transpose <n> <blocksize>
```
```
for (int i = 0; i < n; i += blocksize)
    for (int j = 0; j < n; j += blocksize)
        /* transpose one blocksize-by-blocksize tile; dst and src are assumed
           names for the destination and source matrices */
        for (int k = i; k < i + blocksize && k < n; k++)
            for (int l = j; l < j + blocksize && l < n; l++)
                dst[l + k * n] = src[k + l * n];
```
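As a usage reference, below is a self-contained sketch of a blocked-transpose benchmark built around the same `./transpose <n> <blocksize>` interface. It is not the lab's actual transpose program; the function names (`transpose_naive`, `transpose_blocked`) and the `clock()`-based timing harness are assumptions, included only to show how the tiled loop above would be driven.

```c
/* Minimal stand-in for the transpose exercise (assumed structure):
 * compare a naive transpose with a cache-blocked one.
 * Build: gcc -O2 -o transpose transpose.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void transpose_naive(int n, int *dst, const int *src) {
    for (int x = 0; x < n; x++)
        for (int y = 0; y < n; y++)
            dst[y + x * n] = src[x + y * n];
}

static void transpose_blocked(int n, int blocksize, int *dst, const int *src) {
    for (int i = 0; i < n; i += blocksize)
        for (int j = 0; j < n; j += blocksize)
            /* transpose one tile so the working set stays cache-resident */
            for (int x = i; x < i + blocksize && x < n; x++)
                for (int y = j; y < j + blocksize && y < n; y++)
                    dst[y + x * n] = src[x + y * n];
}

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <n> <blocksize>\n", argv[0]);
        return 1;
    }
    int n = atoi(argv[1]);
    int blocksize = atoi(argv[2]);
    int *src = malloc((size_t)n * n * sizeof(int));
    int *dst = malloc((size_t)n * n * sizeof(int));
    for (int i = 0; i < n * n; i++)
        src[i] = i;

    clock_t t0 = clock();
    transpose_naive(n, dst, src);
    clock_t t1 = clock();
    transpose_blocked(n, blocksize, dst, src);
    clock_t t2 = clock();

    printf("naive:   %.3f ms\n", 1000.0 * (t1 - t0) / CLOCKS_PER_SEC);
    printf("blocked: %.3f ms\n", 1000.0 * (t2 - t1) / CLOCKS_PER_SEC);
    free(src);
    free(dst);
    return 0;
}
```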