# Cache simulation and case study
> 葉人豪
## Cache Mapping Introduction
### Direct-Mapped Cache
Each memory block maps to exactly one cache line. The mapping is straightforward, typically computed with modulo arithmetic: line index = block number mod number of lines.
#### Advantages:
1. Simple and fast implementation.
1. Low hardware cost.
#### Disadvantages:
1. High conflict misses due to fixed mapping.
1. Reduced performance for workloads with frequent block replacements.
***Best suited for applications with predictable and regular memory access patterns.***
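The modulo mapping above can be sketched in a few lines. This is a minimal illustration, not a full simulator; the cache geometry (8 lines, 16-byte blocks) is a hypothetical example, not taken from the problems below.

```python
# Minimal sketch: direct-mapped line selection for a hypothetical
# cache with 8 lines and 16-byte blocks.
BLOCK_SIZE = 16
NUM_LINES = 8

def direct_mapped_index(address: int) -> int:
    """Each block maps to exactly one line via modulo arithmetic."""
    block_number = address // BLOCK_SIZE
    return block_number % NUM_LINES

print(direct_mapped_index(0x00))  # 0
print(direct_mapped_index(0x80))  # 0 -- collides with 0x00: conflict miss risk
```

Two addresses exactly `NUM_LINES * BLOCK_SIZE` bytes apart always land on the same line, which is why direct-mapped caches suffer conflict misses under fixed strides.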

### Fully Associative Cache
Any memory block can be stored in any cache line. This design uses content-addressable memory (CAM) to check all lines simultaneously.
#### Advantages:
1. Minimal conflict misses.
1. High flexibility in block placement.
#### Disadvantages:
1. Expensive hardware due to CAM.
1. Slower lookups for larger caches.
***Best suited for critical systems where minimizing misses is crucial, despite the cost.***
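In software we can only approximate the CAM's behavior with a sequential membership check; the sketch below assumes a hypothetical 16-byte block size and shows that a hit depends only on whether some line holds the tag, not on where the block sits.

```python
# Sketch: fully associative lookup. The `in` check is a software stand-in
# for the CAM, which compares the tag against every line simultaneously.
BLOCK_SIZE = 16  # hypothetical block size for illustration

def lookup(tags: list, address: int) -> bool:
    """Hit if any line holds the block's tag; placement is unrestricted."""
    tag = address // BLOCK_SIZE
    return tag in tags

tags = [0x00 // BLOCK_SIZE, 0x80 // BLOCK_SIZE]  # any block may occupy any line
print(lookup(tags, 0x84))   # True: 0x84 falls in the same block as 0x80
print(lookup(tags, 0x100))  # False: block not resident
```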

### Set-Associative Cache
A compromise between direct-mapped and fully associative designs. Memory blocks are mapped to a set of lines, and within each set, any line can hold the block.
#### Advantages:
1. Reduces conflict misses compared to direct-mapped cache.
1. Offers a balance between hardware complexity and performance.
#### Disadvantages:
1. Higher cost and complexity than direct-mapped cache.
1. Slightly slower access compared to direct-mapped designs.
***Best suited for general-purpose systems requiring a trade-off between cost and performance.***
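The address split for a set-associative cache can be sketched as follows. The geometry here (2-way, 8 sets, 8-byte blocks) deliberately matches the cache used in Problem 1 below, so the numbers can be checked against that exercise.

```python
# Sketch: address decomposition for a 2-way set-associative cache
# with 8 sets and 8-byte (2-word) blocks.
BLOCK_SIZE = 8
NUM_SETS = 8

def set_and_tag(address: int):
    """A block maps to one set (modulo), but may occupy any way in it."""
    block_number = address // BLOCK_SIZE
    return block_number % NUM_SETS, block_number // NUM_SETS

print(set_and_tag(0x130))  # (6, 4): set 6, tag 4
print(set_and_tag(0x200))  # (0, 8): set 0, tag 8
```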

### Comparison
| Feature | Direct-Mapped | Fully Associative | Set-Associative |
| -------- | -------- | -------- | -------- |
| Conflict Misses | High | Low | Moderate |
| Hardware Complexity | Low | High | Moderate |
| Access Time | Fast | Slow | Moderate |
| Cost | Low | High | Moderate |
| Use Cases | Predictable patterns | High-performance systems | General-purpose systems |
## Cache Replacement Policies
### Least Recently Used (LRU)
Evict the cache block that has not been accessed for the longest time, assuming it is the least likely to be reused soon. Maintain a tracking mechanism, such as a stack or counters, to record the access order of cache blocks.
#### Advantages:
1. Reduces misses for workloads with strong temporal locality where recently accessed data is likely to be reused soon.
#### Disadvantages:
1. Complex to implement, especially for high associativity caches, as it requires significant metadata to track usage.
1. Increased overhead in managing and updating access order.
***Best suited for applications with predictable data reuse patterns, such as loops in numerical computations.***
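The LRU bookkeeping described above can be sketched for a single set; an `OrderedDict` stands in for the hardware age-tracking stack, with the most recently used tag kept at the end.

```python
from collections import OrderedDict

# Sketch: LRU replacement within one cache set.
class LRUSet:
    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()  # insertion order = age order; LRU at front

    def access(self, tag: int) -> bool:
        if tag in self.lines:
            self.lines.move_to_end(tag)      # refresh: now most recently used
            return True                      # hit
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[tag] = True
        return False                         # miss

s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# Tag 2 was least recently used when 3 arrived, so the final access misses.
```

Note the metadata cost the text mentions: every hit updates the order, which is what makes true LRU expensive at high associativity.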
### Random Replacement
Evict a randomly selected cache block. Use a random number generator to select the victim block.
#### Advantages:
1. Extremely simple to implement.
1. Low hardware overhead.
#### Disadvantages:
1. Can result in suboptimal performance for workloads with temporal locality.
1. Risk of evicting frequently used blocks.
***Best suited for situations where hardware simplicity is prioritized, or where cache access patterns are unpredictable.***
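Random replacement needs no access-order metadata at all, as this minimal sketch for one set shows (the seeded `Random` is just for reproducibility):

```python
import random

# Sketch: random replacement within one cache set.
def access(lines: list, tag: int, ways: int, rng=random.Random(0)) -> bool:
    if tag in lines:
        return True                        # hit: nothing to update
    if len(lines) < ways:
        lines.append(tag)                  # free way available
    else:
        lines[rng.randrange(ways)] = tag   # evict a uniformly random victim
    return False                           # miss

lines = []
print([access(lines, t, ways=2) for t in [1, 2, 1]])  # [False, False, True]
```

The downside is visible in the eviction line: the victim may well be a hot block, which is why random replacement underperforms LRU on workloads with temporal locality.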
### First-In-First-Out (FIFO)
Evict the oldest cache block in a set, irrespective of its access frequency. Maintain a queue for each cache set to track the order in which blocks were added.
Replace the block at the front of the queue.
#### Advantages:
1. Simple to implement, as it doesn't require tracking access frequency.
#### Disadvantages:
1. Inefficient for workloads where older blocks are still frequently accessed.
***Best suited for simple systems or workloads with minimal temporal locality.***
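The FIFO queue per set can be sketched with a `deque`; the key contrast with LRU is that a hit does not refresh a block's position.

```python
from collections import deque

# Sketch: FIFO replacement within one cache set.
def access(queue: deque, tag: int, ways: int) -> bool:
    if tag in queue:
        return True            # hit: order unchanged (unlike LRU)
    if len(queue) == ways:
        queue.popleft()        # evict the oldest block, regardless of use
    queue.append(tag)
    return False               # miss

q = deque()
print([access(q, t, ways=2) for t in [1, 2, 1, 3, 1]])
# The final access to 1 misses: 1 was oldest and evicted by 3,
# even though it was just used. LRU would have kept it and hit.
```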
### Least Frequently Used (LFU)
Evict the cache block with the fewest accesses, assuming it is less likely to be reused. Maintain a counter for each cache block to track the number of accesses.
#### Advantages:
1. Effective for workloads where access frequency correlates with reuse likelihood.
#### Disadvantages:
1. High overhead to maintain and update counters.
1. Susceptible to aging issues: blocks that were heavily accessed in the past can remain resident indefinitely, even after they stop being used.
***Best suited for workloads with stable access patterns, such as databases or file systems.***
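The per-block counters can be sketched with a dictionary mapping each resident tag to its access count; the victim is the tag with the smallest count.

```python
# Sketch: LFU replacement within one cache set.
def access(counts: dict, tag: int, ways: int) -> bool:
    if tag in counts:
        counts[tag] += 1                       # bump the access counter
        return True                            # hit
    if len(counts) == ways:
        victim = min(counts, key=counts.get)   # fewest accesses loses
        del counts[victim]
    counts[tag] = 1
    return False                               # miss

c = {}
print([access(c, t, ways=2) for t in [1, 1, 2, 3]])
# Tag 2 (count 1) is evicted, not the twice-used tag 1.
```

The aging problem is also visible here: tag 1's high count protects it forever, even if it is never touched again.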
### Most Recently Used (MRU)
Evict the most recently accessed block, based on the idea that it may not be needed again soon. Similar to LRU, but in reverse: track the most recent access and evict that block first.
#### Advantages:
1. Useful in scenarios with cyclic or non-temporal access patterns.
#### Disadvantages:
1. Rarely effective for workloads with typical temporal locality.
***Best suited for specialized workloads where recently accessed data is unlikely to be reused soon.***
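MRU can be sketched by inverting LRU's eviction choice; the example uses a cyclic pattern, the one case where MRU beats LRU.

```python
# Sketch: MRU replacement within one cache set.
def access(lines: list, tag: int, ways: int) -> bool:
    if tag in lines:
        lines.remove(tag)
        lines.append(tag)      # most recently used goes last
        return True            # hit
    if len(lines) == ways:
        lines.pop()            # evict the MOST recently used line
    lines.append(tag)
    return False               # miss

m = []
# Cyclic pattern 1, 2, 3, 1 on a 2-way set: MRU keeps tag 1 resident
# and hits on the final access, where LRU would have evicted it.
print([access(m, t, ways=2) for t in [1, 2, 3, 1]])
```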
### Comparison
|Policy|Simplicity|Overhead|Performance|Best for Workloads|
|----|----|----|----|----|
|LRU|Medium|High|High|Strong temporal locality|
|Random|High|Low|Medium|Unpredictable access patterns|
|FIFO|High|Low|Medium|Minimal locality|
|LFU|Low|High|Medium|Stable access frequency|
|MRU|Medium|Medium|Low|Non-temporal access patterns|
## Problem 1 : [Quiz5 Problem B 2023 Fall](https://hackmd.io/@sysprog/arch2023-quiz5-sol#Problem-B)
Evaluate the cache performance (2-way set associative, 8 sets, block size of 2 words) using the given RISC-V assembly code. This program cycles through arrays A (starting at ```0x130```) and B (starting at ```0x200```), performing the operation ```A[i] - B[i]``` and storing the result back in ```A[i]```.
```asm
# Initial register settings:
# x1 = 0 (loop index i)
# x2 = 4 (size of arrays A and B)
. = 0x0 # Code begins at address 0x0
loop:
slli x3, x1, 2 # Convert index to byte offset
lw x4, 0x130(x3) # Load A[i]
lw x5, 0x200(x3) # Load B[i]
sub x4, x4, x5 # Calculate A[i] - B[i]
sw x4, 0x130(x3) # Store result in A[i]
addi x1, x1, 1 # Increment index
blt x1, x2, loop # Repeat 4 times
unimp # End of this program
```
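A small simulator can replay the loop's data accesses on the stated cache. This sketch assumes a data cache only (instruction fetches ignored), LRU replacement within each set, and a write-allocate policy; these assumptions are mine, not stated in the problem text above.

```python
# Sketch: replay the loop's data accesses (lw A[i]; lw B[i]; sw A[i])
# on a 2-way set-associative cache with 8 sets and 2-word (8-byte) blocks,
# assuming LRU replacement and write-allocate, ignoring instruction fetches.
BLOCK_SIZE, NUM_SETS, WAYS = 8, 8, 2

sets = [[] for _ in range(NUM_SETS)]   # each set: list of tags, LRU at front

def access(address: int) -> bool:
    block = address // BLOCK_SIZE
    idx, tag = block % NUM_SETS, block // NUM_SETS
    lines = sets[idx]
    if tag in lines:
        lines.remove(tag)
        lines.append(tag)              # refresh on hit
        return True
    if len(lines) == WAYS:
        lines.pop(0)                   # evict the LRU tag
    lines.append(tag)                  # allocate on read and write misses
    return False

hits = misses = 0
for i in range(4):                     # loop body: lw A[i]; lw B[i]; sw A[i]
    for addr in (0x130 + 4 * i, 0x200 + 4 * i, 0x130 + 4 * i):
        if access(addr):
            hits += 1
        else:
            misses += 1

print(hits, misses)                    # 8 hits, 4 misses over 12 accesses
```

Under these assumptions only the four compulsory block fills miss (A and B each span two 8-byte blocks, and they map to different sets, so there are no conflicts): each `lw A[i]` at an even `i` and each `lw B[i]` at an even `i` misses, while every store and every odd-index access hits the freshly filled block.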
## Problem 2 : [Quiz5 Problem I 2024 Fall](https://hackmd.io/@sysprog/arch2024-quiz5-sol?stext=29883%3A10%3A0%3A1736592838%3AGtSXJJ)