# Cache simulation and case study

> 葉人豪

## Cache Mapping Introduction

### Direct-Mapped Cache
Each memory block maps to exactly one cache line. The mapping is straightforward, typically computed with modulo arithmetic: line = (block address) mod (number of lines).

#### Advantages:
1. Simple and fast implementation.
1. Low hardware cost.

#### Disadvantages:
1. High conflict misses due to the fixed mapping.
1. Reduced performance for workloads with frequent block replacements.

**Best suited for applications with predictable and regular memory access patterns.**

![Screenshot 2025-01-22 1.25.20 AM](https://hackmd.io/_uploads/ByTtd8TD1g.png)

### Fully Associative Cache
Any memory block can be stored in any cache line. This design uses content-addressable memory (CAM) to check all lines simultaneously.

#### Advantages:
1. Minimal conflict misses.
1. High flexibility in block placement.

#### Disadvantages:
1. Expensive hardware due to CAM.
1. Slower lookups for larger caches.

***Best suited for critical systems where minimizing misses is crucial, despite the cost.***

![Screenshot 2025-01-22 1.49.22 AM](https://hackmd.io/_uploads/r1g4RUpwJx.png)

### Set-Associative Cache
A compromise between direct-mapped and fully associative designs. Memory blocks are mapped to a set of lines, and within each set, any line can hold the block.

#### Advantages:
1. Reduces conflict misses compared to a direct-mapped cache.
1. Offers a balance between hardware complexity and performance.

#### Disadvantages:
1. Higher cost and complexity than a direct-mapped cache.
1. Slightly slower access than direct-mapped designs.

***Best suited for general-purpose systems requiring a trade-off between cost and performance.***

![Screenshot 2025-01-22 1.37.02 AM](https://hackmd.io/_uploads/rJFHi8Tvkl.png)

### Comparison

| Feature | Direct-Mapped | Fully Associative | Set-Associative |
| -------- | -------- | -------- | -------- |
| Conflict Misses | High | Low | Moderate |
| Hardware Complexity | Low | High | Moderate |
| Access Time | Fast | Slow | Moderate |
| Cost | Low | High | Moderate |
| Use Cases | Predictable patterns | High-performance systems | General-purpose systems |

## Cache Replacement Policies

### Least Recently Used (LRU)
Evict the cache block that has not been accessed for the longest time, on the assumption that it is the least likely to be reused soon. Maintain a tracking mechanism, such as a stack or per-block counters, to record the access order of cache blocks.

#### Advantages:
1. Reduces misses for workloads with strong temporal locality, where recently accessed data is likely to be reused soon.

#### Disadvantages:
1. Complex to implement, especially for highly associative caches, as it requires significant metadata to track usage.
1. Increased overhead in managing and updating the access order.

***Use for applications with predictable data reuse patterns, such as loops in numerical computations.***

### Random Replacement
Evict a randomly selected cache block. Use a random number generator to select the victim block.

#### Advantages:
1. Extremely simple to implement.
1. Low hardware overhead.

#### Disadvantages:
1. Can result in suboptimal performance for workloads with temporal locality.
1. Risk of evicting frequently used blocks.

***Use in situations where hardware simplicity is prioritized, or when cache access patterns are unpredictable.***

### First-In-First-Out (FIFO)
Evict the oldest cache block in a set, irrespective of its access frequency. Maintain a queue for each cache set to track the order in which blocks were added, and replace the block at the front of the queue.

#### Advantages:
1. Simple to implement, as it doesn't require tracking access frequency.

#### Disadvantages:
1. Inefficient for workloads where older blocks are still frequently accessed.

***Use in simple systems or for workloads with minimal temporal locality.***

### Least Frequently Used (LFU)
Evict the cache block with the fewest accesses, assuming it is less likely to be reused. Maintain a counter for each cache block to track the number of accesses.

#### Advantages:
1. Effective for workloads where access frequency correlates with reuse likelihood.

#### Disadvantages:
1. High overhead to maintain and update counters.
1. Susceptible to aging issues, where blocks that were frequently accessed in the past remain indefinitely.

***Use for workloads with stable access patterns, such as databases or file systems.***

### Most Recently Used (MRU)
Evict the most recently accessed block, based on the idea that it may not be needed again soon. Similar to LRU, but in reverse: track the most recent access and evict that block.

#### Advantages:
1. Useful in scenarios with cyclic or non-temporal access patterns.

#### Disadvantages:
1. Rarely effective for workloads with typical temporal locality.

***Use for specialized workloads where recent accesses are unlikely to be reused soon.***

### Comparison

|Policy|Simplicity|Overhead|Performance|Best for Workloads|
|----|----|----|----|----|
|LRU|Medium|High|High|Strong temporal locality|
|Random|High|Low|Medium|Unpredictable access patterns|
|FIFO|High|Low|Medium|Minimal locality|
|LFU|Low|High|Medium|Stable access frequency|
|MRU|Medium|Medium|Low|Non-temporal access patterns|
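To make the mapping and replacement ideas above concrete, here is a minimal C sketch (not part of the original notes) of a set-associative lookup with LRU replacement, using access timestamps as the tracking mechanism. The geometry constants `NUM_SETS`, `NUM_WAYS`, and `BLOCK_SIZE` are illustrative choices: direct-mapped is simply the `NUM_WAYS = 1` case, and fully associative is the `NUM_SETS = 1` case.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative geometry: 2-way set-associative, 8 sets, 8-byte blocks. */
#define NUM_SETS   8
#define NUM_WAYS   2
#define BLOCK_SIZE 8   /* bytes per block (2 words) */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint64_t last_used;  /* timestamp for LRU tracking */
} line_t;

static line_t   cache[NUM_SETS][NUM_WAYS];
static uint64_t now;    /* global access counter */

/* Returns true on a hit, false on a miss (filling the line either way). */
bool access_cache(uint32_t addr)
{
    uint32_t block = addr / BLOCK_SIZE;
    uint32_t index = block % NUM_SETS;   /* set selection via modulo */
    uint32_t tag   = block / NUM_SETS;   /* remaining high-order bits */
    line_t  *set   = cache[index];
    now++;

    /* Check every way in the selected set (the CAM-style comparison). */
    for (int w = 0; w < NUM_WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {
            set[w].last_used = now;      /* LRU bookkeeping on a hit */
            return true;
        }
    }

    /* Miss: pick an invalid way if one exists, else the LRU way. */
    int victim = 0;
    for (int w = 1; w < NUM_WAYS; w++)
        if (!set[w].valid || set[w].last_used < set[victim].last_used)
            victim = w;

    set[victim].valid     = true;
    set[victim].tag       = tag;
    set[victim].last_used = now;
    return false;
}
```

Only the victim-selection loop needs to change to model the other policies: for example, choosing `rand() % NUM_WAYS` gives random replacement, and recording an insertion timestamp instead of an access timestamp gives FIFO.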
## Problem 1 : [Quiz5 Problem B 2023 Fall](https://hackmd.io/@sysprog/arch2023-quiz5-sol#Problem-B)

Evaluate the cache performance (2-way set-associative, 8 sets, block size of 2 words) using the given RISC-V assembly code. This program cycles through arrays A (starting at `0x130`) and B (starting at `0x200`), performing the operation `A[i] - B[i]` and storing the result back in `A[i]`.

```asm
# Initial register settings:
#   x1 = 0 (loop index i)
#   x2 = 4 (size of arrays A and B)
. = 0x0                     # Code begins at address 0x0
loop:
    slli x3, x1, 2          # Convert index to byte offset
    lw   x4, 0x130(x3)      # Load A[i]
    lw   x5, 0x200(x3)      # Load B[i]
    sub  x4, x4, x5         # Calculate A[i] - B[i]
    sw   x4, 0x130(x3)      # Store result in A[i]
    addi x1, x1, 1          # Increment index
    blt  x1, x2, loop       # Repeat 4 times
    unimp                   # End of this program
```

## Problem 2 : [Quiz5 Problem I 2024 Fall](https://hackmd.io/@sysprog/arch2024-quiz5-sol?stext=29883%3A10%3A0%3A1736592838%3AGtSXJJ)
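As a cross-check for hand-working Problem 1 (and a starting point for experiments like Problem 2), note that with 8-byte blocks (2 words) and 8 sets, a byte address splits into a 3-bit block offset, a 3-bit set index, and the remaining tag bits; for example, `0x130` is block number `0x26`, which lands in set 6 with tag 4. The hypothetical driver below reuses `access_cache()` from the earlier sketch (append it to that file, whose geometry already matches this problem) and replays only the loop's data accesses; whether instruction fetches also share the cache depends on the problem's setup.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical driver for Problem 1: replays the data accesses
 * lw A[i], lw B[i], sw A[i] for i = 0..3 through access_cache(). */
int main(void)
{
    const uint32_t A = 0x130, B = 0x200;
    int hits = 0, accesses = 0;

    for (uint32_t i = 0; i < 4; i++) {
        /* One iteration touches A[i] twice (load, then store) and B[i] once. */
        uint32_t trace[3] = { A + 4 * i, B + 4 * i, A + 4 * i };
        for (int k = 0; k < 3; k++) {
            bool hit = access_cache(trace[k]);
            printf("addr 0x%03x -> %s\n", (unsigned)trace[k],
                   hit ? "hit" : "miss");
            hits += hit;
            accesses++;
        }
    }
    printf("hit rate: %d/%d\n", hits, accesses);
    return 0;
}
```

With this model and an initially empty cache, each array touches two blocks (A maps to sets 6 and 7, B to sets 0 and 1, so they never conflict), giving 4 compulsory misses out of 12 data accesses.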