**Direct-Mapped Cache.** Each memory block maps to exactly one cache line. The mapping is straightforward, typically computed with modulo arithmetic on the block address.
Best suited for applications with predictable, regular memory access patterns.
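A minimal C sketch of the index and tag calculation; the `BLOCK_SIZE` and `NUM_LINES` values are illustrative placeholders, not taken from any particular cache:

```c
#include <stdint.h>

#define BLOCK_SIZE 64   /* bytes per cache line; illustrative value */
#define NUM_LINES  256  /* total cache lines; illustrative value    */

/* The block number modulo the number of lines picks exactly one
 * candidate line; the remaining high bits form the tag. */
static inline uint32_t dm_index(uint32_t addr) {
    return (addr / BLOCK_SIZE) % NUM_LINES;
}

static inline uint32_t dm_tag(uint32_t addr) {
    return (addr / BLOCK_SIZE) / NUM_LINES;
}
```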
**Fully Associative Cache.** Any memory block can be stored in any cache line. This design uses content-addressable memory (CAM) to compare all tags simultaneously.
Best suited for critical systems where minimizing misses is crucial, despite the hardware cost.
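Software cannot reproduce the parallel CAM comparison, but a linear scan over all lines shows the same lookup logic. A sketch with an illustrative `NUM_LINES`:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 256  /* illustrative */

struct line { bool valid; uint32_t tag; };

/* Software stand-in for the CAM: hardware compares every tag in
 * parallel, since a block may be found in (or placed into) any line. */
static int fa_lookup(struct line cache[NUM_LINES], uint32_t tag) {
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return i;  /* hit: the block lives in line i */
    return -1;         /* miss: the block is not cached */
}
```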
**Set-Associative Cache.** A compromise between the direct-mapped and fully associative designs: each memory block maps to one set of lines, and within that set the block may occupy any line.
Best suited for general-purpose systems that need a trade-off between cost and performance.
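A sketch combining the two ideas, assuming illustrative `NUM_SETS` and `BLOCK_SIZE` values and a 2-way design: the set index is fixed by the address, then the ways of that set are searched.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 64  /* illustrative */
#define NUM_SETS   64  /* illustrative */
#define WAYS        2  /* 2-way set-associative */

struct line { bool valid; uint32_t tag; };

/* The set index is fixed by the address (as in direct mapping),
 * but within the set the block may sit in any of the WAYS lines
 * (as in full associativity). */
static int sa_lookup(struct line cache[NUM_SETS][WAYS], uint32_t addr) {
    uint32_t block = addr / BLOCK_SIZE;
    uint32_t set   = block % NUM_SETS;
    uint32_t tag   = block / NUM_SETS;
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return w;  /* hit in way w of the selected set */
    return -1;         /* miss: block not in its set */
}
```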
| Feature | Direct-Mapped | Fully Associative | Set-Associative |
|---|---|---|---|
| Conflict Misses | High | Low | Moderate |
| Hardware Complexity | Low | High | Moderate |
| Access Time | Fast | Slow | Moderate |
| Cost | Low | High | Moderate |
| Use Cases | Predictable patterns | High-performance systems | General-purpose systems |
**LRU (Least Recently Used).** Evict the cache block that has not been accessed for the longest time, on the assumption that it is the least likely to be reused soon. Maintain a tracking mechanism, such as a stack or per-block counters, to record the access order of cache blocks.
Best suited for applications with predictable data reuse patterns, such as loops in numerical computations.
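A minimal victim-selection sketch, assuming each way carries a timestamp of its last access; the `WAYS` value and `last_access` array are illustrative:

```c
#include <stdint.h>

#define WAYS 4  /* illustrative associativity */

/* Each way records the "time" of its last access; the smallest
 * timestamp marks the least recently used block. */
static int lru_victim(const uint64_t last_access[WAYS]) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (last_access[w] < last_access[victim])
            victim = w;
    return victim;
}
/* On every hit or fill: last_access[way] = ++global_clock; */
```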
**Random.** Evict a randomly selected cache block, using a (pseudo-)random number generator to choose the victim.
Best suited for situations where hardware simplicity is the priority, or where cache access patterns are unpredictable.
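A sketch of the same idea; real hardware would typically use something like an LFSR rather than `rand()`:

```c
#include <stdlib.h>

#define WAYS 4  /* illustrative associativity */

/* No per-block bookkeeping at all: any way is a valid victim. */
static int random_victim(void) {
    return rand() % WAYS;
}
```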
**FIFO (First-In, First-Out).** Evict the oldest cache block in a set, irrespective of how often it has been accessed. Maintain a queue per cache set to track the order in which blocks were filled, and replace the block at the front of the queue.
Best suited for workloads with minimal locality, where tracking access recency buys little.
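Since fills happen in round-robin order, a single per-set insertion pointer can stand in for the queue. A sketch with an illustrative `WAYS` value:

```c
#define WAYS 4  /* illustrative associativity */

/* Blocks are filled in round-robin order, so one pointer per set
 * always names the oldest block (the front of the FIFO queue). */
static int fifo_victim(int *next_fill) {
    int victim = *next_fill;           /* oldest block in the set */
    *next_fill = (victim + 1) % WAYS;  /* the new fill becomes the newest */
    return victim;
}
```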
**LFU (Least Frequently Used).** Evict the cache block with the fewest accesses, on the assumption that it is less likely to be reused. Maintain a counter per cache block to track its number of accesses.
Best suited for workloads with stable access patterns, such as databases or file systems.
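A victim-selection sketch, assuming a per-way access counter (names illustrative):

```c
#include <stdint.h>

#define WAYS 4  /* illustrative associativity */

/* Each way carries a counter incremented on every access;
 * the smallest count marks the least frequently used block. */
static int lfu_victim(const uint32_t access_count[WAYS]) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (access_count[w] < access_count[victim])
            victim = w;
    return victim;
}
```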
**MRU (Most Recently Used).** Evict the most recently accessed block, on the idea that the block just used may not be needed again soon. Similar to LRU, but in reverse: track the most recent access and evict that block.
Best suited for specialized workloads where recently accessed data is unlikely to be reused soon.
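The same timestamp bookkeeping as the LRU sketch above, with the comparison reversed:

```c
#include <stdint.h>

#define WAYS 4  /* illustrative associativity */

/* Identical bookkeeping to LRU, but the LARGEST timestamp
 * (the most recent access) picks the victim. */
static int mru_victim(const uint64_t last_access[WAYS]) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (last_access[w] > last_access[victim])
            victim = w;
    return victim;
}
```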
| Policy | Simplicity | Overhead | Performance | Best for Workloads |
|---|---|---|---|---|
| LRU | Medium | High | High | Strong temporal locality |
| Random | High | Low | Medium | Unpredictable access patterns |
| FIFO | High | Low | Medium | Minimal locality |
| LFU | Low | High | Medium | Stable access frequency |
| MRU | Medium | Medium | Low | Non-temporal access patterns |
Evaluate the cache performance (2-way set-associative, 8 sets, block size of 2 words) for the RISC-V assembly code below. The program cycles through arrays A (starting at 0x130) and B (starting at 0x200), computing `A[i] - B[i]` and storing the result back in `A[i]`.
# Initial register settings:
# x1 = 0 (loop index i)
# x2 = 4 (number of elements in arrays A and B)
. = 0x0 # Code begins at address 0x0
loop:
slli x3, x1, 2 # Convert index to byte offset
lw x4, 0x130(x3) # Load A[i]
lw x5, 0x200(x3) # Load B[i]
sub x4, x4, x5 # Calculate A[i] - B[i]
sw x4, 0x130(x3) # Store result in A[i]
addi x1, x1, 1 # Increment index
blt x1, x2, loop # Repeat 4 times
unimp # End of this program
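One way to replay this access pattern is the C sketch below of the 2-way, 8-set, 8-byte-block cache. It assumes the cache starts empty, stores are counted like loads (write-allocate), and a miss fills the first free way; this stream never forces an eviction, and none of these details are restated from the original exercise.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SETS  8
#define WAYS  2
#define BLOCK 8  /* 2 words x 4 bytes per block */

struct line { bool valid; uint32_t tag; };

static struct line cache[SETS][WAYS];
static int hits, misses;

/* One data access: look the block up; on a miss, fill the first
 * free way (no eviction is ever needed for this access stream). */
static void cache_access(uint32_t addr) {
    uint32_t block = addr / BLOCK;
    uint32_t set   = block % SETS;
    uint32_t tag   = block / SETS;
    for (int w = 0; w < WAYS; w++)
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            hits++;
            return;
        }
    misses++;
    for (int w = 0; w < WAYS; w++)
        if (!cache[set][w].valid) {
            cache[set][w] = (struct line){ true, tag };
            return;
        }
}

int main(void) {
    for (uint32_t i = 0; i < 4; i++) {
        cache_access(0x130 + 4 * i);  /* lw A[i] */
        cache_access(0x200 + 4 * i);  /* lw B[i] */
        cache_access(0x130 + 4 * i);  /* sw A[i] */
    }
    printf("hits=%d misses=%d\n", hits, misses);
    return 0;
}
```

Under these assumptions, array A falls in blocks 38 and 39 (sets 6 and 7) while array B falls in blocks 64 and 65 (sets 0 and 1), so the two arrays never conflict, and the sketch reports 8 hits and 4 misses: one cold miss per 8-byte block.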