Cache simulation and case study

葉人豪

Cache Mapping Introduction

Direct-Mapped Cache

Each memory block maps to exactly one cache line: the line index is computed with modulo arithmetic as (block address) mod (number of lines).
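
As a minimal sketch of this address arithmetic (the block size, line count, and names below are illustrative, not taken from the original note):

```c
#include <stdio.h>

/* Illustrative geometry: 16-byte blocks, 64 lines (a 1 KiB cache). */
#define BLOCK_SIZE 16
#define NUM_LINES  64

int main(void) {
    unsigned addr  = 0x1234;
    unsigned block = addr / BLOCK_SIZE;   /* which memory block */
    unsigned index = block % NUM_LINES;   /* modulo mapping to one line */
    unsigned tag   = block / NUM_LINES;   /* distinguishes blocks sharing a line */
    printf("addr=0x%x -> line=%u, tag=0x%x\n", addr, index, tag);
    return 0;
}
```

Two blocks whose addresses differ by BLOCK_SIZE * NUM_LINES map to the same line, which is exactly where the conflict misses listed below come from.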

Advantages:

  1. Simple and fast implementation.
  2. Low hardware cost.

Disadvantages:

  1. High conflict misses due to fixed mapping.
  2. Reduced performance for workloads with frequent block replacements.

Best suited to applications with predictable and regular memory access patterns.

Fully Associative Cache

Any memory block can be stored in any cache line. This design uses content-addressable memory (CAM) to check all lines simultaneously.
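
The parallel tag comparison can be modeled in software as a loop over every line; a rough sketch (the line count, types, and function name are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES 64   /* illustrative cache size */

typedef struct {
    bool     valid;
    uint32_t tag;   /* with no index field, the tag covers the whole block address */
} line_t;

/* Returns the matching line, or -1 on a miss. A hardware CAM performs all
 * NUM_LINES comparisons in parallel; this loop models them sequentially. */
int fa_lookup(const line_t cache[NUM_LINES], uint32_t block_addr) {
    for (int i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == block_addr)
            return i;
    return -1;
}
```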

Advantages:

  1. Minimal conflict misses.
  2. High flexibility in block placement.

Disadvantages:

  1. Expensive hardware due to CAM.
  2. Slower lookups for larger caches.

Best suited to performance-critical systems where minimizing misses is crucial, despite the cost.

Set-Associative Cache

A compromise between direct-mapped and fully associative designs. Memory blocks are mapped to a set of lines, and within each set, any line can hold the block.
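
A sketch of the lookup (the geometry deliberately matches the 2-way, 8-set cache of Problem 1 below; the names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 8
#define WAYS     2   /* 2-way set-associative */

typedef struct {
    bool     valid;
    uint32_t tag;
} line_t;

/* The index selects one set; only the WAYS lines of that set are searched. */
bool sa_lookup(line_t cache[NUM_SETS][WAYS], uint32_t block_addr) {
    uint32_t index = block_addr % NUM_SETS;
    uint32_t tag   = block_addr / NUM_SETS;
    for (int w = 0; w < WAYS; w++)
        if (cache[index][w].valid && cache[index][w].tag == tag)
            return true;  /* hit */
    return false;         /* miss: a replacement policy chooses the way to fill */
}
```

Note that WAYS = 1 reduces this to a direct-mapped cache and NUM_SETS = 1 to a fully associative one, so the set-associative structure subsumes both extremes.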

Advantages:

  1. Reduces conflict misses compared to direct-mapped cache.
  2. Offers a balance between hardware complexity and performance.

Disadvantages:

  1. Higher cost and complexity than direct-mapped cache.
  2. Slightly slower access compared to direct-mapped designs.

Best suited to general-purpose systems requiring a trade-off between cost and performance.

Comparison

| Feature | Direct-Mapped | Fully Associative | Set-Associative |
| --- | --- | --- | --- |
| Conflict Misses | High | Low | Moderate |
| Hardware Complexity | Low | High | Moderate |
| Access Time | Fast | Slow | Moderate |
| Cost | Low | High | Moderate |
| Use Cases | Predictable patterns | High-performance systems | General-purpose systems |

Cache Replacement Policies

Least Recently Used (LRU)

Evict the cache block that has not been accessed for the longest time, on the assumption that it is the least likely to be reused soon. A tracking mechanism such as a stack or per-block counters records the access order of the cache blocks.
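
A counter-based sketch of that bookkeeping (a 4-way set; the field and function names are illustrative, and real hardware often uses cheaper pseudo-LRU approximations):

```c
#include <stdint.h>

#define WAYS 4   /* illustrative associativity */

typedef struct {
    uint32_t tag;
    uint32_t age;    /* 0 = most recently used */
    int      valid;
} way_t;

/* Update one set on an access: on a hit the line is promoted to most
 * recent; on a miss the oldest (largest-age) line is evicted, preferring
 * any invalid line first. Returns 1 on a hit, 0 on a miss. */
int lru_access(way_t set[WAYS], uint32_t tag) {
    int hit = -1, victim = 0;
    for (int i = 0; i < WAYS; i++) {
        if (set[i].valid && set[i].tag == tag) hit = i;
        if (!set[victim].valid) continue;              /* an invalid slot wins */
        if (!set[i].valid || set[i].age > set[victim].age) victim = i;
    }
    int line = (hit >= 0) ? hit : victim;
    for (int i = 0; i < WAYS; i++) set[i].age++;       /* every line ages */
    set[line].valid = 1;
    set[line].tag   = tag;
    set[line].age   = 0;                               /* newly touched */
    return hit >= 0;
}
```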

Advantages:

  1. Reduces misses for workloads with strong temporal locality where recently accessed data is likely to be reused soon.

Disadvantages:

  1. Complex to implement, especially for high associativity caches, as it requires significant metadata to track usage.
  2. Increased overhead in managing and updating access order.

Best suited to applications with predictable data reuse patterns, such as loops in numerical computations.

Random Replacement

Evict a randomly selected cache block, using a random-number source to pick the victim.
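
A sketch of the victim selection (WAYS is an illustrative associativity):

```c
#include <stdlib.h>

#define WAYS 4   /* illustrative associativity */

/* Pick a victim way uniformly at random. Hardware would replace rand()
 * with a cheap pseudo-random source, such as a free-running counter or
 * an LFSR sampled at miss time. */
int random_victim(void) {
    return rand() % WAYS;
}
```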

Advantages:

  1. Extremely simple to implement.
  2. Low hardware overhead.

Disadvantages:

  1. Can result in suboptimal performance for workloads with temporal locality.
  2. Risk of evicting frequently used blocks.

Best suited to situations where hardware simplicity is prioritized, or where cache access patterns are unpredictable.

First-In-First-Out (FIFO)

Evict the oldest cache block in a set, irrespective of its access frequency. Maintain a queue for each cache set that tracks the order in which blocks were added, and replace the block at the front of the queue.
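
Because insertion order never changes on a hit, the "queue" collapses to a single wrapping pointer per set; a sketch (names are illustrative):

```c
#include <stdint.h>

#define WAYS 4   /* illustrative associativity */

typedef struct {
    uint32_t tags[WAYS];
    int      fill;   /* points at the oldest block: the front of the queue */
} set_t;

/* On a miss, replace the oldest block and advance the pointer.
 * Hits update nothing, which is exactly FIFO's weakness. */
int fifo_insert(set_t *s, uint32_t tag) {
    int victim = s->fill;
    s->tags[victim] = tag;
    s->fill = (s->fill + 1) % WAYS;
    return victim;
}
```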

Advantages:

  1. Simple to implement, as it doesn't require tracking access frequency.

Disadvantages:

  1. Inefficient for workloads where older blocks are still frequently accessed.

Best suited to simple systems or workloads with minimal temporal locality.

Least Frequently Used (LFU)

Evict the cache block with the fewest accesses, assuming it is less likely to be reused. Maintain a counter for each cache block to track the number of accesses.
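
A counter-based sketch of the victim selection (names are illustrative):

```c
#include <stdint.h>

#define WAYS 4   /* illustrative associativity */

typedef struct {
    uint32_t tag;
    uint32_t count;   /* accesses since the block was filled */
} way_t;

/* Evict the block with the fewest recorded accesses. Practical designs
 * periodically halve every counter ("aging") so that a block with a large
 * historical count cannot camp in the cache forever. */
int lfu_victim(const way_t set[WAYS]) {
    int victim = 0;
    for (int i = 1; i < WAYS; i++)
        if (set[i].count < set[victim].count)
            victim = i;
    return victim;
}
```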

Advantages:

  1. Effective for workloads where access frequency correlates with reuse likelihood.

Disadvantages:

  1. High overhead to maintain and update counters.
  2. Susceptible to aging issues, where old but frequently accessed blocks remain indefinitely.

Best suited to workloads with stable access patterns, such as databases or file systems.

Most Recently Used (MRU)

Evict the most recently accessed block, based on the idea that it may not be needed again soon. Similar to LRU, but in reverse: track the most recent access and evict that block.
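
Reusing the age field from the LRU sketch above, MRU simply flips the comparison (names are illustrative):

```c
#include <stdint.h>

#define WAYS 4   /* illustrative associativity */

typedef struct {
    uint32_t tag;
    uint32_t age;   /* 0 = most recently used, as in the LRU sketch */
} way_t;

/* Evict the smallest-age (most recently touched) block. This wins when a
 * loop cycles over data slightly larger than the cache, where LRU would
 * evict exactly the block that is needed next. */
int mru_victim(const way_t set[WAYS]) {
    int victim = 0;
    for (int i = 1; i < WAYS; i++)
        if (set[i].age < set[victim].age)
            victim = i;
    return victim;
}
```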

Advantages:

  1. Useful in scenarios with cyclic or non-temporal access patterns.

Disadvantages:

  1. Rarely effective for workloads with typical temporal locality.

Best suited to specialized workloads, such as cyclic scans over large data sets, where recent accesses are unlikely to be reused soon.

Comparison

| Policy | Simplicity | Overhead | Performance | Best for Workloads |
| --- | --- | --- | --- | --- |
| LRU | Medium | High | High | Strong temporal locality |
| Random | High | Low | Medium | Unpredictable access patterns |
| FIFO | High | Low | Medium | Minimal locality |
| LFU | Low | High | Medium | Stable access frequency |
| MRU | Medium | Medium | Low | Non-temporal access patterns |

Problem 1: Quiz5 Problem B, 2023 Fall

Evaluate the cache performance (2-way set-associative, 8 sets, block size of 2 words) using the given RISC-V assembly code. The program cycles through arrays A (starting at 0x130) and B (starting at 0x200), computing A[i] - B[i] and storing the result back in A[i].

# Initial register settings:
# x1 = 0 (loop index i)
# x2 = 4 (size of arrays A and B)
. = 0x0              # Code begins at address 0x0
loop:
    slli x3, x1, 2   # Convert index to byte offset
    lw x4, 0x130(x3) # Load A[i]
    lw x5, 0x200(x3) # Load B[i]
    sub x4, x4, x5   # Calculate A[i] - B[i]
    sw x4, 0x130(x3) # Store result in A[i]
    addi x1, x1, 1   # Increment index
    blt x1, x2, loop # Repeat 4 times
    unimp            # End of this program
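
One way to begin the analysis is to decompose the byte addresses for this geometry (the arithmetic below follows directly from the stated parameters): each block holds 2 words = 8 bytes, so address bits [2:0] are the block offset; with 8 sets, bits [5:3] are the set index; the remaining high bits form the tag. For example, A[0] at 0x130 lies in block 0x130 >> 3 = 38, which maps to set 38 mod 8 = 6 with tag 38 / 8 = 4, while B[0] at 0x200 lies in block 64, mapping to set 0 with tag 8. Since each block holds two words, consecutive array elements share blocks in pairs (A[0]/A[1], A[2]/A[3], and so on).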

Problem 2: Quiz5 Problem I, 2024 Fall