
Cache simulation for various matrix operations

黃灝

In this project, the inputs are 32×32 integer matrices.

1. Matrix multiplication

// matrix multiplication
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        C[i][j] = 0;
        for (int k = 0; k < n; k++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}

A. Different Cache Parameters

N-Way Set-Associative Cache

Fixed Cache Parameters:

  • Cache level : 1
  • Block size : 4 bytes
  • Number of Blocks: 16
  • Block Replacement Policy: LRU
Direct Mapped (1-Way Associative)


This figure shows a direct-mapped (1-way) cache with 16 blocks, each block 4 bytes, in Ripes.


2-Way Associative


This figure shows a 2-way associative cache with 16 blocks, each block 4 bytes.


4-Way Associative


This figure shows a 4-way associative cache with 16 blocks, each block 4 bytes.

We can observe that increasing the number of ways in a cache, while keeping the number of blocks constant, can improve the hit rate. This is because a higher associativity reduces conflict misses, which occur when multiple memory blocks map to the same cache set. By transitioning from direct-mapped (1-way associative) to N-way associative caches, each set can store more memory blocks, making the cache more adaptable to diverse access patterns.

Different Block Sizes

Fixed Cache Parameters:

  • Cache level : 1
  • Placement Policy : Direct Mapped
  • Number of Blocks: 16
  • Block Replacement Policy: LRU
| Block Size | Result |
| --- | --- |
| 4 bytes | (figure unavailable) |
| 8 bytes | (figure unavailable) |
| 16 bytes | (figure unavailable) |

Increasing the block size improves the cache hit rate by leveraging spatial locality. Larger block sizes allow more adjacent data to be loaded into the cache with each memory access.

B. Loop Ordering

We know that different loop orders can lead to variations in cache performance, primarily due to their impact on data's spatial and temporal locality.

Fixed Cache Parameters:

  • Cache level : 1
  • Placement Policy : 2-way associative
  • Number of Blocks: 32
  • Block size : 16 bytes
  • Block Replacement Policy: LRU
| Loop Order | Result |
| --- | --- |
| ijk | (figure unavailable) |
| ikj | (figure unavailable) |
| kij | (figure unavailable) |
| kji | (figure unavailable) |
| jik | (figure unavailable) |
| jki | (figure unavailable) |

We can observe that ikj and kij achieve higher hit rates because they make better use of spatial and temporal locality: with j as the innermost index, the accesses to B[k][j] and C[i][j] sweep along a row, which matches row-major memory storage and reduces cache misses. Additionally, A[i][k] is invariant in the inner loop and can be held in a temporary variable.

2. Matrix transpose

// Matrix transpose
for (int i = 0; i < size; i++) {
    for (int j = 0; j < size; j++) {
        dst[j][i] = src[i][j];
    }
}

Fixed Cache Parameters:

  • Cache level : 1
  • Placement Policy : 2-way associative
  • Number of Blocks: 32
  • Block size : 16 bytes
  • Block Replacement Policy: LRU


Transpose with blocking


3. Sparse Matrix Multiplication

Sparse Matrix Multiplication (SpMM) is a specialized computation technique designed for multiplying sparse matrices, which contain a significant number of zero elements. Instead of performing operations on all elements, SpMM focuses only on the non-zero values, saving both memory and computational resources.

Matrix Multiplication


Sparse Matrix Multiplication


Sparse matrix multiplication improves efficiency by skipping zero elements, reducing computational complexity and memory accesses. It focuses resources on the non-zero elements, which significantly boosts performance.

4. General Matrix Multiplication

GEMM (General Matrix Multiplication) is an efficient matrix multiplication operation that improves cache performance by partitioning large matrices into smaller blocks. It uses vectorization and multi-threading techniques to speed up operations.

General Matrix Multiplication is defined as the operation C = αAB + βC, with A and B as matrix inputs, α and β as scalar inputs, and C as a pre-existing matrix that is overwritten by the output. A plain matrix product AB is a GEMM with α equal to one and β equal to zero.

naive


with blocking


Blocking improves cache performance by enhancing temporal and spatial locality. It divides the matrices into smaller tiles, allowing frequently accessed data to stay in the cache longer and minimizing redundant memory accesses.

Reference

github-tugbaguler/Spatial-Locality
github-iVishalr/GEMM
github-manishravula/matmul_locality
github-mortbopet/Ripes
Quiz7 of Computer Architecture (2024 Fall)
Matrix Multiplication Background User's Guide