Cache simulation for various matrix operations
黃灝
In this project, the inputs are 32×32 integer matrices.
1. Matrix multiplication
A. Different Cache Parameters
N-Way Set-Associative Cache
Fixed Cache Parameters:
- Cache level : 1
- Block size : 4 bytes
- Number of Blocks: 16
- Block Replacement Policy: LRU
Direct-Mapped (1-Way Set-Associative)
(screenshot unavailable)
The cache is configured as direct-mapped (1-way) with 16 blocks, each 4 bytes (simulated in Ripes).
(screenshot unavailable)
2-Way Set-Associative
(screenshot unavailable)
The cache is configured as 2-way set-associative with 16 blocks, each 4 bytes.
(screenshot unavailable)
4-Way Set-Associative
(screenshot unavailable)
The cache is configured as 4-way set-associative with 16 blocks, each 4 bytes.
(screenshot unavailable)
We can observe that increasing the number of ways in a cache, while keeping the number of blocks constant, can improve the hit rate. This is because a higher associativity reduces conflict misses, which occur when multiple memory blocks map to the same cache set. By transitioning from direct-mapped (1-way associative) to N-way associative caches, each set can store more memory blocks, making the cache more adaptable to diverse access patterns.
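The mapping behind conflict misses can be made concrete with a small sketch (an illustration of the indexing rule, not Ripes source code; the function name is made up):

```c
/* Sketch: the set index an address maps to, for a cache with
   num_blocks blocks of block_size bytes and the given associativity.
   Parameters mirror the fixed configuration above (16 blocks of
   4 bytes); function and parameter names are illustrative. */
unsigned set_index(unsigned addr, unsigned block_size,
                   unsigned num_blocks, unsigned ways) {
    unsigned num_sets = num_blocks / ways;
    return (addr / block_size) % num_sets;
}
```

With 16 direct-mapped 4-byte blocks the cache spans 64 bytes, so addresses 0 and 64 map to the same set and keep evicting each other. In a 2-way cache the two addresses still index the same set, but the set now holds both blocks, so the conflict miss disappears.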
Different Block Sizes
Fixed Cache Parameters:
- Cache level : 1
- Placement Policy : Direct Mapped
- Number of Blocks: 16
- Block Replacement Policy: LRU
| Block Size | Result |
| ---------- | ------ |
| 4 bytes | (screenshot unavailable) |
| 8 bytes | (screenshot unavailable) |
| 16 bytes | (screenshot unavailable) |
Increasing the block size improves the cache hit rate by exploiting spatial locality: each miss loads a larger block, so more adjacent elements are already cached by the time they are accessed.
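As a back-of-the-envelope check (a sketch of the cold-cache arithmetic, not the simulator's exact model), for a sequential scan each miss fetches one whole block, so the miss count shrinks in proportion to the block size:

```c
/* Sketch: cold-cache miss count for a sequential scan of n 4-byte
   ints, assuming each miss loads one whole block of block_size bytes
   and the scan starts at a block boundary. Name is illustrative. */
unsigned seq_scan_misses(unsigned n, unsigned block_size) {
    unsigned ints_per_block = block_size / 4;
    return (n + ints_per_block - 1) / ints_per_block;  /* ceiling */
}
```

Scanning the 1024 ints of a 32×32 matrix gives 1024 misses with 4-byte blocks, 512 with 8-byte blocks, and 256 with 16-byte blocks, matching the trend in the table.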
B. Loop Ordering
Different loop orders can lead to large variations in cache performance, primarily through their effect on the spatial and temporal locality of data accesses.
Fixed Cache Parameters:
- Cache level : 1
- Placement Policy : 2-Way Set-Associative
- Number of Blocks: 32
- Block size : 16 bytes
- Block Replacement Policy: LRU
| Loop order | Result |
| ---------- | ------ |
| ijk | (screenshot unavailable) |
| ikj | (screenshot unavailable) |
| kij | (screenshot unavailable) |
| kji | (screenshot unavailable) |
| jik | (screenshot unavailable) |
| jki | (screenshot unavailable) |
We can observe that ikj and kij achieve higher hit rates because they use spatial and temporal locality more effectively. In these orders the innermost loop scans B[k][j] and C[i][j] row-wise, matching row-major storage and reducing cache misses, while the value of A[i][k] is held in a temporary variable across the inner loop.
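The ikj ordering can be sketched as follows (the function name and the explicit temporary for A[i][k] are illustrative, but the loop structure is the one being measured):

```c
#define N 32  /* matrix dimension used throughout this project */

/* Sketch of the ikj loop order for C = A * B with row-major storage.
   A[i][k] is hoisted into a scalar, and the innermost j loop walks
   B[k][.] and C[i][.] row-wise, so consecutive accesses fall in the
   same cache block. */
void matmul_ikj(const int A[N][N], const int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0;
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            int a = A[i][k];             /* temporal locality: reused for all j */
            for (int j = 0; j < N; j++)
                C[i][j] += a * B[k][j];  /* spatial locality: row-wise scan */
        }
}
```

In contrast, ijk and jik read B column-wise in the innermost loop, touching a new cache block on every iteration.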
2. Matrix transpose
Fixed Cache Parameters:
- Cache level : 1
- Placement Policy : 2-Way Set-Associative
- Number of Blocks: 32
- Block size : 16 bytes
- Block Replacement Policy: LRU
Naive transpose
(screenshot unavailable)
Transpose with blocking
(screenshot unavailable)
3. Sparse Matrix Multiplication
Sparse Matrix Multiplication (SpMM) is a specialized computation technique designed for multiplying sparse matrices, which contain a significant number of zero elements. Instead of performing operations on all elements, SpMM focuses only on the non-zero values, saving both memory and computational resources.
Matrix Multiplication
(screenshot unavailable)
Sparse Matrix Multiplication
(screenshot unavailable)
Sparse matrix multiplication improves efficiency by skipping zero elements, reducing both computational work and memory accesses. It focuses resources on the non-zero elements, which significantly boosts performance.
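One common realization stores the sparse operand in CSR (compressed sparse row) form. The sketch below multiplies a CSR matrix A by a dense row-major B; row_ptr, col_idx, and vals are the conventional CSR arrays, and the function name is illustrative:

```c
/* Sketch of SpMM: C = A * B, where A (n x n) is stored in CSR form
   and B, C are dense n x n row-major arrays. Only the non-zero
   entries of A are visited; each one contributes a scaled row of B
   to a row of C, keeping the inner loop cache-friendly. */
void spmm_csr(int n,
              const int *row_ptr, const int *col_idx, const int *vals,
              const int *B, int *C) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            C[i * n + j] = 0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; p++) {
            int k = col_idx[p];          /* column of the non-zero */
            int a = vals[p];             /* its value */
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];  /* row-wise access */
        }
    }
}
```

The work is proportional to the number of non-zeros times n, rather than n³, which is where the savings over the dense kernel come from.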
4. General Matrix Multiplication
GEMM (General Matrix Multiplication) is an efficient matrix multiplication operation that improves cache performance by partitioning large matrices into smaller blocks. It uses vectorization and multi-threading techniques to speed up operations.
General Matrix Multiplication is defined as the operation C = αAB + βC, with A and B as matrix inputs, α and β as scalar inputs, and C as a pre-existing matrix which is overwritten by the output. A plain matrix product AB is a GEMM with α equal to one and β equal to zero.
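In code, the definition corresponds to a triple loop like the following (a sketch; the function name and the integer element type, matching this project's 32×32 int matrices, are assumptions):

```c
#define N 32

/* Sketch of GEMM, C = alpha*A*B + beta*C, as a naive triple loop.
   With alpha = 1 and beta = 0 it reduces to the plain product A*B. */
void gemm_naive(int alpha, int beta,
                const int A[N][N], const int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += A[i][k] * B[k][j];
            C[i][j] = alpha * acc + beta * C[i][j];
        }
}
```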
Naive
(screenshot unavailable)
With blocking
(screenshot unavailable)
Blocking improves cache performance by enhancing temporal and spatial locality. It divides the matrices into smaller tiles, allowing frequently accessed data to stay in the cache longer and minimizing redundant memory accesses.
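A blocked kernel can be sketched as follows (the tile size of 8 is an assumption to be tuned; for simplicity it computes the plain product, i.e. GEMM with α = 1 and β = 0):

```c
#define N 32
#define T 8   /* tile size; an assumption, tuned against the cache */

/* Sketch of blocked matrix multiplication, C = A * B. The loops over
   ii/kk/jj walk T x T tiles, so each tile of A, B, and C is reused
   while it is still cache-resident instead of being refetched on
   every pass of the naive triple loop. */
void matmul_blocked(const int A[N][N], const int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0;
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; i++)
                    for (int k = kk; k < kk + T; k++) {
                        int a = A[i][k];
                        for (int j = jj; j < jj + T; j++)
                            C[i][j] += a * B[k][j];
                    }
}
```

Choosing T so that three T×T tiles fit in the cache at once is the usual rule of thumb for picking the tile size.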
Reference
github-tugbaguler/Spatial-Locality
github-iVishalr/GEMM
github-manishravula/matmul_locality
github-mortbopet/Ripes
Quiz7 of Computer Architecture (2024 Fall)
Matrix Multiplication Background User's Guide