We can observe that increasing the number of ways in a cache, while keeping the number of blocks constant, can improve the hit rate. This is because a higher associativity reduces conflict misses, which occur when multiple memory blocks map to the same cache set. By transitioning from direct-mapped (1-way associative) to N-way associative caches, each set can store more memory blocks, making the cache more adaptable to diverse access patterns.
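As a rough illustration (the buffer layout, the STRIDE constant, and the 16-block, 16-byte-block geometry below are assumptions for the sketch, not the simulated configuration), the two buffers in this hypothetical snippet sit exactly one cache "period" apart: in a direct-mapped cache their blocks evict each other on every alternation, while a 2-way set-associative cache of the same capacity can keep both resident.

```c
#include <stdio.h>

#define STRIDE 256  /* assumed: number_of_sets (16) * block_size (16 B) */

int main(void) {
    static char buf[2 * STRIDE];
    char *a = buf;            /* block of a[] maps to some set S        */
    char *b = buf + STRIDE;   /* block of b[] maps to the same set S    */
    long sum = 0;
    /* Alternating accesses to two blocks in the same set: a direct-mapped
     * cache ping-pongs (misses on nearly every access), a 2-way cache
     * holds both blocks and hits after the first pass. */
    for (int iter = 0; iter < 1000; iter++)
        for (int i = 0; i < 16; i++)
            sum += a[i] + b[i];
    printf("%ld\n", sum);
    return 0;
}
```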
Different Block Sizes
Fixed Cache Parameters:
Cache level : 1
Placement Policy : Direct Mapped
Number of Blocks: 16
Block Replacement Policy: LRU
| Block Size | Result |
| --- | --- |
| 4 bytes | (result screenshot not available) |
Increasing the block size improves the cache hit rate by leveraging spatial locality. Larger block sizes allow more adjacent data to be loaded into the cache with each memory access.
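As a minimal sketch of this effect (the array length and element type are assumptions, not taken from the simulation), the sequential traversal below touches consecutive addresses: with 4-byte blocks every `int` load starts a new block, while with 16-byte blocks one miss brings in four consecutive `int`s and the next three accesses hit.

```c
#include <stdio.h>

#define N 1024              /* assumed array length */

int main(void) {
    static int a[N];
    long sum = 0;
    /* Stride-1 accesses: with 4-byte blocks each a[i] occupies its own
     * block (about one miss per element); with 16-byte blocks a[i..i+3]
     * share a block, giving roughly 1 miss + 3 hits per group of four. */
    for (int i = 0; i < N; i++)
        sum += a[i];
    printf("%ld\n", sum);
    return 0;
}
```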
B. Loop Ordering
We know that different loop orders can lead to variations in cache performance, primarily because of their impact on the spatial and temporal locality of data accesses.
Fixed Cache Parameters:
Cache level : 1
Placement Policy : 2-way set associative
Number of Blocks: 32
Block size : 16 bytes
Block Replacement Policy: LRU
| Loop Order | Result |
| --- | --- |
| ijk | (result screenshot not available) |
We can observe that ikj and kij achieve higher hit rates because they make more efficient use of spatial and temporal locality. In these loop orders the innermost loop runs over j, so B[k][j] and C[i][j] are accessed row-wise, which matches row-major memory storage and reduces cache misses. Additionally, the value of A[i][k] can be kept in a temporary variable for the whole inner loop, so it is loaded from memory only once per (i, k) pair.
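A sketch of the ikj ordering is shown below (the matrix dimension, element type, and function name are assumptions for illustration): A[i][k] is hoisted into a temporary, and the inner j loop walks B[k][j] and C[i][j] contiguously.

```c
#define N 64   /* assumed matrix dimension */

/* ikj ordering: r = A[i][k] is reused across the whole inner loop, and both
 * B[k][...] and C[i][...] are traversed row-wise (stride 1), matching
 * row-major storage. C is assumed to be zero-initialized by the caller. */
void matmul_ikj(int A[N][N], int B[N][N], int C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            int r = A[i][k];              /* one load per (i, k) pair */
            for (int j = 0; j < N; j++)
                C[i][j] += r * B[k][j];   /* sequential accesses to row k of B */
        }
}
```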
2. Matrix transpose
```c
// Matrix transpose
for (int i = 0; i < size; i++) {
    for (int j = 0; j < size; j++) {
        dst[j][i] = src[i][j];
    }
}
```
Fixed Cache Parameters:
Cache level : 1
Placement Policy : 2-way set associative
Number of Blocks: 32
Block size : 16 bytes
Block Replacement Policy: LRU
3. Sparse Matrix Multiplication
Sparse Matrix Multiplication (SpMM) is a specialized computation technique for multiplying sparse matrices, which contain a significant number of zero elements. Instead of performing operations on all elements, SpMM focuses only on the non-zero values, saving both memory and computational resources.
Matrix Multiplication
Sparse matrix multiplication improves efficiency by skipping zero elements, which reduces both computational work and memory accesses. By focusing resources on the non-zero elements, it significantly boosts performance.
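A minimal CSR-based sketch is given below (the CsrMatrix struct, its field names, and the dense row-major right-hand matrix are assumptions for illustration, not the assignment's data structures): only the stored non-zeros of A are visited, so rows with few non-zeros cost proportionally little work.

```c
/* Compressed Sparse Row (CSR) representation of an m x k sparse matrix. */
typedef struct {
    int m, k;             /* dimensions */
    const int *row_ptr;   /* length m+1: start of each row in col_idx/val */
    const int *col_idx;   /* column index of each non-zero */
    const double *val;    /* value of each non-zero */
} CsrMatrix;

/* C (m x n) = A (sparse, m x k) * B (dense, k x n); C assumed zeroed.
 * Only the non-zero entries of A are touched; zeros are skipped entirely. */
void spmm_csr(const CsrMatrix *A, const double *B, double *C, int n) {
    for (int i = 0; i < A->m; i++)
        for (int p = A->row_ptr[i]; p < A->row_ptr[i + 1]; p++) {
            int kcol = A->col_idx[p];
            double a = A->val[p];
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[kcol * n + j];
        }
}
```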
4. General Matrix Multiplication
GEMM (General Matrix Multiplication) is the standard dense matrix multiplication operation; efficient GEMM implementations improve cache performance by partitioning large matrices into smaller blocks and use vectorization and multi-threading techniques to further speed up the computation.
General Matrix Multiplication is defined as the operation $C \leftarrow \alpha A B + \beta C$, with $A$ and $B$ as matrix inputs, $\alpha$ and $\beta$ as scalar inputs, and $C$ as a pre-existing matrix which is overwritten by the output. A plain matrix product $AB$ is a GEMM with $\alpha$ equal to one and $\beta$ equal to zero.
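For reference, a naive (untiled) version of that operation might look like the sketch below (the row-major layout, function name, and argument order are assumptions for illustration):

```c
/* Naive GEMM: C <- alpha * A * B + beta * C, all matrices row-major;
 * A is m x k, B is k x n, C is m x n. */
void gemm_naive(int m, int n, int k, double alpha,
                const double *A, const double *B,
                double beta, double *C) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```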
| Version | Result |
| --- | --- |
| naive | (result screenshot not available) |
Blocking improves cache performance by enhancing temporal and spatial locality. It divides matrices into smaller tiles, allowing frequently accessed data to stay in the cache longer and minimizing redundant memory accesses.
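A hedged sketch of the blocking (tiling) idea follows (the matrix dimension, BLOCK size, and element type are assumptions, and for simplicity it computes C += A * B rather than the full alpha/beta form): each tile of C is built from tiles of A and B that are small enough to stay resident in the cache while they are reused.

```c
#define N 256       /* assumed matrix dimension (multiple of BLOCK) */
#define BLOCK 32    /* assumed tile size, chosen so a few tiles fit in cache */

/* Blocked matrix multiplication: C += A * B, row-major; C assumed zeroed.
 * The three outer loops walk tiles; the inner loops reuse each tile of A
 * and B many times while it is still cached (temporal locality). */
void matmul_blocked(const double A[N][N], const double B[N][N], double C[N][N]) {
    for (int ii = 0; ii < N; ii += BLOCK)
        for (int kk = 0; kk < N; kk += BLOCK)
            for (int jj = 0; jj < N; jj += BLOCK)
                for (int i = ii; i < ii + BLOCK; i++)
                    for (int k = kk; k < kk + BLOCK; k++) {
                        double r = A[i][k];
                        for (int j = jj; j < jj + BLOCK; j++)
                            C[i][j] += r * B[k][j];
                    }
}
```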