Computer Architecture
PSEUDOCODE:
source: cache.s
Program Parameter: the following register parameters are set in cache.s
parameter | register | value |
---|---|---|
Array Size | a0 | 128(byte) |
Step Size | a1 | 8 |
Rep Count | a2 | 4 |
Option | a3 | 0 |
Cache Parameters: the following cache parameters are set in Ripes simulator
parameter | value or strategy |
---|---|
Cache Level | 1 |
Block Size | 8 |
Number of blocks | 4 |
Enable | Should be green |
Placement Policy | Direct Mapped |
Associativity | 1 |
Block Replacement Policy | LRU |
Step Size
: if it decreases the step size, the hit rate will increase,
Rep Count
: increase the Rep Count
will increase hit rate too, as we mentioned in Q1, the miss are all occured at first iteration, if we increase the total iteration times, the hit ratio will get higher by more hit times.(miss times is fixed) Below is the result that we double Rep Count
to 8.
Program Parameters:
Array Size | Step Size | Rep Count | Option |
---|---|---|---|
256(byte) | 2 | 1 | 1 |
Cache Parameters:
Parameter | value or strategy |
---|---|
Cache Level | 1 |
Number of blocks | 16 |
Block Size | 16 |
Enable | Should be green |
Placement Policy | N-Way Set Associative |
Associativity | 4 |
Block Replacement Policy | LRU |
wordLoop
, it will execute one lw
and one sw
, so it will access memory two times in each iteration.Ripes only support one-level cache now, this problem should do with Venus (TODO)
source: matrixMultiply.c
multiplication of two matrix
To compute the matrix multiplication correctly, the loop order doesn’t matter.
matrixMultiply.c
i
, j
, k
loopGiga-floating-point-operations per second.” THE BIGGER THE NUMBER THE FASTER IT IS RUNNING!
i+k*n
k+j*n
i+j*n
ijk
iteration | i | j | k | A_idx | B_idx | C_idx |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 1 | 3 | 1 | 0 |
3 | 0 | 0 | 2 | 6 | 2 | 0 |
4 | 0 | 1 | 0 | 0 | 3 | 3 |
5 | 0 | 1 | 1 | 3 | 4 | 3 |
6 | 0 | 1 | 2 | 6 | 5 | 3 |
7 | 0 | 2 | 0 | 0 | 6 | 6 |
8 | 0 | 2 | 1 | 3 | 7 | 6 |
9 | 0 | 2 | 2 | 6 | 8 | 6 |
10 | 1 | 0 | 0 | 1 | 0 | 1 |
11 | 1 | 0 | 1 | 4 | 1 | 1 |
12 | 1 | 0 | 2 | 7 | 2 | 1 |
13 | 1 | 1 | 0 | 1 | 3 | 4 |
14 | 1 | 1 | 1 | 4 | 4 | 4 |
15 | 1 | 1 | 2 | 7 | 5 | 4 |
16 | 1 | 2 | 0 | 1 | 6 | 7 |
17 | 1 | 2 | 1 | 4 | 7 | 7 |
18 | 1 | 2 | 2 | 7 | 8 | 7 |
19 | 2 | 0 | 0 | 2 | 0 | 2 |
20 | 2 | 0 | 1 | 5 | 1 | 2 |
21 | 2 | 0 | 2 | 8 | 2 | 2 |
22 | 2 | 1 | 0 | 2 | 3 | 5 |
23 | 2 | 1 | 1 | 5 | 4 | 5 |
24 | 2 | 1 | 2 | 8 | 5 | 5 |
25 | 2 | 2 | 0 | 2 | 6 | 8 |
26 | 2 | 2 | 1 | 5 | 7 | 8 |
27 | 2 | 2 | 2 | 8 | 8 | 8 |
A | B | C | |
---|---|---|---|
stride | 3 | 1 | 3 |
ikj
iteration | i | j | k | A_idx | B_idx | C_idx |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 1 | 0 | 3 | 0 | 3 |
3 | 0 | 2 | 0 | 6 | 0 | 6 |
4 | 0 | 0 | 1 | 0 | 3 | 1 |
5 | 0 | 1 | 1 | 3 | 3 | 4 |
6 | 0 | 2 | 1 | 6 | 3 | 7 |
7 | 0 | 0 | 2 | 0 | 6 | 2 |
8 | 0 | 1 | 2 | 3 | 6 | 5 |
9 | 0 | 2 | 2 | 6 | 6 | 8 |
10 | 1 | 0 | 0 | 1 | 1 | 0 |
11 | 1 | 1 | 0 | 4 | 1 | 3 |
12 | 1 | 2 | 0 | 7 | 1 | 6 |
13 | 1 | 0 | 1 | 1 | 4 | 1 |
14 | 1 | 1 | 1 | 4 | 4 | 4 |
15 | 1 | 2 | 1 | 7 | 4 | 7 |
16 | 1 | 0 | 2 | 1 | 7 | 2 |
17 | 1 | 1 | 2 | 4 | 7 | 5 |
18 | 1 | 2 | 2 | 7 | 7 | 8 |
19 | 2 | 0 | 0 | 2 | 2 | 0 |
20 | 2 | 1 | 0 | 5 | 2 | 3 |
21 | 2 | 2 | 0 | 8 | 2 | 6 |
22 | 2 | 0 | 1 | 2 | 5 | 1 |
23 | 2 | 1 | 1 | 5 | 5 | 4 |
24 | 2 | 2 | 1 | 8 | 5 | 7 |
25 | 2 | 0 | 2 | 2 | 8 | 2 |
26 | 2 | 1 | 2 | 5 | 8 | 5 |
27 | 2 | 2 | 2 | 8 | 8 | 8 |
A | B | C | |
---|---|---|---|
stride | 3 | 3 | 3 |
jik
iteration | i | j | k | A_idx | B_idx | C_idx |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 1 | 3 | 1 | 0 |
3 | 0 | 0 | 2 | 6 | 2 | 0 |
4 | 1 | 0 | 0 | 1 | 0 | 1 |
5 | 1 | 0 | 1 | 4 | 1 | 1 |
6 | 1 | 0 | 2 | 7 | 2 | 1 |
7 | 2 | 0 | 0 | 2 | 0 | 2 |
8 | 2 | 0 | 1 | 5 | 1 | 2 |
9 | 2 | 0 | 2 | 8 | 2 | 2 |
10 | 0 | 1 | 0 | 0 | 3 | 3 |
11 | 0 | 1 | 1 | 3 | 4 | 3 |
12 | 0 | 1 | 2 | 6 | 5 | 3 |
13 | 1 | 1 | 0 | 1 | 3 | 4 |
14 | 1 | 1 | 1 | 4 | 4 | 4 |
15 | 1 | 1 | 2 | 7 | 5 | 4 |
16 | 2 | 1 | 0 | 2 | 3 | 5 |
17 | 2 | 1 | 1 | 5 | 4 | 5 |
18 | 2 | 1 | 2 | 8 | 5 | 5 |
19 | 0 | 2 | 0 | 0 | 6 | 6 |
20 | 0 | 2 | 1 | 3 | 7 | 6 |
21 | 0 | 2 | 2 | 6 | 8 | 6 |
22 | 1 | 2 | 0 | 1 | 6 | 7 |
23 | 1 | 2 | 1 | 4 | 7 | 7 |
24 | 1 | 2 | 2 | 7 | 8 | 7 |
25 | 2 | 2 | 0 | 2 | 6 | 8 |
26 | 2 | 2 | 1 | 5 | 7 | 8 |
27 | 2 | 2 | 2 | 8 | 8 | 8 |
A | B | C | |
---|---|---|---|
stride | 3 | 1 | 1 |
jki
iteration | i | j | k | A_idx | B_idx | C_idx |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 0 | 1 | 0 | 1 |
3 | 2 | 0 | 0 | 2 | 0 | 2 |
4 | 0 | 0 | 1 | 3 | 1 | 0 |
5 | 1 | 0 | 1 | 4 | 1 | 1 |
6 | 2 | 0 | 1 | 5 | 1 | 2 |
7 | 0 | 0 | 2 | 6 | 2 | 0 |
8 | 1 | 0 | 2 | 7 | 2 | 1 |
9 | 2 | 0 | 2 | 8 | 2 | 2 |
10 | 0 | 1 | 0 | 0 | 3 | 3 |
11 | 1 | 1 | 0 | 1 | 3 | 4 |
12 | 2 | 1 | 0 | 2 | 3 | 5 |
13 | 0 | 1 | 1 | 3 | 4 | 3 |
14 | 1 | 1 | 1 | 4 | 4 | 4 |
15 | 2 | 1 | 1 | 5 | 4 | 5 |
16 | 0 | 1 | 2 | 6 | 5 | 3 |
17 | 1 | 1 | 2 | 7 | 5 | 4 |
18 | 2 | 1 | 2 | 8 | 5 | 5 |
19 | 0 | 2 | 0 | 0 | 6 | 6 |
20 | 1 | 2 | 0 | 1 | 6 | 7 |
21 | 2 | 2 | 0 | 2 | 6 | 8 |
22 | 0 | 2 | 1 | 3 | 7 | 6 |
23 | 1 | 2 | 1 | 4 | 7 | 7 |
24 | 2 | 2 | 1 | 5 | 7 | 8 |
25 | 0 | 2 | 2 | 6 | 8 | 6 |
26 | 1 | 2 | 2 | 7 | 8 | 7 |
27 | 2 | 2 | 2 | 8 | 8 | 8 |
A | B | C | |
---|---|---|---|
stride | 1 | 1 | 1 |
kij
iteration | i | j | k | A_idx | B_idx | C_idx |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 1 | 0 | 0 | 3 | 3 |
3 | 0 | 2 | 0 | 0 | 6 | 6 |
4 | 1 | 0 | 0 | 1 | 0 | 1 |
5 | 1 | 1 | 0 | 1 | 3 | 4 |
6 | 1 | 2 | 0 | 1 | 6 | 7 |
7 | 2 | 0 | 0 | 2 | 0 | 2 |
8 | 2 | 1 | 0 | 2 | 3 | 5 |
9 | 2 | 2 | 0 | 2 | 6 | 8 |
10 | 0 | 0 | 1 | 3 | 1 | 0 |
11 | 0 | 1 | 1 | 3 | 4 | 3 |
12 | 0 | 2 | 1 | 3 | 7 | 6 |
13 | 1 | 0 | 1 | 4 | 1 | 1 |
14 | 1 | 1 | 1 | 4 | 4 | 4 |
15 | 1 | 2 | 1 | 4 | 7 | 7 |
16 | 2 | 0 | 1 | 5 | 1 | 2 |
17 | 2 | 1 | 1 | 5 | 4 | 5 |
18 | 2 | 2 | 1 | 5 | 7 | 8 |
19 | 0 | 0 | 2 | 6 | 2 | 0 |
20 | 0 | 1 | 2 | 6 | 5 | 3 |
21 | 0 | 2 | 2 | 6 | 8 | 6 |
22 | 1 | 0 | 2 | 7 | 2 | 1 |
23 | 1 | 1 | 2 | 7 | 5 | 4 |
24 | 1 | 2 | 2 | 7 | 8 | 7 |
25 | 2 | 0 | 2 | 8 | 2 | 2 |
26 | 2 | 1 | 2 | 8 | 5 | 5 |
27 | 2 | 2 | 2 | 8 | 8 | 8 |
A | B | C | |
---|---|---|---|
stride | 1 | 3 | 3 |
kji
iteration | i | j | k | A_idx | B_idx | C_idx |
---|---|---|---|---|---|---|
1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 0 | 1 | 0 | 1 |
3 | 2 | 0 | 0 | 2 | 0 | 2 |
4 | 0 | 1 | 0 | 0 | 3 | 3 |
5 | 1 | 1 | 0 | 1 | 3 | 4 |
6 | 2 | 1 | 0 | 2 | 3 | 5 |
7 | 0 | 2 | 0 | 0 | 6 | 6 |
8 | 1 | 2 | 0 | 1 | 6 | 7 |
9 | 2 | 2 | 0 | 2 | 6 | 8 |
10 | 0 | 0 | 1 | 3 | 1 | 0 |
11 | 1 | 0 | 1 | 4 | 1 | 1 |
12 | 2 | 0 | 1 | 5 | 1 | 2 |
13 | 0 | 1 | 1 | 3 | 4 | 3 |
14 | 1 | 1 | 1 | 4 | 4 | 4 |
15 | 2 | 1 | 1 | 5 | 4 | 5 |
16 | 0 | 2 | 1 | 3 | 7 | 6 |
17 | 1 | 2 | 1 | 4 | 7 | 7 |
18 | 2 | 2 | 1 | 5 | 7 | 8 |
19 | 0 | 0 | 2 | 6 | 2 | 0 |
20 | 1 | 0 | 2 | 7 | 2 | 1 |
21 | 2 | 0 | 2 | 8 | 2 | 2 |
22 | 0 | 1 | 2 | 6 | 5 | 3 |
23 | 1 | 1 | 2 | 7 | 5 | 4 |
24 | 2 | 1 | 2 | 8 | 5 | 5 |
25 | 0 | 2 | 2 | 6 | 8 | 6 |
26 | 1 | 2 | 2 | 7 | 8 | 7 |
27 | 2 | 2 | 2 | 8 | 8 | 8 |
A | B | C | |
---|---|---|---|
stride | 1 | 3 | 1 |
jki
runs the best with 1.451 Gflop/sikj
perform the worst with 0.478 Gflop/ssource: transpose.c
n | 20 | 60 | 70 | 80 | 100 | 1000 | 2000 | 5000 | 10000 |
---|---|---|---|---|---|---|---|---|---|
naive | 0.001 | 0.007 | 0.01 | 0.014 | 0.021 | 2.731 | 13.802 | 106.16 | 428.3 |
blocking | 0.001 | 0.007 | 0.009 | 0.011 | 0.017 | 1.896 | 15.806 | 92.287 | 398.101 |
Q1: At what point does cache blocked version of transpose become faster than the non-cache blocked version?
A1: from n > 70, the advantage of blocked version of transpose has appear clearly, and is getting obviously when n getting larger.
Q2: Why does cache blocking require the matrix to be a certain size before it outperforms the non-cache blocked code?
A2: if the matrix size is too small, the advantage of spatial/temporal localty cannot be brought out since the data can all be loaded to memory in one time access.
blocksize | 10 | 20 | 50 | 100 | 500 | 1000 | 5000 |
---|---|---|---|---|---|---|---|
naive | 427.988 | 428.112 | 428.154 | 428.393 | 427.967 | 428.344 | 428.618 |
blocking | 478.875 | 398.02 | 296.951 | 284.136 | 278.36 | 336.922 | 428.019 |