影片: What are Heads, Tracks, Cylinders & Sectors?
練習題 6.4
(Page 411): 假設 1 MB 的檔案由 512 bytes 的邏輯區塊組成,假設程式循序讀取檔案的邏輯區塊,將讀寫頭定位到第一個區塊的時間為 Tavg_seek + Tavg_rotation
參考答案指出 (Page 459):
Reference: Cylinder-head-sector
不要跟 Transactional memory 觀念搞混
從中央處理器和記憶體存取速度的演變可看出,對電腦程式影響極大的因素是 locality。
這樣的 locality 好嗎?參見 Row- and column-major order
row, column 的中文到底怎麼翻譯呢?哪個是直的?哪個是橫的?
台灣的「列」(row) 是橫向;中國的「列」(column) 是直向,完全相反!為何有這樣的演化?因為新中國政府頒訂的稿紙格子是橫式,每一行是橫的,而台灣的稿紙是直式,每一行是直的。在英文用法,row 是「橫」的一條,column 則是像監獄欄杆「直」的一根。
另外,在計數 row, column 時,因為 row 是橫的由上往下排,所以數橫的 row 要豎著數,又因 column 是直的由左往右排,所以 column 要橫著數
反之,下方的程式碼就不是 row-major:
除非在 M 數值很小的狀況,才不會因此影響效能。
Memory Mountain
mountain.c
Layout of C Arrays in Memory
block size (B)
> sizeof(aij) bytes, exploit spatial locality
隨堂練習
N=64
和 N=60
的 cache miss rate (Page 455)6.37
C = 4096, B = 16, E = 1, S = 256
sizeof(array_t) == 64 * 64 == 4096 == 4*C
memory-cache graph
memory address start from 0 and end to 4096*4.
cell size is 16B. number(0-255) in cell means cache block number that the cell
will be cached.
0 +---------+
| 0 |
16 +---------+
| 1 |
32 +---------+
| 2 |
48 +---------+
| . |
| . |
| . |
| . |
| . |
4096-16 +---------+
| 255 |
4096 +---------+
| 0 |
4096+16 +---------+
| 1 |
4096+32 +---------+
| . |
| . |
| . |
| . |
| . |
| . |
4096*4-16+---------+
| 255 |
4096*4 +---------+
A. sumA
sum += a[i][j];
read memory address order:
0, 4, 8, 12, …, 4096*4-4
read cache order:
0, 0, 0, 0, 1, 1, 1, 1,…255, 255, 255, 255, 0, 0, 0, 0,…255, 255, 255,
255
first cache read miss, next 3 time read hit.
miss rate: 25%
B. sumB
sum += a[j][i];
read memory address order:
0, 644, 6442, … 64463, 4, 644+4, 6442+4, … 4096*4-4
read cache order:
0, 16, 32, 48, … 240,(4 times) 1, 17, 33, … 241,(4 times)
15, 31, 47, … 255(4 times)
let's see first read loop:
read cache order loop 4 times
0, 16, 32, 48, … 240,(4 times)
first loop all miss, next 3 loop all hit
so miss rate is 25%.
C. sumC
for (i = 0; i < N; i+=2)
for (j = 0; j < N; j+=2)
sum += (a[j][i] + a[j][i+1] + a[j+1][i] + a[j+1][i+1]);
easy to see that read a[j][i+1] and a[j+1][i+1] always hit
same like
for (i = 0; i < N; i+=2)
for (j = 0; j < N; j+=2)
sum += (a[j][i] + a[j+1][i]);
same like
for (i = 0; i < N; i+=2)
for (j = 0; j < N; j++)
sum += a[j][i];
total read count = 64*64
because of i+=2,
read cache order only loop 2 times
0, 16, 32, 48, … 240,(2 times)
so miss rate is 50%
total read miss count = 64/2 * 64 * 50% = 64*64/4
so miss rate is still 25%.
A. sumA
sum += a[i][j];
read memory by step 1
miss rate 25%
B. sumB
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
sum += a[j][i];
it's interesting.
let's see first inner loop a[0][0] -> a[59][0]
read memory address order:
0, 604, 6042, … 604*59
read cache order:
0, 15, 30, …, 225, (17 numbers)
255, 14, 29, …, 224, (17 numbers)
254, 13, 28, …, 223, (17 numbers)
253, 12, 27, 42, 57, 72, 87, 102, 117 (9 numbers)
all read miss and store into different blocks
next 3 loops: a[0][1] -> a[59][1], a[0][2] -> a[59][2], a[0][3] -> a[59][3]
all hit
althrough N is smaller and not power of 2, miss rate is still 25%
C. sumC
same as miss rate when N = 64
25%