為何你該深入理解記憶體
影片: What are Heads, Tracks, Cylinders & Sectors?
練習題 6.4
(Page 411): 假設 1 MB 的檔案由 512 bytes 的邏輯區塊組成,假設程式循序讀取檔案的邏輯區塊,將讀寫頭定位到第一個區塊的時間為 Tavg_seek + Tavg_rotation
參考答案指出 (Page 459):
- 最好的情況
- 區塊被映射到連續的 sector,同一個 cylinder,即可一個區塊接著一個區塊讀取,不用移動讀寫頭
- 5 + 3 + 12 = 20 ms
- 隨機的情況
- 讀取 2000 個區塊的每一部分都要計算
- 16000 ms (大幅增加到 16 秒)
Reference: Cylinder-head-sector
主軸
- 對應 CS:APP3e 的第 6 章: 6.3 儲存裝置的階層架構 (Page 421)
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
不要跟 Transactional memory 觀念搞混
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
從中央處理器和記憶體存取速度的演變可看出,對電腦程式影響極大的因素是 locality。
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
這樣的 locality 好嗎?參見 Row- and column-major order
row, column 的中文到底怎麼翻譯呢?哪個是直的?哪個是橫的?
台灣的「列」(row) 是橫向;中國的「列」(column) 是直向,完全相反!為何有這樣的演化?因為新中國政府頒訂的稿紙格子是橫式,每一行是橫的,而台灣的稿紙是直式,每一行是直的。在英文用法,row 是「橫」的一條,column 則是像監獄欄杆「直」的一根。
- 台灣把 row 翻成列,column 翻成 行
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
想成台灣人愛排隊,一行接著一「行」,攜伴並排就是「列」
- 中國把 row 翻成行,column 翻成 列
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
想成六四天安門廣場上,一「行」人橫著對抗進犯的坦克車,每次損耗就豎直變「烈士」(列)
另外,在計數 row, column 時,因為 row 是橫的由上往下排,所以數橫的 row 要豎著數,又因 column 是直的由左往右排,所以 column 要橫著數
反之,下方的程式碼就不是 row-major:
除非在 M 數值很小的狀況,才不會因此影響效能。
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
- 對應 CS:APP3e 的第 6 章: 6.4 快取記憶體 (Page 425)
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
cache miss 的時間代價是很高的
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Memory Mountain
- 2009 年的 Nehalem (採用 45 奈米製程,後期改用 32 奈米製程)
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
- 2013 年的 Haswell (和 Ivy Bridge 微架構一樣採用 22 奈米製程)
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Layout of C Arrays in Memory
- C arrays 以 row-major 順序配置
- each row in contiguous memory locations
- Stepping through columns in one row:
- accesses successive elements
- if
block size (B)
> sizeof(aij) bytes, exploit spatial locality
- miss rate = sizeof(aij) / B
- Stepping through rows in one column:
- accesses distant elements
- no spatial locality!
- miss rate = 1 (i.e. 100%)
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
6.37
C = 4096, B = 16, E = 1, S = 256
N = 64
sizeof(array_t) == 64 * 64 == 4096 == 4*C
memory-cache graph
memory address start from 0 and end to 4096*4.
cell size is 16B. number(0-255) in cell means cache block number that the cell
will be cached.
0 +---------+
| 0 |
16 +---------+
| 1 |
32 +---------+
| 2 |
48 +---------+
| . |
| . |
| . |
| . |
| . |
4096-16 +---------+
| 255 |
4096 +---------+
| 0 |
4096+16 +---------+
| 1 |
4096+32 +---------+
| . |
| . |
| . |
| . |
| . |
| . |
4096*4-16+---------+
| 255 |
4096*4 +---------+
A. sumA
sum += a[i][j];
read memory address order:
0, 4, 8, 12, …, 4096*4-4
read cache order:
0, 0, 0, 0, 1, 1, 1, 1,…255, 255, 255, 255, 0, 0, 0, 0,…255, 255, 255,
255
first cache read miss, next 3 time read hit.
miss rate: 25%
B. sumB
sum += a[j][i];
read memory address order:
0, 644, 6442, … 64463, 4, 644+4, 6442+4, … 4096*4-4
read cache order:
0, 16, 32, 48, … 240,(4 times) 1, 17, 33, … 241,(4 times)
15, 31, 47, … 255(4 times)
let's see first read loop:
read cache order loop 4 times
0, 16, 32, 48, … 240,(4 times)
first loop all miss, next 3 loop all hit
so miss rate is 25%.
C. sumC
for (i = 0; i < N; i+=2)
for (j = 0; j < N; j+=2)
sum += (a[j][i] + a[j][i+1] + a[j+1][i] + a[j+1][i+1]);
easy to see that read a[j][i+1] and a[j+1][i+1] always hit
same like
for (i = 0; i < N; i+=2)
for (j = 0; j < N; j+=2)
sum += (a[j][i] + a[j+1][i]);
same like
for (i = 0; i < N; i+=2)
for (j = 0; j < N; j++)
sum += a[j][i];
total read count = 64*64
because of i+=2,
read cache order only loop 2 times
0, 16, 32, 48, … 240,(2 times)
so miss rate is 50%
total read miss count = 64/2 * 64 * 50% = 64*64/4
so miss rate is still 25%.
N = 60
A. sumA
sum += a[i][j];
read memory by step 1
miss rate 25%
B. sumB
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
sum += a[j][i];
it's interesting.
let's see first inner loop a[0][0] -> a[59][0]
read memory address order:
0, 604, 6042, … 604*59
read cache order:
0, 15, 30, …, 225, (17 numbers)
255, 14, 29, …, 224, (17 numbers)
254, 13, 28, …, 223, (17 numbers)
253, 12, 27, 42, 57, 72, 87, 102, 117 (9 numbers)
all read miss and store into different blocks
next 3 loops: a[0][1] -> a[59][1], a[0][2] -> a[59][2], a[0][3] -> a[59][3]
all hit
althrough N is smaller and not power of 2, miss rate is still 25%
C. sumC
same as miss rate when N = 64
25%