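Modern Processor Design: Principles and Key Features (Notes)
The current x86 instruction set is CISC-style. To stay compatible with earlier designs, x86 processors wrap a CISC layer around a RISC-like core: the CISC instructions are decoded into RISC-like micro-operations internally.
CPU Pipeline ref: video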
Load / Store Architecture: RISC
Superpipelining
Increase the depth of the pipeline.
A deeper pipeline improves performance, but it also consumes more power.
Clock speed (= number of clock cycles per second) is limited by the slowest stage in the pipeline (assuming every stage takes one cycle to complete). Splitting the pipeline stages into several smaller stages therefore lets the CPU run at a higher clock speed.
Of course, each instruction will now take more cycles to complete (latency), but the processor will still be completing 1 instruction per cycle (throughput), and there will be more cycles per second, so the processor will complete more instructions per second (actual performance) …
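The arithmetic behind this trade-off can be sketched in a few lines of C. This is only an idealized model (no stalls, one cycle per stage); the function names and the picosecond stage delays used in the usage note are made up for illustration and do not come from any real processor.

```c
#include <stdint.h>

/* The clock period is set by the slowest stage (one cycle per stage assumed). */
uint32_t clock_period_ps(const uint32_t *stage_delay_ps, int n_stages) {
    uint32_t slowest = 0;
    for (int i = 0; i < n_stages; i++) {
        if (stage_delay_ps[i] > slowest) {
            slowest = stage_delay_ps[i];
        }
    }
    return slowest;
}

/* Latency of one instruction: it spends one full clock period in every stage. */
uint32_t instr_latency_ps(const uint32_t *stage_delay_ps, int n_stages) {
    return clock_period_ps(stage_delay_ps, n_stages) * (uint32_t)n_stages;
}

/* Ideal time for n_instr instructions: fill the pipeline once,
 * then one instruction completes per cycle (no hazards modeled). */
uint64_t total_time_ps(const uint32_t *stage_delay_ps, int n_stages,
                       uint64_t n_instr) {
    if (n_instr == 0) {
        return 0;
    }
    uint64_t cycles = (uint64_t)n_stages + n_instr - 1;
    return cycles * clock_period_ps(stage_delay_ps, n_stages);
}
```

With hypothetical stage delays of {200, 100, 200, 200, 100} ps, the 5-stage pipeline is clocked at a 200 ps period. Splitting it into ten 100 ps stages halves the period: a single instruction still takes 1000 ps end to end (now 10 cycles instead of 5), but 1000 instructions finish in about 100900 ps instead of 200800 ps.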
Superscalar
Increase the width of the pipeline.
IPC (instructions per cycle) = 3, i.e. CPI (cycles per instruction) = 1/3
After decode, instructions are sorted by type (dispatch) and sent down several execution paths in parallel.
VLIW (Very Long Instruction Word)
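To get a feel for what pipeline width buys, here is a minimal sketch of the ideal cycle count for a pipeline of a given depth and width. It assumes every instruction is independent and that a full group can be issued every cycle, which real code rarely allows; `ideal_cycles` is a hypothetical helper name.

```c
#include <stdint.h>

/* Ideal cycle count for n_instr independent instructions on a pipeline
 * with `depth` stages that issues up to `width` instructions per cycle.
 * Hazards and dispatch restrictions are deliberately ignored. */
uint64_t ideal_cycles(uint64_t n_instr, uint32_t depth, uint32_t width) {
    if (n_instr == 0 || depth == 0 || width == 0) {
        return 0;
    }
    /* Number of issue groups: ceil(n_instr / width). */
    uint64_t groups = (n_instr + width - 1) / width;
    /* The first group needs `depth` cycles; each later group adds one. */
    return (uint64_t)depth + groups - 1;
}
```

For 9 instructions on a 5-stage pipeline, a scalar design (width 1) needs 13 cycles under these assumptions, while a 3-wide design needs 7.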
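Widely used in DSPs: signal processing often needs a multiply immediately followed by an add (as in matrix multiplication), so designers started asking whether the two kinds of instruction could be handled together.
Requires well-designed algorithms with a high degree of parallelism, so it suits specialized signal processing (the compiler is hard to design).
Needs less circuitry than a superscalar design, so it is more power-efficient.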
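The multiply-then-add pattern mentioned above is easiest to see in a dot product, the inner loop of matrix multiplication and FIR filtering. The sketch below is plain C, not VLIW code; `dot_product` is a hypothetical name, and whether the two marked operations end up packed into one long instruction word depends entirely on the target and its compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit sample arrays.
 * Each iteration is one multiply followed by one add (multiply-accumulate).
 * Assumes inputs are small enough that the 32-bit accumulator cannot overflow. */
int32_t dot_product(const int16_t *x, const int16_t *h, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t prod = (int32_t)x[i] * (int32_t)h[i]; /* multiply */
        acc += prod;                                  /* then add */
    }
    return acc;
}
```

The multiply of iteration i+1 does not depend on the add of iteration i, which is the kind of parallelism a VLIW compiler has to discover and schedule statically.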
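Instruction Dependency & Latency
Hazard
Structure Hazard
There are not enough hardware resources, so several instructions that need the same resource at the same time cannot all execute.
Example: in a von Neumann architecture, instructions and data are stored in the same memory, so fetching an instruction and accessing data at the same time causes a structure hazard.
Harvard vs von Neumann architecture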
Solutions:
Add more hardware.
Stall: delay an instruction so that instructions accessing the same hardware do not overlap.
Data Hazard
An instruction in the pipeline needs a result that an earlier instruction, still in flight in a later stage, has not produced yet (data dependency).
data dependency:
RAW (Read After Write, True Data Dependency)
WAW (Write After Write, output dependency)
WAR (Write After Read, anti-dependency)
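The three kinds of dependency can be told apart mechanically from the registers each instruction reads and writes. The following is a minimal sketch; the `Instr` struct, the `DEP_*` flags and `classify_dependency` are hypothetical names, and each instruction is assumed to write at most one register and read at most two.

```c
/* One instruction, reduced to the registers it touches (-1 means unused). */
typedef struct {
    int dst;   /* register written */
    int src1;  /* first register read */
    int src2;  /* second register read */
} Instr;

enum { DEP_RAW = 1, DEP_WAW = 2, DEP_WAR = 4 };

/* Returns a bitmask of the dependencies `later` has on `earlier`. */
int classify_dependency(Instr earlier, Instr later) {
    int deps = 0;
    /* RAW: later reads a register that earlier writes. */
    if (earlier.dst >= 0 &&
        (later.src1 == earlier.dst || later.src2 == earlier.dst)) {
        deps |= DEP_RAW;
    }
    /* WAW: both write the same register. */
    if (earlier.dst >= 0 && later.dst == earlier.dst) {
        deps |= DEP_WAW;
    }
    /* WAR: later writes a register that earlier reads. */
    if (later.dst >= 0 &&
        (earlier.src1 == later.dst || earlier.src2 == later.dst)) {
        deps |= DEP_WAR;
    }
    return deps;
}
```

For example, `r1 = r2 + r3` followed by `r4 = r1 + r5` yields `DEP_RAW`, while `r1 = r2 + r3` followed by `r2 = r7` yields `DEP_WAR`.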
Software solutions:
Insert nop
Instruction rescheduling
Hardware solutions:
Forwarding
Forwarding + stall
Load-use Data Hazard
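Forwarding and the load-use stall both come down to comparing register numbers held in the pipeline registers. The sketch below assumes a classic 5-stage MIPS-style pipeline (IF, ID, EX, MEM, WB) in which register 0 is hard-wired to zero; the notes do not name a specific pipeline, so the parameter names (`ex_mem_rd`, `id_ex_rt`, and so on) are borrowed from the usual textbook presentation rather than from the original.

```c
#include <stdbool.h>

typedef enum {
    FWD_NONE = 0,        /* use the value read from the register file */
    FWD_FROM_MEM_WB = 1, /* forward the value about to be written back */
    FWD_FROM_EX_MEM = 2  /* forward the ALU result of the previous instruction */
} Forward;

/* Select the source of one ALU operand for the instruction in EX.
 * The newer result (EX/MEM) takes priority over the older one (MEM/WB). */
Forward forward_select(int src_reg,
                       bool ex_mem_reg_write, int ex_mem_rd,
                       bool mem_wb_reg_write, int mem_wb_rd) {
    if (ex_mem_reg_write && ex_mem_rd != 0 && ex_mem_rd == src_reg) {
        return FWD_FROM_EX_MEM;
    }
    if (mem_wb_reg_write && mem_wb_rd != 0 && mem_wb_rd == src_reg) {
        return FWD_FROM_MEM_WB;
    }
    return FWD_NONE;
}

/* Load-use hazard: the instruction in EX is a load whose destination is
 * read by the instruction in ID. Loaded data only exists after MEM, so
 * forwarding alone is too late and one stall cycle is needed. */
bool load_use_stall(bool id_ex_mem_read, int id_ex_rt,
                    int if_id_rs, int if_id_rt) {
    return id_ex_mem_read &&
           (id_ex_rt == if_id_rs || id_ex_rt == if_id_rt);
}
```

After the one-cycle stall, the loaded value is in MEM/WB and `forward_select` returns `FWD_FROM_MEM_WB` for the dependent instruction, which is why this case is handled by forwarding plus a stall.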
Control Hazard
Branch & Branch prediction
Branch
if (a > 7) {
    b = c;
} else {
    b = d;
}

    cmp a, 7     ; a > 7 ?
    ble L1
    mov c, b     ; b = c
    br L2
L1: mov d, b     ; b = d
L2: ...
Static Branch Prediction
Dynamic Branch Prediction
BTB (Branch Target Buffer):
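A cache that stores the target address (the new PC) of each branch.
In the IF stage, the current PC is used as the index to look up the current branch's target in the BTB.
If found (the current instruction is a branch and is predicted taken):
ID: update the current PC to the target found in the BTB.
EX: if the branch is taken, execution continues normally.
EX: if the branch is not taken, flush the wrongly fetched instructions and delete the corresponding PC entry from the BTB.
If not found:
ID: proceed as usual.
EX: if the branch is taken, add the target address to the BTB and flush the wrongly fetched instructions.
EX: if the branch is not taken, proceed as usual.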
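The lookup and update rules above map onto a small table quite directly. This is a minimal sketch, assuming a direct-mapped BTB with 16 entries, 4-byte instructions, and a fixed target per branch; the size, the indexing function and all names (`Btb`, `btb_lookup`, `btb_resolve`) are illustrative assumptions, not taken from the notes.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 16 /* hypothetical size */

typedef struct {
    bool     valid;
    uint32_t pc;     /* tag: address of the branch instruction */
    uint32_t target; /* predicted new PC */
} BtbEntry;

typedef struct {
    BtbEntry entry[BTB_ENTRIES];
} Btb;

/* Direct-mapped index; assumes 4-byte aligned instructions. */
unsigned btb_index(uint32_t pc) {
    return (pc >> 2) % BTB_ENTRIES;
}

/* IF stage: a hit means "this is a branch and it is predicted taken". */
bool btb_lookup(const Btb *btb, uint32_t pc, uint32_t *target) {
    const BtbEntry *e = &btb->entry[btb_index(pc)];
    if (e->valid && e->pc == pc) {
        *target = e->target;
        return true;
    }
    return false;
}

/* EX stage: apply the four cases from the notes.
 * Returns true when the wrongly fetched instructions must be flushed. */
bool btb_resolve(Btb *btb, uint32_t pc, bool hit, bool taken,
                 uint32_t actual_target) {
    BtbEntry *e = &btb->entry[btb_index(pc)];
    if (hit) {
        if (taken) {
            return false;  /* predicted taken, was taken: nothing to do */
        }
        e->valid = false;  /* predicted taken, not taken: drop the entry */
        return true;
    }
    if (taken) {           /* no prediction, but taken: learn the target */
        e->valid = true;
        e->pc = pc;
        e->target = actual_target;
        return true;
    }
    return false;          /* no prediction, not taken: nothing to do */
}
```

A caller would zero-initialize the table (`Btb btb = {0};`), call `btb_lookup` when fetching, and pass the hit result along to `btb_resolve` once the branch outcome is known in EX.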
ref: video
BHB (Branch History Buffer):
1-bit predict
2-bit predict
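The difference between the two schemes shows up on loop branches. The sketch below assumes the common formulations: a 1-bit predictor that repeats the last outcome, and a 2-bit saturating counter where states 0 and 1 predict not taken and states 2 and 3 predict taken. The notes do not specify the exact state machine, so treat this as one plausible variant with hypothetical function names.

```c
#include <stdbool.h>
#include <stddef.h>

/* 1-bit predictor: predict whatever the branch did last time. */
size_t mispredicts_1bit(const bool *outcomes, size_t n, bool state) {
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (state != outcomes[i]) {
            misses++;
        }
        state = outcomes[i]; /* remember only the latest outcome */
    }
    return misses;
}

/* 2-bit saturating counter: 0..1 predict not taken, 2..3 predict taken. */
unsigned update_2bit(unsigned counter, bool taken) {
    if (taken) {
        return counter < 3 ? counter + 1 : 3;
    }
    return counter > 0 ? counter - 1 : 0;
}

size_t mispredicts_2bit(const bool *outcomes, size_t n, unsigned counter) {
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        bool predicted_taken = counter >= 2;
        if (predicted_taken != outcomes[i]) {
            misses++;
        }
        counter = update_2bit(counter, outcomes[i]);
    }
    return misses;
}
```

For a loop branch that is taken three times and then falls through, run twice (T T T N T T T N), the 1-bit predictor starting in the taken state mispredicts 3 times: once at each loop exit and once more on re-entry. The 2-bit counter starting at 3 mispredicts only at the two exits, because a single not-taken outcome is not enough to flip its prediction.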
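Delayed Branch
A software-based solution, also called Insert Safety Instruction: an instruction that executes regardless of whether the branch is taken is placed after the branch, which reduces the penalty of guessing the branch wrong. It plays the same role as a nop, but the slot does useful work instead of being wasted.
Three approaches:
Delayed branch is a simple and efficient technique. However, as processor pipelines have grown longer and more instructions are issued per clock cycle, the branch delay has become longer and a single delay slot is no longer enough. Compared with dynamic schemes, which cost more but are more flexible, delayed branch has therefore lost its appeal.
To be organized: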
OOO (Out of Order execution) ref: [Computer Architecture Cheat sheet] — Pipeline Hazard