tags: `linux2022`

Modern Processor Design: Principles and Key Features (Notes)

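Today's x86 is a CISC-style instruction set. To stay compatible with earlier designs, a CISC layer is wrapped around what are internally RISC-style instructions.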

CPU Pipeline

ref: video

Load / Store Architecture: RISC

Superpipelining

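Increase the depth of the pipeline.
A deeper pipeline improves performance, but it also consumes more power.
Clock speed (the number of clock cycles per second) is limited by the slowest stage in the pipeline (assuming every stage must finish within one cycle). Splitting the pipeline stages into several smaller stages therefore lets the CPU run at a higher clock speed.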
Of course, each instruction will now take more cycles to complete (latency), but the processor will still be completing 1 instruction per cycle (throughput), and there will be more cycles per second, so the processor will complete more instructions per second (actual performance)…
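
The latency versus throughput trade-off can be checked with a small calculation. This is a minimal sketch under idealized assumptions: the stage delays are made-up numbers, stages split perfectly evenly, and latch overhead between stages is ignored. The helper names `pipeline_stats` and `split_stages` are hypothetical.

```python
def pipeline_stats(stage_delays_ns):
    """Return (clock_ghz, latency_cycles) for a pipeline with one cycle per stage."""
    cycle_ns = max(stage_delays_ns)      # the slowest stage sets the clock period
    clock_ghz = 1.0 / cycle_ns           # 1 / ns = GHz
    latency_cycles = len(stage_delays_ns)
    return clock_ghz, latency_cycles


def split_stages(stage_delays_ns, factor):
    """Idealized superpipelining: cut every stage into `factor` equal sub-stages."""
    return [d / factor for d in stage_delays_ns for _ in range(factor)]


base = [1.0, 1.0, 2.0, 1.0, 1.0]         # hypothetical 5-stage pipeline (ns per stage)
deep = split_stages(base, 2)             # 10 shorter stages

base_clock, base_latency = pipeline_stats(base)   # 0.5 GHz, 5 cycles
deep_clock, deep_latency = pipeline_stats(deep)   # 1.0 GHz, 10 cycles
```

With one instruction completing per cycle, the deeper pipeline in this sketch finishes twice as many instructions per second, even though each instruction now spends twice as many cycles in flight.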

Superscalar

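Increase the width of the pipeline.
For a pipeline that issues three instructions per cycle: IPC (instructions per cycle) = 3, i.e. CPI (cycles per instruction) = 1/3.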

After decode, instructions are sorted by type (dispatch) and sent down multiple execution paths in parallel.
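
The dispatch step can be sketched as a toy in-order issue model. It rests on several simplifying assumptions: instructions are only labels for the unit type they need, data dependencies are ignored, and every unit is free again at the start of each cycle. The function name `dispatch_cycles` and the unit names are hypothetical.

```python
def dispatch_cycles(instructions, units, width):
    """Count cycles needed to issue `instructions` in program order.

    instructions: list of unit-type labels, e.g. "int", "mem", "fp"
    units:        dict mapping unit type -> number of such units
    width:        maximum number of instructions issued per cycle
    """
    cycles = 0
    i = 0
    while i < len(instructions):
        free = dict(units)               # every unit is free again each cycle
        issued = 0
        while (i < len(instructions) and issued < width
               and free.get(instructions[i], 0) > 0):
            free[instructions[i]] -= 1   # occupy one unit of the required type
            i += 1
            issued += 1
        if issued == 0:
            raise ValueError("cannot issue " + instructions[i])
        cycles += 1
    return cycles


units = {"int": 1, "mem": 1, "fp": 1}
balanced = ["int", "mem", "fp", "int", "mem", "fp"]

scalar_cycles = dispatch_cycles(balanced, units, width=1)   # 6 cycles
wide_cycles = dispatch_cycles(balanced, units, width=3)     # 2 cycles
```

The width only helps when the instruction mix matches the available units: in this model `["int", "int", "mem", "fp"]` still needs two cycles at width 3, because the second `int` has to wait for the single integer unit.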

VLIW (Very Long Instruction Word)

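Widely used in DSPs: signal processing often needs a multiplication followed immediately by an addition (as in matrix multiplication), which led people to ask whether the two kinds of instruction could be processed together.
It requires well-designed algorithms with enough parallelism, so it suits specialized signal processing (the compiler is hard to design).
It needs less circuitry than a superscalar design, so it is more power efficient.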
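
The multiply-then-add pattern can be sketched as a dot product issued in two-slot bundles. The bundle format here (one MUL slot and one ADD slot per long instruction word) is a hypothetical illustration and does not model any real DSP instruction set.

```python
def dot_product_bundles(a, b):
    """Software-pipelined dot product; returns (result, number_of_bundles)."""
    acc = 0
    product = None                # result of the previous bundle's MUL slot
    bundles = 0
    for x, y in zip(a, b):
        # One long instruction word: the ADD slot consumes the previous
        # product while the MUL slot computes the next one. The two
        # operations are independent, so they can share a bundle.
        if product is not None:
            acc += product
        product = x * y
        bundles += 1
    if product is not None:       # drain: final ADD, MUL slot left empty
        acc += product
        bundles += 1
    return acc, bundles


result, bundles = dot_product_bundles([1, 2, 3], [4, 5, 6])   # (32, 4)
```

Issued one operation at a time, the same computation takes six operations (three multiplies and three adds). Finding pairs like this is the scheduling work that VLIW leaves to the compiler or the programmer.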

Instruction Dependency & Latency

Hazard

Structural Hazard

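There are not enough hardware resources, so several instructions that should execute at the same time cannot all proceed.
- Example: in a von Neumann architecture, instructions and data live in the same memory, so fetching an instruction and accessing data at the same time causes a structural hazard.
- Harvard vs von Neumann architecture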
Solutions:
- Add more hardware.
- Use a stall to delay execution, so that instructions accessing the same hardware no longer overlap.
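
The stall solution can be sketched with a toy model of a classic five-stage pipeline (IF, ID, EX, MEM, WB). The assumptions are that a load or store uses the memory port in its MEM stage three cycles after its fetch, and that a unified memory has a single port. The function name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(uses_memory, unified_memory=True):
    """Return the cycle in which each instruction is fetched.

    uses_memory: list of booleans, True for loads and stores
    """
    mem_busy = set()              # cycles in which an older MEM stage owns the port
    fetch_at = []
    cycle = 0
    for is_mem in uses_memory:
        while unified_memory and cycle in mem_busy:
            cycle += 1            # stall: the fetch waits for the memory port
        fetch_at.append(cycle)
        if is_mem:
            mem_busy.add(cycle + 3)   # IF, ID, EX, then MEM
        cycle += 1
    return fetch_at


program = [True, False, False, False, False]            # one load, then four ALU ops
von_neumann = fetch_cycles(program)                        # [0, 1, 2, 4, 5]
harvard = fetch_cycles(program, unified_memory=False)      # [0, 1, 2, 3, 4]
```

With separate instruction and data memories (the "add more hardware" option) the conflict never arises. With a shared port, the fourth instruction is fetched one cycle late.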

Data Hazard

An instruction in the pipeline needs a result that an earlier instruction, still in the pipeline, has not yet produced (data dependency).

data dependency:
- RAW (True Data Dependency)
- WAW
- WAR
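
The three dependency kinds can be told apart by comparing the registers two instructions read and write. This is a minimal sketch: the tuple encoding `(destination, sources)` and the function name `classify_dependencies` are assumptions made for illustration.

```python
def classify_dependencies(first, second):
    """Classify dependencies of `second` on the older instruction `first`.

    Each instruction is (dest_register, set_of_source_registers).
    """
    first_dest, first_srcs = first
    second_dest, second_srcs = second
    found = []
    if first_dest is not None and first_dest in second_srcs:
        found.append("RAW")       # second reads what first writes
    if first_dest is not None and first_dest == second_dest:
        found.append("WAW")       # both write the same register
    if second_dest is not None and second_dest in first_srcs:
        found.append("WAR")       # second overwrites what first reads
    return found


add = ("r1", {"r2", "r3"})        # r1 = r2 + r3
sub = ("r4", {"r1", "r5"})        # r4 = r1 - r5
mov = ("r2", {"r6"})              # r2 = r6

raw = classify_dependencies(add, sub)   # ["RAW"]
war = classify_dependencies(add, mov)   # ["WAR"]
```

Only RAW reflects a real flow of data. WAW and WAR come from reusing a register name.
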
Software solutions:
- Insert nop
- Instruction rescheduling
Hardware solutions:
- Forwarding
- Forwarding + stall
  - Load-use Data Hazard
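
The cost of each solution can be sketched as the number of stall cycles (or nops) a RAW dependency needs. The model is an assumption, not something stated in these notes: a textbook five-stage pipeline (IF, ID, EX, MEM, WB) in which the register file is written in the first half of a cycle and read in the second half. The function name `stalls_needed` is hypothetical.

```python
def stalls_needed(producer_is_load, distance, forwarding):
    """Stall cycles for a consumer that reads a register written by a producer.

    distance: 1 means the consumer immediately follows the producer.
    """
    if forwarding:
        # An ALU result is forwarded from the end of EX straight into the
        # next EX. A loaded value only exists after MEM, so a consumer right
        # behind a load still waits one cycle (load-use hazard).
        return 1 if (producer_is_load and distance == 1) else 0
    # Without forwarding, the consumer's ID stage must not come before the
    # producer's WB stage, which is three stages after the producer's ID.
    return max(0, 3 - distance)


no_forwarding = stalls_needed(False, distance=1, forwarding=False)   # 2
alu_forwarded = stalls_needed(False, distance=1, forwarding=True)    # 0
load_use = stalls_needed(True, distance=1, forwarding=True)          # 1
```

Instruction rescheduling amounts to increasing `distance` by moving independent instructions between the producer and the consumer.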

Control Hazard

Branch & Branch prediction

Branch

if (a > 7) {
    b = c;
} else {
    b = d;
}

    cmp a, 7    ; a > 7 ?
    ble L1
    mov c, b    ; b = c
    br L2
L1: mov d, b    ; b = d
L2: ...

Static Branch Prediction

Always predict taken, or always predict not taken.
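
How well a fixed guess works depends entirely on the branch. The sketch below measures it on a made-up outcome trace: a loop branch that is taken nine times and then falls through. The function name `static_accuracy` and the trace are hypothetical.

```python
def static_accuracy(outcomes, predict_taken):
    """Fraction of branch outcomes that match a fixed prediction."""
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)


loop_branch = [True] * 9 + [False]       # taken 9 times, then the loop exits

always_taken = static_accuracy(loop_branch, predict_taken=True)        # 0.9
always_not_taken = static_accuracy(loop_branch, predict_taken=False)   # 0.1
```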

Dynamic Branch Prediction

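BTB (Branch Target Buffer):
- A cache that stores the destination (the new PC) of each branch.
- In the IF stage, the current PC is used as the index to look up the current branch's destination in the BTB.
- On a hit (the current instruction is a branch and is predicted taken):
  - ID: update the PC to the predicted destination.
    - EX: if the branch is taken, execution continues normally.
    - EX: if the branch is not taken, flush the instructions fetched along the wrong path and delete the corresponding entry from the BTB.
- On a miss:
  - ID: proceed as usual.
    - EX: if the branch is taken, add the destination address to the BTB and flush the instructions fetched along the wrong path.
    - EX: if the branch is not taken, proceed as usual.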
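
The lookup and update policy above can be sketched as a small class. It simplifies in ways the notes do not specify: the buffer is an unbounded dictionary rather than a fixed-size indexed cache, instructions are assumed to be 4 bytes long, and a hit whose stored destination differs from the actual one (for example an indirect branch) is not handled. The class and method names are hypothetical.

```python
class BranchTargetBuffer:
    def __init__(self):
        self.targets = {}                # branch PC -> predicted destination PC

    def predict(self, pc):
        """IF stage: next PC to fetch (destination on a hit, fall-through on a miss)."""
        return self.targets.get(pc, pc + 4)

    def resolve(self, pc, taken, target):
        """EX stage: update the buffer; return True if a flush is needed."""
        hit = pc in self.targets
        if hit and not taken:
            del self.targets[pc]         # predicted taken, but it was not
            return True
        if not hit and taken:
            self.targets[pc] = target    # remember the destination for next time
            return True
        return False                     # the prediction was right


btb = BranchTargetBuffer()
first_guess = btb.predict(0x100)                   # miss: falls through to 0x104
first_flush = btb.resolve(0x100, True, 0x200)      # taken: insert entry and flush
second_guess = btb.predict(0x100)                  # hit: 0x200
second_flush = btb.resolve(0x100, True, 0x200)     # correct, no flush
```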

ref: video

BHB (Branch History Buffer):
1-bit predictor

2-bit predictor
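
The difference between the two schemes shows up on a loop branch. The notes list them without detail, so the sketch below uses the textbook definitions as an assumption: a 1-bit predictor repeats the last outcome, and a 2-bit predictor is a saturating counter from 0 to 3 that predicts taken when the counter is at least 2. The function names and the trace are hypothetical.

```python
def one_bit_mispredictions(outcomes, state=True):
    """1-bit scheme: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken                    # remember only the last outcome
    return misses


def two_bit_mispredictions(outcomes, counter=3):
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


# A loop whose branch is taken three times and then exits, run three times.
trace = [True, True, True, False] * 3

one_bit = one_bit_mispredictions(trace)   # 5
two_bit = two_bit_mispredictions(trace)   # 3
```

The 1-bit scheme mispredicts twice per loop run after the first (once at the exit and once on re-entry). The 2-bit counter needs two wrong outcomes in a row to change its mind, so it only misses the exit.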

Delayed Branch

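A software-based solution, also called Insert Safety Instruction: an instruction that will execute whether or not the branch is taken is placed after the branch, which reduces the penalty of a mispredicted branch. It plays a role similar to a nop, but the cycle is not wasted doing nothing.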

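Three approaches:

1. From Before
2. From Target
3. From Fall through

The first is the best and should be tried first. Only if it would cause a data hazard should the second or third be tried.
- If the branch is very likely to be taken, try the second.
- If the branch is unlikely to be taken, try the third.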
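
The selection rule can be written down directly. This is a minimal sketch: the function name `choose_delay_slot_fill` and the `threshold` parameter are hypothetical, since the notes only say "very likely" and "unlikely" without giving a cutoff.

```python
def choose_delay_slot_fill(before_has_data_hazard, taken_probability, threshold=0.5):
    """Pick where the delay-slot instruction should come from."""
    if not before_has_data_hazard:
        return "from before"             # best choice: always does useful work
    if taken_probability > threshold:
        return "from target"             # useful when the branch is taken
    return "from fall through"           # useful when the branch is not taken


safe = choose_delay_slot_fill(False, 0.9)          # "from before"
loop_back = choose_delay_slot_fill(True, 0.9)      # "from target"
rare_branch = choose_delay_slot_fill(True, 0.1)    # "from fall through"
```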

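Delayed branch is a simple and effective approach. However, as processor pipelines grow longer and more instructions are issued per clock cycle, the branch delay keeps growing and a single delay slot is no longer enough. Delayed branch has therefore lost its appeal compared with dynamic schemes, which cost more but are more flexible.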

To be organized:

OOO (Out of Order execution)

ref:[Computer Architecture Cheat sheet] — Pipeline Hazard
