Try   HackMD

計算機結構 第三次小考 考前重點

data dependency(資料相依性)(計算機組織裡叫hazard危障)

p.4

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

資料相依性:即一個指令的執行結果是否依賴於其他指令的結果。

Name Dependence

p.6

  • Antidependence: instruction j writes a register or memory location that instruction i reads
    • Initial ordering (i before j) must be preserved
1. ADD R1, R2, 5 ; 
2. SUB R2, R3, 10 ; 
  • Output dependence: instruction i and instruction j write the same register or memory location
    • Ordering must be preserved
1. ADD R1, R2, 10 ;
2. SUB R1, R1, 5 ;

WAW,WAR,RAW成因範例及解決方法

p.7

  • Read after write (RAW) / Data Dependency
    • 成因:在一個指令寫入資料的同時,另一個指令正在讀取該資料。
    • 範例:
ADD R1, R2, R3 ; SUB R4, R1, R5 ;
  • Write after write (WAW) / Output Dependency
    • 成因:兩個指令寫入相同的目的地,並且第一個寫入尚未完成,而第二個寫入卻已經發生。
    • 範例:
ADD R1, R2, 10 ; SUB R1, R1, 5 ;
  • Write after read (WAR) / Anti Dependency
    • 成因:在一個指令讀取資料的同時,另一個指令對該資料進行寫入。
    • 範例:
ADD R1, R2, 5 ; SUB R2, R3, 10 ;
  • 解決方法
    register renaming 解決 WAW 與 WAR
    strictly ordered 解決 RAW

loop unrolling在幹嘛

p.12
將比如1000次的迴圈拆成250次的四個迴圈的指令的結構,消除部分相依性
(可能還有其他?)

  • Unroll by a factor of 4 (assume # elements is divisible by 4)
  • Eliminate unnecessary instructions
for (int i = 0; i < 16; i++) {
        sum += i;
    }

for (int i = 0; i < 16; i += 4) {
        sum += i;
        sum += i + 1;
        sum += i + 2;
        sum += i + 3;
    }

Strip Mining

  • Unknown number of loop iterations?
    • Number of iterations = n
    • Goal: make k copies of the loop body
    • Generate pair of loops:
      • First executes n mod k times
      • Second executes n / k times

Strip mining:分成兩個迴圈執行。

// First loop (executes n mod k times)
for (i = 0; i < n % k; i++) {
    // 迴圈體的一部分
    // ...
}

// Second loop (executes n / k times)
for (i = 0; i < n / k; i++) {
    // 迴圈體的剩餘部分
    // ...
}

branch predictor 在幹嘛 務必熟悉1bit,2bit branch predictor(上課照片)

p.16

  • (在幹嘛)用於預測程式中分支指令(如條件分支或迴圈分支)的走向,以提高指令執行的效率。

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

  • 1位元分支預測器(1-bit Branch Predictor):
    • 原理: 1位元分支預測器是最簡單的分支預測器之一。它僅保留一個位元,記錄上一次分支指令的執行結果(例如,是或否,分支跳躍或不跳躍)。
    • 操作: 如果上一次分支執行成功,則預測下一次分支執行成功;如果上一次分支執行失敗,則預測下一次分支執行失敗。
    • 問題: 1位元分支預測器對於連續出現相同結果的分支能夠提供合理的預測,但對於交替的分支(交錯的跳躍和不跳躍)效果較差。
  • 2位元分支預測器(2-bit Branch Predictor):
    • 原理: 2位元分支預測器引入了更多的狀態信息,具體而言,使用兩個位元來紀錄上一次的分支執行結果。
    • 操作: 根據上一次和上上次的結果,可以構建四種狀態(強跳躍、弱跳躍、強不跳躍、弱不跳躍)。預測根據當前狀態,而不僅僅是單一的成功或失敗。
    • 問題: 2位元分支預測器相對於1位元提供了更好的預測性能,但仍然可能在某些情況下無法確切預測分支。

tomasulo 演算法有哪三個步驟

p.32

  1. Issue

    • Get next instruction from FIFO queue
    • If available RS, issue the instruction to the RS with operand values if available
    • If operand values not available, stall the instruction
  2. Execute

    • When operand becomes available, store it in any reservation stations waiting for it
    • When all operands are ready, issue the instruction
    • Loads and store maintained in program order through effective address
    • No instruction allowed to initiate execution until all branches that proceed it in program order have completed
  3. Write result

    • Write result on CDB into reservation stations and store buffers
      • (Stores must wait until address and value are received)

tomasulo 演算法 結合reorder buffer後多了什麼步驟

p.40

  • Commit:
    • When ROB reaches head of ROB, update register
    • When a mispredicted branch reaches head of ROB, discard all entries

reorder buffer

需搭配 hardware speculation 和 branch predictor

(?)在 Commit 階段,可能需要處理分支指令的錯誤預測,進行指令的recovery。

請問 當初tomasulo(1965)為何不做 reorder buffer,branch predictor

1980葉子玉提出,時間對不上
1990 branch predictor 才趨於成熟

scoreboarding看一下

補充教材lecture 16 p74

Scoreboard: a bookkeeping technique

  • Out-of-order execution divides ID stage:
    1.Issue—decode instructions, check for structural hazards
    2.Read operands—wait until no data hazards, then read
    operands
  • Scoreboards date to CDC6600 in 1963
    – Readings for Monday include one on CDC6600
  • Instructions execute whenever not dependent on previous instructions and no hazards
  • CDC 6600: In order issue, out-of-order execution, outof-order commit (or completion)
    – No forwarding!
    – Imprecise interrupt/exception model for now

Four Stages of Scoreboard Control

  1. Issue—decode instructions & check for structural hazards (ID1)
    – Instructions issued in program order (for hazard checking)
    – Don’t issue if structural hazard
    – Don’t issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards)
  2. Read operands—wait until no data hazards, then read operands (ID2
    – All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data.
    – No forwarding of data in this model!
  3. Execution—operate on operands (EX
    – The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
  4. Write result—finish execution (WB)
    – Stall until no WAR hazards with previous instructions:
    Example:
DIVD F0,F2,F4 ADDD F10,F0,F8 SUBD F8,F8,F14

CDC 6600 scoreboard would stall SUBD until ADDD reads operands

image

VLIW是什麼

p.47
VLIW (very long instruction word)
Package multiple operations into one instruction