# 計算機結構 第三次小考 考前重點 ## data dependency(資料相依性)(計算機組織裡叫hazard危障) p.4 ![image](https://hackmd.io/_uploads/SyWAYPfBT.png) 資料相依性:即一個指令的執行結果是否依賴於其他指令的結果。 ## Name Dependence p.6 - Antidependence: instruction j writes a register or memory location that instruction i reads - Initial ordering (i before j) must be preserved ```mips 1. ADD R1, R2, 5 ; 2. SUB R2, R3, 10 ; ``` - Output dependence: instruction i and instruction j write the same register or memory location - Ordering must be preserved ```mips 1. ADD R1, R2, 10 ; 2. SUB R1, R1, 5 ; ``` ## WAW,WAR,RAW成因範例及解決方法 p.7 - Read after write (RAW) / Data Dependency - 成因:在一個指令寫入資料的同時,另一個指令正在讀取該資料。 - 範例: ```mips= ADD R1, R2, R3 ; SUB R4, R1, R5 ; ``` - Write after write (WAW) / Output Dependency - 成因:兩個指令寫入相同的目的地,並且第一個寫入尚未完成,而第二個寫入卻已經發生。 - 範例: ```mips= ADD R1, R2, 10 ; SUB R1, R1, 5 ; ``` - Write after read (WAR) / Anti Dependency - 成因:在一個指令讀取資料的同時,另一個指令對該資料進行寫入。 - 範例: ```mips= ADD R1, R2, 5 ; SUB R2, R3, 10 ; ``` - 解決方法 register renaming 解決 WAW 與 WAR strictly ordered 解決 RAW ## loop unrolling在幹嘛 p.12 將比如1000次的迴圈拆成250次的四個迴圈的指令的結構,消除部分相依性 (可能還有其他?) - Unroll by a factor of 4 (assume # elements is divisible by 4) - Eliminate unnecessary instructions ```c for (int i = 0; i < 16; i++) { sum += i; } for (int i = 0; i < 16; i += 4) { sum += i; sum += i + 1; sum += i + 2; sum += i + 3; } ``` ## Strip Mining - Unknown number of loop iterations? - Number of iterations = n - Goal: make k copies of the loop body - Generate pair of loops: - First executes n mod k times - Second executes n / k times Strip mining:分成兩個迴圈執行。 ```c // First loop (executes n mod k times) for (i = 0; i < n % k; i++) { // 迴圈體的一部分 // ... } // Second loop (executes n / k times) for (i = 0; i < n / k; i++) { // 迴圈體的剩餘部分 // ... } ``` ## branch predictor 在幹嘛 務必熟悉1bit,2bit branch predictor(上課照片) p.16 - (在幹嘛)用於預測程式中分支指令(如條件分支或迴圈分支)的走向,以提高指令執行的效率。 ![LINE_ALBUM_2023117_231128_2](https://hackmd.io/_uploads/S1MrhwGBp.jpg =400x) ![LINE_ALBUM_2023117_231128_1](https://hackmd.io/_uploads/S1-mhwfr6.jpg =400x) - 1位元分支預測器(1-bit Branch Predictor): - 原理: 1位元分支預測器是最簡單的分支預測器之一。它僅保留一個位元,記錄上一次分支指令的執行結果(例如,是或否,分支跳躍或不跳躍)。 - 操作: 如果上一次分支執行成功,則預測下一次分支執行成功;如果上一次分支執行失敗,則預測下一次分支執行失敗。 - 問題: 1位元分支預測器對於連續出現相同結果的分支能夠提供合理的預測,但對於交替的分支(交錯的跳躍和不跳躍)效果較差。 - 2位元分支預測器(2-bit Branch Predictor): - 原理: 2位元分支預測器引入了更多的狀態信息,具體而言,使用兩個位元來紀錄上一次的分支執行結果。 - 操作: 根據上一次和上上次的結果,可以構建四種狀態(強跳躍、弱跳躍、強不跳躍、弱不跳躍)。預測根據當前狀態,而不僅僅是單一的成功或失敗。 - 問題: 2位元分支預測器相對於1位元提供了更好的預測性能,但仍然可能在某些情況下無法確切預測分支。 ## tomasulo 演算法有哪三個步驟 p.32 1. Issue - Get next instruction from FIFO queue - If available RS, issue the instruction to the RS with operand values if available - If operand values not available, stall the instruction 2. Execute - When operand becomes available, store it in any reservation stations waiting for it - When all operands are ready, issue the instruction - Loads and store maintained in program order through effective address - ==No instruction allowed to initiate execution until all branches that proceed it in program order have completed== 3. Write result - Write result on CDB into reservation stations and store buffers - (Stores must wait until address and value are received) ## tomasulo 演算法 結合reorder buffer後多了什麼步驟 p.40 - Commit: - When ROB reaches head of ROB, update register - When a mispredicted branch reaches head of ROB, discard all entries ## reorder buffer 需搭配 hardware speculation 和 branch predictor (?)在 Commit 階段,可能需要處理分支指令的錯誤預測,進行指令的recovery。 ## 請問 當初tomasulo(1965)為何不做 reorder buffer,branch predictor 1980葉子玉提出,時間對不上 1990 branch predictor 才趨於成熟 ## scoreboarding看一下 補充教材lecture 16 p74 Scoreboard: a bookkeeping technique * Out-of-order execution divides ID stage: 1.Issue—decode instructions, check for structural hazards 2.Read operands—wait until no data hazards, then read operands * Scoreboards date to CDC6600 in 1963 – Readings for Monday include one on CDC6600 * Instructions execute whenever not dependent on previous instructions and no hazards * CDC 6600: In order issue, out-of-order execution, outof-order commit (or completion) – No forwarding! – Imprecise interrupt/exception model for now Four Stages of Scoreboard Control 1. Issue—decode instructions & check for structural hazards (ID1) – Instructions issued in program order (for hazard checking) – Don’t issue if structural hazard – Don’t issue if instruction is output dependent on any previously issued but uncompleted instruction (no WAW hazards) 3. Read operands—wait until no data hazards, then read operands (ID2 – All real dependencies (RAW hazards) resolved in this stage, since we wait for instructions to write back data. – No forwarding of data in this model! 4. Execution—operate on operands (EX – The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution. 5. Write result—finish execution (WB) – Stall until no WAR hazards with previous instructions: Example: ```mips= DIVD F0,F2,F4 ADDD F10,F0,F8 SUBD F8,F8,F14 ``` CDC 6600 scoreboard would stall SUBD until ADDD reads operands ![image](https://hackmd.io/_uploads/HyL19uzHa.png) ## VLIW是什麼 p.47 VLIW (very long instruction word) Package multiple operations into one instruction