---
title: Archi 4
---

# Computer Architectures NTNU 計算機結構

##### [Back to Note Overview](https://reurl.cc/XXeYaE)
##### [Back to Computer Architectures](https://hackmd.io/@NTNUCSIE112/Archi109-2)

{%hackmd @sophie8909/pink_theme %}

###### tags: `NTNU` `CSIE` `必修` `Computer Architectures` `109-2`

<!-- tag order: [school] [department, required/elective] or [program, program name (without the word "program", e.g. 大師創業)] [course] [semester] -->
<!-- URL name: Archi109-2_[] -->

:::info
This note is almost entirely images; refer to the slides directly.
:::

## Ch.04 The Processor

### 4.1 Introduction
- CPU performance factors
    - Instruction count
        - Determined by ISA and compiler
    - CPI and cycle time
        - Determined by CPU hardware
- We will examine two MIPS implementations
    - A simplified version
    - A more realistic pipelined version
- Simple subset, shows most aspects
    - Memory reference: `lw`, `sw`
    - Arithmetic/logical: `add`, `sub`, `and`, `or`, `slt`
    - Control transfer: `beq`, `j`

*[ISA]: Instruction set architecture

#### Instruction Execution
- PC → instruction memory, fetch instruction
- Register numbers → register file, read registers
- Depending on the instruction class
    - Use ALU to calculate …
        - Arithmetic result
        - Memory address for load/store
        - Branch target address
    - Access data memory for load/store
    - PC ← target address or PC + 4

#### CPU Overview
![CPU Overview](https://i.imgur.com/bGNq1fr.png)

#### Multiplexer
![Multiplexer](https://i.imgur.com/9g14oZ5.png)

#### Control
![Control](https://i.imgur.com/6AueIsZ.png)

### 4.2 Logic Design Conventions
#### Logic Design Basics
- Information is encoded in binary
    - Low voltage = 0, High voltage = 1
    - One wire per bit
    - Multi-bit data are encoded on multi-wire buses
- Combinational elements
    - Operate on data
    - Output is a function of input
- State (sequential) elements
    - Store information

#### Combinational Element
- AND-gate
    - Y = A & B
    ![AND-gate](https://i.imgur.com/2dhlFf9.png)
- Adder
    - Y = A + B
    ![Adder](https://i.imgur.com/m53XSTh.png)
- Multiplexer
    - Y = S ? 
I1 : I0
    ![Multiplexer](https://i.imgur.com/6R3jNEl.png)
- Arithmetic/Logic Unit
    - Y = F(A, B)
    ![Arithmetic/Logic Unit](https://i.imgur.com/zgYJIWV.png)

#### Sequential Element
- Register stores data in a circuit
    - Uses a clock signal to determine when to update the stored value
    - Edge-triggered: update when Clk changes from 0 to 1
    - ![Register stores data in a circuit](https://i.imgur.com/JWXKbdb.png)
- Register with a write control
    - Only updates on a clock edge when the write control input is 1
    - Used when the stored value is required later
    - ![Register with a write control](https://i.imgur.com/J5PYBGB.png)

#### Clocking Methodology
- Combinational logic transforms data during a clock cycle
    - Between two clock edges
    - Input from sequential elements, output to a sequential element
    - Longest delay determines the clock period

![Clocking Methodology](https://i.imgur.com/JXst2ol.png)

### 4.3 Building a Datapath
#### Building a Datapath
- Datapath
    - Elements that process data and addresses in the CPU
        - Registers, ALUs, muxes, memories, ...
- We will build a MIPS datapath incrementally.
    - Refine the overview design

#### Instruction Fetch
![Instruction Fetch](https://i.imgur.com/m97yQ67.png)

#### R-Format Instructions
- Read two register operands
- Perform an arithmetic/logical operation
- Write a register result

![R-Format Instructions](https://i.imgur.com/yRQP1p3.png)

#### Load/Store Instructions
- Read register operands
- Calculate an address using the 16-bit offset
    - Use the ALU, but sign-extend the offset
- Load: Read memory and update a register
- Store: Write a register value to memory

![Load/Store Instructions](https://i.imgur.com/a4Zs4EJ.png)

#### Branch Instructions
- Read register operands
- Compare operands
    - Use ALU, subtract and check Zero output
- Calculate target address
    - Sign-extend displacement
    - Shift left 2 places (word displacement)
    - Add to PC + 4
        - Already calculated by instruction fetch

![Branch Instructions](https://i.imgur.com/fj5tTbD.png)

#### Composing the Elements
- First-cut datapath does an instruction in one clock cycle
    - Each datapath element can only do one function at a time
    - Hence, we need separate instruction and data memories
- Use multiplexers where alternate data sources are used for different instructions

#### R-Type/Load/Store Datapath
![The datapath for the memory instructions and the R-type instructions](https://i.imgur.com/KYllr9l.png)

#### Full Datapath
![The simple datapath for the core MIPS architecture combines the elements required by different instruction class](https://i.imgur.com/ntdJw0l.png)

### 4.4 A Simple Implementation Scheme
#### ALU Control
- ALU is used for
    - Load/Store: F = add
    - Branch: F = subtract
    - R-type: F depends on funct field

| ALU control | Function         |
| ----------- | ---------------- |
| 0000        | AND              |
| 0001        | OR               |
| 0010        | add              |
| 0110        | subtract         |
| 0111        | set-on-less-than |
| 1100        | NOR              |

- Assume 2-bit ALUOp is derived from opcode
    - Combinational logic derives ALU control.
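
The ALUOp-to-ALU-control derivation described above can be sketched in Python. This is only an illustrative model, not how the hardware is built: real designs implement it as combinational logic (with don't-care bits), while this sketch uses a lookup. The encodings are the ones listed in this section; the names `alu_control` and `FUNCT_TO_ALU_CONTROL` are hypothetical, and treating ALUOp `11` as an error is an assumption of this sketch.

```python
# Hypothetical model of the ALU control logic from Section 4.4.
# Encodings are taken from the tables in these notes; names are illustrative only.

# funct field (6 bits) -> ALU control (4 bits), used only for R-type instructions
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control value for a 2-bit ALUOp and a 6-bit funct field."""
    if alu_op == 0b00:
        # lw / sw: the ALU adds base register and sign-extended offset; funct is ignored
        return 0b0010
    if alu_op == 0b01:
        # beq: the ALU subtracts so the Zero output can be checked; funct is ignored
        return 0b0110
    if alu_op == 0b10:
        # R-type: the operation is selected by the funct field
        funct &= 0x3F  # keep only the low 6 bits
        if funct not in FUNCT_TO_ALU_CONTROL:
            raise ValueError(f"unsupported funct field: {funct:06b}")
        return FUNCT_TO_ALU_CONTROL[funct]
    # Assumption of this sketch: ALUOp 11 is unused and rejected
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


if __name__ == "__main__":
    # slt is an R-type instruction (ALUOp = 10) with funct = 101010
    print(format(alu_control(0b10, 0b101010), "04b"))  # prints 0111
```

A two-level decode like this keeps the main control unit small: it only has to produce the 2-bit ALUOp from the opcode, and the funct field is examined only for R-type instructions.
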

| opcode | ALUOp | Operation        | funct  | ALU function     | ALU control |
| ------ | ----- | ---------------- | ------ | ---------------- | ----------- |
| `lw`   | 00    | load word        | XXXXXX | add              | 0010        |
| `sw`   | 00    | store word       | XXXXXX | add              | 0010        |
| `beq`  | 01    | branch equal     | XXXXXX | subtract         | 0110        |
| R-type | 10    | add              | 100000 | add              | 0010        |
|        |       | subtract         | 100010 | subtract         | 0110        |
|        |       | AND              | 100100 | AND              | 0000        |
|        |       | OR               | 100101 | OR               | 0001        |
|        |       | set-on-less-than | 101010 | set-on-less-than | 0111        |

#### Main Control Unit
- Control signals derived from instruction

![The three instruction classes(R-type, load and store, and branch) use two different instruction formats](https://i.imgur.com/7bsrfvh.png)

#### Datapath With Control
<!-- 0511 30:30 Slide 51 -->
![Datapath With Control](https://i.imgur.com/sC9OTd2.png)

#### R-Type Instruction
<!-- Slide 56 -->

#### Load Instruction
<!-- Slide 59 -->

### 4.5 An Overview of Pipelining

### 4.6 Pipelined Datapath and Control
#### MIPS Pipelined Datapath
<!-- Slide 106 -->
![Figure 4.33 The single-cycle datapath from section 4.4(similar to Figure 4.17)](https://i.imgur.com/BtddeUt.png)

#### Pipeline registers
- Need registers between stages
    - Hold information produced in the previous cycle

#### IF for Load & Store
<!-- Slide 112 -->
![](https://i.imgur.com/56Y8u47.png)

#### ID for Load & Store
<!-- Slide 113 -->
![](https://i.imgur.com/6hiK8VI.png)

#### Multi-Cycle Pipeline Diagram
<!-- Slide 124 TB p.286 -->
- Form showing resource usage
    - ![](https://i.imgur.com/q2JP2m6.png)

### 4.7 Data Hazards: Forwarding vs.
Stalling

### 4.8 Control Hazards

### 4.9 Exceptions

### 4.10 Parallelism and Advanced Instruction Level Parallelism
#### Dynamically Scheduled CPU
<!-- Slide 222/p.328 -->
![Figure 4.72](https://i.imgur.com/BaQijn2.png)

#### Register Renaming
- Reservation stations and reorder buffer effectively provide register renaming
- On instruction issue, if an operand is available in the register file or reorder buffer
    - It is copied to the reservation station
    - It is no longer required in the register ....
<!-- unfinished -->

#### Speculation
- Predict branch and continue issuing
    - Don't commit until the branch outcome is determined
- Load speculation
    - Avoid load and cache miss delay
        - Predict the effective address
        - Predict the loaded value
        - Load before completing
<!-- unfinished -->

#### Why Do Dynamic Scheduling
- Why not just let the compiler schedule code?
- Not all stalls are predictable
    - e.g. cache misses
- Cannot always schedule around branches
    - Branch outcome is dynamically determined
- Different implementations of an ISA have different latencies and hazards

#### Power Efficiency
<!-- Slide 233 p.332 -->
- Complexity of dynamic scheduling and speculation requires power
- Multiple simpler cores may be better

![Figure 4.73](https://i.imgur.com/RMyeoem.png)

### 4.11 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines

### 4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply

### 4.14 Fallacies and Pitfalls
<!-- This is the end of Sophie -->