--- title: Archi 1 --- # Computer Architectures NTNU 計算機結構 ##### [Back to Note Overview](https://reurl.cc/XXeYaE) ##### [Back to Computer Architectures](https://hackmd.io/@NTNUCSIE112/Archi109-2) {%hackmd @sophie8909/pink_theme %} ###### tags: `NTNU` `CSIE` `必修` `Computer Architectures` `109-2` <!-- tag順序 [學校] [系 必選] or [學程 學程名(不含學程的 e.g. 大師創業)] [課程] [開課學期]--> ## Ch.01 Computer Abstractions and Technology ### 1.1 Introduction #### Computer Revolutions 電腦的發展 - Progress in computer technology - Underpinned by **Moore's Law 摩爾定律** - **"Moore's Law"** 2X transistors/chip every 1.5 years (18 mouths) called - Make novel applications feasible - Automobiles - Cell phone - Computers are pervasive(普及化) #### Classes of Computers 電腦的種類 - Personal computers 個人電腦 - General purpose, variety of software - Subject to cost/performance tradeoff(權衡) - Sever computer 伺服器電腦 - Network based - High capacity, performance, reliability(可靠性) - Range from a small sever to a building sized one - Supercomputers 超級電腦 - High-end scientific and engineering calculation - Highest capability but represent a small fraction of the overall computer market 最高性能,但只占極小部分的電腦市場 - Embedded computers 嵌入式電腦 - Hidden as a component of a system - Stringent power, performance, cost constraints 嚴格的電力、性能、消耗限制 #### The PostPC Era 後 PC 世代 - Personal mobile device (PMD) 個人行動裝置 - Battery operated 電池供電 - Connect to Internet - Hundreds of dollars - Smart phones, tablets, electronic glasses - Cloud computing 雲端計算 - Warehouse scale computers (WSC) 數位倉儲型電腦 - Software as a service (SaaS) 軟體即服務 - A portion of software run on a PMD and a portion run in Cloud *[PMD]: Personal mobile device *[WSC]: Warehouse scale computers *[SaaS]: Software as a service #### Understand Performance - Algorithm - Determine *number of* **operations executed** - Programming language, compiler, architecture - Determine *number of* **machine instructions executed per operation** - Processor and memory system - Determine *how fast* **instructions** are executed - Input/Output system (including OS) - Determine *how fast* **I/O operations** are executed <!-- Introduction end --> ### 1.2 Eight Great Ideas in Computer Architecture - Design for Moore's Law - Use abstraction to simplify design - Make the common case fast - Performance via parallelism - Performance via pipelining - Performance via prediction - Hierarchy of memories - Dependability via redundancy - Backup <!-- Eight Great Ideas in Computer Architecture end --> ### 1.3 Below Your Program - Application software - High-level language (HLL) - System software - Compiler translates HLL code to machine code - Operating system (OS) - Handle I/O - Manage memory and storage - Schedule tasks and sharing resources - Hardware - Processor, memory, I/O, controllers *[HLL]:High-level language #### Levels of Program Code - High-level language - Level of abstraction closer to problem domain - Provide for productivity(生產力) and instructions - Assembly language - Textual representation of instructions - [Note of NTNU ASM](https://hackmd.io/@NTNUCSIE112/ASM) - Hardware representation - Binary digits (bits): on and off - Encoded instructions and data <!-- Below Your Program end --> ### 1.4 Under the Covers > 電腦內的**元件** #### Components of Computer - Same components for all kinds of computer - Desktop, sever, embedded,... - I/O includes... - User-interface devices - Display, keyboard, mouse - Storage devices - Hard disk, CD/DVD, flash - Network adapters - For communicating with other computers #### Touchscreen - PostPC device - Supersede keyboard and mouse - Resistive(電阻式) and capacitive(電容式) type - Most use capacity - Capacitive allows multiple touches simultaneously #### Through the Looking Glass - Screen: picture elements(pixels) - Mirror content of **frame buffer memory** #### Inside the Processor(CPU) - Datapath performs operations on data - Control - Sqeuences datapath, memory - Cache - Small fast SRAM memory fot immediate access to data #### Abstractions 概念 - Abstraction helps us deal with complexity - Hide lower-level details - Instruction set architecture (ISA) - Hardware/software interface - Application binary interface - ISA plus system software interface - Implementation - Details underlying and interface *[ISA]:Instruction set architecture #### A Safe Place for Data - Volatile main/primary memory - Lose instructions and data when off - Non-volatile secondary memory - Magnetic disk - Flash memory - Optical disk (CDROM, DVD) #### Network - Advantages: communication, resource sharing, non-local access - Local area network(LAN): Ethernet - Wide area network(WAN): Internet - Wireless network: WiFi, Bluetooth #### Technology Trends - Electronics technology continues to evolve - Increase capacity and performance - Reduce cost | Year | Technology | Relative performance | | ---- | -------------------------- | -------------------- | | 1951 | Vacuum tube | 1 | | 1965 | Transistor | 35 | | 1975 | Integrated circuit (IC) | 900 | | 1995 | Very large scale IC (VLSI) | 2,400,000 | | 2013 | Ultra large scale IC | 250,000,000,000 | <!-- Under the Covers end --> ### 1.5 Technologies for Building Processors and Memory > 處理器跟記憶體**生產技術** #### Semiconductor Technology - Silicon: semiconductor - Add materials to transform properties - Conductors(導體) - Insulators(絕緣體) - Switch #### Manufacture ICs - Yield(良率): proportion of working dies per wafer #### Integrated Circuit Cost IC成本 $$ \text{Cost per die }=\frac{\text{Cost per wafer}}{\text{Dies per wafer}\times\text{Yield}}\\ \text{Dies per wafer }\approx\text{Wafer area}/\text{Die area}\\ \text{Yield}=\frac{1}{(1+(\text{Defects per area}\times\text{Die area}/2))^2} $$ - Nonlinear relation to area and defect rate - Wafer cost and area are fixed - Defect rate is determined by manufacturing process - Die area is determined by architecture and circuit design <!-- Technologies for Building Processors and Memory end --> ### 1.6 Performance 效能 > 效能的評斷指標 #### Define Performance 總之就是要定義一個標準 *FOCUS ON **RESPONSE TIME** FOR NOW* #### Response Time and Throughput - Response time - How long it takes to do a task 執行一個任務(task)需要花多少時間 - Throughput - Total work done per unit time 每單位時間可執行的工作數量 - $e.g.$ task/transactions...per hour - How are response time and throughput affected by ... - Replace a processor with a faster version? - Add more processors? #### Relative Performance 相對性能 Define $\text{performance}=\frac{\text{1}}{\text{execution time}}$ + "X is $n$ time faster than Y" + $n=\frac{\text{Performance}_x}{\text{Performance}_y}=\frac{\text{Execution time}\ _Y}{\text{Execution time}\ _X}$ #### Measure Execution Time 測量執行時間 - Elapsed time [耗時](https://terms.naer.edu.tw/detail/416207/) - Total response time 總回應時間, 包含各個方面(aspects) - $e.g.$ Processing 執行時間、I/O、OS overhead [負擔](https://terms.naer.edu.tw/detail/144442/)、idle time 閒置時間 - Determine system performance 決定系統性能 - CPU time - Time spent on processing a given job 花在執行一個給定工作的時間 - 不計I/O time、其他工作 - Comprise user CPU time and system CPU time 包含使用者和系統的CPU time - Different programs are affected differently by CPU and system performance 不同程式執行時會被不同的CPU跟系統表現影響 #### CPU Clocking 時脈 ![#### CPU Clocking](https://i.imgur.com/idqiF9L.png) - Operation of digital hardware is governed by a constant-rate clock. 電子硬體的運作是被一個恆定的時鐘控制的 - Clock period 時脈週期: - duration of a clock cycle 一個時脈循環的時間 - e.g. $250$ ps = $0.25$ ns = $250 \times 10^{-12}$ s - Clock frequency (rate) 時脈頻率: - cycles per second 一秒多少循環 - e.g. $4.0$ GHz = $4000$ MHz = $4.0 \times 10^9$ Hz #### CPU Time 時間 $\begin{aligned} \text{CPU time}&= \text{CPU Clock Cycles}\times\text{Clock Cycle Time} \\&=\frac{\text{CPU Clock Cycles}}{\text{Clock Rate}} \end{aligned}$ + Performance can be improved by 效能可以如何增加 + Reduce number of clock cycles 減少時脈週期 + Increase clock rate 增加時脈頻率 + Hardware designer must often trade off clock rate against cycle count 硬體設計師常常要在時脈頻率跟週期盤點之間作取捨 - 增加 rate 有可能增加 cycle count ##### CPU Time Example ![](https://i.imgur.com/tT5qZGg.png) #### Instruction Count and Clock Cycle per Instruction 總執行數目 + Instruction count for a program 一個程式的總指令數量 - Determined by program, ISA and compiler 取決於程式、指令集跟編譯器 + Average clock cycles per instruction (CPI) 每指令週期平均 - Determined by CPU hardware 由電腦硬體決定 - If different instructions have different CPI 不同的指令會有不同的 CPI - Average CPI is affected by instruction mix $\begin{aligned} \text{Clock Cycles}&=\text{Instruction}\ \text{count}\times\text{Cycles Per Instruction} \end{aligned}$ $\begin{aligned} \text{CPU time}&=\text{Instruction} \ \text{count}\times \text{CPI} \times \text{Clock Cycle Time} \\&=\frac{\text{Instruction}\ \text{Count}\times \text{CPI}}{\text{Clock Rate}} \end{aligned}$ *[CPI]: Clock cycles per instruction *[IC]: Instruction count ##### CPI Example 1 ![](https://i.imgur.com/mZoxKYx.png) ##### CPI in More Details - If different instruction classes take different numbers of cycles $\text{Clock Cycles}=\sum_{i=1}^n\left(\text{CPI}_i\times\text{Instruction Count}_i\right)$ - Weighted average CPI $\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum^n_{i=1}\left(\text{CPI}_i\frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$ ##### CPI Example 2 ![](https://i.imgur.com/QmyM5BT.png) #### Performance Summary 效能總結 $\text{CPU Time}=\frac{\text{Instructions}}{\text{Program}}\times\frac{\text{Clock cycles}}{\text{Instruction}}\times\frac{\text{Seconds}}{\text{Clock cycle}}$ - 效能被一些東西影響 + Algorithm 演算法 - 影響 IC,有可能影響 CPI + Programming language 程式語言 - 影響 IC、CPI + Compiler 編譯器 - 影響 IC、CPI + Instruction set architecture 指令集架構 - 影響 IC、CPI、$T_c$(Clock period) <!-- Performance end --> ### 1.7 The Power Wall > 功率消耗的瓶頸 #### Power Trends 功率趨勢 在 CMOS IC 科技中 $\text{Power} = \text{Capacitive load} \times\text{Voltage}^2\times \text{Frequency}$ + $\text{CMOS}$:互補式金屬氧化物半導體 + $\text{Capacitive load}$:電容負載 #### Reduce Power ![](https://i.imgur.com/pqXE9i8.png) #### Power wall - can't reduce voltage further - can't remove more heat <!-- The Power Wall end --> ### 1.8 The Sea Change > 計算機結構的改變 #### Uniprocessor Performance 單一處理器的效能 ![](https://i.imgur.com/Xzjesxf.png) #### Multiprocessors 多核心處理器 + Multicore microprocessor - More than one processor per chip 一片晶片中有多於一個處理器 + Require explicitly parallel programming 需要明確的平行程式設計 + Compare with instruction level parallelism 跟指令層級平行相比 + Hardware executes multiple instructions at once 硬體一次執行多個指令 + Hidden from programmers 向工程師隱藏 + Hard to do 困難的點 + Program for performance 為了效率做設計 + Load balance 加載平衡 + Optimize communication and synchronization 最佳化溝通及同步化 <!-- The Sea Change end --> ### Benchmarks > 評估電腦效能 #### SPEC CPU Benchmark + 工程師用來測量效能的東西 + Programs are used to measure performance + Supposedly typical of actual workload + **SPEC** a.k.a. **Standard Performance Evaluation Corporation** + 發明 CPU、I/O、Web 的測量基準 + SPEC CPU 2006 + Elapsed time to execute a selection of programs 執行一系列的程序所需要花的時間 + 忽略 I/O,專注在 CPU 的表現上 + Normalize relative to reference machine 根據參考的機器做標準化 + Summarize as geometric mean of performance ratios 總結為性能的幾何平均數 $\sqrt[n]{\prod_{i=1}^n\text{Execution time ratio}_i}$ + CINT2006(integer) CFP2006(floating-point) <!-- Benchmarks end --> ### Fallacies and Pitfalls 悖論、陷阱 > ~~迷思~~ #### 【Pitfall】Amdahl's Law - Improve an aspect of a computer and expect a proportional improvement in over performance $T_\text{improved} = \frac{T_\text{affected}}{\text{improvement}}+ T_\text{unaffected}$ - Corollary 推論: make the common case fast #### 【Fallacy】Low Power at Idle - Consider designing processors to make power proportional to load #### 【Pitfall】MIPS as a Performance Metric - MIPS: Millions of Instructions Per Second - Doesn't account for... - 電腦間 ISAs 的差異 - Differences complexity between instructions - CPI varies between programs on a given CPU $\begin{aligned} MIPS &= \frac{\text{Instruction count}}{\text{Execution time}\times 10^6} &= \frac{\text{Instruction count}}{\frac{\text{Instruction count}\times \text{CPI}}{\text{Clock rate}}\times 10^6} &= \frac{\text{Clock rate}}{\text{CPI}\times 10^6} \end{aligned}$ <!-- Fallacies and Pitfalls end --> ### Concluding Remarks - Cost/performance is improving... - Due to underlying technology development - Hierarchical layers of abstraction - In both hardware and software - Instruction set architecture - Hardware/software interface - Execution time: the best performance measure - Power is a limiting factor 電力是發展限制的因素之一 - Use parallelism to improve performance 使用平行運算來增加效能