---
title: Archi 1
---
# Computer Architectures
NTNU 計算機結構
##### [Back to Note Overview](https://reurl.cc/XXeYaE)
##### [Back to Computer Architectures](https://hackmd.io/@NTNUCSIE112/Archi109-2)
{%hackmd @sophie8909/pink_theme %}
###### tags: `NTNU` `CSIE` `必修` `Computer Architectures` `109-2`
<!-- tag順序 [學校] [系 必選] or [學程 學程名(不含學程的 e.g. 大師創業)] [課程] [開課學期]-->
## Ch.01 Computer Abstractions and Technology
### 1.1 Introduction
#### Computer Revolutions 電腦的發展
- Progress in computer technology
- Underpinned by **Moore's Law 摩爾定律**
- **"Moore's Law"**
2X transistors/chip every 1.5 years (18 mouths) called
- Make novel applications feasible
- Automobiles
- Cell phone
- Computers are pervasive(普及化)
#### Classes of Computers 電腦的種類
- Personal computers 個人電腦
- General purpose, variety of software
- Subject to cost/performance tradeoff(權衡)
- Sever computer 伺服器電腦
- Network based
- High capacity, performance, reliability(可靠性)
- Range from a small sever to a building sized one
- Supercomputers 超級電腦
- High-end scientific and engineering calculation
- Highest capability but represent a small fraction of the overall computer market
最高性能,但只占極小部分的電腦市場
- Embedded computers 嵌入式電腦
- Hidden as a component of a system
- Stringent power, performance, cost constraints
嚴格的電力、性能、消耗限制
#### The PostPC Era 後 PC 世代
- Personal mobile device (PMD) 個人行動裝置
- Battery operated 電池供電
- Connect to Internet
- Hundreds of dollars
- Smart phones, tablets, electronic glasses
- Cloud computing 雲端計算
- Warehouse scale computers (WSC) 數位倉儲型電腦
- Software as a service (SaaS) 軟體即服務
- A portion of software run on a PMD and a portion run in Cloud
*[PMD]: Personal mobile device
*[WSC]: Warehouse scale computers
*[SaaS]: Software as a service
#### Understand Performance
- Algorithm
- Determine *number of* **operations executed**
- Programming language, compiler, architecture
- Determine *number of* **machine instructions executed per operation**
- Processor and memory system
- Determine *how fast* **instructions** are executed
- Input/Output system (including OS)
- Determine *how fast* **I/O operations** are executed
<!-- Introduction end -->
### 1.2 Eight Great Ideas in Computer Architecture
- Design for Moore's Law
- Use abstraction to simplify design
- Make the common case fast
- Performance via parallelism
- Performance via pipelining
- Performance via prediction
- Hierarchy of memories
- Dependability via redundancy
- Backup
<!-- Eight Great Ideas in Computer Architecture end -->
### 1.3 Below Your Program
- Application software
- High-level language (HLL)
- System software
- Compiler translates HLL code to machine code
- Operating system (OS)
- Handle I/O
- Manage memory and storage
- Schedule tasks and sharing resources
- Hardware
- Processor, memory, I/O, controllers
*[HLL]:High-level language
#### Levels of Program Code
- High-level language
- Level of abstraction closer to problem domain
- Provide for productivity(生產力) and instructions
- Assembly language
- Textual representation of instructions
- [Note of NTNU ASM](https://hackmd.io/@NTNUCSIE112/ASM)
- Hardware representation
- Binary digits (bits): on and off
- Encoded instructions and data
<!-- Below Your Program end -->
### 1.4 Under the Covers
> 電腦內的**元件**
#### Components of Computer
- Same components for all kinds of computer
- Desktop, sever, embedded,...
- I/O includes...
- User-interface devices
- Display, keyboard, mouse
- Storage devices
- Hard disk, CD/DVD, flash
- Network adapters
- For communicating with other computers
#### Touchscreen
- PostPC device
- Supersede keyboard and mouse
- Resistive(電阻式) and capacitive(電容式) type
- Most use capacity
- Capacitive allows multiple touches simultaneously
#### Through the Looking Glass
- Screen: picture elements(pixels)
- Mirror content of **frame buffer memory**
#### Inside the Processor(CPU)
- Datapath performs operations on data
- Control
- Sqeuences datapath, memory
- Cache
- Small fast SRAM memory fot immediate access to data
#### Abstractions 概念
- Abstraction helps us deal with complexity
- Hide lower-level details
- Instruction set architecture (ISA)
- Hardware/software interface
- Application binary interface
- ISA plus system software interface
- Implementation
- Details underlying and interface
*[ISA]:Instruction set architecture
#### A Safe Place for Data
- Volatile main/primary memory
- Lose instructions and data when off
- Non-volatile secondary memory
- Magnetic disk
- Flash memory
- Optical disk (CDROM, DVD)
#### Network
- Advantages: communication, resource sharing, non-local access
- Local area network(LAN): Ethernet
- Wide area network(WAN): Internet
- Wireless network: WiFi, Bluetooth
#### Technology Trends
- Electronics technology continues to evolve
- Increase capacity and performance
- Reduce cost
| Year | Technology | Relative performance |
| ---- | -------------------------- | -------------------- |
| 1951 | Vacuum tube | 1 |
| 1965 | Transistor | 35 |
| 1975 | Integrated circuit (IC) | 900 |
| 1995 | Very large scale IC (VLSI) | 2,400,000 |
| 2013 | Ultra large scale IC | 250,000,000,000 |
<!-- Under the Covers end -->
### 1.5 Technologies for Building Processors and Memory
> 處理器跟記憶體**生產技術**
#### Semiconductor Technology
- Silicon: semiconductor
- Add materials to transform properties
- Conductors(導體)
- Insulators(絕緣體)
- Switch
#### Manufacture ICs
- Yield(良率): proportion of working dies per wafer
#### Integrated Circuit Cost IC成本
$$
\text{Cost per die }=\frac{\text{Cost per wafer}}{\text{Dies per wafer}\times\text{Yield}}\\
\text{Dies per wafer }\approx\text{Wafer area}/\text{Die area}\\
\text{Yield}=\frac{1}{(1+(\text{Defects per area}\times\text{Die area}/2))^2}
$$
- Nonlinear relation to area and defect rate
- Wafer cost and area are fixed
- Defect rate is determined by manufacturing process
- Die area is determined by architecture and circuit design
<!-- Technologies for Building Processors and Memory end -->
### 1.6 Performance 效能
> 效能的評斷指標
#### Define Performance
總之就是要定義一個標準
*FOCUS ON **RESPONSE TIME** FOR NOW*
#### Response Time and Throughput
- Response time
- How long it takes to do a task
執行一個任務(task)需要花多少時間
- Throughput
- Total work done per unit time
每單位時間可執行的工作數量
- $e.g.$ task/transactions...per hour
- How are response time and throughput affected by ...
- Replace a processor with a faster version?
- Add more processors?
#### Relative Performance 相對性能
Define $\text{performance}=\frac{\text{1}}{\text{execution time}}$
+ "X is $n$ time faster than Y"
+ $n=\frac{\text{Performance}_x}{\text{Performance}_y}=\frac{\text{Execution time}\ _Y}{\text{Execution time}\ _X}$
#### Measure Execution Time 測量執行時間
- Elapsed time [耗時](https://terms.naer.edu.tw/detail/416207/)
- Total response time 總回應時間, 包含各個方面(aspects)
- $e.g.$ Processing 執行時間、I/O、OS overhead [負擔](https://terms.naer.edu.tw/detail/144442/)、idle time 閒置時間
- Determine system performance 決定系統性能
- CPU time
- Time spent on processing a given job
花在執行一個給定工作的時間
- 不計I/O time、其他工作
- Comprise user CPU time and system CPU time
包含使用者和系統的CPU time
- Different programs are affected differently by CPU and system performance
不同程式執行時會被不同的CPU跟系統表現影響
#### CPU Clocking 時脈

- Operation of digital hardware is governed by a constant-rate clock.
電子硬體的運作是被一個恆定的時鐘控制的
- Clock period 時脈週期:
- duration of a clock cycle
一個時脈循環的時間
- e.g. $250$ ps = $0.25$ ns = $250 \times 10^{-12}$ s
- Clock frequency (rate) 時脈頻率:
- cycles per second
一秒多少循環
- e.g. $4.0$ GHz = $4000$ MHz = $4.0 \times 10^9$ Hz
#### CPU Time 時間
$\begin{aligned}
\text{CPU time}&=
\text{CPU Clock Cycles}\times\text{Clock Cycle Time}
\\&=\frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\end{aligned}$
+ Performance can be improved by
效能可以如何增加
+ Reduce number of clock cycles
減少時脈週期
+ Increase clock rate
增加時脈頻率
+ Hardware designer must often trade off clock rate against cycle count
硬體設計師常常要在時脈頻率跟週期盤點之間作取捨
- 增加 rate 有可能增加 cycle count
##### CPU Time Example

#### Instruction Count and Clock Cycle per Instruction 總執行數目
+ Instruction count for a program
一個程式的總指令數量
- Determined by program, ISA and compiler
取決於程式、指令集跟編譯器
+ Average clock cycles per instruction (CPI)
每指令週期平均
- Determined by CPU hardware
由電腦硬體決定
- If different instructions have different CPI
不同的指令會有不同的 CPI
- Average CPI is affected by instruction mix
$\begin{aligned}
\text{Clock Cycles}&=\text{Instruction}\ \text{count}\times\text{Cycles Per Instruction}
\end{aligned}$
$\begin{aligned}
\text{CPU time}&=\text{Instruction} \ \text{count}\times \text{CPI} \times \text{Clock Cycle Time}
\\&=\frac{\text{Instruction}\ \text{Count}\times \text{CPI}}{\text{Clock Rate}}
\end{aligned}$
*[CPI]: Clock cycles per instruction
*[IC]: Instruction count
##### CPI Example 1

##### CPI in More Details
- If different instruction classes take different numbers of cycles
$\text{Clock Cycles}=\sum_{i=1}^n\left(\text{CPI}_i\times\text{Instruction Count}_i\right)$
- Weighted average CPI
$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum^n_{i=1}\left(\text{CPI}_i\frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$
##### CPI Example 2

#### Performance Summary 效能總結
$\text{CPU Time}=\frac{\text{Instructions}}{\text{Program}}\times\frac{\text{Clock cycles}}{\text{Instruction}}\times\frac{\text{Seconds}}{\text{Clock cycle}}$
- 效能被一些東西影響
+ Algorithm 演算法
- 影響 IC,有可能影響 CPI
+ Programming language 程式語言
- 影響 IC、CPI
+ Compiler 編譯器
- 影響 IC、CPI
+ Instruction set architecture 指令集架構
- 影響 IC、CPI、$T_c$(Clock period)
<!-- Performance end -->
### 1.7 The Power Wall
> 功率消耗的瓶頸
#### Power Trends 功率趨勢
在 CMOS IC 科技中
$\text{Power} = \text{Capacitive load} \times\text{Voltage}^2\times \text{Frequency}$
+ $\text{CMOS}$:互補式金屬氧化物半導體
+ $\text{Capacitive load}$:電容負載
#### Reduce Power

#### Power wall
- can't reduce voltage further
- can't remove more heat
<!-- The Power Wall end -->
### 1.8 The Sea Change
> 計算機結構的改變
#### Uniprocessor Performance 單一處理器的效能

#### Multiprocessors 多核心處理器
+ Multicore microprocessor
- More than one processor per chip
一片晶片中有多於一個處理器
+ Require explicitly parallel programming
需要明確的平行程式設計
+ Compare with instruction level parallelism
跟指令層級平行相比
+ Hardware executes multiple instructions at once
硬體一次執行多個指令
+ Hidden from programmers
向工程師隱藏
+ Hard to do 困難的點
+ Program for performance
為了效率做設計
+ Load balance
加載平衡
+ Optimize communication and synchronization
最佳化溝通及同步化
<!-- The Sea Change end -->
### Benchmarks
> 評估電腦效能
#### SPEC CPU Benchmark
+ 工程師用來測量效能的東西
+ Programs are used to measure performance
+ Supposedly typical of actual workload
+ **SPEC** a.k.a. **Standard Performance Evaluation Corporation**
+ 發明 CPU、I/O、Web 的測量基準
+ SPEC CPU 2006
+ Elapsed time to execute a selection of programs
執行一系列的程序所需要花的時間
+ 忽略 I/O,專注在 CPU 的表現上
+ Normalize relative to reference machine
根據參考的機器做標準化
+ Summarize as geometric mean of performance ratios
總結為性能的幾何平均數
$\sqrt[n]{\prod_{i=1}^n\text{Execution time ratio}_i}$
+ CINT2006(integer) CFP2006(floating-point)
<!-- Benchmarks end -->
### Fallacies and Pitfalls 悖論、陷阱
> ~~迷思~~
#### 【Pitfall】Amdahl's Law
- Improve an aspect of a computer and expect a proportional improvement in over performance
$T_\text{improved} = \frac{T_\text{affected}}{\text{improvement}}+ T_\text{unaffected}$
- Corollary 推論: make the common case fast
#### 【Fallacy】Low Power at Idle
- Consider designing processors to make power proportional to load
#### 【Pitfall】MIPS as a Performance Metric
- MIPS: Millions of Instructions Per Second
- Doesn't account for...
- 電腦間 ISAs 的差異
- Differences complexity between instructions
- CPI varies between programs on a given CPU
$\begin{aligned}
MIPS &= \frac{\text{Instruction count}}{\text{Execution time}\times 10^6}
&= \frac{\text{Instruction count}}{\frac{\text{Instruction count}\times \text{CPI}}{\text{Clock rate}}\times 10^6}
&= \frac{\text{Clock rate}}{\text{CPI}\times 10^6}
\end{aligned}$
<!-- Fallacies and Pitfalls end -->
### Concluding Remarks
- Cost/performance is improving...
- Due to underlying technology development
- Hierarchical layers of abstraction
- In both hardware and software
- Instruction set architecture
- Hardware/software interface
- Execution time: the best performance measure
- Power is a limiting factor
電力是發展限制的因素之一
- Use parallelism to improve performance
使用平行運算來增加效能