- Computer Architecture
- Chapter 1 - Computer Abstractions and Technology
- 1.1 - Introduction
- 1.2 - Eight Great Ideas in Computer Architecture
- 1.3 - Below Your Program
- 1.4 - Under the Covers
- 1.5 - Technologies for Building Processors and Memory
- 1.6 - Performance
- 1.7 - The Power Wall
- 1.8 - The Sea Change: The Switch from Uniprocessors to Multiprocessors
- 1.9 - Real Stuff: Benchmarking the Intel Core i7
- 1.10 - Fallacies and Pitfalls
- 1.11 - Concluding Remarks
- 1.12 - Historical Perspective and Further Reading
- 1.13 - Exercises
- Chapter 2 - Instructions: Language of the Computer
- 2.1 - Introduction
- 2.2 - Operations of the Computer Hardware
- 2.3 - Operands of the Computer Hardware
- 2.4 - Signed and Unsigned Numbers
- 2.5 - Representing Instructions in the Computer
- 2.6 - Logical Operations
- 2.7 - Instructions for Making Decisions
- 2.8 - Supporting Procedures in Computer Hardware
- 2.9 - Communicating with People
- 2.10 - MIPS Addressing for 32-Bit Immediates and Addresses
- 2.11 - Parallelism and Instructions: Synchronization
- 2.12 - Translating and Starting a Program
- 2.13 - A C Sort Example to Put It All Together
- 2.14 - Arrays versus Pointers
- 2.15 - Advanced Material: Compiling C and Interpreting Java
- 2.16 - Real Stuff: ARMv7 (32-bit) Instructions
- 2.17 - Real Stuff: x86 Instructions
- 2.18 - Real Stuff: ARMv8 (64-bit) Instructions
- 2.19 - Fallacies and Pitfalls
- 2.20 - Concluding Remarks
- 2.21 - Historical Perspective and Further Reading
- 2.22 - Exercises
- Chapter 3 - Arithmetic for Computers
- The midterm covers everything up to this point; the material below is covered by the final exam
- Chapter 4 - The Processor
- 4.1 - Introduction
- 4.2 - Logic Design Conventions
- 4.3 - Building a Datapath
- 4.4 - A Simple Implementation Scheme
- 4.5 - An Overview of Pipelining
- 4.6 - Pipelined Datapath and Control
- 4.7 - Data Hazards: Forwarding versus Stalling
- 4.8 - Control Hazards
- 4.9 - Exceptions
- 4.10 - Parallelism and Advanced Instruction-Level Parallelism
- 4.11 - Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines
- 4.12 - Instruction-Level Parallelism and Matrix Multiply
- 4.13 - Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
- 4.14 - Fallacies and Pitfalls
- 4.15 - Concluding Remarks
- 4.16 - Historical Perspective and Further Reading
- 4.17 - Exercises
- Chapter 5 - Large and Fast: Exploiting Memory Hierarchy
- 5.1 - Introduction
- 5.2 - Memory Technologies
- 5.3 - The Basics of Caches
- 5.4 - Measuring and Improving Cache Performance
- 5.5 - Dependable Memory Hierarchy
- 5.6 - Virtual Machines (VM)
- 5.7 - Virtual Memory
- 5.8 - A Common Framework for Memory Hierarchy
- 5.9 - Using a Finite-State Machine to Control a Simple Cache
- 5.10 - Parallelism and Memory Hierarchies: Cache Coherence
- 5.11 - Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies
- 5.12 - Going Faster: Cache Blocking and Matrix Multiply
- 5.13 - Fallacies and Pitfalls
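Classes of modern computers:
The eight great ideas in computer architecture.
Program execution layers and process:
Evolution of electronic component technology: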
Year | Technology | Relative performance per unit cost |
---|---|---|
1951 | Vacuum tube | 1 |
1965 | Transistor | 35 |
1975 | Integrated circuit (IC) | 900 |
1995 | Very large-scale integrated circuit (VLSI) | 2,400,000 |
2013 | Ultra large-scale integrated circuit | 250,000,000,000 |
CPU chip manufacturing:
Further reading:
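Performance comparison: X is n times faster than Y when

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$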
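Execution time:
System performance is measured with the first of the two time measures below, and CPU performance with the second.
Elapsed time / Response time / Wall clock time
The time actually taken to complete a task, including I/O, memory access, operating system overhead, and so on.
CPU time / CPU execution time
The time the CPU spends on the task itself, in effect the number of clock cycles the CPU runs for it. It excludes time spent on anything else (I/O, thread switching, and so on).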
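Total instructions | Compute | Memory load | Memory store | Branch |
---|---|---|---|---|
200,000 | 45% | 20% | 15% | 20% |

Machine | Clock rate | Compute CPI | Memory load CPI | Memory store CPI | Branch CPI |
---|---|---|---|---|---|
P1 | 1 GHz | 1 | 8 | 8 | 2 |
P2 | 1.5 GHz | 80%: 1, 20%: 2 | 10 | 10 | 2 |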
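- Midterm exam question (2013 - 1):
A program runs on two computers, P1 and P2, with clock rates of 1 GHz and 1.5 GHz respectively, and executes 200,000 instructions in total. Of these, 45% are compute instructions, 20% are memory load instructions, 15% are memory store instructions, and 20% are branch instructions. On P1, the CPI is 1 for compute instructions, 8 for memory load and store instructions, and 2 for branch instructions. On P2, 80% of the compute instructions still have a CPI of 1 while the remaining compute instructions have a CPI of 2; the CPI is 10 for memory load and store instructions and 2 for branch instructions.
For P1 and P2, calculate:
(1) The total execution time.
(2) The CPI.
(3) Compare the performance of the two machines.
- Solution:
- Total execution time of P1:
- Total execution time of P2:
- CPI of P1:
- CPI of P2: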
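The exam question can be worked numerically as a check. The sketch below is only an illustration: it weights each instruction class's CPI by its share of the mix, then applies execution time = instruction count × CPI / clock rate. The names `MIX`, `P1`, `P2`, `average_cpi`, and `execution_time` are made up for this example.

```python
INSTRUCTIONS = 200_000
# Fraction of the instruction stream in each class (from the question).
MIX = {"compute": 0.45, "load": 0.20, "store": 0.15, "branch": 0.20}

P1 = {"clock_hz": 1.0e9,
      "cpi": {"compute": 1.0, "load": 8.0, "store": 8.0, "branch": 2.0}}
# On P2, 80% of compute instructions take 1 cycle and 20% take 2 cycles.
P2 = {"clock_hz": 1.5e9,
      "cpi": {"compute": 0.8 * 1 + 0.2 * 2, "load": 10.0, "store": 10.0, "branch": 2.0}}

def average_cpi(machine):
    """Weighted average CPI over the instruction mix."""
    return sum(MIX[kind] * machine["cpi"][kind] for kind in MIX)

def execution_time(machine):
    """Seconds = instruction count * CPI / clock rate."""
    return INSTRUCTIONS * average_cpi(machine) / machine["clock_hz"]

cpi_p1, cpi_p2 = average_cpi(P1), average_cpi(P2)
time_p1, time_p2 = execution_time(P1), execution_time(P2)
speedup_p2_over_p1 = time_p1 / time_p2

print(f"P1: CPI = {cpi_p1:.2f}, time = {time_p1 * 1e3:.3f} ms")
print(f"P2: CPI = {cpi_p2:.2f}, time = {time_p2 * 1e3:.3f} ms")
print(f"P2 is {speedup_p2_over_p1:.2f} times faster than P1")
```

Under these numbers P1 has a CPI of 3.65 and runs for 0.73 ms, while P2 has a CPI of 4.44 and runs for 0.592 ms. P2 therefore finishes first despite its higher CPI, because its clock is 1.5 times faster.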
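Power consumption rises almost in step with clock rate: the higher the clock rate, the higher the power. Clock rates climbed rapidly in the early decades of computing, but in recent years they have essentially stopped rising. Power has already reached a very high level, and as long as cooling and other supporting technologies cannot keep up, the clock rate has to be held back.
Dynamic power
CPU power is dominated by dynamic power, which comes from transistors switching, that is, flipping between high and low voltage levels (1 and 0). Each flip is essentially a charge or discharge, so this energy cost cannot be avoided.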
$$\text{Dynamic power} \propto \text{Capacitive load} \times \text{Voltage}^2 \times \text{Switching frequency}$$
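Because dynamic power is proportional to capacitive load × voltage² × switching frequency, the effect of scaling each factor can be estimated as a ratio. The sketch below assumes, purely for illustration, a new design with 85% of the old capacitive load, voltage, and frequency; the function name is hypothetical.

```python
def relative_dynamic_power(cap_scale, volt_scale, freq_scale):
    """Ratio of new to old dynamic power when each factor is scaled.

    Voltage enters squared, so lowering it pays off twice.
    """
    return cap_scale * volt_scale ** 2 * freq_scale

# Assumed example: every factor drops to 85% of its old value.
ratio = relative_dynamic_power(0.85, 0.85, 0.85)
print(f"new power / old power = {ratio:.2f}")
```

With these assumed factors the new design draws about 0.52 of the original dynamic power.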
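Static power
Comes mainly from transistor leakage current. It is almost unavoidable and can only be reduced through process improvements.
Constrained by process technology, power, and other limits, processor performance can no longer be raised simply by increasing the clock rate. CPU development has therefore turned to increasing parallelism, that is, the shift from single-core to multicore processors.
Previously, performance gains came mainly from process and hardware improvements, so software was barely affected. The arrival of multicore processors, however, has also driven the development of parallel algorithms, because only algorithmic changes can take full advantage of multiple cores.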
Mentioned: millions of instructions per second (MIPS).
Amdahl's law:
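Amdahl's law says that the improved execution time equals the time affected by an improvement divided by the improvement factor, plus the unaffected time. A minimal sketch follows; the 80 s / 20 s split and the function names are assumptions chosen for illustration.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Execution time after speeding up only the affected part by `factor`."""
    return t_affected / factor + t_unaffected

def overall_speedup(t_affected, t_unaffected, factor):
    """Whole-program speedup; it can never exceed total / t_unaffected."""
    total = t_affected + t_unaffected
    return total / improved_time(t_affected, t_unaffected, factor)

# Assumed workload: 80 s can be improved, 20 s cannot.
t_new = improved_time(80.0, 20.0, 4.0)
speedup = overall_speedup(80.0, 20.0, 4.0)
print(t_new, speedup)
```

Even an arbitrarily large factor cannot bring this assumed workload below 20 s, so the overall speedup is bounded by 5.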
Chapter 2 uses MIPS as the example for introducing computer instructions.
Design Principle 1: Simplicity favors regularity.
Design Principle 2: Smaller is faster.
MIPS has 32 registers, each 32 bits wide.
Design Principle 3: Make the common case fast.
Example: A[12] = h + A[8]; (h in $s2, base address of A in $s3)
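One plausible MIPS translation is `lw $t0, 32($s3)`, `add $t0, $s2, $t0`, `sw $t0, 48($s3)`, because each word occupies 4 bytes (index 8 gives offset 32, index 12 gives offset 48). The Python sketch below is a toy model of those three steps, not a MIPS simulator. The base address `0x1000`, the dictionary-based memory, and the function name are assumptions for illustration.

```python
WORD_BYTES = 4
BASE = 0x1000  # hypothetical base address of array A

def run_example(h, array):
    """Model A[12] = h + A[8] with byte-addressed memory and three instructions."""
    memory = {BASE + WORD_BYTES * i: v for i, v in enumerate(array)}
    regs = {"$s2": h, "$s3": BASE, "$t0": 0}

    # lw $t0, 32($s3)   -> load A[8]; the offset is 8 * 4 bytes
    regs["$t0"] = memory[regs["$s3"] + 8 * WORD_BYTES]
    # add $t0, $s2, $t0 -> h + A[8], wrapped to 32 bits
    regs["$t0"] = (regs["$s2"] + regs["$t0"]) & 0xFFFFFFFF
    # sw $t0, 48($s3)   -> store into A[12]; the offset is 12 * 4 bytes
    memory[regs["$s3"] + 12 * WORD_BYTES] = regs["$t0"]

    return [memory[BASE + WORD_BYTES * i] for i in range(len(array))]

result = run_example(5, list(range(13)))
print(result[12])
```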
2s-Complement Signed Integers
Example: negate +2
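Negating a two's-complement number means inverting every bit and adding 1. A minimal sketch, assuming a 32-bit word; the helper name `negate` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def negate(pattern):
    """Two's-complement negation of a 32-bit pattern: invert the bits, add 1."""
    return (~pattern + 1) & MASK32

minus_two = negate(2)
print(f"+2 = {2:032b}")
print(f"-2 = {minus_two:032b}")
```

Applying `negate` twice returns the original pattern.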
Sign Extension
Example: 8-bit → 16-bit
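Sign extension copies the sign bit into the new upper bits so the value is preserved. The sketch below assumes unsigned Python integers holding the bit patterns; `sign_extend` is a hypothetical helper.

```python
def sign_extend(pattern, from_bits, to_bits):
    """Widen a two's-complement bit pattern by replicating its sign bit."""
    sign_bit = 1 << (from_bits - 1)
    if pattern & sign_bit:
        # Set every bit between from_bits and to_bits.
        pattern |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return pattern

plus_two = sign_extend(0b00000010, 8, 16)   # +2 stays +2
minus_two = sign_extend(0b11111110, 8, 16)  # -2 stays -2
print(f"{plus_two:016b}")
print(f"{minus_two:016b}")
```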
Instructions are encoded in binary (so-called machine code).
In MIPS, each instruction is encoded as a 32-bit binary word.
MIPS R-format Instructions (R: Register)
op | rs | rt | rd | shamt | funct |
---|---|---|---|---|---|
6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits |
Example: add $t0, $s1, $s2
special | $s1 | $s2 | $t0 | 0 | add |
---|---|---|---|---|---|
0 | 17 | 18 | 8 | 0 | 32 |
000000 | 10001 | 10010 | 01000 | 00000 | 100000 |
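The field layout above can be checked by packing the six fields into one 32-bit word. This is a sketch; `encode_r` is a hypothetical helper, and the field values are the ones from the table.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields (6, 5, 5, 5, 5, 6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s1, $s2  ->  rs = $s1 (17), rt = $s2 (18), rd = $t0 (8), funct = 32
word = encode_r(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")
print(f"0x{word:08X}")
```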
MIPS I-format Instructions (I: Immediate)
op | rs | rt | constant or address |
---|---|---|---|
6 bits | 5 bits | 5 bits | 16 bits |
Design Principle 4: Good design demands good compromises.
Examples:
Instruction | Format | op | rs | rt | rd | shamt | funct | address |
---|---|---|---|---|---|---|---|---|
add | R | 0 | reg | reg | reg | 0 | 32 | n.a. |
sub (subtract) | R | 0 | reg | reg | reg | 0 | 34 | n.a. |
add immediate | I | 8 | reg | reg | n.a. | n.a. | n.a. | constant |
lw (load word) | I | 35 | reg | reg | n.a. | n.a. | n.a. | address |
sw (store word) | I | 43 | reg | reg | n.a. | n.a. | n.a. | address |
Operation | C | Java | MIPS |
---|---|---|---|
Shift left | << | << | sll |
Shift right | >> | >>> | srl |
Bitwise AND | & | & | and, andi |
Bitwise OR | \| | \| | or, ori |
Bitwise NOT | ~ | ~ | nor |
Question:
Why don't we use blt, bge, …?
Answer:
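Hardware for <, ≥, … is slower than hardware for =, ≠. Folding those comparisons into a branch would require a slower clock, which penalizes every instruction.
beq and bne are the common case, so they are kept fast; the other comparisons are built from slt followed by beq or bne.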
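Register number | Name | Usage |
---|---|---|
0 | zero | Constant 0; looks wasteful but is actually very useful |
1 | at | Reserved for the assembler |
2-3 | v0, v1 | Function return values |
4-7 | a0-a3 | First few function arguments |
8-15 | t0-t7 | Temporaries; a callee may use them without saving |
16-23 | s0-s7 | Register variables (saved registers) |
24-25 | t8, t9 | Temporaries, same as 8-15 (t0-t7) |
26-27 | k0, k1 | Reserved for exception handlers |
28 | gp | Global pointer: convenient access to global or static variables |
29 | sp | Stack pointer |
30 | s8 / fp | Ninth register variable; can also serve as the frame pointer |
31 | ra | Return address |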
Byte / Halfword Operations
MIPS J-format Instructions (J: Jump)
op | address |
---|---|
6 bits | 26 bits |
Mentioned: atomic read/write memory operations.
An object file (on UNIX systems) typically contains six distinct pieces that provide the information needed to build a complete program:
Design principles
Instructions you should know from this chapter
Instruction class | MIPS examples |
---|---|
Arithmetic | add, sub, addi |
Data transfer | lw, sw, lb, lbu, lh, lhu, sb, lui |
Logical | and, or, nor, andi, ori, sll, srl |
Conditional branch | beq, bne, slt, slti, sltiu |
Jump | j, jr, jal |
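Mentioned: the arithmetic logic unit (ALU).
Terms: multiplicand (the number being multiplied) and multiplier (the number it is multiplied by).
Two 32-bit registers (HI and LO) hold the remainder and the quotient, respectively.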
MIPS Division Instructions
No overflow or divide-by-0 checking.
Software must perform checks if required.
Hardware operation diagram:
Logic flowchart:
Two representations
Single precision (32 bits):

S | Exponent | Fraction |
---|---|---|
1 bit | 8 bits | 23 bits |

Double precision (64 bits):

S | Exponent | Fraction |
---|---|---|
1 bit | 11 bits | 52 bits |
Single-Precision Range
Double-Precision Range
Floating-Point Examples:
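Represent –0.75: $-0.75 = (-1)^1 \times 1.1_2 \times 2^{-1}$
- S = 1
- Fraction = $1000\ldots00_2$
- Exponent = –1 + Bias
  - Single: –1 + 127 = 126 = $01111110_2$ (8 bits)
  - Double: –1 + 1023 = 1022 = $01111111110_2$ (11 bits)
- Answer =
  - Single: $1\;01111110\;1000\ldots00$
  - Double: $1\;01111111110\;1000\ldots00$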
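What number is represented by this single-precision float: $1\;10000001\;01000\ldots00$
- S = 1
- Fraction = $01000\ldots00_2$ (23 bits)
- Exponent = $10000001_2$ (8 bits) = 129
- Answer: $x = (-1)^1 \times (1 + 0.01_2) \times 2^{129-127} = -1 \times 1.25 \times 2^2 = -5.0$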
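The encoding of –0.75 can be cross-checked with Python's standard `struct` module, which packs floats in IEEE 754 format. The helpers `float_bits` and `decode_single` are hypothetical names. The decoder is a sketch that handles only normalized numbers, and the pattern `0xC0A00000` is an illustrative input.

```python
import struct

def float_bits(value, fmt):
    """IEEE 754 bit string of value; fmt is '>f' (single) or '>d' (double)."""
    return "".join(f"{byte:08b}" for byte in struct.pack(fmt, value))

def decode_single(pattern):
    """Split a 32-bit pattern into sign, biased exponent, fraction, and value.

    Normalized numbers only: zero, denormals, infinities, and NaN are not handled.
    """
    sign = pattern >> 31
    exponent = (pattern >> 23) & 0xFF
    fraction = pattern & 0x7FFFFF
    value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value

single = float_bits(-0.75, ">f")
double = float_bits(-0.75, ">d")
print(single[0], single[1:9], single[9:])
print(double[0], double[1:12], double[12:])
print(decode_single(0xC0A00000))
```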
Special cases:
FP Instructions:
Right Shift and Division
Shifting right by i bits divides by 2^i only for unsigned integers.
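A quick check of why a right shift replaces division only for unsigned values. The sketch assumes 32-bit patterns; `srl`, `sra`, and `div_trunc` are hypothetical helpers that model a logical shift, an arithmetic shift, and truncating division.

```python
MASK32 = 0xFFFFFFFF

def srl(value, amount):
    """Logical right shift of a 32-bit pattern: zeros fill from the left."""
    return (value & MASK32) >> amount

def sra(value, amount):
    """Arithmetic right shift: Python's >> on a negative int keeps the sign."""
    return value >> amount

def div_trunc(a, b):
    """Division that truncates toward zero, as integer division does in C."""
    return int(a / b)

print(srl(20, 2), div_trunc(20, 4))  # unsigned case: both give 5
print(srl(-5, 2))                    # large positive number, not -1
print(sra(-5, 2), div_trunc(-5, 4))  # -2 versus -1
```

For –5, the logical shift yields a large positive number and the arithmetic shift rounds toward negative infinity, so neither matches the truncated quotient –1.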
Floating-point addition is not associative, so assumptions of associativity may fail.
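A small demonstration that floating-point addition is not associative. Python floats are IEEE 754 doubles. The three values are an assumed example, chosen so that 1.0 is far below the precision available at magnitude 1.5e38.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z   # x and y cancel exactly first, then 1.0 is added
right = x + (y + z)  # 1.0 is absorbed by y before the cancellation

print(left, right)
```

Here `left` is 1.0 while `right` is 0.0, so reordering the additions changes the result.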
Instruction Fetch
Basic elements needed to process instructions:
R-format Instructions
I-format Instructions
Branch Instructions
R-Type/Load/Store Datapath
Full Datapath
Pipelining Analogy
MIPS Pipeline
Pipeline Performance
Pipeline Speedup
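With balanced stages, the time between instructions in a pipeline is roughly the non-pipelined time divided by the number of stages, so the speedup approaches the stage count only for long instruction streams. The sketch below assumes, for illustration, a 5-stage pipeline with a 200 ps stage time and an 800 ps single-cycle design; the function names are hypothetical.

```python
def single_cycle_time(num_instructions, cycle_ps):
    """Total time when each instruction takes one long cycle."""
    return num_instructions * cycle_ps

def pipelined_time(num_instructions, stage_ps, num_stages):
    """Total time when a new instruction starts every stage time.

    The first instruction needs num_stages cycles; each later one adds a cycle.
    """
    return (num_stages + num_instructions - 1) * stage_ps

for n in (3, 1_000_003):
    slow = single_cycle_time(n, 800)
    fast = pipelined_time(n, 200, 5)
    print(n, slow, fast, round(slow / fast, 2))
```

For 3 instructions the totals are 2400 ps and 1400 ps, a speedup of about 1.7. For about a million instructions the ratio is close to 4 rather than 5, because the assumed 800 ps single-cycle time is only 4 times the stage time.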
Hazards
Forwarding (a.k.a. bypassing)
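Technology | Access time | Price per GB | Characteristics |
---|---|---|---|
SRAM | 0.5-2.5 ns | $500-1000 | Data is stored in transistors and can be read directly. |
DRAM | 50-70 ns | $10-20 | Data is stored in capacitors, which leak charge continuously and must be refreshed (recharged) periodically. |
Flash | 5,000-50,000 ns | $0.75-1.00 | EEPROM (electrically erasable programmable read-only memory); has a limited number of write cycles. |
HDD | 5,000,000-20,000,000 ns | $0.05-0.10 | Magnetic storage, read and written with platters and heads. |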
Measure Cache Performance
Cache Performance Example
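Memory-stall cycles add to the base CPI: each instruction may miss in the instruction cache, and each load or store may miss in the data cache. The sketch below uses assumed numbers for illustration: a base CPI of 2, a 2% I-cache miss rate, a 4% D-cache miss rate, 36% loads and stores, and a 100-cycle miss penalty.

```python
def effective_cpi(base_cpi, icache_miss_rate, dcache_miss_rate,
                  mem_fraction, miss_penalty):
    """Base CPI plus average stall cycles per instruction from cache misses."""
    instruction_stalls = icache_miss_rate * miss_penalty
    data_stalls = mem_fraction * dcache_miss_rate * miss_penalty
    return base_cpi + instruction_stalls + data_stalls

cpi = effective_cpi(2.0, 0.02, 0.04, 0.36, 100)
slowdown_vs_perfect_cache = cpi / 2.0
print(f"{cpi:.2f} {slowdown_vs_perfect_cache:.2f}")
```

Under these assumptions the effective CPI is about 5.44, so a perfect cache would be about 2.72 times faster.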
Average Access Time
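Average memory access time (AMAT) is the hit time plus the miss rate times the miss penalty. A minimal sketch follows; the 1-cycle hit time, 5% miss rate, 20-cycle penalty, and 1 ns clock are assumed values for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty

cycles = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
nanoseconds = cycles * 1.0  # assumed 1 ns clock cycle
print(cycles, nanoseconds)
```

With these values the average access takes about 2 cycles, or 2 ns.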
Performance Summary
Associative Caches
Example of Associative Cache
Spectrum of Associativity
Associativity Example
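The effect of associativity can be explored with a tiny simulator. The sketch below assumes a 4-block cache, LRU replacement, and the block address sequence 0, 8, 0, 6, 8, all chosen for illustration; `count_misses` is a hypothetical helper.

```python
from collections import OrderedDict

def count_misses(block_addresses, num_blocks, ways):
    """Count misses in a set-associative cache with LRU replacement.

    ways=1 is direct mapped; ways=num_blocks is fully associative.
    """
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for address in block_addresses:
        lines = sets[address % num_sets]
        if address in lines:
            lines.move_to_end(address)      # mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used
            lines[address] = True
    return misses

sequence = [0, 8, 0, 6, 8]
direct_mapped = count_misses(sequence, num_blocks=4, ways=1)
two_way = count_misses(sequence, num_blocks=4, ways=2)
fully_associative = count_misses(sequence, num_blocks=4, ways=4)
print(direct_mapped, two_way, fully_associative)
```

For this sequence the miss counts are 5, 4, and 3, so higher associativity reduces conflict misses here.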
How Much Associativity
Set Associative Cache Organization
Replacement Policy
Multilevel Caches
Dependability
Dependability Measures
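Availability is commonly defined as MTTF / (MTTF + MTTR), where MTTF is the mean time to failure and MTTR the mean time to repair. A minimal sketch follows; the hour values are made up for illustration.

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

baseline = availability(mttf_hours=999, mttr_hours=1)
longer_mttf = availability(mttf_hours=9_999, mttr_hours=1)
faster_repair = availability(mttf_hours=999, mttr_hours=0.1)
print(baseline, longer_mttf, faster_repair)
```

Availability improves either by raising MTTF or by lowering MTTR.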
Three ways to improve MTTF: