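Computer Architecture Course
Lecture recordings of Professor 黃婷婷's Computer Architecture course
Lecture 01: Course Outline
Q: Why do computers use binary instead of decimal?
A: A signal's voltage can only be divided into high and low \(\to\) there can only be two states
Electronic circuits:
Digital electronics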
Computer Architecture
Q: What is Computer Architecture?
A: Computer Architecture = Instruction Set Architecture + Machine Organization
Lecture 02: Computer's history
Lectures 03, 04: Computer Abstractions and Technology
Performance
Response time does not necessarily improve; it depends on whether the work can be divided up well. Throughput, however, does increase: the amount of work completed per unit time doubles. Response time and throughput are not strictly tied to each other.
Elapsed time: total response time, including all aspects (processing, I/O, OS overhead, idle time) \(\to\) determines system performance
CPU time: time spent processing a given job; only the time the CPU spends is counted, excluding I/O and idle time.
\(\text{performance} = \frac{1}{\text{Execution Time}}\)
To compare the performance of X and Y: if X is n times faster than Y, the following equation holds:
\(n = \frac{Performance_X}{Performance_Y} = \frac{Execution\ time_Y}{Execution\ time_X}\)
Clock period (clock cycle time): duration of a clock cycle
Clock frequency (clock rate): cycles per second
\[ \begin{align} \text{CPU time} &= \text{CPU Clock Cycles} \times \text{Clock Cycle Time}\\ &= \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}} \\ &= \text{number of cycles the CPU needs}\times\text{time per cycle} \end{align} \]
From a higher-level point of view, we instead consider how many instructions a program needs in order to complete.
\[\begin{align}\text{Clock Cycles} &= \text{ Instruction Count} \times \text{Cycles per Instruction(CPI)}\\ \text{CPU Time} &= \text{ Instruction Count} \times \text{CPI } \times \text{Clock Cycle Time} \\ &= \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}} \end{align}\]
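The CPU time and relative performance formulas above can be checked numerically. This is a minimal sketch; the function names and the two machine configurations are hypothetical values chosen for illustration, not numbers from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count x CPI / clock rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz


def times_faster(exec_time_x, exec_time_y):
    """n such that X is n times faster than Y: n = time_Y / time_X."""
    return exec_time_y / exec_time_x


# Hypothetical example: the same 10M-instruction program on two machines.
time_a = cpu_time(10_000_000, cpi=2.0, clock_rate_hz=2e9)  # 2 GHz, CPI 2.0
time_b = cpu_time(10_000_000, cpi=1.2, clock_rate_hz=1e9)  # 1 GHz, CPI 1.2
n = times_faster(time_a, time_b)
print(f"A: {time_a:.4f} s, B: {time_b:.4f} s, A is {n:.2f}x faster than B")
```

Note that machine B has the lower CPI but still loses, because the clock rate enters the product as well.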
The number of instructions a program needs depends on the following factors:
Different instructions take different numbers of clock cycles
\(Clock\ Cycles = \displaystyle\sum_{i = 1}^{n}(CPI_i \times Instruction\ Count_i)\)
Weighted average CPI
\(CPI = \frac{\displaystyle\sum_{i = 1}^{n}(CPI_i \times Instruction\ Count_i)}{Instruction\ Count}\)
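As a quick check of the weighted average CPI formula, the sketch below sums cycles over instruction classes. The instruction mix is a made-up example, and `total_cycles` and `weighted_cpi` are illustrative names.

```python
def total_cycles(mix):
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(cpi * count for cpi, count in mix)


def weighted_cpi(mix):
    """Weighted average CPI = total cycles / total instruction count."""
    instruction_count = sum(count for _, count in mix)
    return total_cycles(mix) / instruction_count


# Hypothetical mix: (CPI_i, instruction count_i) per instruction class.
mix = [(1, 50), (2, 30), (3, 20)]
print(total_cycles(mix), weighted_cpi(mix))
```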
MIPS (Millions of Instructions Per Second)
\(MIPS = \frac{Instruction\ Count}{Execution\ time \times 10^6} = \cfrac{Instruction\ Count}{\frac{Instruction\ Count \times CPI}{Clock\ Rate} \times 10 ^6} = \frac{Clock\ Rate}{CPI \times 10^6}\)
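Both forms of the MIPS formula should give the same rating for consistent inputs. A small sketch with hypothetical numbers (4 GHz clock, CPI of 2, one billion instructions):

```python
def mips_from_rate(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def mips_from_counts(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical machine: 1e9 instructions, CPI 2, 4 GHz clock.
execution_time = 1e9 * 2 / 4e9  # instruction count x CPI / clock rate
rating_a = mips_from_rate(4e9, 2)
rating_b = mips_from_counts(1e9, execution_time)
print(rating_a, rating_b)
```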
Performance Summary
\(CPU\ Time = \frac{Seconds}{Program} = \frac{Instructions}{Program} \times \frac{Clock\ Cycles}{Instruction} \times \frac{Seconds}{Clock\ Cycle}\)
\(T_{improved} = \frac{T_{affected}}{improvement\ factor} + T_{unaffected}\)
\(a = \frac{1}{10^4}\)
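The \(T_{improved}\) formula (Amdahl's law) can be tried out directly. The sketch below uses hypothetical numbers: a 100 s program in which 80 s is affected by an improvement.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / improvement_factor + t_unaffected


# Hypothetical program: 80 s affected by the improvement, 20 s unaffected.
t_5x = improved_time(80, 20, 5)
t_huge = improved_time(80, 20, 1e9)
print(t_5x, 100 / t_5x)  # overall speedup is well below 5x
print(t_huge)            # approaches the 20 s unaffected part
```

However large the improvement factor, the total time never drops below the unaffected part, so the overall speedup here is bounded by 100 / 20 = 5.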
Power Consumption
\(Dynamic\ Power\ Consumption = Capacitive\ load \times Voltage^2 \times Frequency\)
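Because voltage is squared, lowering voltage and frequency together has a large effect on dynamic power. A minimal sketch with assumed relative values (a new design with 85% of the capacitive load, voltage, and frequency of the old one):

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power = capacitive load x voltage^2 x frequency."""
    return capacitive_load * voltage ** 2 * frequency


# Hypothetical scaling: every factor reduced to 85% of the old design.
old_power = dynamic_power(1.0, 1.0, 1.0)
new_power = dynamic_power(0.85, 0.85, 0.85)
ratio = new_power / old_power
print(f"new / old power = {ratio:.3f}")
```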
Instruction set architecture
MIPS Register Convention:
R Type Instruction
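Q: Why not simply make the opcode field longer?
A: If the opcode field were longer, the opcode of the I type and J type formats would have to grow as well, which would shrink the I type immediate field and the J type target address field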
add, sub, and, or, slt (set on less than)
add $s0, $s1, $s2
sll (shift left logical), srl (shift right logical), sra (shift right arithmetic)
srl $s0, $s1, 4
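To see how the two R type examples map onto the 32-bit format, here is a sketch of an encoder. It assumes the standard MIPS32 field layout (opcode 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits), register numbers $s0 = 16, $s1 = 17, $s2 = 18, and funct codes 0x20 for add and 0x02 for srl; `encode_r_type` is an illustrative helper, not part of any toolchain.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack R type fields: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# add $s0, $s1, $s2  ->  rd = $s0 (16), rs = $s1 (17), rt = $s2 (18)
add_word = encode_r_type(rs=17, rt=18, rd=16, shamt=0, funct=0x20)

# srl $s0, $s1, 4    ->  rd = $s0 (16), rt = $s1 (17), shamt = 4, rs unused
srl_word = encode_r_type(rs=0, rt=17, rd=16, shamt=4, funct=0x02)

print(f"{add_word:#010x} {srl_word:#010x}")
```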
I Type Instruction
addi, andi, slti
constant: 16 bits 2's complement stored in immediate
addi $s0, $s1, -50
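The 16-bit two's complement immediate can be illustrated by encoding the addi example above. This sketch assumes the standard I type layout (opcode 6, rs 5, rt 5, immediate 16 bits) and opcode 0x08 for addi; the helper names are hypothetical.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack I type fields: opcode(6) rs(5) rt(5) immediate(16)."""
    if not -(1 << 15) <= immediate < (1 << 15):
        raise ValueError("immediate does not fit in 16-bit two's complement")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


# addi $s0, $s1, -50  ->  rt = $s0 (16), rs = $s1 (17)
addi_word = encode_i_type(0x08, rs=17, rt=16, immediate=-50)
print(f"{addi_word:#010x}", sign_extend_16(addi_word))
```

The low 16 bits hold 0xFFCE, which sign-extends back to -50 when the instruction executes.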
lw, sw, lb, lbu, sb, lh, sh, lhu
offset: stored in the immediate field, in units of bytes
sw $s0, 12($s1)
beq, bne (conditional branch)
bne $s0, $s1, Exit
J Type Instruction
If Statement
c code:
f, g, …, j: $s1, $s2, … $s4
MIPS code:
Loop statement
c code:
i in $s3, k in $s5, address of save in $s6
MIPS code:
Procedure Call
in caller:
in callee:
in caller:
32-bit Constant
Lectures 10~14: Computer Arithmetic (7/1)
ALU
MIPS Multiplication
mfhi rd
mflo rd
MIPS Division
div rs, rt
mfhi rd
mflo rd
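The role of the HI and LO registers read by mfhi and mflo can be modelled in a few lines. This is a behavioral sketch under the usual MIPS convention (mult leaves the upper and lower 32 bits of the 64-bit product in HI and LO; div leaves the remainder in HI and the quotient in LO, truncating toward zero). The functions take plain Python integers standing in for signed 32-bit register values.

```python
MASK32 = 0xFFFFFFFF


def mult(rs, rt):
    """Signed multiply; returns (hi, lo) as unsigned 32-bit words."""
    product = (rs * rt) & 0xFFFFFFFFFFFFFFFF  # 64-bit two's complement
    return (product >> 32) & MASK32, product & MASK32


def div(rs, rt):
    """Signed divide; returns (hi, lo) = (remainder, quotient).

    Truncates toward zero, so the remainder takes the sign of rs.
    rt == 0 raises ZeroDivisionError here (the hardware result is undefined).
    """
    quotient = abs(rs) // abs(rt)
    if (rs < 0) != (rt < 0):
        quotient = -quotient
    remainder = rs - quotient * rt
    return remainder, quotient


print(mult(0x10000, 0x10000))  # product 2^32 does not fit in LO alone
print(div(-7, 2))
```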
Floating Point
IEEE 754 standard
significand: every normalized value has the form 1.xxxxxxxx, so the leading 1 is not stored, which leaves room for one more bit
single precision (32 bit):
normalized number = \((-1)^{sign} \times 1.significand \times 2^{exponent - 127}\)
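The normalized-number formula can be applied field by field to a 32-bit pattern. The sketch below handles only normalized values (exponent field not 0 and not 255) and uses Python's `struct` module to cross-check against the machine's own float encoding; `decode_single` and `float_to_bits` are illustrative names.

```python
import struct


def decode_single(bits):
    """Decode a normalized IEEE 754 single from its 32-bit pattern."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent in (0, 0xFF):
        raise ValueError("not a normalized number (zero/denormal/inf/NaN)")
    significand = 1 + fraction / (1 << 23)  # restore the hidden leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


def float_to_bits(x):
    """Return the 32-bit single-precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


print(decode_single(0x3FC00000))  # sign 0, exponent 127, fraction 0.5
print(decode_single(0xC0A00000))  # sign 1, exponent 129, fraction 0.25
```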
special value:
double precision (64 bit):
Floating-Point Addition
Floating-Point Multiplication
MIPS Floating Point
separate floating point instructions:
FPU:
Reference: http://www.ece.lsu.edu/ee4720/2014/lfp.s.html
Lectures 15~17: Single-Cycle Processor
Storage Element
Register File
one 32-bit input bus: busW
RB selects the register to put on busB
RW selects the register to be written via busW when Write Enable is 1
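The register file interface described above can be modelled behaviorally. This sketch assumes 32 registers of 32 bits, a second read port (RA driving busA) alongside RB and busB, and that register 0 is hardwired to zero as in MIPS; the class and method names are hypothetical, and clocking is ignored.

```python
class RegisterFile:
    """Behavioral model: two read ports (busA, busB), one write port (busW)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """RA and RB select the registers placed on busA and busB."""
        return self.regs[ra], self.regs[rb]

    def write(self, rw, bus_w, write_enable):
        """RW selects the register written from busW when Write Enable is 1."""
        if write_enable and rw != 0:  # assumption: $zero is never overwritten
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.write(16, 42, write_enable=True)
rf.write(17, 7, write_enable=False)  # ignored: Write Enable is 0
print(rf.read(16, 17))
```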
Memory
one output bus: Data Out
Write Enable: address selects the memory word to be written via the Data In bus
Datapath
A Single Cycle Datapath
Datapath with Control Unit
Datapath with Control and Jump Instruction
Control Unit
ALU Control
Truth table:
Resulting logic equation:
Question: why are func3, func2, func1, func0 not don't care?
Circuit diagram:
[Figure: ALU control circuit diagram]
Main Control
How to Design a Processor
Lectures 18~21: Pipelining
Pipeline
Concept
Split Single-cycle Datapath into 5 Steps: IF(instruction fetch), ID(instruction decode and register file read), EX(execution or address calculation), MEM(data memory access), WB(write back)

加上 Pipeline Register:

已經得到但還沒用到的資源也要繼續傳下去
Control signal
Pipeline Hazard
Structural Hazard:
Data Hazard
Data Hazard and Forwarding (R-Type and R-Type)
RAW (read after write): i2 tries to read an operand before i1 writes it
WAR (write after read): i2 tries to write an operand before i1 reads it
WAW (write after write): i2 tries to write an operand before i1 writes it
insert the NOPs \(\to\) slow us down

forwarding

ForwardA = 10
ForwardB = 10
ForwardA = 01
ForwardB = 01
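The conditions behind the ForwardA / ForwardB values are not spelled out in these notes, so the sketch below assumes the usual textbook forwarding unit: 10 selects the result in the EX/MEM register, 01 selects the value in MEM/WB, and 00 means no forwarding. `forward_select` is a hypothetical helper that computes the select for one ALU operand; call it with ID/EX.Rs for ForwardA and ID/EX.Rt for ForwardB.

```python
def forward_select(id_ex_reg, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Return the 2-bit forwarding select for one ALU source register."""
    # EX hazard: the previous instruction writes the register we need.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return 0b10
    # MEM hazard: the instruction two ahead writes it (EX hazard has priority).
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return 0b01
    return 0b00


# sub $2, $1, $3 followed directly by and $12, $2, $5: $2 comes from EX/MEM.
forward_a = forward_select(2, True, 2, False, 0)
print(f"ForwardA = {forward_a:02b}")
```

Checking the EX/MEM stage first matters when both stages hold a pending write to the same register, since the EX/MEM value is the more recent one.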
Data Hazard and Stalling (Load and R-Type)
\(\to\) stall the pipeline for one cycle
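A load followed immediately by an instruction that uses the loaded register cannot be fixed by forwarding alone, which is why the one-cycle stall is needed. The sketch below assumes the usual textbook hazard detection condition (the instruction in ID/EX is a load and its destination Rt matches a source register of the instruction in IF/ID); `must_stall` is an illustrative name.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when a load-use hazard requires a one-cycle pipeline stall."""
    return bool(id_ex_memread and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 20($1) followed directly by and $4, $2, $5: stall one cycle.
print(must_stall(True, 2, 2, 5))
# The same instruction after an R type producer: forwarding is enough.
print(must_stall(False, 2, 2, 5))
```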
Control Hazard (Branch Hazard)
Done during instruction fetch
1-bit predictor: the table is changed after a single misprediction
\(\to\) improved to a 2-bit predictor, where the table is changed only after two mispredictions
even with predictor, still need to calculate target address \(\to\) 1-cycle penalty for a taken branch
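The difference between the 1-bit and 2-bit predictors shows up on a loop branch. The sketch below models each predictor as a saturating counter for a single branch, which is one common way to implement the 2-bit scheme and is an assumption here; the initial state (strongly taken) and the branch pattern are also made up for illustration.

```python
def count_mispredictions(outcomes, bits):
    """Simulate a `bits`-bit saturating counter on one branch's outcomes."""
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)  # predict taken when state >= threshold
    state = max_state            # assumption: start in the strongest taken state
    mispredictions = 0
    for taken in outcomes:
        if (state >= threshold) != taken:
            mispredictions += 1
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return mispredictions


# Hypothetical loop branch: taken 9 times, then not taken, run 3 times over.
pattern = ([True] * 9 + [False]) * 3
one_bit = count_mispredictions(pattern, bits=1)
two_bit = count_mispredictions(pattern, bits=2)
print(one_bit, two_bit)
```

The 1-bit predictor mispredicts both the loop exit and the first iteration of the next run, while the 2-bit predictor only misses the exit.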
Exception
Instruction-Level Parallelism (ILP)
IPC = 5/4 = 1.25
IPC = 14/8 = 1.75
Lectures 22~26: Memory (7/8)
Memory Technology
Memory Hierarchy
at any given time, data is copied between only two adjacent levels
block: the basic unit of information transfer
two different types of locality:
using the principle of locality:
terminology
Cache
direct-mapped cache
block placement:
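Each cache entry must store a valid bit + tag + data
block size:
When the accessed blocks all happen to have the same remainder (the same index), the cache keeps missing even though the rest of the cache may be empty. This wastes space and is inefficient, which motivates looking for a more efficient way to use the cache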
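The conflict problem described above is easy to reproduce in a small simulation. This sketch models a direct-mapped cache at block granularity, with index = block address mod number of blocks and the tag taken as the quotient; the 8-block cache size and the access sequences are hypothetical.

```python
def count_misses(block_addresses, num_blocks):
    """Count misses in a direct-mapped cache with `num_blocks` entries."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    misses = 0
    for block_address in block_addresses:
        index = block_address % num_blocks  # which cache entry to use
        tag = block_address // num_blocks   # identifies the block stored there
        if not (valid[index] and tags[index] == tag):
            misses += 1                     # miss: fetch and replace the entry
            valid[index] = True
            tags[index] = tag
    return misses


# Blocks 0 and 8 share index 0 in an 8-block cache and keep evicting each other.
conflicting = count_misses([0, 8, 0, 8, 0, 8], num_blocks=8)
# Blocks 0 and 1 map to different entries, so only the first accesses miss.
spread_out = count_misses([0, 1, 0, 1, 0, 1], num_blocks=8)
print(conflicting, spread_out)
```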
Associative Caches
Cache Misses
Cache Performance
with simplifying assumptions:
\[ \begin{align} \text{Memory stall cycles} &= \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty} \\ &= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty} \end{align} \]
\(Average\ Memory\ Access\ Time\ (AMAT) = hit\ time + miss\ rate \times miss\ penalty\)
Actual CPI = base CPI + Miss CPI
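The cache performance formulas can be combined into a small calculation. All numbers below are hypothetical: a base CPI of 2, a 100-cycle miss penalty, a 2% instruction cache miss rate, a 4% data cache miss rate, and loads/stores making up 36% of instructions.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


def miss_cpi(accesses_per_instruction, miss_rate, miss_penalty):
    """Memory stall cycles per instruction for one kind of access."""
    return accesses_per_instruction * miss_rate * miss_penalty


base_cpi = 2.0
instruction_miss_cpi = miss_cpi(1.0, 0.02, 100)  # every instruction is fetched
data_miss_cpi = miss_cpi(0.36, 0.04, 100)        # only loads/stores access data
actual_cpi = base_cpi + instruction_miss_cpi + data_miss_cpi
print(f"actual CPI = {actual_cpi:.2f}")
print(f"AMAT = {amat(hit_time=1, miss_rate=0.05, miss_penalty=20):.1f} cycles")
```

With these numbers the memory stalls contribute more cycles per instruction than the base CPI itself.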
Main Memory
Memory Design to Support Cache
Access of DRAM