Computer Architectures
NTNU 計算機結構
Ch.02 Instructions: Language of the Computer
2.1 Introduction
Instruction Set 指令集
- A repertoire of instruction of a computer
計算機的指令集
- 不同的電腦會有不同的指令集
- 早期的電腦都有非常簡單的指令集
- 許多現代電腦也有簡單的指令集
MIPS Instruction Set MIPS 指令集
- Large share of embedded core market
- Applications in consumer electronics, network/storage equipment, cameras, printers…
2.2 Operations of Computer Hardware
Arithmetic Operations 算術運算
- Add and subtract, three operands 加減法、三個操作數
- Two sources and one destination
兩個來源一個目標
- 【Design Principle 1】Simplicity favors regularity 簡潔有助於規律性
- 規律性有助於讓實作更簡單
- Simplicity enables higher performance at lower cost
- 簡化可以更容易最佳化
Example
2.3 Operands of Computer Hardware
Register Operands 暫存器的運算子
- used by Arithmetic instructions
- MIPS has 32 * 32-bit register file
- used for frequently accessed data
- Numbered 0 to 31
- word 字節: 32-bit data
- Assembler names 組譯器中的名字
- temporary values 暫時的數值:
$t0
,$t1
,\(\dots\),$t9
- saved variables 儲存的變數:
$s0
,$s1
,\(\dots\),$s7
- 【Design Principle 2】Smaller is faster 越小越快
- c.f. main memory 主記憶體: million of locations 百萬個位置
Example
- C Code
a,b,c,d,e 分別存在
$s0
,$s1
,$s2
,$s3
,$s4
- Compiled MIPS code
Memory Operands 記憶體的運算
- Main memory is used for composite data
主記憶體是用於綜合資料
- Array 陣列、 structure 結構、 dynamic data 動態資料…
- 用於算術運算(arithmetic operations)
- Load values from memory into register
將數值從記憶體載入到快取
- Store result from register to memory
將結果從快取存入到記憶體
- Memory is byte addressed
記憶體用byte定址
- Each address identifies an 8-bit/byte
每個位置表示一個 8-bit/byte
- Words are aligned in memory
字節在記憶體中為對齊的
- Address nust be a multiple of 4
地址必為四的倍數
- MIPS is Big Endian
- Big-Endian 是指資料放進記憶體中的時候,最高位的位元組會放在最低的記憶體位址上
- Little-Endian 則相反
Example 1
Example 2
h 在 $s2
, A的基礎位置在 $s3
- Compiled MIPS Code
- Index 8 需要 offset of 32
- Index 12 需要 offset of 48
Registers v.s. Memory 暫存器 v.s. 記憶體
- Registers are faster to acess than memory
暫存器相較記憶體可以更快的存取
- Operating on memory data requires loads and stores
在記憶體上做運算還需要加載跟保存的時間
- More instructions to be executed
需要更多指令
- Compilers must use registers for variable as much as possible
編譯器必須盡可能把暫存器用在變數上
- Only spill to memory for less frequently used variables
真的沒有空間可以用才會將不常使用的變數放到記憶體中
- Register optimization is important
暫存器最佳化很重要
- Constant data specified in an instruction
指令中指定的常數數據
- No subtract immediate instruction 沒有減去的直接指令
- 直接加一個負數(negative constant)
- 【Design Principle 3】Make the common case faster 讓常見情況處裡的更快
- Small constants are common
小的常數的運算很常見
- Immediate operand avoids a load instruction
直接操作可以免除常數加載到暫存器的指令
2.4 Signed and Unsigned Numbers
Constant Zero 常數零
- MIPS暫存器 0 (
$zero
) 為常數 0
- Cannot be overwritten
不能被覆寫
- Useful for common operations
對常做的操作有用的
- e.g. move between registers
在暫存器之間搬移資料
Unsigned Binary Integers 無符號二進位整數
- Given an n-bit number
給一個 n-bit 的數字
\(x=x_{n-1}2^{n-1}+x_{n-2}2^{n-2}+\cdots+x_12^1+x_02^0\)
- 範圍:\(0\) to \(+2^n-1\)
- Example
- \(\begin{aligned}
&\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 1011_2
\\&= 0 + \cdots + 1×2^3 + 0×2^2 +1×2^1 +1×2^0
\\&= 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}
\end{aligned}\)
- Using 32 bits
\(0\) to \(+4,294,967,295(2^{32}-1)\)
2s-Complement Signed Integers 二補數有符號整數
- 給一個 n-bit 的數字(最高位不一樣)
\(x=-x_{n-1}2^{n-1}+x_{n-2}2^{n-2}+\cdots+x_12^1+x_02^0\)
- 範圍:\(-2^{n-1}\)~\(+2^{n-1}-1\)
- Example
- \(\begin{aligned}
&\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1100_2
\\&= –1×2^{31} + 1×2^{30} + … + 1×2^2 +0×2^1 +0×2^0
\\&= –2,147,483,648 + 2,147,483,644 = –4_{10}
\end{aligned}\)
- 使用 32 bits
\(–2,147,483,648\) ~ \(+2,147,483,647\)
- Bit 31 為符號 bit
- \(-(-2^{n-1})\) 不能表示
- 非負數字有相同的無符號及二補數表示
- 一些特殊的數字
- \(0:0000\ 0000 \cdots 0000\)
- \(–1:1111\ 1111 \cdots 1111\)
- 最負:\(1000\ 0000\ \cdots\ 0000\)
- 最正:\(0111\ 1111\ \cdots\ 1111\)
Signed Negation 整數正負號變換
- Complement and add 1
- complement: \(0\rightarrow 1,1\rightarrow 0\)
\(\begin{aligned}
x+\bar x&=1111\cdots 111_2=-1\\
\implies\bar x+1&=-x
\end{aligned}\)
- Example: negate+2
- \(+2=0000\ 0000\cdots 0010_2\)
- \(\begin{aligned}-2&=1111\ 1111\cdots 1101_2+1\\&=1111\ 1111\cdots 1110_2\end{aligned}\)
Sign Extension 有符號擴展
- Representing a number using more bits
用更多 bits 表示一個數字
- Preserve the numeric value 保留數字
- In MIPS instruction set
在 MIPS 指令集中
addi
: extend immediate value 擴展立即數值
lb
,lh
: extend loaded byte/halfword 擴展加載的
beq
,bne
: extend the displacement (之後才會介紹)
- Replicate the sign bit to the left
左邊的 bit 用 sign bit 填滿
- unsigned values: extend with 0s
- Examples: 8-bit to 16-bit
- +2: 0000 0010 → 0000 0000 0000 0010
- -2: 1111 1110 → 1111 1111 1111 1110
2.5 Representing Instructions in the Computer
Represent Instructions
- Instructions are encoded in binary
- MIPS instructions
- Encoded as 32-bit instruction words
- Small number of formats encoding operation code(opcode), register numbers, …
- Regularity
$t0
- $t7
: registers 8 - 15
$s0
- $s7
: registers 16 - 23
$t8
- $t9
: registers 24 - 25

- Instruction fields
op
: operation code (opcode)
rs
: first source register number
rt
: second source register number
rd
: destination register number
shamt
: shift amount (000000 for now)
funct
: function code (extends opcode)
Hexadecimal
- Base 16
- Compact representation of bit strings
- 4 bits per hex digit

- Immediate arithmetic and load/store instructions
rt
: destination or source register number
- Constant: \(-2^{15}\) to \(+2^{15}-1\)
- Address: offset added to base address in
rs
- 【Design Principle 4】Good design demands good compromises
好的設計有好的妥協
- Different formats complicate decoding, but allow 32-bit instructions uniformly
雖然解碼變複雜,但維持所有指令的長度都是32-bit
- Keep formats as similar as possible
保持部分欄位是相似的
(op
、rs
、rt
的欄位在 R,I-format 一樣)
Store Program Computers

- Instructions are represented in binary just like data
- Instructions and data are stored in memory
- Programs can operate on programs
- e.g. compilers, linkers, …
- Binary compatibility allows compiled programs to work on different computers
2.6 Logical Operations
Logical Operations
- Instructions for bitwise manipulation
- Useful for extracting and inserting groups of bits in a word
Operation |
C |
Java |
MIPS |
Shift left |
<< |
<< |
sll |
Shift right |
>> |
>>> |
srl |
Bitwise AND |
& |
& |
and , andi |
Bitwise OR |
| |
| |
or , ori |
Bitwise NOT |
~ |
~ |
nor |
Shift Operations

- shamt: how many positions to ahift
- Shift left logical:
sll
- Shift left and fill with 0 bits
sll
by i bits multiplies by \(2^i\)
- ~Shift right logical:
srl
- Shift right and fill with 0 bits
srl
by i bits divides by \(2^i\) (unsigned only)
AND Operations
- Useful to mask(遮罩) bits in a word
- Select some bits, clear others to 0
- Example:
and $t0, $t1, $t2

OR Operations
- Useful to include bits in a word
- Set some bits to 1, leave others unchanged
設定特定 bit 為 1
- Example:
or $t0, $t1, $t2

NOT Operations
- Useful to invert bits in a word
- Change 0 to 1, and 1 to 0
- MIPS has NOR 3-operand instruction
- Example: a NOR b == NOT (a or b)
nor $t0, $t1. $zero

Conditional Operation
- Branch to a labeled instruction if a condition is true
- Otherwise, continue sequentially
beq rs, rt, L1
- if (rs == rt) branch to instruction labeled L1
bne rs,rt, L1
- if (rs != rt) branch to instruvciton labeled L1
j L1
- unconditional kump to instruction labeled L1
2.7 Instructions for Making Decisions
Compiling If Statements

- C code
- Compiled MIPS code
Compiling Loop Statements
- C Code
- Compiled MIPS code
Basic Blocks
- A basic block is a sequence of instructions with …
- No embedded branches( except at end)
- No branch targets ( expect at beginning)
- A compiler identifies basic blocks for optimization.
- An advanced processor can accelerate execution of basic blocks.
More Conditional Operations
- Set result to 1 if a condition is true
slt rd, rs, rt
if( rs < rt ) rd = 1; else rd = 0;
slti rt, rs, constant
if( rs < constant ) rt = 1; else rt = 0
- Use in combination with
beq
bne
Branch Instruction Design
- Why not
blt
, bqe
?
為何在 MIPS 中沒有設計這兩個指令
- Hardware for <, > ,… is slower than =, \(\neq\)
- Combine with branch involves more work per instruction, requiring a slower clock.
- All instructions are penalized
因為指令處理的方式,如果有某些指令特別長,那其他沒那麼久的也會跟著一起變慢
Signed v.s. Unsigned
- Signed comparison:
slt
, slti
- Unsigned comparison:
sltu
, sltui
- Example
2.8 Supporting Procedures in Computer Hardware
Procedure Calling
- caller 跟 callee 執行權的轉移
- caller 就是原本在的地方
- callee 就是呼叫 function 的東西
- 因為他們共用 register,所以需要做一些處理避免資料遺失
- 透過 register 溝通
- Steps required
- Place parameters in registers
- caller 設定要呼叫的prcedure的參數,執行權從 caller 到 callee
- Transfer control to procedure
- Acquire storage for procedure
- callee 開始執行程式
- 原本 caller 用的 register 的資料存起來
- 因為 callee 才知道需要多少空間
- Perform procedure’s operations
- Place result in register for caller
- Return to place of call
- 交執行權還給caller
- 在 Step 3 的時候如果有保留的數值,需要在這步之前將數值還原回原本的樣子
Register Usage
|
type |
register |
|
$v0, $v1 |
result values |
2, 3 |
procedure calling 中 output |
$a0 - $a3 |
arguments |
4 - 7 |
procedure calling 中input |
$t0 - $t7 |
temporary values |
8 - 15 |
can be overwritten by callee |
$s0 - $s7 |
saved variables |
16 - 23 |
must be saved/ restored by callee |
$t8 - $t9 |
temporary values |
24, 25 |
can be overwritten by callee |
$gp |
global pointer for ststic data |
28 |
|
$sp |
stack pointer |
29 |
procedure calling 中可以隨意使用的 |
$fp |
frame pointer |
30 |
|
$ra |
return address |
31 |
procedure calling 中儲存 caller 的位置 |
Procedure Call Instructions
- Procedure call: jump and link
jal ProcedureLabel
- Address of the following instruction put in $ra
- Jump to target address
- Procedure return: jump register
jr $ra
- Copy $ra to program counter
- Can also be used for computed jumps
- E.g., for case/switch statements
Local Data on the Stack
- Local data is allocated by callee.
- E.g., C automatic variables
- Procedure frame (activation record)
- Used by some compilers to manage stack storage

- stack 從最高開始往下放
$sp
指向 stack 最底
Leaf Procedure Example
Non-Leaf Procedures
- Procedures that call other procedures
- For nested(嵌套) call, caller needs to save on the stack …
- Its return address(它的 caller)
- Any arguments and temporaries needed after the call
- Restore from the stack after the call
Example
Memory Layout

- Text: program code
- Static data: global variables
- E.g., static variables in C, constant arrays and strings
$gp
is initialized to address allowing ±offsets into this segment.
- Dynamic data: heap
- E.g., malloc in C, new in Java
- Stack: automatic storage
Character Data
- Byte-encoded character set
- ASCII: 128 characters = 95 graphic + 33 control
- Latin-1: 256 characters = ASCII + 96 more graphic characters
- Unicode: 32-bit character set
- Used in Java, C++ wide characters, …
- Most of world’s alphabets, plus symbols
- UTF-8, UTF-16: variable-length encodings
Byte/Halfword Operations
- Could use bitwise operations
- MIPS byte/halfword load/store
- String processing is a common case.
- Sign extend to 32 bits in
rt
- Zero extend to 32 bits in
rt
- Store just rightmost byte/halfword
String Copy Example 字串拷貝
- C Code
- Null-terminated string
最後一個index為NULL的字串
- MIPS
Communicating with People
32bit Constant
- Most constant are small 多數的常數都很小
- 16-bit immediate us sufficient 通常都只有16個bits
- For an occasional 32-bit constant
lui rt, constant
- Copy 16-bit constant to left 16 bits or
rt
- Clear right 16 bits of
rt
to 0
Branch Addressing
- Branch instructions specify…
- Opcode, two registers, target address
- Most branch targets are near branch.
- Forward or backward

- PC-relative addressing (PC (program counter)相對定址)
- Target address = PC + offset * 4 (MIPS一個指令4個 byte)
- PC is already incremented by 4 by this time
Jump Addressing
- Jump (
j
and jal
) targets could be anywere in text segment
- Encode full address in an instruction
- (Pseudo) Direct jump addressing
- Target address = PC31…28:(address*4)
4個跟PC借 接下來26個bits 剩下直接x4
Target Addressing Example
- Loop code from the earlier example
- Assume Loop at location 80000

Branch Far Away
- If a branch target is too far to encode with 16-bit offset assembler rewrites the code
- 當要跳過去的目標位置太遠無法用 16 bit 表達,可以改寫指令使存放 address 的 bit 更大
- Example
Addressing Mode Summary

2.10 Parallelism and Instructions: Synchronization
Synchronization 同步
- Two processors share an area of memory
- P1 writes, then P2 reads
- Data race if P1 and P2 don't synchronize(race 資料不同)
- Result depends of order of accesses
- Hardware support is required.
- Atomic read/write memory operation
- No other access to a location is allowed between read and write.
- 一段 atomic operation 間不能被其他指令中斷
- Could be a single instruction
- E.g., atomic swap of register ↔ memory
提供資料對換
- Or an atomic pair of instructions
Synchronization in MIPS
- Load linked:
ll rt, offset(rs)
- Store conditional:
sc rt, offset(rs)
- Succeed if the location is not changed since the
ll
- Fail if the location is changed
- 檢查
rt
的值在 ll
和 sc
間是否被更改過
- Example: atomic swap
$s4
with M[$s1]
(to test/set lock variable)
2.11 Translation and Starting a Program
Translation and Startup

Assembler Pseudoinstructions
- Most assembler instructions represent machine instructions one-to-one.
- Pseudoinstructions: figments of the assembler’s imagination
$at
(register 1): assembler temporary
Produce Object Module
- An assembler (or compiler) translates a program into machine instructions.
- Provide information for building a complete program
- Header: describe contents of object module
描述檔案中有哪些內容
- Text segment: translated instructions
機器語言碼
- Static data segment: data allocated for the life of the program
整個程式都需要的資料(全域變數
- Relocation information: for contents that depend on absolute location of loaded program
要根據絕對記憶體位置運行的數據的位置
- Symbol table: undefined labels, e.g., global definitions and external references
未定義的標籤(不包含在該檔案中的東西)
- Debug information: for associating with source code
紀錄給方便除錯的東西
Link Object Modules
- Produce an executable image
產生一個可執行的映象檔
- Merge segments
把不同的區塊結合在一起
- Resolve labels (determine their address)
把在 library 中的 label 找出來
- Patch location-dependent and external references
- Could leave location dependencies for fixing by a relocating loader
- But with virtual memory, no need to do this
- Program can be loaded into absolute location in virtual memory space.
Load a Program
- Load from an image file on a disk into memory
- Read the header to determine segment sizes
- Create a virtual address space
- Copy the text and initialized data into memory
- Or set page table entries so they can be faulted in
- Set up arguments on stack
- Initialize registers (including $sp, $fp, $gp)
- Jump to startup routine
- Copy arguments to $a0, … and call the main routine
- When the main routine returns, do exit() system call
Dynamic Linking
- Only link/load a library procedure when it is called
- Require the procedure code to be relocatable
- Avoid image bloat caused by static linking of all (transitively) referenced libraries
- Automatically pick up new library versions
Lazy Linkage

Start Java Applications


2.12 A C sort Example to Put It All Together
C Swap Example
- Illustrate use of assembly instructions for a C bubble sort function
- Swap procedure
- v in $a0, k in $a1, temp in $t0
Procedure Swap
C Procedure Sort
Procedure Body
Full Procedure
Lesson Learnt
- Instruction count and CPI are not good performance indicators in isolation.
- Compiler optimizations are sensitive to algorithms.
- Java/JIT compiled code is significantly faster than JVM interpreted.
- Comparable to optimized C in some cases
- Nothing can fix a dumb algorithm!
Arrays versus Pointers
- Array indexing involves …
- Multiply index by element size
- Add to array base address
- Pointers correspond directly to memory addresses
- Can avoid indexing complexity
Real Stuff: ARM Instructions, x86 Instructions, ARM v8(64-bit) Instructions
Fallacies and Pitfalls