Computer Architectures

NTNU 計算機結構

Back to Note Overview
Back to Computer Architectures
tags: NTNU CSIE 必修 Computer Architectures 109-2

Ch.02 Instructions: Language of the Computer

2.1 Introduction

Instruction Set 指令集

  • A repertoire of instruction of a computer
    計算機的指令集
  • 不同的電腦會有不同的指令集
    • 但有許多方面會相同
  • 早期的電腦都有非常簡單的指令集
    • 簡化設計
  • 許多現代電腦也有簡單的指令集

MIPS Instruction Set MIPS 指令集

  • Large share of embedded core market
    • Applications in consumer electronics, network/storage equipment, cameras, printers

2.2 Operations of Computer Hardware

Arithmetic Operations 算術運算

  • Add and subtract, three operands 加減法、三個操作數
    • Two sources and one destination
      兩個來源一個目標
    ​​​​add a,b,c #a=b+c ​​​​sub a,b,c #a=b-c
    • 所有算術運算都是這個格式
  • 【Design Principle 1】Simplicity favors regularity 簡潔有助於規律性
    • 規律性有助於讓實作更簡單
    • Simplicity enables higher performance at lower cost
    • 簡化可以更容易最佳化
Example
  • C Code
    ​​​​a = ( b + c ) - ( d + e )
  • MIPS Code
    ​​​​add t0, b, c #temp t0 = b+c ​​​​add t1, d, e #temp t1 = d+e ​​​​sub a,t0,t1 # a = t0-t1

2.3 Operands of Computer Hardware

Register Operands 暫存器的運算子

  • used by Arithmetic instructions
  • MIPS has 32 * 32-bit register file
    • used for frequently accessed data
    • Numbered 0 to 31
    • word 字節: 32-bit data
  • Assembler names 組譯器中的名字
    • temporary values 暫時的數值: $t0,$t1,\(\dots\),$t9
      • saved variables 儲存的變數: $s0,$s1,\(\dots\),$s7
  • 【Design Principle 2】Smaller is faster 越小越快
    • c.f. main memory 主記憶體: million of locations 百萬個位置
Example
  • C Code
    ​​​​a=(b+c)-(d+e)
    a,b,c,d,e 分別存在 $s0,$s1,$s2,$s3,$s4
  • Compiled MIPS code
    ​​​​add $t0,$s1,$s2 ​​​​add $t1,$s3,$s4 ​​​​sub $s0,$t0,$t1

Memory Operands 記憶體的運算

  • Main memory is used for composite data
    主記憶體是用於綜合資料
    • Array 陣列、 structure 結構、 dynamic data 動態資料
  • 用於算術運算(arithmetic operations)
    • Load values from memory into register
      將數值從記憶體載入到快取
    • Store result from register to memory
      將結果從快取存入到記憶體
  • Memory is byte addressed
    記憶體用byte定址
    • Each address identifies an 8-bit/byte
      每個位置表示一個 8-bit/byte
  • Words are aligned in memory
    字節在記憶體中為對齊的
    • Address nust be a multiple of 4
      地址必為四的倍數
  • MIPS is Big Endian
    • Big-Endian 是指資料放進記憶體中的時候,最高位的位元組會放在最低的記憶體位址上
    • Little-Endian 則相反

Example 1

  • C code
g=h+A[8];
  • g in $s1, h in $s2, base adddddress of A in $s3

  • Compiled MIPS code

    • Index 8 requires offset of 32
      • 4bytes per word
    ​​​​lw $t0, 32($s3) # load word ​​​​# 32 is offset ​​​​# $s3 is base register ​​​​add $s1, $s2, $t0 ​​​​

Example 2

  • C Code
A[12] = h + A[8] ;

h 在 $s2 , A的基礎位置在 $s3

  • Compiled MIPS Code
    • Index 8 需要 offset of 32
    • Index 12 需要 offset of 48
    ​​​​lw $t0, 32($s3) # load word ​​​​add $t0, $s2, $t0 ​​​​sw $t0, 48($s3) # store word

Registers v.s. Memory 暫存器 v.s. 記憶體

  • Registers are faster to acess than memory
    暫存器相較記憶體可以更快的存取
    • Operating on memory data requires loads and stores
      在記憶體上做運算還需要加載跟保存的時間
    • More instructions to be executed
      需要更多指令
  • Compilers must use registers for variable as much as possible
    編譯器必須盡可能把暫存器用在變數上
    • Only spill to memory for less frequently used variables
      真的沒有空間可以用才會將不常使用的變數放到記憶體中
    • Register optimization is important
      暫存器最佳化很重要

Immediate Operands 直接操作數

  • Constant data specified in an instruction
    指令中指定的常數數據
addi $s3,$s3,4 # $s3 = $s3 + 4
  • No subtract immediate instruction 沒有減去的直接指令
    • 直接加一個負數(negative constant)
    ​​​​addi $s2, $s1, -1 ​​​​# $s2 = $s1 + (-1)
  • 【Design Principle 3】Make the common case faster 讓常見情況處裡的更快
    • Small constants are common
      小的常數的運算很常見
    • Immediate operand avoids a load instruction
      直接操作可以免除常數加載到暫存器的指令

2.4 Signed and Unsigned Numbers

Constant Zero 常數零

  • MIPS暫存器 0 ($zero) 為常數 0
    • Cannot be overwritten
      不能被覆寫
  • Useful for common operations
    對常做的操作有用的
    • e.g. move between registers
      在暫存器之間搬移資料
    ​​​​add $t2,$s1,$zero #t2=s1+0

Unsigned Binary Integers 無符號二進位整數

  • Given an n-bit number
    給一個 n-bit 的數字
    \(x=x_{n-1}2^{n-1}+x_{n-2}2^{n-2}+\cdots+x_12^1+x_02^0\)
  • 範圍:\(0\) to \(+2^n-1\)
  • Example
    • \(\begin{aligned} &\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 0000\ 1011_2 \\&= 0 + \cdots + 1×2^3 + 0×2^2 +1×2^1 +1×2^0 \\&= 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10} \end{aligned}\)
  • Using 32 bits
    \(0\) to \(+4,294,967,295(2^{32}-1)\)

2s-Complement Signed Integers 二補數有符號整數

  • 給一個 n-bit 的數字(最高位不一樣)
    \(x=-x_{n-1}2^{n-1}+x_{n-2}2^{n-2}+\cdots+x_12^1+x_02^0\)
  • 範圍:\(-2^{n-1}\)~\(+2^{n-1}-1\)
  • Example
    • \(\begin{aligned} &\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1111\ 1100_2 \\&= –1×2^{31} + 1×2^{30} + … + 1×2^2 +0×2^1 +0×2^0 \\&= –2,147,483,648 + 2,147,483,644 = –4_{10} \end{aligned}\)
  • 使用 32 bits
    \(–2,147,483,648\) ~ \(+2,147,483,647\)
  • Bit 31 為符號 bit
    • 1 代表負
    • 0 代表正
  • \(-(-2^{n-1})\) 不能表示
  • 非負數字有相同的無符號及二補數表示
  • 一些特殊的數字
    • \(0:0000\ 0000 \cdots 0000\)
    • \(–1:1111\ 1111 \cdots 1111\)
    • 最負:\(1000\ 0000\ \cdots\ 0000\)
    • 最正:\(0111\ 1111\ \cdots\ 1111\)

Signed Negation 整數正負號變換

  • Complement and add 1
    • complement: \(0\rightarrow 1,1\rightarrow 0\)
      \(\begin{aligned} x+\bar x&=1111\cdots 111_2=-1\\ \implies\bar x+1&=-x \end{aligned}\)
  • Example: negate+2
    • \(+2=0000\ 0000\cdots 0010_2\)
    • \(\begin{aligned}-2&=1111\ 1111\cdots 1101_2+1\\&=1111\ 1111\cdots 1110_2\end{aligned}\)

Sign Extension 有符號擴展

  • Representing a number using more bits
    用更多 bits 表示一個數字
    • Preserve the numeric value 保留數字
  • In MIPS instruction set
    在 MIPS 指令集中
    • addi: extend immediate value 擴展立即數值
    • lb,lh: extend loaded byte/halfword 擴展加載的
    • beq,bne: extend the displacement (之後才會介紹)
  • Replicate the sign bit to the left
    左邊的 bit 用 sign bit 填滿
    • unsigned values: extend with 0s
  • Examples: 8-bit to 16-bit
    • +2: 0000 0010 → 0000 0000 0000 0010
    • -2: 1111 1110 → 1111 1111 1111 1110

2.5 Representing Instructions in the Computer

Represent Instructions

  • Instructions are encoded in binary
    • Called machine code
  • MIPS instructions
    • Encoded as 32-bit instruction words
    • Small number of formats encoding operation code(opcode), register numbers,
  • Regularity
    • $t0 - $t7: registers 8 - 15
    • $s0 - $s7: registers 16 - 23
    • $t8 - $t9: registers 24 - 25

MIPS R-format Instructions

  • Instruction fields
    • op: operation code (opcode)
    • rs: first source register number
    • rt: second source register number
    • rd: destination register number
    • shamt: shift amount (000000 for now)
    • funct: function code (extends opcode)

Hexadecimal

  • Base 16
    • Compact representation of bit strings
    • 4 bits per hex digit

MIPS I-format Instructions

  • Immediate arithmetic and load/store instructions
    • rt: destination or source register number
    • Constant: \(-2^{15}\) to \(+2^{15}-1\)
    • Address: offset added to base address in rs
  • 【Design Principle 4】Good design demands good compromises
    好的設計有好的妥協
    • Different formats complicate decoding, but allow 32-bit instructions uniformly
      雖然解碼變複雜,但維持所有指令的長度都是32-bit
    • Keep formats as similar as possible
      保持部分欄位是相似的
      (oprsrt 的欄位在 R,I-format 一樣)

Store Program Computers

  • Instructions are represented in binary just like data
  • Instructions and data are stored in memory
  • Programs can operate on programs
    • e.g. compilers, linkers,
  • Binary compatibility allows compiled programs to work on different computers
    • standardized ISAs

2.6 Logical Operations

Logical Operations

  • Instructions for bitwise manipulation
  • Useful for extracting and inserting groups of bits in a word
Operation C Java MIPS
Shift left << << sll
Shift right >> >>> srl
Bitwise AND & & and, andi
Bitwise OR | | or, ori
Bitwise NOT ~ ~ nor

Shift Operations

  • shamt: how many positions to ahift
  • Shift left logical: sll
    • Shift left and fill with 0 bits
    • sll by i bits multiplies by \(2^i\)
  • ~Shift right logical: srl
    • Shift right and fill with 0 bits
    • srl by i bits divides by \(2^i\) (unsigned only)

AND Operations

  • Useful to mask(遮罩) bits in a word
    • Select some bits, clear others to 0
    • Example:
      and $t0, $t1, $t2

OR Operations

  • Useful to include bits in a word
    • Set some bits to 1, leave others unchanged
      設定特定 bit 為 1
    • Example:
      or $t0, $t1, $t2

NOT Operations

  • Useful to invert bits in a word
    • Change 0 to 1, and 1 to 0
  • MIPS has NOR 3-operand instruction
    • Example: a NOR b == NOT (a or b)
      nor $t0, $t1. $zero

Conditional Operation

  • Branch to a labeled instruction if a condition is true
    • Otherwise, continue sequentially
  • beq rs, rt, L1
    • if (rs == rt) branch to instruction labeled L1
  • bne rs,rt, L1
    • if (rs != rt) branch to instruvciton labeled L1
  • j L1
    • unconditional kump to instruction labeled L1

2.7 Instructions for Making Decisions

Compiling If Statements

  • C code
    ​​​​if(i==j) f = g + h; ​​​​else f = g - h; ​​​​// f, g, ... in $s0, $s1, ...
  • Compiled MIPS code
    ​​​​ bne $s3, $s4, Else ​​​​ add $s0, $s1, $s2 ​​​​ j Exit ​​​​Else: `sub $s0, $s1, $s2` ​​​​Exit: ...

Compiling Loop Statements

  • C Code
    ​​​​while( save[i] == k ) ​​​​ i += 1; ​​​​// i in $s3, k in $s5, address of save in $s6
  • Compiled MIPS code
    ​​​​​Loop: sll $t1, $s3, 2 ​​​​​ #save $s1*4 in $s3 ​​​​​ # ​​​​​ add $t1, $t1, $s6 ​​​​​ #(add $s3 at save[0] to get save[i]) ​​​​​ lw $t0, 0($t1) ​​​​​ bne $t0, $s5, Exit ​​​​​ addi $s3, $s3, 1 ​​​​​ j Loop ​​​​​Exit: ...

Basic Blocks

  • A basic block is a sequence of instructions with
    • No embedded branches( except at end)
    • No branch targets ( expect at beginning)
  • A compiler identifies basic blocks for optimization.
  • An advanced processor can accelerate execution of basic blocks.

More Conditional Operations

  • Set result to 1 if a condition is true
    • Otherwise, set to 0
  • slt rd, rs, rt
    • if( rs < rt ) rd = 1; else rd = 0;
  • slti rt, rs, constant
    • if( rs < constant ) rt = 1; else rt = 0
  • Use in combination with beq bne
    ​​​​slt $t0, $s1, $s2 # if ($s1 < $s2) ​​​​bne $t0, $zero, L # branch to L

Branch Instruction Design

  • Why not blt, bqe?
    為何在 MIPS 中沒有設計這兩個指令
    • Common case
    • 效率更高
  • Hardware for <, > , is slower than =, \(\neq\)
    • Combine with branch involves more work per instruction, requiring a slower clock.
    • All instructions are penalized
      因為指令處理的方式,如果有某些指令特別長,那其他沒那麼久的也會跟著一起變慢

Signed v.s. Unsigned

  • Signed comparison: slt, slti
  • Unsigned comparison: sltu, sltui
  • Example
# $s0 = 1111 1111 1111 1111 1111 1111 1111 1111 # $s1 = 0000 0000 0000 0000 0000 0000 0000 0001 slt $t0, $s0, $s1 #signed #-1 < +1, so $t0 = 1 sltu $t0, $s0, $s1 #unsigned +4294967296 > +1, so $t0 = 0

2.8 Supporting Procedures in Computer Hardware

Procedure Calling

  • caller 跟 callee 執行權的轉移
    • caller 就是原本在的地方
    • callee 就是呼叫 function 的東西
    • 因為他們共用 register,所以需要做一些處理避免資料遺失
    • 透過 register 溝通
  • Steps required
    1. Place parameters in registers
      • caller 設定要呼叫的prcedure的參數,執行權從 caller 到 callee
    2. Transfer control to procedure
    3. Acquire storage for procedure
      • callee 開始執行程式
      • 原本 caller 用的 register 的資料存起來
      • 因為 callee 才知道需要多少空間
    4. Perform procedure’s operations
    5. Place result in register for caller
    6. Return to place of call
      • 交執行權還給caller
      • 在 Step 3 的時候如果有保留的數值,需要在這步之前將數值還原回原本的樣子

Register Usage

type register
$v0, $v1 result values 2, 3 procedure calling 中 output
$a0 - $a3 arguments 4 - 7 procedure calling 中input
$t0 - $t7 temporary values 8 - 15 can be overwritten by callee
$s0 - $s7 saved variables 16 - 23 must be saved/ restored by callee
$t8 - $t9 temporary values 24, 25 can be overwritten by callee
$gp global pointer for ststic data 28
$sp stack pointer 29 procedure calling 中可以隨意使用的
$fp frame pointer 30
$ra return address 31 procedure calling 中儲存 caller 的位置

Procedure Call Instructions

  • Procedure call: jump and link
    jal ProcedureLabel
    • Address of the following instruction put in $ra
    • Jump to target address
  • Procedure return: jump register
    jr $ra
    • Copy $ra to program counter
    • Can also be used for computed jumps
      • E.g., for case/switch statements

Local Data on the Stack

  • Local data is allocated by callee.
    • E.g., C automatic variables
  • Procedure frame (activation record)
    • Used by some compilers to manage stack storage
  • stack 從最高開始往下放
  • $sp 指向 stack 最底

Leaf Procedure Example

  • C code
// g, h, i, j in $a0, $a1, $a2, $a3 int leaf_example (int g, h, i, j) { int f; // f in $s0 // hence, need to save $s0 on stack f = (g + h) - (i + j); // Result in $v0 return f; }
  • MIPS code
### 我也不知道為什麼這邊打一個$ 底下就都可以註解了 ### leaf_example: # 將 $sp 移出 4 個空間 addi $sp, $sp, -4 # save $s0 on stack sw $s0, 0($sp) # Procedure body # $a0(g) + $a1(h) 暫存在 $t0 add $t0, $a0, $a1 # $a2(i) + $a3(i) 暫存在 $t1 add $t1, $a2, $a3 # $s0(f) = $t0(g+h) - $t1(i+j) sub $s0, $t0, $t1 # Result add $v0, $s0, $zero # Restore $s0 lw $s0, 0($sp) addi $sp, $sp, 4 # Return jr $ra

Non-Leaf Procedures

  • Procedures that call other procedures
  • For nested(嵌套) call, caller needs to save on the stack …
    • Its return address(它的 caller)
    • Any arguments and temporaries needed after the call
  • Restore from the stack after the call
Example
  • C Code
// n in $a0 int fact( int n ) { if( n < 1 ) return 1; else return n * fact( n - 1 ); // result in $v0 }
  • MIPS
### 我也不知道為什麼這邊打一個$ 底下就都可以註解了 ### fact: #把 caller 的東西都保存下來 # adjust stack for 2 items(8 bits) addi $sp, $sp, -8 # save return address, $ra sw $ra, 4($sp) # save 1 argument, $a0 sw $a0, 0($sp) # test for n < 1 slti $t0, $a0, 1 beq $t0, $zero, L1 # if so, result is 1 addi $v0, $zero, 1 # pop 2 items from stack addi $sp, $sp, 8 # and return jr $ra L1: # else decrement n addi $a0, $a0, -1 # recursive call jal fact # 把 caller 的東西都放回去 # restore original n lw $a0, 0($sp) # restore return address lw $ra, 4($sp) # pop 2 items from stack addi $sp, $sp, 8 # multiply to get result mul $v0, $a0, $v0 # and return jr $ra

Memory Layout

  • Text: program code
  • Static data: global variables
    • E.g., static variables in C, constant arrays and strings
    • $gp is initialized to address allowing ±offsets into this segment.
  • Dynamic data: heap
    • E.g., malloc in C, new in Java
  • Stack: automatic storage

Character Data

  • Byte-encoded character set
    • ASCII: 128 characters = 95 graphic + 33 control
    • Latin-1: 256 characters = ASCII + 96 more graphic characters
  • Unicode: 32-bit character set
    • Used in Java, C++ wide characters, …
    • Most of world’s alphabets, plus symbols
    • UTF-8, UTF-16: variable-length encodings

Byte/Halfword Operations

  • Could use bitwise operations
  • MIPS byte/halfword load/store
    • String processing is a common case.
    • Sign extend to 32 bits in rt
      ​​​​​​​​# sign extend a byte ​​​​​​​​lb rt, offset(rs) ​​​​​​​​# sign extend a halfword ​​​​​​​​lh rt, offset(rs)
    • Zero extend to 32 bits in rt
      ​​​​​​​​# unsign(zero) extend a byte ​​​​​​​​lbu rt, offset(rs) ​​​​​​​​# unsign(zero) extend a halfword ​​​​​​​​lhu rt, offset(rs)
    • Store just rightmost byte/halfword
      ​​​​​​​​sb rt, offset(rs) ​​​​​​​​sh rt, offset(rs)

String Copy Example 字串拷貝

  • C Code
    • Null-terminated string
      最後一個index為NULL的字串
    ​​​​// x in $a0, y in $a1 ​​​​void strcpy(char x[], char y[]) ​​​​{ ​​​​ // i in $s0 ​​​​ int i; ​​​​ i = 0; ​​​​ while( (x[i] = y[i]) != '\0') ​​​​ { ​​​​ i += 1; ​​​​ } ​​​​}
  • MIPS
    ​​​​# non-leaf 只需要保留 s 系列 ​​​​strcpy: ​​​​ addi $sp, $sp, -4 # adjust stack for 1 item ​​​​ sw $S0, 0($sp) # save $s0 ​​​​ add $s0, $zero, $zero # i - 0 ​​​​L1: ​​​​ add $t1, $s0, $a1 # add of y[i] in $t1 ​​​​ lbu $t2, 0($t1) # $t2 = y[i] ​​​​ add $t3, $s0, $a0 #addr of x[i] in $t3 ​​​​ sb $t2, 0($t3) # x[i] = y[i] ​​​​ beq $t2, $zero, L2 # exit loop if y[i] == 0 ​​​​ addi $s0, $s0, 1 # i = i +1 ​​​​ j L1 # next iteration of loop ​​​​L2: ​​​​ lw $s0, 0($sp) # restore saved $s0 ​​​​ addi $sp, $sp, 4 # pop 1 item from stack ​​​​ jr $ra # and return

Communicating with People

2.9 MIPS Addressing for 32-Bit Immediates and Addresses

32bit Constant

  • Most constant are small 多數的常數都很小
  • 16-bit immediate us sufficient 通常都只有16個bits
  • For an occasional 32-bit constant
    lui rt, constant
    • Copy 16-bit constant to left 16 bits or rt
    • Clear right 16 bits of rt to 0

Branch Addressing

  • Branch instructions specify
    • Opcode, two registers, target address
  • Most branch targets are near branch.
    • Forward or backward
  • PC-relative addressing (PC (program counter)相對定址)
    • Target address = PC + offset * 4 (MIPS一個指令4個 byte)
    • PC is already incremented by 4 by this time

Jump Addressing

  • Jump (j and jal) targets could be anywere in text segment
    • Encode full address in an instruction
      • 在 MIPS 中 op 都是 6 bits
  • (Pseudo) Direct jump addressing
    • Target address = PC3128:(address*4)
      4個跟PC借 接下來26個bits 剩下直接x4
Target Addressing Example
  • Loop code from the earlier example
    • Assume Loop at location 80000

Branch Far Away

  • If a branch target is too far to encode with 16-bit offset assembler rewrites the code
  • 當要跳過去的目標位置太遠無法用 16 bit 表達,可以改寫指令使存放 address 的 bit 更大
    • e.g. j and jal 有 26 bits
  • Example
    ​​​​beq $s0, $s1, L1 # 16 bits ​​​​bne $s0, $s1, L2 # 16 bits ​​​​j L1 # 26 bits ​​​​L2: ...po

Addressing Mode Summary

2.10 Parallelism and Instructions: Synchronization

Synchronization 同步

  • Two processors share an area of memory
    • P1 writes, then P2 reads
    • Data race if P1 and P2 don't synchronize(race 資料不同)
      • Result depends of order of accesses
  • Hardware support is required.
    • Atomic read/write memory operation
    • No other access to a location is allowed between read and write.
      • 一段 atomic operation 間不能被其他指令中斷
  • Could be a single instruction
    • E.g., atomic swap of register ↔ memory
      提供資料對換
  • Or an atomic pair of instructions

Synchronization in MIPS

  • Load linked: ll rt, offset(rs)
  • Store conditional: sc rt, offset(rs)
    • Succeed if the location is not changed since the ll
      • Returns 1 in rt
    • Fail if the location is changed
      • Returns 0 in rt
    • 檢查 rt 的值在 llsc 間是否被更改過
  • Example: atomic swap $s4 with M[$s1] (to test/set lock variable)
    ​​​​try:add $t0,$zero,$s4 #copy exchange value ​​​​ ll $t1,0($s1) #load linked ​​​​ sc $t0,0($s1) #store conditional ​​​​ beq $t0,$zero,try #branch store fails ​​​​ add $s4,$zero,$t1 #put load value in $s4

2.11 Translation and Starting a Program

Translation and Startup

Assembler Pseudoinstructions

  • Most assembler instructions represent machine instructions one-to-one.
  • Pseudoinstructions: figments of the assembler’s imagination
    ​​​​move $t0, $t1 ​​​​# add $t0, $zero, $t1
    ​​​​blt $t0, $t1, L ​​​​# slt $at, $t0, $t1 ​​​​# if( $t0 < $t1 ) $at = 1 ​​​​# bne $at, $zero, L
    • $at (register 1): assembler temporary

Produce Object Module

  • An assembler (or compiler) translates a program into machine instructions.
  • Provide information for building a complete program
    • Header: describe contents of object module
      描述檔案中有哪些內容
    • Text segment: translated instructions
      機器語言碼
    • Static data segment: data allocated for the life of the program
      整個程式都需要的資料(全域變數
    • Relocation information: for contents that depend on absolute location of loaded program
      要根據絕對記憶體位置運行的數據的位置
    • Symbol table: undefined labels, e.g., global definitions and external references
      未定義的標籤(不包含在該檔案中的東西)
    • Debug information: for associating with source code
      紀錄給方便除錯的東西
  • Produce an executable image
    產生一個可執行的映象檔
    • Merge segments
      把不同的區塊結合在一起
    • Resolve labels (determine their address)
      把在 library 中的 label 找出來
    • Patch location-dependent and external references
  • Could leave location dependencies for fixing by a relocating loader
    • But with virtual memory, no need to do this
    • Program can be loaded into absolute location in virtual memory space.

Load a Program

  • Load from an image file on a disk into memory
    • Read the header to determine segment sizes
    • Create a virtual address space
    • Copy the text and initialized data into memory
      • Or set page table entries so they can be faulted in
    • Set up arguments on stack
    • Initialize registers (including $sp, $fp, $gp)
    • Jump to startup routine
    • Copy arguments to $a0, … and call the main routine
    • When the main routine returns, do exit() system call

Dynamic Linking

  • Only link/load a library procedure when it is called
    • Require the procedure code to be relocatable
    • Avoid image bloat caused by static linking of all (transitively) referenced libraries
    • Automatically pick up new library versions

Lazy Linkage

Start Java Applications

  • 跨平台
    • 編成 JVM

h3標題

2.12 A C sort Example to Put It All Together

C Swap Example

  • Illustrate use of assembly instructions for a C bubble sort function
  • Swap procedure
void swap(int v[], int k) { int temp; temp = v[k]; v[k] = v[k+1] v[k+1] = temp; }
  • v in $a0, k in $a1, temp in $t0

Procedure Swap

swap: sll $t1,$a1,2 #$t1=k*4 add $t1,$a0,$t1 #$t1=v+(k*4) lw $t0, 0($t1) # (address of v[k]) lw $t2, 4($t1) #$t0(temp)=v[k] sw $t2, 0($t1) #v[k] = $t2(v[k+1]) sw $t0, 4($t1) #v[k+1 = $t0] (temp) jr $ra #return to calling routine

C Procedure Sort

  • Non-leaf(calls swap)
// v in $a0, n in $a1 void sort( int v[], int n) { // i in $s0, j in $s1 int i, j; for(i = 0; i < n; i += 1) { for( j = i - 1; j <= 0 && v[j] > v[j+1]; j -= 1) { // swap 為自定義函式 // 將 v[j] 跟 v[j+1] 交換 swap(v,j); } } }

Procedure Body

move $s2, $a0 # save $a0 into $s2 move $s3, $a1 # save $a1 into $s3 move $s0, $zero # i = 0 for1tst: slt $t0, $s0, $s3 # $t0 = 0 if $s0 $s3 (i n) beq $t0, $zero, exit1 # go to exit1 if $s0 $s3 (i n) addi $s1, $s0, –1 # j = i – 1 for2tst: slti $t0, $s1, 0 # $t0 = 1 if $s1 < 0 (j < 0) bne $t0, $zero, exit2 # go to exit2 if $s1 < 0 (j < 0) sll $t1, $s1, 2 # $t1 = j * 4 add $t2, $s2, $t1 # $t2 = v + (j * 4) lw $t3, 0($t2) # $t3 = v[j] lw $t4, 4($t2) # $t4 = v[j + 1] slt $t0, $t4, $t3 # $t0 = 0 if $t4 $t3 beq $t0, $zero, exit2 # go to exit2 if $t4 $t3 move $a0, $s2 # 1st param of swap is v (old $a0) move $a1, $s1 # 2nd param of swap is j jal swap # call swap procedure addi $s1, $s1, –1 # j –= 1 j for2tst # jump to test of inner loop exit2: addi $s0, $s0, 1 # i += 1 j for1tst # jump to test of outer loop exit1:

Full Procedure

sort: addi $sp, $sp, -20 #make room on stack for 5 registers sw $ra, 16($sp) #save $ra on stack sw $s3, 12($sp) #save $s3 on stack sw $s2, 8($sp) #save $s2 on stack sw $s1, 4($sp) #save $s1 on stack sw $s0, 0($sp) #save $s3 on stack ... #procedure body in the previous exit1: lw $s0, 0($sp) #restore $s0 from stack lw $s1, 4($sp) #restore $s1 from stack lw $s2, 8($sp) #restore $s2 from stack lw $s3, 12($sp) #restore $s3 from stack lw $s0, 0($sp) #restore $s0 from stack addi $sp, $sp, 20 #restore stack pointer jr $ra #return to calling routine

Lesson Learnt

  • Instruction count and CPI are not good performance indicators in isolation.
    • 單看指令數跟 CPI 無法評斷效能好壞
  • Compiler optimizations are sensitive to algorithms.
  • Java/JIT compiled code is significantly faster than JVM interpreted.
    • Comparable to optimized C in some cases
  • Nothing can fix a dumb algorithm!
    • 沒有東西可以拯救智障的演算法

Arrays versus Pointers

  • Array indexing involves …
    • Multiply index by element size
    • Add to array base address
  • Pointers correspond directly to memory addresses
    • Can avoid indexing complexity

Real Stuff: ARM Instructions, x86 Instructions, ARM v8(64-bit) Instructions

Fallacies and Pitfalls