Chapter 3 Machine-Level Representation of Programs

3.1 History

3.2 Program Encoding

unix> gcc -O1 -o p p1.c p2.c

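-O1: optimization level 1
Compilation flow:

  1. The C preprocessor expands the source code (#include) and macros (#define).
  2. Compiler -> generates the assembly code p1.s p2.s
  3. Assembler -> converts the assembly code to machine code p1.o p2.o

Can the program run at this point?
By this step every instruction has been translated into machine code, but the addresses of global variables have not yet been determined.

  4. The linker combines all the object code and produces the executable p
    Strictly speaking, this is the code the system actually executes. Linking -> Chapter 7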

3.2.1 Machine-Level Code

The ISA (instruction set architecture) defines the processor state, the format of the instructions, and the effect each instruction has on that state.

The memory addresses used by a machine-level program are virtual addresses -> Chapter 9

Several registers deserve particular attention in machine-level programs:

  1. PC (program counter), "%eip" in IA32
    the address in memory of the next instruction to be executed.
  2. integer register file
    Two uses: some registers keep track of critical parts of the program state, others hold temporary data such as local variables.
  3. condition code registers
    Conditionals such as if and while are implemented by jumping based on these flags.
  4. floating-point registers

Program memory includes

  1. executable machine code
  2. some information required by the OS
  3. the run-time stack
  4. the heap

3.2.2 Code Examples

gcc -O1 -S code.c
cat sum.s             
	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15	sdk_version 10, 15, 6
	.globl	_sum                    ## -- Begin function sum
	.p2align	4, 0x90
_sum:                                   ## @sum
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	movl	%edi, %eax
	addl	%esi, %eax
	addl	%eax, _accum(%rip)
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.globl	_accum                  ## @accum
.zerofill __DATA,__common,_accum,4,2
.subsections_via_symbols
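
The C source is not shown in these notes. A file along the following lines would produce assembly like the listing above; it is an assumed reconstruction based only on the symbol names `_sum` and `_accum`, and the comments map each statement to the instructions in the listing.

```c
/* Assumed source: a global accumulator and a function that adds to it. */
int accum = 0;

int sum(int x, int y) {
    int t = x + y;   /* movl %edi, %eax ; addl %esi, %eax */
    accum += t;      /* addl %eax, _accum(%rip) */
    return t;        /* the return value is left in %eax */
}
```

Note that the listing is x86-64 output on macOS, so the arguments arrive in %edi and %esi instead of on the stack as they would on IA32.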

3.2.3 Notes on Formatting

.file  "simple.c"

Lines beginning with . are directives that guide the assembler and linker.

ATT (AT&T) vs. Intel format

3.3 Data Formats

- Depends on the architecture

3.4 Accessing information

  • The first 6 registers are treated as general-purpose registers, although some instructions use specific ones for dedicated purposes

    Section 3.7 discusses how (%eax %ecx %edx) and (%ebx %edi %esi) differ in their save/restore conventions

  • %esp %ebp -> program stack

3.5 Arithmetic and Logical Operations

3.5.1 lea

3.5.2 Unary and Binary Operations

3.5.3 shift

logical vs arithmetic
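
The difference between the two right shifts can be sketched in C. This is a minimal illustration, assuming 32-bit values and a shift amount below 32; the function names are made up for this note. The arithmetic version is built from unsigned operations because `>>` on a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Logical right shift (shr): vacated high bits are filled with zeros. */
uint32_t shift_logical(uint32_t x, unsigned k) {
    return x >> k;                      /* assumes k < 32 */
}

/* Arithmetic right shift (sar): vacated high bits copy the sign bit. */
uint32_t shift_arithmetic(uint32_t x, unsigned k) {
    uint32_t logical = x >> k;          /* assumes k < 32 */
    /* Mask with ones in the top k bits, used only when the sign bit is set. */
    uint32_t sign_fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> k) : 0u;
    return logical | sign_fill;
}
```

Left shifts have no such distinction: sal and shl behave identically.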

3.5.5 Special Arithmetic Operations

3.6 Control

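Conditional behavior: loops, switch statements, and conditional execution are all implemented with jump instructions.

3.6.1 Condition Codes

The CPU maintains a set of single-bit condition codes.

CF: Carry Flag - the most recent operation generated a carry out of the most significant bit
ZF: Zero Flag - the most recent operation yielded zero
SF: Sign Flag - the most recent operation yielded a negative value
OF: Overflow Flag - the most recent operation caused a two's-complement overflow

  • lea only computes an address, so it does not change the condition codes
  • logical operations such as xor: CF = OF = 0
  • shifts: CF = the last bit shifted out, OF = 0
  • INC and DEC set OF and ZF but leave CF unchanged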
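
As a rough model of the four flags, the sketch below computes what a 32-bit add would set. The `struct flags` type and the `add_flags` name are hypothetical, introduced only for illustration; real hardware sets these bits as a side effect of the instruction.

```c
#include <stdint.h>

struct flags { int cf, zf, sf, of; };

/* Flags that a 32-bit add (t = a + b) would produce. */
struct flags add_flags(uint32_t a, uint32_t b) {
    uint32_t t = a + b;                 /* wraps modulo 2^32 */
    struct flags f;
    f.cf = t < a;                       /* unsigned overflow: carry out of bit 31 */
    f.zf = (t == 0);                    /* result is zero */
    f.sf = (t >> 31) & 1;               /* sign bit of the result */
    /* Signed overflow: operands share a sign and the result's sign differs. */
    f.of = ((~(a ^ b) & (a ^ t)) >> 31) & 1;
    return f;
}
```

For example, adding 1 to 0xFFFFFFFF sets CF and ZF but not OF, while adding 1 to 0x7FFFFFFF sets SF and OF but not CF.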

3.6.2 Accessing the Condition Codes

CMP

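  • Like SUB, but it does not modify the destination; it only sets the condition codes
  • In ATT format the operands are listed in reverse order: cmp S2, S1 sets the flags from S1 - S2

TEST

  • Same idea, but behaves like AND

SET

  • Sets a single byte to 0 or 1 according to the condition named by the instruction suffix, the same conditions used by the corresponding jump instructions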
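
A small comparison function shows how CMP and SET work together. The assembly in the comment is a typical IA32 translation, not output captured for these notes, and the stack offsets assume the usual %ebp-based frame.

```c
/* Typical IA32 translation (a at 8(%ebp), b at 12(%ebp)):
 *   movl   8(%ebp), %eax      # load a
 *   cmpl   12(%ebp), %eax     # set flags from a - b (ATT order: cmpl b, a)
 *   setl   %al                # %al = 1 if a < b (signed), else 0
 *   movzbl %al, %eax          # zero-extend the byte to 32 bits
 */
int less_than(int a, int b) {
    return a < b;
}
```

setl combines SF and OF (SF ^ OF), which is why the comparison stays correct even when a - b overflows.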

3.6.3 Jump Instructions and Their Encodings

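Indirect jump:

jmp *(%eax)

Chapter 7 covers how jump targets are handled during linking.

  1. most common: PC-relative -> offset (1, 2, or 4 bytes)

  2. absolute address (4 bytes)

  • jmp XXXX -> the offset is counted from the address of the next instruction (the next PC)
  • with PC-relative encoding the target is expressed as a relative position
  • absolute addresses can be computed at link time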
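
The PC-relative rule can be written out as a small calculation. This sketch assumes a jump with a 1-byte signed offset; the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* insn_addr: address of the jump instruction
 * insn_len:  its length in bytes (opcode plus offset field)
 * rel8:      the raw 1-byte offset field from the encoding */
uint32_t jump_target(uint32_t insn_addr, uint32_t insn_len, uint8_t rel8) {
    uint32_t next_pc = insn_addr + insn_len;   /* PC already points past the jump */
    /* Sign-extend the offset byte by hand. */
    int32_t offset = (rel8 & 0x80) ? (int32_t)rel8 - 256 : (int32_t)rel8;
    return next_pc + (uint32_t)offset;         /* wraps correctly for negative offsets */
}
```

For instance, a 2-byte jump at address 0x8 with offset byte 0x0d targets 0xa + 0xd = 0x17, and one at 0x15 with offset byte 0xf5 (that is, -11) targets 0x17 - 0xb = 0xc.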

3.6.4 Translating Conditional Branches

  • goto version
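
A minimal sketch of what the goto version looks like, using an absolute-difference function as the example (the function names are illustrative). The goto form mirrors the control flow of the generated assembly: one conditional jump over the first branch, one unconditional jump to the end.

```c
/* Ordinary if-else form. */
int absdiff(int x, int y) {
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* The same logic in goto form. */
int gotodiff(int x, int y) {
    int result;
    if (x >= y)
        goto x_ge_y;        /* conditional jump, like jge */
    result = y - x;
    goto done;              /* unconditional jmp */
x_ge_y:
    result = x - y;
done:
    return result;
}
```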

3.6.5 Loops

while vs do-while

  • gcc first transforms while and for loops into do-while form
    Why? It makes the initial test easy to optimize.

for(init;test-expr;update-expr)

  • continue
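
The while-to-do-while transformation can be sketched with a factorial loop. This is an illustration of the general pattern (a guard test followed by a do-while body), not the exact code gcc emits; the function names are made up.

```c
/* Original while loop. */
int fact_while(int n) {
    int result = 1;
    while (n > 1) {
        result *= n;
        n = n - 1;
    }
    return result;
}

/* Guarded do-while, written in goto form. */
int fact_while_goto(int n) {
    int result = 1;
    if (!(n > 1))           /* initial test, performed once */
        goto done;
loop:
    result *= n;            /* loop body */
    n = n - 1;
    if (n > 1)              /* test at the bottom, as in do-while */
        goto loop;
done:
    return result;
}
```

When the compiler can prove the initial test always succeeds, it can drop the guard entirely. A for loop follows the same pattern once init is hoisted above the guard and update is appended to the body; a continue inside a for loop must still run update, which is why it cannot simply jump to the test.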

3.6.6 Conditional Move Instruction

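As seen above, conditional operations are traditionally implemented through control: execution jumps to a designated location and runs the instructions there.

An often more efficient alternative works through data: both branches are computed and only the result of the correct one is kept. This is implemented on more recent IA32 processors.

Conditional move instructions used to be rare in practice because gcc did not generate them (for backward compatibility). Depending on the condition codes, a conditional move either does nothing (like a nop) or copies the source value into the destination register.

gcc can be told explicitly to use conditional move instructions:
gcc -march=i686

IA32 does not support single-byte conditional move instructions.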
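
The data-based approach can be mimicked in C by computing both outcomes first and selecting afterwards. This is a sketch of the idea with illustrative names; whether the compiler actually emits a cmov for it depends on the target and the flags.

```c
/* Control-based version: one of two branches is executed. */
int absdiff_branch(int x, int y) {
    return x < y ? y - x : x - y;
}

/* Data-based version: both results are computed, then one is selected. */
int absdiff_cmov(int x, int y) {
    int tval = y - x;       /* result if x < y */
    int rval = x - y;       /* result otherwise */
    int test = x < y;
    if (test)
        rval = tval;        /* a natural candidate for cmovl */
    return rval;
}
```

This only works when both branches are safe to evaluate. An expression such as `p ? *p : 0` cannot be compiled this way, because dereferencing a null p would fault.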

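3.6.7 Switch Statements

  • jump table: an array indexed by i whose entries are code addresses

i     address
0x20  0xff858545

gcc provides an operator, &&, that yields a pointer to a code location (a label).

jmp    *.L7(,%eax,4)    corresponds to    goto *jmptable[i]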
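
The jump table can be imitated in C with GCC's labels-as-values extension (`&&label` and `goto *ptr`). The sketch below is a made-up switch with cases 0, 2, and 3; it is not standard C and assumes gcc or a compatible compiler such as clang.

```c
int switch_demo(int x, int n) {
    /* One code address per case value; the missing case 1 maps to default. */
    static void *jt[4] = { &&case_0, &&case_default, &&case_2, &&case_3 };
    int result;

    if ((unsigned)n > 3)        /* one unsigned compare rejects n < 0 and n > 3 */
        goto case_default;
    goto *jt[n];                /* indirect jump through the table */

case_0:
    result = x * 2;
    goto done;
case_2:
    result = x + 10;
    goto done;
case_3:
    result = x - 1;
    goto done;
case_default:
    result = 0;
done:
    return result;
}
```

The cost of the dispatch is independent of the number of cases, which is the advantage over a chain of comparisons.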

3.7 Procedures

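Most hardware only provides the transfer of control from one part of a program to another; passing arguments and allocating local variables are implemented through the program stack.

3.7.1 Stack Frame Structure

%ebp -> frame pointer; most stack accesses are made relative to it
%esp -> stack pointer; it moves dynamically as the program runs

3.7.2 Transferring Control

Further reading -> https://xz.aliyun.com/t/2554

3.7.3 Register Usage Convention

caller-saved / callee-saved registers
callee-saved: %ebx %esi %edi
This means that when P calls Q, Q must save these registers before overwriting them and restore them before it returns with ret.

They are saved by pushing them onto the stack (through %esp).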

3.7.4 Procedure Example

3.7.5 Recursive Procedure

Recursion -> push the callee-saved registers

3.8 Array Allocation and Access

3.8.1 Basic Principles

3.8.2 Pointer Arithmetic

  • pointer arithmetic

3.8.3 Nested Arrays

  • define N XXX

3.8.4 Fixed-Size Arrays

3.8.5 Variable-Size Array

-> malloc/ calloc
int n;
array[n][n]
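
With C99 variable-length array parameters, an n x n array can be passed and indexed directly. The sketch below is illustrative (the function names are assumptions) and shows the same element access twice: once through the array type, once with the index arithmetic written by hand.

```c
/* n must be declared before A so that the type int A[n][n] can refer to it. */
int var_ele(int n, int A[n][n], int i, int j) {
    return A[i][j];         /* element address: A + sizeof(int) * (n*i + j) */
}

/* The same access with row-major indexing spelled out. */
int var_ele_flat(int n, const int *base, int i, int j) {
    return base[n * i + j];
}
```

Because n is only known at run time, the compiler has to use a multiply to scale i by n, instead of the shifts and leal tricks available when the dimension is a compile-time constant.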

3.9 Heterogeneous Data Structures

3.9.1 Structures

3.9.2 Unions

Endianness issues
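
A union makes the byte order visible, because all members share the same storage. This is a minimal sketch with hypothetical names; it distinguishes only little-endian from everything else.

```c
#include <stdint.h>

union word_bytes {
    uint32_t word;
    uint8_t  bytes[4];      /* overlays the same 4 bytes as word */
};

/* Returns 1 if the least significant byte is stored first, 0 otherwise. */
int is_little_endian(void) {
    union word_bytes u;
    u.word = 0x01020304u;
    return u.bytes[0] == 0x04;
}
```

On IA32 and x86-64 this returns 1. Code that reinterprets bytes through a union therefore gives different results on machines with a different byte order.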

3.9.3 alignment
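
Alignment shows up as padding inside structures. The two structs below are made-up examples with the same fields in a different order; the exact sizes depend on the platform's alignment rules, so the comments describe the common case of a 4-byte, 4-byte-aligned int.

```c
#include <stddef.h>

/* char, int, char: padding is typically inserted after c (so that i is
 * aligned) and after d (so that arrays of s1 keep i aligned). */
struct s1 { char c; int i; char d; };

/* Same fields, largest first: less padding is typically needed. */
struct s2 { int i; char c; char d; };

/* Offset of i within s1; commonly 4 rather than 1. */
size_t offset_of_i(void) {
    return offsetof(struct s1, i);
}
```

On such a platform sizeof(struct s1) is commonly 12 while sizeof(struct s2) is 8, so ordering fields from largest to smallest is a cheap way to save space.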

3.10 Putting it Together: Understanding Pointers

  1. every pointer has an associated type
  2. pointers are created with '&'
  3. pointers are dereferenced with '*'
  4. arrays and pointers are closely related: an array name can be used as a pointer
  5. casting from one type of pointer to another changes its type but not its value
  6. pointers can also point to functions
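
Several of the points above fit in a few lines of C. The function names are illustrative only.

```c
int twice(int x) { return 2 * x; }

/* Point 6: fp holds the address of a function and is called through it. */
int apply(int (*fp)(int), int x) {
    return fp(x);
}

/* Point 5: a cast changes the pointer's type, not the address it holds. */
int cast_keeps_address(void) {
    int v = 0x11223344;
    int *ip = &v;                   /* point 2: created with & */
    char *cp = (char *)ip;
    return (void *)cp == (void *)ip;
}

/* Point 4: an array name decays to a pointer to its first element. */
int second_element(void) {
    int a[3] = {10, 20, 30};
    int *p = a;
    return *(p + 1);                /* p + 1 advances by sizeof(int) bytes */
}
```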

3.11 Life in the Real World : Using GDB Debugger

-O1 vs -O2

    1. the control structures become more entangled
    2. procedure calls are often inlined
    3. recursion is often replaced by iteration

3.12 Out-of-Bounds Memory References and Buffer Overflow

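security monoculture -> address space layout randomization (ASLR) -> attackers respond with a nop sled in the buffer overflow

Stack Corruption Detection -> canary
gcc -fno-stack-protector disables it
gcc stack protector -> a guard value (the canary) is stored in extra space between the local buffers and the rest of the stack state
Limiting which portions of memory can hold executable code

Embedding assembly in C:

  1. write the assembly in a separate file and combine it through the linker
  2. asm()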
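
A minimal sketch of the second option, using GCC's extended asm syntax. It assumes gcc or a compatible compiler targeting IA32 or x86-64, and falls back to plain C elsewhere; the function name is made up.

```c
int add_asm(int a, int b) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"       /* ATT order: %0 = %0 + %1 */
            : "+r"(a)           /* %0: read-write register operand */
            : "r"(b)            /* %1: input register operand */
            : "cc");            /* addl modifies the condition codes */
    return a;
#else
    return a + b;               /* fallback for other compilers or targets */
#endif
}
```

The operand constraints tell the compiler which registers the fragment reads and writes, so it can still allocate registers around the inserted instruction.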

3.13 x86-64 : Extending IA32 to 64 bits

IA32 -> IA64
IA64 packs instructions into VLIW bundles -> higher degree of parallel execution

3.13.1 History

3.13.2 An Overview of x86-64

  1. Pointers and long ints are 64 bits long; integer arithmetic operations support 8, 16, 32, and 64 bits
  2. The set of general-purpose registers grows from 8 to 16
  3. Up to 6 arguments can be passed in registers -> no need to read them from the stack
  4. Conditional operations tend to use conditional move instructions -> better performance
  5. Floating-point operations are implemented using the register-oriented instruction set introduced with SSE2 rather than the stack-based approach supported by IA32.
  • long double

3.13.3 Control

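  • rep (as in rep; ret) -> avoids a branch-prediction problem when ret is reached from a jump > AMD guideline
    Compiling with gcc -m64 gives results that differ from the earlier IA32 code:
  1. movl, addl -> movq, addq
  2. %esi -> %rsi
  3. No stack frame is generated in the x86-64 version
  4. Arguments are passed in registers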
  4. Argument passed in registers

3.13.3 Accessing Information

Summary:

  1. %rsp: holds the pointer to the top stack element
  2. no frame pointer -> %rbp is used as a general-purpose register
  3. different procedure call conventions -> 3.13.4
  4. some arithmetic instructions make special use of %rax and %rdx

3.13.4 Control

Similar to IA32.

Procedures:
Because many registers were added, procedures no longer need to follow the stack-based caller/callee conventions defined for IA32.

x86-64 procedures:

  1. Up to 6 arguments can be passed in registers
  2. The callq instruction stores a 64-bit return address on the stack
  3. Many functions do not need a stack frame. Because of the extra registers, space is allocated on the stack only when there are too many local variables to fit in registers.
  4. A function can access storage on the stack up to 128 bytes beyond the current value of the stack pointer. This allows some functions to store information on the stack without altering the stack pointer. (?)
  5. No frame pointer. References to stack locations are made relative to the stack pointer. Most functions allocate their total stack storage needs at the beginning of the call and keep the stack pointer at a fixed position.
  6. As with IA32, some registers are designated callee-saved and must be saved and restored when needed.

Argument passing

Stack frames
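A leaf procedure rarely needs to push anything onto the stack; usually the only thing stored there is the return address.

In other words, stack storage is needed only in the following cases:

  1. there are too many local variables to fit in registers
  2. a local variable is an array or a structure
  3. & is used to take the address of a local variable
  4. arguments must be passed on the stack to another function (as mentioned above)
  5. callee-saved registers must be saved

On x86-64, procedures manipulate the stack less often than on IA32. An x86-64 stack frame usually has a fixed size, set at the start of the procedure, and the stack pointer then stays in place. Stack locations can therefore be addressed relative to %rsp, and a separate frame pointer (%rbp) is not needed.

When the caller calls the callee, the return address is first pushed onto the stack. When the callee returns to the caller, that value is popped off again. From the caller's point of view, the stack pointer offset is therefore unchanged.

Register Saving Convention
x86-64 callee-saved registers:
%rbx %rbp %r12 - %r15

The x86-64 ABI specifies that programs can use the 128 bytes beyond the current stack pointer. The ABI refers to this area as the red zone.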

3.13.5 Data Structure

x86-64 follows a more stringent set of alignment requirements.

3.13.6 Concluding Observations about x86-64

3.14 Machine-Level Representations of Floating-Point Programs

3.15 Summary

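just-in-time compilation -> byte code -> can run on many kinds of platforms


Afterword

As the teacher said, Chapter 3 really is the bottleneck. I spent a lot of time reading it and working through the exercises, and I now have a clear picture of how coroutines differ between the x86-64 and i386 architectures, which I did not understand before. This chapter is well worth studying closely. That said, the coverage here is fairly superficial, and many details still need further exploration.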