Assembly Language in UNIX
Compiler GCC
- -m32: generate x86 (32-bit) object code
- -masm=intel: emit Intel syntax instead of the default AT&T syntax
- -fno-stack-protector: disable the stack protector, which makes the stack frame easier to understand
Assembler YASM
A computer program which translates assembly language to machine language
- yasm -f elf32: output x86 object code
- yasm -f elf64: output x86_64 object code
Linker ld
A linker takes one or more object files generated by a compiler or an assembler and combines them into a single executable file, library file, or another 'object' file.
- ld -m elf_i386: link x86 object code
- ld -m elf_x86_64: link x86_64 object code
Debugger gdb
with the gdb_peda plugin
- Installation:
ASM Characteristics
- Low-level language
- One-to-one mapping from mnemonics to machine code
- Assembler: translates assembly code into machine code
- Machine- and platform-dependent
  - Different machines/platforms have different assemblers
  - Even assemblers on the same machine/platform can differ
Instruction Execution Cycle
- Fetch: Read instruction code from address in PC and place in IR. ( IR ← Memory[PC] )
- Decode: Hardware determines what the opcode/function is.
- Fetch operands (from memory if necessary): If any operand is a memory address (known as the effective address, or EA for short), initiate memory read cycles to read the operand into CPU registers.
- Execute: Perform the function of the instruction. If arithmetic or logic instruction, utilize the ALU circuits to carry out the operation on data in registers.
- Store output (in memory if necessary): If destination is a memory address, initiate a memory write cycle to transfer the result from the CPU to memory.
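The five steps above can be sketched as a toy interpreter in C. This is only an illustration: the four-opcode machine, the `Machine` struct, and the `run` and `demo` functions are hypothetical and do not correspond to any real instruction set.

```c
#include <stdint.h>

#define MEM_SIZE 32

/* Hypothetical opcodes, invented for this sketch only. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADDM = 2, OP_STORE = 3 };

typedef struct {
    uint8_t pc;             /* program counter */
    uint8_t ir;             /* instruction register */
    uint8_t acc;            /* single data register */
    uint8_t mem[MEM_SIZE];  /* code and data share one memory */
} Machine;

/* Runs until OP_HALT and returns the number of instructions executed. */
int run(Machine *m)
{
    int steps = 0;
    while (steps < 1000) {                               /* safety limit for this sketch */
        m->ir = m->mem[m->pc++ % MEM_SIZE];              /* fetch: IR <- Memory[PC] */
        uint8_t opcode = m->ir;                          /* decode */
        if (opcode == OP_HALT)
            return steps;
        uint8_t arg = m->mem[m->pc++ % MEM_SIZE];        /* operand byte follows the opcode */
        switch (opcode) {
        case OP_LOADI:                                   /* execute with an immediate operand */
            m->acc = arg;
            break;
        case OP_ADDM:                                    /* fetch operand from memory, then add */
            m->acc = (uint8_t)(m->acc + m->mem[arg % MEM_SIZE]);
            break;
        case OP_STORE:                                   /* store output to memory */
            m->mem[arg % MEM_SIZE] = m->acc;
            break;
        default:
            return -1;                                   /* unknown opcode */
        }
        steps++;
    }
    return steps;
}

/* Loads a tiny program that computes 5 + mem[16] and stores the sum at mem[17]. */
uint8_t demo(uint8_t value_at_16)
{
    Machine m = {0};
    const uint8_t program[] = { OP_LOADI, 5, OP_ADDM, 16, OP_STORE, 17, OP_HALT };
    for (unsigned i = 0; i < sizeof program; i++)
        m.mem[i] = program[i];
    m.mem[16] = value_at_16;
    run(&m);
    return m.mem[17];
}
```

Calling `demo(7)` walks through three fetch/decode/execute rounds before reaching the halt opcode; the arithmetic wraps at 8 bits because the toy register is one byte wide.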
Basic Execution Environment
Addressable Memory
- Long mode (x86_64)
  - Long mode allows the microprocessor to access a 64-bit memory space and 64-bit registers.
  - When a computer is powered on, the CPU starts in real mode and begins booting. A 64-bit operating system then checks for long-mode support, switches the CPU into long mode, and starts new kernel-mode threads running 64-bit code.
  - 256 TB address space
  - 48-bit virtual address
- Protected mode (x86)
  - Protected mode may only be entered after the system software sets up one descriptor table and enables the Protection Enable (PE) bit in control register 0 (CR0).
  - 4 GB address space
  - 32-bit virtual address
- Real-address and Virtual-8086 modes
  - Real mode is characterized by a 20-bit segmented memory address space (giving exactly 1 MiB of addressable memory) and unlimited direct software access to all addressable memory, I/O addresses, and peripheral hardware.
  - 1 MB address space
  - 20-bit address
  - Addresses are formed from a segment and an offset
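As a rough illustration of the numbers above, the following C sketch computes the size of an address space from its width and forms a real-mode address from a segment and an offset. The function names are hypothetical, and wrapping the result to 20 bits is an assumption that matches the original 8086 (later CPUs can carry into bit 20).

```c
#include <stdint.h>

/* Number of addressable bytes for a given address width (bits must be below 64). */
uint64_t address_space_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* Real-address mode: linear address = segment * 16 + offset.
   The 20-bit mask is an assumption (8086-style wraparound). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Because the segment is shifted by only four bits, different segment:offset pairs can name the same byte; for example 1000h:0010h and 1001h:0000h both resolve to 10010h.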
General Purpose Registers/Access Parts of Registers

Index and Base Registers
- An index register is used for modifying operand addresses during the run of a program, typically for doing vector/array operations.
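- The same register goes by different names depending on which part of it is accessed.
- Writing AH, AL, or AX does not affect the rest of RAX. Writing EAX does: the upper 32 bits of RAX are filled with zeros.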
| 64-bit name | 32-bit name | 16-bit name |
| --- | --- | --- |
| RDI | EDI | DI |
| RSI | ESI | SI |
| RBP | EBP | BP |
| RSP | ESP | SP |
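
The partial-register behaviour described above (8- and 16-bit writes leave the rest of the register alone, 32-bit writes zero the upper half) can be modelled in C on a plain 64-bit value. This is a sketch of the semantics only; the `write_*` helpers are hypothetical names.

```c
#include <stdint.h>

/* Each helper returns the new value of RAX after writing one of its parts. */

uint64_t write_al(uint64_t rax, uint8_t value)
{
    return (rax & ~0xFFULL) | value;                       /* bits 0-7 replaced */
}

uint64_t write_ah(uint64_t rax, uint8_t value)
{
    return (rax & ~0xFF00ULL) | ((uint64_t)value << 8);    /* bits 8-15 replaced */
}

uint64_t write_ax(uint64_t rax, uint16_t value)
{
    return (rax & ~0xFFFFULL) | value;                     /* bits 0-15 replaced */
}

uint64_t write_eax(uint64_t rax, uint32_t value)
{
    (void)rax;                                             /* old contents are discarded */
    return (uint64_t)value;                                /* upper 32 bits become zero */
}
```

Starting from RAX = 1122334455667788h, writing AAh to AL gives 11223344556677AAh, while writing DEADBEEFh to EAX gives 00000000DEADBEEFh.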
Basic Element
Integer Constants
- Optional leading + or - sign
- Radix characters:
  - h - hexadecimal
  - d - decimal
  - b - binary
  - r - encoded real (?)
Integer Expressions
Character and String Constants
- Enclose a character in single or double quotes
  - 'A', "x"
  - An ASCII character occupies 1 byte
- Enclose strings in single or double quotes
  - "ABC", 'xyx'
  - Each character occupies a single byte
- Embedded quotes
Reserved Words & Identifiers
- Reserved words cannot be used as identifiers
- Identifiers
  - not case sensitive
  - first character must be a letter, _, @, ?, or $
Directives (assembler directives)
- Commands that are recognized and acted upon by the assembler
- Not part of the Intel instruction set
- Used to declare code, data areas, procedures, and so on
- Different assemblers have different directives
Instructions
- Assembled into machine code by assembler
- Executed at runtime by the CPU
- An instruction contains:
  - Label (optional)
  - Mnemonic (mandatory)
  - Operands (depending on the instruction)
  - Comment (optional)
Labels
- Act as place markers
- Data label
  - must be unique
  - example: myArray (not followed by a colon)
- Code label
  - target of jump and loop instructions
  - example: L1: (followed by a colon)
Mnemonics and Operands
- Instruction mnemonics
- Operands
  - constant
  - constant expression
  - register
  - memory (data label)
The Assemble-Link-Execute Cycle

- Create an ASCII text file (source file).
- The assembler reads the source file and produces an object file.
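- The linker reads the object file and checks whether the program calls any procedures in a link library. It copies any required procedures from the library, combines them with the object file, and produces the executable file.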
- The operating system loader utility reads the executable file into memory and branches the CPU to the program’s starting address, and the program begins to execute.
Defining Data
Intrinsic Data Types (built-in data types)
| Type | Usage |
| --- | --- |
| BYTE | 8-bit unsigned integer. B stands for byte |
| SBYTE | 8-bit signed integer. S stands for signed |
| WORD | 16-bit unsigned integer |
| SWORD | 16-bit signed integer |
| DWORD | 32-bit unsigned integer. D stands for double |
| SDWORD | 32-bit signed integer. SD stands for signed double |
| FWORD | 48-bit integer (far pointer in protected mode) |
| QWORD | 64-bit integer. Q stands for quad |
| TBYTE | 80-bit (10-byte) integer. T stands for ten-byte |
| REAL4 | 32-bit (4-byte) IEEE short real |
| REAL8 | 64-bit (8-byte) IEEE long real |
| REAL10 | 80-bit (10-byte) IEEE extended real |
Data definition statement
- A data definition statement sets aside storage in memory for a variable.
- All initializers become binary data in memory.
- A data definition has the following syntax:
- Legacy Data Directives:

| Directive | Usage |
| --- | --- |
| DB | 8-bit integer |
| DW | 16-bit integer |
| DD | 32-bit integer or real |
| DQ | 64-bit integer or real |
| DT | 80-bit (10-byte) integer |
Defining BYTE and SBYTE Data
- Each of the following defines a single byte of storage
- A question mark (?) initializer leaves the variable uninitialized.
- Multiple Initializers
  - Memory layout of a byte sequence.
- Defining Strings
- DUP Operator
  - Use DUP to allocate (create space for) an array or string. Syntax: counter DUP ( argument )
Defining WORD and SWORD Data
- Define storage for 16-bit integers or double characters, as a single value or multiple values
- Memory layout

Defining DWORD and SDWORD Data
- Storage definitions for signed and unsigned 32-bit integers
Defining QWORD, TBYTE, Real Data
- Storage definitions for quadwords, tenbyte values, and real numbers
Uninitialized Data (BSS Section)
- resb – 1-byte
- resw – 2-byte
- resd – 4-byte
- resq – 8-byte
- rest – 10-byte
- resdq – 16-byte
- reso – the same as resdq
Symbolic Constants
A symbolic constant (or symbol definition) is created by associating an identifier (a symbol) with an integer expression or some text. Symbols do not reserve storage. They are used only by the assembler when scanning a program, and they cannot change at runtime. The following table summarizes their differences.

Equal-Sign Directive
- name = expression
  - expression is a 32-bit integer value
  - may be redefined
  - name is called a symbolic constant
Calculating the Sizes of Arrays and Strings
- current location counter: $
EQU Directive
- Define a symbol as either an integer or text expression.
- Cannot be redefined
TEXTEQU Directive
- Define a symbol as either an integer or text expression
- Called a text macro
- Can be redefined
Data Transfers Instructions
Operand Types
- Immediate—uses a numeric literal expression
- Register—uses a named register in the CPU
- Memory—references a memory location
- Instruction Operand Notation, 32-Bit Mode.
| Operand | Description |
| --- | --- |
| reg8 | 8-bit general-purpose register: AH, AL, BH, BL, CH, CL, DH, DL |
| reg16 | 16-bit general-purpose register: AX, BX, CX, DX, SI, DI, SP, BP |
| reg32 | 32-bit general-purpose register: EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP |
| reg | Any general-purpose register |
| sreg | 16-bit segment register: CS, DS, SS, ES, FS, GS |
| imm | 8-, 16-, or 32-bit immediate value |
| imm8 | 8-bit immediate byte value |
| imm16 | 16-bit immediate word value |
| imm32 | 32-bit immediate doubleword value |
| reg/mem8 | 8-bit operand, which can be an 8-bit general register or memory byte |
| reg/mem16 | 16-bit operand, which can be a 16-bit general register or memory word |
| reg/mem32 | 32-bit operand, which can be a 32-bit general register or memory doubleword |
| mem | An 8-, 16-, or 32-bit memory operand |
Direct Memory Operands
- A direct memory operand is a named reference to storage in memory.
- The named reference (label) is automatically dereferenced by the assembler
MOV Instruction
- Syntax: MOV destination, source
- Both operands must be the same size.
- Both operands cannot be memory operands.
- The instruction pointer register (IP, EIP, or RIP) cannot be a destination operand.
XCHG Instruction
- XCHG exchanges the values of two operands. At least one operand must be a register. No immediate operands are permitted.
Direct-Offset Operands
- A constant offset is added to a data label to produce an effective address (EA). The address is dereferenced to get the value inside its memory location.
LEA Instruction
- Loads the effective address of the source operand into the destination register, without accessing memory
- Some special uses
  - Add a constant to a register
  - Quick multiplication by 2, 3, 5, or 9
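The special uses of LEA follow from the form of an effective address: base + index * scale + displacement, where the scale is 1, 2, 4, or 8. The C sketch below models that arithmetic; `lea32` and the `times*` helpers are hypothetical names, and the assembly in the comments is one possible encoding, not the only one.

```c
#include <stdint.h>

/* Effective address as LEA computes it, with 32-bit wraparound.
   scale is expected to be 1, 2, 4, or 8. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}

uint32_t add_const(uint32_t x) { return lea32(x, 0, 1, 12); }  /* lea eax, [eax + 12]    */
uint32_t times2(uint32_t x)    { return lea32(x, x, 1, 0); }   /* lea eax, [eax + eax]   */
uint32_t times3(uint32_t x)    { return lea32(x, x, 2, 0); }   /* lea eax, [eax + eax*2] */
uint32_t times5(uint32_t x)    { return lea32(x, x, 4, 0); }   /* lea eax, [eax + eax*4] */
uint32_t times9(uint32_t x)    { return lea32(x, x, 8, 0); }   /* lea eax, [eax + eax*8] */
```

The multipliers 3, 5, and 9 arise because the same register serves as both base and index, giving x + x*2, x + x*4, and x + x*8.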
Bitwise Operations
| Operation | Description |
| --- | --- |
| AND | Boolean AND operation between a source operand and a destination operand. |
| OR | Boolean OR operation between a source operand and a destination operand. |
| XOR | Boolean exclusive-OR operation between a source operand and a destination operand. |
| NOT | Boolean NOT operation on a destination operand. |
| TEST | Implied boolean AND operation between a source and destination operand, setting the CPU flags appropriately. |
Shift and Rotate Instructions
- Shift and Rotate Instructions.

Shift Instruction
Shift left by one bit: value = value * 2
Shift right by one bit: value = value / 2
Logical shift
- SHL

- SHR

Arithmetic shift
- SAL == SHL
- SAR reg/mem, imm8/cl
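The difference between a logical and an arithmetic right shift is easiest to see on a negative value. The following C sketch models 32-bit SHL, SHR, and SAR on unsigned values so that the behaviour does not depend on how a particular compiler shifts signed integers. The function names are hypothetical; masking the count to 5 bits mirrors what x86 does for 32-bit operands.

```c
#include <stdint.h>

/* SHL (and SAL): zeros enter from the right. */
uint32_t shl32(uint32_t value, unsigned count)
{
    return value << (count & 31);
}

/* SHR: zeros enter from the left. */
uint32_t shr32(uint32_t value, unsigned count)
{
    return value >> (count & 31);
}

/* SAR: copies of the sign bit enter from the left. */
uint32_t sar32(uint32_t value, unsigned count)
{
    count &= 31;
    uint32_t shifted = value >> count;
    if ((value & 0x80000000u) && count > 0)
        shifted |= ~(0xFFFFFFFFu >> count);   /* fill the vacated high bits with ones */
    return shifted;
}
```

With the bit pattern of -16 (FFFFFFF0h), `sar32` by 2 yields the pattern of -4, whereas `shr32` yields the large positive value 3FFFFFFCh. This is why SAR is the shift to use for signed division by powers of two.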

Rotate Instructions
Rotate w/o the carry flag
- ROL reg/mem, imm8/cl
- ROR reg/mem, imm8/cl

Rotate with the carry flag
- RCL reg/mem, imm8/cl
- RCR reg/mem, imm8/cl
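The two rotate families differ only in whether the carry flag takes part in the rotation. A minimal C model on 8-bit values, with hypothetical function names and the carry flag passed as an explicit variable, might look like this:

```c
#include <stdint.h>

/* ROL: bits leaving on the left re-enter on the right. */
uint8_t rol8(uint8_t value, unsigned count)
{
    count &= 7;
    return (uint8_t)((value << count) | (value >> ((8 - count) & 7)));
}

/* ROR: bits leaving on the right re-enter on the left. */
uint8_t ror8(uint8_t value, unsigned count)
{
    count &= 7;
    return (uint8_t)((value >> count) | (value << ((8 - count) & 7)));
}

/* RCL by one bit: a 9-bit rotation through the carry flag. */
uint8_t rcl8_once(uint8_t value, unsigned *carry)
{
    unsigned carry_out = (value >> 7) & 1u;                 /* old bit 7 goes to CF */
    uint8_t result = (uint8_t)((value << 1) | (*carry & 1u));
    *carry = carry_out;
    return result;
}

/* RCR by one bit: the mirror image of RCL. */
uint8_t rcr8_once(uint8_t value, unsigned *carry)
{
    unsigned carry_out = value & 1u;                        /* old bit 0 goes to CF */
    uint8_t result = (uint8_t)((value >> 1) | ((*carry & 1u) << 7));
    *carry = carry_out;
    return result;
}
```

Rotating 81h left by one with ROL gives 03h, because bit 7 wraps around to bit 0. With RCL and a clear carry flag, the same input gives 02h and leaves the carry flag set instead.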

Multiplication and Division Instructions
MUL Instruction
- In 32-bit mode, the MUL (unsigned multiply) instruction multiplies an 8-, 16-, or 32-bit operand by AL, AX, or EAX.

| Multiplicand | Multiplier | Product |
| --- | --- | --- |
| AL | reg/mem8 | AX |
| AX | reg/mem16 | DX:AX |
| EAX | reg/mem32 | EDX:EAX |
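
The last row of the table, the 32-bit case, can be sketched in C with a 64-bit intermediate product that is then split into its EDX and EAX halves. The `MulResult` struct and `mul32` function are hypothetical names; the `carry` field models the fact that MUL sets the carry and overflow flags when the upper half of the product is nonzero.

```c
#include <stdint.h>

typedef struct {
    uint32_t edx;    /* upper 32 bits of the product */
    uint32_t eax;    /* lower 32 bits of the product */
    int      carry;  /* 1 when the upper half is nonzero */
} MulResult;

/* Models MUL reg/mem32: EDX:EAX = EAX * operand (unsigned). */
MulResult mul32(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    MulResult r;
    r.eax = (uint32_t)product;
    r.edx = (uint32_t)(product >> 32);
    r.carry = (r.edx != 0);
    return r;
}
```

Because the product register pair is twice as wide as the operands, the multiplication itself can never overflow; the flag only reports whether the upper half is needed.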
IMUL Instruction
- IMUL (signed integer multiply) multiplies an 8-, 16-, or 32-bit signed operand by AL, AX, or EAX.
DIV Instruction
Sign Extension Instructions (CBW, CWD, CDQ)
- CBW (convert byte to word) sign-extends AL into AH
- CWD (convert word to doubleword) sign-extends AX into DX
- CDQ (convert doubleword to quadword) sign-extends EAX into EDX
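Each of the three instructions above copies the sign bit of the lower register into every bit of the upper one. A small C sketch of that rule, using hypothetical function names and unsigned bit patterns throughout:

```c
#include <stdint.h>

/* CBW: returns the new AX, whose high byte (AH) is filled with AL's sign bit. */
uint16_t cbw(uint8_t al)
{
    return (al & 0x80u) ? (uint16_t)(0xFF00u | al) : (uint16_t)al;
}

/* CWD: returns the new DX, filled with AX's sign bit. */
uint16_t cwd_dx(uint16_t ax)
{
    return (ax & 0x8000u) ? 0xFFFFu : 0x0000u;
}

/* CDQ: returns the new EDX, filled with EAX's sign bit. */
uint32_t cdq_edx(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0x00000000u;
}
```

For example, AL = 9Bh (-101 as a signed byte) becomes AX = FF9Bh, which is still -101 as a signed word. This is the step that prepares a signed dividend for IDIV.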
IDIV Instruction
Control Flow Instructions
TEST Instruction
- Performs a nondestructive AND operation between each pair of matching bits in two operands
- No operands are modified, but the Zero flag is affected.
- Example: jump to a label if either bit 0 or bit 1 in AL is set.
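The bit-0-or-bit-1 example can be modelled in C as a mask test. This is a sketch of the flag logic only: `test_zf` and `jumps_if_bit0_or_bit1` are hypothetical names, and the assembly in the comment is one plausible way to write the check, not necessarily the original listing.

```c
#include <stdint.h>

/* Zero flag after TEST a, b: set when the AND of the operands is zero. */
int test_zf(uint8_t a, uint8_t b)
{
    return (a & b) == 0;
}

/* Models:  test al, 00000011b
            jnz  label          ; taken when ZF is clear
   Returns 1 when the jump would be taken. */
int jumps_if_bit0_or_bit1(uint8_t al)
{
    return !test_zf(al, 0x03);
}
```

Neither operand is changed by the test, so AL can be examined against several masks in a row.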
CMP Instruction
Conditional Jumps
- Conditional jumps can be divided into four groups:
  - Jumps based on specific flag values
  - Jumps based on equality between operands or the value of (E)CX
  - Jumps based on comparisons of unsigned operands
  - Jumps based on comparisons of signed operands
- Jumps Based on Specific Flag Values

- Jumps Based on Equality

- Jumps Based on Unsigned Comparisons

- Jumps Based on Signed Comparisons
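Unsigned and signed jumps read different flags after the same CMP. The C sketch below computes the four relevant flags for a 32-bit comparison and then evaluates JA (unsigned above) and JG (signed greater) from them. The `Flags` struct and function names are hypothetical; the flag conditions themselves are the standard ones (JA: CF = 0 and ZF = 0; JG: ZF = 0 and SF = OF).

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } Flags;

/* Flags produced by CMP a, b, which computes a - b and discards the result. */
Flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;
    Flags f;
    f.cf = a < b;                                        /* unsigned borrow */
    f.zf = (diff == 0);
    f.sf = (int)((diff >> 31) & 1u);                     /* sign bit of the result */
    f.of = (int)((((a ^ b) & (a ^ diff)) >> 31) & 1u);   /* signed overflow */
    return f;
}

int ja_taken(Flags f) { return !f.cf && !f.zf; }         /* unsigned above  */
int jg_taken(Flags f) { return !f.zf && f.sf == f.of; }  /* signed greater  */
```

Comparing FFFFFFFFh with 1 shows the split: as unsigned values the first operand is far larger, so JA is taken, but as signed values it is -1, so JG is not.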

Conditional Structures


Implementing Compound Expressions
Logical AND Operator
- When implementing the logical AND operator, consider that HLLs use short-circuit evaluation
- if (al > bl) AND (bl > cl) X = 1
- The code can be reduced to five instructions by changing the initial JA instruction to JBE:
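One way to picture the five-instruction version is to write the same control flow in C with `goto`, one statement per instruction. The function name `compound_and` and the label `next` are hypothetical, the comparison is assumed to be unsigned (as JBE implies), and the assembly in the comments is a plausible reconstruction rather than the original listing.

```c
#include <stdint.h>

/* if (al > bl) AND (bl > cl) X = 1, with inverted conditions that jump past the assignment. */
int compound_and(uint8_t al, uint8_t bl, uint8_t cl)
{
    int x = 0;
    if (al <= bl) goto next;   /* cmp al, bl  /  jbe next  (first test fails: skip the rest) */
    if (bl <= cl) goto next;   /* cmp bl, cl  /  jbe next */
    x = 1;                     /* mov X, 1 */
next:
    return x;
}
```

Inverting each condition lets a failed test jump straight to the exit, which is exactly the short-circuit behaviour: the second comparison never runs when the first one fails.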
Logical OR Operator
- When implementing the logical OR operator, consider that HLLs use short-circuit evaluation
- We can use "fall-through" logic to keep the code as short as possible:
- if (al > bl) OR (bl > cl) X = 1
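The fall-through idea for OR can be sketched the same way in C with `goto`. The names `compound_or`, `set`, and `next` are hypothetical, the comparisons are assumed to be unsigned, and the assembly in the comments is a plausible reconstruction rather than the original listing.

```c
#include <stdint.h>

/* if (al > bl) OR (bl > cl) X = 1 */
int compound_or(uint8_t al, uint8_t bl, uint8_t cl)
{
    int x = 0;
    if (al > bl) goto set;     /* cmp al, bl  /  ja set    (first test true: skip the second) */
    if (bl <= cl) goto next;   /* cmp bl, cl  /  jbe next  (both tests false) */
set:
    x = 1;                     /* mov X, 1 (reached by jump or by falling through) */
next:
    return x;
}
```

When the second test succeeds, execution simply falls through into the assignment, so no extra jump is needed for that case.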
WHILE Loops
- A WHILE loop is really an IF statement followed by the body of the loop, followed by an unconditional jump to the top of the loop.
while( eax < ebx) eax = eax + 1;
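The "IF plus unconditional jump" structure can be made visible by lowering the loop to `goto` in C. This sketch assumes unsigned operands (hence JAE in the comments); `while_loop`, `top`, and `done` are hypothetical names.

```c
#include <stdint.h>

/* while (eax < ebx) eax = eax + 1;  written as an IF, a body, and a jump back. */
uint32_t while_loop(uint32_t eax, uint32_t ebx)
{
top:
    if (eax >= ebx) goto done;   /* cmp eax, ebx  /  jae done  (inverted loop condition) */
    eax = eax + 1;               /* inc eax */
    goto top;                    /* jmp top */
done:
    return eax;
}
```

The condition is tested before the body, so the body never runs when `eax` already equals or exceeds `ebx`.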
FOR Loop
- CPU built-in loops
- C-style FOR loop
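Assuming the CPU's built-in loop refers to the LOOP instruction, its behaviour is: decrement ECX, then jump back to the label if ECX is not zero. A C sketch with hypothetical names follows; the early return for a zero count is an added guard, since LOOP entered with ECX = 0 would wrap around and iterate 2^32 times.

```c
#include <stdint.h>

/* Sums ecx + (ecx - 1) + ... + 1 using LOOP-style control flow. */
uint32_t sum_with_loop(uint32_t ecx)
{
    uint32_t eax = 0;
    if (ecx == 0)
        return 0;                /* guard: skip the loop entirely for a zero count */
body:
    eax += ecx;                  /* add eax, ecx */
    if (--ecx != 0) goto body;   /* loop body: decrement ECX, jump while it is nonzero */
    return eax;
}
```

Unlike the WHILE pattern, the test sits at the bottom of the loop, so the body always executes at least once after the guard.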
Stack Operations
- Imagine a stack of plates
- plates are only added to the top
- plates are only removed from the top
- LIFO structure

Runtime Stack
- Managed by the CPU, using two registers
  - SS (stack segment)
  - ESP (stack pointer; SP in real-address mode)

PUSH Operation
Put a number into the stack
ESP = ESP - sizeof(object)
[ESP] = object
- A 32-bit push operation decrements the stack pointer by 4 and copies a value into the location pointed to by the stack pointer.
- Same stack after pushing two more integers:
- The stack grows downward, so its bottom sits at the higher address. The area below ESP is always available (unless the stack has overflowed).
- Syntax
POP Operation
Remove a number from the stack
object = [ESP]
ESP = ESP + sizeof(top object)
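The push and pop rules can be tried out on a small byte array standing in for stack memory. This is a sketch under stated assumptions: the `Stack` struct, its 64-byte size, and the `push32`/`pop32` names are hypothetical, and `esp` is an offset into the array rather than a real address.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;               /* offset into mem; STACK_BYTES means the stack is empty */
} Stack;

void stack_init(Stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;       /* start at the highest address */
}

/* PUSH: returns 0 on success, -1 if the stack would overflow. */
int push32(Stack *s, uint32_t value)
{
    if (s->esp < 4)
        return -1;
    s->esp -= 4;                               /* ESP = ESP - sizeof(object) */
    memcpy(&s->mem[s->esp], &value, 4);        /* [ESP] = object */
    return 0;
}

/* POP: returns 0 on success, -1 if the stack is empty. */
int pop32(Stack *s, uint32_t *out)
{
    if (s->esp > STACK_BYTES - 4)
        return -1;
    memcpy(out, &s->mem[s->esp], 4);           /* object = [ESP] */
    s->esp += 4;                               /* ESP = ESP + sizeof(object) */
    return 0;
}
```

Pushing three values and popping them returns them in reverse order, and `esp` ends where it started, which is the LIFO behaviour of the plate analogy.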
Function Calls
- The CALL instruction calls a procedure
  - pushes the offset of the next instruction onto the stack
  - copies the address of the called procedure into EIP
- The RET instruction returns from a procedure
  - pops the top of the stack into EIP
Call and Return Example
- CALL: change program control flow
  - Call a function
  - PUSH the return address onto the stack
  - Set RIP to the entry point of the target function
  - Parameters can be passed in registers or on the stack
- RET: return to the caller
  - Equivalent to pop rip
  - ...but you cannot write this yourself, because the destination cannot be RIP/EIP
  - The return value is passed in the RAX/EAX register
- The CALL instruction pushes 00000025 onto the stack, and loads 00000040 into EIP

- The RET instruction pops 00000025 from the stack into EIP
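The CALL/RET pair from the example can be traced with a tiny C model that keeps only EIP and a stack of return addresses. The `Cpu` struct and function names are hypothetical, and `esp` is a slot index into the array rather than a byte address. The values 25h and 40h are the ones used in the example above.

```c
#include <stdint.h>

#define SLOTS 16

typedef struct {
    uint32_t eip;            /* address of the next instruction to execute */
    uint32_t esp;            /* slot index; SLOTS means the stack is empty */
    uint32_t stack[SLOTS];
} Cpu;

void cpu_init(Cpu *c, uint32_t eip)
{
    c->eip = eip;
    c->esp = SLOTS;
    for (int i = 0; i < SLOTS; i++)
        c->stack[i] = 0;
}

/* CALL: push the return address (the current EIP), then jump to the target. */
int call(Cpu *c, uint32_t target)
{
    if (c->esp == 0)
        return -1;                    /* stack overflow in this model */
    c->stack[--c->esp] = c->eip;
    c->eip = target;
    return 0;
}

/* RET: pop the return address back into EIP. */
int ret(Cpu *c)
{
    if (c->esp == SLOTS)
        return -1;                    /* nothing to return to */
    c->eip = c->stack[c->esp++];
    return 0;
}
```

With EIP at 00000025 when the CALL executes, `call(&c, 0x40)` leaves 00000025 on top of the stack and 00000040 in EIP, and `ret(&c)` restores EIP to 00000025. Nested calls stack further return addresses on top and unwind in reverse order.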

Nested Procedure Calls

Passing Register Arguments to Procedures
- Use registers
  - Easier and faster
  - Limited number of registers ==> limited number of parameters
- Use the stack
  - Basically no limit on the number of parameters (it depends only on the available stack space)
After the CALL statement, we have the option of copying the sum in EAX to a variable.