0x01. Windows Internal Basics & Assembly

Virtual Memory

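Most of what we normally see and interact with on Windows runs in User Mode:

Mouse/keyboard
Browser
Office
Games
…

By contrast, the low-level system work that we do not see runs in Kernel Mode.

As shown below, each running program is called a Process, and a Process spans both User Mode and Kernel Mode:

Calculator
Browser
File Explorer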

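When a Process runs on a 32-bit system, it is given a 4 GB virtual address space (2 GB for User Mode and 2 GB for Kernel Mode).

On a 64-bit system the split is 8 TB / 248 TB (User/Kernel).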

Address range	32-bit	64-bit
User Mode Address	0 - 0x7fff ffff	0 - 0x7ff ffff ffff
Kernel Mode Address	0x8000 0000 - 0xffff ffff	0xffff 0800 0000 0000 - 0xffff ffff ffff ffff
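
To make the 32-bit column of the table concrete, here is a minimal Python sketch that classifies an address as user or kernel. The function name `classify_address_32` and the sample addresses are made up for illustration; only the range boundaries come from the table.

```python
# Boundaries copied from the 32-bit column of the table above.
USER_MAX_32 = 0x7FFF_FFFF
ADDR_MAX_32 = 0xFFFF_FFFF

def classify_address_32(addr: int) -> str:
    """Return 'user' or 'kernel' for a 32-bit virtual address."""
    if not 0 <= addr <= ADDR_MAX_32:
        raise ValueError("not a 32-bit address")
    return "user" if addr <= USER_MAX_32 else "kernel"

print(classify_address_32(0x00401000))  # user
print(classify_address_32(0x80001000))  # kernel
```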

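Does the computer really have that much RAM?

No. Taking 32-bit as the example, the 2 GB of User Mode space is virtual and does not actually occupy that much physical memory, while the 2 GB of Kernel Mode space is shared by all processes.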

…

This series focuses mainly on User Mode.

PE Intro

PE Structure

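PE: Portable Executable. The familiar exe file is a PE, and a DLL (covered shortly) is also a kind of PE.
The format has its own structure and a specification that it follows.

Since the goal is reverse engineering, we need to understand how a PE file is laid out.

PE files can be inspected with tools such as PEview or CFF Explorer.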

Magic Bytes

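During malware analysis you often run into the following situation:
the attacker embeds a second PE file inside a PE file, and the embedded exe is dropped once the main file runs.

The special bytes below help you spot whether a new embedded PE file shows up in memory or in a file. They are called magic bytes.

DOS Header: offset 0 -> 4D 5A ("MZ"; reads as 0x5A4D when taken as a little-endian WORD)
PE Offset: offset 0x3C -> 0xF0 (the e_lfanew field, which holds the file offset of the PE signature; 0xF0 in this sample)

To stress it once more: these values are defined by Microsoft.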

PE Signature

At offset 0xF0 -> 50 45 00 00 ("PE\0\0"; reads as 0x00004550 when taken as a little-endian DWORD)
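
The two checks above (the ASCII characters MZ at offset 0, then the PE signature at the offset stored at 0x3C) can be sketched in Python as follows. The function name `pe_signature_offset` and the synthetic buffer are assumptions for illustration; a real file would be read from disk instead.

```python
import struct

def pe_signature_offset(data: bytes):
    """Return the offset of the PE signature, or None if data is not a PE."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew: 4-byte little-endian value stored at offset 0x3C
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    return e_lfanew

# Synthetic 0x100-byte buffer that mimics the sample: PE signature at 0xF0.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0xF0)
fake[0xF0:0xF4] = b"PE\x00\x00"
print(hex(pe_signature_offset(bytes(fake))))  # 0xf0
```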

Sections

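A PE file of course carries much more than that. Its contents are divided into sections, and each section has a different purpose:

.text: code
.idata: list of imported library functions
.edata: list of exported library functions
.data: global variables
.rsrc: extra resources the program needs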

Example

Let's look at what WannaCry keeps in its resources.

MZ? PE? (at A4 A5 and 19C 19D in the figure)
There is a whole PE file stuffed inside.
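
Spotting an embedded PE like this can be automated. The sketch below scans a byte buffer for every MZ whose e_lfanew field points at a PE signature. The names `find_embedded_pe` and `fake_pe` and the synthetic buffers are assumptions for illustration; this is not how any particular tool does it.

```python
import struct

def find_embedded_pe(data: bytes):
    """Return offsets of MZ headers whose e_lfanew points at a PE signature."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            sig = pos + e_lfanew
            if data[sig:sig + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = data.find(b"MZ", pos + 1)
    return hits

def fake_pe(pe_offset: int, size: int) -> bytes:
    """Build a minimal, hypothetical header-only blob for demonstration."""
    buf = bytearray(size)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\x00\x00"
    return bytes(buf)

# An outer "file", some filler bytes, then a second PE hidden at 0x230.
outer = fake_pe(0xF0, 0x200) + b"\xAA" * 0x30 + fake_pe(0x80, 0x100)
print([hex(o) for o in find_embedded_pe(outer)])  # ['0x0', '0x230']
```

A hit at offset 0 is the outer file itself; any hit at a non-zero offset is a candidate embedded PE worth carving out and inspecting.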

PE Map to Memory

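Following the virtual address concept from the beginning (the 2 GB of User Mode space), a PE file also records which Virtual Memory address each of its sections maps to.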

Roughly speaking, each section's raw data in the file is mapped to its own virtual address in memory.

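Comparing the fields once more:
Virtual Size: the size of the section once it is loaded into memory; a debugger shows the same size
Virtual Address: the section's offset from the Base Address, as seen in a debugger
Raw Size: the actual size of the section inside the PE file
Raw Address: the offset of the section inside the PE file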
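
These four fields are enough to translate between a memory offset and a file offset. The sketch below decodes the first 24 bytes of a 40-byte section header (name, Virtual Size, Virtual Address, Raw Size, Raw Address, in that order) and converts an offset from the Base Address into a file offset. The helper names and the header values are hypothetical and only serve to exercise the code.

```python
import struct
from collections import namedtuple

Section = namedtuple(
    "Section", "name virtual_size virtual_address raw_size raw_address"
)

def parse_section_header(hdr: bytes) -> Section:
    """Decode the leading fields of one 40-byte section header."""
    name, vsize, vaddr, rsize, raddr = struct.unpack_from("<8sIIII", hdr, 0)
    text = name.rstrip(b"\x00").decode("ascii", "replace")
    return Section(text, vsize, vaddr, rsize, raddr)

def rva_to_file_offset(rva: int, sections) -> int:
    """Map an offset from the Base Address (RVA) to an offset in the file."""
    for s in sections:
        delta = rva - s.virtual_address
        if 0 <= delta < s.raw_size:
            return s.raw_address + delta
    raise ValueError("RVA is not backed by data in the file")

# Hypothetical values, packed here only to exercise the parser.
raw = struct.pack("<8sIIII", b".text", 0x1A2B, 0x1000, 0x1C00, 0x400) + bytes(16)
sec = parse_section_header(raw)
print(sec.name, hex(sec.virtual_address), hex(sec.raw_address))  # .text 0x1000 0x400
print(hex(rva_to_file_offset(0x1010, [sec])))                    # 0x410
```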

A detailed PE walkthrough by an expert

DLL vs EXE

DLL: a pure library; it cannot run on its own when you double-click it
EXE: an executable; it relies on DLLs to carry out its various functions

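A First Look at How Programs Execute

To run a program such as Calculator, you normally just double-click it and Calculator appears.

You can add and subtract, but what is actually happening underneath?

Calculator may originally have been written in C++. After compilation it becomes machine code (the same kind of raw bytes that shellcode is made of), which is really just hex numbers.

As we just saw, every program, Calculator included, has a .text section that stores its code. What it stores is a long run of those hex bytes.

Once the program starts, the CPU decodes the bytes in .text according to its instruction set and performs the corresponding operations.

Humans cannot easily read raw hex, so we use tools such as a disassembler to translate the bytes into readable assembly code.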

Registers

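Why do we need registers?

Suppose a program computes 2+3.
Both 2 and 3 have to be stored somewhere so the CPU can add them. The place that temporarily holds 2 and 3 is a register.

R prefix: 64-bit
E prefix: 32-bit

R(E)IP: points to the next instruction to be executed

R(E)AX: arithmetic (low half of the result) / function return value
R(E)BX: base or offset into an array
R(E)CX: counter
R(E)DX: arithmetic (high half of the result)

R(E)SI: array copy, source
R(E)DI: array copy, destination

R(E)BP: stack base
R(E)SP: stack top

R8 - R15: 64-bit only, extra registers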

Width (the same applies to RBX/RCX/RDX); each 0 is one hex digit

RAX (64-bit)   0000 0000 0000 0000

EAX (32-bit)             0000 0000

AX  (16-bit)                  0000

AH  ( 8-bit)                  00
AL  ( 8-bit)                    00
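
The overlap between these names can be sketched with bit masks in Python. The function name `register_views` and the sample value are made up for illustration.

```python
def register_views(rax: int) -> dict:
    """Show the sub-registers that alias the low bits of a 64-bit RAX value."""
    rax &= 0xFFFF_FFFF_FFFF_FFFF
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,   # low 32 bits
        "AX": rax & 0xFFFF,         # low 16 bits
        "AH": (rax >> 8) & 0xFF,    # bits 8..15
        "AL": rax & 0xFF,           # bits 0..7
    }

for name, value in register_views(0x1122334455667788).items():
    print(f"{name:>3} = {value:#x}")
```

With the sample value, EAX comes out as 0x55667788, AX as 0x7788, AH as 0x77 and AL as 0x88.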

Flag

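Besides registers, a running program also has flags, which record the state of execution.

Set: 1
Not set/clear: 0

ZF (Zero Flag) : was the result of the last operation 0?
CF (Carry Flag): set if an addition or subtraction produced a carry or a borrow
SF (Sign Flag) : is the result negative?

An expert's introduction to flags is available here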
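
A minimal sketch of how these three flags fall out of an 8-bit addition. The helper `add8_flags` is hypothetical, and 8 bits are used only to keep the numbers small; the same idea applies to 32-bit registers.

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values and return (result, ZF, CF, SF)."""
    total = (a & 0xFF) + (b & 0xFF)
    result = total & 0xFF
    zf = int(result == 0)     # result is zero
    cf = int(total > 0xFF)    # carry out of bit 7
    sf = result >> 7          # top bit is the sign bit
    return result, zf, cf, sf

print(add8_flags(0xFF, 0x01))  # (0, 1, 1, 0)
print(add8_flags(0x7F, 0x01))  # (128, 0, 0, 1)
```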

Stack

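Registers alone are not enough to hold all the temporary data a program works with.
That is what the stack is for. The stack is simply a region of memory that the program can use at any time.

Its properties and usage are as follows:

Last in, first out
Holds function arguments (32-bit only; 64-bit passes them mainly in registers)
push grows the stack upward (the address gets smaller)
pop shrinks it back downward (the address gets larger)
It is normally changed through push/pop (instructions such as call, ret and add/sub esp also move ESP)
pop eax = copy the value that ESP points to into eax, then esp + 4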
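
The push/pop behaviour can be modelled with a toy Python class. `Stack32` and the starting ESP value 0x1000 are arbitrary choices for illustration, not anything Windows defines.

```python
class Stack32:
    """Toy model of a 32-bit stack: a dict of memory plus an ESP value."""

    def __init__(self, esp: int = 0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value: int) -> None:
        self.esp -= 4                          # grows toward lower addresses
        self.mem[self.esp] = value & 0xFFFF_FFFF

    def pop(self) -> int:
        value = self.mem[self.esp]             # read what ESP points at
        self.esp += 4                          # then release the slot
        return value

s = Stack32()
s.push(2)
s.push(3)
print(hex(s.esp))  # 0xff8
print(s.pop())     # 3 (last in, first out)
print(hex(s.esp))  # 0xffc
```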

Assembly

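This series assumes readers have some assembly background, but here is a quick review of the common instructions and concepts anyway.

A compiler turns the program into machine code (what is actually stored inside the executable file).
An analyst opens it with a disassembler (IDA/GDB..) and sees assembly.

           Compiler   EXE    Disassembler
C code        ->  machine code   ->    assembly seen while reversing
int a;         |   55           |     push ebp
printf("bla"); |   8B EC        |     mov ebp, esp    
return 0;      |   ........     |     .....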

Machine code vs. assembly

B9       42 00 00 00  ; little endian
mov ecx, 0x42
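
The byte order can be reproduced in Python: the opcode B9 is followed by the 32-bit immediate stored in little-endian order. The helper name `encode_mov_ecx_imm32` is made up for this sketch, and it covers only this single instruction form.

```python
import struct

def encode_mov_ecx_imm32(value: int) -> bytes:
    """Encode `mov ecx, imm32`: opcode B9, then the little-endian immediate."""
    return b"\xB9" + struct.pack("<I", value & 0xFFFF_FFFF)

print(encode_mov_ecx_imm32(0x42).hex(" "))        # b9 42 00 00 00
print(encode_mov_ecx_imm32(0x11223344).hex(" "))  # b9 44 33 22 11
```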

Some common assembly instructions, using 32-bit Intel syntax as the example

mov eax, 0x20 ;  eax = 0x20
mov eax, [0x11223344]  ; copy the 4 bytes stored at address 0x11223344 into eax

add eax, 1    ; eax += 1
inc eax       ; eax += 1, preferred over the line above because the machine code is shorter
dec eax       ; eax -= 1
sub eax, 0x10 ; eax -= 0x10
add eax, ebx  ; eax = eax + ebx

mul ebx       ; with ebx = 0x50: edx:eax = eax * 0x50 (mul takes a register or memory operand, not an immediate)
div ebx       ; with ebx = 0x50: edx:eax / 0x50    eax = quotient, edx = remainder

and eax, ebx  ; eax = eax & ebx (bitwise AND)
xor eax, eax  ; eax = eax ^ eax, a quick way to set eax to 0
xor eax, 0x20 ; eax = eax ^ 0x20, often used to obfuscate strings so they are not spotted during static analysis
or eax, 0x20  ; eax = eax | 0x20 (bitwise OR)

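shl eax, 1    ; shown on 8 bits: 10110001 shifted left becomes 01100010, CF set
shr eax, 1    ; 10110001 shifted right becomes 01011000, CF set
              ; CF takes the value of the bit that was shifted out

rol eax, 1    ; 10110001 rotated left becomes 01100011, CF set
ror eax, 1    ; 10110001 rotated right becomes 11011000, CF set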

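jz            ; jump to the target if the Zero Flag (ZF) is set (1). ZF is usually set when the previous compare or test instruction produced a zero result.
jnz           ; jump to the target if ZF is clear (0).
je            ; same as jz: jump to the target if ZF is set (1).
jg            ; jump to the target if the previous cmp or sub shows the first operand is greater than the second as signed values (ZF = 0 and SF = OF).

cmp eax, ebx  ; computes eax - ebx and keeps only the flags; eax is not modified
              ; if eax == ebx -> ZF = 1
              ; if eax != ebx -> ZF = 0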


push eax      ; esp = esp - 4, then store eax at the top of the stack
pop eax       ; load the top of the stack into eax, then esp = esp + 4

pushad        ; push eax, ecx, edx, ebx, esp, ebp, esi, edi onto the stack in that order
popad         ; pop back in reverse order into edi, esi, ebp, (the saved esp is skipped), ebx, edx, ecx, eax
              ; often seen in packers, to preserve the original program's register values

ret           ; behaves like pop eip (there is no such instruction)
nop           ; do nothing, good for padding
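
The shift and rotate examples in the listing can be checked with a small Python model. It works on 8 bits, like the bit patterns in the comments, and the helper names are hypothetical. Each function returns the new value and the resulting CF.

```python
def shl8(v: int):
    """Shift left by 1; CF receives the bit shifted out of the top."""
    return (v << 1) & 0xFF, (v >> 7) & 1

def shr8(v: int):
    """Shift right by 1; CF receives the bit shifted out of the bottom."""
    return (v & 0xFF) >> 1, v & 1

def rol8(v: int):
    """Rotate left by 1; the top bit wraps to bit 0 and is copied to CF."""
    top = (v >> 7) & 1
    return ((v << 1) & 0xFF) | top, top

def ror8(v: int):
    """Rotate right by 1; the bottom bit wraps to bit 7 and is copied to CF."""
    low = v & 1
    return ((v & 0xFF) >> 1) | (low << 7), low

v = 0b10110001
for op in (shl8, shr8, rol8, ror8):
    result, cf = op(v)
    print(f"{op.__name__}: {result:08b} CF={cf}")
```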

Calling Convention ( _stdcall )

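Assembly is no different from ordinary programming here: functions exist to handle repeated work.
Where there are functions, there are arguments to pass into them, and 32-bit and 64-bit do this slightly differently.

32-bit: arguments are pushed onto the Stack before entering the function
64-bit: arguments are passed mainly in registers: RCX, RDX, R8, R9, then the Stack

return value: RAX/EAX

Below is a small 32-bit program and its corresponding assembly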

int test(int a, int b)
{
  int x, y;
  x = a;
  y = b;
  return 0;
}

int main()
{
  test(2,3);
  return 0;
}

main:

101 push 3
102 push 2
103 call test     ; implicitly pushes 104 (the return address)
104 xor eax, eax  ; main's return 0; no add esp, 0x08 is needed here because test's ret 8 already removed the two arguments (stdcall)

test:

200 push ebp               ; save main's EBP
201 mov ebp, esp           ; start a new stack frame; EBP now holds a smaller address, further up the stack
202 sub esp, 0x08          ; reserve 8 bytes of local space in the new frame

203 mov eax, [ebp+0x08]    ; eax = 2 (a, pushed last)
204 mov [ebp-0x04], eax    ; store to local variable x
205 mov eax, [ebp+0x0c]    ; eax = 3 (b, pushed first)
206 mov [ebp-0x08], eax    ; store to local variable y

207 xor eax, eax           ; eax = 0, test's return 0
208 mov esp, ebp           ; restore the original ESP
209 pop ebp                ; restore the original EBP
20A ret 8                  ; return to the address stored below the 2 arguments, then drop those 8 bytes

Example stack layout

What the stack looks like after executing through 206

|Address|      Value     |
--------------------------
|  10   |        3       |<- EBP - 0x08 / new ESP ; 202 sub esp, 0x08 ; 206 mov [ebp-0x08], eax
|  14   |        2       |<- EBP - 0x04           ; 204 mov [ebp-0x04], eax
|  18   |     old ebp    |<- EBP                  ; 200 push ebp      ; 201 mov ebp, esp
|  1c   | return address |<- EBP + 0x04           ; 103 call test
|  20   |        2       |<- EBP + 0x08           ; 102 push 2
|  24   |        3       |<- EBP + 0x0c           ; 101 push 3

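Key points:
Make sure ESP/EBP return to their original values once the stack frame is no longer needed
Stack memory does not need to be cleared to 0 after use

EBP + 0x08 : first argument passed in by main
EBP - 0x04 : test's local variable

On 64-bit, arguments are passed through registers rather than push/pop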
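
The frame setup can be replayed step by step in Python. This is only a toy model: the starting ESP of 0x28, main's EBP of 0x1000 and the return address 0x104 are made-up values chosen so that the addresses line up with the 10 to 24 range used in the layout. After the replay, x (EBP - 0x04) holds a = 2 and y (EBP - 0x08) holds b = 3.

```python
cpu = {"esp": 0x28, "ebp": 0x1000}   # hypothetical starting values
mem = {}

def push(value: int) -> None:
    """Model of push: move ESP down 4 bytes, then store the value there."""
    cpu["esp"] -= 4
    mem[cpu["esp"]] = value

push(3)                    # 101 push 3 (b)
push(2)                    # 102 push 2 (a)
push(0x104)                # 103 call test: pushes the return address
push(cpu["ebp"])           # 200 push ebp
cpu["ebp"] = cpu["esp"]    # 201 mov ebp, esp
cpu["esp"] -= 8            # 202 sub esp, 0x08
ebp = cpu["ebp"]
mem[ebp - 0x04] = mem[ebp + 0x08]   # 203-204: x = a
mem[ebp - 0x08] = mem[ebp + 0x0C]   # 205-206: y = b

for addr in sorted(mem):
    print(f"{addr:#x}: {mem[addr]:#x}  (EBP{addr - ebp:+#x})")
```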

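Still not clear?
No problem. The article by 30cm covers this with more resources.

More information can be found in 秋聲's book, which is highly recommended.

For recognizing common C constructs, see Chapter 6 of Practical Malware Analysis (惡意代碼分析實戰).

That is roughly the background knowledge needed before getting into reverse engineering. The next post covers the environment you need before you start analyzing malware.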

-0xbc

tags: Malware Analysis Reverse Engineering tutorials
