# Pwn Intro
## Club Resources
* [Practice Problems](https://ctf.tjcsec.club)
* [Codespaces Desktop](https://github.com/TJCSec/desktop)
* [Shell Commands List](https://hackmd.io/@tjcsc/cmd)
## Introduction
Pwn refers to taking control of an application or network. This can take a variety of forms, but most commonly it is low level binary exploitation.
## Low Level Basics
At the lowest level, programs are just lists of instructions that perform binary operations on data. These instructions can be read as assembly language, which is the human readable version of machine code. The CPU reads each instruction from memory and executes it, then moves on to the next instruction. Instructions like JMP (which jumps to a different part of the program) and CALL (which calls a function) allow programs to do more than just execute a list of tasks by letting them move around the list of instructions.
What about data? A program can't do much if it doesn't have any data to work with. Programs use data from two main places (at least, two that matter for binary exploitation): CPU registers and RAM (**R**andom **A**ccess **M**emory). Registers are small storage locations inside the CPU, usually 32 or 64 bits wide depending on the CPU architecture. Registers hold whatever the program is actively using - most of the time data is stored in RAM and moved into registers when it's needed. There are also several special registers, most importantly RIP, RSP, and RBP. RIP holds the address of the next instruction to execute, RSP holds the address of the top of the stack, and RBP holds the address of the base of the current stack frame.

RAM is separated into several sections, as shown above. The text and initialized data sections both come directly from the program file. The text section holds the program's instructions, and the initialized data section holds global and static variables that have initial values. The heap and stack both hold runtime data: the stack holds local variables and function call information, while the heap holds memory that the program requests while it runs. The heap is fairly complicated, but the main idea that you need to know is that heap memory is allocated explicitly by the programmer (for example with `malloc` in C) and generally holds large or long-lived data. Pointers to data on the heap are often stored on the stack so that the program can access it. The stack is slightly less complicated, but it can still be difficult to understand.
A reference for x64 instructions and registers can be found [here](https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf)
## The Stack
The stack mostly holds local variables and the function call stack. Values can be pushed onto and popped from the top of the stack just like the stack data structure (if you haven't taken data structures yet you can think of this as a stack of plates, where each plate holds a certain piece of data). This works by tracking the top of the stack in RSP and the base of the current stack frame in RBP. On x64 the stack grows toward lower addresses: when a value is pushed, RSP is first decreased by the size of the value (8 bytes on a 64 bit architecture, 4 bytes on a 32 bit one), and the value is then written to the address now held in RSP. Popping does the reverse: the value at the address in RSP is read, then RSP is increased.
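The push and pop behavior can be sketched with a toy model in Python. This is only an illustration: the `ToyStack` class, its size, and the starting address are made up for the example, and a real stack lives in process memory rather than in a `bytearray`.

```python
import struct

class ToyStack:
    """Toy model of the x64 stack: a block of bytes plus an RSP value."""

    def __init__(self, size=64, top_address=0x7fff0000):
        self.base = top_address - size   # lowest address in the toy region
        self.memory = bytearray(size)
        self.rsp = top_address           # an empty stack starts at the highest address

    def push(self, value):
        self.rsp -= 8                    # RSP moves down by 8 bytes first
        offset = self.rsp - self.base
        self.memory[offset:offset + 8] = struct.pack("<Q", value)  # then the value is stored

    def pop(self):
        offset = self.rsp - self.base
        (value,) = struct.unpack("<Q", self.memory[offset:offset + 8])
        self.rsp += 8                    # RSP moves back up after the read
        return value

stack = ToyStack()
start = stack.rsp
stack.push(0x1111)
stack.push(0x2222)
rsp_after_pushes = stack.rsp             # 16 bytes lower than where it started
first = stack.pop()                      # last value pushed comes off first
second = stack.pop()
print(hex(start), hex(rsp_after_pushes), hex(first), hex(second))
```

Note that the most recently pushed value sits at the lowest address, which is why "top of the stack" means the smallest address in use.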
### Function calls and stack frames
The standard calling convention for 64 bit Intel and AMD processors (aka x64 processors) on Linux goes as follows:
1. The first 6 integer or pointer arguments for the function are stored in the registers RDI, RSI, RDX, RCX, R8, and R9. Any additional arguments are pushed to the top of the stack.
2. The CALL instruction pushes the return address (the address of the instruction right after the CALL) to the stack so that the program knows where to go back to after executing the function.
3. Then the address of the function being called is moved into RIP and the code in the function is executed.
4. When the function is finished, it puts its return value in the RAX register and moves the stack pointer back to wherever the return address is. The RET instruction then pops the return address into the RIP register.
This process is the same for every function call that follows this convention, but there are some additional conventions that most functions follow.

One of the most common and most important conventions is to push the value of RBP onto the stack, then move the value of RSP into RBP, making the top of the stack the new base of the stack - this is what makes a function's stack frame. The value of RSP can then be decreased to make room for local variables. After a function has finished, the values of RBP and RSP must be returned to their previous state. This can easily be done with the LEAVE instruction, which moves the value in RBP into RSP (moving the top of the stack to point to the current base), then pops the next value from the stack into RBP. Since the last value pushed to the stack before RSP and RBP were adjusted was the old value of RBP, this process returns both registers to the state they were in when the function was called.
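The whole sequence (CALL, building a stack frame, LEAVE, RET) can be traced with a small simulation. This is a sketch under made-up assumptions: the register values, addresses, and helper names below are hypothetical, and memory is just a `bytearray` indexed by address.

```python
import struct

memory = bytearray(0x1000)                       # toy memory; index == address
regs = {"rip": 0x401000, "rsp": 0xF00, "rbp": 0xF40}

def push(value):
    regs["rsp"] -= 8
    memory[regs["rsp"]:regs["rsp"] + 8] = struct.pack("<Q", value)

def pop():
    (value,) = struct.unpack("<Q", memory[regs["rsp"]:regs["rsp"] + 8])
    regs["rsp"] += 8
    return value

def call(target, return_address):
    push(return_address)                         # CALL: save where to resume
    regs["rip"] = target                         # then jump to the function

def prologue(local_bytes):
    push(regs["rbp"])                            # push rbp
    regs["rbp"] = regs["rsp"]                    # mov rbp, rsp
    regs["rsp"] -= local_bytes                   # sub rsp, N (room for locals)

def leave():
    regs["rsp"] = regs["rbp"]                    # mov rsp, rbp
    regs["rbp"] = pop()                          # pop rbp

def ret():
    regs["rip"] = pop()                          # pop the return address into RIP

before = dict(regs)
call(target=0x401200, return_address=0x401005)
prologue(local_bytes=32)
inside = dict(regs)                              # register state inside the function
leave()
ret()
after = dict(regs)
print(before, inside, after, sep="\n")
```

After `leave()` and `ret()`, RSP and RBP hold the same values they had before the call, and RIP holds the saved return address.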
## Buffer Overflow
One of the most common methods of binary exploitation is buffer overflow. If the user input is larger than the allotted space in memory and proper protections are not in place, the input will overflow into adjacent memory. If the buffer is on the stack, which it usually is, this allows the user to change any data stored after the buffer (at higher addresses), which includes other local variables, the saved RBP, and the return address. By changing the return address, the user can execute any other code included in the binary file (which will be covered in a later lecture).
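The effect of an overflow on a return address can be modeled in a few lines. This is a simplified sketch, not a real exploit: the frame layout (a 16 byte buffer, then the saved RBP, then the return address), the addresses, and the `WIN` target are all assumptions chosen for illustration.

```python
import struct

BUF_SIZE = 16
# Toy frame, lowest address first: [ buffer | saved RBP | return address ]
frame = bytearray(BUF_SIZE) + struct.pack("<QQ", 0x7ffd00001000, 0x401136)

def unsafe_copy(frame, data):
    """Copy input into the frame with no length check, like gets() would."""
    frame[0:len(data)] = data

def return_address(frame):
    (addr,) = struct.unpack("<Q", frame[BUF_SIZE + 8:BUF_SIZE + 16])
    return addr

original = return_address(frame)

WIN = 0x401256                                   # hypothetical function the attacker wants to reach
payload = b"A" * BUF_SIZE                        # fill the buffer
payload += b"B" * 8                              # overwrite the saved RBP
payload += struct.pack("<Q", WIN)                # overwrite the return address

unsafe_copy(frame, payload)
hijacked = return_address(frame)
print(hex(original), "->", hex(hijacked))
```

In a real challenge the distance from the buffer to the return address depends on the compiled binary, so it has to be found with a debugger or disassembler rather than assumed.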
### Protecting against buffer overflow
There are many protections against buffer overflow, so in simple programs it is usually not hard to protect against it. Buffer overflow exploits can be hard to notice, however, especially in complicated applications. Because of this it is important to always be careful with user input.
The simplest and most important way to avoid buffer overflow is to simply limit the number of characters that can be used as input. Avoid using the `gets` function or `scanf("%s")` and use `fgets` instead, as you can specify input size. NEVER trust user input, and always make sure it is properly sanitized and contained.
Another protection against buffer overflow is using a stack canary. This is a value stored on the stack between local variables and the return address, with a copy of it elsewhere in memory. If a buffer is overflowed and the canary is changed, it will no longer match the other instance of the canary, and the program will detect the overflow and exit before the function returns. This protects the return address from being tampered with, but can still allow a hacker to change local variables.
To make return oriented programming (ROP) harder, most programs are compiled as *position independent executables* (PIE). Combined with address space layout randomization (ASLR), this means that the address of the program in memory is randomized on each run, so a hacker can't hardcode a return address to run other parts of the program without first leaking an address.
If a hacker can obtain the address of the buffer, they could theoretically write their own code to the heap or stack, then return to said code and execute it. To avoid this, modern compilers and operating systems mark the heap and stack as non-executable (often called NX or DEP), so code cannot be executed from those memory sections.
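The canary check can be illustrated with a toy frame. This is a sketch only: the frame layout, addresses, and function names are invented for the example, and real canaries are inserted and checked by the compiler rather than by hand.

```python
import os
import struct

BUF_SIZE = 16

def new_frame(canary):
    # Lowest address first: [ buffer | canary | saved RBP | return address ]
    return bytearray(BUF_SIZE) + canary + struct.pack("<QQ", 0x7ffd00001000, 0x401136)

def canary_intact(frame, canary):
    """Compare the canary on the stack with the reference copy."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + 8]) == canary

canary = os.urandom(8)                            # random value the attacker does not know
frame = new_frame(canary)

frame[0:BUF_SIZE] = b"A" * BUF_SIZE               # input that fits in the buffer
ok_after_safe_write = canary_intact(frame, canary)

# An overflow has to cross the canary to reach the return address.
payload = b"A" * (BUF_SIZE + 8 + 8) + struct.pack("<Q", 0x401256)
frame[0:len(payload)] = payload
ok_after_overflow = canary_intact(frame, canary)

print(ok_after_safe_write, ok_after_overflow)
```

Because the canary sits after the buffer, an overflow that stops before it (for example one that only reaches another local variable) would not be caught by this check.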
## Pwntools
Pwntools is a Python library designed to help with binary exploitation. It includes a variety of tools for ELF manipulation, servers, and memory analysis, in addition to more specific exploits, such as ROP and format string attacks. The easiest way to get comfortable with pwntools is to explore the [documentation](https://docs.pwntools.com/en/stable/intro.html) and find the tools you need for challenges.
### Tubes
The `pwnlib.tubes` module allows communication with processes and servers. `pwnlib.tubes.remote` allows you to connect to a server and send data including non-printable characters, which is usually necessary for pwn challenges. The most important functions are shown in the example below.
```
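from pwn import *  # provides remote, process, p64, and more

conn = remote('challenge.tjcsec.club', 31359)  # connect to the given host and port
# to interact with a local binary file instead:
# conn = process("./chall")
conn.recvline()  # receives bytes up to and including a newline
payload = b'\x00\x01\x02\x03'  # raw bytes, including non-printable ones
conn.sendline(payload)  # sends payload followed by a newline
print(conn.recv())  # receives whatever data is currently available (up to 4096 bytes by default)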
```
### Packing integers
The `pwnlib.util.packing` module allows you to convert Python integers into their low-level bytes. It takes into account the endianness (order of bytes) of the architecture and the size of the integer (filling any extra bytes with 0s for unsigned integers). Most commonly you will use the functions `p64(value)` and `p32(value)` for 64 and 32 bit integers, respectively.
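To see what packing actually produces, here is a sketch that uses only Python's standard `struct` module instead of pwntools. The helper names `pack64` and `pack32` and the address are made up for this example; with pwntools' default little-endian settings, `p64` and `p32` should give the same bytes.

```python
import struct

def pack64(value):
    """Pack an unsigned integer into 8 little-endian bytes."""
    return struct.pack("<Q", value)

def pack32(value):
    """Pack an unsigned integer into 4 little-endian bytes."""
    return struct.pack("<I", value)

address = 0x401256                       # hypothetical address to place in a payload
packed = pack64(address)
print(packed.hex())                      # 5612400000000000 (least significant byte first)
print(pack32(0xdeadbeef).hex())          # efbeadde

# Unpacking reverses the process, which is handy for leaked addresses.
(recovered,) = struct.unpack("<Q", packed)
print(hex(recovered))                    # 0x401256
```

The byte order looks reversed because x64 is little-endian: the least significant byte of the integer is stored at the lowest address.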
For other useful pwntools modules like ELF manipulation, assembly and disassembly, format strings attacks, and ROP, see https://docs.pwntools.com/en/stable/intro.html
## GDB
For an explanation of GDB, see the [lecture notes](https://hackmd.io/@tjcsc/SyY4Mn8Tel) from the reverse engineering intro.
A quick reference for GDB commands can be found at [https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf](https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf)
Additionally, I highly recommend installing GEF (GDB Enhanced Features) using the command `bash -c "$(curl -fsSL https://gef.blah.cat/sh)"`.