Pwn Intro

Slides

Follow along here

Introduction

Pwn refers to taking control of an application or network. This can take a variety of forms, but most commonly it is low level binary exploitation.

Low Level Basics

At the lowest level, programs are just lists of instructions that operate on binary data. These instructions can be read as assembly language, the human-readable form of machine code. The CPU reads each instruction from memory, executes it, and then moves on to the next one. Jump instructions such as JMP (and its conditional variants) and CALL let a program move to a different place in the list of instructions instead of always running the next one, which is what makes branches, loops, and function calls possible.

What about data? A program can't do much without data to work with. Programs use data from two places: CPU registers and RAM. Registers are small storage locations inside the CPU, usually 32 or 64 bits wide depending on the CPU architecture. Registers hold whatever the program is actively working on; most data lives in RAM and is moved into registers when it's needed. A few registers have special roles, most importantly RIP, RSP, and RBP. RIP (the instruction pointer) holds the address of the next instruction to execute, RSP (the stack pointer) holds the address of the top of the stack, and RBP (the base pointer) usually holds the address of the base of the current function's stack frame.

[Figure: memory layout of a running program, showing the text, initialized data, heap, and stack sections]

RAM is separated into several sections, as shown above. The text and initialized data sections both come from the program file: the text section holds the program's instructions, and the initialized data section holds global and static variables that have initial values. The heap and stack hold data created while the program runs. The heap is fairly complicated, but the main idea you need to know is that heap memory is requested explicitly at runtime (for example with malloc in C) and generally holds large or variable-sized data. The program reaches heap data through pointers, which are commonly stored in local variables on the stack. The stack is slightly less complicated, but it can still be difficult to understand.

The Stack

The stack mostly holds local variables and the information needed to manage function calls. Values are pushed onto and popped off the top of the stack, just like the stack data structure. The CPU tracks the top of the stack with RSP, and RBP usually marks the base of the current function's part of the stack (its stack frame). On x86 the stack grows toward lower addresses: a push first decreases RSP by the size of the value (8 bytes on 64-bit x86, 4 bytes on 32-bit x86) and then writes the value to the address in RSP. A pop reads the value at RSP and then increases RSP by the same amount.
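To make the push/pop order concrete, here is a toy Python model of the stack, not real machine behavior: memory is a dictionary, RSP is a plain integer, and the starting address is made up.

```python
WORD = 8  # bytes per push/pop on x86-64


class ToyStack:
    """Toy model of a downward-growing stack; addresses are illustrative only."""

    def __init__(self, top: int) -> None:
        self.rsp = top                 # hypothetical starting stack pointer
        self.mem: dict[int, int] = {}  # address -> 8-byte value

    def push(self, value: int) -> None:
        self.rsp -= WORD               # decrease rsp first...
        self.mem[self.rsp] = value     # ...then store the value at the new top

    def pop(self) -> int:
        value = self.mem[self.rsp]     # read the value at the top
        self.rsp += WORD               # then move the top back up
        return value


if __name__ == "__main__":
    s = ToyStack(0x7fffffffe000)
    s.push(0x41)
    s.push(0x42)
    print(hex(s.rsp))                  # 0x7fffffffdff0: two pushes moved rsp down 16 bytes
    print(hex(s.pop()), hex(s.pop()))  # 0x42 0x41 (last in, first out)
    print(hex(s.rsp))                  # 0x7fffffffe000 again
```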

Function calls and stack frames

The standard calling convention for 64-bit x86 (x86-64) on Linux and most other Unix-like systems, the System V AMD64 ABI, works as follows. When a function is called, the CALL instruction pushes the return address (the address of the instruction right after the call) onto the stack, then jumps to the called function by loading its address into RIP. The first six integer or pointer arguments are passed in the registers RDI, RSI, RDX, RCX, R8, and R9, and any further arguments are pushed onto the stack. When the function returns, it places its return value in RAX, moves the stack pointer back so that RSP points at the saved return address, and executes RET, which pops that address into RIP. Compiled code follows this convention for every call so that separately compiled functions can call each other.
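The hypothetical helper below shows where the convention puts integer and pointer arguments. It ignores floating-point arguments (which go in XMM registers) and structs, so treat it as a simplification.

```python
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]  # System V AMD64 integer argument registers


def place_args(args: list[int]) -> tuple[dict[str, int], list[int]]:
    """Hypothetical helper: split integer arguments into registers and stack slots."""
    regs = dict(zip(ARG_REGS, args))    # first six arguments go in registers, in order
    stack = list(args[len(ARG_REGS):])  # the rest go on the stack; the 7th sits at the lowest address
    return regs, stack


if __name__ == "__main__":
    regs, stack = place_args([1, 2, 3, 4, 5, 6, 7, 8])
    print(regs)   # {'rdi': 1, 'rsi': 2, 'rdx': 3, 'rcx': 4, 'r8': 5, 'r9': 6}
    print(stack)  # [7, 8]
```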


Another convention, extremely common but not required, is the function prologue: the called function pushes the value of RBP onto the stack, then copies RSP into RBP, so the current top of the stack becomes the base of the new function's stack frame. RSP is then decreased further to make room for local variables. Before the function returns, RSP and RBP must be restored to their previous values; compilers often do this with the LEAVE instruction, which is equivalent to mov rsp, rbp followed by pop rbp.
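The sketch below simulates a call, the prologue, and LEAVE/RET with made-up addresses, assuming a 0x40-byte local area that is used entirely by one buffer. It also computes the distance from the start of that buffer to the saved return address, which is the number a buffer overflow exploit needs.

```python
# Toy simulation (made-up addresses) of:
#   call f          ; push return address, jump to f
#   push rbp        ; save the caller's base pointer
#   mov rbp, rsp    ; start the new stack frame
#   sub rsp, N      ; reserve N bytes for locals
#   leave           ; mov rsp, rbp ; pop rbp
#   ret             ; pop the return address into rip
WORD = 8  # bytes per push/pop on x86-64


def run_call(rsp: int, rbp: int, ret_addr: int, locals_size: int) -> dict[str, int]:
    mem: dict[int, int] = {}
    rsp -= WORD
    mem[rsp] = ret_addr                # call: push return address
    rsp -= WORD
    mem[rsp] = rbp                     # push rbp
    rbp = rsp                          # mov rbp, rsp
    rsp -= locals_size                 # sub rsp, locals_size
    frame = {
        "buffer_start": rsp,           # assumption: one buffer fills the whole local area
        "saved_rbp": rbp,              # caller's rbp is stored at [rbp]
        "return_address": rbp + WORD,  # return address is stored at [rbp + 8]
    }
    rsp = rbp                          # leave, part 1: mov rsp, rbp
    rbp = mem[rsp]                     # leave, part 2: pop rbp
    rsp += WORD
    rip = mem[rsp]                     # ret: pop the return address into rip
    rsp += WORD
    frame.update(rip=rip, rsp_after=rsp, rbp_after=rbp)
    return frame


if __name__ == "__main__":
    f = run_call(rsp=0x7fffffffe000, rbp=0x7fffffffe100, ret_addr=0x401234, locals_size=0x40)
    print(hex(f["return_address"] - f["buffer_start"]))        # 0x48: 0x40 of locals + 8 for saved rbp
    print(hex(f["rip"]), hex(f["rsp_after"]), hex(f["rbp_after"]))  # 0x401234 0x7fffffffe000 0x7fffffffe100
```

After RET, RSP and RBP are back to the caller's values and RIP holds the return address, which is exactly why overwriting that stack slot redirects execution.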

Buffer Overflow

One of the most common methods of binary exploitation is buffer overflow. If user input is larger than the space allotted for it in memory and the program does not check its length, the input can overflow into adjacent memory. Buffers are usually on the stack, and because input is written toward higher addresses while the stack grows toward lower ones, an overflow overwrites whatever lies above the buffer in memory: possibly other local variables, and then the saved RBP and the function's return address. By overwriting the return address, an attacker can make the function "return" to other code in the binary, such as a function the program never normally calls.
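A payload for this attack is usually just padding followed by the new return address. The sketch below uses Python's struct module, which produces the same bytes as pwntools' p64 in its default little-endian context. The offset and target address are hypothetical values; in a real challenge you would find them with GDB and the binary's symbol table.

```python
import struct

OFFSET_TO_RET = 0x48  # hypothetical: 64-byte buffer + 8-byte saved rbp
WIN_ADDR = 0x401196   # hypothetical address of a function we want to run


def build_payload(offset: int, target: int) -> bytes:
    padding = b"A" * offset                      # fill the buffer and the saved rbp
    return padding + struct.pack("<Q", target)   # overwrite the return address (little-endian, 8 bytes)


if __name__ == "__main__":
    payload = build_payload(OFFSET_TO_RET, WIN_ADDR)
    print(len(payload))   # 80
    print(payload[-8:])   # b'\x96\x11@\x00\x00\x00\x00\x00'
```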

Protecting against buffer overflow

Many protections against buffer overflow exist, and in simple programs the bug is usually easy to avoid. In complicated applications, however, overflow bugs can be hard to notice, so it is important to always be careful with user input.

The simplest way to avoid buffer overflow is to limit how many characters the program reads as input. Avoid the gets function and scanf("%s"), which write as much input as they receive, and use fgets instead, which takes the buffer size as an argument.
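The difference comes from how many bytes each function is allowed to write. Below is a rough Python model of fgets semantics, not the C function itself: it reads at most size - 1 bytes, stops after a newline, and always leaves room for the terminating null byte, so the result never exceeds the buffer. Any extra input stays in the stream for the next read.

```python
import io


def fgets_model(stream: io.BytesIO, size: int) -> bytes:
    """Rough model of C's fgets(buf, size, stream); returns the bytes stored in buf."""
    out = bytearray()
    while len(out) < size - 1:   # never store more than size - 1 input bytes
        ch = stream.read(1)
        if not ch:               # end of input
            break
        out += ch
        if ch == b"\n":          # fgets keeps the newline and stops
            break
    return bytes(out) + b"\x00"  # the null terminator still fits within size bytes


if __name__ == "__main__":
    data = io.BytesIO(b"A" * 100 + b"\n")
    buf = fgets_model(data, 32)  # like: char buf[32]; fgets(buf, sizeof buf, stdin);
    print(len(buf))              # 32: 31 input bytes + the null byte
```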

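Another protection against buffer overflow is a stack canary. When a function starts, the program places a random value (the canary) on the stack between the local variables and the saved RBP and return address, and keeps a reference copy elsewhere in memory. Before the function returns, it compares the two. An overflow that reaches the return address must overwrite the canary on the way, so the values no longer match and the program aborts instead of returning (glibc reports "*** stack smashing detected ***").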
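Here is a toy model of the check, loosely following what gcc's -fstack-protector inserts. The frame layout, buffer size, and names are made up for illustration, and real programs call __stack_chk_fail and abort rather than returning a string.

```python
import os

BUF_SIZE = 16    # hypothetical size of the local buffer
CANARY_SIZE = 8


def vulnerable_function(user_input: bytes) -> str:
    reference = os.urandom(CANARY_SIZE)  # reference copy (on Linux x86-64 it is read from fs:0x28)
    # Simplified frame: [ buffer | canary | saved rbp | return address ]
    frame = bytearray(BUF_SIZE) + bytearray(reference) + bytearray(16)
    n = min(len(user_input), len(frame))
    frame[:n] = user_input[:n]           # unchecked copy, like gets() or strcpy()
    canary = bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE])
    if canary != reference:              # the check inserted before returning
        return "*** stack smashing detected ***"  # a real program aborts here
    return "returned normally"


if __name__ == "__main__":
    print(vulnerable_function(b"A" * 8))   # returned normally
    print(vulnerable_function(b"A" * 40))  # *** stack smashing detected ***
```

Note that the canary only detects changes: if an attacker can leak its value first, they can write the same bytes back while overflowing.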

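Return oriented programming (ROP) reuses code that is already in the binary: by overwriting return addresses, an attacker chains together existing functions or short instruction sequences that end in RET (called gadgets). To make this harder, many modern systems build programs as position independent executables (PIE) by default. Combined with address space layout randomization (ASLR) in the operating system, this means the program is loaded at a random base address on each run, so an attacker can't know the absolute address of any code in the program without first leaking an address from the running process.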
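PIE randomizes where the binary starts, but not the distances inside it, so one leaked address is usually enough to recover the rest. A sketch with hypothetical offsets (in practice you would read them from the ELF, for example with pwntools' ELF class, which rebases its symbols when you set its address):

```python
MAIN_OFFSET = 0x1189  # hypothetical offset of main() within the binary
WIN_OFFSET = 0x1169   # hypothetical offset of the function we want to reach
PAGE_SIZE = 0x1000


def rebase(leaked_main: int) -> tuple[int, int]:
    """Turn a leaked runtime address of main() into the base and the target address."""
    base = leaked_main - MAIN_OFFSET  # runtime base address of the binary
    if base % PAGE_SIZE != 0:         # load bases are page-aligned, so this catches a wrong offset
        raise ValueError("base is not page-aligned; the offset is probably wrong")
    return base, base + WIN_OFFSET


if __name__ == "__main__":
    base, win = rebase(0x55d4a3c4f189)  # pretend this address was leaked
    print(hex(base), hex(win))          # 0x55d4a3c4e000 0x55d4a3c4f169
```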

If an attacker can obtain the address of the buffer, they could write their own machine code (shellcode) into it on the stack or heap, then return to that code and execute it. To prevent this, compilers and operating systems mark the stack and heap as non-executable (often called NX or DEP), so the operating system does not allow any code to be executed from those memory sections.
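On Linux you can check this yourself: /proc/self/maps lists a process's memory regions with their permissions, and the [stack] and [heap] entries normally show rw-p, with no x. The parser below is a small sketch for that file format; it only prints anything when the file exists.

```python
import os


def parse_maps(text: str) -> list[tuple[int, int, str, str]]:
    """Parse /proc/<pid>/maps lines: start-end perms offset dev inode [pathname]."""
    regions = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        perms = fields[1]  # e.g. "r-xp": readable, not writable, executable, private
        name = fields[5].strip() if len(fields) == 6 else ""
        regions.append((start, end, perms, name))
    return regions


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        for start, end, perms, name in parse_maps(f.read()):
            if name in ("[stack]", "[heap]"):
                print(name, perms, "executable" if "x" in perms else "not executable")
```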

Pwntools

Pwntools is a Python library designed to help with binary exploitation. It includes a variety of tools for working with ELF files, communicating with processes and servers, and analyzing memory, in addition to helpers for more specific exploits, such as ROP and format string attacks. The easiest way to get comfortable with pwntools is to explore the documentation and find the tools you need for challenges.

Tubes

The pwnlib.tubes module allows communication with processes and servers. pwnlib.tubes.remote connects to a server and can send arbitrary bytes, including non-printable characters, which is usually necessary for pwn challenges; pwnlib.tubes.process does the same for a local program. The most important functions are shown in the example below.

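```python
from pwn import *  # common pwntools convention: brings in remote, process, p64, etc.

conn = remote('challenge.tjcsec.club', 31359)  # connect to the given host and port
# to interact with a local binary instead:
# conn = process("./chall")
print(conn.recvline())  # receives bytes up to and including the next newline
payload = b'\x00\x01\x02\x03'
conn.sendline(payload)  # sends payload followed by a newline
print(conn.recv())  # receives whatever data is currently available (recvall() reads until the connection closes)
conn.close()
```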

Packing integers

The pwnlib.util.packing module converts Python integers into their low-level byte representation. It takes into account the endianness (byte order) of the architecture and the size of the integer, padding unsigned values with zero bytes. Most commonly you will use the functions p64(value) and p32(value) for 64-bit and 32-bit integers, respectively.
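For illustration, here is what p64 and p32 produce in the default little-endian context, reimplemented with Python's standard struct module. In an exploit you would just use the pwntools versions, along with u64 and u32 to unpack bytes back into integers.

```python
import struct

# Standard-library stand-ins for pwntools' packing helpers (illustration only),
# assuming the default little-endian context used on x86/x86-64.


def p64(value: int) -> bytes:
    return struct.pack("<Q", value)      # 8 bytes, little-endian, unsigned


def p32(value: int) -> bytes:
    return struct.pack("<I", value)      # 4 bytes, little-endian, unsigned


def u64(data: bytes) -> int:
    return struct.unpack("<Q", data)[0]  # inverse of p64


if __name__ == "__main__":
    print(p64(0xdeadbeef))          # b'\xef\xbe\xad\xde\x00\x00\x00\x00'
    print(p32(0x41424344))          # b'DCBA' (lowest byte first)
    print(hex(u64(p64(0x401196))))  # 0x401196
```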

For other useful functions like ELF manipulation, assembly and disassembly, and ROP see https://docs.pwntools.com/en/stable/intro.html

GDB

https://users.ece.utexas.edu/~adnan/gdb-refcard.pdf