# Reverse Engineering Intro

### What is reverse engineering?

According to ctf101.org, "Reverse Engineering in a CTF is typically the process of taking a compiled (machine code, bytecode) program and converting it back into a more human readable format." However, it can also simply mean deciphering a program well enough to make it do what we want.

### What is a binary/machine code?

When we compile a program, the compiler first translates our high-level code into assembly, a human-readable form of the small set of instructions the processor can execute. An assembler then encodes this assembly as machine code, which is stored in one of several executable file formats, such as ELF on Linux. When such a binary is executed, the operating system loads it into memory and the processor runs the instructions inside.

### Low-level architecture

CPUs contain small storage locations called registers, which hold the data used in intermediate operations and are much faster to read from and write to than main memory.

![image](https://hackmd.io/_uploads/rykpV38agl.png)

Most lines of assembly operate on registers; for example, in Intel syntax, `mov eax, 4` moves the value 4 into the register `eax`. A good guide to x86 assembly exists [here](https://flint.cs.yale.edu/cs421/papers/x86-asm/asm.html).

### GDB

GDB is a debugger that is widely used for debugging ordinary compiled programs, and also for reverse engineering. Some people have built extensions for GDB, such as [gef](https://github.com/hugsy/gef) and [pwndbg](https://github.com/pwndbg/pwndbg), which add features aimed specifically at pwn/rev.

To use it, install GDB according to your operating system (Windows users will most likely need WSL or some other Linux environment). To debug a program, run `gdb chall`, where `chall` is the name of the binary. You may need to run `chmod +x chall` first so that the binary is executable.

For those who are unfamiliar with debugging, it simply means running a program while being able to stop it at any time and inspect the state of the program and its memory.

You can pause a program by setting breakpoints at specific lines of code or instructions. When a debugger like GDB hits a breakpoint, it stops and lets you run commands until you tell the program to continue. This makes bugs much easier to spot, because you can watch the program working in real time rather than only analyzing the static code.

GDB has many commands and can be used in a wide variety of ways to analyze a program. Put simply, GDB has a steep learning curve and is quite hard to get used to at first. However, like vim, you only need a small set of commands to get started:

- `b` for setting breakpoints
  - ex: `b *main` sets a breakpoint at the first instruction of `main`.
- `run` to start running the program
- `c` to continue
- `si` to step one assembly instruction forward (`s` steps one line of source code instead, which requires debug info)
- `disassemble` to get the assembly for a function
  - ex: `disassemble main`
- `print var` to print a variable named `var` (requires debug info)
- `x/wx [memory address]` to view one word (4 bytes) in hex at that memory address
- `x/40wx [memory address]` to view 40 words

### Other tools

Other tools like Ghidra and Binary Ninja also exist, which give a higher-level analysis of binaries. They take up more space on your computer, but have more features. Their decompilers can translate a binary back into C-like pseudocode, albeit with many inconsistencies. It is impossible to translate a binary exactly back into the language it was written in, because compilation discards information such as variable names, types, and comments, and optimizations often restructure the code.

### Conclusion

In my opinion, the best way to learn rev is to just do a lot of rev challenges (and get good at using a debugger). Rev is, quite literally, backwards thinking, and compared with other CTF categories, it is more ad hoc. There are many resources online, and most CTFs have a rev category. Have fun with challenges!