# angr

## What is angr?

![angry shadow](https://static.wikia.nocookie.net/6faf6ef9-6ee5-473f-96b6-0cf7b1a27d12)

It's a Python library for analyzing executable files, including the ability to explore *every* possible control flow path rather than having to choose one! It has elements of both ghidra and gdb. From the project's own description:

> "angr is an open-source binary analysis platform for Python. It combines both static and dynamic symbolic ('concolic') analysis, providing tools to solve a variety of tasks."

### Installing

The library can be installed with `pip`. You might also want to install `monkeyhex`, which prints all numbers as hex in the interpreter:

`pip install angr monkeyhex`

### Justification

The tools we've looked at so far are each useful in certain scenarios. ghidra is helpful for determining exactly how a binary works and for finding hidden functionality without running it (a process called static analysis). gdb is more useful for observing how a binary behaves while it runs, including specific memory offsets and function calls (which is called dynamic analysis). As their blurb suggests, angr lets you do both at once, although to a lesser extent than each tool on its own, and with some extremely helpful, unique functionality.

angr lets us perform **symbolic execution**, which means that we can run code without specifying real values for variables. Instead, angr tracks the conditions each path places on those variables, letting us take various paths through the code and determine which one we want to follow.

### Control Flow

All code has a **control flow**: the order in which its instructions run, including every point where the code can **branch** depending on some condition. Branching is when the code can pick one of several paths (e.g. depending on whether the condition in an if/for/while statement is true), and the control flow is the overall path the code follows when it runs. It's a lot like a flowchart!
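
To make "exploring every path" concrete, here is a toy sketch in plain Python (no angr involved). The two branch conditions and the input range are made up for illustration, and the brute-force search stands in for the constraint solver that a real symbolic execution engine would use. It lists every combination of branch outcomes and finds one input that drives the code down each path.

```python
from itertools import product

# Hypothetical branch points of a toy "program": (description, predicate on the input x).
BRANCHES = [
    ("x > 10", lambda x: x > 10),
    ("x % 2 == 0", lambda x: x % 2 == 0),
]

def enumerate_paths(branches, domain):
    """Map every combination of branch outcomes to one input that triggers it (or None)."""
    paths = {}
    for decisions in product([True, False], repeat=len(branches)):
        # A path is feasible if some input makes every branch go the chosen way.
        witness = next(
            (x for x in domain
             if all(pred(x) == taken for (_, pred), taken in zip(branches, decisions))),
            None,
        )
        paths[decisions] = witness
    return paths

paths = enumerate_paths(BRANCHES, range(32))
for decisions, witness in paths.items():
    print(decisions, "->", witness)
```

Each key is one path through the flowchart (which way each branch went), and each value is a concrete input that follows it. Brute force only works here because the input space is tiny.
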
In assembly, which is a human-readable equivalent of the instructions a computer runs, the control flow is directed by **conditional jumps**, where the processor jumps somewhere else in the code only if some condition is met.

![Diagram showcasing control flow](https://hackmd.io/_uploads/SkBO2fuc6.png)

## Using angr

Note: everything below is run in the Python interpreter, which you can start by running `python3` without any arguments.

```python
>>> import angr
>>> import monkeyhex
>>> proj = angr.Project('/path/to/whateverbinary') # replace this with the file you want to open
>>> # the project has various pieces of information about the binary
>>> proj.arch # the architecture the binary is meant to run on
<Arch AMD64 (LE)>
>>> proj.entry # the address where instruction execution will start when the binary runs
0x401670
>>> proj.loader # the loader object that mapped the binary into memory
<Loaded true, maps [0x400000:0x5004000]>
```

Now that we've loaded the binary, we can use the various methods in `proj.factory.*` to create objects that we can test. For example, we can get a **basic block** using the `proj.factory.block(address)` method. A basic block is a sequence of instructions with only one entrance and one exit: the instructions run in sequence with no option of jumping elsewhere. These are important because they are what angr operates on. It breaks all of the code into basic blocks, then follows these blocks until the control flow changes. Then, it decides how to follow the available branches, something that I'll discuss later. Operating on these basic blocks directly won't be important for us, but it's good to know how they work.
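
As a rough illustration of what "breaking code into basic blocks" means, here is a toy sketch in plain Python. The instruction list, addresses, and mnemonic set are invented for the example, and this is the textbook "leaders" rule rather than angr's actual lifting code. A new block starts at the first instruction, at every jump target, and right after every jump.

```python
# Toy instruction list: (address, mnemonic, jump_target_or_None). All values are made up.
PROGRAM = [
    (0, "cmp", None),
    (1, "jne", 4),     # conditional jump to address 4
    (2, "mov", None),
    (3, "jmp", 5),     # unconditional jump to address 5
    (4, "mov", None),
    (5, "ret", None),
]

# Mnemonics that end a basic block in this toy model.
JUMPS = {"jne", "jmp", "ret"}

def split_basic_blocks(program):
    """Group instruction addresses into basic blocks using the 'leaders' rule."""
    leaders = {program[0][0]}                     # the first instruction starts a block
    for i, (addr, mnemonic, target) in enumerate(program):
        if target is not None:
            leaders.add(target)                   # a jump target starts a block
        if mnemonic in JUMPS and i + 1 < len(program):
            leaders.add(program[i + 1][0])        # so does the instruction after a jump
    blocks, current = [], []
    for addr, _, _ in program:
        if addr in leaders and current:
            blocks.append(current)                # close the previous block at each leader
            current = []
        current.append(addr)
    blocks.append(current)
    return blocks

blocks = split_basic_blocks(PROGRAM)
print(blocks)
```

Every block in the result can only be entered at its first address and left at its last one, which is the "one entrance, one exit" property described above.
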
```python
# getting and examining a basic block
>>> block = proj.factory.block(proj.entry) # lift a block of code from the program's entry point
>>> block
<Block for 0x401670, 42 bytes>
>>> block.pp() # pretty-print a disassembly to stdout
0x401670: xor ebp, ebp
0x401672: mov r9, rdx
0x401675: pop rsi
0x401676: mov rdx, rsp
0x401679: and rsp, 0xfffffffffffffff0
0x40167d: push rax
0x40167e: push rsp
0x40167f: lea r8, [rip + 0x2e2a]
0x401686: lea rcx, [rip + 0x2db3]
0x40168d: lea rdi, [rip - 0xd4]
0x401694: call qword ptr [rip + 0x205866]
>>> block.instructions # how many instructions are there?
0xb
>>> block.instruction_addrs # what are the addresses of the instructions?
[0x401670, 0x401672, 0x401675, 0x401676, 0x401679, 0x40167d, 0x40167e, 0x40167f, 0x401686, 0x40168d, 0x401694]
```

angr also operates using **states**, reusable representations of the binary's current position in the code, variable values, etc. We need at least one state to run the program at all, which we can obtain like so:

```python
# this code uses a binary that asks for a password and prints a flag when it is correct
# in this case, the intended password is SOSNEAKY, but there would be no convenient way of knowing this
# instead, we will step through the code until we get a result we want
# the binary is available at https://github.com/angr/angr-examples
>>> proj = angr.Project('examples/fauxware/fauxware')
>>> state = proj.factory.entry_state()
```

Then, we can create a **simulation manager** (an object that allows us to step through the program however we want):

```python
>>> state = proj.factory.entry_state(stdin=angr.SimFile) # create an entry state with the contents of stdin as a variable
>>> simgr = proj.factory.simulation_manager(state)
```

If we only care about the end state of the program, we can use `simgr.run()` to run along all branches until the code terminates:

```python
>>> simgr.run()
>>> simgr
<SimulationManager with 3 deadended>
```

Then we can move these states between
**stashes**, the groups that states are organized into. For example, we can filter the states to only include the ones we want:

```python
>>> simgr.move(from_stash='deadended', to_stash='authenticated', filter_func=lambda s: b'Welcome' in s.posix.dumps(1))
# if you haven't seen a lambda function before, it's like having an inline `def fn(a,b):` that doesn't have a name.
# see https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions
```

We can also step through the code with finer control, running until we reach a branch instead of until the end. When this happens, angr creates several new states, one for each way the branch could go. We can then use those states to determine which inputs produce which outputs:

```python
>>> simgr = proj.factory.simulation_manager(state) # start from a fresh manager, since run() emptied the active stash
>>> print(simgr.active) # currently, there's only one state since we haven't had to branch
[<SimState @ 0x400580>]
>>> while True:
...     succ = state.step()
...     if len(succ.successors) == 2:
...         break
...     state = succ.successors[0]
>>> state1, state2 = succ.successors
>>> state1
<SimState @ 0x400629>
>>> state2
<SimState @ 0x400699>
# now, we load the symbolic contents of stdin, using the size recorded in state1, which we know is the state we want
>>> input_data = state1.posix.stdin.load(0, state1.posix.stdin.size)
>>> state1.solver.eval(input_data, cast_to=bytes) # input if we got the flag
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00SOSNEAKY\x00\x00\x00'
>>> state2.solver.eval(input_data, cast_to=bytes) # input if we didn't
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00S\x00\x80N\x00\x00 \x00\x00\x00\x00'
```

If you ***hate fun*** (or just don't really care about how you get the result you want), then you can use `simgr.explore()` to carry out this process automatically, storing the results in the `found` stash.
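
To picture how stepping, stashes, and an automatic search fit together, here is a toy model in plain Python. It is only a sketch under heavy simplifications and not how angr is implemented. A "state" is just the tuple of branch choices made so far, the `step` and `explore` helpers and the depth of 3 are hypothetical, and this version keeps going until no active states remain.

```python
# Toy model only: stepping a state forks it into one successor per possible branch choice.
def step(state, depth=3):
    if len(state) == depth:
        return []                        # no successors: this path has ended
    return [state + (0,), state + (1,)]

def explore(start, find, avoid):
    """Step every active state until none remain, sorting states into named stashes."""
    stashes = {"active": [start], "found": [], "avoid": [], "deadended": []}
    while stashes["active"]:
        state = stashes["active"].pop()
        if find(state):
            stashes["found"].append(state)        # a state we were looking for
        elif avoid(state):
            stashes["avoid"].append(state)        # pruned: never stepped again
        else:
            successors = step(state)
            if successors:
                stashes["active"].extend(successors)
            else:
                stashes["deadended"].append(state)
    return stashes

stashes = explore(
    start=(),
    find=lambda s: s == (1, 0, 1),       # the one path we are looking for
    avoid=lambda s: s[:1] == (0,),       # prune everything that starts with choice 0
)
print({name: len(states) for name, states in stashes.items()})
```

The useful part is the pruning: the state that starts with choice 0 lands in the `avoid` stash immediately, so none of the paths beneath it are ever explored.
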
You can tell it which states to look for and which to abandon by passing functions as the `find=fn` and `avoid=fn` keyword arguments:

```python
# assumes proj is a binary that prints "Congrats" on success and "Go away" on failure
>>> simgr = proj.factory.simgr() # create a default manager which automatically uses the entry state
>>> simgr.explore(find=lambda s: b"Congrats" in s.posix.dumps(1), avoid=lambda s: b"Go away" in s.posix.dumps(1))
>>> state_solved = simgr.found[0]
>>> print(state_solved.posix.dumps(1).decode()) # stdout, file descriptor 1
Enter password: Congrats!
>>> print(state_solved.posix.dumps(0)) # stdin, file descriptor 0, which lets us find the input we'll need to use to get the same result remotely
```

## Conclusion

These notes only scratch the surface of angr; it has many more features that I didn't have time to explain properly here. Go take a look at the documentation! It's very straightforward, and covers these topics and more in detail.

Also, angr doesn't exist in a vacuum, and it is probably best used in conjunction with other tools. For example, you could decompile a binary with ghidra, find branches or functions of interest, debug them in gdb, and then use angr to get the result you want!

The library also has a GUI called angr-management, which is still in alpha. I've never tried it, but if you're interested, the link is below.

## Links

- [angr docs](https://docs.angr.io/en/latest/getting-started/index.html)
- [examples of CTF challenges solved using angr](https://docs.angr.io/en/latest/appendix/more-examples.html)
- [angr example code, used in docs](https://github.com/angr/angr-examples/)
- [angr-management gui](https://github.com/angr/angr-management)