---
title: Linux Kernel pwn
tags: security
lang: zh_tw
---

# Linux Kernel pwn

## Inspection

### Getting symbol addresses

Useful for setting breakpoints and inspecting state.

```
# Read printk output
cat /proc/sys/kernel/dmesg_restrict # check whether you are allowed to read printk output
dmesg
# Get kernel module addresses
cat /proc/modules
lsmod
# Get all symbol addresses
cat /proc/sys/kernel/kptr_restrict # check whether you are allowed to read kallsyms
cat /proc/kallsyms
```

### Extracting vmlinux from bzImage

Once you have vmlinux, you can run ROPgadget on it to collect gadgets.

The [extract-vmlinux](https://github.com/torvalds/linux/blob/master/scripts/extract-vmlinux) script that ships with the Linux source tree does the job.

```
./extract-vmlinux bzImage > vmlinux
```

### Getting gadgets

* Run [ROPgadget](https://github.com/JonathanSalwan/ROPgadget) on vmlinux; remember to set a range so that only gadgets in executable memory are returned
* ropper takes longer to run but is more accurate
    * `ropper --nocolor --file ./vmlinux > ropper_gadget`

### Getting the kASLR offset

If kASLR is enabled, gadget addresses also have to be rebased.

```
# Get the actual runtime address of _text
# cat /proc/kallsyms | grep "T _text"
ffffffff88600000 T _text

# Get the .text base address in vmlinux
# readelf -a vmlinux | grep " .text"
  [ 1] .text             PROGBITS         ffffffff81000000  00200000
   00     .text .notes __ex_table .rodata .pci_fixup __ksymtab __ksymtab_gpl __ksymtab_strings __param __modver

# The difference between the two is the offset
# Suppose a gadget is at 0xffffffff811bc873 in vmlinux
# Its actual address is
# gadget - vmlinux.text.base + real.text.base
# 0xffffffff811bc873 - 0xffffffff81000000 + 0xffffffff88600000
# = 0xffffffff887bc873
```

### Getting memory info

Inspect slab/slub allocation info.

```
sudo cat /proc/meminfo
slabtop
sudo cat /proc/slabinfo
```

## Mitigations

### SMEP

* The kernel cannot execute user-space code
* Controlled by a bit in cr4 (bit 20)

### SMAP

* The kernel cannot read or write user-space memory
* Controlled by a bit in cr4 (bit 21)

### KPTI

* From an exploitation point of view, the effect is roughly similar to SMEP + SMAP
* Introduced to block the Meltdown side-channel attack
* cr3 has to be fixed up when returning to user mode
    * Going through `swapgs_restore_regs_and_return_to_usermode` fixes cr3 for you

## Commonly used structures

In the list below, the numbers in parentheses are object sizes; `base` and `heap` appear to mark which addresses the structure can leak (kernel base, heap address), following the reference at the end of this section.

* `tty_struct` (0x2e0, 0x2c0) base, heap
    * [Definition](https://elixir.bootlin.com/linux/latest/source/include/linux/tty.h#L285)
    * Allocate
        * `int pfd = open("/dev/ptmx", O_RDWR|O_NOCTTY);`
    * Release
        * `close(pfd)`
    * Write
        * `write(pfd, garbage, sizeof(garbage))`
    * Control flow
        * Control the
`tty_operations` vtable, which may allow hijacking control flow
        * Relevant offsets
            * offset 0: must hold the magic value
              ```c
              /* tty magic number */
              #define TTY_MAGIC		0x5401
              ```
            * offset 0x18: pointer to the `tty_operations` structure; offsets inside that structure:
                * offset 0x38 (56 in decimal): function pointer for `write`
* `shm_file_data` (0x20) base, heap
* `seq_operations` (0x20) base
* `msg_msg` (0x30 ~ 0x1000) heap
    * `msgget()`
    * `msgsnd(qid, &msgbuf, real_size - 0x30, 0)`
        * Calls `kmalloc(size + 0x30)`
        * Copies the contents of msgbuf to chunk + 0x30; the first 0x30 bytes are the header
    * `msgrcv(qid, &msgbuf, real_size - 0x30, 1, 0)`
        * `kfree()`
    * https://duasynt.com/blog/linux-kernel-heap-spray
* `subprocess_info` (0x60) base, heap
* References
    * [Kernel Exploitで使える構造体集](https://ptr-yudai.hatenablog.com/entry/2020/03/16/165628)

## Common Vulns

### Double fetch

* A race condition between kernel space and user space
* The kernel reads the same user-space data twice, which leaves a window between the two reads in which the data can be changed (the first read returns A, the second returns B)

## Exploit techniques

### Privilege escalation

* Reach `commit_creds(prepare_kernel_cred(0))` to escalate privileges; gadgets needed:
    * pop rdi
    * mov rdi, rax
* Return to user mode
    * swapgs
    * iretq
* Then execute /bin/sh
* Lay out the frame for iretq in this order
    * rip
    * user_cs
    * user_eflags
    * user_sp
    * user_ss
* The following function can be used to save the user-mode registers

```c
unsigned long user_cs, user_ss, user_eflags, user_sp;

/* Save user-mode cs, ss, rsp and rflags for the iretq frame built later. */
void save_stats() {
    asm(
        "movq %%cs, %0\n"
        "movq %%ss, %1\n"
        "movq %%rsp, %3\n"
        "pushfq\n"
        "popq %2\n"
        : "=r"(user_cs), "=r"(user_ss), "=r"(user_eflags), "=r"(user_sp)
        :
        : "memory"
    );
}
```

### modprobe_path

* A kernel global variable
    * `/proc/sys/kernel/modprobe`
* When a file in an executable format the kernel does not recognize is executed, the kernel runs the program named by the `modprobe_path` string
* Call chain when tracing the code
    * sys_execve
    * do_execve
    * do_execveat_common
    * bprm_execve
    * exec_binprm
    * **search_binary_handler**
    * request_module
    * call_modprobe
    * call_usermodehelper_exec
* Exploitation
    * Overwrite modprobe_path with the path of your own malicious program
    * Execute something that is not a valid executable, so the kernel does not recognize it
    * The kernel then runs your malicious program :)
* Exploitation 2
    * Continuing to run after the overwrite may cause a panic
    * That is fine: after the overwrite, put the program into an infinite loop and let it run in the background
    * Nothing blows up, and `modprobe_path` stays modified
* Trivia
    * Shebang

### setxattr & userfaultfd

* `userfaultfd`
    * Registers a page fault handler for a region of memory
    * `copy_from_user` and `copy_to_user` also trigger page faults
    * Used to create race conditions
* `setxattr`
    ```c
    static long
    setxattr(struct dentry *d, const char __user *name, const void __user *value,
         size_t size, int flags)
    {
        int error;
        void *kvalue = NULL;
        char kname[XATTR_NAME_MAX + 1];

        if (flags & ~(XATTR_CREATE|XATTR_REPLACE))
            return -EINVAL;

        error = strncpy_from_user(kname, name, sizeof(kname));
        if (error == 0 || error == sizeof(kname))
            error = -ERANGE;
        if (error < 0)
            return error;

        if (size) {
            if (size > XATTR_SIZE_MAX)
                return -E2BIG;
            kvalue = kvmalloc(size, GFP_KERNEL);
            if (!kvalue)
                return -ENOMEM;
            if (copy_from_user(kvalue, value, size)) {
                error = -EFAULT;
                goto out;
            }
            if ((strcmp(kname, XATTR_NAME_POSIX_ACL_ACCESS) == 0) ||
                (strcmp(kname, XATTR_NAME_POSIX_ACL_DEFAULT) == 0))
                posix_acl_fix_xattr_from_user(kvalue, size);
            else if (strcmp(kname, XATTR_NAME_CAPS) == 0) {
                error = cap_convert_nscap(d, &kvalue, size);
                if (error < 0)
                    goto out;
                size = error;
            }
        }

        error = vfs_setxattr(d, kname, kvalue, size, flags);
    out:
        kvfree(kvalue);

        return error;
    }
    ```
    * Note the `kvmalloc`, `copy_from_user` and `kvfree` calls
        * Lets you trigger `kvmalloc` at a time of your choosing (size 1 ~ 0x10000)
        * Copies data from user land into the chunk (`copy_from_user`)
        * Combined with `userfaultfd`, the timing of the final `kvfree` (which ends up in `kfree` for kmalloc-backed chunks) can be controlled at will

### mmap

* `mmap` can be used to allocate memory
* Combined with a gadget such as `mov esp, eax`, the stack can be migrated into the `mmap`ed region; when execution can only be hijacked once, this turns the problem into ROP
* Example
    ```c
    // Allocate rop_chain address space
    // (ULL is assumed to be a typedef for unsigned long long; addr should be
    // page-aligned, and fd is -1 because the mapping is anonymous)
    ULL *rop_chain = mmap((void *)addr, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    ```

### signal

* Back in user mode but the process crashes?
    * `signal(SIGSEGV, get_shell);` registers the shell-spawning function as the SIGSEGV handler, so the crash itself gives you the shell

## Other notes

### current gs:off_14D00

In the kernel, `current` refers to the task that is currently running.

Look at the following code:

```c
/*
 * This routine handles page faults.  It determines the address,
 * and the problem, and then passes it off to one of the appropriate
 * routines.
 */
static noinline void
__do_page_fault(struct pt_regs *regs, unsigned long error_code,
		unsigned long address)
{
	struct vm_area_struct *vma;
	struct task_struct *tsk;
	struct mm_struct *mm;
	vm_fault_t fault, major = 0;
	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
	u32 pkey;

	tsk = current;
	mm = tsk->mm;

	prefetchw(&mm->mmap_sem);
```

([v4.19.98| __do_page_fault#1206](https://elixir.bootlin.com/linux/v4.19.98/source/arch/x86/mm/fault.c#L1206))

The corresponding x64 assembly is:

```asm
.text:FFFFFFFF81043110                 push    rbp
.text:FFFFFFFF81043111                 mov     rax, gs:off_14D00
.text:FFFFFFFF8104311A                 mov     rbp, rsp
.text:FFFFFFFF8104311D                 push    r15
.text:FFFFFFFF8104311F                 push    r14
.text:FFFFFFFF81043121                 push    r13
.text:FFFFFFFF81043123                 mov     r13, rsi
.text:FFFFFFFF81043126                 push    r12
.text:FFFFFFFF81043128                 mov     r12, rdi
.text:FFFFFFFF8104312B                 push    rbx
.text:FFFFFFFF8104312C                 mov     rbx, rdx
.text:FFFFFFFF8104312F                 sub     rsp, 38h
.text:FFFFFFFF81043133                 mov     r14, [rax+2A8h]
.text:FFFFFFFF8104313A                 mov     [rbp+var_40], rax
.text:FFFFFFFF8104313E                 lea     rax, [r14+60h]
.text:FFFFFFFF81043142                 mov     [rbp+var_38], rax
.text:FFFFFFFF81043146                 prefetcht0 byte ptr [r14+60h]
```

The value loaded from `gs:off_14D00` into `rax` is what gets dereferenced at `[rax+2A8h]` to obtain `mm` (`tsk->mm`), and the prefetch on `[r14+60h]` matches `prefetchw(&mm->mmap_sem)`.

**So `gs:off_14D00` holds `current`.**

### gdb debug

```
add-symbol-file path/to/.kofile
```

* If the module ships with symbols, they can be loaded like this

```
add-symbol-file driver.ko <driver_base_addr> [-s <section1_name> <section1_addr>] ...
```

### Source code tracing

copy_from_user -> ... -> copy_user_generic

### cpio

```
mkdir rootfs
cd rootfs
gunzip ../rootfs.cpio.gz
sudo cpio -idm < /path/to/rootfs.cpio
sudo chown root:root -R rootfs
sudo find . -print | sudo cpio -o -Hnewc > ../my.cpio
sudo find . -print | sudo cpio -o -Hnewc | gzip -9 > ../my.cpio.gz
```

* [decompress_cpio.sh](https://gist.github.com/brant-ruan/784808bc192fff533d8be22932c4e2b6)

```bash
#!/bin/bash

# Decompress a .cpio.gz packed file system
rm -rf ./initramfs && mkdir initramfs
pushd . && pushd initramfs
cp ../initramfs.cpio.gz .
gzip -dc initramfs.cpio.gz | cpio -idm &>/dev/null && rm initramfs.cpio.gz
popd
```

* [compile_exp_and_compress_cpio.sh](https://gist.github.com/brant-ruan/b67dc2fbae150e7bc76fda914816f534)

```bash
#!/bin/bash

# Compress initramfs with the included statically linked exploit
in=$1
out=$(echo $in | awk '{ print substr( $0, 1, length($0)-2 ) }')
gcc $in -static -o $out || exit 255
mv $out initramfs
pushd . && pushd initramfs
find . -print0 | cpio --null --format=newc -o 2>/dev/null | gzip -9 > ../initramfs.cpio.gz
popd
```

`./compile_exp_and_compress_cpio.sh exploit.c`

### Kernel Memory Layout

* https://elixir.bootlin.com/linux/v4.9.249/source/Documentation/x86/x86_64/mm.txt

```
Virtual memory map with 4 level page tables:

0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
```
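
The map above places the kernel text mapping at ffffffff80000000 - ffffffff9fffffff, which is the range the kASLR rebasing arithmetic from the Inspection section has to land in. A minimal sketch of that arithmetic follows; the helper names `kaslr_slide`, `rebase_gadget` and `in_kernel_text` are hypothetical (not from any library), and the range constants assume the v4.9 layout quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of the "kernel text mapping" row in the v4.9 memory map (assumption:
 * the target kernel uses this layout). */
#define KERNEL_TEXT_START 0xffffffff80000000ULL
#define KERNEL_TEXT_END   0xffffffff9fffffffULL

/* Returns 1 if addr falls inside the kernel text mapping, 0 otherwise. */
static inline int in_kernel_text(uint64_t addr)
{
    return addr >= KERNEL_TEXT_START && addr <= KERNEL_TEXT_END;
}

/* kASLR slide: runtime _text (from /proc/kallsyms) minus the .text base
 * recorded in vmlinux (from readelf). */
static inline uint64_t kaslr_slide(uint64_t runtime_text, uint64_t vmlinux_text)
{
    return runtime_text - vmlinux_text;
}

/* Rebase an address found in vmlinux (for example by ROPgadget) to its
 * runtime address: gadget - vmlinux.text.base + runtime _text. */
static inline uint64_t rebase_gadget(uint64_t gadget, uint64_t runtime_text,
                                     uint64_t vmlinux_text)
{
    /* Sanity check: a leaked _text outside the text mapping is likely wrong. */
    assert(in_kernel_text(runtime_text));
    return gadget + kaslr_slide(runtime_text, vmlinux_text);
}
```

With the example values from the kASLR section, `rebase_gadget(0xffffffff811bc873, 0xffffffff88600000, 0xffffffff81000000)` gives `0xffffffff887bc873`.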