
simplefs kernel panic bug

Source code repo: simplefs

ls causes a kernel panic

After loading the kernel module, running "ls" to list a directory triggers the following kernel panic:

[ 2679.036002] usercopy: Kernel memory exposure attempt detected from SLUB object 'simplefs_cache' (offset 0, size 5)!
[ 2679.038088] ------------[ cut here ]------------
[ 2679.038091] kernel BUG at mm/usercopy.c:102!
[ 2679.039056] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 2679.040065] CPU: 0 PID: 167958 Comm: ls Tainted: G    B   W  OE      6.2.0-39-generic #40-Ubuntu
[ 2679.041651] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2679.043180] RIP: 0010:usercopy_abort+0x6c/0x80
[ 2679.044119] Code: 58 b7 51 48 c7 c2 e7 2e 5d b7 41 52 48 c7 c7 30 97 59 b7 48 0f 45 d6 48 c7 c6 56 ac 57 b7 48 89 c1 49 0f 45 f3 e8 34 80 d1 ff <0f> 0b 49 c7 c1 20 1b 5c b7 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
[ 2679.048676] RSP: 0018:ffffb4a400e8fce8 EFLAGS: 00010246
[ 2679.049847] RAX: 0000000000000067 RBX: 0000000000000000 RCX: 0000000000000000
[ 2679.051319] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2679.052860] RBP: ffffb4a400e8fd00 R08: 0000000000000000 R09: 0000000000000000
[ 2679.054378] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[ 2679.055847] R13: ffff9d59851ec600 R14: 0000000000000001 R15: ffff9d5987099310
[ 2679.057173] FS:  00007f312f146800(0000) GS:ffff9d59be600000(0000) knlGS:0000000000000000
[ 2679.058554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2679.059634] CR2: 00005602991e2438 CR3: 0000000007094002 CR4: 0000000000370ef0
[ 2679.060871] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2679.062135] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2679.063375] Call Trace:
[ 2679.064178]  <TASK>
[ 2679.064762]  ? show_regs+0x6d/0x80
[ 2679.065773]  ? die+0x37/0xa0
[ 2679.066630]  ? do_trap+0xd4/0xf0
[ 2679.067410]  ? do_error_trap+0x71/0xb0
[ 2679.068150]  ? usercopy_abort+0x6c/0x80
[ 2679.068893]  ? exc_invalid_op+0x52/0x80
[ 2679.069649]  ? usercopy_abort+0x6c/0x80
[ 2679.070419]  ? asm_exc_invalid_op+0x1b/0x20
[ 2679.071225]  ? usercopy_abort+0x6c/0x80
[ 2679.071974]  ? usercopy_abort+0x6c/0x80
[ 2679.072736]  __check_heap_object+0xe3/0x120
[ 2679.073578]  check_heap_object+0x185/0x1d0
[ 2679.074383]  __check_object_size.part.0+0x72/0x150
[ 2679.075277]  __check_object_size+0x23/0x30
[ 2679.076156]  readlink_copy+0x4c/0x80
[ 2679.076890]  vfs_readlink+0x66/0x130
[ 2679.077604]  do_readlinkat+0x117/0x140
[ 2679.078360]  __x64_sys_readlink+0x1e/0x30
[ 2679.079131]  do_syscall_64+0x58/0x90
[ 2679.079872]  ? do_syscall_64+0x67/0x90
[ 2679.082472]  ? exit_to_user_mode_loop+0xe0/0x130
[ 2679.083845]  ? exit_to_user_mode_prepare+0x30/0xb0
[ 2679.085051]  ? syscall_exit_to_user_mode+0x37/0x60
[ 2679.086478]  ? do_syscall_64+0x67/0x90
[ 2679.087469]  ? irqentry_exit+0x43/0x50
[ 2679.088402]  ? exc_page_fault+0x91/0x1b0
[ 2679.089401]  entry_SYSCALL_64_after_hwframe+0x73/0xdd
[ 2679.090492] RIP: 0033:0x7f312ef0d6fb
[ 2679.091373] Code: 73 01 c3 48 8b 0d 1d 87 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 59 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 86 0e 00 f7 d8
[ 2679.094802] RSP: 002b:00007fff9b2fa3e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000059
[ 2679.096526] RAX: ffffffffffffffda RBX: 00005602991d1df0 RCX: 00007f312ef0d6fb
[ 2679.098462] RDX: 0000000000000006 RSI: 00005602991df9f0 RDI: 00007fff9b2fa3f0
[ 2679.100395] RBP: 00007fff9b2fa8a0 R08: 00007f312eff6ce0 R09: 0000000000000040
[ 2679.101850] R10: 0000000000000000 R11: 0000000000000206 R12: 00007fff9b2fa3f0
[ 2679.103302] R13: 00005602991d702b R14: 0000000000000006 R15: 00005602991df9f0
[ 2679.104912]  </TASK>
[ 2679.106150] Modules linked in: simplefs(OE) isofs kvm_intel ppdev binfmt_misc kvm nls_iso8859_1 irqbypass parport_pc parport input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd psmouse cryptd virtio_net net_failover failover floppy [last unloaded: simplefs(OE)]
[ 2679.116560] ---[ end trace 0000000000000000 ]---
[ 2679.130149] RIP: 0010:usercopy_abort+0x6c/0x80
[ 2679.131628] Code: 58 b7 51 48 c7 c2 e7 2e 5d b7 41 52 48 c7 c7 30 97 59 b7 48 0f 45 d6 48 c7 c6 56 ac 57 b7 48 89 c1 49 0f 45 f3 e8 34 80 d1 ff <0f> 0b 49 c7 c1 20 1b 5c b7 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
[ 2679.136616] RSP: 0018:ffffb4a400e8fce8 EFLAGS: 00010246
[ 2679.138266] RAX: 0000000000000067 RBX: 0000000000000000 RCX: 0000000000000000
[ 2679.140115] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2679.141744] RBP: ffffb4a400e8fd00 R08: 0000000000000000 R09: 0000000000000000
[ 2679.143263] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[ 2679.144736] R13: ffff9d59851ec600 R14: 0000000000000001 R15: ffff9d5987099310
[ 2679.147121] FS:  00007f312f146800(0000) GS:ffff9d59be600000(0000) knlGS:0000000000000000
[ 2679.149410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2679.151342] CR2: 00005602991e2438 CR3: 0000000007094002 CR4: 0000000000370ef0
[ 2679.152949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2679.154446] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

The call trace shows that the failure occurs inside the readlink system call (__x64_sys_readlink), which is defined as:

SYSCALL_DEFINE3(readlink, const char __user *, path, char __user *, buf, int, bufsiz)

It calls do_readlinkat -> vfs_readlink -> readlink_copy, and the copy to user space in readlink_copy is where the usercopy check aborts.

How to fix

There are two ways to fix it:

  1. Implement the readlink hook in inode_operations, so that vfs_readlink calls it instead of falling through to readlink_copy.
  2. Fix the copy to user space:
    1. In the readlink hook, allocate temporary kernel memory, copy the link target into it, and copy from there to user space, still using the kernel's copy_to_user function.
      static int simplefs_vfs_readlink(struct dentry *dentry,
                                      char __user *buffer,
                                      int buflen)
      {
          struct inode *inode = d_inode(dentry);
          /* Read the pointer once so strlen() and memcpy() see the same string. */
          const char *target = READ_ONCE(inode->i_link);
          char *link;
          int len = strlen(target);

          /* Bounce buffer from kmalloc, which the usercopy check permits. */
          link = kzalloc(len + 1, GFP_KERNEL);
          if (!link)
              return -ENOMEM;
          memcpy(link, target, len);
          /* Truncate to the size of the caller's buffer. */
          if (len > (unsigned) buflen)
              len = buflen;
          if (copy_to_user(buffer, link, len))
              len = -EFAULT;
          kfree(link);
          return len;
      }
      
    2. Fix the cache: create the inode cache with a usercopy whitelist.
       int simplefs_init_inode_cache(void)
       {
      -    simplefs_inode_cache = kmem_cache_create(
      -        "simplefs_cache", sizeof(struct simplefs_inode_info), 0, 0, NULL);
      +    simplefs_inode_cache = kmem_cache_create_usercopy(
      +        "simplefs_cache", sizeof(struct simplefs_inode_info), 0, 0,
      +        0,
      +        sizeof(struct simplefs_inode_info),
      +        NULL);
           if (!simplefs_inode_cache)
               return -ENOMEM;
           return 0;
       }

      According to the kernel documentation, a cache must be set up with kmem_cache_create() or kmem_cache_create_usercopy() before it can be used, and the second function should be used if part of the cache might be copied to user space. The two extra arguments are useroffset (0) and usersize (the size of the whole structure), so the entire object is allowed to be copied to user space.

Test result

Testing cmd: ln file hdlink...Success

Testing cmd: mkdir dir/dir...Success

Testing cmd: ln -s file symlink...Success

Testing cmd: ls -lR...Success

Testing cmd: mkdir len_of_name_of_this_dir_is_29...Success

Testing cmd: touch len_of_name_of_the_file_is_29...Success

Testing cmd: ln -s dir len_of_name_of_the_link_is_29...Success

Testing cmd: echo abc > file...Success

Testing cmd: dd if=/dev/zero of=file bs=1M count=12 status=none...dd: error writing 'file': File too large

Check if exist: drwxr-xr-x 3 dir...Success

Check if exist: -rw-r--r-- 2 file...Success

Check if exist: -rw-r--r-- 2 hdlink...Success

Check if exist: drwxr-xr-x 2 dir...Success

Check if exist: lrwxrwxrwx 1 symlink...Success

Commit log

Fix segmentation fault when running ls after creating a symbolic link

Description


Creating a symbolic link and then running 'ls'
causes a segmentation fault in user space.

Root cause


The kernel log shows
"usercopy: Kernel memory exposure attempt detected from SLUB object 'simplefs_cache' (offset 0, size 5)!",
and the call trace shows that the failure occurs in readlink_copy.

readlink_copy copies the link target to user space,
and the kernel reports this copy as a kernel memory exposure attempt.

Fix solution


From the kernel documentation
https://docs.kernel.org/core-api/memory-allocation.html?highlight=kmem_cache_create

"kmem_cache_create() or kmem_cache_create_usercopy() before it can be used.
The second function should be used if a part of the cache might be copied to the userspace"

readlink copies the target name from the simplefs inode link (inode->i_link) to user space,
so replace kmem_cache_create with kmem_cache_create_usercopy.

How has this been tested:


run make check

Testing cmd: ln file hdlink...Success
Testing cmd: mkdir dir/dir...Success
Testing cmd: ln -s file symlink...Success
Testing cmd: ls -lR...Success
Testing cmd: mkdir len_of_name_of_this_dir_is_29...Success
Testing cmd: touch len_of_name_of_the_file_is_29...Success
Testing cmd: ln -s dir len_of_name_of_the_link_is_29...Success
Testing cmd: echo abc > file...Success
Testing cmd: dd if=/dev/zero of=file bs=1M count=12 status=none...dd: error writing 'file': File too large
Check if exist: drwxr-xr-x 3 dir...Success
Check if exist: -rw-r--r-- 2 file...Success
Check if exist: -rw-r--r-- 2 hdlink...Success
Check if exist: drwxr-xr-x 2 dir...Success
Check if exist: lrwxrwxrwx 1 symlink...Success

Close #30