# simplefs kernel panic bug
Source code repository: [simplefs](https://github.com/sysprog21/simplefs)
## ls causes a kernel panic
After loading the kernel module and creating a symbolic link, running =="ls"== to list the directory makes the kernel panic with the following log:
```
[ 2679.036002] usercopy: Kernel memory exposure attempt detected from SLUB object 'simplefs_cache' (offset 0, size 5)!
[ 2679.038088] ------------[ cut here ]------------
[ 2679.038091] kernel BUG at mm/usercopy.c:102!
[ 2679.039056] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 2679.040065] CPU: 0 PID: 167958 Comm: ls Tainted: G B W OE 6.2.0-39-generic #40-Ubuntu
[ 2679.041651] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2679.043180] RIP: 0010:usercopy_abort+0x6c/0x80
[ 2679.044119] Code: 58 b7 51 48 c7 c2 e7 2e 5d b7 41 52 48 c7 c7 30 97 59 b7 48 0f 45 d6 48 c7 c6 56 ac 57 b7 48 89 c1 49 0f 45 f3 e8 34 80 d1 ff <0f> 0b 49 c7 c1 20 1b 5c b7 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
[ 2679.048676] RSP: 0018:ffffb4a400e8fce8 EFLAGS: 00010246
[ 2679.049847] RAX: 0000000000000067 RBX: 0000000000000000 RCX: 0000000000000000
[ 2679.051319] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2679.052860] RBP: ffffb4a400e8fd00 R08: 0000000000000000 R09: 0000000000000000
[ 2679.054378] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[ 2679.055847] R13: ffff9d59851ec600 R14: 0000000000000001 R15: ffff9d5987099310
[ 2679.057173] FS: 00007f312f146800(0000) GS:ffff9d59be600000(0000) knlGS:0000000000000000
[ 2679.058554] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2679.059634] CR2: 00005602991e2438 CR3: 0000000007094002 CR4: 0000000000370ef0
[ 2679.060871] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2679.062135] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2679.063375] Call Trace:
[ 2679.064178] <TASK>
[ 2679.064762] ? show_regs+0x6d/0x80
[ 2679.065773] ? die+0x37/0xa0
[ 2679.066630] ? do_trap+0xd4/0xf0
[ 2679.067410] ? do_error_trap+0x71/0xb0
[ 2679.068150] ? usercopy_abort+0x6c/0x80
[ 2679.068893] ? exc_invalid_op+0x52/0x80
[ 2679.069649] ? usercopy_abort+0x6c/0x80
[ 2679.070419] ? asm_exc_invalid_op+0x1b/0x20
[ 2679.071225] ? usercopy_abort+0x6c/0x80
[ 2679.071974] ? usercopy_abort+0x6c/0x80
[ 2679.072736] __check_heap_object+0xe3/0x120
[ 2679.073578] check_heap_object+0x185/0x1d0
[ 2679.074383] __check_object_size.part.0+0x72/0x150
[ 2679.075277] __check_object_size+0x23/0x30
[ 2679.076156] readlink_copy+0x4c/0x80
[ 2679.076890] vfs_readlink+0x66/0x130
[ 2679.077604] do_readlinkat+0x117/0x140
[ 2679.078360] __x64_sys_readlink+0x1e/0x30
[ 2679.079131] do_syscall_64+0x58/0x90
[ 2679.079872] ? do_syscall_64+0x67/0x90
[ 2679.082472] ? exit_to_user_mode_loop+0xe0/0x130
[ 2679.083845] ? exit_to_user_mode_prepare+0x30/0xb0
[ 2679.085051] ? syscall_exit_to_user_mode+0x37/0x60
[ 2679.086478] ? do_syscall_64+0x67/0x90
[ 2679.087469] ? irqentry_exit+0x43/0x50
[ 2679.088402] ? exc_page_fault+0x91/0x1b0
[ 2679.089401] entry_SYSCALL_64_after_hwframe+0x73/0xdd
[ 2679.090492] RIP: 0033:0x7f312ef0d6fb
[ 2679.091373] Code: 73 01 c3 48 8b 0d 1d 87 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 59 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 e9 86 0e 00 f7 d8
[ 2679.094802] RSP: 002b:00007fff9b2fa3e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000059
[ 2679.096526] RAX: ffffffffffffffda RBX: 00005602991d1df0 RCX: 00007f312ef0d6fb
[ 2679.098462] RDX: 0000000000000006 RSI: 00005602991df9f0 RDI: 00007fff9b2fa3f0
[ 2679.100395] RBP: 00007fff9b2fa8a0 R08: 00007f312eff6ce0 R09: 0000000000000040
[ 2679.101850] R10: 0000000000000000 R11: 0000000000000206 R12: 00007fff9b2fa3f0
[ 2679.103302] R13: 00005602991d702b R14: 0000000000000006 R15: 00005602991df9f0
[ 2679.104912] </TASK>
[ 2679.106150] Modules linked in: simplefs(OE) isofs kvm_intel ppdev binfmt_misc kvm nls_iso8859_1 irqbypass parport_pc parport input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd psmouse cryptd virtio_net net_failover failover floppy [last unloaded: simplefs(OE)]
[ 2679.116560] ---[ end trace 0000000000000000 ]---
[ 2679.130149] RIP: 0010:usercopy_abort+0x6c/0x80
[ 2679.131628] Code: 58 b7 51 48 c7 c2 e7 2e 5d b7 41 52 48 c7 c7 30 97 59 b7 48 0f 45 d6 48 c7 c6 56 ac 57 b7 48 89 c1 49 0f 45 f3 e8 34 80 d1 ff <0f> 0b 49 c7 c1 20 1b 5c b7 4d 89 ca 4d 89 c8 eb a8 0f 1f 00 90 90
[ 2679.136616] RSP: 0018:ffffb4a400e8fce8 EFLAGS: 00010246
[ 2679.138266] RAX: 0000000000000067 RBX: 0000000000000000 RCX: 0000000000000000
[ 2679.140115] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2679.141744] RBP: ffffb4a400e8fd00 R08: 0000000000000000 R09: 0000000000000000
[ 2679.143263] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000005
[ 2679.144736] R13: ffff9d59851ec600 R14: 0000000000000001 R15: ffff9d5987099310
[ 2679.147121] FS: 00007f312f146800(0000) GS:ffff9d59be600000(0000) knlGS:0000000000000000
[ 2679.149410] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2679.151342] CR2: 00005602991e2438 CR3: 0000000007094002 CR4: 0000000000370ef0
[ 2679.152949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2679.154446] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
```
The call trace shows that the `__x64_sys_readlink` system call fails. Tracing the code from its definition:
[SYSCALL_DEFINE3(readlink, const char __user *, path, char __user *, buf, int, bufsiz)](https://elixir.bootlin.com/linux/v6.2/source/fs/stat.c#L501)
It calls [do_readlinkat](https://elixir.bootlin.com/linux/v6.2/source/fs/stat.c#L459) -> [vfs_readlink](https://elixir.bootlin.com/linux/v6.2/source/fs/namei.c#L5005) -> [readlink_copy](https://elixir.bootlin.com/linux/v6.2/source/fs/namei.c#L4980) -> [copy_to_user](https://elixir.bootlin.com/linux/v6.2/source/include/linux/uaccess.h#L166), which fails. The trace shows that the hardened usercopy check (`__check_object_size`) aborts the copy because the source buffer is an object of the ==simplefs_cache== slab, which is not whitelisted for copying to user space.
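
To make the rejection in the log easier to follow, here is a minimal user-space model of the whitelist test that the hardened usercopy check applies to slab objects. It is a simplified sketch, not the kernel code: `struct cache_model` and `usercopy_allowed` are hypothetical names, and the object size used in the note below is an arbitrary illustration value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the usercopy fields of a slab cache (hypothetical). */
struct cache_model {
    const char *name;
    size_t object_size; /* size of one object in the cache */
    size_t useroffset;  /* start of the region that may be copied to user space */
    size_t usersize;    /* length of that region; 0 means nothing is whitelisted */
};

/*
 * Returns true when the range [offset, offset + n) lies entirely inside
 * the whitelisted region, i.e. when the copy would be allowed.
 */
bool usercopy_allowed(const struct cache_model *c, size_t offset, size_t n)
{
    if (offset < c->useroffset)
        return false;
    if (offset - c->useroffset > c->usersize)
        return false;
    return n <= c->usersize - (offset - c->useroffset);
}
```

With `usersize` set to 0, as for a cache created by plain ==kmem_cache_create==, `usercopy_allowed(&c, 0, 5)` returns false, which corresponds to the "(offset 0, size 5)" rejection in the log. With `usersize` equal to the object size, the same copy is allowed.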
## How to fix
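There are two ways to fix it:
1. Implement the ==readlink== hook of inode_operations, so that ==vfs_readlink== calls the hook instead of copying ==inode->i_link== directly.
2. Fix the copy to user space.

Fix 1: in the ==simplefs_vfs_readlink== hook, allocate temporary kernel memory, copy the link target into it, and copy that buffer to user space, still using the kernel ==copy_to_user== function.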
```c
static int simplefs_vfs_readlink(struct dentry *dentry,
                                 char __user *buffer,
                                 int buflen)
{
    struct inode *inode = d_inode(dentry);
    const char *target = READ_ONCE(inode->i_link);
    char *link;
    int len = strlen(target);

    /* Bounce buffer from kmalloc, which may be copied to user space */
    link = kzalloc(len + 1, GFP_KERNEL);
    if (!link)
        return -ENOMEM;
    memcpy(link, target, len);

    /* Truncate to the caller's buffer size, as readlink_copy() does */
    if (len > (unsigned) buflen)
        len = buflen;
    if (copy_to_user(buffer, link, len))
        len = -EFAULT;
    kfree(link);
    return len;
}
```
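Fix 2: create the inode cache with a usercopy whitelist, so that data stored in ==simplefs_cache== objects may be copied to user space.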
```diff
 int simplefs_init_inode_cache(void)
 {
-    simplefs_inode_cache = kmem_cache_create(
-        "simplefs_cache", sizeof(struct simplefs_inode_info), 0, 0, NULL);
+    simplefs_inode_cache = kmem_cache_create_usercopy(
+        "simplefs_cache", sizeof(struct simplefs_inode_info), 0, 0,
+        0,
+        sizeof(struct simplefs_inode_info),
+        NULL);
     if (!simplefs_inode_cache)
         return -ENOMEM;
     return 0;
 }
```
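According to the [kernel documentation](https://docs.kernel.org/core-api/memory-allocation.html?highlight=kmem_cache_create), a cache must be created with ==kmem_cache_create() or kmem_cache_create_usercopy() before it can be used. The second function should be used if a part of the cache might be copied to the userspace==. In the diff above, the two added numeric arguments are `useroffset` (0) and `usersize` (sizeof(struct simplefs_inode_info)), so the whole object is whitelisted.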
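
Whitelisting the whole object also allows everything else stored in it to be copied to user space. A narrower region can be derived with `offsetof` and `sizeof`, which is how some in-tree file systems such as ext4 restrict their inode cache to the inline data field. The sketch below only shows that computation in user space. `struct demo_inode_info`, its fields, and `link_data_region` are hypothetical and do not reflect the real layout of `struct simplefs_inode_info`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only, not the simplefs structure. */
struct demo_inode_info {
    uint32_t block;       /* some private bookkeeping field */
    char link_target[32]; /* assumed storage for the symlink target */
    char rest[128];       /* placeholder for the remaining private data */
};

/* The two values that would be passed as useroffset and usersize. */
struct usercopy_region {
    size_t useroffset;
    size_t usersize;
};

/* Computes a whitelist that covers only the link target field. */
struct usercopy_region link_data_region(void)
{
    struct usercopy_region r = {
        .useroffset = offsetof(struct demo_inode_info, link_target),
        .usersize = sizeof(((struct demo_inode_info *) 0)->link_target),
    };
    return r;
}
```

For this hypothetical layout the region starts at offset 4 and is 32 bytes long. Whether a narrower region is practical for simplefs depends on where the link target actually lives in its inode structure.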
## Test result
```
Testing cmd: ln file hdlink...Success
Testing cmd: mkdir dir/dir...Success
Testing cmd: ln -s file symlink...Success
Testing cmd: ls -lR...Success
Testing cmd: mkdir len_of_name_of_this_dir_is_29...Success
Testing cmd: touch len_of_name_of_the_file_is_29...Success
Testing cmd: ln -s dir len_of_name_of_the_link_is_29...Success
Testing cmd: echo abc > file...Success
Testing cmd: dd if=/dev/zero of=file bs=1M count=12 status=none...dd: error writing 'file': File too large
Check if exist: drwxr-xr-x 3 dir...Success
Check if exist: -rw-r--r-- 2 file...Success
Check if exist: -rw-r--r-- 2 hdlink...Success
Check if exist: drwxr-xr-x 2 dir...Success
Check if exist: lrwxrwxrwx 1 symlink...Success
```
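
The failing path in the trace is the readlink system call issued by `ls`, so it can also be exercised directly from a small user-space program. The helper below is a minimal sketch using the standard POSIX `readlink()` call. The name `read_link_target` is hypothetical and not part of the simplefs test suite.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reads the target of the symbolic link at `path` into `out` and
 * NUL-terminates it. Returns the target length, or -1 on error.
 */
ssize_t read_link_target(const char *path, char *out, size_t outsz)
{
    ssize_t n;

    if (outsz == 0)
        return -1;
    /* readlink() does not append a NUL byte, so reserve room for one */
    n = readlink(path, out, outsz - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

Calling this helper on a symbolic link inside the mounted file system goes through the same ==vfs_readlink== path as `ls -l`, so it should return the link target once the fix is applied.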
## Commit log
Fix segmentation fault when running ls after creating a symbolic link
### Description
-----------
After creating a symbolic link, running the 'ls' command
causes a segmentation fault in user space.
### Root cause
----------
The kernel log shows
"usercopy: Kernel memory exposure attempt detected from SLUB object 'simplefs_cache' (offset 0, size 5)!",
and the call trace shows that "readlink_copy" fails.
readlink_copy copies the link target to user space,
but the source buffer belongs to the simplefs_cache slab, which was created without a usercopy region,
so the kernel reports a memory exposure attempt and aborts the copy.
### Fix solution
----------
The kernel documentation
https://docs.kernel.org/core-api/memory-allocation.html?highlight=kmem_cache_create
says that a cache must be created with
"kmem_cache_create() or kmem_cache_create_usercopy() before it can be used.
The second function should be used if a part of the cache might be copied to the userspace".
readlink copies the target name from the simplefs inode link (inode->i_link) to user space,
so we replace kmem_cache_create with kmem_cache_create_usercopy.
### How has this been tested:
----------
Run make check:
Testing cmd: ln file hdlink...Success
Testing cmd: mkdir dir/dir...Success
Testing cmd: ln -s file symlink...Success
Testing cmd: ls -lR...Success
Testing cmd: mkdir len_of_name_of_this_dir_is_29...Success
Testing cmd: touch len_of_name_of_the_file_is_29...Success
Testing cmd: ln -s dir len_of_name_of_the_link_is_29...Success
Testing cmd: echo abc > file...Success
Testing cmd: dd if=/dev/zero of=file bs=1M count=12 status=none...dd: error writing 'file': File too large
Check if exist: drwxr-xr-x 3 dir...Success
Check if exist: -rw-r--r-- 2 file...Success
Check if exist: -rw-r--r-- 2 hdlink...Success
Check if exist: drwxr-xr-x 2 dir...Success
Check if exist: lrwxrwxrwx 1 symlink...Success
Close #30