# Cache leak in simplefs

[Repo.](https://github.com/sysprog21/simplefs)

## Auxiliary Tool

[kmodleak](https://github.com/tzussman/kmodleak): eBPF for slab memory tracing

## Leak tracing

### ls/mv cmd leaking

[commit](https://github.com/sysprog21/simplefs/commit/5afbafc6872e31a9244efbc9deacc648bc132870)

```
  0 [<ffffffffa5d2f98e>] __alloc_pages+0x24e
  1 [<ffffffffa5d2f98e>] __alloc_pages+0x24e
  2 [<ffffffffa5d51d3e>] alloc_pages+0x9e
  3 [<ffffffffa5cbd4de>] __page_cache_alloc+0x7e
  4 [<ffffffffa5cc1342>] pagecache_get_page+0x152
  5 [<ffffffffa5dedec8>] grow_dev_page+0x48
  6 [<ffffffffa5deebac>] __getblk_gfp+0xbc
  7 [<ffffffffa5deed01>] __bread_gfp+0x11
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- seems __bread() leak.
  8 [<ffffffffc098427b>] ftrace_trampoline+0x427b
  9 [<ffffffffc09845a7>] ftrace_trampoline+0x45a7
 10 [<ffffffffa5dafaed>] vfs_mkdir+0xad
 11 [<ffffffffa5db3368>] do_mkdirat+0x128
 12 [<ffffffffa5db351c>] __x64_sys_mkdir+0x4c
 13 [<ffffffffa5a04d64>] x64_sys_call+0x94
 14 [<ffffffffa67c2ce6>] do_syscall_64+0x56
 15 [<ffffffffa68000df>] entry_SYSCALL_64_after_hwframe+0x67
```

The leaked pages are allocated by __bread(), which is called from ==sb_bread()==. Each buffer head returned by sb_bread() must be released with brelse() on every exit path.

==ls== uses simplefs_iterate() to list files. The early `break` skipped the release:

```diff
-    if (dblock->files[0].inode == 0)
+    if (dblock->files[0].inode == 0) {
+        brelse(bh2);
+        bh2 = NULL;
         break;
-
+    }
```

==mv== uses simplefs_rename() to rename the file. The release was conditional, so it is now done unconditionally:

```diff
-    if (new_pos < 0)
-        brelse(bh2);
+
+    brelse(bh2);
```

------

### Leak when unmounting the file system

[commit](https://github.com/sysprog21/simplefs/commit/cc21eeeab47ec0438eecf5a6c5b8816c810ef943)

#### Symptom

```
16 bytes in 1 allocations from stack
        addr = 0xffff9b5ac6799e00 size = 16
        0 [<ffffffffb5f61dcf>] __kmalloc_track_caller+0x1ef
        1 [<ffffffffb5f61dcf>] __kmalloc_track_caller+0x1ef
        2 [<ffffffffb5ee6212>] kstrdup+0x32
        3 [<ffffffffb5ee6278>] kstrdup_const+0x28
        4 [<ffffffffb5ef3336>] kmem_cache_create_usercopy+0xd6
        5 [<ffffffffc09622cc>] ftrace_trampoline+0x22cc
        6 [<ffffffffc096a010>] ftrace_trampoline+0xa010
        7 [<ffffffffb5c03a06>] do_one_initcall+0x46
        8 [<ffffffffb5d9edc2>] do_init_module+0x52
        9 [<ffffffffb5da07f5>] load_module+0xb45
        10 [<ffffffffb5c955b0>] __kretprobe_trampoline+0x0
        11 [<ffffffffb5da0bd8>] __x64_sys_finit_module+0x18
        12 [<ffffffffb5c06793>] x64_sys_call+0x1ac3
        13 [<ffffffffb69c63e6>] do_syscall_64+0x56
        14 [<ffffffffb6a00124>] entry_SYSCALL_64_after_hwframe+0x6c
16 bytes in 1 allocations from stack
        addr = 0xffff9b5ac6799440 size = 16
        0 [<ffffffffb5f61dcf>] __kmalloc_track_caller+0x1ef
        1 [<ffffffffb5f61dcf>] __kmalloc_track_caller+0x1ef
        2 [<ffffffffb5ee6212>] kstrdup+0x32
        3 [<ffffffffb5ee6278>] kstrdup_const+0x28
        4 [<ffffffffb625304d>] kvasprintf_const+0x5d
        5 [<ffffffffb62a5ab3>] kobject_set_name_vargs+0x23
        6 [<ffffffffb62a5ddd>] kobject_init_and_add+0x5d
        7 [<ffffffffb5f627cc>] sysfs_slab_add+0x18c
        8 [<ffffffffb5f64cbc>] __kmem_cache_create+0x3c
        9 [<ffffffffb5ef33c7>] kmem_cache_create_usercopy+0x167
        10 [<ffffffffc09622cc>] ftrace_trampoline+0x22cc
        11 [<ffffffffc096a010>] ftrace_trampoline+0xa010
        12 [<ffffffffb5c03a06>] do_one_initcall+0x46
        13 [<ffffffffb5d9edc2>] do_init_module+0x52
        14 [<ffffffffb5da07f5>] load_module+0xb45
        15 [<ffffffffb5c955b0>] __kretprobe_trampoline+0x0
        16 [<ffffffffb5da0bd8>] __x64_sys_finit_module+0x18
        17 [<ffffffffb5c06793>] x64_sys_call+0x1ac3
        18 [<ffffffffb69c63e6>] do_syscall_64+0x56
        19 [<ffffffffb6a00124>] entry_SYSCALL_64_after_hwframe+0x6c
```

#### Analysis

Both stacks originate in kmem_cache_create_usercopy() during module init. It seems that when the module is unloaded, the kmem_cache still remains in the slab cache system.

From the [rcu_barrier document](https://docs.kernel.org/RCU/rcubarrier.html):

:::info
Pseudo-code using rcu_barrier() is as follows:
1. Prevent any new RCU callbacks from being posted.
2. Execute rcu_barrier().
3. Allow the module to be unloaded.
:::

But why must we call rcu_barrier() when the module is unloaded?

1. In ***Paul E. McKenney***'s [post](https://paulmck.livejournal.com/7314.html):

:::info
Unless there is some other mechanism to ensure that all the RCU callbacks have been invoked before the module exit, there needs to be code in the module-exit function that does the following:
1. Prevents any new RCU callbacks from being posted. In other words, make sure that no future call_rcu() invocations happen from this module unless those call_rcu() invocations touch only functions and data that outlive this module.
2. Invokes rcu_barrier().
3. Of course, if the module uses call_rcu_sched() instead of call_rcu(), then it should invoke rcu_barrier_sched() instead of rcu_barrier(). Similarly, if it uses call_rcu_bh() instead of call_rcu(), then it should invoke rcu_barrier_bh() instead of rcu_barrier(). If the module uses more than one of call_rcu(), call_rcu_sched(), and call_rcu_bh(), then it must invoke more than one of rcu_barrier(), rcu_barrier_sched(), and rcu_barrier_bh().
:::

2. From the commit [slab: remove synchronous rcu_barrier() call in memcg cache release path](https://github.com/torvalds/linux/commit/657dc2f9722092e951de95a8109428994541440b), we know that slab caches need RCU to finish before they can be destroyed:

:::info
SLAB_DESTORY_BY_RCU caches need to flush all RCU operations before destruction because slab pages are freed through RCU
:::

3. The dentry struct keeps its rcu_head (d_rcu) in a union that shares memory with other fields, see [dcache.h](https://elixir.bootlin.com/linux/v6.10/source/include/linux/dcache.h#L112):

```c
struct dentry {
	/* <skip...> */
	/*
	 * d_alias and d_rcu can share memory
	 */
	union {
		struct hlist_node d_alias;	/* inode alias list */
		struct hlist_bl_node d_in_lookup_hash;	/* only for in-lookup ones */
		struct rcu_head d_rcu;
	} d_u;
	/* <skip...> */
} __randomize_layout;
```

:::warning
Before destroying the kmem cache, we should ensure that its objects have been freed and are not used by others.
:::

So we add rcu_barrier() before destroying the kmem cache:

```diff
 /* De-allocate the inode cache */
 void simplefs_destroy_inode_cache(void)
 {
+    rcu_barrier();
     kmem_cache_destroy(simplefs_inode_cache);
 }
```

#### Leak still remains

```
1 stacks with outstanding allocations:
192 bytes in 1 allocations from stack
        addr = 0xffff9b5b19eac300 size = 192
        0 [<ffffffffb5f610ff>] kmem_cache_alloc+0x26f
        1 [<ffffffffb5f610ff>] kmem_cache_alloc+0x26f
        2 [<ffffffffb5fbcd57>] __d_alloc+0x27
        3 [<ffffffffb5fbd67a>] d_alloc+0x1a
        4 [<ffffffffb5fc0694>] d_alloc_parallel+0x54
        5 [<ffffffffb5fae23f>] __lookup_slow+0x5f
        6 [<ffffffffb5faf174>] lookup_one_unlocked+0x84
        7 [<ffffffffb5faf1ed>] lookup_positive_unlocked+0x1d
        8 [<ffffffffb611db8a>] debugfs_lookup+0x5a
        9 [<ffffffffb5f64d59>] debugfs_slab_release+0x19
        10 [<ffffffffb5ef3705>] kmem_cache_destroy+0xe5
        11 [<ffffffffc09622fa>] ftrace_trampoline+0x22fa
        12 [<ffffffffc0965a22>] ftrace_trampoline+0x5a22
        13 [<ffffffffb5d9ec24>] __do_sys_delete_module.constprop.0+0x184
        14 [<ffffffffb5d9ed62>] __x64_sys_delete_module+0x12
        15 [<ffffffffb5c0639f>] x64_sys_call+0x16cf
        16 [<ffffffffb69c63e6>] do_syscall_64+0x56
        17 [<ffffffffb6a00124>] entry_SYSCALL_64_after_hwframe+0x6c
done
```

After the fix, a leak is still reported when the module is unloaded. The eBPF trace shows that ==kmem_cache_destroy== itself reaches kmem_cache_alloc through debugfs_slab_release(), debugfs_lookup() and d_alloc(), so a dentry is allocated during the destroy. Since that dentry carries d_rcu (see item 3 above), it appears to be released through RCU as well. So we need to add rcu_barrier() after the ==simplefs_destroy_inode_cache()== call:

```diff
@@ -58,6 +75,7 @@ static int __init simplefs_init(void)
 err_inode:
     simplefs_destroy_inode_cache();
+    rcu_barrier();
 err:
     return ret;
 }
@@ -69,6 +87,7 @@ static void __exit simplefs_exit(void)
         pr_err("Failed to unregister file system\n");
     simplefs_destroy_inode_cache();
+    rcu_barrier();
     pr_info("module unloaded\n");
 }
```

#### Reference

- [Rcu Doc.](https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html)
- [Linux 2.6.23: implementation of sleepable RCU](http://www.wowotech.net/kernel_synchronization/linux2-6-23-RCU.html)
- [memcg: zap memcg_slab_caches and memcg_slab_mutex](https://github.com/torvalds/linux/commit/d5b3cf7139b8770af4ed8bb36a1ab9d290ac39e9)
- [add rcu_barrier() synchronization point](https://tuxist.de/git/jan.koester/linux/-/commit/ab4720ec76b756e1f8705e207a7b392b0453afd6)
- [rcu and super block example](https://kukuruku.co/post/teaching-the-file-system-to-read/)

kmem_cache destroy issue, mailing list threads:

- [mm, slab: asynchronously destroy caches with outstanding objects](https://www.spinics.net/lists/rcu/msg16042.html)
- [mm, slub: handle pending kfree_rcu() in kmem_cache_destroy()](https://www.spinics.net/lists/kernel/msg5290371.html)
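
As a closing illustration of the ls/mv fix, here is a minimal user-space sketch of why an early `break` between sb_bread() and brelse() leaks a buffer reference. It is only a model, not simplefs or kernel code: `fake_bh`, `fake_bread()`, `fake_brelse()` and `scan_blocks()` are hypothetical names invented for this example, and only the reference count of a buffer head is simulated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the refcount is modelled. */
struct fake_bh {
    int refcount;
};

/* Models sb_bread(): returns the buffer with one extra reference held. */
static struct fake_bh *fake_bread(struct fake_bh *bh)
{
    bh->refcount++;
    return bh;
}

/* Models brelse(): drops one reference; a NULL pointer is ignored. */
static void fake_brelse(struct fake_bh *bh)
{
    if (bh)
        bh->refcount--;
}

/*
 * Walk nblocks directory blocks and stop at the first empty one
 * (first_inode[i] == 0). With release_on_break == 0 the early exit
 * skips the release, like the loop before the fix. Returns the number
 * of references still held afterwards; 0 means nothing leaked.
 */
static int scan_blocks(struct fake_bh *blocks, const int *first_inode,
                       size_t nblocks, int release_on_break)
{
    int leaked = 0;

    for (size_t i = 0; i < nblocks; i++) {
        struct fake_bh *bh2 = fake_bread(&blocks[i]);

        if (first_inode[i] == 0) {
            if (release_on_break)
                fake_brelse(bh2);   /* the line the fix adds */
            break;
        }

        /* ... directory entries would be emitted here ... */
        fake_brelse(bh2);
    }

    for (size_t i = 0; i < nblocks; i++)
        leaked += blocks[i].refcount;
    return leaked;
}
```

With three blocks whose first inodes are `{5, 0, 7}`, calling `scan_blocks()` with `release_on_break` set to 0 leaves one reference outstanding on the second block, while setting it to 1 leaves none. This is the same pairing rule that the `brelse(bh2)` additions in simplefs_iterate() and simplefs_rename() restore.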