### [Peilin Ye's blog](https://hackmd.io/@ypl/Sk8YAobw9)

# Understanding "invisible" `/proc/[tid]` subdirectories

> **Linux:** 5.18-rc5, commit [1728c0567f70](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53ad228682899689d8a3a0f91e399febe88a1db3) ("net: phy: smsc: add LAN8742 phy support.")

```
ypl@home:~$ cat /proc/$$/stat | cut -d' ' -f2
(bash)
ypl@home:~$ cat /proc/self/stat | cut -d' ' -f2
(cat)
ypl@home:~$ cut -d' ' -f2 < /proc/self/stat
(cut)
```

This post briefly explains why `ls` doesn't show `/proc/[tid]` subdirectories for child threads.

## procfs

According to [man proc(5)](https://man7.org/linux/man-pages/man5/proc.5.html):

> The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures. It is commonly mounted at `/proc`.

```
ypl@home:~$ ls /proc
1    177  34  45   58   686  acpi       irq        net
10   18   35  450  59   69   buddyinfo  kallsyms   pagetypeinfo
11   183  36  46   60   690  bus        kcore      partitions
114  184  37  466  61   7    cgroups    key-users  schedstat
116  19   38  469  615  70   cmdline    keys       self
...
```

See those numbers? Each of these so-called `[pid]` subdirectories corresponds to a process, or a thread group leader (TGL).

However, `ls /proc` doesn't show `[tid]` subdirectories. For example, imagine an application with 2 threads:

```
ypl@home:~$ ls /proc/662/task
662  663
```

Here, `662` is the thread group leader, and `663` is a child thread. `ls /proc` only shows `662`:

```
ypl@home:~$ ls /proc | grep 662
662
```

The `663` subdirectory is not shown, but somehow you can `cd` into it:

```
ypl@home:~$ ls /proc | grep 663
ypl@home:~$ cd /proc/663
ypl@home:/proc/663$ ls
arch_status  environ  mountinfo   personality  statm
attr         exe      mounts      projid_map   status
autogroup    fd       mountstats  root         syscall
...
```

It's there, just "invisible" to `ls`, as also documented in [man proc(5)](https://man7.org/linux/man-pages/man5/proc.5.html):

> The `/proc/[tid]` subdirectories are not visible when iterating through `/proc` with getdents(2) (and thus are not visible when one uses ls(1) to view the contents of `/proc`).

I found this behavior very interesting. How is it implemented?

## TL;DR

> (disclaimer: for recreational purposes only! :-)

Apply this to your kernel:

```diff
diff --git a/fs/proc/base.c b/fs/proc/base.c
index c1031843cc6a..579ee323b797 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3420,7 +3420,7 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
 	pid = find_ge_pid(iter.tgid, ns);
 	if (pid) {
 		iter.tgid = pid_nr_ns(pid, ns);
-		iter.task = pid_task(pid, PIDTYPE_TGID);
+		iter.task = pid_task(pid, PIDTYPE_PID);
 		if (!iter.task) {
 			iter.tgid += 1;
 			goto retry;
```

Now `ls /proc` shows both `[pid]` and `[tid]` subdirectories. Yay!

```
ypl@home:~$ ls /proc/662/task
662  663
ypl@home:~$ ls /proc | grep 662
662
ypl@home:~$ ls /proc | grep 663
663
```

It's probably gonna break a lot of stuff based on procfs though...
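If you'd rather not patch your kernel, a small program can show the stock behavior directly. This is a minimal sketch that is not from the original post: it assumes Linux, glibc and POSIX threads (compile with `-pthread` on older glibc), and the helper names `dir_lists()` and `check_thread_visibility()` are hypothetical. It starts one extra thread, then checks whether `/proc/<tid>` appears when iterating `/proc` with readdir(3) (which glibc implements on top of getdents64(2)), and whether stat(2) on the same path succeeds.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers for illustration; not part of any library. */
static int ready_pipe[2];   /* child -> main: the child's TID */
static int release_pipe[2]; /* main -> child: "you may exit now" */

/* Child thread: publish its TID, then block until released, so that
 * /proc/<tid> is guaranteed to exist while the main thread checks it. */
static void *child(void *arg)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);
    char byte;
    ssize_t r;

    (void)arg;
    if (write(ready_pipe[1], &tid, sizeof(tid)) != (ssize_t)sizeof(tid))
        return NULL;
    r = read(release_pipe[0], &byte, 1);  /* blocks until main writes */
    (void)r;
    return NULL;
}

/* Return 1 if readdir(3) on `dir` yields an entry named `name`,
 * 0 if it does not, -1 if the directory cannot be opened. */
int dir_lists(const char *dir, const char *name)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int found = 0;

    if (!d)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(d);
    return found;
}

/* Spawn one extra thread and report whether /proc/<tid> shows up while
 * iterating /proc (*listed) and whether stat(2) finds it as a directory
 * (*reachable). Returns 0 on success, -1 on a setup failure. */
int check_thread_visibility(int *listed, int *reachable)
{
    pthread_t t;
    pid_t tid;
    char name[16], path[32];
    struct stat st;

    assert(listed != NULL && reachable != NULL);
    if (pipe(ready_pipe) < 0 || pipe(release_pipe) < 0)
        return -1;
    if (pthread_create(&t, NULL, child, NULL) != 0)
        return -1;
    if (read(ready_pipe[0], &tid, sizeof(tid)) != (ssize_t)sizeof(tid))
        return -1;

    snprintf(name, sizeof(name), "%d", (int)tid);
    snprintf(path, sizeof(path), "/proc/%d", (int)tid);

    *listed = dir_lists("/proc", name);       /* iteration: getdents64(2) */
    *reachable = stat(path, &st) == 0 && S_ISDIR(st.st_mode); /* lookup */

    if (write(release_pipe[1], "x", 1) != 1)  /* let the child exit */
        return -1;
    pthread_join(t, NULL);
    for (int i = 0; i < 2; i++) {
        close(ready_pipe[i]);
        close(release_pipe[i]);
    }
    return 0;
}
```

On an unpatched kernel, a `main()` that calls `check_thread_visibility(&listed, &reachable)` should see `listed == 0` and `reachable == 1`: the child thread's directory can be found by name, but it is never returned while iterating `/proc`.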
## Walk-through

(My) `ls` uses the [getdents64(2)](https://man7.org/linux/man-pages/man2/getdents64.2.html) system call to read directory entries from `/proc`, as `strace` shows:

```
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents64(3, /* 184 entries */, 32768) = 4808
getdents64(3, /* 0 entries */, 32768) = 0
close(3) = 0
```

[getdents64(2)](https://man7.org/linux/man-pages/man2/getdents64.2.html) is defined in `fs/readdir.c`:

```c
SYSCALL_DEFINE3(getdents64, unsigned int, fd,
		struct linux_dirent64 __user *, dirent, unsigned int, count)
{
	struct fd f;
	struct getdents_callback64 buf = {
		.ctx.actor = filldir64,
		.count = count,
		.current_dir = dirent
	};
	int error;

	f = fdget_pos(fd);
	if (!f.file)
		return -EBADF;

	error = iterate_dir(f.file, &buf.ctx);
	...
```

It calls `iterate_dir()`, which first checks whether the file can be iterated as a directory at all:

```c
int iterate_dir(struct file *file, struct dir_context *ctx)
{
	struct inode *inode = file_inode(file);
	bool shared = false;
	int res = -ENOTDIR;
	if (file->f_op->iterate_shared)
		shared = true;
	else if (!file->f_op->iterate)
		goto out;
	...
```

If neither `.iterate_shared` nor `.iterate` is implemented, `iterate_dir()` returns `-ENOTDIR`. `/proc` does implement `.iterate_shared`, so `iterate_dir()` goes on to call `/proc`'s own implementation, `proc_root_readdir()`:

```c
static int proc_root_readdir(struct file *file, struct dir_context *ctx)
{
	if (ctx->pos < FIRST_PROCESS_ENTRY) {
		int error = proc_readdir(file, ctx);
		if (unlikely(error <= 0))
			return error;
		ctx->pos = FIRST_PROCESS_ENTRY;
	}

	return proc_pid_readdir(file, ctx);
}
```

Entries before `FIRST_PROCESS_ENTRY` (regular ones such as `acpi` and `buddyinfo`) come from `proc_readdir()`. After that, `proc_pid_readdir()` uses `next_tgid()` to take care of the `[pid]` subdirectories in a loop:

```c
	...
	for (iter = next_tgid(ns, iter);
	     iter.task;
	     iter.tgid += 1, iter = next_tgid(ns, iter)) {
		char name[10 + 1];
		unsigned int len;

		cond_resched();
		if (!has_pid_permissions(fs_info, iter.task, HIDEPID_INVISIBLE))
			continue;

		len = snprintf(name, sizeof(name), "%u", iter.tgid);
		ctx->pos = iter.tgid + TGID_OFFSET;
		if (!proc_fill_cache(file, ctx, name, len,
				     proc_pid_instantiate, iter.task, NULL)) {
			put_task_struct(iter.task);
			return 0;
		}
	}
	...
```

Yep! This is where our TL;DR diff comes into play. Take another look at `next_tgid()`:

```c
	...
retry:
	iter.task = NULL;
	pid = find_ge_pid(iter.tgid, ns);
	if (pid) {
		iter.tgid = pid_nr_ns(pid, ns);
		iter.task = pid_task(pid, PIDTYPE_TGID);
		if (!iter.task) {
			iter.tgid += 1;
			goto retry;
	...
```

`pid_task(pid, PIDTYPE_TGID)` only returns a task if some task uses `pid` as its thread group ID (TGID), which is true only for thread group leaders. For a child thread's ID, such as `663` above, it returns `NULL`, so `next_tgid()` bumps `iter.tgid` and retries with the next ID. In other words, `proc_pid_readdir()` only reports thread group leaders. This is exactly why `ls /proc` doesn't show `[tid]` subdirectories!

## Appendix A: Call Tree

```
fs/readdir.c:SYSCALL_DEFINE3(getdents64)
            :iterate_dir()
                /* file->f_op->iterate_shared() */
fs/proc/root.c:proc_root_readdir()
fs/proc/base.c:proc_pid_readdir()
              :next_tgid()
```
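To make the skipping in `next_tgid()` concrete, here is a toy userspace model of its retry loop. It is a sketch under loud assumptions, not kernel code: `struct toy_pid`, `toy_find_ge_pid()`, `toy_next_tgid()` and `toy_readdir()` are hypothetical names, a sorted array stands in for the PID namespace, and a boolean `is_tgid` stands in for "`pid_task(pid, PIDTYPE_TGID)` returns a task". Passing `tgid_only = false` models the TL;DR patch, where any task found via `PIDTYPE_PID` is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct pid: its number, and whether some task uses
 * it as a thread group ID (true only for thread group leaders). */
struct toy_pid {
    int nr;
    bool is_tgid;
};

/* Toy find_ge_pid(): first entry whose number is >= nr, or NULL.
 * Assumes the table is sorted by nr in ascending order. */
static const struct toy_pid *toy_find_ge_pid(const struct toy_pid *tbl,
                                             size_t n, int nr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].nr >= nr)
            return &tbl[i];
    return NULL;
}

/* Toy next_tgid(): returns the first accepted number >= nr, or -1.
 * tgid_only == true models pid_task(pid, PIDTYPE_TGID);
 * tgid_only == false models the patched pid_task(pid, PIDTYPE_PID). */
int toy_next_tgid(const struct toy_pid *tbl, size_t n, int nr, bool tgid_only)
{
    for (;;) {                   /* plays the role of "retry:" */
        const struct toy_pid *pid = toy_find_ge_pid(tbl, n, nr);

        if (!pid)
            return -1;           /* no more PIDs: iteration ends */
        nr = pid->nr;
        if (!tgid_only || pid->is_tgid)
            return nr;           /* a task was found: report this number */
        nr += 1;                 /* child thread: iter.tgid += 1; goto retry; */
    }
}

/* Toy proc_pid_readdir() loop: stores each emitted number in out[]
 * (at most cap of them) and returns how many were emitted. */
size_t toy_readdir(const struct toy_pid *tbl, size_t n, bool tgid_only,
                   int *out, size_t cap)
{
    size_t count = 0;

    for (size_t i = 1; i < n; i++)
        assert(tbl[i - 1].nr < tbl[i].nr); /* toy_find_ge_pid() needs this */

    for (int nr = toy_next_tgid(tbl, n, 1, tgid_only);
         nr != -1 && count < cap;
         nr = toy_next_tgid(tbl, n, nr + 1, tgid_only))
        out[count++] = nr;
    return count;
}
```

With a table of `1`, `662`, `663` and `700` where only `663` is a child thread, `toy_readdir(..., true, ...)` yields `1 662 700`, while `toy_readdir(..., false, ...)` yields `1 662 663 700`, mirroring `ls /proc` before and after the patch.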