### [Peilin Ye's blog](https://hackmd.io/@ypl/Sk8YAobw9)
# Understanding "invisible" `/proc/[tid]` subdirectories
> **Linux:** 5.18-rc5, commit [1728c0567f70](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53ad228682899689d8a3a0f91e399febe88a1db3) ("net: phy: smsc: add LAN8742 phy support.")
```
ypl@home:~$ cat /proc/$$/stat | cut -d' ' -f2
(bash)
ypl@home:~$ cat /proc/self/stat | cut -d' ' -f2
(cat)
ypl@home:~$ cut -d' ' -f2 < /proc/self/stat
(cut)
```
This post briefly explains why `ls` doesn't show `/proc/[tid]` subdirectories for child threads.
## procfs
According to [man proc(5)](https://man7.org/linux/man-pages/man5/proc.5.html):
> The proc filesystem is a pseudo-filesystem which provides an interface to kernel data structures. It is commonly mounted at `/proc`.
```
ypl@home:~$ ls /proc
1    177  34  45   58   686  acpi       irq        net
10   18   35  450  59   69   buddyinfo  kallsyms   pagetypeinfo
11   183  36  46   60   690  bus        kcore      partitions
114  184  37  466  61   7    cgroups    key-users  schedstat
116  19   38  469  615  70   cmdline    keys       self
...
```
See those numbers? Each of these so-called `[pid]` subdirectories corresponds to a process, or a thread group leader (TGL). However, `ls /proc` doesn't show `[tid]` subdirectories. For example, imagine an application with 2 threads:
```
ypl@home:~$ ls /proc/662/task
662 663
```
Here, `662` is the thread group leader, and `663` is a child thread. `ls /proc` only shows `662`:
```
ypl@home:~$ ls /proc | grep 662
662
```
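If you want to tell a leader from a child thread without looking at `task/`, the `Tgid:` and `Pid:` fields of `/proc/[id]/status` are one way to do it. Below is a minimal userspace sketch; the helper names `read_status_id()` and `is_thread_group_leader()` are made up for this post, and it assumes procfs is mounted at `/proc`:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the numeric value of a "Field:\t<number>"
 * line in /proc/<id>/status, or -1 if the file or field is missing. */
long read_status_id(pid_t id, const char *field)
{
    char path[64], line[256];
    size_t flen = strlen(field);
    long value = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/status", (int)id);
    f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        /* Match "<field>:" at the start of the line only. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            value = strtol(line + flen + 1, NULL, 10);
            break;
        }
    }
    fclose(f);
    return value;
}

/* Hypothetical helper: 1 if id is a thread group leader (Tgid == id),
 * 0 if it is a child thread, -1 if it could not be looked up. */
int is_thread_group_leader(pid_t id)
{
    long tgid = read_status_id(id, "Tgid");

    if (tgid < 0)
        return -1;
    return tgid == (long)id;
}
```

With the example above, `is_thread_group_leader(662)` should return `1` and `is_thread_group_leader(663)` should return `0`, since both report `662` as their `Tgid`.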
The `663` subdirectory is not shown, but somehow you can `cd` into it:
```
ypl@home:~$ ls /proc | grep 663
ypl@home:~$ cd /proc/663
ypl@home:/proc/663$ ls
arch_status  environ  mountinfo   personality  statm
attr         exe      mounts      projid_map   status
autogroup    fd       mountstats  root         syscall
...
```
It's there, just "invisible" to `ls`, as also documented in [man proc(5)](https://man7.org/linux/man-pages/man5/proc.5.html):
> The `/proc/[tid]` subdirectories are not visible when iterating through `/proc` with getdents(2) (and thus are not visible when one uses ls(1) to view the contents of `/proc`).
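The same "listed vs. reachable" difference can be checked from a program. Here is a rough sketch, assuming procfs is mounted at `/proc`; `proc_lists()` and `proc_resolves()` are names I made up for illustration. The first one iterates `/proc` with `readdir(3)`, the second one looks the path up directly with `stat(2)`:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: 1 if iterating /proc yields an entry named <id>,
 * 0 if it does not, -1 if /proc cannot be opened. */
int proc_lists(pid_t id)
{
    char name[16];
    struct dirent *de;
    int found = 0;
    DIR *d = opendir("/proc");

    if (!d)
        return -1;

    snprintf(name, sizeof(name), "%d", (int)id);
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(d);
    return found;
}

/* Hypothetical helper: 1 if /proc/<id> resolves to a directory when
 * looked up by name, 0 otherwise. */
int proc_resolves(pid_t id)
{
    char path[32];
    struct stat st;

    snprintf(path, sizeof(path), "/proc/%d", (int)id);
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

For a thread group leader like `662` both helpers should return `1`. For a child thread like `663`, I'd expect `proc_lists()` to return `0` while `proc_resolves()` still returns `1`.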
I found this behavior very interesting. How is it implemented?
## TL;DR
> (disclaimer: for recreational purposes only! :-)
Apply this to your kernel:
```diff
diff --git a/fs/proc/base.c b/fs/proc/base.c
index c1031843cc6a..579ee323b797 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3420,7 +3420,7 @@ static struct tgid_iter next_tgid(struct pid_namespace *ns, struct tgid_iter ite
 	pid = find_ge_pid(iter.tgid, ns);
 	if (pid) {
 		iter.tgid = pid_nr_ns(pid, ns);
-		iter.task = pid_task(pid, PIDTYPE_TGID);
+		iter.task = pid_task(pid, PIDTYPE_PID);
 		if (!iter.task) {
 			iter.tgid += 1;
 			goto retry;
```
Now `ls /proc` shows both `[pid]` and `[tid]` directories. Yay!
```
ypl@home:~$ ls /proc/662/task
662 663
ypl@home:~$ ls /proc | grep 662
662
ypl@home:~$ ls /proc | grep 663
663
```
It's probably gonna break a lot of stuff based on procfs though...
## Walk-through
(My) `ls` uses the [getdents64(2)](https://man7.org/linux/man-pages/man2/getdents64.2.html) system call to read directory entries from `/proc`:
```
stat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
openat(AT_FDCWD, "/proc", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents64(3, /* 184 entries */, 32768) = 4808
getdents64(3, /* 0 entries */, 32768) = 0
close(3) = 0
```
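To poke at the system call without going through `ls`, something like the following sketch should work. It calls `getdents64` through `syscall(2)` in a loop, just like the trace above, and counts the records it gets back. `count_dirents()` and `struct dirent64_rec` are hypothetical names; the struct simply mirrors the `linux_dirent64` layout documented in the man page:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mirrors struct linux_dirent64 as documented in getdents(2);
 * the name dirent64_rec is made up for this sketch. */
struct dirent64_rec {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: number of directory entries returned for path
 * (including "." and ".."), or a negative errno value on failure. */
long count_dirents(const char *path)
{
    char buf[32768];
    long total = 0;
    int fd = open(path, O_RDONLY | O_CLOEXEC);

    if (fd < 0)
        return -errno;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));

        if (n < 0) {
            total = -errno;     /* e.g. fd is not a directory */
            break;
        }
        if (n == 0)             /* end of directory */
            break;

        /* Walk the variable-length records packed into buf. */
        for (long off = 0; off < n; ) {
            unsigned short reclen;

            memcpy(&reclen,
                   buf + off + offsetof(struct dirent64_rec, d_reclen),
                   sizeof(reclen));
            if (reclen == 0)    /* defensive: never expected */
                break;
            total++;
            off += reclen;
        }
    }
    close(fd);
    return total;
}
```

`count_dirents("/proc")` should give a number in the same ballpark as the `184 entries` in the trace (it depends on how many processes are running), while calling it on a regular file such as `/proc/self/stat` should fail with `-ENOTDIR`.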
[getdents64(2)](https://man7.org/linux/man-pages/man2/getdents64.2.html) is defined in `fs/readdir.c`:
```c
SYSCALL_DEFINE3(getdents64, unsigned int, fd,
		struct linux_dirent64 __user *, dirent, unsigned int, count)
{
	struct fd f;
	struct getdents_callback64 buf = {
		.ctx.actor = filldir64,
		.count = count,
		.current_dir = dirent
	};
	int error;

	f = fdget_pos(fd);
	if (!f.file)
		return -EBADF;

	error = iterate_dir(f.file, &buf.ctx);
...
```
It calls `iterate_dir()`, which first checks if `/proc` is actually a directory:
```c
int iterate_dir(struct file *file, struct dir_context *ctx)
{
	struct inode *inode = file_inode(file);
	bool shared = false;
	int res = -ENOTDIR;

	if (file->f_op->iterate_shared)
		shared = true;
	else if (!file->f_op->iterate)
		goto out;
...
```
If neither `.iterate_shared` nor `.iterate` is implemented, `iterate_dir()` returns `-ENOTDIR`. In our case though, it then calls `/proc`'s own `.iterate_shared` implementation, `proc_root_readdir()`:
```c
static int proc_root_readdir(struct file *file, struct dir_context *ctx)
{
	if (ctx->pos < FIRST_PROCESS_ENTRY) {
		int error = proc_readdir(file, ctx);
		if (unlikely(error <= 0))
			return error;
		ctx->pos = FIRST_PROCESS_ENTRY;
	}

	return proc_pid_readdir(file, ctx);
}
```
Here, `proc_pid_readdir()` uses `next_tgid()` to take care of those `[pid]` subdirectories in a loop:
```c
...
	for (iter = next_tgid(ns, iter);
	     iter.task;
	     iter.tgid += 1, iter = next_tgid(ns, iter)) {
		char name[10 + 1];
		unsigned int len;

		cond_resched();
		if (!has_pid_permissions(fs_info, iter.task, HIDEPID_INVISIBLE))
			continue;

		len = snprintf(name, sizeof(name), "%u", iter.tgid);
		ctx->pos = iter.tgid + TGID_OFFSET;
		if (!proc_fill_cache(file, ctx, name, len,
				     proc_pid_instantiate, iter.task, NULL)) {
			put_task_struct(iter.task);
			return 0;
		}
	}
...
```
Yep! This is where our TL;DR diff comes into play. Take another look at `next_tgid()`:
```c
...
retry:
	iter.task = NULL;
	pid = find_ge_pid(iter.tgid, ns);
	if (pid) {
		iter.tgid = pid_nr_ns(pid, ns);
		iter.task = pid_task(pid, PIDTYPE_TGID);
		if (!iter.task) {
			iter.tgid += 1;
			goto retry;
...
```
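`pid_task(pid, PIDTYPE_TGID)` returns `NULL` unless `pid` is some task's thread group ID, so a child thread's ID gets skipped: `next_tgid()` bumps `iter.tgid` and retries. In other words, `proc_pid_readdir()` only reports thread group leaders. This is exactly why `ls /proc` doesn't show `[tid]` subdirectories!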
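To make the skipping concrete, here is a toy userspace model of that loop. It is not kernel code: `struct toy_task`, `toy_next_tgid()` and `toy_readdir()` are invented for this sketch, a sorted array stands in for the PID namespace, and the `leaders_only` flag stands in for the choice between `PIDTYPE_TGID` and `PIDTYPE_PID`:

```c
#include <stddef.h>

/* Toy stand-in for a task: its ID and whether it leads its thread group. */
struct toy_task {
    int id;
    int is_leader;
};

/* Toy model of next_tgid(): return the smallest ID >= start that is
 * accepted, or -1 if there is none. tasks[] must be sorted by id.
 * leaders_only != 0 models PIDTYPE_TGID, leaders_only == 0 models
 * PIDTYPE_PID (the TL;DR diff). */
int toy_next_tgid(const struct toy_task *tasks, size_t n,
                  int start, int leaders_only)
{
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].id < start)
            continue;           /* like find_ge_pid(): need id >= start */
        if (leaders_only && !tasks[i].is_leader)
            continue;           /* like pid_task() == NULL: retry with next */
        return tasks[i].id;
    }
    return -1;
}

/* Toy model of the proc_pid_readdir() loop: store up to out_cap accepted
 * IDs in out[] and return how many IDs were accepted in total. */
size_t toy_readdir(const struct toy_task *tasks, size_t n,
                   int leaders_only, int *out, size_t out_cap)
{
    size_t count = 0;
    int tgid = 0;

    for (;;) {
        int id = toy_next_tgid(tasks, n, tgid, leaders_only);

        if (id < 0)
            break;
        if (count < out_cap)
            out[count] = id;
        count++;
        tgid = id + 1;          /* mirrors iter.tgid += 1 */
    }
    return count;
}
```

Feeding it `{1, leader}`, `{662, leader}`, `{663, thread}`, `{700, leader}` with `leaders_only` set yields `1`, `662`, `700`. With `leaders_only` cleared, `663` shows up too, which is what the TL;DR diff does to the real thing.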
## Appendix A: Call Tree
```
fs/readdir.c:SYSCALL_DEFINE3(getdents64)
            :iterate_dir() /* file->f_op->iterate_shared() */
    fs/proc/root.c:proc_root_readdir()
        fs/proc/base.c:proc_pid_readdir()
                      :next_tgid()
```