# Working session 2021-08-09

## Interacting with the containerd socket

The containerd runtime exposes a gRPC endpoint in the form of a Unix socket at `/run/containerd/containerd.sock`. The containerd repository ships a utility called `ctr` which can be used to interact with the containerd engine. Building the client is simple:

```bash=
git clone https://github.com/containerd/containerd
cd containerd
go build -o ctr cmd/ctr/main.go
```

This binary can be packaged into the container in the pod; mount the socket into the container and we are good to go.

## Mounting the host

It seems that GKE allows mounting `/var` and even `/` into the container filesystem, which lets us read the whole filesystem of the host. Interesting things we found:

* The whole `/etc` folder is visible.
* The `/proc` filesystem of the host is visible, although the container isn't sharing the host PID namespace.
* `/proc/sys` is also available; we were able to modify `/proc/sys/kernel/randomize_va_space` to disable ASLR.
* Since `/etc` was also open, we were able to persist this by creating `/etc/sysctl.d/01-aslr.conf` with the following line (this is not guaranteed to work, because we haven't tried to actually exploit it):

  ```
  kernel.randomize_va_space = 0
  ```

* We were able to read the `/home` files of different users. This included files from the `kubernetes` user, among them `/home/kubernetes/kube-env`, which looked like some bootstrapping config.
* We were able to add a `/etc/cron.d` folder, although we were not able to get a cron job executed on the host; it could be a syntax error.
* We found that the host was possibly running a Gentoo build:
```
root@osquery-deployment-88c6cbc94-bpjvv:/app/run/etc# cat env.d/00basic
# /etc/env.d/00basic
# Do not edit this file
PATH="/opt/bin"
ROOTPATH="/opt/bin"
MANPATH="/usr/local/share/man:/usr/share/man"
INFOPATH="/usr/share/info"
CONFIG_PROTECT_MASK="/etc/gentoo-release"
LDPATH='/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib'
```

## Checkpointing

We were able to perform a checkpoint via the containerd socket using the `ctr` utility.

```bash=
export CONTAINERD_ADDRESS=/app/run/containerd/containerd.sock
```

This is needed because we are mounting the host filesystem under a different root path; `ctr` defaults to `/run/containerd/containerd.sock` if no address is supplied. The address can also be passed explicitly:

```bash=
ctr --address /app/run/containerd/containerd.sock ...
```

To list all the containers, we also need to specify the namespace, since no containers are present in the default namespace. So, either:

```bash=
export CONTAINERD_NAMESPACE=k8s.io
ctr containers list
# or
ctr --namespace k8s.io containers list
```

Once we have the container id from the listing, we can checkpoint it:

```bash=
# assuming the socket file and namespace are set up correctly
ctr containers checkpoint --rw {{ container_id }} {{ checkpoint_name }}
```

The `ctr` utility does not allow two things:

* listing all the checkpoints for a container
* defining the export path for the CRIU dumps, [containerd issue #2053](https://github.com/containerd/containerd/issues/2053)

There seem to be two ways to checkpoint: one using the container entity and one using a task.

```bash=
ctr containers checkpoint ...
ctr task checkpoint --exit --image-path {{ dump_path }} {{ container_id }}
```

The task variant does allow the checkpoint to be dumped to a path, but it doesn't work well from the container, because the host process (the containerd daemon) can't find the `criu` binary in its `$PATH`.
```bash=
$ ctr -n k8s.io t checkpoint --image-path /var/foo --work-path /var/criulogs edda50d2bc166c60aa902e8e013164af4a5d484e3e9e1f7588accb0d1b9bb0f7
ctr: runc did not terminate successfully: CRIU version check failed: exec: "criu": executable file not found in $PATH path= /run/containerd/io.containerd.runtime.v2.task/k8s.io/edda50d2bc166c60aa902e8e013164af4a5d484e3e9e1f7588accb0d1b9bb0f7/criu-dump.log: unknown
```

The same operations are exposed programmatically:

* via the [`containerd.Container`](https://pkg.go.dev/github.com/containerd/containerd@v1.5.5#Container) interface:

  ```go=
  type Container interface {
      Checkpoint(context.Context, string, ...CheckpointOpts) (Image, error)
      //...
  }
  ```

* via the [`containerd.Task`](https://pkg.go.dev/github.com/containerd/containerd@v1.5.5#Task) interface:

  ```go=
  type Task interface {
      // Checkpoint serializes the runtime and memory information of a task into an
      // OCI Index that can be pushed and pulled from a remote resource.
      //
      // Additional software like CRIU maybe required to checkpoint and restore tasks
      // NOTE: Checkpoint supports to dump task information to a directory, in this way,
      // an empty OCI Index will be returned.
      Checkpoint(context.Context, ...CheckpointTaskOpts) (Image, error)
      //...
  }
  ```

### Attempts to find a file created

* We watched for any file being created anywhere on the host filesystem using `inotifywait`, but none were found for the CRIU dump.
* We tried to `strace` the `ctr` invocation within the container, but that didn't work: `ctr` runs in an isolated PID namespace and only makes gRPC calls to the containerd socket, and the daemon on the other side invokes `criu` in the host PID namespace, so the calls become opaque after the gRPC call itself.
* We investigated the filesystem, looking through all the folders, but didn't find any place holding dumps or any other trace of a CRIU image dump.
We also tried looking for any files created within a few minutes of the checkpoint using `find . -type d -mmin -5`, but didn't see anything interesting.

Restoring was unverified, but it would most likely work. The restore threw an error because we were trying to restore a container with a container id that already existed on the host.

## Questions that we have

* _Is the host "wiped" before being re-used for any other pod?_ If not, then our settings can persist and make the next cluster vulnerable; for example, the ASLR settings we persisted.
* _Do other pods get scheduled on the same host at the same time?_ If yes, since we are able to view the host filesystem and connect to the containerd socket running in the host namespace, we could potentially see and inspect all containers running on the host.
* If we are able to persist the ASLR setting, could that make a memory bug exploitable, and could that allow an escape to the host namespace? Right now we are still under the PID namespace, although we can see all PIDs. This is evident from the fact that we are able to see `/bin/reboot` but can't actually invoke it:

  ```bash=
  root@osquery-deployment-88c6cbc94-bpjvv:/app/run/bin# reboot
  Running in chroot, ignoring request.
  ```
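The first question (whether the host is wiped before reuse) could be checked mechanically from a later pod scheduled on the same node. A minimal sketch, assuming as before that the host root is mounted at `/app` inside the pod; `HOST_ROOT` is our own variable for illustration, not anything containerd or GKE defines:

```bash=
#!/bin/sh
# Sketch (untested against a recycled node): check whether our ASLR
# changes survived. HOST_ROOT = where the host filesystem is mounted
# inside the pod; /app in our setup (an assumption, override as needed).
HOST_ROOT="${HOST_ROOT:-/app}"

# 1. Is the runtime setting still disabled? (0 means ASLR is off)
aslr="$(cat "$HOST_ROOT/proc/sys/kernel/randomize_va_space" 2>/dev/null || echo unknown)"
echo "randomize_va_space: $aslr"

# 2. Did the sysctl drop-in survive?
if [ -f "$HOST_ROOT/etc/sysctl.d/01-aslr.conf" ]; then
    echo "drop-in: persisted"
else
    echo "drop-in: gone"
fi
```

If the node is reused without a wipe, the first line should report `0` and the drop-in should still be present; on a freshly imaged node both checks should come back clean.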