# Docker Notes
###### tags: `Docker` `Container` `System Design` `Distributed System` `Software` `Software Packaging` `Server` `System Administration` `Linux` `Computer Networking` `Operating System` `Cross Platform`
Containerizing applications.

:::warning
**More Secure Docker: Podman**
*With seamless integration with the Docker ecosystem & Kubernetes environments.*
Podman relies only on Linux's OS-level containerization, yet it can run Docker images & Docker-compatible commands, and it supports the Kubernetes pod concept.
> Podman runs containers independently, without going through a rootful service (daemon) frontend to operate the container runtime.
>
> (Source: [Podman: Managing pods and containers in a local container runtime - Red Hat Dev](https://developers.redhat.com/blog/2019/01/15/podman-managing-containers-pods))

[Containerization with PodMan - Medium](https://medium.com/@ifeoluwashola06/containerization-with-podman-5482d4d74751)
More:
- [Podman, the next-generation container management tool: its advantages over Docker and its architecture - TechShrimp](https://www.youtube.com/watch?v=P68C6CQB0F4)
:::
:::info
**Docker Sandboxed Containers Built on Linux Kernel Isolation Features**
*I.e., relies on OS-level virtualization.*
> (Source: [Container security fundamentals part 2: Isolation & namespaces - Datadog Security Labs](https://securitylabs.datadoghq.com/articles/container-security-fundamentals-part-2/))
- **`namespaces`: system-level process resource isolation**
*A mechanism provided by the Linux kernel for ++isolating the resources a process can access++.*
| Namespace | Isolated Resources |
|:-----------:|:------------------------------------------------------------------------------------------------------:|
| **PID** | Process IDs (the PID number space) |
| **IPC** | System V IPC objects (message queues, semaphores, shared memory) & POSIX message queues |
| **Time** | `CLOCK_MONOTONIC` & `CLOCK_BOOTTIME` |
| **Network** | IPv4 / IPv6 stacks, firewall rules (iptables / nftables), `/proc/net` & `/sys/class/net`, sockets, ... |
| **UTS** | Hostname & NIS domain name |
| **Mount** | Mount points |
| **User** | User & group IDs (UID & GID) |
| **Cgroup** | Cgroup root directory |
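Creating new namespaces from user space with the `unshare` frontend:
```sh!
# -p: new PID namespace; -f: fork so <program> becomes PID 1 inside it
# --mount-proc: remount /proc so tools like ps only see the new namespace (needs root)
sudo unshare -p -f --mount-proc <program>
```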
[unshare(1) - Linux manual page](https://man7.org/linux/man-pages/man1/unshare.1.html)
[namespaces(7) - Linux manual page](https://man7.org/linux/man-pages/man7/namespaces.7.html)
[cgroup_namespaces(7) - Linux manual page](https://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html)
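
To see these namespaces on a live system: every process exposes one symlink per namespace type under `/proc/<pid>/ns/`, and two processes share a namespace exactly when the link targets (such as `pid:[4026531836]`) are identical. A minimal sketch, assuming a Linux `/proc`; the helper names `show_ns` and `same_ns` are hypothetical, for illustration only.

```sh
#!/bin/sh
# Hypothetical helpers for inspecting Linux namespaces through /proc.

# Print "<type> -> <type>:[<inode>]" for every namespace of a process (default: this shell).
show_ns() {
  pid="${1:-$$}"
  for ns in /proc/"$pid"/ns/*; do
    printf '%s -> %s\n' "$(basename "$ns")" "$(readlink "$ns")"
  done
}

# Succeed (exit status 0) if PIDs $1 and $2 share namespace type $3 (e.g. pid, net, mnt).
same_ns() {
  [ "$(readlink /proc/"$1"/ns/"$3")" = "$(readlink /proc/"$2"/ns/"$3")" ]
}

show_ns
```

For comparison, `sudo unshare -p -f --mount-proc sh -c 'readlink /proc/self/ns/pid'` should print a different `pid:[...]` inode than `readlink /proc/self/ns/pid` run directly on the host, because the forked child lives in the new PID namespace.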
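- **`cgroups`: system-level, hierarchical monitoring & management of process-group resources**
*Provided by the Linux kernel to ++limit and manage the resources++ that groups of processes may use.*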
:::warning
**Control Groups:** used to be called *Process Containers*.
:::
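A process placed under a control group's rules is subject to constraints such as:
- **Limit:** caps on resource usage, e.g., CPU, RAM, I/O, files, ...
> Before containerization, `setrlimit()` / `ulimit` was used to attach resource limits to a *single* UNIX process (via its `task_struct`).
- **Prioritize:** number of accessible CPU cores, I/O throughput, ...
- **Account:** measure the resource usage of the processes in a group.
- **Freeze:** suspend a group's processes at runtime, e.g., to checkpoint, inspect, and later resume them.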
> **Control Group V1 -> V2 Evolution**
> - [Control Group v2 - Admin Guide - Linux Kernel Docs](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html): cpu, memory, io, process, misc, ...
> - [The 1001st Introduction to cgroups - smalltown - Medium](https://medium.com/starbugs/第一千零一篇的-cgroups-介紹-a1c5005be88c)
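
A minimal sketch of inspecting cgroups from a shell, assuming cgroup v2 is mounted at `/sys/fs/cgroup` (the helper names `cgroup_v2_path` and `cgroup_limit` are hypothetical). A process's membership is listed in `/proc/<pid>/cgroup`, where the unified v2 entry has the form `0::<path>`; that path is a directory under `/sys/fs/cgroup` whose interface files (`memory.max`, `cpu.max`, `pids.max`, ...) hold the group's limits. The `ulimit` line shows the older per-process limit for contrast, and the commented root-only commands show the group-wide equivalent.

```sh
#!/bin/sh
# Hypothetical helpers; assume cgroup v2 mounted at /sys/fs/cgroup.

# Print the cgroup v2 path from a /proc/<pid>/cgroup-style file (default: this process).
# v1 lines look like "12:memory:/foo"; the unified (v2) line is "0::<path>".
cgroup_v2_path() {
  awk -F: '$1 == "0" { print $3 }' "${1:-/proc/self/cgroup}"
}

# Print one interface file of the current cgroup, e.g. memory.max ("max" = unlimited).
cgroup_limit() {
  cat "/sys/fs/cgroup$(cgroup_v2_path)/$1" 2>/dev/null || echo "n/a"
}

# Per-process limit (pre-cgroup style): lower the open-file limit in a subshell only.
(ulimit -n 64 && echo "subshell open-file limit: $(ulimit -n)")

# Group-wide limit with raw cgroup v2 files (root shell only; not executed here;
# the memory controller must be enabled in the parent's cgroup.subtree_control):
#   mkdir /sys/fs/cgroup/demo
#   echo 100M > /sys/fs/cgroup/demo/memory.max     # cap memory for the whole group
#   echo $$   > /sys/fs/cgroup/demo/cgroup.procs   # move this shell into the group
# Docker sets the same kind of limits per container, e.g.:
#   docker run --memory=256m --cpus=1.5 <image-name>
```

On a host with only cgroup v1 mounted, `cgroup_v2_path` prints nothing, since there is no `0::` entry.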
:::info
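**How systemd Allocates & Manages System Resources**
`systemd` (the PID 1 init system) manages system resources through the kernel's `cgroups` subsystem instead of ordinary process-based resource-management logic (e.g., PIDs, namespaces, ...). This brings the following benefits:
- It gains higher-level, OS-level containerization features.
- It additionally provides `systemd-nspawn` to create containers and `machinectl` to manage them.
- `cgroups` are integrated with systemd targets and service dependencies, so system services and container resources share one unified interface that is easier to manage, develop, and extend.
E.g., the cgroup paths of some units: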
- journald: `/system.slice/systemd-journald.service`
- gdm3: `/system.slice/gdm.service`
- gdm session(s): `/user.slice/user-128.slice/session-c1.scope`
- sshd: `/system.slice/ssh.service`
- sshd session(s): `/user.slice/user-1000.slice/session-361.scope`
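> Replace `1000` with another user's UID (user slices are named `user-<uid>.slice`):
> ```sh!
> cat /etc/passwd # list all users (the 3rd field is the UID)
>
> id -u # UID of the current user (`id -g` prints the GID)
> ```
>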

More:
- [[Systemd] What are CGroup Slices Used For - Stack Exchange](https://unix.stackexchange.com/questions/683141/what-are-cgroup-slices-used-for)
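
The slice paths above can also be derived and explored from a shell. A minimal sketch, assuming systemd runs as PID 1; `user_slice` is a hypothetical helper name, while `systemd-cgls`, `systemd-cgtop`, and `systemd-run` are real systemd tools, left commented because they need a running systemd.

```sh
#!/bin/sh
# Hypothetical helper: cgroup path of a user's slice, e.g. /user.slice/user-1000.slice
# (default: the current user's UID from `id -u`).
user_slice() {
  printf '/user.slice/user-%s.slice\n' "${1:-$(id -u)}"
}

echo "my slice: $(user_slice)"

# Real systemd tools (need systemd as PID 1; not run here):
#   systemd-cgls /user.slice      # print that part of the cgroup tree
#   systemd-cgtop                 # live per-cgroup resource usage
#   sudo systemd-run --scope -p MemoryMax=200M -- <program>
#                                 # run <program> in a transient scope with a memory cap
```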
:::
- **`chroot`:** for filesystems.
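*Changes the apparent root directory (`/`) of a process, so it only sees the file-system subtree below the new root. (Docker itself uses `pivot_root` inside a mount namespace, which is harder to escape than plain `chroot`.)*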
- **`seccomp`:** for process syscalls.
*Restricts which syscalls a process may invoke.*
> [Applying seccomp in K8s - iThome](https://ithelp.ithome.com.tw/articles/10337741)
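
Whether a process runs under seccomp shows up in the `Seccomp:` field of `/proc/<pid>/status` (0 = disabled, 1 = strict, 2 = filter); a container started with Docker's default seccomp profile typically reports `2`. A minimal sketch, assuming a Linux `/proc`; `seccomp_mode` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print the seccomp mode from a /proc/<pid>/status-style file
# (default: this process). 0 = disabled, 1 = strict, 2 = filter.
seccomp_mode() {
  awk '/^Seccomp:/ { print $2 }' "${1:-/proc/self/status}"
}

echo "seccomp mode of this process: $(seccomp_mode)"

# Docker applies its default seccomp profile unless told otherwise (not run here):
#   docker run --rm alpine grep Seccomp /proc/self/status     # typically "Seccomp: 2"
#   docker run --rm --security-opt seccomp=unconfined alpine grep Seccomp /proc/self/status
#   docker run --rm --security-opt seccomp=/path/to/profile.json <image-name>
```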
:::
## Get Started
- [Get Started - Docker doc](https://docs.docker.com/get-started/)
- [Developing inside a Container - VSCode](https://code.visualstudio.com/docs/devcontainers/containers)
### Environment Setup: Dockerfile
A script that automates building the container environment & deploying the application.
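```dockerfile=
# base image from Docker Hub
FROM node:alpine
# working directory inside the container (created if missing)
WORKDIR /application
# Alpine uses apk, not apt-get: e.g., install nginx
RUN apk add --no-cache nginx
# copy the dependency manifests first so the install layer can be cached
COPY package*.json ./
# install module dependencies from package.json
RUN npm install
# copy the rest of the build context (the Dockerfile's directory) into the container
COPY . .
# document the port the app listens on (publish it with `docker run -p`)
EXPOSE 3000
ENV PATH=$PATH:/bin
ENV DEBUG=0
# default command to start the application (exec form; keep CMD last)
CMD ["node", "index.js"]
```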
#### Build a Docker Image from the Dockerfile
```sh!
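# <build-context-dir>: directory sent to the builder (holds the Dockerfile by default; otherwise add -f <path>)
docker build --no-cache -t <image-name> <build-context-dir>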
```
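
Since the Dockerfile copies the whole build context (`.`) into the image, a `.dockerignore` file in the context directory keeps local artifacts such as `node_modules` out of the image and out of the layer cache. A minimal sketch that writes one; `write_dockerignore` is a hypothetical helper, and the listed patterns are assumptions for a typical Node.js project.

```sh
#!/bin/sh
# Hypothetical helper: write a .dockerignore into the build-context directory (default: current dir).
# The patterns are assumptions for a typical Node.js project; adjust as needed.
write_dockerignore() {
  cat > "${1:-.}/.dockerignore" <<'EOF'
node_modules
npm-debug.log
.git
.env
EOF
}

# Usage: write_dockerignore . && docker build -t <image-name> .
```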
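## Common Commands
#### Run a New Container
```sh!
# -it: interactive terminal; --rm: remove the container when it exits
docker run -it --rm \
<image-name> \
<cmd>
```
```sh!
# -d: run detached; -p 3000:3000: publish host port 3000 to container port 3000
# -w: working directory inside the container
docker run --name <new-container-name> \
-dp 3000:3000 \
-w /workdir \
--restart=always \
<image-name> \
sh -c "yarn install && yarn run dev"
```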
:::info
**Access GUI Systems from Docker Containers**
Build Docker image:
- Dockerfile
```dockerfile=
...
# Set environment variable to use host's X11 display
ENV DISPLAY=:0
WORKDIR /app
CMD ["app"]
```
> [!TIP]
> To test, add `RUN apt-get update && apt-get install -y gedit` to the Dockerfile and run `gedit` in the container.

Run the container:
- For X11:
```sh!
# allow local Docker containers to access the X server
xhost +local:docker
# -e DISPLAY: pass the host's display id (e.g., :1)
# -v /tmp/.X11-unix: let the container talk to the X server's UNIX-domain sockets
# optional: -v $HOME/.Xauthority:/root/.Xauthority, --network host
docker run -it --rm \
-e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix:/tmp/.X11-unix \
<image-name>
# revoke X11 access again
xhost -local:docker
```
- For Wayland windowing system:
```sh!
# let the container access the Wayland socket (better: grant access via group permissions)
sudo chmod a+rw /run/user/$(id -u)/wayland-0
# optional: --network host
docker run -it --rm \
-e XDG_RUNTIME_DIR=/tmp \
-e QT_QPA_PLATFORM=wayland \
-e WAYLAND_DISPLAY=$WAYLAND_DISPLAY \
-v $XDG_RUNTIME_DIR/$WAYLAND_DISPLAY:/tmp/$WAYLAND_DISPLAY \
--user=$(id -u):$(id -g) \
<image-name>
# restore the Wayland socket's permissions
sudo chmod 644 /run/user/$(id -u)/wayland-0
```
> [!TIP]
> More: [Running with Docker - openage - GitHub](https://github.com/SFTtech/openage/blob/master/doc/build_instructions/docker.md).
> **[What are UNIX-domain sockets under `/tmp/.X11-unix`?](https://unix.stackexchange.com/questions/196677/what-is-tmp-x11-unix)**
>
> The X server uses UNIX-domain sockets (as its IPC mechanism) to communicate with clients like `xterm`, `firefox`, etc. via a reliable stream of bytes.
>
> A UNIX-domain socket is probably a bit more secure than a TCP socket open to the world, and probably a bit faster, as the kernel does it all, and does not have to rely on an ethernet or wireless card.
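
This is why mounting `/tmp/.X11-unix` and passing `DISPLAY` is enough for a container to reach the host's X server: local display `:N` is served on the socket `/tmp/.X11-unix/XN`. A minimal sketch of that mapping; `x11_socket` is a hypothetical helper name, and remote/TCP display forms such as `host:0` are deliberately not handled.

```sh
#!/bin/sh
# Hypothetical helper: map a local DISPLAY value (":N" or ":N.S") to its X11 socket path.
# Remote/TCP displays such as "host:0" are not handled in this sketch.
x11_socket() {
  d="${1:-$DISPLAY}"
  case "$d" in
    :*) n="${d#:}"; n="${n%%.*}"; printf '/tmp/.X11-unix/X%s\n' "$n" ;;
    *)  echo "unsupported DISPLAY: '$d'" >&2; return 1 ;;
  esac
}

# Usage: check on the host that the socket the container will use exists.
#   [ -S "$(x11_socket)" ] && echo "X server socket found"
```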

Refs:
- [Running GUI Applications in Docker Containers: A Step-by-Step Guide - Priyam Sanodiya - Medium](https://medium.com/@priyamsanodiya340/running-gui-applications-in-docker-containers-a-step-by-step-guide-335b54472e4b)
:::
:::warning
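**GPU Support for Containers**
Adds container support for things like `/dev/dri` device access, interaction with the host environment, and controlled GPU resource access from containers.
(The toolkit works together with a compatible GPU driver on the host and the Docker Engine.)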
[Installing the NVIDIA Container Toolkit - Nvidia Docs](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
1. Download and install the GPU container support toolkit (NVIDIA Container Toolkit).
```sh!
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
```sh!
sudo apt update
sudo apt install nvidia-container-toolkit
```
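2. Register the NVIDIA runtime with Docker (`nvidia-ctk` writes it into `/etc/docker/daemon.json`; the runtime's own settings live in `/etc/nvidia-container-runtime/config.toml`).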
```sh!
sudo nvidia-ctk runtime configure --runtime=docker
```
3. Restart the Docker daemon.
```sh!
sudo systemctl restart docker
```
4. Test with `docker run --runtime=nvidia --gpus all`.
```sh!
sudo docker run --rm \
--runtime=nvidia \
--gpus all \
--env NVIDIA_VISIBLE_DEVICES=all \
--env NVIDIA_DRIVER_CAPABILITIES=all \
<image-name> nvidia-smi
```
5. (Optional) Inside the container, install the CLIs & libraries of other GPU graphics drivers.
```sh!
sudo apt install -y vulkan-tools # provides vulkaninfo (and vkcube)
vulkaninfo # Vulkan driver info
export VK_LOADER_DEBUG=all # error,warn,info,debug,layer,driver
./<vulkan-application>
```
```sh!
nvidia-smi # e.g., driver version: 535.247.01
sudo apt install libnvidia-gl-<driver-major-version> # e.g., libnvidia-gl-535
```
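
The version-to-package step above can be scripted: `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints the driver version (e.g., `535.247.01`), and its major number selects the `libnvidia-gl-<major>` package. A minimal sketch under that assumption; `gl_pkg_for` is a hypothetical helper, and the package naming follows the Ubuntu example above.

```sh
#!/bin/sh
# Hypothetical helper: turn a driver version like "535.247.01" into "libnvidia-gl-535".
gl_pkg_for() {
  printf 'libnvidia-gl-%s\n' "${1%%.*}"
}

# Usage inside the container (the userspace library should match the host driver):
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   sudo apt install "$(gl_pkg_for "$ver")"
```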
:::
#### Stop a Container
```sh!
docker stop <container-id>
```
#### Start a Stopped Container
```sh!
docker start <container-id>
```
#### Remove a Container (-f: force)
```sh!
docker rm -f <container-id>
```
#### List Containers (-a: all)
```sh!
docker ps -a
```
#### Container Logs
```sh!
docker logs -f <container-id> # -f: follow (stream) new output
```
#### Remove an Image
```sh!
docker rmi <image-name>
```
```sh!
# remove dangling <none> images (e.g., left over from rebuilds)
docker image prune
# remove all NOT-IN-USE images
docker image prune -a
```
#### List All Images
```sh!
docker images
```
#### Run a Command in a Running Container from the Host
```sh!
docker exec -it <container-id> <cmd>
```
#### Compile C/C++ with a gcc Container
[amd64/gcc - Docker Hub](https://hub.docker.com/r/amd64/gcc/)
```sh!
docker run -it --rm \
-v ${PWD}:/app \
-w /app \
amd64/gcc:4.9 \
sh -c "gcc -o a.out main.c"
# run the compiled app outside the container
./a.out
```
#### Reclaim Unused Docker Disk Space
```sh!
docker system prune -a
```
More:
- [Is it safe to clean docker/overlay2/ - StackOverflow](https://stackoverflow.com/questions/46672001/is-it-safe-to-clean-docker-overlay2)
## Volumes
Directories for storing persistent data.
### Create Docker Volume
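Uses Docker's built-in storage, created and managed by Docker (on Linux under `/var/lib/docker/volumes/`; with Docker Desktop, inside its VM).
#### Create a New Docker Volume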
```sh!
docker volume create <new-volume-name>
```
:::info
**Various Types of Docker Volumes**
```sh!
# named bind mount
docker volume create --driver local \
--opt type=none \
--opt device=/home/user/test \
--opt o=bind \
my_mnt_vol
# nfs
docker volume create --driver local \
--opt type=nfs \
--opt o=nfsvers=4,addr=nfs.example.com,rw \
--opt device=:/path/to/dir \
my_nfs_vol
# overlay
docker volume create --driver local \
--opt type=overlay \
--opt o=lowerdir=${PWD}/ro-data,upperdir=${PWD}/upper1,workdir=${PWD}/work1 \
--opt device=overlay \
my_overlay_vol
```
> [!TIP]
> **Overlay Docker Volume:** Get the benefits of a read-only filesystem for the host, and a writable filesystem for the container.
>
> More example: [Where are Docker Images Stored? Docker Container Paths Explained - freeCodeCamp.org](https://www.freecodecamp.org/news/where-are-docker-images-stored-docker-container-paths-explained/)
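
The `overlay` volume above expects its directories to exist before it is mounted: `lowerdir` holds the read-only data, while `upperdir` and `workdir` must be on the same filesystem and `workdir` must be empty. A minimal sketch that prepares them, reusing the example's directory names (`ro-data`, `upper1`, `work1`); `prepare_overlay_dirs` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: create the directories used by the overlay volume example
# under one base directory (default: $PWD), so upper1/ and work1/ normally sit on the
# same filesystem, as overlayfs requires; a freshly created work1/ is empty.
prepare_overlay_dirs() {
  base="${1:-$PWD}"
  mkdir -p "$base/ro-data" "$base/upper1" "$base/work1"
}

# Usage:
#   prepare_overlay_dirs && echo hello > ro-data/hello.txt
#   docker volume create --driver local --opt type=overlay \
#     --opt o=lowerdir=${PWD}/ro-data,upperdir=${PWD}/upper1,workdir=${PWD}/work1 \
#     --opt device=overlay my_overlay_vol
# Files written by containers then land in upper1/, leaving ro-data/ untouched.
```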

Refs:
- [docker - volumes vs mount binds. what are the use cases? - Stack Exchange](https://serverfault.com/questions/996785/docker-volumes-vs-mount-binds-what-are-the-use-cases)
:::
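#### Attach a Docker Volume When Running a New Container
```sh!
# target must be an absolute path in the container; options are comma-separated, no spaces
docker run --mount type=volume,src=<volume-name>,target=/mnt/path,readonly \
<image-name>
```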
#### Create a New Docker Volume When Running a Container
```sh!
# <volume-name> is created first if it does not exist yet
docker run \
-v <volume-name>:/mnt/path \
<image-name>
```
#### Inspect a Docker Volume
```sh!
docker volume inspect <volume-name>
```
#### Remote Server Docker Volume
- [vieux/docker-volume-sshfs - GitHub](https://github.com/vieux/docker-volume-sshfs)
### Bind Mount Volume
Uses an existing host directory as the volume, so the container shares the same storage with the host (managed by the host machine's filesystem).
#### Bind a Host Directory When Running a New Container
```sh!
docker run --mount type=bind,src=${PWD},target=/mnt/path,readonly \
<image-name>
```
```sh!
docker run -v ${PWD}:/mnt/path \
-w /mnt/path \
<image-name>
```
## Image Layers Construction

> [!TIP]
> **Optimization for Docker image layers:** [Docker Image BEST Practices - From 1.2GB to 10MB - Better Stack](https://www.youtube.com/watch?v=t779DVjCKCs).
- E.g., building an *Alpine-Apache-MySQL* image
:::info
Pull the *Alpine* base image from Docker Hub, run it to get an *Alpine* container, install *Apache* inside that container, then commit the container as a new *Alpine-Apache* image layer.
The second layer repeats the process: install *MySQL* in a container based on the *Alpine-Apache* image, and committing that container yields the *Alpine-Apache-MySQL* image layer.
:::
#### Commit an Existing Container as a New Image Layer
[Docker Container Basics, Part 1 #Image Layer - Medium](https://azole.medium.com/docker-container-%E5%9F%BA%E7%A4%8E%E5%85%A5%E9%96%80%E7%AF%87-1-3cb8876f2b14#0d18)
```sh!
docker commit <container-id> <new-image-name>
```
## Docker Network
*OS-Level Virtualization Subnetwork*


- [[Beginner] Docker Network Modes - Bridge | Host | None - 技术蛋老师](https://www.youtube.com/watch?v=va-9hcq-a5Q)
- [Computer Networks - Network Namespace - 0x00f7 - hackmd](https://hackmd.io/@0xff07/SJzOwViYF#%E4%BE%8B%E5%AD%90%E4%B8%80%EF%BC%9A%E5%85%A9%E5%80%8B-Network-Namespace-%E7%94%A8-veth-%E9%80%A3%E6%8E%A5)
### Network DNS Inspection
```sh!
docker run -it --network <network-name> nicolaka/netshoot
# then, inside the netshoot shell:
dig <network-alias-name>
```
### Docker Network Drivers
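Docker's built-in Linux network drivers:
- **bridge**: containers communicate over a virtual LAN; traffic leaving the bridge goes through NAT
- **host**: the container uses the host's network interfaces directly
- **overlay**: implemented with VXLAN (Virtual Extensible LAN) to span multiple hosts
- **macvlan**: each container has its own MAC address and its own IP address
  - **VEPA (Virtual Ethernet Port Aggregator) mode**: treats containers as independent hosts on the network
  - **bridge mode**: the host's containers are first joined by a bridge, which in turn connects to the host's bridge
- **ipvlan**: containers share the same MAC address but have different IP addresses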
#### Create a New Subnetwork
The default driver is *bridge* (pick another with `-d <driver>`).
```sh!
docker network create <network-name>
```
#### Run a Container on a Specific Network
```sh!
# --network-alias: extra DNS name that other containers on <network-name> can resolve
docker run --network <network-name> \
--network-alias <container-DNS-resolved-hostname> \
<image-name>
```
### [Advanced] Network Routing Managed by Kubernetes
- Further readings
[WebAPIs 實作 #WebRTC - shibarashinu](https://hackmd.io/sETH6DLeTOmoIDtAea1EMQ#WebRTC)
## Managing & Scheduling Services in Containers
System frameworks for container orchestration
E.g. [Kubernetes - shibarashinu](/EJSpVmMjS5Gx3CFZCCzWaQ), [Docker Compose](https://blog.techbridge.cc/2018/09/07/docker-compose-tutorial-intro/)
### Containers - The Solution of Dependency Hell
{%youtube IeEUvhTebcM %}
### Docker Compose System Setup (`docker-compose.yml`)
```yaml=
services:              # the project's services: web and redis
  web:
    build: .           # build from the Dockerfile in the same directory
    ports:
      - "5000:5000"    # host port : container port
    volumes:
      - .:/code        # bind-mount the local directory into the container
    links:
      - redis          # legacy option; services on the default Compose network already reach each other by service name
  redis:
    image: redis       # use the redis image as this service's base
```