# Docker Notes

###### tags: `Docker` `Container` `System Design` `Distributed System` `Software` `Software Packaging` `Server` `System Administration` `Linux` `Computer Networking` `Operating System` `Cross Platform`

Containerizing applications.

![](https://i.imgur.com/iIcf8es.png =600x)

:::warning
**More Secure Docker: Podman**
*With seamless integration with the Docker ecosystem & Kubernetes environments.*

Podman relies only on Linux's OS-level containerization, yet it can run Docker images and Docker-compatible commands, and it supports the Kubernetes pod concept.

> Podman independently runs containers without interacting with a rootful service frontend to operate the container runtime:
> ![image](https://hackmd.io/_uploads/r1ZwugVIlg.png =500x)

[Containerization with PodMan - Medium](https://medium.com/@ifeoluwashola06/containerization-with-podman-5482d4d74751)

More:
- [下一代容器管理工具Podman,对比Docker有哪些优势,架构是什么样的?- 技术爬爬虾 TechShrimp](https://www.youtube.com/watch?v=P68C6CQB0F4)

> ![image](https://hackmd.io/_uploads/BJDbRlVUex.png =400x)
> (Source: [Podman: Managing pods and containers in a local container runtime - Red Hat Dev](https://developers.redhat.com/blog/2019/01/15/podman-managing-containers-pods))
:::

:::info
**Docker Sandboxed Containers Built on Linux Kernel Isolation Features**
*I.e., relies on OS-level virtualization.*

> ![image](https://hackmd.io/_uploads/SJZyeQRmgg.png =500x)
>
> (Source: [Container security fundamentals part 2: Isolation & namespaces - Datadog Security Labs](https://securitylabs.datadoghq.com/articles/container-security-fundamentals-part-2/))

- **`namespaces`: System-Level Process Resource Isolation**
  *A mechanism provided by the Linux kernel for ++isolating the resources++ a process can see and access.*

  | Namespace | Isolated Resources |
  |:-----------:|:------------------:|
  | **PID** | PID-process table |
  | **IPC** | Kernel message queues (UNIX System V / POSIX), semaphores, shared memory |
  | **Time** | `CLOCK_MONOTONIC` & `CLOCK_BOOTTIME` |
  | **Network** | IPv4 / IPv6 stacks, firewall rules (iptables / nftables), `/proc/net` & `/sys/class/net`, sockets, ... |
  | **UTS** | Hostname & domainname |
  | **Mount** | Mount points |
  | **User** | UID & GID |
  | **Cgroup** | `cgroups` root directories |

  User-space front end for creating a new isolated `namespace`:
  ```sh!
  unshare -p -f --mount-proc <program>
  ```

  [unshare(1) - Linux manual page](https://man7.org/linux/man-pages/man1/unshare.1.html)
  [namespaces(7) - Linux manual page](https://man7.org/linux/man-pages/man7/namespaces.7.html)
  [cgroup_namespaces(7) - Linux manual page](https://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html)

- **`cgroups`: System-Level Hierarchical Monitoring & Management of Process-Group Resources**
  *An advanced facility provided by the Linux kernel for ++limiting and managing the resources++ that a group of processes may use.*

  :::warning
  **Control Groups:** used to be called *Process Containers*.
  :::

  For example, a process placed in a control group is subject to the following controls:
  - **Resource limits (Limit):** CPU, RAM, I/O, files, ...
    > Before containerization, `setrlimit()` / `ulimit` were used to set resource limits on a single UNIX process, recorded in that process's kernel task state.
  - **Prioritization (Prioritize):** the number of CPU cores a group may use, I/O throughput, ...
  - **Accounting (Account):** tracks the resource usage of the processes in a group.
  - **Freezing (Freeze):** forcibly suspends, inspects, or resumes the processes of a specific group at runtime, ...

  > **Control Group V1 -> V2 Evolution**
  > - [Control Group v2 - Admin Guide - Linux Kernel Docs](https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html): cpu, memory, io, process (pids), misc, ...
  > - [第一千零一篇的 cgroups 介紹 - smalltown - Medium](https://medium.com/starbugs/第一千零一篇的-cgroups-介紹-a1c5005be88c)

  :::info
  **Systemd: System Resource Allocation & Management**

  `systemd` (the PID 1 init system) manages system resources through the Linux kernel's `cgroups` subsystem rather than through ordinary per-process mechanisms (e.g., PIDs, namespaces, ...). This has the following advantages:
  - It gains the higher-level features of OS-level container virtualization.
  - `systemd-nspawn` and `machinectl` can additionally be used to create and manage containers on Linux, respectively.
  - `cgroups` is integrated with `systemd` targets and service dependencies, so a single unified interface for system services and container resources is easier to manage, develop, and extend.

  E.g.,
  - journald: `/system.slice/systemd-journald.service`
  - gdm3: `/system.slice/gdm.service`
  - gdm session(s): `/user.slice/user-128.slice/session-c1.scope`
  - sshd: `/system.slice/ssh.service`
  - sshd session(s): `/user.slice/user-1000.slice/session-361.scope`
    > `1000` can be replaced with another user's user/group ID:
    > ```sh!
    > cat /etc/passwd # list all users
    >
    > id -u # current user's UID (use -g for the GID)
    > ```
    > ![image](https://hackmd.io/_uploads/SyixxXAmle.png =500x)

  More:
  - [[Systemd] What are CGroup Slices Used For - Stack Exchange](https://unix.stackexchange.com/questions/683141/what-are-cgroup-slices-used-for)
  :::

- **`chroot`:** for filesystems.
  *Changes the root directory that a process sees, confining its file access to that subtree of the filesystem.*

- **`seccomp`:** for process syscalls.
  *Restricts which syscalls a process may use.*
  > [seccomp 在 K8s 中應用 - iThome](https://ithelp.ithome.com.tw/articles/10337741)
:::

## Get Started
- [Get Started - Docker doc](https://docs.docker.com/get-started/)
- [Developing inside a Container - VSCode](https://code.visualstudio.com/docs/devcontainers/containers)

### Environment Setup
A Dockerfile is a script that automates building the container environment and deploying the application.
```dockerfile=
# load the base image from the registry
FROM node:alpine
# create (and switch to) the working dir in the container
WORKDIR /application
# copy files from the build context into the container's dir
COPY . /application
# install module dependencies from package.json
RUN npm install
# document the port the application listens on
EXPOSE 3000
ENV PATH=$PATH:/bin
ENV DEBUG=0
# default command: execute the application
CMD ["node", "index.js"]
# install nginx (Alpine images use apk rather than apt-get)
RUN apk add --no-cache nginx
...
```

#### Build a Docker Image from the Dockerfile
```sh!
# the last argument is the build context directory; use -f to point at a Dockerfile elsewhere
docker build --no-cache -t <image-name> <path/to/build-context>
```

## Common Commands

#### Run a New Container
```sh!
docker run -it --rm <image-name> <cmd>
```

```sh!
docker run --name <new-container-name> -dp 3000:3000 -w /workdir --restart=always <image-name> sh -c "yarn install && yarn run dev"
```

:::info
**Access GUI Systems from Docker Containers**

Build Docker image:
- Dockerfile
  ```dockerfile=
  ...
  # Set environment variable to use host's X11 display
  ENV DISPLAY=:0

  WORKDIR /app

  CMD ["app"]
  ```
  > [!TIP]
  > To test, add `RUN apt-get update && apt-get install -y gedit` to the Dockerfile, then start `gedit` inside the running container.

Run the container:
- For X11:
  ```sh!
  # allow Docker to access X11
  xhost +local:docker

  # -e DISPLAY: bind the correct host display id (e.g., :1)
  # -v /tmp/.X11-unix: allow the container to communicate with the X server
  # optional extras: -v $HOME/.Xauthority:/root/.Xauthority, --network host
  docker run -it --rm \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    <image-name>

  # reset X11 access permission
  xhost -local:docker
  ```
- For Wayland windowing system:
  ```sh!
  # allow Docker to access Wayland (a better approach: grant access through a group such as docker)
  sudo chmod a+rw /run/user/$(id -u)/wayland-0

  # optional extra: --network host
  docker run -it --rm \
    -e XDG_RUNTIME_DIR=/tmp \
    -e QT_QPA_PLATFORM=wayland \
    -e WAYLAND_DISPLAY=$WAYLAND_DISPLAY \
    -v $XDG_RUNTIME_DIR/$WAYLAND_DISPLAY:/tmp/$WAYLAND_DISPLAY \
    --user=$(id -u):$(id -g) \
    <image-name>

  # reset Wayland access permission
  sudo chmod 644 /run/user/$(id -u)/wayland-0
  ```
  > [!TIP]
  > More: [Running with Docker - openage - GitHub](https://github.com/SFTtech/openage/blob/master/doc/build_instructions/docker.md).

> **[What are UNIX-domain sockets under `/tmp/.X11-unix`?](https://unix.stackexchange.com/questions/196677/what-is-tmp-x11-unix)**
>
> The X server uses UNIX-domain sockets (as the IPC system) to communicate with clients like `xterm`, `firefox`, etc. via a reliable stream of bytes.
>
> A UNIX-domain socket is probably a bit more secure than a TCP socket open to the world, and probably a bit faster, as the kernel does it all, and does not have to rely on an ethernet or wireless card.

Refs:
- [Running GUI Applications in Docker Containers: A Step-by-Step Guide - Priyam Sanodiya - Medium](https://medium.com/@priyamsanodiya340/running-gui-applications-in-docker-containers-a-step-by-step-guide-335b54472e4b)
:::

:::warning
**GPU Support for Containers**

Adds container-side support such as `/dev/dri` control, interaction with the host environment, and access control over the GPU resources a container may use.

(This container support package requires a matching GPU driver on the host and works together with the Docker engine.)

[Installing the NVIDIA Container Toolkit - Nvidia Docs](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)

1. Download and install the GPU container support package.
   ```sh!
   curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
     && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
       sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
       sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
   ```
   ```sh!
   sudo apt update
   sudo apt install nvidia-container-toolkit
   ```
2. Configure Docker to use the NVIDIA runtime (the command updates `/etc/docker/daemon.json`; the toolkit's own settings live in `/etc/nvidia-container-runtime/config.toml`).
   ```sh!
   sudo nvidia-ctk runtime configure --runtime=docker
   ```
3. Restart the Docker daemon.
   ```sh!
   sudo systemctl restart docker
   ```
4. Test with `docker run --runtime=nvidia --gpus all`.
   ```sh!
   sudo docker run --rm \
     --runtime=nvidia \
     --gpus all --env NVIDIA_VISIBLE_DEVICES=all \
     --env NVIDIA_DRIVER_CAPABILITIES=all \
     <image-name> nvidia-smi
   ```
5. (Inside the container, install the CLI tools and libraries of other GPU graphics drivers.)
   ```sh!
   sudo apt install -y vulkan-tools # also installs some useful Vulkan drivers
   vulkaninfo # Vulkan driver info

   export VK_LOADER_DEBUG=all # error,warn,info,debug,layer,driver
   ./<vulkan-application>
   ```
   ```sh!
   nvidia-smi # e.g., driver version: 535.247.01

   sudo apt install libnvidia-gl-<driver-major-version> # e.g., libnvidia-gl-535
   ```
:::

#### Stop a Container
```sh!
docker stop <container-id>
```

#### Restart an Existing Container
```sh!
docker start <container-id>
```

#### Remove (--force) a Container
```sh!
docker rm -f <container-id>
```

#### List Existing (--all) Containers
```sh!
docker ps -a
```

#### Container Logs
```sh!
docker logs -f <container-id> # -f: follow the log output
```

#### Remove an Image
```sh!
docker rmi <image-name>
```

```sh!
# remove dangling (<none>) images left over from previous builds
docker image prune

# remove all NOT-IN-USE images
docker image prune -a
```

#### List All Images
```sh!
docker images
```

#### Run a Command in a Container Directly from the Host
```sh!
docker exec -it <container-id> <cmd>
```

#### Compile C/C++ with a gcc Container
[amd64/gcc - Docker Hub](https://hub.docker.com/r/amd64/gcc/)
```sh!
docker run -it --rm \
  -v ${PWD}:/app \
  -w /app \
  amd64/gcc:4.9 \
  sh -c "gcc -o a.out main.c"

# run the compiled app outside the container
./a.out
```

#### Reclaim Unused Docker Disk Space
```sh!
docker system prune -a
```

More:
- [Is it safe to clean docker/overlay2/ - StackOverflow](https://stackoverflow.com/questions/46672001/is-it-safe-to-clean-docker-overlay2)

## Volumes
Directory locations for storing persistent data.

### Create Docker Volume
Uses Docker's built-in storage (created and managed by Docker; on setups where Docker runs in a VM, the volume lives inside that VM).

#### Create a New Docker Volume
```sh!
docker volume create <new-volume-name>
```

:::info
**Various Types of Docker Volumes**
```sh!
# named bind mount
docker volume create --driver local \
  --opt type=none \
  --opt device=/home/user/test \
  --opt o=bind \
  my_mnt_vol

# nfs
docker volume create --driver local \
  --opt type=nfs \
  --opt o=nfsvers=4,addr=nfs.example.com,rw \
  --opt device=:/path/to/dir \
  my_nfs_vol

# overlay
docker volume create --driver local \
  --opt type=overlay \
  --opt o=lowerdir=${PWD}/ro-data,upperdir=${PWD}/upper1,workdir=${PWD}/work1 \
  --opt device=overlay \
  my_overlay_vol
```

> [!TIP]
> **Overlay Docker Volume:** Get the benefits of a read-only filesystem for the host, and a writable filesystem for the container.
>
> More example: [Where are Docker Images Stored? Docker Container Paths Explained - freeCodeCamp.org](https://www.freecodecamp.org/news/where-are-docker-images-stored-docker-container-paths-explained/)

Refs:
- [docker - volumes vs mount binds. what are the use cases? - Stack Exchange](https://serverfault.com/questions/996785/docker-volumes-vs-mount-binds-what-are-the-use-cases)
:::

#### Attach a Docker Volume When Running a New Container
```sh!
# the mount target must be an absolute path inside the container
docker run --mount type=volume,src=<volume-name>,target=/mnt/path,readonly \
  <image-name>
```

#### Create a New Docker Volume When Running a Container
```sh!
docker run \
  -v <volume-name>:/mnt/path \
  <image-name>
```

#### Inspect a Docker Volume
```sh!
docker volume inspect <volume-name>
```

#### Remote Server Docker Volume
- [vieux/docker-volume-sshfs - GitHub](https://github.com/vieux/docker-volume-sshfs)

### Bind Mount Volume
Uses an existing host directory as the volume, so the container and the host share the same storage (managed by the host machine's filesystem).

#### Bind a Host Directory When Running a New Container
```sh!
docker run --mount type=bind,src=${PWD},target=/mnt/path,readonly \
  <image-name>
```

```sh!
docker run -v ${PWD}:/mnt/path \
  -w /mnt/path \
  <image-name>
```

## Image Layers Construction
![](https://i.imgur.com/5hYErXt.png =600x)

> [!TIP]
> **Optimization for Docker image layers:** [Docker Image BEST Practices - From 1.2GB to 10MB - Better Stack](https://www.youtube.com/watch?v=t779DVjCKCs).

- Ex.
  Building an *Alpine-Apache-MySQL* image

  :::info
  Pull the *Alpine* image from Docker Hub as a template. Docker runs the image to produce an *Alpine* container; start the container, install *Apache* inside it, and then package the container as a new *Alpine-Apache* image layer.

  The second layer repeats the same flow: in a container based on the *Alpine-Apache* image, install *MySQL*, then package that container to obtain the *Alpine-Apache-MySQL* image layer.
  :::

#### Package an Existing Container as a New Image Layer
[Docker Container 基礎入門篇 1 #Image Layer - Medium](https://azole.medium.com/docker-container-%E5%9F%BA%E7%A4%8E%E5%85%A5%E9%96%80%E7%AF%87-1-3cb8876f2b14#0d18)
```sh!
docker commit <container-id> <new-image-name>
```

## Docker Network
*OS-Level Virtualization Subnetwork*

![](https://hackmd.io/_uploads/SkHFltdMeg.png =600x)

![](https://i.imgur.com/GT0h33O.png =600x)

- [【入门篇】Docker网络模式 - Bridge | Host | None - 技术蛋老师](https://www.youtube.com/watch?v=va-9hcq-a5Q)
- [計算機網路 - Network Namespace - 0x00f7 - hackmd](https://hackmd.io/@0xff07/SJzOwViYF#%E4%BE%8B%E5%AD%90%E4%B8%80%EF%BC%9A%E5%85%A9%E5%80%8B-Network-Namespace-%E7%94%A8-veth-%E9%80%A3%E6%8E%A5)

### Network DNS Inspection
```sh!
docker run -it --network <network-name> nicolaka/netshoot dig <network-alias-name>
```

### Docker Network Drivers
Docker's built-in network drivers on Linux:
- **bridge**: containers communicate over a LAN; the bridge applies NAT toward the outside.
  ![](https://i.imgur.com/KMScg1I.png =500x)
- **host**: uses the host's network interfaces directly.
- **overlay**: implemented with VXLAN (Virtual Extensible LAN).
  ![](https://i.imgur.com/KMScg1I.png =500x)
- **macvlan**: containers have different MAC addresses and different IP addresses.
  - **VEPA (Virtual Ethernet Port Aggregator) mode**: treats containers as independent hosts on the network.
    ![](https://i.imgur.com/mkZPFAv.png =450x)
  - **bridge mode**: first connects the host's containers in parallel through a bridge, which is then connected in parallel with the host's bridge.
    ![](https://i.imgur.com/VudUbAD.png =450x)
- **ipvlan**: containers share the same MAC address but have different IP addresses.

#### Create a New Subnetwork
The default driver is *bridge*.
```sh!
docker network create <network-name>
```

#### Run a Container on a Specific Network
```sh!
docker run --network <network-name> --network-alias <container-DNS-resolved-hostname> <image-name>
```

### [Advanced] Network Routing Managed by Kubernetes
- Further readings [WebAPIs 實作 #WebRTC - shibarashinu](https://hackmd.io/sETH6DLeTOmoIDtAea1EMQ#WebRTC)

## Managing & Orchestrating Services in Containers
System frameworks for container orchestration

E.g. [Kubernetes - shibarashinu](/EJSpVmMjS5Gx3CFZCCzWaQ), [Docker Compose](https://blog.techbridge.cc/2018/09/07/docker-compose-tutorial-intro/)

### Containers - The Solution of Dependency Hell
{%youtube IeEUvhTebcM %}

### Docker Compose System Configuration
```yaml=
services: # lists the two services in this project: web and redis
  web:
    build: . # build from the Dockerfile in the same directory (the Dockerfile describes how the image is assembled)
    ports:
      - "5000:5000" # host port : container port
    volumes:
      - .:/code # bind-mount the local directory into the container
    links:
      - redis # link to redis so the two containers can reach each other (legacy; services on the same Compose network already resolve each other by name)
  redis:
    image: redis # use the redis image as the base of this service
```
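
The Compose file above could be driven from the shell roughly as follows. This is only a minimal sketch, assuming Docker Compose v2 (the `docker compose` subcommand) is installed and the YAML is saved under a name Compose picks up by default (e.g., `docker-compose.yml`) in the current directory. The `run` and `compose_cycle` helpers and the `DRY_RUN` variable are hypothetical names introduced here for illustration; they are not part of Docker.

```sh
#!/bin/sh
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: a typical life cycle for the web + redis services.
compose_cycle() {
  # build the web image and start both services in the background
  run docker compose up -d --build || return 1
  # list the services and their state
  run docker compose ps || return 1
  # show the last 20 log lines of the web service
  run docker compose logs --tail 20 web || return 1
  # stop and remove the containers and the default network
  run docker compose down
}
```

Calling `DRY_RUN=1 compose_cycle` only prints the four commands, which is a safe way to review them before running `compose_cycle` against a real Docker daemon.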