---
title: "讀書筆記 | Docker Deep Dive (2024)"
description: Reading notes on Docker Deep Dive (2024 Edition).
image: https://i.imgur.com/dXfEMFE.png
tags: Docker, 讀書筆記
---
<style>
figcaption {
text-align: center;
font-size: 0.75em;
}
</style>
<p style="text-align: center">
<img src="https://hackmd.io/_uploads/ryjHGP_Myg.png" height=512/>
</p>
<!-- table of contents (TOC) -->
<details>
<summary>目錄</summary>
[TOC]
</details>
## Overview
- The defining feature of the container model is that every container shares the OS (kernel) of the host it's running on.
- The major technologies behind modern containers include: *kernel namespaces*, *control groups (cgroups)*, capabilities and more.
- **Kubernetes** is the industry standard platform for deploying and managing containerized applications. Older versions of k8s used *Docker* to start and stop containers. However, newer versions use *containerd*.
## Docker and The Container-Related Standards and Projects
### Docker
> The word *Docker* is a British expression meaning *dock worker*, referring to a person who loads and unloads cargo from ships.
There are two major parts to the Docker platform:
- The Docker CLI (client) : just the familiar docker command-line tool for deploying and managing containers. It converts simple commands into API requests and sends them to the engine.
- The Docker Engine (server) : comprises all the server-side components that run and manage containers.
<figure>
<img src="https://hackmd.io/_uploads/Hyv0Sw_Myg.png">
<figcaption>Docker CLI and Daemon</figcaption>
</figure>
### Container-Related Standards and Projects
There are several standards and governance bodies influencing the development of containers and their ecosystem. For example:
- The Open Container Initiative (OCI) : maintains three standards: the *image-spec*, the *runtime-spec*, and the *distribution-spec*.
- The Cloud Native Computing Foundation (CNCF) : hosts important projects such as Kubernetes, containerd, Notary, Prometheus, Cilium, and more.
- The Moby Project : created by Docker as a community-led place for developers to build specialized tools for building container platforms.
## Getting Started: The Ops Perspective and The Dev Perspective
### The Ops Perspective
#### Check Docker is Working
A typical Docker installation installs the *client* and the *engine* on the same machine, and configures them to talk to each other. Run the `docker version` command to ensure both are installed and running:
``` shell
$ docker version
Client: Docker Engine - Community
Version: 27.3.1
API version: 1.47
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:41:19 2024
OS/Arch: linux/arm64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.3.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.7
Git commit: 41ca978
Built: Fri Sep 20 11:41:19 2024
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.7.23
GitCommit: 57f17b0a6295a39009d861b89e3b3b87b005ca27
runc:
Version: 1.1.14
GitCommit: v1.1.14-0-g2c9f560
docker-init:
Version: 0.19.0
GitCommit: de40ad0
```
#### Download Images and Start Containers
**Images** are objects that contain everything an application needs to run, including an OS filesystem, the application, and all dependencies. They're similar to *VM templates* or *classes* in development.
``` session
$ docker pull ubuntu:latest
latest: Pulling from library/ubuntu
b91d8878f844: Download complete
Digest: sha256:e9569c25505f33ff72e88b2990887c9dcf230f23259da296eb814fc2b41af999
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu latest e9569c25505f 10 days ago 106MB
```
Use the `docker run` command to start a new container and attach the shell to the container's terminal. We can also use the `docker attach` command to attach the shell to the container's main process and execute commands inside the container.
``` session
$ docker run --name test -it ubuntu:latest bash
root@bbd2e5ad1817:/#
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
bbd2e5ad1817 ubuntu:latest "/bin/bash" 7 mins Up 7 min test
$ docker attach test
root@bbd2e5ad1817:/#
```
#### Delete the Container
``` session
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
bbd2e5ad1817 ubuntu:latest "/bin/bash" 9 mins Up 9 min test
$ docker stop test
test
$ docker rm test
test
```
### The Dev Perspective
#### Build Docker Images And Run as Containers
The *Dockerfile* is a plain-text document that tells Docker how to build the application and dependencies into an image. For example:
``` dockerfile
FROM alpine
LABEL maintainer="nigelpoulton@hotmail.com"
RUN apk add --update nodejs npm curl
COPY . /src
WORKDIR /src
RUN npm install
EXPOSE 8080
ENTRYPOINT ["node", "./app.js"]
```
We can run the `docker build` command to create a new image based on the instructions in the Dockerfile. In jargon, this is called *containerizing the application*.
``` session
$ git clone https://github.com/nigelpoulton/psweb.git && cd psweb
$ docker build -t test:latest .
[+] Building 36.2s (11/11) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => naming to docker.io/library/test:latest 0.0s
=> => unpacking to docker.io/library/test:latest 0.7s
$ docker images
REPO TAG IMAGE ID CREATED SIZE
test latest 0435f2738cf6 21 seconds ago 160MB
$ docker run -d --name web1 --publish 8080:8080 test:latest
```
#### Clean Up
``` session
$ docker rm web1 -f
web1
$ docker rmi test:latest
Untagged: test:latest
Deleted: sha256:0435f27...cac8e2b
```
## The Docker Engine
### Two Major Components: The Docker Daemon and LXC
**Docker Engine** is jargon for the server-side components of Docker that run and manage containers. It's similar to ESXi in VMware.
The Docker Engine had two major components:
- *The Docker Daemon* was a monolithic binary containing all the code for the API, image builders, container execution, volumes, networking.
- *LXC* did the hard work of interfacing with the Linux Kernel and constructing the required namespaces and cgroups to build and start containers.
Docker later replaced LXC with its own tool called *libcontainer*, and has since broken the monolithic daemon apart into smaller, specialized components.
<figure>
<img src="https://hackmd.io/_uploads/BJ9YAvOGkx.png">
<figcaption>Docker Engine Components and Responsibilities</figcaption>
</figure>
### `runc (r)` and `containerd (c)`
Docker and Kubernetes both use `runc` as their default low-level runtime, and both pair it with the `containerd` high-level runtime:
- `containerd` operates as the high-level runtime managing lifecycle events
- `runc` operates as the low-level runtime executing lifecycle events by interfacing
with the kernel to do the work of actually building containers and deleting them
Most of the time, containerd is paired with runc as its low-level runtime. However, it uses *shims* that make it possible to replace `runc` with other low-level runtimes.
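As a quick check, the `docker info` command reports which low-level runtime the local engine defaults to (a minimal sketch; the output shown is what a standard install typically reports):
``` session
$ docker info --format '{{.DefaultRuntime}}'
runc
```
An alternative runtime can be selected per container with the `--runtime` flag, as the Wasm chapter does later.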
### The Process of `docker run`
<figure>
<img src="https://hackmd.io/_uploads/r1HQeduMyg.png">
<figcaption>The process of <code>docker run</code></figcaption>
</figure>
When we run commands like `docker run`, the Docker client converts them into API requests and sends them to the API exposed by the Docker daemon.
- The daemon can expose the API on a local socket or over the network. By default, the local socket is `/var/run/docker.sock` on Linux or `\\.\pipe\docker_engine` on Windows (see the example after this list).
- The daemon communicates with *containerd* via a CRUD-style API over gRPC.
- *containerd* converts the required Docker image into an OCI bundle and tells *runc* to use this to create a new container.
- The container starts as a child process of *runc*, and as soon as the container starts, *runc* exits.
Sometimes, we call this **daemonless containers**.
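Because the daemon simply exposes an HTTP API on that socket, we can query it directly. A minimal sketch, assuming `curl` with Unix-socket support is installed (output trimmed and illustrative):
``` session
$ curl --unix-socket /var/run/docker.sock http://localhost/version
{"Platform":{"Name":"Docker Engine - Community"},"Version":"27.3.1", ...}
```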
### The `shim` Component
The Docker Engine uses **shims** in between *containerd* and the *OCI* layer. It brings the following benefits:
- Daemonless Containers.
- Improves Efficiency.
- Makes the OCI Layer Pluggable.
The *containerd* forks a *shim* and a *runc* process for every new container. Each *runc* process exits as soon as the container starts running, leaving the *shim* process as the container's parent process.
## The Docker Images
### Docker Images and Registries
An **image** is a *read-only* package containing everything we need to run an application. This means they include application code, dependencies, a minimal set of OS constructs, and metadata.
The image registries contain one or more *image repositories*, and image repositories contain one or more images:
- The *Local Repository* is jargon for an area on the local machine where Docker stores images for more convenient access. It's sometimes called the *image cache*.
- People store images in centralized places called *registries*. Most modern registries implement the OCI distribution-spec.
Most of the popular applications and operating systems have *official repositories* on Docker Hub. These live at the top level of the Docker Hub namespace.
### Image Naming and Tagging
A fully qualified image name includes the registry name, user or organization name, repository name, and tag. (Docker automatically populates the registry and tag values if we don't specify them.)
<figure>
<img src="https://hackmd.io/_uploads/SJK48d_Gke.png">
<figcaption>Fully qualified image name.</figcaption>
</figure>
### Images and Layers
Images are made by stacking independent layers and representing them as a single
unified object. Note that ==images are *build-time* constructs, whereas containers are *run-time* constructs==.
<figure>
<img src="https://hackmd.io/_uploads/ryWVPddzyg.png">
<figcaption>Docker Images and Stacked Layers.</figcaption>
</figure>
#### Inspect Layers Information
We can use the `docker inspect` or `docker history` command to inspect layer information. Also, when we pull the images, each line ending with `Pull complete` represents a layer.
``` session
$ docker inspect node:latest
[
{
"Id": "sha256:09d13f1ec5ed523d89cbb8976c48a79d3efc033c9c66ec53915ceb569e4406b5",
"RepoTags": [
"node:latest"
],
...
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:ec8ae7dad7aba50e0f8bff1dc969d34d3584fb7ada6ce9948dad83e95939b5cc",
"sha256:de0d18f93508670ab2b3cc68c87ea02006b72b988a939ba6a0e0dd71cbfbd329",
"sha256:29842e18ccdd85692bd8ec615cb35cecba3fb4021234e67d76244bae975ae6db",
"sha256:c5d4093056babc39221f883cf48609f24ea97cb29312fc40e0fbe350ca0a56b7",
"sha256:e27a35349faa7ce69e85e350be2d6328bf69ecc236175a08d28e4ff7e1719aa0",
"sha256:49f14352d048e099a09224e57fff85022d5be5652a539beb46644c3d7228f7ce",
"sha256:9e2c4e8f8639dd6b716d4d2666f26714d6fa4f4d604f19e0ea8d770005b3767b",
"sha256:34c7b32b88fe8d91638482ce1c8bd5a151d654d32df57d8c2e78ef45a118aafb"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
```
> Note that `docker history` command shows the build history of an image and is not a strict list of layers in the final image. For example, some Dockerfile instructions (`ENV`, `EXPOSE`, `CMD`, and `ENTRYPOINT`) only add metadata and don’t create layers.
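For example, a trimmed and illustrative `docker history` output for the `node:latest` image pulled above; note the metadata-only instructions reporting `0B`:
``` session
$ docker history node:latest
IMAGE          CREATED       CREATED BY                            SIZE
09d13f1ec5ed   2 weeks ago   CMD ["node"]                          0B
<missing>      2 weeks ago   ENTRYPOINT ["docker-entrypoint.sh"]   0B
<missing>      2 weeks ago   COPY docker-entrypoint.sh ...         388B
<missing>      2 weeks ago   RUN /bin/sh -c ...                    ...
...
```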
### More about Images and Layers
Under the hood, Docker uses **storage drivers** to stack layers and present them as a unified filesystem and image. Almost all Docker setups use the *overlay2* driver, but *zfs*, *btrfs* and *vfs* are alternative options.
<figure>
<img src="https://hackmd.io/_uploads/rkJuYddzkl.png">
<figcaption>Stacking Layers.</figcaption>
</figure>
- All Docker images start with a **base layer**, and every time we add new content, Docker adds a new layer.
- A file in a higher layer obscures the file directly below it. We can update a file in an image by adding a new layer.
- Images can share layers, leading to efficiencies in space and
performance.
- Images and layers have their own **digests**. And all changes to layers or image manifests result in new hashes, giving us an
easy and reliable way to know if changes have been made.
- Docker compares hashes before and after every push and pull to ensure no tampering
has occurred.
- Docker supports **multi-architecture images** for different platforms and architectures, such as Windows and Linux on variations of ARM, x64, PowerPC, s390x, and more. Using a command like `docker buildx imagetools inspect alpine:latest`, we can see the different architectures supported behind a given image tag, as shown below.
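A trimmed, illustrative example of what that inspection looks like (digests elided):
``` session
$ docker buildx imagetools inspect alpine:latest
Name:      docker.io/library/alpine:latest
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:...
Manifests:
  Name:        docker.io/library/alpine:latest@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/amd64
  Name:        docker.io/library/alpine:latest@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/arm64
  ...
```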
## The Docker Containers
> **Containers** are run-time instances of images. They're designed to be stateless, ephemeral, and immutable; containers should only run a single process, and we use them to build microservices applications.
### Containers and Virtual Machines
**The VM model** virtualizes hardware. When the hypervisor boots, it claims all hardware resources such as CPU, RAM, storage, and network adapters. Once we have a VM, we install an OS and then an application.
**The container model** virtualizes operating systems. To deploy an application, we ask Docker to create a container by carving OS resources into virtual versions.
<figure>
<img src="https://hackmd.io/_uploads/SkOflFuMyx.png">
<figcaption>VM Model vs Container Model.</figcaption>
</figure>
### How Containers Start Applications
There are three ways you can tell Docker how to start an app in a container:
1. An `ENTRYPOINT` instruction in the image.
2. A `CMD` instruction in the image.
3. A CLI argument.
The `ENTRYPOINT` and `CMD` instructions are optional image metadata that store the command Docker uses to start the default app.
::: warning
- The `ENTRYPOINT` instruction cannot be overridden on the CLI, and anything you pass in via the CLI will be appended to the `ENTRYPOINT` instruction as an argument.
- The `CMD` instruction can be overridden by CLI arguments.
:::
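A minimal sketch of the difference, assuming a hypothetical image named `pinger` whose Dockerfile ends with `ENTRYPOINT ["ping"]` and `CMD ["localhost"]`:
``` bash
# No CLI args: Docker appends CMD to ENTRYPOINT, so the container runs `ping localhost`
$ docker run pinger
# CLI args replace CMD and are appended to ENTRYPOINT, so the container runs `ping 8.8.8.8`
$ docker run pinger 8.8.8.8
```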
### The Restart Policies
Container **restart policies** are a simple form of self-healing that allows the local Docker Engine to automatically restart failed containers. Docker supports the following four policies (see the example after the table):
| Restart Policy | non-zero exit code | Zero exit code | docker stop | daemon restarts |
| :--: | :--: | :--: | :--: | :--: |
| `no` | N | N | N | N |
| `on-failure` | Y | N | N | Y |
| `always` | Y | Y | N | Y |
| `unless-stopped` | Y | Y | N | N |
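A minimal sketch of applying a policy at run time and confirming a restart happened (container name and timing are illustrative):
``` bash
# The main process exits cleanly after 5 seconds; the `always` policy restarts it anyway
$ docker run -d --name always-demo --restart always alpine sh -c "sleep 5"
# A little later, the restart counter shows Docker has restarted the container
$ docker inspect --format '{{.RestartCount}}' always-demo
1
$ docker rm -f always-demo
```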
### Containerize Applications
<figure>
<img src="https://hackmd.io/_uploads/Sy41cYdfyg.png">
<figcaption>Steps of containerizing applications.</figcaption>
</figure>
The basic flow of containerizing applications is shown below:
1. Write the applications and create the list of dependencies.
2. Create a Dockerfile that tells Docker how to build and run the app.
3. Build the app into an image.
4. Push the image to a registry (optional).
5. Run a container from the image.
#### Containerize Single-Container Applications
Let's start with a simple Node.js web application that serves a web page on port `8080`. Newer versions of Docker support the `docker init` command that analyses applications and automatically creates a `Dockerfile` implementing good practices:
``` session
$ git clone https://github.com/nigelpoulton/ddd-book.git
$ cd ddd-book/node-app
$ docker init
```
The process created a new `Dockerfile` and placed it in the current directory:
``` dockerfile
# syntax=docker/dockerfile:1
ARG NODE_VERSION=20.8.0
FROM node:${NODE_VERSION}-alpine
# Use production node environment by default
ENV NODE_ENV production
WORKDIR /usr/src/app
# Download dependencies as a separate step to take advantage of Docker's caching.
# Leverage a cache mount to /root/.npm to speed up subsequent builds.
# Leverage a bind mounts to package.json and package-lock.json to avoid having to copy them into this layer.
RUN --mount=type=bind,source=package.json,target=package.json \
--mount=type=bind,source=package-lock.json,target=package-lock.json \
--mount=type=cache,target=/root/.npm \
npm ci --omit=dev
# Run the application as a non-root user.
USER node
# Copy the rest of the source files into the image.
COPY . .
# Expose the port that the application listens on.
EXPOSE 8080
# Run the application.
CMD node app.js
```
Then use the `docker build` command to build the application into a container image. Don't forget the trailing period `.` as this tells Docker to *use the current working directory as the* **build context**.
``` session
$ docker build -t ddd-book:ch8.node .
$ docker inspect ddd-book:ch8.node
$ docker run -d --name c1 -p 5005:8080 ddd-book:ch8.node
```
<figure>
<img src="https://hackmd.io/_uploads/HJTcMnOfye.png">
<figcaption>There are 7 layers even though only 4 Dockerfile instructions create layers; some of the layers come from the base image.</figcaption>
</figure>
In the Dockerfile, all non-comment lines are called **instructions** or **steps** and take the format `<INSTRUCTION> <arguments>`. Some instructions (e.g. `FROM`, `RUN`, `COPY` and `WORKDIR`) create new layers, whereas others (e.g. `EXPOSE`, `ENV`, `CMD` and `ENTRYPOINT`) add metadata.
<figure>
<img src="https://hackmd.io/_uploads/HyjQx6_zyx.png">
<figcaption>Maps the Dockerfile instructions to image layers.</figcaption>
</figure>
#### Multi-stage Builds for Production (1)
The container images should only contain the stuff needed to run the applications in production.
``` session
$ git clone https://github.com/nigelpoulton/ddd-book.git
$ cd ddd-book/multi-stage
$ docker build -t multi:full .
```
That's why we need **multi-stage builds**. Multi-stage builds *use a single Dockerfile with multiple `FROM` instructions*, and each `FROM` instruction represents a new **build stage**.
``` dockerfile
# Stage 0 (base) : builds an image with compilation tools
FROM golang:1.22.1-alpine AS base
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Stage 1 (build-client) : compiles the client executable
FROM base AS build-client
RUN go build -o /bin/client ./cmd/client
# Stage 2 (build-server) : compiles the server executable
FROM base AS build-server
RUN go build -o /bin/server ./cmd/server
# Stage 3 (prod) : copies the client and server executables into a slim image
FROM scratch AS prod
COPY --from=build-client /bin/client /bin/
COPY --from=build-server /bin/server /bin/
ENTRYPOINT [ "/bin/server" ]
```
There are 4 `FROM` instructions, and each of these is a distinct **build stage**. Docker numbers them starting from `0`. With multi-stage builds, each stage outputs an intermediate image that later stages can use. However, Docker deletes them when the final stage completes.
#### Multi-stage Builds for Production (2)
We can also build multiple images from a single Dockerfile. Docker makes it easy to create a separate image for each by splitting the final `prod` stage into two stages as follows:
``` dockerfile
# Stage 0 (base) : builds an image with compilation tools
FROM golang:1.22.1-alpine AS base
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Stage 1 (build-client) : compiles the client executable
FROM base AS build-client
RUN go build -o /bin/client ./cmd/client
# Stage 2 (build-server) : compiles the server executable
FROM base AS build-server
RUN go build -o /bin/server ./cmd/server
# Stage (prod-client)
FROM scratch AS prod-client
COPY --from=build-client /bin/client /bin/
ENTRYPOINT [ "/bin/client" ]
# Stage (prod-server)
FROM scratch AS prod-server
COPY --from=build-server /bin/server /bin/
ENTRYPOINT [ "/bin/server" ]
```
With a Dockerfile like this, we can run the `docker build` command with the `--target` flag to tell Docker which of the two final stages to build.
``` session
$ docker build -t multi:client --target prod-client -f Dockerfile .
$ docker build -t multi:server --target prod-server -f Dockerfile .
```
### Docker Build System
#### Buildx, BuildKit, Drivers and Build Cloud
Behind the scenes, Docker's build system has a client and server:
- Client: Buildx
- Server: BuildKit
We can configure Buildx to talk to multiple BuildKit instances. Each instance is called a *builder*, and builders can be the local machine, remote cloud instances, or Docker Build Cloud.
<figure>
<img src="https://hackmd.io/_uploads/By3_IpOzJx.png">
<figcaption>Docker Build Architecture.</figcaption>
</figure>
We can use the following command to see the builders we have configured on the system.
``` session
$ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS BUILDKIT PLATFORMS
default* docker
\_ default \_ default running v0.16.0 linux/arm64, linux/arm/v7, linux/arm/v6
```
``` session
$ docker buildx inspect default
Name: default
Driver: docker
Last Activity: 2024-11-18 13:38:51 +0000 UTC
Nodes:
Name: default
Endpoint: default
Status: running
BuildKit version: v0.16.0
Platforms: linux/arm64, linux/arm/v7, linux/arm/v6
Labels:
org.mobyproject.buildkit.worker.moby.host-gateway-ip: 172.17.0.1
```
#### Multi-architecture Builds
We can use the `docker build` command to build images for multiple architectures, including ones different from the local machine. We can use the `docker buildx create` command to create a new builder that uses the `docker-container` driver:
``` bash
# create builders
$ docker buildx create --driver=docker-container --name=container
# make it the default builder
$ docker buildx use container
# build images
$ docker buildx build --builder=container \
--platform=linux/amd64,linux/arm64 \
-t nigelpoulton/ddd-book:ch8.1 --load .
```
### Good Practice for Building Images
- **Leverage the Build Cache** : The cache is only available to other builds on the same system; however, your team can share the cache on Docker Build Cloud (see the sketch after this list).
- **Only Install Essential Packages** : Some package managers provide a way to only download and install essential packages instead of the entire internet, e.g. `apt` with the `--no-install-recommends` flag.
- **Clean Up**
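As a sketch of the first point, re-running the earlier build without changing anything should hit the cache for most steps, while `--no-cache` forces a clean rebuild (output trimmed and illustrative):
``` session
$ docker build -t test:latest .
...
 => CACHED [3/6] RUN apk add --update nodejs npm curl
 => CACHED [5/6] RUN npm install
...
$ docker build --no-cache -t test:latest .
```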
## Multi-Container Applications with Docker Compose
### Overview
Modern cloud-native applications combine lots of small services into what we call *microservices applications*. We can use the `docker compose` command to deploy and manage such an application, defined in a Compose file in YAML format.
::: info
There is also a [Compose Specification](https://www.compose-spec.io/) driving Compose as an open standard for defining multi-container microservices applications. The specification is community-led and kept separate from the Docker implementation to maintain better governance and clearer demarcation.
:::
### The Sample Application
The directory `ddd-book/multi-container` is the **build context** and contains all the application code and configuration files needed to deploy and manage the application:
``` bash
$ git clone https://github.com/nigelpoulton/ddd-book.git
$ cd ddd-book/multi-container
$ ls -l
total 20
drwxrwxr-x 4 ubuntu ubuntu 4096 May 21 15:53 app
-rw-rw-r-- 1 ubuntu ubuntu 288 May 21 15:53 Dockerfile
-rw-rw-r-- 1 ubuntu ubuntu 18 May 21 15:53 requirements.txt
-rw-rw-r-- 1 ubuntu ubuntu 355 May 21 15:53 compose.yaml
-rw-rw-r-- 1 ubuntu ubuntu 332 May 21 15:53 README.md
$ docker compose up --detach
```
- The `app` folder contains the application code, views, and templates.
- The `Dockerfile` describes how to build the image for the web-fe service.
- The `requirements.txt` file lists the application dependencies.
- The `compose.yaml` file is the Compose file that describes how the app works.
The application consists of two services (`web-fe` and `redis`), a network (`counter-net`), and a volume (`counter-vol`).
<figure>
<img src="https://hackmd.io/_uploads/rJ5FvA_fyl.png">
<figcaption>The sample application.</figcaption>
</figure>
Docker Compose uses YAML Compose files to define microservices applications. By convention, the file is named `compose.yaml` or `compose.yml`:
``` yaml
networks:
counter-net:
volumes:
counter-vol:
services:
web-fe:
build: .
deploy:
replicas: 1
command: python app.py
ports:
- target: 8080
published: 5001
networks:
- counter-net
volumes:
- type: volume
source: counter-vol
target: /app
redis:
image: redis:alpine
deploy:
replicas: 1
networks:
counter-net:
```
::: info
Connecting both services to the `counter-net` network means they can resolve each
other by name and communicate. This is important, as the following extract from the
app.py file shows the web app communicating with the redis service by name:
``` python
import time
import redis
from flask import Flask, render_template
app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)
```
:::
### Manage Applications with Compose
#### Shutdown the Containers
Run the following command to **shut down** the application. Docker removes both the containers and the networks. However, the volume still exists, including the data stored in it:
``` bash
# shut down the application
$ docker compose down
# show the volumes
$ docker volume ls
```
#### Check the State of Containers
To check the current state and the network ports of the application, just use the `docker compose ps` command; Moreover, run the `docker compose top` command to list the processes inside each container:
``` session
# show the state of containers
$ docker compose ps
NAME COMMAND SERVICE STATUS PORTS
multi-container-redis-1 "docker-entrypoint.." redis Up 33 sec 6379/tcp
multi-container-web-fe-1 "python app/app.py" web-fe Up 33 sec 0.0.0.0:5001->8080
# show the processes inside each container
$ docker compose top
multi-container-redis-1
UID PID PPID ... CMD
lxd 12023 11980 redis-server *:6379
multi-container-web-fe-1
UID PID PPID ... CMD
root 12024 12002 0 python app/app.py python app.py
root 12085 12024 0 /usr/local/bin/python app/app.py python app.py
```
::: info
The `PID` numbers returned are the `PID` numbers as seen from the Docker Host instead of the containers.
:::
#### Stop and Restart the Application
Both the `docker compose down` command and the `docker compose stop` command halt Docker Compose applications, but in different ways:
- `docker compose down` stops all containers and deletes associated networks, volumes (when using `-v`), and optionally images (when using `--rmi`). This fully removes the application and its resources.
- `docker compose stop` only stops the containers, leaving networks, volumes, and the containers themselves intact. The application can be quickly restarted later using `docker compose start`.
``` bash
$ docker compose stop
[+] Running 2/2
- Container multi-container-redis-1 Stopped 0.4s
- Container multi-container-web-fe-1 Stopped 0.5s
$ docker compose restart
[+] Running 2/2
- Container multi-container-redis-1 Started 0.4s
- Container multi-container-web-fe-1 Started 0.5s
```
#### Clean Up
Run the following command to stop and delete the application. The `--volumes` flag deletes all of the app's volumes, and the `--rmi all` flag deletes all of its images.
``` bash
$ docker compose down --volumes --rmi all
- Container multi-container-web-fe-1 Removed 0.2s
- Container multi-container-redis-1 Removed 0.1s
- Volume multi-container_counter-vol Removed 0.0s
- Image multi-container-web-fe:latest Removed 0.1s
- Image redis:alpine Removed 0.1s
- Network multi-container_counter-net Removed 0.1s
```
## Docker Swarm
### Overview
**Docker Swarm** is not only an enterprise-grade *cluster of Docker nodes*, but also an *orchestrator of microservices applications*.
- **Cluster** : Docker Swarm groups one or more Docker nodes into a cluster. We get an encrypted distributed cluster store, encrypted networks, mutual TLS, secure cluster join tokens, and a PKI. We can even add and remove nodes non-disruptively. We call these clusters "swarms".
- **Orchestration** : Docker Swarm makes deploying and managing complex microservices applications easy. We can define applications declaratively in Compose files and use simple Docker commands to deploy them to the "swarm". We can even perform rolling updates, rollbacks, and scaling operations.
### Basic Concepts
<figure>
<img src="https://hackmd.io/_uploads/S13-vbtGyg.png">
<figcaption>High-level swarm.</figcaption>
</figure>
A *swarm* is one or more Docker nodes that can be physical servers, VMs, cloud instances or even Raspberry Pi. The only requirement is that they all run Docker and can communicate over reliable networks.
Every node in a swarm is either a *manager* or a *worker*:
- **Managers** run the control plane services that maintain the state of the cluster and schedule user applications to workers.
- **Workers** run user applications.
Swarm uses TLS to encrypt communications, authenticate nodes, and authorize roles (managers and workers). It also configures and performs automatic key rotation.
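A minimal sketch of inspecting and rotating the swarm's certificates (run these on a manager):
``` bash
# Print the swarm's current CA certificate
$ docker swarm ca
# Rotate the CA certificate and reissue node TLS certificates
$ docker swarm ca --rotate
# Change how long node certificates remain valid before automatic rotation
$ docker swarm update --cert-expiry 48h
```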
### Build Secure Swarm Cluster
#### Create VMs
Run the following commands to create 5 VMs running Docker. Name three of the nodes `mgr1`, `mgr2`, and `mgr3`, and the other two `wrk1` and `wrk2`:
``` bash
$ multipass launch docker --name mgr1
Launched: mgr1
$ multipass launch docker --name mgr2
Launched: mgr2
$ multipass launch docker --name mgr3
Launched: mgr3
$ multipass launch docker --name wrk1
Launched: wrk1
$ multipass launch docker --name wrk2
Launched: wrk2
$ multipass ls
Name State IPv4 Image
mgr1 Running 192.168.64.61 Ubuntu 22.04 LTS
172.17.0.1
mgr2 Running 192.168.64.62 Ubuntu 22.04 LTS
172.17.0.1
mgr3 Running 192.168.64.63 Ubuntu 22.04 LTS
172.17.0.1
wrk1 Running 192.168.64.64 Ubuntu 22.04 LTS
172.17.0.1
wrk2 Running 192.168.64.65 Ubuntu 22.04 LTS
172.17.0.1
```
#### Initialize a New Swarm
Before a Docker node joins a swarm, it runs in *single-engine mode* and can only run regular containers. After joining a swarm, it switches into *swarm mode* and can run advanced containers called *swarm services*. To initialize a swarm:
1. Initialize the first manager : Initialize a new swarm from `mgr1`.
2. Join workers : Join `wrk1` and `wrk2` as worker nodes.
3. Join additional managers : Join `mgr2` and `mgr3` as additional managers.
**Step 01**. Log on to `mgr1` and initialize a new swarm.
``` session
$ docker swarm init \
--advertise-addr 192.168.64.61:2377 \
--listen-addr 192.168.64.61:2377
Swarm initialized: current node (d21lyz...c79qzkx) is now a manager.
```
- The `--advertise-addr` flag is optional and tells Docker which of the node's
IP addresses to advertise as the swarm API endpoint. It’s usually one of the
node's IP addresses but can also be an external load balancer.
- The `--listen-addr` flag tells Docker which of the node's interfaces to accept
swarm traffic on. It defaults to the same value as `--advertise-addr` if you
don't specify it. However, if `--advertise-addr` is a load balancer, you must
use `--listen-addr` to specify a local IP.
**Step 02**. List the nodes in the swarm. See the tokens needed to add new workers and managers.
``` session
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
d21...qzkx * mgr1 Ready Active Leader
$ docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-0uahebax...c87tu8dx2c 192.168.64.61:2377
$ docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join --token SWMTKN-1-0uahebax...ue4hv6ps3p 192.168.64.61:2377
```
**Step 03**. Log on to `wrk1` and `wrk2`, then join them as worker nodes.
``` session
$ docker swarm join \
--token SWMTKN-1-0uahebax...c87tu8dx2c \
192.168.64.61:2377 \
--advertise-addr 192.168.64.64:2377 \
--listen-addr 192.168.64.64:2377
$ docker swarm join \
--token SWMTKN-1-0uahebax...c87tu8dx2c \
192.168.64.61:2377 \
--advertise-addr 192.168.64.65:2377 \
--listen-addr 192.168.64.65:2377
```
**Step 04**. Log on to `mgr2` and `mgr3`, then join them as managers.
``` session
$ docker swarm join \
--token SWMTKN-1-0uahebax...ue4hv6ps3p \
192.168.64.61:2377 \
--advertise-addr 192.168.64.62:2377 \
--listen-addr 192.168.64.62:2377
$ docker swarm join \
--token SWMTKN-1-0uahebax...ue4hv6ps3p \
192.168.64.61:2377 \
--advertise-addr 192.168.64.63:2377 \
--listen-addr 192.168.64.63:2377
```
**Step 05**. List the nodes in the swarm.
``` session
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
0g4rl...babl8 * mgr2 Ready Active Reachable 26.1.1
2xlti...l0nyp mgr3 Ready Active Reachable 26.1.1
d21ly...9qzkx mgr1 Ready Active Leader 26.1.1
8yv0b...wmr67 wrk1 Ready Active 26.1.1
e62gf...l5wt6 wrk2 Ready Active 26.1.1
```
::: success
- You should keep the join tokens in a safe place, as they're all that's required to join other nodes to the swarm.
- Make sure the network ports are open between all nodes: `2377/tcp`, `7946/tcp,udp`, `4789/udp`
- The nodes with nothing in the `MANAGER STATUS` column are worker nodes.
:::
### High Availability (HA) and Security
#### The Leader and The Followers
Swarm clusters are highly available (HA): one or more managers can fail and the swarm will keep running. Swarm implements **active/passive multi-manager HA**:
- There is one active manager (the *leader*) and the rest are passive managers (*followers*); the leader is the only manager that can update the swarm configuration.
- If the leader fails, one of the followers is elected as the new leader, and the swarm keeps running without any service interruption.
> *Leader* and *follower* is Raft terminology. Swarm implements the [The Raft Consensus Algorithm](https://raft.github.io/) to maintain a consistent cluster state across multiple highly-available managers.
<figure>
<img src="https://hackmd.io/_uploads/Bk-BaiFMyg.png">
<figcaption>The active/passive multi-manager HA in Docker Swarm.</figcaption>
</figure>
#### Good Practices for Manager HA
The following good practices apply when it comes to manager HA:
1. Always deploy an odd number of managers.
2. Don’t deploy too many managers (3 or 5 is usually enough), because more participants mean longer times to achieve consensus.
3. Spread managers across availability zones.
Consider the following situations:
- **Even Number of Managers** : A network incident creates a network partition with two managers on either side. **Split brain** occurs because neither side can be sure it has a majority, and the cluster goes into *read-only* mode.
- **Odd Number of Managers** : The swarm remains fully operational in *read-write* mode because the two managers on the right side of the network partition know they have a majority (quorum).
<figure>
<img src="https://hackmd.io/_uploads/S1qd0jtMJg.png">
<figcaption>Even number of managers / Odd number of managers</figcaption>
</figure>
#### Locking a Swarm
Swarm automatically configures a lot of security features, but restarting an old manager or restoring an old backup can potentially compromise the cluster. Hence, we should use Swarm's **autolock** feature to force restarted managers to present a key before being admitted back into the swarm:
``` bash
# autolock a new swarm at initialization time
$ docker swarm init --autolock=true
```
``` session
# autolock an existing swarm
$ docker swarm update --autolock=true
Swarm updated.
To unlock a swarm manager after it restarts, run the `docker swarm unlock` command and
provide the following key:
SWMKEY-1-XDeU3XC75Ku7rvGXixJ0V7evhDJGvIAvq0D8VuEAEaw
Please remember to store this key in a password manager...
# restart docker
$ sudo systemctl restart docker
$ docker node ls
Error response from daemon: Swarm is encrypted and needs to be unlocked before it can be used.
$ docker swarm unlock
Please enter unlock key: <enter your key>
```
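If the unlock key is lost or needs to be changed, any unlocked manager can print or rotate it:
``` bash
# Show the current unlock key
$ docker swarm unlock-key
# Rotate the unlock key (record the new key somewhere safe)
$ docker swarm unlock-key --rotate
```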
## Docker and WebAssembly (Wasm)
### Setup
#### Configure Docker Desktop for Wasm
Support for Wasm is a beta feature in Docker Desktop. To configure Docker Desktop for Wasm:
1. Open the Docker Desktop UI. And then click the Settings icon at the top right.
2. Make sure "Use containerd for pulling and storing images" is selected.
3. Click "Features in development" tab, select the "Enable Wasm" option.
4. Click "Apply & Restart" button.
#### Install Rust, Spin and Configure for Wasm
Once Rust is installed, run the following command to install the `wasm32-wasi` target so that Rust can compile to Wasm:
``` bash
$ rustup target add wasm32-wasi
```
Spin is a Wasm framework and runtime that makes building and running Wasm applications easy. Just search the web for "How to install fermyon spin" and follow the instructions for our system.
### Wasm Containers
**Wasm** is a new type of application that is smaller, faster, and more portable than traditional Linux containers. It's a new virtual machine architecture that programming languages compile to.
- Wasm applications are great for AI workloads, serverless functions, plugins, and edge devices, but not so good for complex networking or heavy I/O.
- Instead of compiling applications to Linux on ARM or Linux on AMD, we can compile them to Wasm. Then we can run these Wasm applications on any system with a Wasm runtime.
Docker Desktop already ships with several Wasm runtimes:
``` session
$ docker run --rm -i --privileged --pid=host jorgeprendes420/docker-desktop-shim-manager:latest
io.containerd.wasmtime.v1
io.containerd.spin.v2
io.containerd.wws.v1
io.containerd.lunatic.v1
io.containerd.slight.v1
io.containerd.wasmedge.v1
io.containerd.wasmer.v1
```
### Containerize Wasm Applications
#### Write Wasm Applications
We can use **spin** to create a simple web server:
``` session
$ spin new hello-world -t http-rust
Description: Wasm app
HTTP path: /hello
```
After editing the `src/lib.rs` file, we can run the `spin build` command to compile the application as a Wasm binary. The application can then run on any system with the Spin runtime.
``` session
$ spin build
Building component with `cargo build --target wasm32-wasi --release`
...
Finished building all Spin components
$ spin up
Logging component stdio to ".spin/logs/"
Serving http://127.0.0.1:3000
Available Routes:
hello-world: http://127.0.0.1:3000/hello
```
#### Build the Images and Run as Containers
As always, we need a Dockerfile that tells Docker how to package the application as an image. Just create a new file called `Dockerfile` in the current directory:
``` dockerfile
FROM scratch
COPY /target/wasm32-wasi/release/hello_world.wasm .
COPY spin.toml .
```
Then run the following commands to containerize the Wasm application:
``` bash
$ docker build \
--platform wasi/wasm \
--provenance=false \
-t nigelpoulton/ddd-book:wasm .
$ docker run -d --name wasm-ctr \
--runtime=io.containerd.spin.v2 \
--platform=wasi/wasm \
-p 5556:80 \
nigelpoulton/ddd-book:wasm /
```
## Docker Networks
### Overview
Docker networking is based on [libnetwork](https://github.com/moby/libnetwork), which is the reference implementation of an open-source architecture called the **Container Network Model (CNM)**:
- **The Container Network Model (CNM)** is the design specification and outlines the fundamental building blocks of a Docker Network.
- **Libnetwork** is a real-world implementation of the CNM. It's open-sourced as part of the Moby project and used by Docker.
- **Drivers** extend the model by implementing specific network topologies such as VXLAN overlay networks.
### Network Type: Single-host Bridge Network
#### The Bridge Network
The *single-host bridge network* is the simplest type of Docker Network. The bridge network only spans a single Docker Host, and it's an implementation of an `802.1d` bridge (layer 2 switch).
<figure>
<img src="https://hackmd.io/_uploads/S1sNNVcM1e.png">
<figcaption>The Complete stack with containers connecting to the bridge network.</figcaption>
</figure>
- When containers are connected to the bridge network, they can communicate by their IP addresses or container names but remain isolated from the external network unless specific port mappings are configured.
- Every new Docker Host gets a default single-host bridge network called `bridge` that Docker connects new containers to, unless we override it with the `--network` flag.
- On all Linux-based Docker hosts, the default `bridge` network maps to an underlying Linux bridge in the host's kernel called `docker0`.
``` session
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
c7464dce29ce bridge bridge local
c65ab18d0580 host host local
42a783df0fbe none null local
$ docker network inspect bridge | grep bridge.name
"com.docker.network.bridge.name": "docker0",
$ brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.0242aff9eb4f no
docker_gwbridge 8000.02427abba76b no
$ ip link show docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc...
link/ether 02:42:af:f9:eb:4f brd ff:ff:ff:ff:ff:ff
```
<figure>
<img src="https://hackmd.io/_uploads/HyNeQ4qGyx.png">
<figcaption>Mapping the default Docker bridge network to the docker0 bridge in the host's kernel</figcaption>
</figure>
::: info
- Docker creates single-host bridge networks with the built-in `bridge` driver. If you run Windows containers, you'll need to use the `nat` driver.
- When we create a new Docker bridge network, Docker also creates a new Linux bridge in the host's kernel behind the scenes.
:::
#### Create Bridge Network
Run the following command to create a new single-host bridge network:
``` session
$ docker network create -d bridge localnet
f918f1bb0602373bf949615d99cb2bbbef14ede935fbb2ff8e83c74f10e4b986
$ docker run -d --name c1 \
--network localnet \
alpine sleep 1d
$ docker network inspect localnet --format '{{json .Containers}}' | jq
{
"09c5f4926c87da12039b3b510a5950b3fe9db80e13431dc17d870450a45fd84a": {
"Name": "c1",
"EndpointID": "27770ac305773b352d716690fb9f8e05c1b71e10dc66f67b88e93cb923ab9749",
"MacAddress": "02:42:ac:15:00:02",
"IPv4Address": "172.21.0.2/16",
"IPv6Address": ""
}
}
$ brctl show
bridge name bridge id STP enabled interfaces
br-f918f1bb0602 8000.0242372a886b no veth833aaf9
docker0 8000.0242aff9eb4f no
docker_gwbridge 8000.02427abba76b no
```
<figure>
<img src="https://hackmd.io/_uploads/BkZqBNqfyl.png">
<figcaption>The bridge configuration on the host.</figcaption>
</figure>
<figure>
<img src="https://hackmd.io/_uploads/Ska78E9zJg.png">
<figcaption>Every veth is like a cable with an interface on either end. One end is connected to the Docker network, and the other end is connected to the associated bridge in the kernel</figcaption>
</figure>
If we add more containers to the `localnet` network, they'll all be able to communicate using names. That's because Docker automatically registers container names with an internal DNS service. The exception to this rule is the built-in `bridge` network that doesn't support DNS resolution.
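A quick check of this name resolution, assuming the `c1` container created above is still running on `localnet` (output trimmed):
``` session
$ docker run -d --name c2 --network localnet alpine sleep 1d
$ docker exec c2 ping -c 2 c1
PING c1 (172.21.0.2): 56 data bytes
64 bytes from 172.21.0.2: seq=0 ttl=64 time=0.123 ms
...
```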
#### Port Mappings for External Access
Containers on bridge networks can only communicate with other containers on the same network. By mapping container ports to ports on the Docker host, we can access the container from external networks.
``` session
$ docker run -d --name web \
--network localnet \
--publish 5005:80 \
nginx
$ docker port web
80/tcp -> 0.0.0.0:5005
80/tcp -> [::]:5005
```
### Network Type: The `macvlan` Network
#### The `macvlan` Driver
The built-in `macvlan` driver (`transparent` if using Windows containers) gives every container its own IP and MAC address on the external physical network. As it doesn't require port mappings or additional bridges, performance is good.
<figure>
<img src="https://hackmd.io/_uploads/rkdj8rczJe.png">
<figcaption>The <code>macvlan</code> driver makes containers visible on external networks.</figcaption>
</figure>
#### Connect Containers to `macvlan` Network
Assume we have a network with two VLANs, and we add a Docker host connected to the network. To attach a container to VLAN 100, we create a new Docker network with the `macvlan` driver and configure the *subnet info*, the *gateway*, the *range of IPs that can be assigned to containers*, and *which of the host's interfaces to use*:
``` session
$ docker network create -d macvlan \
--subnet=10.0.0.0/24 \
--ip-range=10.0.0.0/25 \
--gateway=10.0.0.1 \
-o parent=eth0.100 \
macvlan100
$ docker run -d --name mactainer1 \
--network macvlan100 \
alpine sleep 1d
```
<figure>
<img src="https://hackmd.io/_uploads/BksrjHcz1l.png">
<figcaption>Add the Docker Host connected to the network.</figcaption>
</figure>
<figure>
<img src="https://hackmd.io/_uploads/rkHnjBqGkx.png">
<figcaption>Attach container to VLAN 100.</figcaption>
</figure>
Note that the Docker `macvlan` driver also supports **VLAN trunking**, meaning we can create multiple `macvlan` networks that connect to different VLANs, as sketched below.
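A sketch of what trunking looks like in practice: a second `macvlan` network bound to a different VLAN sub-interface of the same parent NIC (the addressing below is illustrative):
``` session
$ docker network create -d macvlan \
--subnet=192.168.200.0/24 \
--gateway=192.168.200.1 \
-o parent=eth0.200 \
macvlan200
```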
### Network Type: Overlay Network
### Troubleshooting: The Daemon Logs and The Container Logs
#### Daemon Logs
- For Windows containers, we can view them in the Windows Event Viewer or `~\AppData\Local\Docker`.
- For Linux containers, it depends on which `init` system is being used. If `systemd` is used, Docker posts the logs to `journald` (check with the `journalctl -u docker.service` command).
We can also tell Docker how verbose daemon logging should be by editing the daemon config file at `/etc/docker/daemon.json`. Be sure to restart Docker after making any changes.
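A minimal sketch: reading the daemon logs on a `systemd` host, and a `daemon.json` that raises log verbosity (both keys are standard daemon options):
``` session
$ journalctl -u docker.service --since "1 hour ago"
$ cat /etc/docker/daemon.json
{
  "debug": true,
  "log-level": "debug"
}
```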
#### Container Logs
We can normally view container logs with the `docker logs` command. For Docker Swarm, we should use the `docker service logs` command.
- Docker supports a few different log drivers. The `json-file` and `journald` are the easiest to configure. Both of them work with `docker logs` and `docker service logs` commands.
- We can also start a container or a service with the `--log-driver` and `--log-opt` flags to override the settings in `daemon.json`, as shown below.
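For example, assuming the `nginx` container named `web` from the bridge-network section is still running:
``` bash
# Follow the container's logs
$ docker logs --follow web
# Override the log driver and options for a single container
$ docker run -d --name logtest --log-driver journald --log-opt tag=logtest alpine ping localhost
```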
### Service Discovery
The `libnetwork` framework also provides **service discovery** that allows all containers and Swarm services to locate each other by name. The only requirement is that the containers be on the same network.
<figure>
<img src="https://hackmd.io/_uploads/B1fikL5zkg.png">
<figcaption>Docker implements a native DNS server and configures every container to use it for name resolution.</figcaption>
</figure>
Assume the container `c1` pinging another container `c2` by name:
1. The `c1` container issues a `ping c2` command. The container's local DNS resolver checks its cache to see if it has an IP address for `c2`. Note that *All Docker containers have a local DNS resolver!*
2. The local resolver doesn't have an IP address for `c2`, so it initiates a recursive query to the embedded Docker DNS server. Note that *All Docker containers are pre-configured to know how to send queries to the embedded DNS server.*
3. The Docker DNS server maintains name-to-IP mappings for every container we create with the `--name` or `--net-alias` flags.
4. The DNS server returns the IP address of the `c2` container to the local resolver in the `c1` container.
5. The `c1` container sends the ping request (ICMP echo request) to the IP address of `c2`.
Besides, we can use the `--dns` flag to start containers and services with a customized list of DNS servers, and the `--dns-search` flag to add custom search domains for queries against unqualified names. Both are useful if the applications query names outside the Docker environment. These flags add entries to the container's `/etc/resolv.conf` file.
``` session
$ docker run -it --name custom-dns \
--dns=8.8.8.8 \
--dns-search=nigelpoulton.com \
alpine sh
```
### Ingress Load Balancing
Docker Swarm supports two ways of publishing services to external clients:
- **Ingress Mode** (default) : External clients can access services via any swarm node, even nodes not hosting a service replica.
- **Host Mode** : External clients can only access services via nodes running replicas.
<figure>
<img src="https://hackmd.io/_uploads/BJsQX85fkg.png">
<figcaption>Ingress Mode / Host Mode</figcaption>
</figure>
``` session
$ docker service create -d --name svc1 \
--publish published=5005,target=80,mode=host \
nginx
```
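The command above publishes `svc1` in *host mode* via `mode=host`. For comparison, a minimal sketch of publishing a service in the default *ingress mode* (service name and port are illustrative):
``` session
$ docker service create -d --name svc2 \
--publish published=5006,target=80 \
nginx
```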
## Docker Volumes
### The Local Storage Layer
Docker creates containers by stacking *read-only* image layers and placing a thin layer of local storage on the top. This allows multiple containers to share the same read-only image layers:
<figure>
<img src="https://hackmd.io/_uploads/B1nuQ3tGJx.png">
<figcaption>Ephemeral Container Storage</figcaption>
</figure>
The local storage layer (also called the *thin writeable layer*, *ephemeral storage*, *read-write storage*, or *graphdriver storage*) is coupled to the container's lifecycle: it's created when the container is created and deleted when the container is deleted. It's not a good place to persist data.
Docker keeps the local storage layer on the Docker Host's filesystem:
- Linux Containers: `/var/lib/docker/<storage-driver>/...`
- Windows Containers: `C:\ProgramData\Docker\windowsfilter\...`
::: info
We should treat containers as **immutable objects** and never change them once deployed. If we need to fix or change the configuration of a live container, we should create and test a new container with the changes and then replace the live container with the new one.
:::
### The Volumes
We can create volumes and mount them into containers. A mounted volume appears as a directory in the container's filesystem, and anything we write to that directory gets stored in the volume:
<figure>
<img src="https://hackmd.io/_uploads/Sy1TU3YGJe.png">
<figcaption>High-level view of Volumes and Containers</figcaption>
</figure>
The volumes are first-class objects in Docker. There's a `docker volume` sub-command and a **volume** resource in the API. Let's create a new volume by following commands:
``` session
$ docker volume create myvol
myvol
$ docker volume ls
DRIVER VOLUME NAME
local myvol
$ docker volume inspect myvol
[
{
"CreatedAt": "2024-05-15T12:23:14Z",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/myvol/_data",
"Name": "myvol",
"Options": null,
"Scope": "local"
}
]
$ docker volume prune --all
```
- By default, Docker creates new volumes with the built-in `local` driver. We can also use the `-d` or `--driver` flag to specify a different driver.
- The `Mountpoint` tells us where the volume exists in the Docker Host's filesystem.
- The `docker volume prune` command deletes all volumes not mounted into a container
or service replica.
We can also define volumes in Dockerfiles by using the `VOLUME` instruction with the format `VOLUME <container-mount-point>`. However, we can't specify a host directory when defining volumes in a Dockerfile.
### Use Volumes with Containers
We can use the `--mount` flag with the `docker run` command to tell Docker to mount a volume. If we specify a volume that already exists, Docker uses it; if the volume doesn't exist, Docker creates it.
``` session
$ docker run -it --name voltainer \
--mount source=bizvol,target=/vol \
alpine
$ docker volume rm myvol
```
Furthermore, integrating Docker with *external storage systems* lets us present shared storage to multiple nodes.
## Docker Security
## References
- [How libnetwork has been Designed](https://github.com/moby/moby/blob/master/libnetwork/docs/design.md)
<!-- Widgets: Likecoin -->
{%hackmd @Hsins/widget-license %}