---
title: "讀書筆記 | Docker Deep Dive (2024)"
description: Reading notes on Docker Deep Dive (2024 Edition).
image: https://i.imgur.com/dXfEMFE.png
tags: Docker, 讀書筆記
---
<style>
figcaption {
text-align: center;
font-size: 0.75em;
}
</style>
<p style="text-align: center">
<img src="https://hackmd.io/_uploads/ryjHGP_Myg.png" height=512/>
</p>
<!-- table of contents (TOC) -->
<details>
<summary>目錄</summary>
[TOC]
</details>
## Overview
- The defining feature of the container model is that every container shares the OS (kernel) of the host it's running on.
- The major technologies behind modern containers include: *kernel namespaces*, *control groups (cgroups)*, capabilities and more.
- **Kubernetes** is the industry standard platform for deploying and managing containerized applications. Older versions of k8s used *Docker* to start and stop containers. However, newer versions use *containerd*.
## Docker and The Container-Related Standards and Projects
### Docker
> The word *Docker* is a British expression meaning *dock worker*, referring to a person who loads and unloads cargo from ships.
There are two major parts to the Docker platform:
- The Docker CLI (client) : just the familiar docker command-line tool for deploying and managing containers. It converts simple commands into API requests and sends them to the engine.
- The Docker Engine (server) : comprises all the server-side components that run and manage containers.
<figure>
<img src="https://hackmd.io/_uploads/Hyv0Sw_Myg.png">
<figcaption>Docker CLI and Daemon</figcaption>
</figure>
### Container-Related Standards and Projects
There are several standards and governance bodies influencing the development of containers and their ecosystem. For example:
- The Open Container Initiative (OCI) : maintains three standards: the *image-spec*, the *runtime-spec*, and the *distribution-spec*.
- The Cloud Native Computing Foundation (CNCF) : hosts important projects such as Kubernetes, containerd, Notary, Prometheus, Cilium, and more.
- The Moby Project : created by Docker as a community-led place for developers to build specialized tools for building container platforms.
## Getting Started: The Ops Perspective and The Dev Perspective
### The Ops Perspective
#### Check Docker is Working
A typical Docker installation installs the *client* and the *engine* on the same machine, and configures them to talk to each other. Run the `docker version` command to ensure both are installed and running:
``` shell
$ docker version
Client: Docker Engine - Community
Version: 27.3.1
API version: 1.47
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:41:19 2024
OS/Arch: linux/arm64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.3.1
API version: 1.47 (minimum version 1.24)
Go version: go1.22.7
Git commit: 41ca978
Built: Fri Sep 20 11:41:19 2024
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.7.23
GitCommit: 57f17b0a6295a39009d861b89e3b3b87b005ca27
runc:
Version: 1.1.14
GitCommit: v1.1.14-0-g2c9f560
docker-init:
Version: 0.19.0
GitCommit: de40ad0
```
#### Download Images and Start Containers
**Images** are objects that contain everything an application needs to run, including an OS filesystem, the application, and all dependencies. They're similar to *VM templates* or *classes* in development.
``` session
$ docker pull ubuntu:latest
latest: Pulling from library/ubuntu
b91d8878f844: Download complete
Digest: sha256:e9569c25505f33ff72e88b2990887c9dcf230f23259da296eb814fc2b41af999
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu latest e9569c25505f 10 days ago 106MB
```
Use the `docker run` command to start a new container and attach the shell to the container's terminal. We can also use the `docker attach` command to attach the shell to the container's main process and execute commands inside the container.
``` session
$ docker run --name test -it ubuntu:latest bash
root@bbd2e5ad1817:/#
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
bbd2e5ad1817 ubuntu:latest "/bin/bash" 7 mins Up 7 min test
$ docker attach test
root@bbd2e5ad1817:/#
```
#### Delete the Container
``` session
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
bbd2e5ad1817 ubuntu:latest "/bin/bash" 9 mins Up 9 min test
$ docker stop test
test
$ docker rm test
test
```
### The Dev Perspective
#### Build Docker Images And Run as Containers
The *Dockerfile* is a plain-text document that tells Docker how to build the application and dependencies into an image. For example:
``` dockerfile
FROM alpine
LABEL maintainer="nigelpoulton@hotmail.com"
RUN apk add --update nodejs npm curl
COPY . /src
WORKDIR /src
RUN npm install
EXPOSE 8080
ENTRYPOINT ["node", "./app.js"]
```
We can run the `docker build` command to create a new image based on the instructions in the Dockerfile. In jargon, this is called *containerizing the application*.
``` session
$ git clone https://github.com/nigelpoulton/psweb.git && cd psweb
$ docker build -t test:latest .
[+] Building 36.2s (11/11) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => naming to docker.io/library/test:latest 0.0s
=> => unpacking to docker.io/library/test:latest 0.7s
$ docker images
REPO TAG IMAGE ID CREATED SIZE
test latest 0435f2738cf6 21 seconds ago 160MB
$ docker run -d --name web1 --publish 8080:8080 test:latest
```
#### Clean Up
``` session
$ docker rm web1 -f
web1
$ docker rmi test:latest
Untagged: test:latest
Deleted: sha256:0435f27...cac8e2b
```
## The Docker Engine
### Two Major Components: The Docker Daemon and LXC
**Docker Engine** is jargon for the server-side components of Docker that run and manage containers. It's similar to ESXi in VMware.
The Docker Engine had two major components:
- *The Docker Daemon* was a monolithic binary containing all the code for the API, image builders, container execution, volumes, networking.
- *LXC* did the hard work of interfacing with the Linux Kernel and constructing the required namespaces and cgroups to build and start containers.
Docker later replaced LXC with its own tool called *libcontainer*, and has since broken the monolithic daemon apart into smaller, specialized components.
<figure>
<img src="https://hackmd.io/_uploads/BJ9YAvOGkx.png">
<figcaption>Docker Engine Components and Responsibilities</figcaption>
</figure>
### `runc (r)` and `containerd (c)`
Docker and Kubernetes both use `runc` as their default low-level runtime, and both pair it with the `containerd` high-level runtime:
- `containerd` operates as the high-level runtime managing lifecycle events
- `runc` operates as the low-level runtime executing lifecycle events by interfacing
with the kernel to do the work of actually building containers and deleting them
Most of the time, containerd is paired with runc as its low-level runtime. However, it uses *shims* that make it possible to replace `runc` with other low-level runtimes.
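As a quick check, the `docker info` command reports which low-level runtime the local engine defaults to (a minimal sketch; the output shown is what a standard install typically reports):
``` session
$ docker info --format '{{.DefaultRuntime}}'
runc
```
An alternative runtime can be selected per container with the `--runtime` flag, as the Wasm chapter does later.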
### The Process of `docker run`
<figure>
<img src="https://hackmd.io/_uploads/r1HQeduMyg.png">
<figcaption>The process of <code>docker run</code></figcaption>
</figure>
When we run commands like `docker run`, the Docker client converts them into API requests and sends them to the API exposed by the Docker daemon.
- The daemon can expose the API on a local socket or over the network. By default, the local socket is `/var/run/docker.sock` on Linux or `\\.\pipe\docker_engine` on Windows (see the example after this list).
- The daemon communicates with *containerd* via a CRUD-style API over gRPC.
- *containerd* converts the required Docker image into an OCI bundle and tells *runc* to use this to create a new container.
- The container starts as a child process of *runc*, and as soon as the container starts, *runc* exits.
Sometimes, we call this **daemonless containers**.
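Because the daemon simply exposes an HTTP API on that socket, we can query it directly. A minimal sketch, assuming `curl` with Unix-socket support is installed (output trimmed and illustrative):
``` session
$ curl --unix-socket /var/run/docker.sock http://localhost/version
{"Platform":{"Name":"Docker Engine - Community"},"Version":"27.3.1", ...}
```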
### The `shim` Component
The Docker Engine uses **shims** in between *containerd* and the *OCI* layer. It brings the following benefits:
- Daemonless Containers.
- Improves Efficiency.
- Makes the OCI Layer Pluggable.
The *containerd* forks a *shim* and a *runc* process for every new container. Each *runc* process exits as soon as the container starts running, leaving the *shim* process as the container's parent process.
## The Docker Images
### Docker Images and Registries
An **image** is a *read-only* package containing everything we need to run an application. This means they include application code, dependencies, a minimal set of OS constructs, and metadata.
The image registries contain one or more *image repositories*, and image repositories contain one or more images:
- The *Local Repository* is jargon for an area on the local machine where Docker stores images for more convenient access. It's sometimes called the *image cache*.
- People store images in centralized places called *registries*. Most modern registries implement the OCI distribution-spec.
Most of the popular applications and operating systems have *official repositories* on Docker Hub. These live at the top level of the Docker Hub namespace.
### Image Naming and Tagging
A fully qualified image name includes the registry name, user or organization name, repository name, and tag. (Docker automatically populates the registry and tag values if we don't specify them.)
<figure>
<img src="https://hackmd.io/_uploads/SJK48d_Gke.png">
<figcaption>Fully qualified image name.</figcaption>
</figure>
### Images and Layers
Images are made by stacking independent layers and representing them as a single
unified object. Note that ==images are *build-time* constructs, whereas containers are *run-time* constructs==.
<figure>
<img src="https://hackmd.io/_uploads/ryWVPddzyg.png">
<figcaption>Docker Images and Stacked Layers.</figcaption>
</figure>
#### Inspect Layers Information
We can use the `docker inspect` or `docker history` command to inspect layer information. Also, when we pull the images, each line ending with `Pull complete` represents a layer.
``` session
$ docker inspect node:latest
[
{
"Id": "sha256:09d13f1ec5ed523d89cbb8976c48a79d3efc033c9c66ec53915ceb569e4406b5",
"RepoTags": [
"node:latest"
],
...
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:ec8ae7dad7aba50e0f8bff1dc969d34d3584fb7ada6ce9948dad83e95939b5cc",
"sha256:de0d18f93508670ab2b3cc68c87ea02006b72b988a939ba6a0e0dd71cbfbd329",
"sha256:29842e18ccdd85692bd8ec615cb35cecba3fb4021234e67d76244bae975ae6db",
"sha256:c5d4093056babc39221f883cf48609f24ea97cb29312fc40e0fbe350ca0a56b7",
"sha256:e27a35349faa7ce69e85e350be2d6328bf69ecc236175a08d28e4ff7e1719aa0",
"sha256:49f14352d048e099a09224e57fff85022d5be5652a539beb46644c3d7228f7ce",
"sha256:9e2c4e8f8639dd6b716d4d2666f26714d6fa4f4d604f19e0ea8d770005b3767b",
"sha256:34c7b32b88fe8d91638482ce1c8bd5a151d654d32df57d8c2e78ef45a118aafb"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]
```
> Note that `docker history` command shows the build history of an image and is not a strict list of layers in the final image. For example, some Dockerfile instructions (`ENV`, `EXPOSE`, `CMD`, and `ENTRYPOINT`) only add metadata and don’t create layers.
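For example, a trimmed and illustrative `docker history` output for the `node:latest` image pulled above; note the metadata-only instructions reporting `0B`:
``` session
$ docker history node:latest
IMAGE          CREATED       CREATED BY                            SIZE
09d13f1ec5ed   2 weeks ago   CMD ["node"]                          0B
<missing>      2 weeks ago   ENTRYPOINT ["docker-entrypoint.sh"]   0B
<missing>      2 weeks ago   COPY docker-entrypoint.sh ...         388B
<missing>      2 weeks ago   RUN /bin/sh -c ...                    ...
...
```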
### More about Images and Layers
Under the hood, Docker uses **storage drivers** to stack layers and present them as a unified filesystem and image. Almost all Docker setups use the *overlay2* driver, but *zfs*, *btrfs* and *vfs* are alternative options.
<figure>
<img src="https://hackmd.io/_uploads/rkJuYddzkl.png">
<figcaption>Stacking Layers.</figcaption>
</figure>
- All Docker images start with a **base layer**, and every time we add new content, Docker adds a new layer.
- A file in a higher layer obscures the file directly below it. We can update a file in an image by adding a new layer.
- Images can share layers, leading to efficiencies in space and
performance.
- Images and layers have their own **digests**. And all changes to layers or image manifests result in new hashes, giving us an
easy and reliable way to know if changes have been made.
- Docker compares hashes before and after every push and pull to ensure no tampering
has occurred.
- Docker supports **multi-architecture images** for different platforms and architectures, such as Windows and Linux on variations of ARM, x64, PowerPC, s390x, and more. Using a command like `docker buildx imagetools inspect alpine:latest`, we can see the different architectures supported behind a given image tag, as shown below.
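A trimmed, illustrative example of what that inspection looks like (digests elided):
``` session
$ docker buildx imagetools inspect alpine:latest
Name:      docker.io/library/alpine:latest
MediaType: application/vnd.oci.image.index.v1+json
Digest:    sha256:...
Manifests:
  Name:        docker.io/library/alpine:latest@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/amd64
  Name:        docker.io/library/alpine:latest@sha256:...
  MediaType:   application/vnd.oci.image.manifest.v1+json
  Platform:    linux/arm64
  ...
```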
## The Docker Containers
> **Containers** are run-time instances of images. They're designed to be stateless, ephemeral, and immutable; containers should only run a single process, and we use them to build microservices applications.
### Containers and Virtual Machines
**The VM model** virtualizes hardware. When the hypervisor boots, it claims all hardware resources such as CPU, RAM, storage, and network adapters. Once we have a VM, we install an OS and then an application.
**The container model** virtualizes operating systems. To deploy an application, we ask Docker to create a container by carving OS resources into virtual versions.
<figure>
<img src="https://hackmd.io/_uploads/SkOflFuMyx.png">
<figcaption>VM Model vs Container Model.</figcaption>
</figure>
### How Containers Start Applications
There are three ways you can tell Docker how to start an app in a container:
1. An `ENTRYPOINT` instruction in the image.
2. A `CMD` instruction in the image.
3. A CLI argument.
The `ENTRYPOINT` and `CMD` instructions are optional image metadata that store the command Docker uses to start the default app.
::: warning
- The `ENTRYPOINT` instruction cannot be overridden on the CLI, and anything you pass in via the CLI will be appended to the `ENTRYPOINT` instruction as an argument.
- The `CMD` instruction can be overridden by CLI arguments.
:::
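A minimal sketch of the difference, assuming a hypothetical image named `pinger` whose Dockerfile ends with `ENTRYPOINT ["ping"]` and `CMD ["localhost"]`:
``` bash
# No CLI args: Docker appends CMD to ENTRYPOINT, so the container runs `ping localhost`
$ docker run pinger
# CLI args replace CMD and are appended to ENTRYPOINT, so the container runs `ping 8.8.8.8`
$ docker run pinger 8.8.8.8
```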
### The Restart Policies
Container **restart policies** are a simple form of self-healing that allows the local Docker Engine to automatically restart failed containers. Docker supports the following four policies (see the example after the table):
| Restart Policy | non-zero exit code | Zero exit code | docker stop | daemon restarts |
| :--: | :--: | :--: | :--: | :--: |
| `no` | N | N | N | N |
| `on-failure` | Y | N | N | Y |
| `always` | Y | Y | N | Y |
| `unless-stopped` | Y | Y | N | N |
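A minimal sketch of applying a policy at run time and confirming a restart happened (container name and timing are illustrative):
``` bash
# The main process exits cleanly after 5 seconds; the `always` policy restarts it anyway
$ docker run -d --name always-demo --restart always alpine sh -c "sleep 5"
# A little later, the restart counter shows Docker has restarted the container
$ docker inspect --format '{{.RestartCount}}' always-demo
1
$ docker rm -f always-demo
```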
### Containerize Applications
<figure>
<img src="https://hackmd.io/_uploads/Sy41cYdfyg.png">
<figcaption>Steps of containerizing applications.</figcaption>
</figure>
The basic flow of containerizing applications is shown below:
1. Write the applications and create the list of dependencies.
2. Create a Dockerfile that tells Docker how to build and run the app.
3. Build the app into an image.
4. Push the image to a registry (optional).
5. Run a container from the image.
#### Containerize Single-Container Applications
Let's start with a simple Node.js web application that serves a web page on port `8080`. Newer versions of Docker support the `docker init` command that analyses applications and automatically creates a `Dockerfile` implementing good practices:
``` session
$ git clone https://github.com/nigelpoulton/ddd-book.git
$ cd ddd-book/node-app
$ docker init
```
The process created a new `Dockerfile` and placed it in the current directory:
``` dockerfile
# syntax=docker/dockerfile:1
ARG NODE_VERSION=20.8.0
FROM node:${NODE_VERSION}-alpine
# Use production node environment by default
ENV NODE_ENV production
WORKDIR /usr/src/app
# Download dependencies as a separate step to take advantage of Docker's caching.
# Leverage a cache mount to /root/.npm to speed up subsequent builds.
# Leverage a bind mounts to package.json and package-lock.json to avoid having to copy them into this layer.
RUN --mount=type=bind,source=package.json,target=package.json \
--mount=type=bind,source=package-lock.json,target=package-lock.json \
--mount=type=cache,target=/root/.npm \
npm ci --omit=dev
# Run the application as a non-root user.
USER node
# Copy the rest of the source files into the image.
COPY . .
# Expose the port that the application listens on.
EXPOSE 8080
# Run the application.
CMD node app.js
```
Then use the `docker build` command to build the application into a container image. Don't forget the trailing period `.` as this tells Docker to *use the current working directory as the* **build context**.
``` session
$ docker build -t ddd-book:ch8.node .
$ docker inspect ddd-book:ch8.node
$ docker run -d --name c1 -p 5005:8080 ddd-book:ch8.node
```
<figure>
<img src="https://hackmd.io/_uploads/HJTcMnOfye.png">
<figcaption>There are 7 layers even though only 4 Dockerfile instructions create layers; some of the layers come from the base image.</figcaption>
</figure>
In the Dockerfile, all non-comment lines are called **instructions** or **steps** and take the format `<INSTRUCTION> <arguments>`. Some instructions (e.g. `FROM`, `RUN`, `COPY` and `WORKDIR`) create new layers, whereas others (e.g. `EXPOSE`, `ENV`, `CMD` and `ENTRYPOINT`) add metadata.
<figure>
<img src="https://hackmd.io/_uploads/HyjQx6_zyx.png">
<figcaption>Maps the Dockerfile instructions to image layers.</figcaption>
</figure>
#### Multi-stage Builds for Production (1)
The container images should only contain the stuff needed to run the applications in production.
``` session
$ git clone https://github.com/nigelpoulton/ddd-book.git
$ cd ddd-book/multi-stage
$ docker build -t multi:full .
```
That's why we need **multi-stage builds**. Multi-stage builds *use a single Dockerfile with multiple `FROM` instructions*, and each `FROM` instruction represents a new **build stage**.
``` dockerfile
# Stage 0 (base) : builds an image with compilation tools
FROM golang:1.22.1-alpine AS base
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Stage 1 (build-client) : compiles the client executable
FROM base AS build-client
RUN go build -o /bin/client ./cmd/client
# Stage 2 (build-server) : compiles the server executable
FROM base AS build-server
RUN go build -o /bin/server ./cmd/server
# Stage 3 (prod) : copies the client and server executables into a slim image
FROM scratch AS prod
COPY --from=build-client /bin/client /bin/
COPY --from=build-server /bin/server /bin/
ENTRYPOINT [ "/bin/server" ]
```
There are 4 `FROM` instructions, and each of these is a distinct **build stage**. Docker numbers them starting from `0`. With multi-stage builds, each stage outputs an intermediate image that later stages can use. However, Docker deletes them when the final stage completes.
#### Multi-stage Builds for Production (2)
We can also build multiple images from a single Dockerfile. Docker makes it easy to create a separate image for each by splitting the final `prod` stage into two stages as follows:
``` dockerfile
# Stage 0 (base) : builds an image with compilation tools
FROM golang:1.22.1-alpine AS base
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Stage 1 (build-client) : compiles the client executable
FROM base AS build-client
RUN go build -o /bin/client ./cmd/client
# Stage 2 (build-server) : compiles the server executable
FROM base AS build-server
RUN go build -o /bin/server ./cmd/server
# Stage (prod-client)
FROM scratch AS prod-client
COPY --from=build-client /bin/client /bin/
ENTRYPOINT [ "/bin/client" ]
# Stage (prod-server)
FROM scratch AS prod-server
COPY --from=build-server /bin/server /bin/
ENTRYPOINT [ "/bin/server" ]
```
With a Dockerfile like this, we can run the `docker build` command with the `--target` flag to tell Docker which of the two final stages to build.
``` session
$ docker build -t multi:client --target prod-client -f Dockerfile .
$ docker build -t multi:server --target prod-server -f Dockerfile .
```
### Docker Build System
#### Buildx, BuildKit, Drivers and Build Cloud
Behind the scenes, Docker's build system has a client and server:
- Client: Buildx
- Server: BuildKit
We can configure Buildx to talk to multiple BuildKit instances. Each instance is called a *builder*, and builders can be the local machine, remote cloud instances, or Docker Build Cloud.
<figure>
<img src="https://hackmd.io/_uploads/By3_IpOzJx.png">
<figcaption>Docker Build Architecture.</figcaption>
</figure>
We can use the following command to see the builders we have configured on the system.
``` session
$ docker buildx ls
NAME/NODE DRIVER/ENDPOINT STATUS BUILDKIT PLATFORMS
default* docker
\_ default \_ default running v0.16.0 linux/arm64, linux/arm/v7, linux/arm/v6
```
``` session
$ docker buildx inspect default
Name: default
Driver: docker
Last Activity: 2024-11-18 13:38:51 +0000 UTC
Nodes:
Name: default
Endpoint: default
Status: running
BuildKit version: v0.16.0
Platforms: linux/arm64, linux/arm/v7, linux/arm/v6
Labels:
org.mobyproject.buildkit.worker.moby.host-gateway-ip: 172.17.0.1
```
#### Multi-architecture Builds
We can use the `docker build` command to build images for multiple architectures, including ones different from the local machine. We can use the `docker buildx create` command to create a new builder that uses the `docker-container` driver:
``` bash
# create builders
$ docker buildx create --driver=docker-container --name=container
# make it the default builder
$ docker buildx use container
# build images
$ docker buildx build --builder=container \
--platform=linux/amd64,linux/arm64 \
-t nigelpoulton/ddd-book:ch8.1 --load .
```
### Good Practice for Building Images
- **Leverage the Build Cache** : The cache is only available to other builds on the same system; however, your team can share the cache on Docker Build Cloud (see the sketch after this list).
- **Only Install Essential Packages** : Some package managers provide a way to only download and install essential packages instead of the entire internet, e.g. `apt` with the `--no-install-recommends` flag.
- **Clean Up**
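As a sketch of the first point, re-running the earlier build without changing anything should hit the cache for most steps, while `--no-cache` forces a clean rebuild (output trimmed and illustrative):
``` session
$ docker build -t test:latest .
...
 => CACHED [3/6] RUN apk add --update nodejs npm curl
 => CACHED [5/6] RUN npm install
...
$ docker build --no-cache -t test:latest .
```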
## Multi-Container Applications with Docker Compose
### Overview
Modern cloud-native applications combine lots of small services into what we call *microservices applications*. We can use the `docker compose` command to deploy and manage such an application, defined in a Compose file in YAML format.
::: info
There is also a [Compose Specification](https://www.compose-spec.io/) driving Compose as an open standard for defining multi-container microservices applications. The specification is community-led and kept separate from the Docker implementation to maintain better governance and clearer demarcation.
:::
### The Sample Application
The directory `ddd-book/multi-container` is the **build context** and contains all the application code and configuration files needed to deploy and manage the application:
``` bash
$ git clone https://github.com/nigelpoulton/ddd-book.git
$ cd ddd-book/multi-container
$ ls -l
total 20
drwxrwxr-x 4 ubuntu ubuntu 4096 May 21 15:53 app
-rw-rw-r-- 1 ubuntu ubuntu 288 May 21 15:53 Dockerfile
-rw-rw-r-- 1 ubuntu ubuntu 18 May 21 15:53 requirements.txt
-rw-rw-r-- 1 ubuntu ubuntu 355 May 21 15:53 compose.yaml
-rw-rw-r-- 1 ubuntu ubuntu 332 May 21 15:53 README.md
$ docker compose up --detach
```
- The `app` folder contains the application code, views, and templates.
- The `Dockerfile` describes how to build the image for the web-fe service.
- The `requirements.txt` file lists the application dependencies.
- The `compose.yaml` file is the Compose file that describes how the app works.
The application consists of two services (`web-fe` and `redis`), a network (`counter-net`), and a volume (`counter-vol`).
<figure>
<img src="https://hackmd.io/_uploads/rJ5FvA_fyl.png">
<figcaption>The sample application.</figcaption>
</figure>
Docker Compose uses YAML Compose files to define microservices applications. By convention, the file is named `compose.yaml` or `compose.yml`:
``` yaml
networks:
counter-net:
volumes:
counter-vol:
services:
web-fe:
build: .
deploy:
replicas: 1
command: python app.py
ports:
- target: 8080
published: 5001
networks:
- counter-net
volumes:
- type: volume
source: counter-vol
target: /app
redis:
image: redis:alpine
deploy:
replicas: 1
networks:
counter-net:
```
::: info
Connecting both services to the `counter-net` network means they can resolve each
other by name and communicate. This is important, as the following extract from the
app.py file shows the web app communicating with the redis service by name:
``` python
import time
import redis
from flask import Flask, render_template
app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)
```
:::
### Manage Applications with Compose
#### Shutdown the Containers
Run the following command to **shut down** the application. Docker removes both the containers and the networks. However, the volume still exists, including the data stored in it:
``` bash
# shut down the application
$ docker compose down
# show the volumes
$ docker volume ls
```
#### Check the State of Containers
To check the current state and the network ports of the application, just use the `docker compose ps` command; Moreover, run the `docker compose top` command to list the processes inside each container:
``` session
# show the state of containers
$ docker compose ps
NAME COMMAND SERVICE STATUS PORTS
multi-container-redis-1 "docker-entrypoint.." redis Up 33 sec 6379/tcp
multi-container-web-fe-1 "python app/app.py" web-fe Up 33 sec 0.0.0.0:5001->8080
# show the processes inside each container
$ docker compose top
multi-container-redis-1
UID PID PPID ... CMD
lxd 12023 11980 redis-server *:6379
multi-container-web-fe-1
UID PID PPID ... CMD
root 12024 12002 0 python app/app.py python app.py
root 12085 12024 0 /usr/local/bin/python app/app.py python app.py
```
::: info
The `PID` numbers returned are the `PID` numbers as seen from the Docker Host instead of the containers.
:::
#### Stop and Restart the Application
Both the `docker compose down` command and the `docker compose stop` command halt Docker Compose applications, but in different ways:
- `docker compose down` stops all containers and deletes associated networks, volumes (when using `-v`), and optionally images (when using `--rmi`). This fully removes the application and its resources.
- `docker compose stop` only stops the containers, leaving networks, volumes, and the containers themselves intact. The application can be quickly restarted later using `docker compose start`.
``` bash
$ docker compose stop
[+] Running 2/2
- Container multi-container-redis-1 Stopped 0.4s
- Container multi-container-web-fe-1 Stopped 0.5s
$ docker compose restart
[+] Running 2/2
- Container multi-container-redis-1 Started 0.4s
- Container multi-container-web-fe-1 Started 0.5s
```
#### Clean Up
Run the following command to stop and delete the application. The `--volumes` flag deletes all of the app's volumes, and the `--rmi all` flag deletes all of its images.
``` bash
$ docker compose down --volumes --rmi all
- Container multi-container-web-fe-1 Removed 0.2s
- Container multi-container-redis-1 Removed 0.1s
- Volume multi-container_counter-vol Removed 0.0s
- Image multi-container-web-fe:latest Removed 0.1s
- Image redis:alpine Removed 0.1s
- Network multi-container_counter-net Removed 0.1s
```
## Docker Swarm
### Overview
**Docker Swarm** is not only an enterprise-grade *cluster of Docker nodes*, but also an *orchestrator of microservices applications*.
- **Cluster** : Docker Swarm groups one or more Docker nodes into a cluster. We get an encrypted distributed cluster store, encrypted networks, mutual TLS, secure cluster join tokens, and a PKI. We can even add and remove nodes non-disruptively. We call these clusters "swarms".
- **Orchestration** : Docker Swarm makes deploying and managing complex microservices applications easy. We can define applications declaratively in Compose files and use simple Docker commands to deploy them to the "swarm". We can even perform rolling updates, rollbacks, and scaling operations.
### Basic Concepts
<figure>
<img src="https://hackmd.io/_uploads/S13-vbtGyg.png">
<figcaption>High-level swarm.</figcaption>
</figure>
A *swarm* is one or more Docker nodes that can be physical servers, VMs, cloud instances or even Raspberry Pi. The only requirement is that they all run Docker and can communicate over reliable networks.
Every node in a swarm is either a *manager* or a *worker*:
- **Managers** run the control plane services that maintain the state of the cluster and schedule user applications to workers.
- **Workers** run user applications.
Swarm uses TLS to encrypt communications, authenticate nodes, and authorize roles (managers and workers). It also configures and performs automatic key rotation.
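A minimal sketch of inspecting and rotating the swarm's certificates (run these on a manager):
``` bash
# Print the swarm's current CA certificate
$ docker swarm ca
# Rotate the CA certificate and reissue node TLS certificates
$ docker swarm ca --rotate
# Change how long node certificates remain valid before automatic rotation
$ docker swarm update --cert-expiry 48h
```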
### Build Secure Swarm Cluster
#### Create VMs
Run the following commands to create 5 VMs running Docker. Name three of the nodes `mgr1`, `mgr2`, and `mgr3`, and the other two `wrk1` and `wrk2`:
``` bash
$ multipass launch docker --name mgr1
Launched: mgr1
$ multipass launch docker --name mgr2
Launched: mgr2
$ multipass launch docker --name mgr3
Launched: mgr3
$ multipass launch docker --name wrk1
Launched: wrk1
$ multipass launch docker --name wrk2
Launched: wrk2
$ multipass ls
Name State IPv4 Image
mgr1 Running 192.168.64.61 Ubuntu 22.04 LTS
172.17.0.1
mgr2 Running 192.168.64.62 Ubuntu 22.04 LTS
172.17.0.1
mgr3 Running 192.168.64.63 Ubuntu 22.04 LTS
172.17.0.1
wrk1 Running 192.168.64.64 Ubuntu 22.04 LTS
172.17.0.1
wrk2 Running 192.168.64.65 Ubuntu 22.04 LTS
172.17.0.1
```
#### Initialize a New Swarm
Before a Docker node joins a swarm, it runs in *single-engine mode* and can only run regular containers. After joining a swarm, it switches into *swarm mode* and can run advanced containers called *swarm services*. To initialize a swarm:
1. Initialize the first manager : Initialize a new swarm from `mgr1`.
2. Join workers : Join `wrk1` and `wrk2` as worker nodes.
3. Join additional managers : Join `mgr2` and `mgr3` as additional managers.
**Step 01**. Log on to `mgr1` and initialize a new swarm.
``` session
$ docker swarm init \
--advertise-addr 192.168.64.61:2377 \
--listen-addr 192.168.64.61:2377
Swarm initialized: current node (d21lyz...c79qzkx) is now a manager.
```
- The `--advertise-addr` flag is optional and tells Docker which of the node's
IP addresses to advertise as the swarm API endpoint. It’s usually one of the
node's IP addresses but can also be an external load balancer.
- The `--listen-addr` flag tells Docker which of the node's interfaces to accept
swarm traffic on. It defaults to the same value as `--advertise-addr` if you
don't specify it. However, if `--advertise-addr` is a load balancer, you must
use `--listen-addr` to specify a local IP.
**Step 02**. List the nodes in the swarm. See the tokens needed to add new workers and managers.
``` session
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
d21...qzkx * mgr1 Ready Active Leader
$ docker swarm join-token worker
To add a worker to this swarm, run the following command:
docker swarm join --token SWMTKN-1-0uahebax...c87tu8dx2c 192.168.64.61:2377
$ docker swarm join-token manager
To add a manager to this swarm, run the following command:
docker swarm join --token SWMTKN-1-0uahebax...ue4hv6ps3p 192.168.64.61:2377
```
**Step 03**. Log on to `wrk1` and `wrk2`, then join them as worker nodes.
``` session
$ docker swarm join \
--token SWMTKN-1-0uahebax...c87tu8dx2c \
192.168.64.61:2377 \
--advertise-addr 192.168.64.64:2377 \
--listen-addr 192.168.64.64:2377
$ docker swarm join \
--token SWMTKN-1-0uahebax...c87tu8dx2c \
192.168.64.61:2377 \
--advertise-addr 192.168.64.65:2377 \
--listen-addr 192.168.64.65:2377
```
**Step 04**. Log on to `mgr2` and `mgr3`, then join them as managers.
``` session
$ docker swarm join \
--token SWMTKN-1-0uahebax...ue4hv6ps3p \
192.168.64.61:2377 \
--advertise-addr 192.168.64.62:2377 \
--listen-addr 192.168.64.62:2377
$ docker swarm join \
--token SWMTKN-1-0uahebax...ue4hv6ps3p \
192.168.64.61:2377 \
--advertise-addr 192.168.64.63:2377 \
--listen-addr 192.168.64.63:2377
```
**Step 05**. List the nodes in the swarm.
``` session
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION
0g4rl...babl8 * mgr2 Ready Active Reachable 26.1.1
2xlti...l0nyp mgr3 Ready Active Reachable 26.1.1
d21ly...9qzkx mgr1 Ready Active Leader 26.1.1
8yv0b...wmr67 wrk1 Ready Active 26.1.1
e62gf...l5wt6 wrk2 Ready Active 26.1.1
```
::: success
- You should keep the join tokens in a safe place, as they're all that's required to join other nodes to the swarm.
- Make sure the network ports are open between all nodes: `2377/tcp`, `7946/tcp,udp`, `4789/udp`
- The nodes with nothing in the `MANAGER STATUS` column are worker nodes.
:::
### High Availability (HA) and Security
#### The Leader and The Followers
Swarm clusters are highly available (HA): one or more managers can fail and the swarm will keep running. Swarm implements **active/passive multi-manager HA**:
- There is one active manager (the *leader*) and the rest are passive managers (*followers*); the leader is the only manager that can update the swarm configuration.
- If the leader fails, one of the followers is elected as the new leader, and the swarm keeps running without any service interruption.
> *Leader* and *follower* is Raft terminology. Swarm implements the [The Raft Consensus Algorithm](https://raft.github.io/) to maintain a consistent cluster state across multiple highly-available managers.
<figure>
<img src="https://hackmd.io/_uploads/Bk-BaiFMyg.png">
<figcaption>The active/passive multi-manager HA in Docker Swarm.</figcaption>
</figure>
#### Good Practices for Manager HA
The following good practices apply when it comes to manager HA:
1. Always deploy an odd number of managers.
2. Don’t deploy too many managers (3 or 5 is usually enough), because more participants mean longer times to achieve consensus.
3. Spread managers across availability zones.
Consider the following situations:
- **Even Number of Managers** : A network incident creates a network partition with two managers on either side. **Split brain** occurs because neither side can be sure it has a majority, and the cluster goes into *read-only* mode.
- **Odd Number of Managers** : The swarm remains fully operational in *read-write* mode because the two managers on the right side of the network partition know they have a majority (quorum).
<figure>
<img src="https://hackmd.io/_uploads/S1qd0jtMJg.png">
<figcaption>Even number of managers / Odd number of managers</figcaption>
</figure>
#### Locking a Swarm
Swarm automatically configures a lot of security features, but restarting an old manager or restoring an old backup can potentially compromise the cluster. Hence, we should use Swarm's **autolock** feature to force restarted managers to present a key before being admitted back into the swarm:
``` bash
# autolock a new swarm at initialization time
$ docker swarm init --autolock=true
```
``` session
# autolock an existing swarm
$ docker swarm update --autolock=true
Swarm updated.
To unlock a swarm manager after it restarts, run the `docker swarm unlock` command and
provide the following key:
SWMKEY-1-XDeU3XC75Ku7rvGXixJ0V7evhDJGvIAvq0D8VuEAEaw
Please remember to store this key in a password manager...
# restart docker
$ sudo systemctl restart docker
$ docker node ls
Error response from daemon: Swarm is encrypted and needs to be unlocked before it can be used.
$ docker swarm unlock
Please enter unlock key: <enter your key>
```
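If the unlock key is lost or needs to be changed, any unlocked manager can print or rotate it:
``` bash
# Show the current unlock key
$ docker swarm unlock-key
# Rotate the unlock key (record the new key somewhere safe)
$ docker swarm unlock-key --rotate
```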
## Docker and WebAssembly (Wasm)
### Setup
#### Configure Docker Desktop for Wasm
Support for Wasm is a beta feature in Docker Desktop. To configure Docker Desktop for Wasm:
1. Open the Docker Desktop UI. And then click the Settings icon at the top right.
2. Make sure "Use containerd for pulling and storing images" is selected.
3. Click "Features in development" tab, select the "Enable Wasm" option.
4. Click "Apply & Restart" button.
#### Install Rust, Spin and Configure for Wasm
Once Rust is installed, run the following command to install the `wasm32-wasi` target so that Rust can compile to Wasm:
``` bash
$ rustup target add wasm32-wasi
```
Spin is a Wasm framework and runtime that makes building and running Wasm applications easy. Just search the web for "How to install fermyon spin" and follow the instructions for our system.
### Wasm Containers
**Wasm** is a new type of application that is smaller, faster, and more portable than traditional Linux containers. It's a new virtual machine architecture that programming languages compile to.
- Wasm applications are great for AI workloads, serverless functions, plugins, and edge devices, but not so good for complex networking or heavy I/O.
- Instead of compiling applications to Linux on ARM or Linux on AMD, we can compile them to Wasm. Then we can run these Wasm applications on any system with a Wasm runtime.
Docker Desktop already ships with several Wasm runtimes:
``` session
$ docker run --rm -i --privileged --pid=host jorgeprendes420/docker-desktop-shim-manager:latest
io.containerd.wasmtime.v1
io.containerd.spin.v2
io.containerd.wws.v1
io.containerd.lunatic.v1
io.containerd.slight.v1
io.containerd.wasmedge.v1
io.containerd.wasmer.v1
```
### Containerize Wasm Applications
#### Write Wasm Applications
We can use **spin** to create a simple web server:
``` session
$ spin new hello-world -t http-rust
Description: Wasm app
HTTP path: /hello
```
After editing the `src/lib.rs` file, we can run the `spin build` command to compile the application as a Wasm binary. The application can then run on any system with the Spin runtime.
``` session
$ spin build
Building component with `cargo build --target wasm32-wasi --release`
...
Finished building all Spin components
$ spin up
Logging component stdio to ".spin/logs/"
Serving http://127.0.0.1:3000
Available Routes:
hello-world: http://127.0.0.1:3000/hello
```
#### Build the Images and Run as Containers
As always, we need a Dockerfile that tells Docker how to package the application as an image. Just create a new file called `Dockerfile` in the current directory:
``` dockerfile
FROM scratch
COPY /target/wasm32-wasi/release/hello_world.wasm .
COPY spin.toml .
```
Then run the following commands to containerize the Wasm application:
``` bash
$ docker build \
--platform wasi/wasm \
--provenance=false \
-t nigelpoulton/ddd-book:wasm .
$ docker run -d --name wasm-ctr \
--runtime=io.containerd.spin.v2 \
--platform=wasi/wasm \
-p 5556:80 \
nigelpoulton/ddd-book:wasm /
```
## Docker Networks
### Overview
Docker networking is based on [libnetwork](https://github.com/moby/libnetwork), which is the reference implementation of an open-source architecture called the **Container Network Model (CNM)**:
- **The Container Network Model (CNM)** is the design specification and outlines the fundamental building blocks of a Docker Network.
- **Libnetwork** is a real-world implementation of the CNM. It's open-sourced as part of the Moby project and used by Docker.
- **Drivers** extend the model by implementing specific network topologies such as VXLAN overlay networks.
### Network Type: Single-host Bridge Network
#### The Bridge Network
The *single-host bridge network* is the simplest type of Docker Network. The bridge network only spans a single Docker Host, and it's an implementation of an `802.1d` bridge (layer 2 switch).
<figure>
<img src="https://hackmd.io/_uploads/S1sNNVcM1e.png">
<figcaption>The Complete stack with containers connecting to the bridge network.</figcaption>
</figure>
- When containers are connected to the bridge network, they can communicate by their IP addresses or container names but remain isolated from the external network unless specific port mappings are configured.
- Every new Docker Host gets a default single-host bridge network called `bridge` that Docker connects new containers to, unless we override it with the `--network` flag.
- On all Linux-based Docker hosts, the default `bridge` network maps to an underlying Linux bridge in the host's kernel called `docker0`.
``` session
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
c7464dce29ce bridge bridge local
c65ab18d0580 host host local
42a783df0fbe none null local
$ docker network inspect bridge | grep bridge.name
"com.docker.network.bridge.name": "docker0",
$ brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.0242aff9eb4f no
docker_gwbridge 8000.02427abba76b no
$ ip link show docker0
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc...
link/ether 02:42:af:f9:eb:4f brd ff:ff:ff:ff:ff:ff
```
<figure>
<img src="https://hackmd.io/_uploads/HyNeQ4qGyx.png">
<figcaption>Mapping the default Docker bridge network to the docker0 bridge in the host's kernel</figcaption>
</figure>
::: info
- Docker creates single-host bridge networks with the built-in `bridge` driver. If you run Windows containers, you'll need to use the `nat` driver.
- When we create a new Docker bridge network, Docker also creates a new Linux bridge in the host's kernel behind the scenes.
:::
#### Create Bridge Network
Run the following command to create a new single-host bridge network:
``` session
$ docker network create -d bridge localnet
f918f1bb0602373bf949615d99cb2bbbef14ede935fbb2ff8e83c74f10e4b986
$ docker run -d --name c1 \
--network localnet \
alpine sleep 1d
$ docker network inspect localnet --format '{{json .Containers}}' | jq
{
"09c5f4926c87da12039b3b510a5950b3fe9db80e13431dc17d870450a45fd84a": {
"Name": "c1",
"EndpointID": "27770ac305773b352d716690fb9f8e05c1b71e10dc66f67b88e93cb923ab9749",
"MacAddress": "02:42:ac:15:00:02",
"IPv4Address": "172.21.0.2/16",
"IPv6Address": ""
}
}
$ brctl show
bridge name bridge id STP enabled interfaces
br-f918f1bb0602 8000.0242372a886b no veth833aaf9
docker0 8000.0242aff9eb4f no
docker_gwbridge 8000.02427abba76b no
```
<figure>
<img src="https://hackmd.io/_uploads/BkZqBNqfyl.png">
<figcaption>The bridge configuration on the host.</figcaption>
</figure>
<figure>
<img src="https://hackmd.io/_uploads/Ska78E9zJg.png">
<figcaption>Every veth is like a cable with an interface on either end. One end is connected to the Docker network, and the other end is connected to the associated bridge in the kernel</figcaption>
</figure>
If we add more containers to the `localnet` network, they'll all be able to communicate using names. That's because Docker automatically registers container names with an internal DNS service. The exception to this rule is the built-in `bridge` network that doesn't support DNS resolution.
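A quick check of this name resolution, assuming the `c1` container created above is still running on `localnet` (output trimmed):
``` session
$ docker run -d --name c2 --network localnet alpine sleep 1d
$ docker exec c2 ping -c 2 c1
PING c1 (172.21.0.2): 56 data bytes
64 bytes from 172.21.0.2: seq=0 ttl=64 time=0.123 ms
...
```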
#### Port Mappings for External Access
Containers on bridge networks can only communicate with other containers on the same network. By mapping container ports to ports on the Docker host, we can access the container from external networks.
``` session
$ docker run -d --name web \
--network localnet \
--publish 5005:80 \
nginx
$ docker port web
80/tcp -> 0.0.0.0:5005
80/tcp -> [::]:5005
```
### Network Type: The `macvlan` Network
#### The `macvlan` Driver
The built-in `macvlan` driver (`transparent` if using Windows containers) gives every container its own IP and MAC address on the external physical network. As it doesn't require port mappings or additional bridges, performance is good.
<figure>
<img src="https://hackmd.io/_uploads/rkdj8rczJe.png">
<figcaption>The <code>macvlan</code> driver makes containers visible on external networks.</figcaption>
</figure>
#### Connect Containers to `macvlan` Network
Assume we have a network with two VLANs, and we add a Docker host connected to the network. To attach a container to VLAN 100, we create a new Docker network with the `macvlan` driver and configure the *subnet info*, the *gateway*, the *range of IPs that can be assigned to containers*, and *which of the host's interfaces to use*:
``` session
$ docker network create -d macvlan \
--subnet=10.0.0.0/24 \
--ip-range=10.0.0.0/25 \
--gateway=10.0.0.1 \
-o parent=eth0.100 \
macvlan100
$ docker run -d --name mactainer1 \
--network macvlan100 \
alpine sleep 1d
```
<figure>
<img src="https://hackmd.io/_uploads/BksrjHcz1l.png">
<figcaption>Add the Docker Host connected to the network.</figcaption>
</figure>
<figure>
<img src="https://hackmd.io/_uploads/rkHnjBqGkx.png">
<figcaption>Attach container to VLAN 100.</figcaption>
</figure>
Note that the Docker `macvlan` driver also supports **VLAN trunking**, meaning we can create multiple `macvlan` networks that connect to different VLANs, as sketched below.
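A sketch of what trunking looks like in practice: a second `macvlan` network bound to a different VLAN sub-interface of the same parent NIC (the addressing below is illustrative):
``` session
$ docker network create -d macvlan \
--subnet=192.168.200.0/24 \
--gateway=192.168.200.1 \
-o parent=eth0.200 \
macvlan200
```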
### Network Type: Overlay Network
### Troubleshooting: The Daemon Logs and The Container Logs
#### Daemon Logs
- For Windows containers, we can view them in the Windows Event Viewer or `~\AppData\Local\Docker`.
- For Linux containers, it depends on which `init` system is being used. If `systemd` is used, Docker posts the logs to `journald` (check with the `journalctl -u docker.service` command).
We can also tell Docker how verbose daemon logging should be by editing the daemon config file at `/etc/docker/daemon.json`. Be sure to restart Docker after making any changes.
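A minimal sketch: reading the daemon logs on a `systemd` host, and a `daemon.json` that raises log verbosity (both keys are standard daemon options):
``` session
$ journalctl -u docker.service --since "1 hour ago"
$ cat /etc/docker/daemon.json
{
  "debug": true,
  "log-level": "debug"
}
```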
#### Container Logs
We can normally view container logs with the `docker logs` command. For Docker Swarm, we should use the `docker service logs` command.
- Docker supports a few different log drivers. The `json-file` and `journald` are the easiest to configure. Both of them work with `docker logs` and `docker service logs` commands.
- We can also start a container or a service with the `--log-driver` and `--log-opt` flags to override the settings in `daemon.json`, as shown below.
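For example, assuming the `nginx` container named `web` from the bridge-network section is still running:
``` bash
# Follow the container's logs
$ docker logs --follow web
# Override the log driver and options for a single container
$ docker run -d --name logtest --log-driver journald --log-opt tag=logtest alpine ping localhost
```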
### Service Discovery
The `libnetwork` framework also provides **service discovery** that allows all containers and Swarm services to locate each other by name. The only requirement is that the containers be on the same network.
<figure>
<img src="https://hackmd.io/_uploads/B1fikL5zkg.png">
<figcaption>Docker implements a native DNS server and configures every container to use it for name resolution.</figcaption>
</figure>
Assume the container `c1` pinging another container `c2` by name:
1. The `c1` container issues a `ping c2` command. The container's local DNS resolver checks its cache to see if it has an IP address for `c2`. Note that *All Docker containers have a local DNS resolver!*
2. The local resolver doesn't have an IP address for `c2`, so it initiates a recursive query to the embedded Docker DNS server. Note that *All Docker containers are pre-configured to know how to send queries to the embedded DNS server.*
3. The Docker DNS server maintains name-to-IP mappings for every container we create with the `--name` or `--net-alias` flags.
4. The DNS server returns the IP address of the `c2` container to the local resolver in the `c1` container.
5. The `c1` container sends the ping request (ICMP echo request) to the IP address of `c2`.
Besides, we can use the `--dns` flag to start containers and services with a customized list of DNS servers, and the `--dns-search` flag to add custom search domains for queries against unqualified names. Both are useful if the applications query names outside the Docker environment. These flags add entries to the container's `/etc/resolv.conf` file.
``` session
$ docker run -it --name custom-dns \
--dns=8.8.8.8 \
--dns-search=nigelpoulton.com \
alpine sh
```
### Ingress Load Balancing
Docker Swarm supports two ways of publishing services to external clients:
- **Ingress Mode** (default) : External clients can access services via any swarm node, even nodes not hosting a service replica.
- **Host Mode** : External clients can only access services via nodes running replicas.
<figure>
<img src="https://hackmd.io/_uploads/BJsQX85fkg.png">
<figcaption>Ingress Mode / Host Mode</figcaption>
</figure>
``` session
$ docker service create -d --name svc1 \
--publish published=5005,target=80,mode=host \
nginx
```
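The command above publishes `svc1` in *host mode* via `mode=host`. For comparison, a minimal sketch of publishing a service in the default *ingress mode* (service name and port are illustrative):
``` session
$ docker service create -d --name svc2 \
--publish published=5006,target=80 \
nginx
```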
## Docker Volumes
### The Local Storage Layer
Docker creates containers by stacking *read-only* image layers and placing a thin layer of local storage on the top. This allows multiple containers to share the same read-only image layers:
<figure>
<img src="https://hackmd.io/_uploads/B1nuQ3tGJx.png">
<figcaption>Ephemeral Container Storage</figcaption>
</figure>
The local storage layer (also called the *thin writeable layer*, *ephemeral storage*, *read-write storage*, or *graphdriver storage*) is coupled to the container's lifecycle: it's created when the container is created and deleted when the container is deleted. It's not a good place to persist data.
Docker keeps the local storage layer on the Docker Host's filesystem:
- Linux Containers: `/var/lib/docker/<storage-driver>/...`
- Windows Containers: `C:\ProgramData\Docker\windowsfilter\...`
::: info
We should treat containers as **immutable objects** and never change them once deployed. If we need to fix or change the configuration of a live container, we should create and test a new container with the changes and then replace the live container with the new one.
:::
### The Volumes
We can create volumes and mount them into containers. A mounted volume appears as a directory in the container's filesystem, and anything we write to that directory gets stored in the volume:
<figure>
<img src="https://hackmd.io/_uploads/Sy1TU3YGJe.png">
<figcaption>High-level view of Volumes and Containers</figcaption>
</figure>
The volumes are first-class objects in Docker. There's a `docker volume` sub-command and a **volume** resource in the API. Let's create a new volume by following commands:
``` session
$ docker volume create myvol
myvol
$ docker volume ls
DRIVER VOLUME NAME
local myvol
$ docker volume inspect myvol
[
{
"CreatedAt": "2024-05-15T12:23:14Z",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/myvol/_data",
"Name": "myvol",
"Options": null,
"Scope": "local"
}
]
$ docker volume prune --all
```
- By default, Docker creates new volumes with the built-in `local` driver. We can also use the `-d` or `--driver` flag to specify a different driver.
- The `Mountpoint` tells us where the volume exists in the Docker Host's filesystem.
- The `docker volume prune` command deletes all volumes not mounted into a container
or service replica.
We can also define volumes in Dockerfiles by using the `VOLUME` instruction with the format `VOLUME <container-mount-point>`. However, we can't specify a host directory when defining volumes in a Dockerfile.
### Use Volumes with Containers
We can use the `--mount` flag with the `docker run` command to tell Docker to mount a volume. If we specify a volume that already exists, Docker uses it; if the volume doesn't exist, Docker creates it.
``` session
$ docker run -it --name voltainer \
--mount source=bizvol,target=/vol \
alpine
$ docker volume rm myvol
```
Furthermore, integrating Docker with *external storage systems* lets us present shared storage to multiple nodes.
## Docker Security
## References
- [How libnetwork has been Designed](https://github.com/moby/moby/blob/master/libnetwork/docs/design.md)
<!-- Widgets: Likecoin -->
{%hackmd @Hsins/widget-license %}