
---
tags: System Architecture
---

Container
===

* Reference books:
    * ![](https://i.imgur.com/zmdXU5v.jpg)
    * [30 essential container technology tools and resources](https://techbeacon.com/enterprise-it/30-essential-container-technology-tools-resources)
* Unikernel

Notes on topics to explore in the future:

* [ops - build and run nanos unikernels](https://github.com/nanovms/ops)
* [golang-developer-roadmap](https://github.com/Alikhll/golang-developer-roadmap?fbclid=IwAR3OtodrnJsGmEUP93nDdEf6b5tcUziRgWlje5jF0rwGIRURw7ZPtr3y_0E)
* Calling C/C++ from Go: [如何把 Go 调用 C 的性能提升 10 倍？ (How to make Go calls into C 10x faster?)](https://hacpai.com/article/1526368142985)

# Glossary

* daemon: general term for a program that stays resident in the background on a server.
* pipe: a communication channel between processes.

# Introduction

## What is a container? (taking Docker as an example)

* [What even is a container: namespaces and cgroups](https://jvns.ca/blog/2016/10/10/what-even-is-a-container/)
* [A Beginner-Friendly Introduction to Containers, VMs and Docker](https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b)

Containers and VMs are similar in their goals: to isolate an application and its dependencies into a self-contained unit that can run anywhere. Moreover, containers and VMs remove the need for physical hardware, allowing for more efficient use of computing resources, both in terms of energy consumption and cost effectiveness. The main difference between containers and VMs is in their architectural approach. Let’s take a closer look.
A container is essentially a process on the host. It relies on **namespaces** for resource isolation, on **cgroups** for resource limits, and on **copy-on-write** for efficient file operations. The first two are native Linux kernel features and form the foundation of containers.

The most direct comparison between containers and VMs:

![](https://i.imgur.com/BY77SxM.png)

## Why we use containers

The classic cloud computing architecture consists of:

* IaaS (Infrastructure as a Service)
* PaaS (Platform as a Service)
* SaaS (Software as a Service)

When IaaS evolved with the virtual machine as its smallest unit of resource, several problems appeared: low resource utilization, slow scheduling and distribution, inconsistent software stack environments, and so on. PaaS, which is built on IaaS, recognized that container technology could solve the utilization problem, but its dependence on IaaS placed significant limits on the choice of application architecture. A better solution was needed, and Docker clearly seized that opportunity.

* Definition of Docker: Docker is a platform designed for developers and system administrators to build, ship, and run distributed applications. It uses the Docker container as the basic unit of resource partitioning and scheduling and packages the entire software runtime environment.
* Docker is a container technology, but a container is not necessarily Docker.
* Container cloud: a platform that uses the container as the basic unit of resource partitioning and scheduling, packages the entire software runtime environment, and lets developers and system administrators build, ship, and run distributed applications.
* 2019.04.12 container cloud trends: [The 6 most important announcements from Google Cloud Next 2019](https://techcrunch.com/2019/04/10/the-6-most-important-announcements-from-google-cloud-next-2019/)
    * Anthos: the new name of the Google Cloud Services Platform, based on Kubernetes, allowing enterprises to run applications in their private data center and in Google’s cloud
    * Open-source integrations into the Google Cloud Console: Confluent, DataStax, Elastic, InfluxData, MongoDB, Neo4j and Redis Labs
    * Google Cloud Code

## Container Standardization

* [Reference: Open Container Initiative](https://www.opencontainers.org/)
* [Reference: Open Container Initiative Runtime Specification](https://github.com/opencontainers/runtime-spec/blob/master/spec.md#notational-conventions)
* Trend: container standards
    * 2018.02.19 [Goodbye Docker, hello Containers](https://blog.worldline.tech/2018/02/19/goodbye-docker-hello-containers.html)
    * All of the following changes encourage us to focus on **CNCF container standardization**:
        * Kubernetes integration into Docker Enterprise establishes Kubernetes as the main actor in the container orchestration domain.
        * The three big players (Google, Microsoft, Docker) work together to improve the container environment.
        * The **Cloud Native Computing Foundation (CNCF)** is the main actor in the open-source communities around containers, with projects like Kubernetes, Prometheus, Fluentd, …
        * Kubernetes, a CNCF project, introduced a new standard called the **Container Runtime Interface (CRI)**.
        * A standard is only successful if its implementations are interchangeable. The good news is that switching between CRI-O and CRI-Containerd seems possible.
* Open Container Initiative (OCI): established in June 2015 by Docker and other leading container companies. Its specifications mainly define **configuration file formats, a set of standard operations, and an execution environment**. Container standardization aims to achieve the following goals:
    1. **Standard operations**: Standard Containers define a set of STANDARD OPERATIONS. They can be:
        * created, started, and stopped using standard container tools;
        * copied and snapshotted using standard filesystem tools;
        * downloaded and uploaded using standard network tools.
    2. **Content-agnostic** (independent of the content): Standard Containers are CONTENT-AGNOSTIC: all standard operations have the same effect regardless of the contents.
    3. **Infrastructure-agnostic**: Standard Containers are INFRASTRUCTURE-AGNOSTIC: they can be run in any OCI supported infrastructure. For example, a standard container can be bundled on a laptop, uploaded to cloud storage, downloaded, run and snapshotted by a build server at a fiber hotel in Virginia, uploaded to 10 staging servers in a home-made private cloud cluster, then sent to 30 production instances across 3 public cloud regions.
    4. **Designed for automation**: Because they offer the same standard operations regardless of content and infrastructure, Standard Containers are extremely well-suited for automation. In fact, you could say automation is their secret weapon.
    5. **Industrial-grade delivery**: Leveraging all of the properties listed above, Standard Containers are enabling large and small enterprises to streamline and automate their software delivery pipelines. Whether it is in-house DevOps flows or external customer-based software delivery mechanisms, Standard Containers are changing the way the community thinks about software packaging and delivery.
* Two main specifications currently exist:
    * Runtime specification (runtime-spec): lets developers package and sign applications (it outlines how to run a “filesystem bundle” that is unpacked on disk). On the Kubernetes side, the following CRI implementations currently exist:
        * **dockershim**: to support the old Docker solution
        * **cri-containerd**: to support Docker Containerd
        * **cri-o**: a new container solution created by Red Hat. Its primary goal is to replace the Docker service as the container engine for **Kubernetes** implementations, such as OpenShift Container Platform.
        * **rkt**: to support CoreOS Rocket
        * **frakti**: to run containers inside virtual machines
        * ...

        ![](https://i.imgur.com/0lGuR16.png)
    * Image specification (image-spec): establishes how container images are built, verified, signed, and named. (An OCI implementation would download an OCI Image, then unpack that image into an OCI Runtime filesystem bundle. At this point the OCI Runtime Bundle would be run by an OCI Runtime.)

## Container runtimes

* [Container runtimes: clarity](https://medium.com/cri-o/container-runtimes-clarity-342b62172dc3)
* [Container Runtimes Part 1: An Introduction to Container Runtimes](https://www.ianlewis.org/en/container-runtimes-part-1-introduction-container-r)
* [A history of low-level Linux container runtimes](https://opensource.com/article/18/1/history-low-level-container-runtimes)

**Containers in Linux:** At Red Hat we like to say, "Containers are Linux; Linux is Containers." Here is what this means. Traditional containers are processes on a system that usually have the following three characteristics:

1. Resource constraints
2. Security constraints
3. Virtual separation

**The concept of a container runtime:** Container runtime tools just modify these resource constraints, security settings, and namespaces. Then the Linux kernel executes the processes. After the container is launched, the container runtime can monitor PID 1 inside the container or the container's stdin/stdout; the container runtime manages the lifecycles of these processes. This is quite similar to what a system service manager does.

**Some confusion about "runtime":** Docker is often called a container runtime, but "container runtime" is an overloaded term. When folks talk about a "container runtime," they're really talking about higher-level tools like Docker, CRI-O, and RKT that come with developer functionality. They are API driven. They include concepts like pulling the container image from the container registry, setting up the storage, and finally launching the container. Launching the container often involves running a specialized tool that configures the kernel to run the container, and these are also referred to as "container runtimes." I will refer to them as "low-level container runtimes." Daemons like Docker and CRI-O, as well as command-line tools like Podman and Buildah, should probably be called "container managers" instead.

**OCI Runtime Specification:** The Open Container Initiative (OCI) was formed partly because people wanted to be able to launch containers in additional ways. Traditional namespace-separated containers were popular, but people also had the desire for virtual machine-level isolation. Intel and Hyper.sh were working on KVM-separated containers, and Microsoft was working on Windows-based containers. The OCI wanted a standard specification defining what a container is, so the OCI Runtime Specification was born.
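The lifecycle role described above (launch the process, then watch it until it exits) can be sketched in C, the language used for the namespace examples later in this note. This is only an illustrative sketch under stated assumptions: `run_and_wait` is a hypothetical helper name, not part of Docker, CRI-O, or runC, and it skips the namespace, cgroup, and security setup that a real low-level runtime would perform before `exec`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start argv[0] as a child process and block until it
 * terminates, the way a low-level runtime watches a container's first process.
 * Returns the child's exit code, 128 + signal number if a signal killed it,
 * or -1 if the child could not be created or waited for.
 */
int run_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: a real runtime would apply namespaces, cgroups, and
         * security settings here before handing over to the workload. */
        execvp(argv[0], argv);
        perror("execvp");      /* only reached if exec failed */
        _exit(127);
    }

    /* Parent: wait for the workload to terminate and report how it ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Calling `run_and_wait` from a `main()` with an argument vector such as `{ "sh", "-c", "exit 3", NULL }` returns the child's exit code. The `128 + signal` convention is borrowed from common shell behavior and is a design choice of this sketch, not something the OCI specification prescribes.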
# Docker Basic (How to run Docker)

## Basic Introduction

Docker is a container management platform.

* [The Docker Ecosystem: An Introduction to Common Components](https://www.digitalocean.com/community/tutorials/the-docker-ecosystem-an-introduction-to-common-components)

![](https://i.imgur.com/1ngLVty.png)

[阿里P8架構師談:Docker容器的原理、特徵、基本架構、與應用場景 (An Alibaba P8 architect on Docker containers: principles, characteristics, basic architecture, and use cases)](https://kknews.cc/zh-tw/other/k2qpn6p.html)

## Installation

* OS: Ubuntu 18.04
* Update the existing package list: ```$ sudo apt update```
* Install a few prerequisite packages with **apt**: ```$ sudo apt install apt-transport-https ca-certificates curl software-properties-common```

```
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
```

* Then add the GPG key for the official Docker repository to your system: ```$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -```

    Because the Docker package in Ubuntu's official APT repository often lags behind Docker's official releases, Docker is installed here from Docker's own APT repository instead.
* Check the APT key: ```$ sudo apt-key finger```

    ![](https://i.imgur.com/K0XXKNl.png)
* Add the Docker repository to the APT sources: ```$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"```

    **Note: bionic is the Ubuntu 18.04 codename; the Ubuntu 16.04 codename is xenial.**
* Check the result: ```$ sudo tail -2 /etc/apt/sources.list```

    ![](https://i.imgur.com/XC352nM.png)
* Update the package list again: ```$ sudo apt update```
* Make sure the installation candidate comes from the Docker repository: ```$ apt-cache policy docker-ce```

    ![](https://i.imgur.com/fNzbMeh.png)
* Finally, install Docker: ```$ sudo apt install docker-ce```

## Docker Basic Command

![](https://i.imgur.com/v96fjMk.jpg)

```bash
$ sudo docker -v #
```

... ...
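# Container Principle

As mentioned earlier, **namespace** and **cgroup** are the foundation of containers, so they are explained first.

## namespace

### Basic Concept

A namespace is the mechanism the Linux kernel uses to isolate kernel resources. With namespaces, one set of processes sees only the resources related to itself, another set sees only its own resources, and neither set is aware of the other. Concretely, the resources of one or more processes are assigned to the same namespace. Starting with kernel 3.8, the **/proc/[pid]/ns** directory contains information about the namespaces a process belongs to. The following commands show the namespaces of the current process:

```
$ ls -al /proc/$$/ns
$ ll /proc/$$/ns # $$ is the ID of the process currently running in the shell
```

![](https://i.imgur.com/y0wlhtg.png)

In the output above, the first character `l` indicates that every namespace file is a link file, and the number in square brackets at the end, such as [4026531836], is the namespace number. Two processes with the same number are in the same namespace. The purpose of these link files is to let a namespace persist even after all of its processes have exited: as long as a file descriptor to the link is held open, the namespace stays alive and new processes can join it later.

Side note: a file descriptor is an index the kernel creates to manage open files efficiently. It is a non-negative integer (usually small) that refers to an open file, and every system call that performs I/O goes through a file descriptor.

In a Docker container, locating and joining an existing namespace through its link file is the most basic approach.

Linux provides several APIs for working with namespaces:

* clone(): creates a new process and a new namespace at the same time. It is effectively a more general form of the fork() system call.
* setns(): adds a process to an existing namespace.
* unshare(): isolates the calling process into a new namespace.
* fork() (not a namespace API)
* certain file operations under /proc

### Resource Isolation

What does a resource-isolated container fundamentally need to do? The first thing that comes to mind is probably file system isolation (for example, the chroot command switches the root directory mount point). What about processes? What about network communication? Following this line of thought, a container needs at least **6 kinds of isolation**, as listed below:

|namespace|System call flag|What is isolated|
|----|----|----|
|UTS|CLONE_NEWUTS|Hostname and domain name|
|IPC|CLONE_NEWIPC|Semaphores, message queues, and shared memory|
|PID|CLONE_NEWPID|Process IDs|
|Network|CLONE_NEWNET|Network devices, network stack, ports|
|Mount|CLONE_NEWNS|Mount points (file system)|
|User|CLONE_NEWUSER|Users and user groups|

As Linux itself and container technology keep evolving, new namespaces are added as well; for example, the cgroup namespace was added in kernel 4.6. The following subsections look at the main namespaces and build a relatively isolated shell environment to get a feel for how a container is created.

#### UTS namespace

The UTS (UNIX Time-sharing System) namespace isolates the hostname and domain name. A container with its own hostname and domain name can be treated as an independent node rather than as a process on the host. In Docker, each image typically sets its hostname after the service it provides, without affecting the host. The following C program demonstrates a UTS namespace:

**uts_namespace.c**

```clike=
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static char child_stack[STACK_SIZE];
char* const child_args[] = {
    "/bin/bash",
    NULL
};

// Entry point of the child: set a new hostname, then replace itself with bash.
int child_main(void* args) {
    printf("In the child process!\n");
    sethostname("NewNamespace", 12);
    execv(child_args[0], child_args);
    return 1;  // only reached if execv() fails
}

int main() {
    printf("Program start:\n");
    // The stack grows downward, so pass the top of the buffer to clone().
    int child_pid = clone(child_main, child_stack + STACK_SIZE,
                          CLONE_NEWUTS | SIGCHLD, NULL);
    if (child_pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(child_pid, NULL, 0);
    printf("Exited\n");
    return 0;
}
```

```
$ gcc -Wall uts_namespace.c -o uts_namespace.o
$ sudo ./uts_namespace.o # without sudo, clone() with CLONE_NEWUTS fails for lack of privileges
```

![](https://i.imgur.com/SfA5oMA.png)

The output shows that the uts namespace number is different.

#### IPC namespace

IPC (Inter-Process Communication) resources include the familiar semaphores, message queues, and shared memory. Requesting an IPC resource allocates a globally unique 32-bit ID, so an IPC namespace effectively contains the system IPC identifiers plus the file system that implements POSIX (Portable Operating System Interface) message queues. Processes in the same IPC namespace can see each other's IPC resources; processes in different IPC namespaces cannot. To create an IPC namespace with clone(), just add the flag:

**ipc_namespace.c**

```clike=
//[...]
int child_pid = clone(child_main, child_stack + STACK_SIZE,
                      CLONE_NEWIPC | CLONE_NEWUTS | SIGCHLD, NULL);
//[...]
```

To confirm that the new IPC namespace cannot see the original one, first create a message queue in the original namespace:

```bash
# ipcs, ipcmk, ipcrm: check, make, remove ipc
$ ipcmk -Q
$ ipcs -q
```

![](https://i.imgur.com/g2UXlcR.png)

Then run the ipc_namespace.c program and check again:

```bash
$ gcc -Wall ipc_namespace.c -o ipc_namespace.o
$ sudo ./ipc_namespace.o
$ ipcs -q
```

![](https://i.imgur.com/19q3Iyo.png)

The message queue created earlier is not visible inside the new namespace.

#### PID namespace

PID namespace isolation lets processes in different PID namespaces have the same PID; each PID namespace has its own counter. The kernel maintains a tree of PID namespaces. At the top is the root namespace created at system initialization, and, as in any tree, parent and child namespaces form a hierarchy. A child namespace can neither affect nor see the processes in its parent namespace. The first process in each namespace, PID 1, has the same special privileges as the init process on a traditional Linux system. From this it follows naturally that by monitoring and filtering the processes under the PID namespace of the Docker daemon, the processes running inside Docker can be monitored from the outside. In practice, add CLONE_NEWPID to the clone() call of the same program:

```clike=
//[...]
int child_pid = clone(child_main, child_stack + STACK_SIZE,
                      CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWPID | SIGCHLD, NULL);
//[...]
```

![](https://i.imgur.com/WOjr4PG.png)

One more issue: because /proc has not yet been remounted inside the new PID namespace, all processes of the original namespace are still visible. Remounting proc completes the isolation:

```
$ mount -t proc proc /proc
$ ps -al # check all processes
```

![](https://i.imgur.com/6XvzTpw.png)

#### mount namespace

The mount namespace supports file system isolation by isolating mount points. After isolation, changes to the file hierarchy in one mount namespace do not affect another. The program change is again an extra flag to clone():

```clike=
//[...]
int child_pid = clone(child_main, child_stack + STACK_SIZE,
                      CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWNS | SIGCHLD, NULL);
//[...]
```

The file systems mounted in the current namespace can then be inspected with:

```bash
# note: "$$" is the PID of the current process
$ cat /proc/$$/mounts
$ cat /proc/$$/mountstats
```

#### network namespace

Without network isolation, the first problem a container and its host run into is port conflicts, so a dedicated mechanism is needed. The network namespace isolates network resources, including network devices, the IPv4/IPv6 protocol stacks, IP routing tables, firewalls, the /proc/net directory, the /sys/class/net directory, sockets, and so on. In principle, a physical network device (for example, a network card) can exist in at most one network namespace. Unless the machine has several network cards, a pipe-like virtual network device (a veth pair) is needed to build a channel between network namespaces so that data can flow between both ends.

Take the way the Docker daemon starts a container as an example. The Docker daemon creates the veth pair, attaches one end to the docker0 bridge, and adds the other end to the process in the new network namespace. Until the connection is complete, the Docker daemon and the init process inside the container communicate over a pipe. More precisely, the container's init process waits in a loop on its end of the pipe until the Docker daemon sends the veth device information and closes the pipe; it then stops waiting and brings up eth0.

![](https://i.imgur.com/WMp73Jo.png)

#### user namespace

...
...
...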
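As a recap of the `/proc/[pid]/ns` links described under Basic Concept, the comparison of namespace numbers can also be done programmatically. The following is a minimal sketch, assuming a Linux system with `/proc` mounted; `parse_ns_id`, `read_ns_id`, and `same_namespace` are hypothetical helper names introduced only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Parse the target of a /proc/[pid]/ns/<type> link, e.g. "uts:[4026531838]",
 * and return the namespace number inside the brackets (0 if malformed).
 */
unsigned long parse_ns_id(const char *link_target)
{
    char type[32];
    unsigned long id = 0;
    char closing = 0;

    assert(link_target != NULL);
    if (sscanf(link_target, "%31[^:]:[%lu%c", type, &id, &closing) != 3
        || closing != ']')
        return 0;
    return id;
}

/*
 * Read the namespace number of ns_type ("uts", "ipc", "pid", "net", ...)
 * for process pid. Returns 0 if the link cannot be read.
 */
unsigned long read_ns_id(pid_t pid, const char *ns_type)
{
    char path[64];
    char target[64];

    snprintf(path, sizeof(path), "/proc/%ld/ns/%s", (long)pid, ns_type);
    ssize_t len = readlink(path, target, sizeof(target) - 1);
    if (len < 0)
        return 0;
    target[len] = '\0';    /* readlink() does not NUL-terminate */
    return parse_ns_id(target);
}

/* Two processes share a namespace when both numbers are valid and equal. */
int same_namespace(pid_t a, pid_t b, const char *ns_type)
{
    unsigned long id_a = read_ns_id(a, ns_type);
    unsigned long id_b = read_ns_id(b, ns_type);
    return id_a != 0 && id_a == id_b;
}
```

For example, calling `same_namespace(getpid(), getppid(), "uts")` inside the bash started by uts_namespace.c should report a mismatch, matching what the screenshots show. Reading the links of processes owned by other users may require elevated privileges.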
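## cgroups

**cgroups** is a mechanism provided by the Linux kernel that groups (or separates) a set of system tasks and their sub-tasks into different groups ranked by resource, giving system resource management a unified framework. In other words, cgroups can limit and record the physical resources (for example CPU, memory, I/O) used by a set of tasks, which provides the basic guarantee that makes container virtualization possible. It offers four main functions:

* Resource limiting: cgroups can cap the total resources a task uses. For example, an upper bound can be set on the memory an application uses at run time; exceeding it triggers an OOM (Out of Memory) event.
* Prioritization: the number of CPU time slices and the disk I/O bandwidth assigned to a task effectively control its execution priority.
* Accounting: records statistics such as CPU time used and memory consumption.
* Task control: suspends and resumes tasks, among other operations.

Before discussing cgroups in detail, a few terms need explaining:

* task: in cgroups, a task is a process or thread of the system.
* cgroup: resource control in cgroups is applied per cgroup. A cgroup is a group of tasks divided according to some resource control criterion and contains one or more subsystems. A task can move between cgroups.
* subsystem: a subsystem in cgroups is a resource scheduling controller. Each subsystem independently controls one kind of resource, for example the CPU subsystem or the memory subsystem.
* hierarchy: a set of cgroups arranged as a tree. Each hierarchy controls resources through the subsystems bound to it, and a system can have several cgroup trees. A child node inherits the subsystems attached to its parent.

### Organization and basic rules

As mentioned, a system can have several trees made of cgroups. The main benefit is that tasks are not all bound by the same set of subsystems. In Docker, each **subsystem** forms its own tree hierarchy, which is easier to manage. Docker currently uses the following 9 subsystems:

* blkio: sets input/output limits for block devices.
* cpu: uses the scheduler to control tasks' access to the CPU.
* cpuacct: automatically generates reports on the CPU resources used by tasks in a cgroup.
* cpuset: assigns dedicated CPUs (on multi-core systems) and memory to the tasks in a cgroup.
* devices: allows or denies access to devices by the tasks in a cgroup.
* freezer: suspends or resumes the tasks in a cgroup.
* memory: sets limits on the memory used by tasks in a cgroup and automatically generates reports.
* perf_event: enables unified performance testing for the tasks in a cgroup.
* net_cls: (not used directly by Docker) tags network packets with a class identifier (classid) so that the Linux **Traffic Controller (TC)** can identify packets originating from a particular cgroup.

### Simple usage flow

In Linux, cgroups are exposed as a file system. The following is a simple flow for limiting the CPU usage of a process:

1. Mount the cgroup file system:

    ![](https://i.imgur.com/5LigU2b.png)
2. Browse the control files under the cpu subsystem:

    ![](https://i.imgur.com/XHqgqkt.png)
3. Create a control group in the cpu subdirectory of /sys/fs/cgroup:

    ![](https://i.imgur.com/DD0cOxj.png)
4. Applying the CPU limit is just a matter of writing parameters into files. With the default `cpu.cfs_period_us` of 100000 µs, a quota of 20000 µs corresponds to 20% of one CPU:

```bash
$ ps aux # show all PIDs
# restrict process xxxx
$ echo pid-number >> /sys/fs/cgroup/cpu/cg1/tasks
# limit CPU usage to 20%
$ echo 20000 > /sys/fs/cgroup/cpu/cg1/cpu.cfs_quota_us
```

### How cgroups work

At its core, cgroups is implemented as a series of hooks that the kernel attaches to tasks. When a running task touches a particular resource, the hook triggers the attached subsystem to perform a check. Depending on the type of resource, the corresponding technique is used to apply limits and priorities, which achieves resource tracking and limiting. cgroups offers a unified interface for controlling and accounting for different system resources, but the actual limiting method differs from resource to resource.

The relationship between cgroups and tasks is many-to-many, so they are not linked directly; an intermediate structure records the links in both directions. In practice, Docker mounts the cgroup file system to create a new tree hierarchy, specifies the subsystems to bind at mount time, and can then browse and manage the cgroup hierarchy as if it were operating on files.

...

## copy-on-write

This topic is not explored in depth here. The following introduction, quoted from [Wikipedia](https://zh.wikipedia.org/wiki/%E5%AF%AB%E5%85%A5%E6%99%82%E8%A4%87%E8%A3%BD), is enough to get the idea:

"Copy-on-write (COW) is an optimization strategy in computer programming. Its core idea is that if multiple callers request the same resource at the same time (such as data stored in memory or on disk), they all receive the same pointer to the same resource. Only when a caller tries to modify the contents of the resource does the system actually make a private copy for that caller, while the original resource seen by the other callers stays unchanged. The process is transparent to the other callers. The main advantage is that if no caller modifies the resource, no private copy is ever created, so multiple callers that only read can share the same resource."

# Docker Core Analysis

## Basic architecture

Exploring the Docker core brings up many new terms, because Docker itself is implemented with many loosely coupled technologies. The diagram below shows its technology ecosystem and can be used to look up unfamiliar terms:

![](https://i.imgur.com/TiU39Zv.jpg)

### Architecture overview

Docker uses a traditional client-server architecture. The user talks to the Docker daemon through the Docker client, which sends requests to the daemon. The Docker back end is loosely coupled, meaning that the modules depend on each other's information and parameters to a relatively low degree. An overview of the architecture:

![](https://i.imgur.com/zlS5mho.png)

Note: this diagram shows a fairly early version.

![](https://i.imgur.com/mtnVWaT.png)

As the architecture diagram shows, the Docker daemon is the main user-facing interface. It receives requests from the Docker client and dispatches them to its different modules according to the request type. The runtime, volume, image, and network implementations have already been moved into modules or projects outside the daemon, and Docker keeps working on decoupling itself further. The related modules in the architecture are summarized below:

* The creation and management of the Docker container execution environment relies on the driver module.
* Images are downloaded from a Docker registry through the distribution/registry module in image management.
* Image metadata is stored through image/reference/layer in image management.
* Image files are stored in the actual file system through the image storage driver, graphdriver.
* The network module calls libnetwork to create and configure the network environment of a Docker container.
* The volume module calls volumedriver to create a data volume and handle the subsequent mount operations.
* When the resources of a Docker container need to be limited, or a user command needs to be executed, execdriver does the work. It manages the container through libcontainer, which in turn is a second-level wrapper around cgroups/namespace.

## Docker interface - client & daemon

Executing a Docker command involves two modes, client and daemon.

### client mode

The workflow of a client command:

1. Parse the flag information ...
2. Create the client instance ...
3. Execute the actual command ...

### daemon mode

1. Initialize the API server. ...
2. Create and initialize the daemon object ...

### client & daemon interaction flow, using ```docker run``` as an example

...

## Container management - libcontainer

Use libcontainer to create the application execution environment ...

### runC

The Linux Foundation established the **OCI (Open Container Initiative)** in June 2015 to create open industry standards for the container format and runtime, so that containers are not restricted by differences in the underlying infrastructure. runC calls the libcontainer package directly...

...

## Docker Image management

...

## Docker volume

...

## Docker network management

...
...

# Kubernetes

[Official tutorial: Learn Kubernetes Basics](https://kubernetes.io/docs/tutorials/kubernetes-basics/)

[K~K8s index](https://fufu.gitbook.io/kk8s/)

* container & orchestration
* For machine learning applications, Kubernetes becomes a good fit once three or more hosts are involved.

# Machine Learning Environment

After hearing the expert clkao (高嘉良) share how he built a machine learning environment for an AI school, I wanted to understand the topic in more depth, so this chapter was added to keep notes. His architecture is shown below:

![](https://i.imgur.com/BjHrf5Z.png)

* [primeHub (open source version on github)](https://github.com/InfuseAI/primehub)

Other resources:

* [The Littlest JupyterHub](https://tljh.jupyter.org/en/latest/): a basic single-server JupyterHub
* [Zero to JupyterHub with Kubernetes](https://zero-to-jupyterhub.readthedocs.io/en/latest/)
* [data school: Six easy ways to run your Jupyter Notebook in the cloud](https://www.dataschool.io/cloud-services-for-jupyter-notebook/)
* kubeflow:
    * tf-operator/PyTorch operator: run parallel training declaratively
* KVC: Kubernetes Volume Controller
    * efficiently manage data for ML workloads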