# Container Orchestration
In a production environment, how do we manage applications that may rely on multiple containers, such as databases or messaging services?
We need to ensure:
- Fault tolerance: when some errors happen, the system can still work.
- Scalability: scale up or scale down when needed
  - Vertical scaling (hardware): host, CPU, memory, disk
  - Horizontal scaling (application): through container orchestration
- Efficient use of resources: resources are costs!
- Container communication: containers can still reach each other as they are added or deleted
- External access: a channel for outside users to reach the applications
- Rollback or update without downtime
### Orchestration Technologies

- Docker Swarm (from Docker): easy to set up and get started, but lacks some advanced features such as autoscaling.
- Mesos (Apache): difficult to set up, but supports many advanced features.
- Kubernetes (from Google): a bit difficult to start with, but provides many options to customize deployments. K8s is now supported on all major public clouds (GCP, Azure, AWS).
### Kubernetes Advantages
- Integrate multiple hosts or nodes into a cluster.
- Schedule containers to run on different hosts.
- Provide a channel for containers on different hosts to communicate.
- Connect containers to storage.
- Ensure efficient use of resources.
- Provide container health checks.
- Ensure secure access within containers.
- Abstract containers into higher-level services, so they do not have to be accessed directly.
# K8s Architecture
## Nodes(Minions)
A node is a machine (physical or virtual) on which Kubernetes is installed; worker nodes are the machines where containers are launched.
### Master Node
> The node that manages the cluster.
- Administrators can use the CLI, API, or Dashboard to communicate with the master and inspect, control, or modify the cluster state.
#### api-server
Administrators send the required work to the api-server as REST requests. The api-server validates and processes the requests, and updates the state in etcd once the job is done.
#### Scheduler
The Scheduler knows the current status of the worker nodes. When a Pod needs to be placed, the Scheduler finds the most suitable node and assigns the Pod to it.
#### Controller
The Controller observes the cluster state through the api-server and tries to reconcile the current state with the state the administrator wants.
#### etcd
A key-value store, written in Go, that holds the cluster state and configuration, including Secrets, ConfigMaps, and so on. It can run as part of the master or be hosted externally.
### Worker Node
> A physical or virtual machine. Multiple Pods are scheduled onto worker nodes to run.
#### container runtime
The underlying software used to run containers. By default, Kubernetes uses Docker to create containers.
#### kubelet
The agent that runs on each worker node in the cluster. It is responsible for making sure the containers on the node run as expected: when the kubelet receives a Pod definition from the master node, it creates the containers the Pod needs through the container runtime and keeps them in a runnable state.
#### pod
The basic unit of execution in Kubernetes; applications run in Kubernetes as Pods.
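For reference in the sections below, a minimal Pod definition looks like this (a sketch; `nginx` is just an illustrative image):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: myapp
spec:
  containers:
    - name: nginx    # container name inside the Pod
      image: nginx   # image pulled from the default registry
```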
## Install Minikube
https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux
If you encounter the error below, some possible reasons are:
1. kubectl has not been added to the environment variables (PATH)
2. minikube has not been started
```
E0520 23:56:21.279159 2595 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:32769/api?timeout=32s": dial tcp 127.0.0.1:32769: connect: connection refused
```
https://minikube.sigs.k8s.io/docs/start/
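A quick way to rule out both causes (a sketch assuming a Linux setup where the kubectl binary was downloaded to `~/bin`; adjust the path to wherever you installed it):
```bash
# make sure the kubectl binary is on PATH for this shell
export PATH=$PATH:~/bin
kubectl version --client

# start the local cluster, then confirm the API server is reachable
minikube start
minikube status
kubectl get nodes
```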
## ReplicaSet
- A kind of controller responsible for replication.
- By setting `replicas`, you can quickly create the desired number of Pods; beyond creation, the ReplicaSet controller also ensures that the number of Pods stays equal to the configured replica count.
I encountered some errors while creating replicaset.yaml:
```bash!
error: resource mapping not found for name: "myapp-replicaset" namespace: "" from "replicaset.yaml": no matches for kind "ReplicaSet" in version "v1"
ensure CRDs are installed first
```
```bash!
Error from server (BadRequest): error when creating "replicaset.yaml": ReplicaSet in version "v1" cannot be handled as a ReplicaSet: strict decoding error: unknown field "spec.selector.replicas", unknown field "spec.selector.template"
```
Check replicaset.yaml and pay attention to the indentation (YAML uses spaces, not tabs; the second error above means `replicas` and `template` ended up nested under `selector`).
`apiVersion` has to be set to ```apps/v1``` ([official document](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/replica-set-v1/)).
```yaml=
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myapp-replicaset
  labels:
    app: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  replicas: 3
  template:
    metadata:
      name: nginx-2
      labels:
        app: myapp
    spec:
      containers:
        - name: nginx
          image: nginx
```
Everything under ```template``` is the Pod's specification.
If the ReplicaSet is created successfully, it shows three Pods, as configured.
Each Pod name gets a different random suffix.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/replicaset$ kubectl create -f replicaset.yaml
replicaset.apps/myapp-replicaset created
shawnylee@TW-SHAWNYLEE:~/k8s_practice/replicaset$ kubectl get replicaset
NAME               DESIRED   CURRENT   READY   AGE
myapp-replicaset   3         3         3       39s
shawnylee@TW-SHAWNYLEE:~/k8s_practice/replicaset$ kubectl get pods
NAME                     READY   STATUS    RESTARTS      AGE
myapp-replicaset-pl6mx   1/1     Running   0             53s
myapp-replicaset-t624z   1/1     Running   0             53s
myapp-replicaset-tnwmt   1/1     Running   0             53s
nginx                    1/1     Running   1 (28m ago)   7d22h
nginx-2                  1/1     Running   0             20m
```
If you delete one of the Pods, the ReplicaSet creates a new Pod to replace it.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/replicaset$ kubectl get pods
NAME                     READY   STATUS    RESTARTS      AGE
myapp-replicaset-pl6mx   1/1     Running   0             53s
myapp-replicaset-t624z   1/1     Running   0             53s
myapp-replicaset-tnwmt   1/1     Running   0             53s
nginx                    1/1     Running   1 (28m ago)   7d22h
nginx-2                  1/1     Running   0             20m
shawnylee@TW-SHAWNYLEE:~/k8s_practice/replicaset$ kubectl delete pod myapp-replicaset-pl6mx
pod "myapp-replicaset-pl6mx" deleted
shawnylee@TW-SHAWNYLEE:~/k8s_practice/replicaset$ kubectl get pods
NAME                     READY   STATUS    RESTARTS      AGE
myapp-replicaset-swwkv   1/1     Running   0             7s
myapp-replicaset-t624z   1/1     Running   0             22m
myapp-replicaset-tnwmt   1/1     Running   0             22m
nginx                    1/1     Running   1 (50m ago)   7d22h
nginx-2                  1/1     Running   0             42m
```
Use ```kubectl describe replicaset``` to check the details.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/replicaset$ kubectl describe replicaset myapp-replicaset
Name:         myapp-replicaset
Namespace:    default
Selector:     app=myapp
Labels:       app=myapp
Annotations:  <none>
Replicas:     3 current / 3 desired
Pods Status:  3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=myapp
  Containers:
   nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
  Node-Selectors: <none>
  Tolerations:    <none>
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  24m   replicaset-controller  Created pod: myapp-replicaset-pl6mx
  Normal  SuccessfulCreate  24m   replicaset-controller  Created pod: myapp-replicaset-t624z
  Normal  SuccessfulCreate  24m   replicaset-controller  Created pod: myapp-replicaset-tnwmt
  Normal  SuccessfulCreate  109s  replicaset-controller  Created pod: myapp-replicaset-swwkv
```
The same applies to adding a new Pod: the ReplicaSet does not allow an extra Pod with a matching label. If one is added, it is terminated immediately so the Pod count stays at the configured replica number.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/pods$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
myapp-replicaset-swwkv   1/1     Running   0          7m2s
myapp-replicaset-t624z   1/1     Running   0          29m
myapp-replicaset-tnwmt   1/1     Running   0          29m
shawnylee@TW-SHAWNYLEE:~/k8s_practice/pods$ kubectl create -f nginx.yaml
pod/nginx-2 created
shawnylee@TW-SHAWNYLEE:~/k8s_practice/pods$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
myapp-replicaset-swwkv   1/1     Running   0          7m7s
myapp-replicaset-t624z   1/1     Running   0          29m
myapp-replicaset-tnwmt   1/1     Running   0          29m
```
Use ```kubectl scale``` to change the replica count of the ReplicaSet.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/pods$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
myapp-replicaset-swwkv   1/1     Running   0          7m7s
myapp-replicaset-t624z   1/1     Running   0          29m
myapp-replicaset-tnwmt   1/1     Running   0          29m
shawnylee@TW-SHAWNYLEE:~/k8s_practice/pods$ kubectl scale replicaset myapp-replicaset --replicas=2
replicaset.apps/myapp-replicaset scaled
shawnylee@TW-SHAWNYLEE:~/k8s_practice/pods$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
myapp-replicaset-t624z   1/1     Running   0          33m
myapp-replicaset-tnwmt   1/1     Running   0          33m
```
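Besides `kubectl scale replicaset <name>`, the replica count can also be changed by pointing at the definition file, or by editing `replicas:` in the file and replacing the object (a sketch using the replicaset.yaml from above):
```bash
# scale the ReplicaSet referenced by the file (the file itself is not modified)
kubectl scale --replicas=6 -f replicaset.yaml

# or edit replicas: in replicaset.yaml first, then push the change
kubectl replace -f replicaset.yaml
```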
## Deployment
### create deployment.yaml
YAML structure:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
  labels:
    tier: frontend
    app: nginx
spec:
  selector:
    matchLabels:
      app: myapp
  replicas: 3
  template:
    metadata:
      name: nginx-2
      labels:
        app: myapp
    spec:
      containers:
        - name: nginx
          image: nginx
```
Create the deployment with ```kubectl create -f deployment.yaml```.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl create -f deployment.yaml
deployment.apps/myapp-deployment created
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl get deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
myapp-deployment   3/3     3            3           16s
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
myapp-replicaset-rhf74   1/1     Running   0          21s
myapp-replicaset-t624z   1/1     Running   0          83m
myapp-replicaset-tnwmt   1/1     Running   0          83m
```
Use ```kubectl get all``` to retrieve all of the resources.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/myapp-replicaset-rhf74   1/1     Running   0          2m4s
pod/myapp-replicaset-t624z   1/1     Running   0          85m
pod/myapp-replicaset-tnwmt   1/1     Running   0          85m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   13d

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/myapp-deployment   3/3     3            3           2m4s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/myapp-replicaset   3         3         3       85m
```
### Update
Deployment strategies: Recreate vs. RollingUpdate (one by one)
| aspect | Recreate | RollingUpdate |
| -------- | -------- | -------- |
| strategy | kill all old Pods first, then create the new Pods | replace old Pods with new ones one at a time |
| ReplicaSet scale-down | scale the old ReplicaSet down to 0 at once | scale the old ReplicaSet down gradually |

Use ```kubectl apply -f deployment-definition.yaml``` to apply the changes.
A new rollout is triggered and a new revision is created.
#### RollingUpdate
The default strategy. A new ReplicaSet is created first and manages the Pods with the new content; only after the new Pods are running correctly are the old Pods removed from the existing ReplicaSet. Because old and new Pods are online at the same time during the process, for a while users may see either the old or the new content at random.
```maxSurge```: how many Pods the ReplicaSet may create beyond the replica count defined in the Deployment. Allowing extra Pods reduces the chance of old content being shown during the rolling update.
```maxUnavailable```: how many Pods are allowed to be unavailable during the rolling update. Note that maxSurge and maxUnavailable cannot both be 0 at the same time.
#### Recreate
The current ReplicaSet is first told to kill the existing Pods, and only then is a new ReplicaSet created to manage the Pods with the new content. Because the old Pods are removed before the new ones are created, there is a period during which the service is unreachable.
And because Recreate tears everything down and rebuilds it, it cannot use maxSurge and maxUnavailable the way RollingUpdate does.
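A minimal sketch of how these strategies are declared in the Deployment spec (the field names come from the Deployment API; the values are illustrative):
```yaml
spec:
  replicas: 6
  strategy:
    # default type; set "Recreate" to tear down all old Pods before creating new ones
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # at most 2 Pods above the desired replica count during the update
      maxUnavailable: 1  # at most 1 Pod may be unavailable during the update
```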
### Rollout and Rollback
Check the rollout status:
```kubectl rollout status deployment/myapp-deployment```
Demo: if you check the rollout status right after triggering the rollout, you can watch the progress; the Pods are brought up one at a time.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl create -f deployment.yaml
deployment.apps/myapp-deployment created
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl rollout status deployment.app/myapp-deployment
deployment "myapp-deployment" successfully rolled out
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl delete deployment myapp-deployment
deployment.apps "myapp-deployment" deleted
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl create -f deployment.yaml
deployment.apps/myapp-deployment created
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl rollout status deployment.app/myapp-deployment
Waiting for deployment "myapp-deployment" rollout to finish: 0 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 1 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 2 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 3 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 4 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 5 of 6 updated replicas are available...
deployment "myapp-deployment" successfully rolled out
```
Check the rollout history:
```kubectl rollout history deployment/myapp-deployment```
Demo: the output also includes a CHANGE-CAUSE column.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl rollout history deployment.app/myapp-deployment
deployment.apps/myapp-deployment
REVISION  CHANGE-CAUSE
1         <none>
```
When you add ```--record``` to the create command, the CHANGE-CAUSE column records the command that was used.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl create -f deployment.yaml --record
Flag --record has been deprecated, --record will be removed in the future
deployment.apps/myapp-deployment created
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl rollout status deployment.app/myapp-deployment
Waiting for deployment "myapp-deployment" rollout to finish: 2 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 3 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 4 of 6 updated replicas are available...
Waiting for deployment "myapp-deployment" rollout to finish: 5 of 6 updated replicas are available...
deployment "myapp-deployment" successfully rolled out
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl rollout history deployment.app/myapp-deployment
deployment.apps/myapp-deployment
REVISION  CHANGE-CAUSE
1         kubectl create --filename=deployment.yaml --record=true
```
Roll back with
```kubectl rollout undo deployment/myapp-deployment```
You can also edit the deployment directly, and the change is rolled out automatically.
```
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl edit deployment.app/myapp-deployment --record
deployment.apps/myapp-deployment edited
```
When you check the result, the revision has become 2, and the change-cause and the scaling events are recorded.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl describe deployment myapp-deployment
Name:                   myapp-deployment
Namespace:              default
CreationTimestamp:      Mon, 03 Jun 2024 23:31:36 +0800
Labels:                 app=nginx
                        tier=frontend
Annotations:            deployment.kubernetes.io/revision: 2
                        kubernetes.io/change-cause: kubectl edit deployment.app/myapp-deployment --record=true
Selector:               app=myapp
Replicas:               6 desired | 6 updated | 6 total | 6 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=myapp
  Containers:
   nginx:
    Image:        nginx:1.26
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
  Node-Selectors: <none>
  Tolerations:    <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  myapp-deployment-8874cd976 (0/0 replicas created)
NewReplicaSet:   myapp-deployment-6f75d7f59c (6/6 replicas created)
Events:
  Type    Reason             Age                  From                   Message
  ----    ------             ----                 ----                   -------
  Normal  ScalingReplicaSet  21m                  deployment-controller  Scaled up replica set myapp-deployment-8874cd976 to 6
  Normal  ScalingReplicaSet  2m19s                deployment-controller  Scaled up replica set myapp-deployment-6f75d7f59c to 2
  Normal  ScalingReplicaSet  2m19s                deployment-controller  Scaled down replica set myapp-deployment-8874cd976 to 5 from 6
  Normal  ScalingReplicaSet  2m19s                deployment-controller  Scaled up replica set myapp-deployment-6f75d7f59c to 3 from 2
  Normal  ScalingReplicaSet  2m10s                deployment-controller  Scaled down replica set myapp-deployment-8874cd976 to 4 from 5
  Normal  ScalingReplicaSet  2m10s                deployment-controller  Scaled up replica set myapp-deployment-6f75d7f59c to 4 from 3
  Normal  ScalingReplicaSet  2m8s                 deployment-controller  Scaled down replica set myapp-deployment-8874cd976 to 3 from 4
  Normal  ScalingReplicaSet  2m8s                 deployment-controller  Scaled up replica set myapp-deployment-6f75d7f59c to 5 from 4
  Normal  ScalingReplicaSet  2m6s                 deployment-controller  Scaled down replica set myapp-deployment-8874cd976 to 2 from 3
  Normal  ScalingReplicaSet  2m6s                 deployment-controller  Scaled up replica set myapp-deployment-6f75d7f59c to 6 from 5
  Normal  ScalingReplicaSet  2m2s (x2 over 2m4s)  deployment-controller  (combined from similar events): Scaled down replica set myapp-deployment-8874cd976 to 0 from 1
```
Another way is to use ```kubectl set image``` to change the image version of the deployment.
```bash=
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl set image deployment myapp-deployment nginx=nginx:1.25 --record
Flag --record has been deprecated, --record will be removed in the future
deployment.apps/myapp-deployment image updated
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl rollout status deployment.app/myapp-deployment
Waiting for deployment "myapp-deployment" rollout to finish: 4 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 4 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 4 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 4 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 5 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 5 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 5 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 5 out of 6 new replicas have been updated...
Waiting for deployment "myapp-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "myapp-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "myapp-deployment" rollout to finish: 2 old replicas are pending termination...
Waiting for deployment "myapp-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "myapp-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "myapp-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "myapp-deployment" rollout to finish: 5 of 6 updated replicas are available...
deployment "myapp-deployment" successfully rolled out
shawnylee@TW-SHAWNYLEE:~/k8s_practice/deployment$ kubectl rollout history deployment.app/myapp-deployment
deployment.apps/myapp-deployment
REVISION  CHANGE-CAUSE
1         kubectl create --filename=deployment.yaml --record=true
2         kubectl edit deployment.app/myapp-deployment --record=true
3         kubectl set image deployment myapp-deployment nginx=nginx:1.25 --record=true
```
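To actually roll back, `kubectl rollout undo` reverts to the previous revision, or to a specific one with `--to-revision` (commands only; run against the deployment above):
```bash
# roll back to the previous revision
kubectl rollout undo deployment/myapp-deployment

# or roll back to a specific revision from the history
kubectl rollout undo deployment/myapp-deployment --to-revision=1

# verify which image is now running
kubectl describe deployment myapp-deployment | grep Image
```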
## Network
Kubernetes networking is designed so that the entities running inside K8s can communicate with each other:
* Pods can communicate with each other without using NAT
* Pods on all nodes can communicate with each other without using NAT
* the IP a Pod sees for itself is the same IP that other Pods see it as
Based on these three requirements, plus a layered design philosophy, the scenarios described below were developed.
### Container -- container
Container-to-container networking happens inside the Pod's network namespace. A network namespace lets network interfaces and routing tables be isolated from the rest of the system and operate independently.
Each Pod has its own network namespace and containers; containers inside the same Pod share the same IP address and port space.

### Pod -- Pod
Inside a K8s cluster, every node is assigned an IP range to hand out to its Pods, which ensures that every Pod gets a unique (real) IP address: a newly created Pod never receives an IP that is already in use. Unlike container-to-container traffic, which is exchanged entirely inside the Pod, Pod-to-Pod communication uses these IPs, whether the Pods are deployed on the same node or on different nodes.

The figure above shows how Pods communicate with each other. A virtual ethernet device, or veth pair (veth0/veth1), moves traffic between the Pod network namespace and the root network namespace. A virtual bridge (L2) connects these virtual interfaces and lets traffic move between them using ARP.
The virtual bridge inspects the destination of each packet passing through it to decide whether to forward it to another connected segment. It maintains this forwarding table and checks MAC addresses to decide whether to drop a packet.
Next, let's walk through how traffic flows from Pod1 to Pod2 on the same node (refer to the figure above):
1. Traffic leaves Pod1 through eth0 and goes to veth0, the virtual interface in the root network namespace.
2. The traffic flows from veth0 to the virtual bridge, which is also connected to veth1 (at L2, ARP is used to discover and broadcast to all connected interfaces).
3. The traffic flows through the virtual bridge to veth1 (the ARP table determines who should answer).
4. The traffic arrives at Pod2's eth0.
Now that we know how traffic moves within a single node, let's look at how it reaches another node:

1. The packet leaves Pod1 and reaches the virtual bridge. Because the bridge cannot find the destination among its connected interfaces, it sends the packet to the default route (eth0), and the packet is ready to leave the node.
2. Assume the network environment can route this IP correctly to the right node.
3. The packet enters the root namespace of the other node and is forwarded through that node's virtual bridge to the correct veth1.
4. Through the veth pair, the packet is delivered to Pod4's eth0.
### Pod -- Service
Pods are very dynamic components: they may be scaled in and out, or be recreated to meet the needs of the application. These situations can change the Pod IPs, which makes it hard to keep the service reachable.
Kubernetes solves this problem with Services, as follows:
* a Service is assigned a static virtual IP on the front end, which connects to all of the backend Pods
* this virtual IP load-balances the traffic going to the backend Pods
* the Service keeps tracking the Pods' IP addresses (even when they change), so external users only ever connect to the front-end Service VIP
This load balancing is implemented in one of two ways:
* IPTABLES: kube-proxy watches the API server for changes. When a new Service is created, it inserts iptables rules that capture traffic destined for the Service's Cluster IP and port and redirect it to a backend Pod.
* IPVS: IPVS is a load balancer built on Netfilter at the transport layer. It uses Netfilter hooks, uses a hash table as its underlying data structure, and runs in kernel space.

[Reference](https://weng-albert.medium.com/%E6%B7%BA%E8%AB%87kubernetes%E5%85%A7%E9%83%A8%E7%B6%B2%E8%B7%AF%E9%80%9A%E4%BF%A1%E7%9A%84%E5%9F%BA%E6%9C%AC%E8%A7%80%E5%BF%B5-e9d993e01423)
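As a rough way to see this on a real cluster (a sketch assuming shell access to a node and that kube-proxy runs in iptables mode; in kubeadm/minikube-style clusters its ConfigMap is usually named `kube-proxy`):
```bash
# KUBE-SERVICES is the kube-proxy chain that matches traffic to Service cluster IPs
sudo iptables -t nat -L KUBE-SERVICES -n | head

# check which proxy mode kube-proxy is configured with (iptables or ipvs)
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode
```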
## Services
> A Kubernetes Service is an abstraction that defines a logical set of Pods and the policy by which they are accessed.

Besides external users accessing internal Pods through a Service (path 1 -> 2), other Pods in the same cluster may also need access (path 3 -> 2). Note that the two paths differ both in how they access the Service and in the IP address they use.
### What is a logical set of Pods
In one sentence:
> a group of Pods that carry the same labels and do similar work
Each Pod carries one or more labels. When a Service targets certain labels, the label selector groups Pods by the labels they carry and returns the Pods the Service asked for.
On the right of the figure above there are three kinds of Pods (yellow, green, and blue), representing three different Pod groups in the cluster. When a user request comes in, it is sent to one of the Pods in the corresponding blue group for processing.
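A quick way to see which Pods a selector would match is to query by label directly (assuming Pods labeled `app: nginx` exist, as in the example further below):
```bash
# list only the Pods carrying the label a Service would select
kubectl get pods -l app=nginx --show-labels
```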
---
The Service acts as an intermediate layer that keeps users from connecting to Pods directly. Besides giving the service the flexibility to choose which Pod handles a request, it also avoids exposing unnecessary ports that could become security issues.
It also reflects a very important idea in cloud service architecture design:
> Users only need to know that someone will handle their request; they do not need to know who actually handles it.
A Service definition file contains three main elements:
#### Service metadata (Metadata)
Simply put, the name of the service, so that others understand what it is for.
```yaml
metadata:
  name: service-example
```
#### Labels of the application being accessed (Label)
Since each Pod carries one or more labels, whether user requests reach the correct Pods depends on whether the administrator sets the labels appropriately.
For example, suppose both Nginx and Apache web servers are running and the operators want to direct traffic to Nginx. They only need to give the Nginx Pods a label of `app: nginx` (under the Pod's metadata.labels) and then define in the Service:
```yaml
spec:
  selector:
    app: nginx
```
The Service then uses the labels set in its definition file to find the matching Pods through the label selector and creates the corresponding iptables rules. When a user request reaches the Kubernetes cluster, the packet is routed according to those iptables rules to a Pod that is actually running the workload.
#### How the service is accessed
When a service is opened to the outside, its port and protocol must be defined. Taking a common web server as an example:
Port: 80, 443
Protocol: TCP
```yaml
spec:
  ports:
    - name: http
      # the port for clients to access
      port: 80
      # the port that the web server listens on
      targetPort: 80
      # TCP/UDP are available, default is TCP
      protocol: TCP
    - name: https
      port: 443
      targetPort: 443
      protocol: TCP
```
The full spec.ports contains the following fields:
* name: helps operators understand what the port is for
* port: the port exposed by the Service
* targetPort: the port the Pod actually opens
* protocol (optional): the protocol the service uses, currently TCP or UDP; the default is TCP
* nodePort (optional): only present when spec.type is NodePort or LoadBalancer
[Reference](https://tachingchen.com/tw/blog/kubernetes-service/)
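Putting the three elements together, a minimal complete Service might look like the sketch below (not taken from the referenced post; the name, labels, and `type: NodePort` are assumptions for illustration):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: service-example
spec:
  type: NodePort        # expose the Service on a port of every node
  selector:
    app: nginx          # forward to Pods carrying this label
  ports:
    - name: http
      port: 80          # port exposed by the Service
      targetPort: 80    # port the Pod listens on
      nodePort: 30080   # only meaningful for NodePort / LoadBalancer (30000-32767 by default)
      protocol: TCP
```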
### Service access paths: external and internal
Two kinds of clients access Pods through a Service: users outside the Kubernetes cluster, and other Pods inside the cluster. Although both paths lead to the same result, there is one key difference between them: the IP address being accessed.
In fact, three kinds of IPs exist in Kubernetes:
* External IP: an exposed network address for external users to connect to
* Cluster IP: a network address internal to the cluster; Pods inside the cluster can access the service through this address
* Pod IP: the network address unique to each Pod, reachable only from inside the cluster
```bash
$ kubectl get service
NAME          CLUSTER-IP     EXTERNAL-IP      PORT(S)   AGE
example-svc   10.31.251.56   35.185.145.186   80/TCP    2d
```
A Service has two IPs:
* Cluster IP: the IP address used to access the service from inside the cluster
* External IP: the public IP allocated by the cloud provider (e.g., GCP, AWS) when the Service is of type: LoadBalancer
Each Pod also has its own Pod IP, which can be accessed directly by Pods inside the cluster.
*Relationship of the three IP types*

Path 1 -> 2:
- An external user connects with the External IP as the destination; when the request reaches the Service, iptables NAT replaces the destination with a Pod IP.
Path 3 -> 2:
- An internal Pod accesses the service with the Cluster IP as the destination; when the request reaches the Service, iptables NAT replaces the destination with a Pod IP.
[Reference](https://tachingchen.com/tw/blog/kubernetes-service-in-detail-1/)
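To see these addresses on a running cluster, the following is enough (the service name is the placeholder from the example above):
```bash
# Cluster IP and External IP live on the Service
kubectl get service example-svc -o wide

# Pod IPs show up in the wide output of get pods
kubectl get pods -o wide

# the Pod IPs backing the Service are listed as its endpoints
kubectl get endpoints example-svc
```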
## Microservice architecture

If we built this the way we did with plain containers at the beginning, we would need links to wire up the relationship between the containers.

First, analyze how the different Pods connect to each other:
1. the voting app needs to read and write data in redis (an in-memory database)
2. the worker moves the data from redis into postgres
3. the result app fetches the results from postgres

Since voting-app and result-app are open to external users, they need externally accessible ports, while redis and postgres only need internal ports for the Pods to talk to each other.

Therefore, the worker is the only component that does not need a Service, because no other Pod needs to talk to it (running it as a plain Pod on a node is enough).
voting-app and result-app need externally facing Services.
redis and postgres need internal-facing Services.
Communication between Pods still goes through Services, as sketched below.
