[Cloud] K8s / Command Cheat Sheet
===

###### tags: `雲端 / K8s`
###### tags: `雲端`, `K8s`

<br>

![](https://i.imgur.com/vsCp0RX.png)<br><br>

[TOC]

<br>

# doc

- [kubectl Commands](https://jamesdefabia.github.io/docs/user-guide/kubectl/kubectl/)

<br>

# Terminology

## kube-proxy
- Kubernetes: Up and Running (ISBN 9789864768226), page 88
![](https://i.imgur.com/QqaqMGl.png)
    - kube-proxy watches the API server for new services appearing in the cluster.
    - It then writes a set of iptables rules on its host
    - that rewrite each packet's destination so it goes straight to one of that service's endpoints.
    - If a service's endpoints change (because a Pod shuts down, or fails its readiness check), the iptables rules are rewritten.

## cluster IP
- Kubernetes: Up and Running (ISBN 9789864768226), page 88
![](https://i.imgur.com/QqaqMGl.png)
    - The cluster IP is normally assigned by the API server when the service is created.
    - The user may, however, specify an IP when creating the service.
    - Once created, it cannot change unless the service object is deleted and recreated.

## yaml
- yaml = configuration file
- Kubernetes: Up and Running (ISBN 9789864768226), page 118
    - Working with configuration files
    - ```kubectl apply -f xxx.yaml``` submits xxx (pod, deployment, ...)
        - Name of the action: submitting a configuration file

### spec
- Kubernetes: Up and Running (ISBN 9789864768226), page 115
- The (resource's) specification

### template
- Kubernetes: Up and Running (ISBN 9789864768226), page 115
- The (resource's) template

## pod
- backoff: backing off
    - CrashLoopBackOff: the crash-loop back-off state (the program crashes right after starting and keeps being restarted)
    - back-off: backing off
    - exponential backoff: retrying with exponentially increasing delays

## job
- Kubernetes: Up and Running (ISBN 9789864768226), pages 115~118
- Used to manage "short-lived" workloads
- Used to manage "batch-type" workloads
- **pod vs. job**
    - pod: a resident (long-running) workload (process), e.g., a web application or a database
        - Termination conditions: an upgrade, or the service is no longer needed
    - job: a short-lived workload (process), e.g., a batch job or a database migration
        - Nature: a one-shot task that ends with an exit status code

<br>
<hr>
<br>

# config
## default config
```bash
$ mkdir ~/.kube
$ sudo cp /etc/kubernetes/admin.conf ~/.kube/config
$ sudo chown tj_tsai:tj_tsai ~/.kube/config
```
- The default K8s config lives at `/etc/kubernetes/admin.conf`; copy it to `.kube/config` under your own home directory
- Test K8s
```bash
$ kubectl get node
$ kubectl get namespace
```

<br>
<hr>
<br>

# kubeadm
## image
```
list        Print a list of images kubeadm will use. The configuration file is used in case any images or image repositories are customized
pull        Pull images used by kubeadm
```
- ### Images kubeadm will use
```bash=
$ sudo kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.21.1
k8s.gcr.io/kube-controller-manager:v1.21.1
k8s.gcr.io/kube-scheduler:v1.21.1
k8s.gcr.io/kube-proxy:v1.21.1
k8s.gcr.io/pause:3.4.1
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns/coredns:v1.8.0
```
- ### Pre-pull the images kubeadm will use, before building the cluster
```bash=
$ sudo kubeadm config images pull
```

<br>
<hr>
<br>

# kubectl
## [Installing kubectl](https://medium.com/@vishal.sharma./configure-local-kubeclt-to-manage-remote-kubernetes-cluster-51b0a1adafe1)
```
$ sudo apt install kubectl -y
$ kubectl version
```
(assumes the Kubernetes apt repository has already been configured)

## Cluster Management
```
Cluster Management Commands:
  certificate   Modify certificate resources.
  cluster-info  Display cluster info
  top           Display Resource (CPU/Memory/Storage) usage.
  cordon        Mark node as unschedulable
  uncordon      Mark node as schedulable
  drain         Drain node in preparation for maintenance
  taint         Update the taints on one or more nodes
```

### [Display cluster info](https://www.digitalocean.com/community/tutorials/how-to-install-software-on-kubernetes-clusters-with-the-helm-3-package-manager)
```bash=
$ kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:34925
KubeDNS is running at https://127.0.0.1:34925/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
```

<br>
<hr>
<br>

## Basic Commands (Beginner)
```
Basic Commands (Beginner):
  create      Create a resource from a file or from stdin.
  expose      Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
  run         Run a particular image on the cluster
  set         Set specific features on objects
```
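`create` and `expose` get their own sections below, but `run` and `set` do not, so here is a minimal sketch of both. The pod name, deployment name, container name, and image tags are placeholders, not resources from these notes:

```bash
# Start a single pod from an image (pod name and image are placeholders)
$ kubectl run tj-nginx --image=nginx:1.21 --port=80

# Point an existing deployment at a new image tag;
# the deployment "tj-nginx-deploy" and its container "nginx" are assumed to exist
$ kubectl set image deployment/tj-nginx-deploy nginx=nginx:1.22

# Watch the rolling update triggered by the image change
$ kubectl rollout status deployment/tj-nginx-deploy
```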
### [kubectl create](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#create)

- ### kubectl create namespace
    - Create a namespace
```bash=
$ kubectl create namespace tj-namespace
namespace/tj-namespace created

# namespace can be abbreviated as ns
# kubectl create ns tj-namespace

$ kubectl get ns
NAME                       STATUS   AGE
default                    Active   3d18h
kube-node-lease            Active   3d18h
kube-public                Active   3d18h
kube-system                Active   3d18h
local-path-storage         Active   3d18h
tj-namespace               Active   46s
tj-nginx-operator-system   Active   3d17h

$ kubectl delete ns tj-namespace
namespace "tj-namespace" deleted
```
A namespace can also be created from yaml:
```bash
$ nano my-ns.yaml
```
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: tj-namespace
```
```bash
$ kubectl apply -f my-ns.yaml
```

- ### [kubectl create deployment](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#-em-deployment-em-)
    - Build 3 kuard servers behind a load balancer
    ![](https://i.imgur.com/VswH1Cx.png)

    Create the deployment
    ```bash
    $ kubectl create deployment tj-kuard-deployment \
        --image gcr.io/kuar-demo/kuard-amd64:1 \
        --port 8080 \
        --replicas 3
    ```
    Inspect the deployment & pods
    ```bash
    $ kubectl get deploy -o wide
    NAME                  READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS    IMAGES                           SELECTOR
    tj-kuard-deployment   3/3     3            3           17s   kuard-amd64   gcr.io/kuar-demo/kuard-amd64:1   app=tj-kuard-deployment

    $ kubectl get pod -o wide
    NAME                                   READY   STATUS    RESTARTS   IP
    tj-kuard-deployment-767656b6fb-2z4c8   1/1     Running   0          10.244.1.229
    tj-kuard-deployment-767656b6fb-czpgl   1/1     Running   0          10.244.0.42
    tj-kuard-deployment-767656b6fb-hjzlq   1/1     Running   0          10.244.1.230
    ```
    Expose the deployment inside the cluster, attaching a service to it
    ```
    $ kubectl expose deploy/tj-kuard-deployment

    $ kubectl get svc -o wide
    NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    SELECTOR
    tj-kuard-deployment   ClusterIP   10.108.131.100   <none>        8080/TCP   app=tj-kuard-deployment
    ```
    Test the service connection
    ```
    $ curl 10.108.131.100:8080
    ```
    Each request may land on a different pod
    ```
    $ curl 10.108.131.100:8080 | grep tj-kuard-deployment
    "hostname":"tj-kuard-deployment-767656b6fb-czpgl"
    "hostname":"tj-kuard-deployment-767656b6fb-2z4c8"
    "hostname":"tj-kuard-deployment-767656b6fb-hjzlq"
    ...
    ```
    To open the service to the outside world, attach an ingress.
    ```tj-kuard-load-balancer.yaml``` comes in two versions, ```v1beta1``` & ```v1``` ([reference](https://kubernetes.io/docs/concepts/services-networking/ingress/#types-of-ingress)); note that the ```v1beta1``` Ingress API was removed in Kubernetes 1.22, so prefer ```v1``` on current clusters.
    ```yaml=
    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: tj-kuard-load-balancer
    spec:
      backend:
        serviceName: tj-kuard-deployment
        servicePort: 8080
    ```
    ```yaml=
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: tj-kuard-load-balancer
    spec:
      defaultBackend:
        service:
          name: tj-kuard-deployment
          port:
            number: 8080
    ```
    ```
    $ kubectl apply -f tj-kuard-load-balancer.yaml
    ```

- ### kubectl create secret
    - [[Day 12] 敏感的資料怎麼存在k8s?! - Secrets](https://ithelp.ithome.com.tw/articles/10195094)
      ```$ kubectl create secret ...```

### A quick introduction to K8s secrets
- Purpose:
    - Store sensitive data such as usernames and passwords
    - Not encrypted, merely base64 encoded
- Permissions
    - Under normal operation anyone can read them
    - To protect them you also need service-account access control
- Two ways for a container to access a secret
    - From env (environment variables)
    ```
    root@my-pod-with-secret:/app# env
    ...
    user_name=tj
    user_password=asus#1234
    ...
    ```
    - Mounted into a volume, read as files from the volume
        - ```/mnt/tj/user_name```
        - ```/mnt/tj/user_password```
- Operations in K8s
    - Create
        - Pass "file paths" as arguments (file name = key, file content = value)
        - Pass "key/value pairs" as arguments
        - Create via yaml
    - List
    ```bash
    $ kubectl get secret
    NAME                       TYPE                                  DATA   AGE
    default-token-vb9sd        kubernetes.io/service-account-token   3      12d
    secret-from-file-demo      Opaque                                2      5h59m
    secret-from-literal-demo   Opaque                                2      5h35m
    secret-from-yaml-demo      Opaque                                2      5h3m
    ```
- [[官網] Secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
```
$ kubectl create secret generic empty-secret

$ kubectl get secret empty-secret
NAME           TYPE     DATA   AGE
empty-secret   Opaque   0      2m6s
```
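Tying the pieces above together, a minimal sketch of the "key/value pairs" creation path plus the env access path. The secret name `tj-demo-secret` and the pod spec are placeholders for illustration:

```bash
# Create a secret from key/value pairs (names and values are placeholders)
$ kubectl create secret generic tj-demo-secret \
    --from-literal=user_name=tj \
    --from-literal=user_password='asus#1234'
```
```yaml
# my-pod-with-secret.yaml: injects every key of the secret as an env variable
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-with-secret
spec:
  containers:
  - name: app
    image: nginx
    envFrom:
    - secretRef:
        name: tj-demo-secret
```

After `kubectl apply -f my-pod-with-secret.yaml`, running `env` inside the container should show `user_name` and `user_password`, matching the output shown earlier.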
<br>
<hr>
<br>

## Basic Commands (Intermediate)
```
Basic Commands (Intermediate):
  explain     Documentation of resources
  get         Display one or many resources
  edit        Edit a resource on the server
  delete      Delete resources by filenames, stdin, resources and names, or by resources and label selector
```

### kubectl get
- Reads resource objects
    - kubectl get pods / kubectl get pod / kubectl get po
    - kubectl get endpoints / kubectl get ep
- `kubectl get all` does NOT list these resources:
    - secret
    - pv, pvc
    - role

### kubectl edit
- kubectl edit deploy/tj-kuard-deployment
```
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 2
  periodSeconds: 2
  failureThreshold: 3
  successThreshold: 1
```
- Mnemonic: read**I**ness**P**robe
    - I: initialDelaySeconds
    - P: periodSeconds
- Mnemonic: readin**ES**sProbe
    - E: error -> failureThreshold
    - S: successThreshold
- References (a full pod spec using these fields is sketched below)
    - [Pod 的健康檢查方式](https://ithelp.ithome.com.tw/articles/10204947) #livenessProbe
    - [Pod 的健康檢查方式 Part-2](https://ithelp.ithome.com.tw/articles/10205264) #readinessProbe
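The probe snippet above only makes sense inside a full pod spec, so here is a minimal sketch: a kuard pod with a readiness probe (gates service traffic) and a liveness probe (triggers container restarts). kuard serves `/ready` and `/healthy` for exactly this purpose; the pod name is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tj-kuard-probes
spec:
  containers:
  - name: kuard
    image: gcr.io/kuar-demo/kuard-amd64:1
    ports:
    - containerPort: 8080
    readinessProbe:          # failing this removes the pod from service endpoints
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 2
      periodSeconds: 2
      failureThreshold: 3
      successThreshold: 1
    livenessProbe:           # failing this restarts the container
      httpGet:
        path: /healthy
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```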
### kubectl delete
- kubectl delete deployment
```bash=
$ kubectl get deployment --all-namespaces
NAMESPACE                  NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
kube-system                coredns                                2/2     2            2           3d18h
local-path-storage         local-path-provisioner                 1/1     1            1           3d18h
tj-nginx-operator-system   tj-nginx-operator-controller-manager   0/1     1            0           3d17h

$ kubectl delete deployment tj-nginx-operator-controller-manager
Error from server (NotFound): deployments.apps "tj-nginx-operator-controller-manager" not found

# The namespace must be specified
$ kubectl delete deployment tj-nginx-operator-controller-manager --namespace tj-nginx-operator-system
deployment.apps "tj-nginx-operator-controller-manager" deleted
```
- deployment-resource
```
NAME          SHORTNAMES   APIGROUP     NAMESPACED   KIND
deployments   deploy       apps         true         Deployment
deployments   deploy       extensions   true         Deployment
```
- kubectl delete service
  To delete a service that is not in the current namespace, add ```--namespace```
```bash=
$ kubectl get service --all-namespaces -o wide
NAMESPACE                  NAME                                                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default                    kubernetes                                              ClusterIP   10.96.0.1        <none>        443/TCP                  4d21h   <none>
helm-lab                   first-pod-service                                       NodePort    10.107.84.42     <none>        80:30080/TCP             34m     app=webserver
kube-system                kube-dns                                                ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   4d21h   k8s-app=kube-dns
tj-nginx-operator-system   tj-nginx-operator-controller-manager-metrics-service    ClusterIP   10.105.110.249   <none>        8443/TCP                 4d20h   control-plane=controller-manager

$ kubectl delete service tj-nginx-operator-controller-manager-metrics-service --namespace tj-nginx-operator-system
service "tj-nginx-operator-controller-manager-metrics-service" deleted

$ kubectl get service --all-namespaces -o wide
NAMESPACE     NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default       kubernetes          ClusterIP   10.96.0.1      <none>        443/TCP                  4d21h   <none>
helm-lab      first-pod-service   NodePort    10.107.84.42   <none>        80:30080/TCP             37m     app=webserver
kube-system   kube-dns            ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP,9153/TCP   4d21h   k8s-app=kube-dns
```
- kubectl delete all
  When helm has deployed a service to K8s and something goes wrong but it cannot be uninstalled, the related resources have to be removed from K8s by hand
```bash=
$ kubectl get all -n airflow | awk '{print $2}' | xargs kubectl delete -n airflow
```

<br>
<hr>
<br>

## Deploy Commands
```
Deploy Commands:
  rollout     Manage the rollout of a resource
  scale       Set a new size for a Deployment, ReplicaSet or Replication Controller
  autoscale   Auto-scale a Deployment, ReplicaSet, or ReplicationController
```
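None of the three deploy commands get a worked example in these notes, so here is a minimal sketch against the `tj-kuard-deployment` used earlier; the `autoscale` line additionally assumes metrics-server is installed in the cluster:

```bash
# Scale the deployment to 5 replicas
$ kubectl scale deployment/tj-kuard-deployment --replicas=5

# Watch, inspect, and roll back a rollout
$ kubectl rollout status deployment/tj-kuard-deployment
$ kubectl rollout history deployment/tj-kuard-deployment
$ kubectl rollout undo deployment/tj-kuard-deployment

# Autoscale between 3 and 10 replicas, targeting 80% CPU (needs metrics-server)
$ kubectl autoscale deployment/tj-kuard-deployment --min=3 --max=10 --cpu-percent=80
```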
<br>
<hr>
<br>

## Troubleshooting and Debugging Commands
```
Troubleshooting and Debugging Commands:
  describe      Show details of a specific resource or group of resources
  logs          Print the logs for a container in a pod
  attach        Attach to a running container
  exec          Execute a command in a container
  port-forward  Forward one or more local ports to a pod
  proxy         Run a proxy to the Kubernetes API server
  cp            Copy files and directories to and from containers.
  auth          Inspect authorization
```

### kubectl describe
> Usually used to query events, to see what is preventing something from running
- Show details of a specific pod
```
$ kubectl get pod -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE                      NOMINATED NODE   READINESS GATES
tj-pod-kuard   1/1     Running   0          20m   10.244.1.31   alprworker-1203417-iaas   <none>           <none>

$ kubectl describe pod tj-pod-kuard
```
- Show details of a specific service
```
$ kubectl get service -o wide
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE     SELECTOR
first-pod-service   NodePort    10.110.65.198   <none>        80:30080/TCP   17m     app=webserver
kubernetes          ClusterIP   10.96.0.1       <none>        443/TCP        3h35m   <none>

$ kubectl describe service first-pod-service
```

### kubectl port-forward
- Use ssh to redirect a port to the local machine
    > An SSH tunnel is exactly what implements port forwarding
    ```bash
    $ ssh -i key.pem ubuntu@203.145.218.3 -L 8080:localhost:30511
    ```
    You can then reach http://203.145.218.3:30511 through http://localhost:8080 on the local machine
    ![](https://i.imgur.com/B31e8nE.png)
    ![](https://i.imgur.com/7BKirRx.png)
- References
    - Kubernetes: Up and Running (ISBN 9789864768226), page 82
    - [關於 SSH Tunnel 連線 (SSH Proxy, SSH Port Forwarding)](https://yu-jack.github.io/2019/01/08/ssh-tunnel/)

<br>

- **Demo**: create a pod with no service, then reach that pod from the local machine
    - How it works:
        - http://localhost:38080/
        - -> remote:8080 (K8s: port-forward)
        - -> remote:3000 (K8s: pod)
    - Steps
    ```bash
    # On the server
    # Create a blue-whale-pod
    $ kubectl run blue-whale-pod --image=hcwxd/blue-whale --port=3000

    # Check the state of blue-whale-pod
    $ kubectl get pod/blue-whale-pod -o wide

    # Create a reverse proxy: forward 8080 to pod/blue-whale-pod:3000
    $ kubectl port-forward pod/blue-whale-pod 8080:3000
    Forwarding from 127.0.0.1:8080 -> 3000
    ```
    Open another terminal, connect to the server, and test the connection to blue-whale-pod
    ```bash
    # On the server
    $ curl 127.0.0.1:8080
    ```
    Open another terminal and build a tunnel between the local machine and the server
    ```bash
    # On the local machine, add the -L port-forwarding flag when connecting to the server
    $ ssh -i key.pem ubuntu@203.145.218.3 -L 38080:localhost:8080
    ```
    Open a local browser and visit http://localhost:38080/
    [![](https://i.imgur.com/4yVYCzs.png =60%x)](https://i.imgur.com/4yVYCzs.png)

- Connect to 10.78.26.241 -> minikube -> airflow-web-service -> airflow-web-pod
```bash
$ kubectl port-forward -n tj-airflow service/tj-airflow-cluster-web --address 10.78.26.241 9090:18080
Forwarding from 10.78.26.241:9090 -> 8080
```

### kubectl auth
> auth: Inspect authorization
- `kubectl auth can-i create pod`
- `kubectl auth can-i get pod`
- `kubectl auth reconcile -f xxx.yaml`, similar to `kubectl apply -f xxx.yaml`

<br>
<hr>
<br>

## Advanced Commands
```
Advanced Commands:
  diff        Diff live version against would-be applied version
  apply       Apply a configuration to a resource by filename or stdin
  patch       Update field(s) of a resource using strategic merge patch
  replace     Replace a resource by filename or stdin
  wait        Experimental: Wait for a specific condition on one or many resources.
  convert     Convert config files between different API versions
  kustomize   Build a kustomization target from a directory or a remote url.
```

### kubectl apply -f xxx.yaml (kind: ResourceQuota)
- [[Day 27] 限制某一 Namespaces 的運算資源](https://ithelp.ithome.com.tw/articles/10197186)
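As a companion to the link above, a minimal sketch of what such a quota file might look like; the quota name, namespace, and numbers are placeholders:

```yaml
# tj-quota.yaml: caps what the tj-namespace namespace may consume in total
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tj-quota
  namespace: tj-namespace
spec:
  hard:
    pods: "10"             # at most 10 pods in the namespace
    requests.cpu: "4"      # summed CPU requests
    requests.memory: 8Gi   # summed memory requests
    limits.cpu: "8"        # summed CPU limits
    limits.memory: 16Gi    # summed memory limits
```

Apply it with `kubectl apply -f tj-quota.yaml` and inspect usage with `kubectl describe quota -n tj-namespace`.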
### kubectl delete
- ### Undoing an apply
    - **Question**
      `kubectl apply -f xxx.yaml`: what command undoes the apply?
    - **ChatGPT**
      To undo configuration applied with `kubectl apply`, use `kubectl delete` to remove the applied resources. Use the same configuration file (`xxx.yaml`), or supply the same resource names, to delete them.
      If your configuration file contains a `metadata.name` attribute, you can delete the named resources with:
      ```bash
      kubectl delete -f xxx.yaml
      ```
      This deletes the resources named in the configuration file.
      If the file contains multiple resources, you can delete selectively using -l (labels) or other options. For example, a label selector deletes all resources carrying a specific label:
      ```bash
      kubectl delete pods -l app=my-app
      ```
      This deletes every Pod labeled `app=my-app`.
      Use `kubectl delete` with care: it deletes resources permanently. Make sure you really want these resources gone, and that you have the necessary backup and recovery measures in place.

<br>
<hr>
<br>

## Settings Commands
```
Settings Commands:
  label       Update the labels on a resource
  annotate    Update the annotations on a resource
  completion  Output shell completion code for the specified shell (bash or zsh)
```

### kubectl label
- Simple example, picking one pod
```bash
# Show labels
$ kubectl get pod/tj-prod --show-labels
NAME      READY   STATUS    RESTARTS   AGE     LABELS
tj-prod   1/1     Running   0          3h48m   run=tj-prod

# Add labels
#   name=tj_tsai
#   id=A1600128
$ kubectl label pod/tj-prod name=tj_tsai id=A1600128
pod/tj-prod labeled

# Show labels
$ kubectl get pod/tj-prod --show-labels
NAME      READY   STATUS    RESTARTS   AGE     LABELS
tj-prod   1/1     Running   0          3h48m   id=A1600128,name=tj_tsai,run=tj-prod

# Remove labels
$ kubectl label pod/tj-prod name- id-
pod/tj-prod labeled

# Show labels
$ kubectl get pod/tj-prod --show-labels
NAME      READY   STATUS    RESTARTS   AGE     LABELS
tj-prod   1/1     Running   0          3h49m   run=tj-prod
```

<br>

- [Add a disk=ssd label to a node](https://ithelp.ithome.com.tw/articles/10221929?sc=rss.iron)
> tags: (Affinity, Anti-Affinity, Taint, Toleration)
```bash
$ kubectl get no --show-labels

$ kubectl get no --show-labels | awk '{print $6}' | sed 's/,/\n/g'
LABELS
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=minikube
kubernetes.io/os=linux
minikube.k8s.io/commit=9f1e482427589ff8451c4723b6ba53bb9742fbb1
minikube.k8s.io/name=minikube
minikube.k8s.io/updated_at=2020_12_28T14_24_11_0700
minikube.k8s.io/version=v1.16.0
node-role.kubernetes.io/control-plane=
node-role.kubernetes.io/master=
# No disk=ssd label above

# Add the disk=ssd label to the node
$ kubectl label no minikube disk=ssd

# Check the node's labels
$ kubectl get no --show-labels | awk '{print $6}' | sed 's/,/\n/g'
LABELS
...
disk=ssd
...

# Remove the disk=ssd label: append a minus sign to the key
$ kubectl label no minikube disk-
```

<br>
<hr>
<br>

## Other Commands
```
Other Commands:
  alpha          Commands for features in alpha
  api-resources  Print the supported API resources on the server
  api-versions   Print the supported API versions on the server, in the form of "group/version"
  config         Modify kubeconfig files
  plugin         Provides utilities for interacting with plugins.
  version        Print the client and server version information
```

<br>

### kubectl api-resources
> Print the supported API resources on the server
```
$ kubectl api-resources
```

| NAME | SHORT<br>NAMES | APIGROUP | NAME<br>SPACED | KIND |
|------|------|------|------|------|
| apiservices | | apiregistration.k8s.io | FALSE | APIService |
| aplogconfs | | appprotect.f5.com | TRUE | APLogConf |
| appolicies | | appprotect.f5.com | TRUE | APPolicy |
| bindings | | | TRUE | Binding |
| certificatesigningrequests | csr | certificates.k8s.io | FALSE | CertificateSigningRequest |
| clusterrolebindings | | rbac.authorization.k8s.io | FALSE | ClusterRoleBinding |
| clusterroles | | rbac.authorization.k8s.io | FALSE | ClusterRole |
| componentstatuses | cs | | FALSE | ComponentStatus |
| configmaps | cm | | TRUE | ConfigMap |
| controllerrevisions | | apps | TRUE | ControllerRevision |
| cronjobs | cj | batch | TRUE | CronJob |
| csidrivers | | storage.k8s.io | FALSE | CSIDriver |
| csinodes | | storage.k8s.io | FALSE | CSINode |
| customresourcedefinitions | crd,crds | apiextensions.k8s.io | FALSE | CustomResourceDefinition |
| daemonsets | ds | apps | TRUE | DaemonSet |
| daemonsets | ds | extensions | TRUE | DaemonSet |
| deployments | deploy | apps | TRUE | Deployment |
| deployments | deploy | extensions | TRUE | Deployment |
| endpoints | ep | | TRUE | Endpoints |
| events | ev | | TRUE | Event |
| events | ev | events.k8s.io | TRUE | Event |
| globalconfigurations | gc | k8s.nginx.org | TRUE | GlobalConfiguration |
| horizontalpodautoscalers | hpa | autoscaling | TRUE | HorizontalPodAutoscaler |
| ingresses | ing | extensions | TRUE | Ingress |
| ingresses | ing | networking.k8s.io | TRUE | Ingress |
| ingressroutes | | traefik.containo.us | TRUE | IngressRoute |
| ingressroutetcps | | traefik.containo.us | TRUE | IngressRouteTCP |
| ingressrouteudps | | traefik.containo.us | TRUE | IngressRouteUDP |
| jobs | | batch | TRUE | Job |
| leases | | coordination.k8s.io | TRUE | Lease |
| limitranges | limits | | TRUE | LimitRange |
| localsubjectaccessreviews | | authorization.k8s.io | TRUE | LocalSubjectAccessReview |
| middlewares | | traefik.containo.us | TRUE | Middleware |
| mutatingwebhookconfigurations | | admissionregistration.k8s.io | FALSE | MutatingWebhookConfiguration |
| namespaces | ns | | FALSE | Namespace |
| networkpolicies | netpol | extensions | TRUE | NetworkPolicy |
| networkpolicies | netpol | networking.k8s.io | TRUE | NetworkPolicy |
| nodes | no | | FALSE | Node |
| persistentvolumeclaims | pvc | | TRUE | PersistentVolumeClaim |
| persistentvolumes | pv | | FALSE | PersistentVolume |
| poddisruptionbudgets | pdb | policy | TRUE | PodDisruptionBudget |
| pods | po | | TRUE | Pod |
| podsecuritypolicies | psp | extensions | FALSE | PodSecurityPolicy |
| podsecuritypolicies | psp | policy | FALSE | PodSecurityPolicy |
| podtemplates | | | TRUE | PodTemplate |
| policies | pol | k8s.nginx.org | TRUE | Policy |
| priorityclasses | pc | scheduling.k8s.io | FALSE | PriorityClass |
| replicasets | rs | apps | TRUE | ReplicaSet |
| replicasets | rs | extensions | TRUE | ReplicaSet |
| replicationcontrollers | rc | | TRUE | ReplicationController |
| resourcequotas | quota | | TRUE | ResourceQuota |
| rolebindings | | rbac.authorization.k8s.io | TRUE | RoleBinding |
| roles | | rbac.authorization.k8s.io | TRUE | Role |
| runtimeclasses | | node.k8s.io | FALSE | RuntimeClass |
| secrets | | | TRUE | Secret |
| selfsubjectaccessreviews | | authorization.k8s.io | FALSE | SelfSubjectAccessReview |
| selfsubjectrulesreviews | | authorization.k8s.io | FALSE | SelfSubjectRulesReview |
| serviceaccounts | sa | | TRUE | ServiceAccount |
| services | svc | | TRUE | Service |
| statefulsets | sts | apps | TRUE | StatefulSet |
| storageclasses | sc | storage.k8s.io | FALSE | StorageClass |
| subjectaccessreviews | | authorization.k8s.io | FALSE | SubjectAccessReview |
| tlsoptions | | traefik.containo.us | TRUE | TLSOption |
| tlsstores | | traefik.containo.us | TRUE | TLSStore |
| tokenreviews | | authentication.k8s.io | FALSE | TokenReview |
| traefikservices | | traefik.containo.us | TRUE | TraefikService |
| transportservers | ts | k8s.nginx.org | TRUE | TransportServer |
| validatingwebhookconfigurations | | admissionregistration.k8s.io | FALSE | ValidatingWebhookConfiguration |
| virtualserverroutes | vsr | k8s.nginx.org | TRUE | VirtualServerRoute |
| virtualservers | vs | k8s.nginx.org | TRUE | VirtualServer |
| volumeattachments | | storage.k8s.io | FALSE | VolumeAttachment |

- Purpose of the APIGROUP column
    - It supplies the group part of apiVersion
    ```
    apiVersion: group/version
    ```
    For the version part, see [#kubectl-api-versions](#kubectl-api-versions)

<br>

### kubectl api-versions
> Print the supported API versions on the server, in the form of "group/version"
```
$ kubectl api-versions
admissionregistration.k8s.io/v1
admissionregistration.k8s.io/v1beta1
apiextensions.k8s.io/v1
apiextensions.k8s.io/v1beta1
apiregistration.k8s.io/v1
apiregistration.k8s.io/v1beta1
apps/v1
authentication.k8s.io/v1
authentication.k8s.io/v1beta1
authorization.k8s.io/v1
authorization.k8s.io/v1beta1
autoscaling/v1
autoscaling/v2beta1
autoscaling/v2beta2
batch/v1
batch/v1beta1
certificates.k8s.io/v1
certificates.k8s.io/v1beta1
coordination.k8s.io/v1
coordination.k8s.io/v1beta1
discovery.k8s.io/v1beta1
events.k8s.io/v1
events.k8s.io/v1beta1
extensions/v1beta1
monitoring.coreos.com/v1
monitoring.coreos.com/v1alpha1
networking.k8s.io/v1
networking.k8s.io/v1beta1
node.k8s.io/v1beta1
policy/v1beta1
rbac.authorization.k8s.io/v1
rbac.authorization.k8s.io/v1beta1
scheduling.k8s.io/v1
scheduling.k8s.io/v1beta1
storage.k8s.io/v1
storage.k8s.io/v1beta1
v1
```
- apiVersion examples:
    - kind: Pod
    ```
    # Pod defines no apiGroup
    apiVersion: v1      <-- OK
    apiVersion: app/v1  <-- error
      unable to recognize "xxx.yaml": no matches for kind "Pod" in version "app/v1"
    apiVersion: apps/v1 <-- error
      unable to recognize "xxx.yaml": no matches for kind "Pod" in version "apps/v1"
    ```
    - kind: Deployment
    ```
    # Deployment defines apiGroup:
    # - apps
    # - extensions
    apiVersion: apps/v1 <-- OK
    apiVersion: v1      <-- error
      error: unable to recognize "xxx.yaml": no matches for kind "Deployment" in version "v1"
    apiVersion: app/v1  <-- error
      error: unable to recognize "xxx.yaml": no matches for kind "Deployment" in version "app/v1"
    ```
    - kind: Ingress
    ```
    # Ingress defines apiGroup:
    # - extensions
    # - networking.k8s.io
    ```
- apiVersion for common resources

| kind | apiVersion |
| ---------- | ---------- |
| Pod | ```v1``` |
| Service | ```v1``` |
| Deployment | ```apps/v1``` |
| Ingress | ```extensions/v1beta1```<br>```networking.k8s.io/v1```<br>```networking.k8s.io/v1beta1``` |

- HTTP paths of K8s resources
    - Core-group resources live under /api/v1/namespaces/NAMESPACE/TYPE/NAME
    - Resources in a named group live under /apis/GROUP/VERSION/namespaces/NAMESPACE/TYPE/NAME
- [K8S的apiVersion该用哪个](https://segmentfault.com/a/1190000017134399)
    > Deployment:
    > before 1.6: apiVersion: extensions/v1beta1
    > from 1.6 to 1.9: apps/v1beta1
    > after 1.9: apps/v1
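Besides cross-referencing api-resources with api-versions, `kubectl explain` can answer the "which apiVersion" question directly. A quick sketch; the output is trimmed, and its exact wording varies by kubectl version:

```bash
# Ask the API server for a resource's group/version and field documentation
$ kubectl explain deployment
KIND:     Deployment
VERSION:  apps/v1
...

# Drill into nested fields of the spec
$ kubectl explain deployment.spec.template.spec.containers
```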
<br>

### kubectl config
> Modify kubeconfig files

### kubectl config : config
- ### config overview
    - Show the config K8s is currently using
    ```bash
    $ kubectl config view
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: DATA+OMITTED
        server: https://(__ip_commited__):80
      name: kubernetes
    contexts:
    - context:
        cluster: kubernetes
        namespace: default
        user: kubernetes-admin
      name: kubernetes-admin@kubernetes
    current-context: kubernetes-admin@kubernetes
    kind: Config
    preferences: {}
    users:
    - name: kubernetes-admin
      user:
        client-certificate-data: REDACTED
        client-key-data: REDACTED
    ```
    The config above is the same as ```~/.kube/config```
    ```bash
    $ cat ~/.kube/config
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRU...==
        server: https://103.124.74.126:80
      name: kubernetes
    contexts:
    ...
    ```
    - **How do I inspect the certificate-authority-data content?**
    ```bash
    # Feed the value in and decode it
    $ echo "LS0tLS1CRU...==" | base64 -d
    -----BEGIN CERTIFICATE-----
    MIIC5zCCAc+gAwIBAgIBADANBgkqhkiG9w0BAQsFADAVMRMwEQYDVQQDEwprdWJl
    ...
    ...
    /HuLNU2H+030zdxRvmLlSakLU2dBO2dGkJOj
    -----END CERTIFICATE-----
    ```
    - **How do I inspect the client-certificate-data content?**
    ```bash
    # Feed the value in and decode it
    $ echo "LS0tLS1CRU...=" | base64 -d
    -----BEGIN CERTIFICATE-----
    ...
    ...
    -----END CERTIFICATE-----
    ```
    - **How do I inspect the client-key-data content?**
    ```bash
    # Feed the value in and decode it
    $ echo "LS0tLS1CRU...=" | base64 -d
    -----BEGIN RSA PRIVATE KEY-----
    ...
    ...
    -----END RSA PRIVATE KEY-----
    ```

### kubectl config : applying config : controlling a remote K8s
- First obtain the target K8s cluster's config file
    - Source of the config file: ~/.kube/config on the target host
    - Copy the config file to the local machine
- #### Method 1: add ```--kubeconfig=esc4000-config``` to every command you run
```bash=
$ kubectl cluster-info --kubeconfig=esc4000-config
$ kubectl --kubeconfig=esc4000-config get ns
$ kubectl --kubeconfig=esc4000-config get pods
$ kubectl --kubeconfig=esc4000-config get svc -o wide
```
- #### Method 2: repoint the local config link (scope: global)
```bash=
# Work inside the .kube directory
~$ cd ~/.kube/
~/.kube$ mv config local-config
~/.kube$ ln -s esc4000-config config
~/.kube$ kubectl cluster-info
```
- #### [Method 3: set an environment variable](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) (scope: the current terminal)
![](https://i.imgur.com/j5TDhmW.png)
```bash=
# State before the switch
$ kubectl cluster-info
$ kubectl get ns

# Point the config at the target K8s node
$ export KUBECONFIG=~/.kube/esc4000-config

# State after the switch
$ kubectl cluster-info
$ kubectl get ns
```
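A variant of method 3 worth knowing: `KUBECONFIG` accepts a colon-separated list, and kubectl merges the files, so both clusters' contexts become visible at once. A minimal sketch, reusing the `esc4000-config` file name from above:

```bash
# List multiple config files; kubectl merges them in order
$ export KUBECONFIG=~/.kube/config:~/.kube/esc4000-config
$ kubectl config get-contexts

# Write the merged result out as a single file
$ kubectl config view --flatten > ~/.kube/merged-config
```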
### kubectl config : cluster
-

### kubectl config : context
- ### context overview
    - List contexts: ```kubectl config get-contexts```
    - Create a context: ```kubectl config set-context your_context_name```
    - Switch context: ```kubectl config use-context your_context_name```
    - Delete a context: ```kubectl config delete-context your_context_name```
- ### [Show one or more K8s contexts](https://www.digitalocean.com/community/tutorials/how-to-install-software-on-kubernetes-clusters-with-the-helm-3-package-manager)
> Describe one or many contexts
```bash=
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER           AUTHINFO           NAMESPACE
*         kind-tj-cluster               kind-tj-cluster   kind-tj-cluster
          kubernetes-admin@kubernetes   kubernetes        kubernetes-admin
```
- To switch contexts, use
```
$ kubectl config use-context context-name
```
- ### Switching to a different namespace within a context
```bash=
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER           AUTHINFO           NAMESPACE
*         kind-tj-cluster               kind-tj-cluster   kind-tj-cluster
          kubernetes-admin@kubernetes   kubernetes        kubernetes-admin

# kubectl config set-context \
#   $(kubectl config current-context) \
#   --namespace=newspace

$ kubectl config set-context "kind-tj-cluster" --namespace=helm-lab
Context "kind-tj-cluster" modified.

$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER           AUTHINFO           NAMESPACE
*         kind-tj-cluster               kind-tj-cluster   kind-tj-cluster    helm-lab
          kubernetes-admin@kubernetes   kubernetes        kubernetes-admin
```
- Complete context walkthrough: query -> **create** -> query -> **switch** -> query -> **delete** -> query
```bash
# [query]
# The context currently in use
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
*         kubernetes-admin@kubernetes   kubernetes   kubernetes-admin   default

# [query] -> [create]
# Create a new context: tj-context
$ kubectl config set-context tj-context
Context "tj-context" created.

# [query] -> [create] -> [query]
# List contexts: tj-context now shows up
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
*         kubernetes-admin@kubernetes   kubernetes   kubernetes-admin   default
          tj-context

# The context in use is still the old one
$ kubectl config current-context
kubernetes-admin@kubernetes

# [query] -> [create] -> [query] -> [switch]
# Switch context: -> tj-context
$ kubectl config use-context tj-context
Switched to context "tj-context".

# [query] -> [create] -> [query] -> [switch] -> [query]
# List contexts
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
          kubernetes-admin@kubernetes   kubernetes   kubernetes-admin   default
*         tj-context

# The context in use is now the new one
$ kubectl config current-context
tj-context

# [query] -> [create] -> [query] -> [switch] -> [query] -> [delete]
# What happens if you delete the current context without switching away first?
$ kubectl config delete-context tj-context
warning: this removed your active context, use "kubectl config use-context" to select a different one
deleted context tj-context from /home/ubuntu/.kube/config

# List contexts
# Note: the current context is NOT automatically switched to the only remaining one
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
          kubernetes-admin@kubernetes   kubernetes   kubernetes-admin   default

# The context currently in use
# It still points at the context that was just deleted
$ kubectl config current-context
tj-context

# [query] -> [create] -> [query] -> [switch] -> [query] -> [delete] -> [reset]
# Switch context: -> kubernetes-admin@kubernetes
$ kubectl config use-context kubernetes-admin@kubernetes
Switched to context "kubernetes-admin@kubernetes".

# List contexts
$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
*         kubernetes-admin@kubernetes   kubernetes   kubernetes-admin   default

# The context currently in use
$ kubectl config current-context
kubernetes-admin@kubernetes
```

<br>

### kubeadm reset
> reset
> - Removes the whole K8s installation
> - Performs a best effort revert of changes made to this host by 'kubeadm init' or 'kubeadm join'

#reset #remove #delete #kubernetes #permanently #completely
```
$ sudo kubeadm reset
[sudo] password for diatango_lin:
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "stage-kube01" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
W0527 14:09:23.988917   22021 removeetcdmember.go:61] [reset] failed to remove etcd member: error syncing endpoints with etc: etcdclient: no available endpoints
.Please manually remove this etcd member using etcdctl
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W0527 14:09:24.041611   22021 unmount_linux.go:42] [reset] Failed to unmount mounted directory in /var/lib/kubelet: /var/lib/kubelet
W0527 14:09:24.069158   22021 unmount_linux.go:42] [reset] Failed to unmount mounted directory in /var/lib/kubelet: /var/lib/kubelet/pods/1b2b800f-d135-401b-9c62-b97b1b1a300b/volumes/kubernetes.io~secret/kube-proxy-token-cnqdn
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
```
Some runs end with notes asking you to clean up manually:
```=
...
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
```
- Manual removal
```
$ ls /etc/cni/net.d
$ cat /etc/cni/net.d/10-flannel.conflist
$ sudo rm /etc/cni/net.d/10-flannel.conflist
```
```
$ rm ~/.kube/config
```
- References
    - [How to completely uninstall kubernetes](https://stackoverflow.com/questions/44698283)
        - Some of the commands there are worth borrowing
    - [Kubernetes Up And Running](https://fixes.co.za/kubernetes/kubernetes-up-and-running/) :+1: ???

<br>
<hr>
<br>

# [Lookup] Command abbreviations / flag abbreviations / docs
## kubectl commands are long: shorten them with an alias
```bash=
# Define a short command (an alias)
$ alias ku=kubectl

# Try the ku command
$ ku version

# Remove the alias
$ unalias ku
```
- References
    - [kubectl 額外補充](https://medium.com/@C.W.Hu/kubernetes-helm-chart-tutorial-fbdad62a8b61)
    - [How do I remove an alias?](https://askubuntu.com/questions/325368/how-do-i-remove-an-alias)

<br>

## Command abbreviations, flag abbreviations
### Command abbreviations
- See [#kubectl-api-resources](#kubectl-api-resources)
```bash=
$ kubectl api-resources
```
- pods can be typed as pod or po
- services can be typed as service or svc, but not servic (one "e" short)

<br>

- References
    - [kubectl 額外補充](https://medium.com/@C.W.Hu/kubernetes-helm-chart-tutorial-fbdad62a8b61)
    - [K8S的apiVersion该用哪个](https://segmentfault.com/a/1190000017134399)
```bash
$ kubectl api-resources
$ kubectl api-versions
```

### Flag abbreviations
- ```--all-namespaces``` can be typed as ```-A```
  ```kubectl get -h``` will tell you

<br>

## kubectl auto-complete
- Didn't manage to get this working at the time; the official recipe is sketched below
- References
    - [kubectl 額外補充](https://medium.com/@C.W.Hu/kubernetes-helm-chart-tutorial-fbdad62a8b61)
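A sketch of the recipe from the official kubectl documentation; it assumes the bash-completion package is installed, and it composes with the `ku` alias defined above:

```bash
# Load completion into the current shell (requires the bash-completion package)
$ source <(kubectl completion bash)

# Make it permanent
$ echo 'source <(kubectl completion bash)' >> ~/.bashrc

# Completion also works through an alias
$ alias ku=kubectl
$ complete -o default -F __start_kubectl ku
```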
<br>

## Common attribute lookups, e.g., container
- [Kubernetes API Reference Docs](https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#container-v1-core)
    - name
    - image
    - imagePullPolicy
    - [ports[]](https://v1-18.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#containerport-v1-core)
        - name
        - containerPort
        - protocol
- [k8s.io.api.core.v1.Container](https://cloud.google.com/run/docs/reference/rpc/k8s.io.api.core.v1#k8s.io.api.core.v1.Container)

<br>
<hr>
<br>

# Debug
## Checking pod status
### Related commands
- ### List the pods
    ```
    $ kubectl get pod -o wide
    NAME           READY   STATUS    RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
    tj-pod-kuard   1/1     Running   0          47m   10.244.1.8   tj-cluster-worker2   <none>           <none>
    ```
    - The ```--all-namespaces``` flag lists pods in every namespace
    - 10.244.1.8 is the pod IP
- ### Query a pod's status
    ```
    $ kubectl describe pod tj-pod-test
    ```
- ### If the pod has a problem, examine the container
    ```bash
    $ docker exec -it your_container_id /bin/bash
    $ curl -X get 127.0.0.1:9999  # testing port
    ```
    or
    ```bash
    $ kubectl exec -it podName bash
    ```
    - from Diatango:
        - Going through kubectl is the more orthodox approach,
          because you will not necessarily be inside that K8s cluster's environment
        - k3s uses containerd as its container engine;
          there is no docker command to debug with, you can only use kubectl exec -it ...

<br>

### Pod states
- ### ContainerCreating
    - Question
      What does it mean when a pod is in ContainerCreating?
    - ChatGPT
      When a Pod's status shows "ContainerCreating", the Pod is currently being created: Kubernetes is trying to create one or more containers inside it. This is a normal part of the Pod creation process.
      Common situations that lead to the "ContainerCreating" state:
        - **Image download**: Kubernetes needs to download the image for the containers specified in the Pod. If the image is not yet cached on the node, this can take a while, depending on the image size and network speed.
        - **Container initialization**: once the image is downloaded, Kubernetes starts initializing the container. This may include setting environment variables, mounting storage, and so on.
        - **Scheduling**: in a multi-node cluster, the Pod also has to be started on a suitable node according to the scheduling policy. This can take some time, especially on a busy cluster.
      In the **"ContainerCreating"** state the Pod is not yet ready and cannot serve traffic. Once the status turns **"Running"**, the Pod's containers have been created successfully and are running.
      You can check the Pod's details to see how far creation has progressed and what might be wrong:
      ```bash
      kubectl describe pod <pod-name>
      ```
      This shows detailed information about the Pod, including the containers' current state, events, and problem descriptions, to help you diagnose the issue.
- ### Back-off restarting failed container inference-server in pod
    - **Question**
      What does it mean when a pod's message shows: Back-off restarting failed container inference-server in pod?
    - **ChatGPT**
      When a Pod's message shows **"Back-off restarting failed container"**, something has gone wrong inside the container and Kubernetes is trying to restart it. This is a retry loop meant to get the container running again.
      Specifically, "Back-off" means Kubernetes has already tried to restart the container several times and it still will not run. Possible causes include application errors, container configuration problems, image problems, and more. Kubernetes applies this back-off strategy automatically, retrying until a maximum retry count is reached (usually a cap, e.g., 5 attempts) or the container runs successfully.
      To pin down the exact cause of **"Back-off restarting failed container"**, you can:
      1. Use `kubectl describe pod <pod-name>` to view the Pod's details and look for the restart error messages and events. They can help you locate the problem.
      2. Use `kubectl logs <pod-name> -c <container-name>` to check the container's logs for error messages and records from inside the container; this gives more context about the problem.
      3. Make sure the container's configuration and settings are correct, and check whether any environment variables, configuration files, or dependencies are needed.
      4. Check that the container image is available, and make sure it is the correct version.
      With this information it becomes much easier to diagnose the problem and take the appropriate action to get the container running.
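Two commands that complement the answer above and are worth keeping at hand when a pod is stuck or crash-looping; the pod name is a placeholder:

```bash
# Cluster events often state the root cause (image pull errors, probe failures, OOM kills)
$ kubectl get events --sort-by='.lastTimestamp'

# For a crash-looping container, the current logs may be empty;
# --previous shows the logs of the last terminated instance
$ kubectl logs <pod-name> --previous
```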
<br>

## Checking service status
- List the services
```
$ kubectl get service -o wide --all-namespaces
NAMESPACE     NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default       kubernetes          ClusterIP   10.96.0.1      <none>        443/TCP                  4d21h   <none>
helm-lab      first-pod-service   NodePort    10.107.84.42   <none>        80:30080/TCP             50m     app=webserver
kube-system   kube-dns            ClusterIP   10.96.0.10     <none>        53/UDP,53/TCP,9153/TCP   4d21h   k8s-app=kube-dns
```
    - ```-o wide``` adds the ```SELECTOR``` column
    - the service IP is the cluster IP
    - the internal IP is the node IP
- List a service's spec
```bash
$ kubectl get service first-pod-service -o json
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {...},
    "spec": {...},
    "status": {...}
}
```
- Extract the spec
```bash
$ kubectl get service first-pod-service -o jsonpath="{.spec}" | python3 -m json.tool
```
- Extract the nodePort
```bash
$ kubectl get service first-pod-service -o jsonpath="{.spec.ports[0].nodePort}"
```

<br>

## Checking node status
- List the nodes
```
$ kubectl get node -o wide
NAME                       STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION      CONTAINER-RUNTIME
tj-cluster-control-plane   Ready    master   4d21h   v1.18.2   172.18.0.2    <none>        Ubuntu 19.10   4.15.0-72-generic   containerd://1.3.3-14-g449e9269
tj-cluster-worker          Ready    <none>   4d21h   v1.18.2   172.18.0.4    <none>        Ubuntu 19.10   4.15.0-72-generic   containerd://1.3.3-14-g449e9269
tj-cluster-worker2         Ready    <none>   4d21h   v1.18.2   172.18.0.3    <none>        Ubuntu 19.10   4.15.0-72-generic   containerd://1.3.3-14-g449e9269
```
    - 172.18.0.3 is the node IP of Node:tj-cluster-worker2
    - 172.18.0.4 is the node IP of Node:tj-cluster-worker
    - the node IP is the internal IP
- Extract a node's address (INTERNAL-IP)
```
$ kubectl get nodes -o jsonpath="{.items[0].status.addresses[0]}" | python3 -m json.tool
```

<br>

## Looking up pod, container, service, and node IPs
- ### Pod IP: ==10.244.1.8==
    ```
    $ kubectl get pods -o wide
    NAME           READY   STATUS    RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
    tj-pod-kuard   1/1     Running   0          56m   10.244.1.8   tj-cluster-worker2   <none>           <none>
    ```
    - **port-forward**
    ```
    # localhost:9090 -> pod/tj-pod-kuard:8080
    $ kubectl port-forward pod/tj-pod-kuard 9090:8080

    # Test
    curl 127.0.0.1:9090
    ```
- ### Container IP: ==10.244.1.8==
    ```
    $ kubectl exec -it tj-pod-kuard -- /bin/sh
    # or
    $ kubectl exec -it tj-pod-nginx -- /bin/bash

    $ ifconfig
    ... inet addr:10.244.1.8 ...
    ```
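The three port names in the box above all meet in a single Service spec. A hedged reconstruction of what the `first-pod-service` shown earlier might look like: the `port` and `nodePort` values match the `80:30080/TCP` output, while `targetPort: 3000` is an assumption (the blue-whale pod in the port-forward demo listens on 3000):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: first-pod-service
spec:
  type: NodePort
  selector:
    app: webserver     # matches the pods' labels
  ports:
  - port: 80           # service port, reached via the cluster IP
    targetPort: 3000   # pod/container port (assumed)
    nodePort: 30080    # port opened on every node's internal IP
```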
- ### Service IP (cluster IP): ==10.107.84.42== (Port:80)
    ```
    $ kubectl get service -o wide
    NAME                TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE   SELECTOR
    first-pod-service   NodePort   10.107.84.42   <none>        80:30080/TCP   60m   app=webserver
    ```
    - service IP = cluster IP
    - **port-forward**
    ```
    # localhost:9091 -> service/first-pod-service:80
    $ kubectl port-forward service/first-pod-service 9091:80

    # Test
    curl 127.0.0.1:9091
    ```
- ### Node IP (internal IP):
> (for the service above, ```ServicePort:NodePort = 80:30080```)
```bash=
# ESC4000 (10.78.26.241)
$ kubectl get node -o wide
NAME           STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION           CONTAINER-RUNTIME
stage-kube01   Ready    master   8d    v1.15.4   10.78.26.241   <none>        Ubuntu 16.04.7 LTS   4.12.14-041214-generic   docker://19.3.12

# node IP = 10.78.26.241 (NodePort:30080)
```
```bash=
# TWCC VM
$ kubectl get node -o wide
NAME                       STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION      CONTAINER-RUNTIME
tj-cluster-control-plane   Ready    master   4d22h   v1.18.2   172.18.0.2    <none>        Ubuntu 19.10   4.15.0-72-generic   containerd://1.3.3-14-g449e9269
tj-cluster-worker          Ready    <none>   4d22h   v1.18.2   172.18.0.4    <none>        Ubuntu 19.10   4.15.0-72-generic   containerd://1.3.3-14-g449e9269
tj-cluster-worker2         Ready    <none>   4d22h   v1.18.2   172.18.0.3    <none>        Ubuntu 19.10   4.15.0-72-generic   containerd://1.3.3-14-g449e9269

# node IP = 172.18.0.3 (NodePort:30080)
# Test: curl 172.18.0.3:30080 (expect to get data back)
```
    - node IP = internal IP
- ### External IP
    ```
    # If the current environment IS a K8s node
    $ curl ifconfig.me
    123.51.152.16 # ESC4000
    ```
    ```
    # If the current environment runs its K8s nodes inside containers, connect to the container first
    $ docker exec -it tj-cluster-worker2 bash
    root@tj-cluster-worker2:/# curl ifconfig.me
    203.145.218.3
    ```
    - external IP = 203.145.218.3
- ### Concept 1: [External IP / Cluster IP / Pod IP](https://tachingchen.com/tw/blog/kubernetes-service-in-detail-1/)
![](https://i.imgur.com/kxYK1jC.png)
- ### Concept 2: [Kubernetes – Service Publishing](https://theithollow.com/2019/02/05/kubernetes-service-publishing/)

:::success
:memo: **Key takeaways: how the terms relate**
- node IP = internal IP (how a node is reached from outside)
- service IP = cluster IP (how a service is reached from inside the cluster)
  port (is the service port)
- targetPort = podPort = containerPort
:::

![](https://i.imgur.com/aPUR2OT.png)
<br>

## Debug roadmap
### [A visual guide on troubleshooting Kubernetes deployments](https://learnk8s.io/troubleshooting-deployments)
[![](https://i.imgur.com/2H6Uk2U.png)](https://learnk8s.io/a/f65ffe9f61de0f4a417f7a05306edd4c.png)
- [How to debug K8s] [超長視覺化指南!帶你理清K8S部署的故障排查思路,讓bug無處遁形](https://www.itread01.com/content/1582099687.html)

### [How To Inspect Kubernetes Networking](https://www.digitalocean.com/community/tutorials/how-to-inspect-kubernetes-networking)

### [[K8s] Monitoring, Logging, and Debugging](https://kubernetes.io/docs/tasks/debug-application-cluster/)
- [Debug Init Containers](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-init-containers/)
- [Debug Running Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/)
- [Debug Services](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/)

<br>
<hr>
<br>

## Formatted resource dumps (json, jsonpath)
> Output formats for a resource's configuration

### kubectl get res_type res_name -o [ yaml | json | jsonpath ]
> Check a resource's configuration for mistakes
- ### Dump the "complete" resource configuration
```bash
$ kubectl get pod my-pod -o json
{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {...},
    "spec": {...},
    "status": {...}
}
```
The above is exactly equivalent to:
```
$ kubectl get pod my-pod -o jsonpath="{@}" | python3 -m json.tool
```
- ### Dump "part of" the resource configuration
    - Get the resource's apiVersion
    ```bash
    $ kubectl get pod my-pod -o jsonpath="{.apiVersion}"
    v1

    # or
    $ kubectl get pod my-pod -o json | jq '.apiVersion'
    "v1"
    ```
        - jq: [JSON processor](http://manpages.ubuntu.com/manpages/trusty/man1/jq.1.html)
    - Get the resource's kind
    ```bash
    $ kubectl get pod my-pod -o jsonpath="{.kind}"
    Pod

    # or
    $ kubectl get pod my-pod -o json | jq '.kind'
    Pod
    ```
    - Get the resource's status
    ```bash
    $ kubectl get pod my-pod -o jsonpath="{.status}" | python3 -m json.tool
    {
        "conditions": [...],
        "containerStatuses": [...],
        "hostIP": "192.168.34.17",
        "phase": "Running",
        "podIP": "10.244.1.160",
        "podIPs": [...],
        "qosClass": "BestEffort",
        "startTime": "2020-12-15T08:22:11Z"
    }

    # or
    $ kubectl get pod my-pod -o json | jq '.status'
    ```
    - Get the resource's status.phase
    ```bash
    $ kubectl get pod my-pod \
        -o jsonpath="{.status.phase}"
    Running
    ```
    The command above is equivalent to:
    ```bash
    $ kubectl get pod my-pod \
        -o jsonpath="{range .status}{.phase}{end}"
    Running
    ```
        - ```{range object}{.field}{end}```
            - Using this object as the reference scope, print the fields under it (i.e., object.field)
            - ```{range object}``` takes the current object as the reference scope
            - ```{.field}``` extracts an attribute of the current object

    <br>

    - Get the resource's status.phase & hostIP & startTime
    ```bash
    $ kubectl get pod my-pod \
        -o jsonpath="{range .status}{.phase}, {.hostIP}, {.startTime}{end}"
    Running, 192.168.34.17, 2020-12-15T08:22:11Z, ,
    ```
- ### [[官網] JSONPath Support](https://kubernetes.io/docs/reference/kubectl/jsonpath/)
    - List every service's spec
    ```
    $ kubectl get service -o jsonpath="{range .items[*]}{.spec}{end}" | python3 -m json.tool
    {
        "clusterIP": "10.96.0.1",
        "ports": [
            {
                "name": "https",
                "port": 443,
                "protocol": "TCP",
                "targetPort": 6443
            }
        ],
        "sessionAffinity": "None",
        "type": "ClusterIP"
    }
    ```
        - ```{range object}{.field}{end}```
            - Using this object as the reference scope, print the fields under it (i.e., object.field)

<br>

:::info
:information_source: **Compare docker's equivalent and note the differences**
- [Docker格式化输出命令:"docker inspect --format" 学习笔记](https://www.cnblogs.com/kevingrace/p/6424476.html)
- Inspect volume mounts
[![](https://i.imgur.com/sSv9hK4.png)](https://i.imgur.com/sSv9hK4.png)
```
$ docker inspect 757e453769d2 --format="{{json .Mounts }}" | jq
```
:::

<br>
<hr>
<br>

# [Objects] Assorted objects
## spec.containers[].resources
- [Kubernetes — Resources Limit (資源限制) | by Ray Lee | 李宗叡 | Learn or Die | Medium](https://medium.com/learn-or-die/803bac05c061)
- APIs
  `spec.containers[].resources.limits.cpu`
  `spec.containers[].resources.limits.memory`
  `spec.containers[].resources.requests.cpu`
  `spec.containers[].resources.requests.memory`
- units: 1 Kubernetes CPU equals
    - 1 AWS vCPU
    - 1 GCP Core
    - 1 Azure vCore
    - 1 IBM vCPU
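A minimal sketch showing the four API fields above in one pod spec; the pod name, image, and numbers are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-limited-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"      # half a CPU reserved for scheduling decisions
        memory: "256Mi"
      limits:
        cpu: "1"         # throttled when it tries to use more than one CPU
        memory: "512Mi"  # OOM-killed if it exceeds this
```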
<br>

## Storage, Volume
### Creating an empty volume
```yaml=
apiVersion: v1
kind: Pod
metadata:
  name: empty-volume-pod
spec:
  containers:
  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: empty-volume
      mountPath: /tmp/config
  volumes:
  - name: empty-volume
    emptyDir: {}
```
- [Reference](https://ithelp.ithome.com.tw/articles/10193546)

<br>

### Creating a hostPath volume
```yaml=
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-volume-pod
spec:
  containers:
  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: hostpath-volume
      mountPath: /tmp/config
  volumes:
  - name: hostpath-volume
    hostPath:
      path: /home/tj/k8s/hostpath
```
- Note:
    - The `/home/tj/k8s/hostpath` path is created on the host automatically, as soon as the pod is created
- [Reference](https://ithelp.ithome.com.tw/articles/10193546)

<br>

### References
- [Day 15 - 別再遺失資料了:Volume (2)](https://ithelp.ithome.com.tw/articles/10193550)
```yaml=
# pv.yaml
---
apiVersion: v1
kind: PersistentVolume      <=== object kind: PV
metadata:
  name: pv001               <=== PV name
spec:
  capacity:
    storage: 2Gi            <=== size
  accessModes:
    - ReadWriteOnce         <=== access mode
  hostPath:                 <=== bound to the host's /tmp directory
    path: /tmp
```
```yaml=
# pvc.yaml
---
apiVersion: v1
kind: Pod                   <=== a Pod that tries to mount the Volume
metadata:
  name: pvc-nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:           <=== mount the Volume named volume-pv at /usr/share/nginx/html
    - name: volume-pv
      mountPath: /usr/share/nginx/html
  volumes:
  - name: volume-pv         <=== declare a Volume object named volume-pv
    persistentVolumeClaim:  <=== bind the PVC object named pv-claim
      claimName: pv-claim
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi          <=== request 1G of capacity
```

<br>

## RBAC
![](https://i.imgur.com/4fUO10C.png)
(Source: [Kubernetes:建置與執行(第二版)](https://www.books.com.tw/products/0010860536))
- `kubectl get clusterroles` lists the cluster roles
- pods-and-services-role.yaml
```yaml=
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pods-and-services
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
```
- `pods-and-services-rolebinding.yaml`
```yaml=
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: pods-and-services-rolebinding
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: tj
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: appDevs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pods-and-services   # must match the Role's metadata.name above
```

<br>
<hr>
<br>

# [App] Application examples
## Notebook
### Elyra
- Docker:
```bash=
$ docker run --rm -it \
    -v `pwd`:/workspace \
    -w /workspace \
    -p 38888:8888 \
    elyra/elyra:latest jupyter lab --debug
```
- Yaml: `setup-elyra.yaml`
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elyra-deployment
  labels:
    app: elyra
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elyra
  template:
    metadata:
      labels:
        app: elyra
    spec:
      containers:
      - name: elyra
        image: elyra/elyra:latest
        ports:
        - containerPort: 8888
```
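The deployment above has no Service, so JupyterLab is not reachable yet. A minimal sketch of one way to reach it, mirroring the Docker example's 38888 -> 8888 mapping; the service name `elyra-service` is a placeholder:

```bash
# Attach a ClusterIP service to the deployment's pods
$ kubectl expose deployment elyra-deployment --name=elyra-service --port=8888

# Forward a local port to the service, then browse http://localhost:38888
$ kubectl port-forward service/elyra-service 38888:8888
```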