# Building a k8s Cluster with kubeadm
###### tags: `k8s` `CKA`
[toc]
## Goal
Build a multi-control-plane k8s cluster (v1.26) on Oracle Cloud (OCI).
## Prerequisites
- Virtual Cloud Network(VCN) - 1
- Subnet - 1
- Load Balancer(VM) - 1
- Control Plane(VM) - 3
- Worker Node(VM) - 2
## Steps
### 1. Machine specs
- Load Balancer
- cpu 2 core
- memory 16G
- 10.0.4.55
- Control Plane
- cpu 2 core
- memory 16G
- 10.0.4.125, 10.0.4.191, 10.0.4.205
- Worker Node
- cpu 1 core
- memory 8G
- 10.0.4.174, 10.0.4.52
### 2. Set up the Load Balancer
Built with HAProxy on 10.0.4.55.
#### 2-1 Create the certificates
Install cfssl:
```bash=
# Download
$ wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
$ wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
# add permission
$ chmod +x cfssl*
# move to /usr/local/bin
$ sudo mv cfssl_linux-amd64 /usr/local/bin/cfssl
$ sudo mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
# check
$ cfssl version
```
Generate the certificates:
```bash=
# Create the certificate authority configuration file
$ vim ca-config.json
{
"signing": {
"default": {
"expiry": "8760h"
},
"profiles": {
"kubernetes": {
"usages": ["signing", "key encipherment", "server auth", "client auth"],
"expiry": "8760h"
}
}
}
}
# Create the certificate authority signing request configuration file
$ vim ca-csr.json
{
"CN": "Kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "IE",
"L": "Cork",
"O": "Kubernetes",
"OU": "CA",
"ST": "Cork Co."
}
]
}
# Generate the certificate authority certificate and private key
$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# certificate for the Etcd cluster
$ vim kubernetes-csr.json
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "IE",
"L": "Cork",
"O": "Kubernetes",
"OU": "Kubernetes",
"ST": "Cork Co."
}
]
}
# Generate the certificate and private key
$ cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-hostname=10.0.4.125,10.0.4.191,10.0.4.205,10.0.4.55,127.0.0.1,kubernetes.default \
-profile=kubernetes kubernetes-csr.json | \
cfssljson -bare kubernetes
# copy CA files to all nodes
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.205:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.191:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.125:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.174:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.52:~
```
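Optionally, double-check that the generated server certificate carries the expected SANs before distributing it; a quick sketch with openssl:
```bash=
# The SANs should include every control plane IP, the load balancer IP, and 127.0.0.1
$ openssl x509 -in kubernetes.pem -noout -text | grep -A 1 "Subject Alternative Name"
```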
#### 2-2 Set up HAProxy
Install and configure HAProxy:
```bash=
# Install
$ sudo apt-get update
$ sudo apt-get install haproxy
# Edit the config
$ sudo vim /etc/haproxy/haproxy.cfg
global
...
defaults
...
# Append the two sections below
frontend kubernetes
bind 10.0.4.55:6443 # load balancer ip
option tcplog
mode tcp
default_backend kubernetes-master-nodes
backend kubernetes-master-nodes
mode tcp
balance roundrobin
option tcp-check
# control plane ip
server k8s-master-0 10.0.4.125:6443 check fall 3 rise 2
server k8s-master-1 10.0.4.191:6443 check fall 3 rise 2
server k8s-master-2 10.0.4.205:6443 check fall 3 rise 2
# restart HA Proxy
$ sudo systemctl restart haproxy
```
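A quick sanity check that HAProxy is up and listening on the frontend port (the backends will stay down until the control planes are bootstrapped later):
```bash=
# HAProxy should be active and listening on 10.0.4.55:6443
$ sudo systemctl status haproxy
$ sudo ss -tlnp | grep 6443
```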
### 3. Node setup (common to all nodes)
Follow [[1]](https://hackmd.io/@CNCF-meetup/ryAO-wKg3) to install the container runtime (containerd), kubeadm, kubelet, and kubectl on every node. A minimal sketch of the usual kernel and sysctl prep is included below.
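This sketch only reflects the standard upstream prerequisites (swap, kernel modules, sysctl); the actual containerd and kube package installation is covered by the linked guide.
```bash=
# Disable swap (required by the kubelet)
$ sudo swapoff -a
# Load the kernel modules used by containerd and pod networking
$ cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
$ sudo modprobe overlay
$ sudo modprobe br_netfilter
# sysctl settings required for bridged traffic and IP forwarding
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
$ sudo sysctl --system
```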
### 4. Set up the Control Planes
10.0.4.125, 10.0.4.191, 10.0.4.205
#### 4-1 Build the etcd cluster
```bash=
# Create etcd config directory
$ sudo mkdir /etc/etcd /var/lib/etcd
# Move the CA and certificate PEM files created earlier into it
$ sudo mv ~/ca.pem ~/kubernetes.pem ~/kubernetes-key.pem /etc/etcd
# Install the etcd binaries
$ wget https://github.com/etcd-io/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
$ tar xvzf etcd-v3.3.13-linux-amd64.tar.gz
$ sudo mv etcd-v3.3.13-linux-amd64/etcd* /usr/local/bin/
# Create service file
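# NOTE: systemd does not expand shell variables, so replace $SERVER_IP below with
#       this node's own IP (10.0.4.125 / 10.0.4.191 / 10.0.4.205), or define it
#       via an Environment= line in the unit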
$ sudo vim /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos
[Service]
ExecStart=/usr/local/bin/etcd \
--name $SERVER_IP \
--cert-file=/etc/etcd/kubernetes.pem \
--key-file=/etc/etcd/kubernetes-key.pem \
--peer-cert-file=/etc/etcd/kubernetes.pem \
--peer-key-file=/etc/etcd/kubernetes-key.pem \
--trusted-ca-file=/etc/etcd/ca.pem \
--peer-trusted-ca-file=/etc/etcd/ca.pem \
--peer-client-cert-auth \
--client-cert-auth \
--initial-advertise-peer-urls https://$SERVER_IP:2380 \
--listen-peer-urls https://$SERVER_IP:2380 \
--listen-client-urls https://$SERVER_IP:2379,http://127.0.0.1:2379 \
--advertise-client-urls https://$SERVER_IP:2379 \
--initial-cluster-token etcd-cluster-0 \
--initial-cluster 10.0.4.205=https://10.0.4.205:2380,10.0.4.191=https://10.0.4.191:2380,10.0.4.125=https://10.0.4.125:2380 \
--initial-cluster-state new \
--data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
# Start etcd Service
$ sudo systemctl daemon-reload
$ sudo systemctl enable etcd
$ sudo systemctl start etcd
# Check etcd cluster
$ ETCDCTL_API=3 etcdctl member list
# The output should look like this
61a2630562d869a9, started, 10.0.4.125, https://10.0.4.125:2380, https://10.0.4.125:2379
bb5ee5165948fea8, started, 10.0.4.205, https://10.0.4.205:2380, https://10.0.4.205:2379
ca0549b24a0f415a, started, 10.0.4.191, https://10.0.4.191:2380, https://10.0.4.191:2379
```
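Optionally, the health of all three members can also be checked over TLS; a hedged sketch reusing the same client certificates (adjust endpoints/paths as needed):
```bash=
# Query every member over TLS
$ ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.0.4.125:2379,https://10.0.4.191:2379,https://10.0.4.205:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  endpoint health
```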
#### 4-2 Bootstrap the control planes
This is split into three parts: the first is common preparation on every control plane, the second is run only on the first control plane, and the third on the remaining control planes.
##### 4-2-1 Common preparation
Note that the kubeadm apiVersion depends on the k8s version in use; this cluster runs 1.26, so the config uses `kubeadm.k8s.io/v1beta3`.
```bash=
$ vim config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - 10.0.4.55
  extraArgs:
    apiserver-count: "3"
controlPlaneEndpoint: "10.0.4.55:6443"
etcd:
  external:
    endpoints:
    - https://10.0.4.191:2379
    - https://10.0.4.205:2379
    - https://10.0.4.125:2379
    caFile: /etc/etcd/ca.pem
    certFile: /etc/etcd/kubernetes.pem
    keyFile: /etc/etcd/kubernetes-key.pem
networking:
  podSubnet: 172.16.0.0/16
```
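Before running the real init, the config can optionally be sanity-checked with kubeadm's dry-run mode (a hedged sketch; run it on the node where `config.yaml` lives, with the etcd cluster already up):
```bash=
# Render what kubeadm would do without touching the host
$ sudo kubeadm init --config=config.yaml --dry-run
```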
##### 4-2-2 First Control Plane
```bash=
# Bootstrap with kubeadm
$ sudo kubeadm init --config=config.yaml
# Copy the generated pki directory to the other control planes
$ sudo scp -i ssh-key -r /etc/kubernetes/pki ubuntu@10.0.4.191:~
$ sudo scp -i ssh-key -r /etc/kubernetes/pki ubuntu@10.0.4.125:~
```
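Optionally, confirm the first control plane is healthy before bootstrapping the others; a quick sketch using the generated admin kubeconfig:
```bash=
# Run on the first control plane; the kube-apiserver / controller-manager / scheduler
# static pods should be Running before continuing
$ sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get po -n kube-system
```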
##### 4-2-3 Remaining Control Planes
```bash=
# Remove the apiserver certs (kubeadm regenerates them per node) and move the pki files into place
$ rm ~/pki/apiserver.*
$ sudo mv ~/pki /etc/kubernetes/
# Bootstrap with kubeadm
$ sudo kubeadm init --config=config.yaml
```
Once everything is up, each init prints its own `kubeadm join` command; pick any one of them and run it on the Worker Nodes.
The kubeconfig can be found at `/etc/kubernetes/admin.conf`; copy it wherever it is needed.
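For example, to use it directly on one of the control planes, the standard post-init steps printed by kubeadm are:
```bash=
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl get nodes
```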
### 5. Set up the Worker Nodes
10.0.4.174, 10.0.4.52
```bash=
$ sudo kubeadm join 10.0.4.55:6443 --token $TOKEN --discovery-token-ca-cert-hash sha256:$HASH_TOKEN
```
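If the join command from the init output has been lost or the token has expired, a fresh one can be printed from any control plane:
```bash=
# Prints a complete `kubeadm join ...` command with a new token
$ sudo kubeadm token create --print-join-command
```
The new workers will show as `NotReady` until the CNI is installed in the next step.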
### 6. Install the CNI
Follow [[4]](https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises) to apply Calico.
Remember to change the pod CIDR in the manifest to `172.16.0.0/16` (a sketch follows the snippet below).
If the nodes use a network interface other than the default, also add a variable right under `CLUSTER_TYPE`:
```yaml
- name: IP_AUTODETECTION_METHOD
value: "interface=ens3"
```
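A hedged sketch for the pod CIDR change mentioned above, assuming the manifest from [4] has been downloaded as `calico.yaml`; the relevant env var in the upstream manifest is `CALICO_IPV4POOL_CIDR`, which is commented out by default:
```bash=
# Locate the pool CIDR setting in the downloaded manifest
$ grep -n -A 1 "CALICO_IPV4POOL_CIDR" calico.yaml
# Uncomment both lines and set the value to the cluster's podSubnet:
#   - name: CALICO_IPV4POOL_CIDR
#     value: "172.16.0.0/16"
$ kubectl apply -f calico.yaml
```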
#### 6-1 Pitfalls hit while installing Calico
The calico-node pods show as not ready:
```bash=
$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5857bf8d58-vtfcw 1/1 Running 0 13h
calico-node-4wqfk 0/1 Running 0 13h
calico-node-7qwm5 0/1 Running 0 13h
calico-node-c8x2t 0/1 Running 0 13h
calico-node-ljnhc 0/1 Running 0 13h
calico-node-q5pqt 0/1 Running 0 13h
```
The logs are hard to make sense of:
```log=
2023-04-27 02:20:00.283 [INFO][73] monitor-addresses/autodetection_methods.go 117: Using autodetected IPv4 address 10.0.4.205/24 on matching interface ens3
2023-04-27 02:20:13.953 [INFO][72] felix/summary.go 100: Summarising 13 dataplane reconciliation loops over 1m8.3s: avg=2ms longest=5ms ()
2023-04-27 02:20:37.520 [INFO][72] felix/int_dataplane.go 1693: Received *proto.HostMetadataV4V6Update update from calculation graph msg=hostname:"instance-20230321-danny-worker02" ipv4_addr:"10.0.4.52/24" labels:<key:"beta.kubernetes.io/arch" value:"amd64" > labels:<key:"beta.kubernetes.io/os" value:"linux" > labels:<key:"kubernetes.io/arch" value:"amd64" > labels:<key:"kubernetes.io/hostname" value:"instance-20230321-danny-worker02" > labels:<key:"kubernetes.io/os" value:"linux" >
```
The events do not reveal much either, only that the readiness probe is failing:
```bash=
$ kubectl get event -n kube-system
LAST SEEN TYPE REASON OBJECT MESSAGE
3m48s Warning Unhealthy pod/calico-node-4wqfk (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.219 [INFO][159156] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-7qwm5 (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.246 [INFO][158624] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-c8x2t (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.258 [INFO][144156] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-ljnhc (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.305 [INFO][143449] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-q5pqt (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.266 [INFO][143592] confd/health.go 180: Number of node(s) with BGP peering established = 0...
```
Use `kubectl describe` to see whether the pod events say anything more:
```bash=
$ kubectl describe po calico-node-4wqfk -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 65s (x5564 over 13h) kubelet (combined from similar events): Readiness probe failed: 2023-04-27 03:00:51.232 [INFO][160142] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.4.191,10.0.4.174,10.0.4.52,10.0.4.125
```
So BGP peering is not being established. Following [[5]](https://inf.news/technique/5297caf1aee055a7eeef8a636b5673d5.html), check the network:
```bash=
$ netstat -antp|grep bird
tcp 0 0 0.0.0.0:179 0.0.0.0:* LISTEN 83544/bird
tcp 0 1 10.0.4.205:59875 10.0.4.191:179 SYN_SENT 83544/bird
tcp 0 1 10.0.4.205:35625 10.0.4.174:179 SYN_SENT 83544/bird
tcp 0 1 10.0.4.205:46797 10.0.4.125:179 SYN_SENT 83544/bird
```
This shows connections towards port 179 stuck in SYN_SENT. Since `iptables -F` had already been run when the nodes were first set up, host firewall rules should not be blocking anything, which points to the VCN security list missing a whitelist rule for this traffic. A quick way to confirm whether TCP 179 is reachable between nodes is sketched below.
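For example, from one node towards a peer (10.0.4.191 is one of the control planes):
```bash=
# BGP runs over TCP 179; check that a peer is reachable
$ nc -zv 10.0.4.191 179
```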

After adding the rule to the security list, everything comes back to normal:
```bash=
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
instance-20230321-danny-master Ready control-plane 19h v1.26.3 10.0.4.191 <none> Ubuntu 20.04.5 LTS 5.15.0-1032-oracle containerd://1.6.12
instance-20230321-danny-worker01 Ready <none> 19h v1.26.3 10.0.4.174 <none> Ubuntu 20.04.5 LTS 5.15.0-1032-oracle containerd://1.6.12
instance-20230321-danny-worker02 Ready <none> 19h v1.26.3 10.0.4.52 <none> Ubuntu 20.04.5 LTS 5.15.0-1032-oracle containerd://1.6.12
instance-20230426-danny-m01 Ready control-plane 19h v1.26.3 10.0.4.205 <none> Ubuntu 20.04.6 LTS 5.15.0-1030-oracle containerd://1.6.12
instance-20230426-danny-m02 Ready control-plane 19h v1.26.3 10.0.4.125 <none> Ubuntu 20.04.6 LTS 5.15.0-1030-oracle containerd://1.6.12
$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5857bf8d58-vtfcw 1/1 Running 0 13h
calico-node-4wqfk 1/1 Running 0 13h
calico-node-7qwm5 1/1 Running 0 13h
calico-node-c8x2t 1/1 Running 0 13h
calico-node-ljnhc 1/1 Running 0 13h
calico-node-q5pqt 1/1 Running 0 13h
coredns-787d4945fb-m2f6k 1/1 Running 0 13h
coredns-787d4945fb-tv8bk 1/1 Running 0 15h
kube-apiserver-instance-20230321-danny-master 1/1 Running 1 15h
kube-apiserver-instance-20230426-danny-m01 1/1 Running 1 15h
kube-apiserver-instance-20230426-danny-m02 1/1 Running 2 15h
kube-controller-manager-instance-20230321-danny-master 1/1 Running 0 15h
kube-controller-manager-instance-20230426-danny-m01 1/1 Running 0 15h
kube-controller-manager-instance-20230426-danny-m02 1/1 Running 0 15h
kube-proxy-d4ktk 1/1 Running 0 15h
kube-proxy-k2jth 1/1 Running 0 15h
kube-proxy-kl9c7 1/1 Running 0 15h
kube-proxy-ntkch 1/1 Running 0 15h
kube-proxy-xj4dm 1/1 Running 0 15h
kube-scheduler-instance-20230321-danny-master 1/1 Running 1 15h
kube-scheduler-instance-20230426-danny-m01 1/1 Running 3 15h
kube-scheduler-instance-20230426-danny-m02 1/1 Running 5 15h
```
## Refs
[1] [透過 kubeadm 安裝 k8s](https://hackmd.io/@CNCF-meetup/ryAO-wKg3)
[2] [【從題目中學習k8s】-【Day3】建立K8s Cluster環境-以kubeadm為例](https://ithelp.ithome.com.tw/articles/10235069)
[3] [Install and configure a multi-master Kubernetes cluster with kubeadm](https://dockerlabs.collabnix.com/kubernetes/beginners/Install-and-configure-a-multi-master-Kubernetes-cluster-with-kubeadm.html)
[4] [Install Calico networking and network policy for on-premises deployments](https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises)
[5] [k8s中calico問題BIRD is not ready: BGP not established with解決](https://inf.news/technique/5297caf1aee055a7eeef8a636b5673d5.html)