# Building a k8s Cluster with kubeadm
###### tags: `k8s` `CKA`
[toc]
## Goal
Build a multi-control-plane k8s cluster (v1.26) on Oracle Cloud (OCI).
## Prerequisites
- Virtual Cloud Network(VCN) - 1
- Subnet - 1
- Load Balancer(VM) - 1
- Control Plane(VM) - 3
- Worker Node(VM) - 2
## Steps
### 1. Machine specs
- Load Balancer
- cpu 2 core
- memory 16G
- 10.0.4.55
- Control Plane
- cpu 2 core
- memory 16G
- 10.0.4.125, 10.0.4.191, 10.0.4.205
- Worker Node
- cpu 1 core
- memory 8G
- 10.0.4.174, 10.0.4.52
### 2. Set up the Load Balancer
Built with HAProxy on 10.0.4.55.
#### 2-1 Create the certificates
Install cfssl:
```bash=
# Download
$ wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
$ wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
# add permission
$ chmod +x cfssl*
# move to /usr/local/bin
$ sudo mv cfssl_linux-amd64 /usr/local/bin/cfssl
$ sudo mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
# check
$ cfssl version
```
Generate the certificates:
```bash=
# Create the certificate authority configuration file
$ vim ca-config.json
{
"signing": {
"default": {
"expiry": "8760h"
},
"profiles": {
"kubernetes": {
"usages": ["signing", "key encipherment", "server auth", "client auth"],
"expiry": "8760h"
}
}
}
}
# Create the certificate authority signing request configuration file
$ vim ca-csr.json
{
"CN": "Kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "IE",
"L": "Cork",
"O": "Kubernetes",
"OU": "CA",
"ST": "Cork Co."
}
]
}
# Generate the certificate authority certificate and private key
$ cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# certificate for the Etcd cluster
$ vim kubernetes-csr.json
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "IE",
"L": "Cork",
"O": "Kubernetes",
"OU": "Kubernetes",
"ST": "Cork Co."
}
]
}
# Generate the certificate and private key
$ cfssl gencert \
-ca=ca.pem \
-ca-key=ca-key.pem \
-config=ca-config.json \
-hostname=10.0.4.125,10.0.4.191,10.0.4.205,10.0.4.55,127.0.0.1,kubernetes.default \
-profile=kubernetes kubernetes-csr.json | \
cfssljson -bare kubernetes
# copy CA files to all nodes
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.205:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.191:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.125:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.174:~
$ scp -i ssh-key ca.pem kubernetes.pem kubernetes-key.pem ubuntu@10.0.4.52:~
```
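Optionally, double-check that the generated server certificate carries the expected SANs before distributing it; a quick sketch with openssl:
```bash=
# The SANs should include every control plane IP, the load balancer IP, and 127.0.0.1
$ openssl x509 -in kubernetes.pem -noout -text | grep -A 1 "Subject Alternative Name"
```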
#### 2-2 Set up HAProxy
Install and configure HAProxy:
```bash=
# Install
$ sudo apt-get update
$ sudo apt-get install haproxy
# Edit the config
$ sudo vim /etc/haproxy/haproxy.cfg
global
...
defaults
...
# Append the two sections below
frontend kubernetes
bind 10.0.4.55:6443 # load balancer ip
option tcplog
mode tcp
default_backend kubernetes-master-nodes
backend kubernetes-master-nodes
mode tcp
balance roundrobin
option tcp-check
# control plane ip
server k8s-master-0 10.0.4.125:6443 check fall 3 rise 2
server k8s-master-1 10.0.4.191:6443 check fall 3 rise 2
server k8s-master-2 10.0.4.205:6443 check fall 3 rise 2
# restart HA Proxy
$ sudo systemctl restart haproxy
```
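A quick sanity check that HAProxy is up and listening on the frontend port (the backends will stay down until the control planes are bootstrapped later):
```bash=
# HAProxy should be active and listening on 10.0.4.55:6443
$ sudo systemctl status haproxy
$ sudo ss -tlnp | grep 6443
```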
### 3. Node setup (common to all nodes)
Follow [[1]](https://hackmd.io/@CNCF-meetup/ryAO-wKg3) to install the container runtime (containerd), kubeadm, kubelet, and kubectl on every node. A minimal sketch of the usual kernel and sysctl prep is included below.
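This sketch only reflects the standard upstream prerequisites (swap, kernel modules, sysctl); the actual containerd and kube package installation is covered by the linked guide.
```bash=
# Disable swap (required by the kubelet)
$ sudo swapoff -a
# Load the kernel modules used by containerd and pod networking
$ cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
$ sudo modprobe overlay
$ sudo modprobe br_netfilter
# sysctl settings required for bridged traffic and IP forwarding
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
$ sudo sysctl --system
```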
### 4. Set up the Control Planes
10.0.4.125, 10.0.4.191, 10.0.4.205
#### 4-1 Build the etcd cluster
```bash=
# Create etcd config directory
$ sudo mkdir /etc/etcd /var/lib/etcd
# Move the CA and certificate PEM files created earlier into it
$ sudo mv ~/ca.pem ~/kubernetes.pem ~/kubernetes-key.pem /etc/etcd
# Install the etcd binaries
$ wget https://github.com/etcd-io/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
$ tar xvzf etcd-v3.3.13-linux-amd64.tar.gz
$ sudo mv etcd-v3.3.13-linux-amd64/etcd* /usr/local/bin/
# Create service file
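# NOTE: systemd does not expand shell variables, so replace $SERVER_IP below with
#       this node's own IP (10.0.4.125 / 10.0.4.191 / 10.0.4.205), or define it
#       via an Environment= line in the unit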
$ sudo vim /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos
[Service]
ExecStart=/usr/local/bin/etcd \
--name $SERVER_IP \
--cert-file=/etc/etcd/kubernetes.pem \
--key-file=/etc/etcd/kubernetes-key.pem \
--peer-cert-file=/etc/etcd/kubernetes.pem \
--peer-key-file=/etc/etcd/kubernetes-key.pem \
--trusted-ca-file=/etc/etcd/ca.pem \
--peer-trusted-ca-file=/etc/etcd/ca.pem \
--peer-client-cert-auth \
--client-cert-auth \
--initial-advertise-peer-urls https://$SERVER_IP:2380 \
--listen-peer-urls https://$SERVER_IP:2380 \
--listen-client-urls https://$SERVER_IP:2379,http://127.0.0.1:2379 \
--advertise-client-urls https://$SERVER_IP:2379 \
--initial-cluster-token etcd-cluster-0 \
--initial-cluster 10.0.4.205=https://10.0.4.205:2380,10.0.4.191=https://10.0.4.191:2380,10.0.4.125=https://10.0.4.125:2380 \
--initial-cluster-state new \
--data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
# Start etcd Service
$ sudo systemctl daemon-reload
$ sudo systemctl enable etcd
$ sudo systemctl start etcd
# Check etcd cluster
$ ETCDCTL_API=3 etcdctl member list
# The output should look like this
61a2630562d869a9, started, 10.0.4.125, https://10.0.4.125:2380, https://10.0.4.125:2379
bb5ee5165948fea8, started, 10.0.4.205, https://10.0.4.205:2380, https://10.0.4.205:2379
ca0549b24a0f415a, started, 10.0.4.191, https://10.0.4.191:2380, https://10.0.4.191:2379
```
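Optionally, the health of all three members can also be checked over TLS; a hedged sketch reusing the same client certificates (adjust endpoints/paths as needed):
```bash=
# Query every member over TLS
$ ETCDCTL_API=3 etcdctl \
  --endpoints=https://10.0.4.125:2379,https://10.0.4.191:2379,https://10.0.4.205:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem \
  endpoint health
```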
#### 4-2 Bootstrap the control planes
This is split into three parts: the first is common preparation on every control plane, the second is run only on the first control plane, and the third on the remaining control planes.
##### 4-2-1 Common preparation
Note that the kubeadm apiVersion depends on the k8s version in use; this cluster runs 1.26, so the config uses `kubeadm.k8s.io/v1beta3`.
```bash=
$ vim config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - 10.0.4.55
  extraArgs:
    apiserver-count: "3"
controlPlaneEndpoint: "10.0.4.55:6443"
etcd:
  external:
    endpoints:
    - https://10.0.4.191:2379
    - https://10.0.4.205:2379
    - https://10.0.4.125:2379
    caFile: /etc/etcd/ca.pem
    certFile: /etc/etcd/kubernetes.pem
    keyFile: /etc/etcd/kubernetes-key.pem
networking:
  podSubnet: 172.16.0.0/16
```
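Before running the real init, the config can optionally be sanity-checked with kubeadm's dry-run mode (a hedged sketch; run it on the node where `config.yaml` lives, with the etcd cluster already up):
```bash=
# Render what kubeadm would do without touching the host
$ sudo kubeadm init --config=config.yaml --dry-run
```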
##### 4-2-2 First Control Plane
```bash=
# Bootstrap with kubeadm
$ sudo kubeadm init --config=config.yaml
# Copy the generated pki directory to the other control planes
$ sudo scp -i ssh-key -r /etc/kubernetes/pki ubuntu@10.0.4.191:~
$ sudo scp -i ssh-key -r /etc/kubernetes/pki ubuntu@10.0.4.125:~
```
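Optionally, confirm the first control plane is healthy before bootstrapping the others; a quick sketch using the generated admin kubeconfig:
```bash=
# Run on the first control plane; the kube-apiserver / controller-manager / scheduler
# static pods should be Running before continuing
$ sudo kubectl --kubeconfig /etc/kubernetes/admin.conf get po -n kube-system
```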
##### 4-2-3 Remaining Control Planes
```bash=
# Remove the apiserver certs (kubeadm regenerates them per node) and move the pki files into place
$ rm ~/pki/apiserver.*
$ sudo mv ~/pki /etc/kubernetes/
# Bootstrap with kubeadm
$ sudo kubeadm init --config=config.yaml
```
Once everything is up, each init prints its own `kubeadm join` command; pick any one of them and run it on the Worker Nodes.
The kubeconfig can be found at `/etc/kubernetes/admin.conf`; copy it wherever it is needed.
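For example, to use it directly on one of the control planes, the standard post-init steps printed by kubeadm are:
```bash=
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl get nodes
```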
### 5. Set up the Worker Nodes
10.0.4.174, 10.0.4.52
```bash=
$ sudo kubeadm join 10.0.4.55:6443 --token $TOKEN --discovery-token-ca-cert-hash sha256:$HASH_TOKEN
```
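If the join command from the init output has been lost or the token has expired, a fresh one can be printed from any control plane:
```bash=
# Prints a complete `kubeadm join ...` command with a new token
$ sudo kubeadm token create --print-join-command
```
The new workers will show as `NotReady` until the CNI is installed in the next step.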
### 6. Install the CNI
Follow [[4]](https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises) to apply Calico.
Remember to change the pod CIDR in the manifest to `172.16.0.0/16` (a sketch follows the snippet below).
If the nodes use a network interface other than the default, also add a variable right under `CLUSTER_TYPE`:
```yaml
- name: IP_AUTODETECTION_METHOD
value: "interface=ens3"
```
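A hedged sketch for the pod CIDR change mentioned above, assuming the manifest from [4] has been downloaded as `calico.yaml`; the relevant env var in the upstream manifest is `CALICO_IPV4POOL_CIDR`, which is commented out by default:
```bash=
# Locate the pool CIDR setting in the downloaded manifest
$ grep -n -A 1 "CALICO_IPV4POOL_CIDR" calico.yaml
# Uncomment both lines and set the value to the cluster's podSubnet:
#   - name: CALICO_IPV4POOL_CIDR
#     value: "172.16.0.0/16"
$ kubectl apply -f calico.yaml
```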
#### 6-1 Pitfalls hit while installing Calico
The calico-node pods show as not ready:
```bash=
$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5857bf8d58-vtfcw 1/1 Running 0 13h
calico-node-4wqfk 0/1 Running 0 13h
calico-node-7qwm5 0/1 Running 0 13h
calico-node-c8x2t 0/1 Running 0 13h
calico-node-ljnhc 0/1 Running 0 13h
calico-node-q5pqt 0/1 Running 0 13h
```
The logs are hard to make sense of:
```log=
2023-04-27 02:20:00.283 [INFO][73] monitor-addresses/autodetection_methods.go 117: Using autodetected IPv4 address 10.0.4.205/24 on matching interface ens3
2023-04-27 02:20:13.953 [INFO][72] felix/summary.go 100: Summarising 13 dataplane reconciliation loops over 1m8.3s: avg=2ms longest=5ms ()
2023-04-27 02:20:37.520 [INFO][72] felix/int_dataplane.go 1693: Received *proto.HostMetadataV4V6Update update from calculation graph msg=hostname:"instance-20230321-danny-worker02" ipv4_addr:"10.0.4.52/24" labels:<key:"beta.kubernetes.io/arch" value:"amd64" > labels:<key:"beta.kubernetes.io/os" value:"linux" > labels:<key:"kubernetes.io/arch" value:"amd64" > labels:<key:"kubernetes.io/hostname" value:"instance-20230321-danny-worker02" > labels:<key:"kubernetes.io/os" value:"linux" >
```
The events do not reveal much either, only that the readiness probe is failing:
```bash=
$ kubectl get event -n kube-system
LAST SEEN TYPE REASON OBJECT MESSAGE
3m48s Warning Unhealthy pod/calico-node-4wqfk (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.219 [INFO][159156] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-7qwm5 (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.246 [INFO][158624] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-c8x2t (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.258 [INFO][144156] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-ljnhc (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.305 [INFO][143449] confd/health.go 180: Number of node(s) with BGP peering established = 0...
3m48s Warning Unhealthy pod/calico-node-q5pqt (combined from similar events): Readiness probe failed: 2023-04-27 02:55:51.266 [INFO][143592] confd/health.go 180: Number of node(s) with BGP peering established = 0...
```
Use `kubectl describe` to see whether the pod events say anything more:
```bash=
$ kubectl describe po calico-node-4wqfk -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 65s (x5564 over 13h) kubelet (combined from similar events): Readiness probe failed: 2023-04-27 03:00:51.232 [INFO][160142] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.0.4.191,10.0.4.174,10.0.4.52,10.0.4.125
```
So BGP peering is not being established. Following [[5]](https://inf.news/technique/5297caf1aee055a7eeef8a636b5673d5.html), check the network:
```bash=
$ netstat -antp|grep bird
tcp 0 0 0.0.0.0:179 0.0.0.0:* LISTEN 83544/bird
tcp 0 1 10.0.4.205:59875 10.0.4.191:179 SYN_SENT 83544/bird
tcp 0 1 10.0.4.205:35625 10.0.4.174:179 SYN_SENT 83544/bird
tcp 0 1 10.0.4.205:46797 10.0.4.125:179 SYN_SENT 83544/bird
```
This shows connections towards port 179 stuck in SYN_SENT. Since `iptables -F` had already been run when the nodes were first set up, host firewall rules should not be blocking anything, which points to the VCN security list missing a whitelist rule for this traffic. A quick way to confirm whether TCP 179 is reachable between nodes is sketched below.
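For example, from one node towards a peer (10.0.4.191 is one of the control planes):
```bash=
# BGP runs over TCP 179; check that a peer is reachable
$ nc -zv 10.0.4.191 179
```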

After adding the rule to the security list, everything comes back to normal:
```bash=
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
instance-20230321-danny-master Ready control-plane 19h v1.26.3 10.0.4.191 <none> Ubuntu 20.04.5 LTS 5.15.0-1032-oracle containerd://1.6.12
instance-20230321-danny-worker01 Ready <none> 19h v1.26.3 10.0.4.174 <none> Ubuntu 20.04.5 LTS 5.15.0-1032-oracle containerd://1.6.12
instance-20230321-danny-worker02 Ready <none> 19h v1.26.3 10.0.4.52 <none> Ubuntu 20.04.5 LTS 5.15.0-1032-oracle containerd://1.6.12
instance-20230426-danny-m01 Ready control-plane 19h v1.26.3 10.0.4.205 <none> Ubuntu 20.04.6 LTS 5.15.0-1030-oracle containerd://1.6.12
instance-20230426-danny-m02 Ready control-plane 19h v1.26.3 10.0.4.125 <none> Ubuntu 20.04.6 LTS 5.15.0-1030-oracle containerd://1.6.12
$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5857bf8d58-vtfcw 1/1 Running 0 13h
calico-node-4wqfk 1/1 Running 0 13h
calico-node-7qwm5 1/1 Running 0 13h
calico-node-c8x2t 1/1 Running 0 13h
calico-node-ljnhc 1/1 Running 0 13h
calico-node-q5pqt 1/1 Running 0 13h
coredns-787d4945fb-m2f6k 1/1 Running 0 13h
coredns-787d4945fb-tv8bk 1/1 Running 0 15h
kube-apiserver-instance-20230321-danny-master 1/1 Running 1 15h
kube-apiserver-instance-20230426-danny-m01 1/1 Running 1 15h
kube-apiserver-instance-20230426-danny-m02 1/1 Running 2 15h
kube-controller-manager-instance-20230321-danny-master 1/1 Running 0 15h
kube-controller-manager-instance-20230426-danny-m01 1/1 Running 0 15h
kube-controller-manager-instance-20230426-danny-m02 1/1 Running 0 15h
kube-proxy-d4ktk 1/1 Running 0 15h
kube-proxy-k2jth 1/1 Running 0 15h
kube-proxy-kl9c7 1/1 Running 0 15h
kube-proxy-ntkch 1/1 Running 0 15h
kube-proxy-xj4dm 1/1 Running 0 15h
kube-scheduler-instance-20230321-danny-master 1/1 Running 1 15h
kube-scheduler-instance-20230426-danny-m01 1/1 Running 3 15h
kube-scheduler-instance-20230426-danny-m02 1/1 Running 5 15h
```
## Refs
[1] [透過 kubeadm 安裝 k8s](https://hackmd.io/@CNCF-meetup/ryAO-wKg3)
[2] [【從題目中學習k8s】-【Day3】建立K8s Cluster環境-以kubeadm為例](https://ithelp.ithome.com.tw/articles/10235069)
[3] [Install and configure a multi-master Kubernetes cluster with kubeadm](https://dockerlabs.collabnix.com/kubernetes/beginners/Install-and-configure-a-multi-master-Kubernetes-cluster-with-kubeadm.html)
[4] [Install Calico networking and network policy for on-premises deployments](https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises)
[5] [k8s中calico問題BIRD is not ready: BGP not established with解決](https://inf.news/technique/5297caf1aee055a7eeef8a636b5673d5.html)