# Talos Linux with Cilium

## Cilium ClusterMesh

![](https://hackmd.io/_uploads/ryECFN_1a.png)

## Architecture

![image](https://hackmd.io/_uploads/r1XlbFrHT.png)

* Six VMs are created in total
* c1 cluster environment
    - 1 master, 1 worker
    - Network ID: 192.168.247.0/24
    - OS: Talos Linux
    - Network mode: host-only
    - route1: provides routing and NAT
* c2 cluster environment
    - 1 master, 1 worker
    - Network ID: 192.168.186.0/24
    - OS: Talos Linux
    - Network mode: host-only
    - route2: provides routing and NAT
* Each cluster's router handles routing and NAT so that the two clusters can reach each other and the Internet.

## Environment preparation

* Install the following tools on both route1 and route2
* Install kubectl

```
# install kubectl
$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
$ rm kubectl
```

* Install talosctl

```
$ curl -sL https://talos.dev/install | sh
```

* Install the Cilium CLI

```
$ CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
$ wget https://github.com/cilium/cilium-cli/releases/download/$CILIUM_CLI_VERSION/cilium-linux-amd64.tar.gz
$ tar zxvf cilium-linux-amd64.tar.gz
$ sudo mv cilium /usr/local/bin

$ cilium version --client
cilium-cli: v0.15.7 compiled with go1.21.0 on linux/amd64
cilium image (default): v1.14.1
cilium image (stable): v1.14.1
```

* Install helm

```
$ wget https://get.helm.sh/helm-v3.8.2-linux-amd64.tar.gz
$ tar zxvf helm-v3.8.2-linux-amd64.tar.gz
$ sudo cp linux-amd64/helm /usr/local/bin/
```

## Notes

* The `podSubnets` and `serviceSubnets` ranges of all clusters must be unique and must not overlap.
* The kubeconfig must be able to reach both clusters directly.
* Routing between the two clusters must be configured before you start.

## Create the c1 cluster

> master ip: 192.168.247.11/24
> worker ip: 192.168.247.20/24

* Run the following steps on route1

### Create c1-m1

```
$ mkdir ~/cilium1; cd ~/cilium1
$ talosctl gen secrets -o secrets.yaml
$ talosctl gen config --with-secrets secrets.yaml c1-cluster https://192.168.247.11:6443
```

* Patch the pod and service subnets

```
$ nano m1.patch
machine:
  network:
    hostname: c1-m1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.247.11/24
        routes:
          - gateway: 192.168.247.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.11.0.0/16
    serviceSubnets:
      - 10.12.0.0/16
    cni:
      name: none
```

```
$ talosctl machineconfig patch controlplane.yaml --patch @m1.patch --output m1.yaml

# the node reboots itself after the config is applied
$ talosctl apply-config --insecure --nodes 192.168.247.128 --file m1.yaml

$ talosctl --talosconfig=./talosconfig config endpoint 192.168.247.11
```

* Kubernetes is only actually brought up after `bootstrap`

```
$ talosctl --nodes 192.168.247.11 --talosconfig=./talosconfig version

# bootstrap Kubernetes
$ talosctl --nodes 192.168.247.11 --talosconfig=./talosconfig bootstrap

$ talosctl --nodes 192.168.247.11 --talosconfig=./talosconfig dashboard

# fetch the kubeconfig
$ talosctl \
  --nodes 192.168.247.11 \
  --talosconfig=./talosconfig \
  kubeconfig
```

* Check the Kubernetes node status; it is NotReady because no CNI has been installed yet

```
$ kubectl get no
NAME    STATUS     ROLES           AGE     VERSION
c1-m1   NotReady   control-plane   5m43s   v1.28.1

$ kubectl config get-contexts
CURRENT   NAME               CLUSTER      AUTHINFO           NAMESPACE
*         admin@c1-cluster   c1-cluster   admin@c1-cluster   default

$ kubectl taint node c1-m1 node-role.kubernetes.io/control-plane:NoSchedule-
```

* Install Cilium

```
$ helm repo add cilium https://helm.cilium.io/
$ helm repo update

$ helm install --kube-context admin@c1-cluster cilium cilium/cilium --version 1.14.2 \
   --namespace kube-system \
   --set ipam.mode=kubernetes \
   --set cluster.id=1 \
   --set cluster.name=cilium-1 \
   --set=kubeProxyReplacement=disabled \
   --set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
   --set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
   --set=cgroup.autoMount.enabled=false \
   --set=cgroup.hostRoot=/sys/fs/cgroup
```
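* (Optional) Before adding the worker, it can help to confirm the CNI actually came up. A quick sanity check with the Cilium CLI and kubectl; the `k8s-app=cilium` label below is the one the Cilium Helm chart puts on its agent DaemonSet:

```
# wait until the Cilium agent and operator report OK
$ cilium status --wait --context admin@c1-cluster

# the agent pods should be Running on every node
$ kubectl --context admin@c1-cluster -n kube-system get po -l k8s-app=cilium
```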
### Create c1-w1

* Patch the pod and service subnets

```
$ nano w1.patch
machine:
  network:
    hostname: c1-w1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.247.20/24
        routes:
          - gateway: 192.168.247.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.11.0.0/16
    serviceSubnets:
      - 10.12.0.0/16
    cni:
      name: none
```

* Join the worker and label it

```
$ talosctl machineconfig patch worker.yaml --patch @w1.patch --output w1.yaml
$ talosctl apply-config --insecure --nodes 192.168.247.129 --file w1.yaml
```

```
$ talosctl \
  --nodes 192.168.247.20 \
  --talosconfig=./talosconfig \
  dashboard

$ kubectl label node c1-w1 node-role.kubernetes.io/worker=

$ kubectl get no
NAME    STATUS   ROLES           AGE    VERSION
c1-m1   Ready    control-plane   24m    v1.28.1
c1-w1   Ready    worker          2m1s   v1.28.1
```

* Back up the kubeconfig

```
$ cp ~/.kube/config ~/.kube/cluster1
```

## Create the c2 cluster

> master ip: 192.168.186.11/24
> worker ip: 192.168.186.20/24

* Run the following steps on route2

### Create c2-m1

```
$ mkdir ~/cilium2; cd ~/cilium2
$ talosctl gen secrets -o secrets.yaml
$ talosctl gen config --with-secrets secrets.yaml c2-cluster https://192.168.186.11:6443
```

* Patch the pod and service subnets

```
$ nano m1.patch
machine:
  network:
    hostname: c2-m1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.186.11/24
        routes:
          - gateway: 192.168.186.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.21.0.0/16
    serviceSubnets:
      - 10.22.0.0/16
    cni:
      name: none
```

```
$ talosctl machineconfig patch controlplane.yaml --patch @m1.patch --output m1.yaml

# the node reboots itself after the config is applied
$ talosctl apply-config --insecure --nodes 192.168.186.128 --file m1.yaml

$ talosctl --talosconfig=./talosconfig \
  config endpoint 192.168.186.11
```

```
$ talosctl --nodes 192.168.186.11 --talosconfig=./talosconfig version

# bootstrap Kubernetes
$ talosctl \
  --nodes 192.168.186.11 \
  --talosconfig=./talosconfig \
  bootstrap

$ talosctl \
  --nodes 192.168.186.11 \
  --talosconfig=./talosconfig \
  dashboard

$ talosctl \
  --nodes 192.168.186.11 \
  --talosconfig=./talosconfig \
  kubeconfig

$ kubectl --context admin@c2-cluster taint node c2-m1 node-role.kubernetes.io/control-plane:NoSchedule-
```

* Install Cilium

```
$ helm repo add cilium https://helm.cilium.io/
$ helm repo update

$ helm install --kube-context admin@c2-cluster cilium cilium/cilium --version 1.14.2 \
   --namespace kube-system \
   --set ipam.mode=kubernetes \
   --set cluster.id=2 \
   --set cluster.name=cilium-2 \
   --set=kubeProxyReplacement=disabled \
   --set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
   --set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
   --set=cgroup.autoMount.enabled=false \
   --set=cgroup.hostRoot=/sys/fs/cgroup
```

```
$ kubectl --context admin@c2-cluster get no
NAME    STATUS   ROLES           AGE   VERSION
c2-m1   Ready    control-plane   27m   v1.28.1
```

### Create c2-w1

```
$ nano w1.patch
machine:
  network:
    hostname: c2-w1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.186.20/24
        routes:
          - gateway: 192.168.186.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.21.0.0/16
    serviceSubnets:
      - 10.22.0.0/16
    cni:
      name: none
```

* Join the worker and label it

```
$ talosctl machineconfig patch worker.yaml --patch @w1.patch --output w1.yaml
$ talosctl apply-config --insecure --nodes 192.168.186.129 --file w1.yaml
```
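* (Optional) While c2-w1 reboots into its new config, you can watch for it to register; `talosctl health` can also run Talos' built-in cluster health checks (treat the exact invocation as a sketch for this lab's addresses):

```
# watch until c2-w1 appears and becomes Ready
$ kubectl --context admin@c2-cluster get no -w

# run Talos' built-in health checks against the c2 control plane
$ talosctl --nodes 192.168.186.11 --talosconfig=./talosconfig health
```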
```
$ talosctl \
  --nodes 192.168.186.20 \
  --talosconfig=./talosconfig \
  dashboard

$ kubectl --context admin@c2-cluster label node c2-w1 node-role.kubernetes.io/worker=

$ kubectl --context admin@c2-cluster get no
NAME    STATUS   ROLES           AGE     VERSION
c2-m1   Ready    control-plane   35m     v1.28.1
c2-w1   Ready    worker          2m37s   v1.28.1
```

* From route2, copy the kubeconfig over to route1, then set the KUBECONFIG environment variable on route1 so that both clusters can be reached

```
$ scp ~/.kube/config 192.168.61.254:~/.kube/cluster2

# log in to route1 and run the following
$ echo 'export KUBECONFIG="/home/bigred/.kube/cluster1:/home/bigred/.kube/cluster2"' | sudo tee -a /etc/profile
```

* Log in again

```
$ kubectl config get-contexts
CURRENT   NAME               CLUSTER      AUTHINFO           NAMESPACE
*         admin@c1-cluster   c1-cluster   admin@c1-cluster   default
          admin@c2-cluster   c2-cluster   admin@c2-cluster   default
```

## Install MetalLB

* Run all of the following commands on route1

```
$ wget -qO - https://raw.githubusercontent.com/metallb/metallb/v0.13.11/config/manifests/metallb-native.yaml | kubectl --context admin@c1-cluster apply -f -

$ wget -qO - https://raw.githubusercontent.com/metallb/metallb/v0.13.11/config/manifests/metallb-native.yaml | kubectl --context admin@c2-cluster apply -f -
```

* Configure the IP address ranges MetalLB is allowed to assign

```
$ echo '
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mlb1
  namespace: metallb-system
spec:
  addresses:
  - 192.168.247.220-192.168.247.230
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mlb1
  namespace: metallb-system' | kubectl --context admin@c1-cluster apply -f -

$ echo '
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mlb1
  namespace: metallb-system
spec:
  addresses:
  - 192.168.186.220-192.168.186.230
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mlb1
  namespace: metallb-system' | kubectl --context admin@c2-cluster apply -f -
```

## Enable ClusterMesh

```
$ cilium clustermesh enable --context admin@c1-cluster --service-type LoadBalancer
$ cilium clustermesh enable --context admin@c2-cluster --service-type LoadBalancer
```
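* Because `--service-type LoadBalancer` relies on MetalLB here, confirm that the `clustermesh-apiserver` service in each cluster actually received an external IP from the pools defined above before continuing:

```
$ kubectl --context admin@c1-cluster -n kube-system get svc clustermesh-apiserver
$ kubectl --context admin@c2-cluster -n kube-system get svc clustermesh-apiserver
```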
* Check that ClusterMesh is healthy

```
$ cilium status --context admin@c1-cluster
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:       disabled
    \__/       ClusterMesh:        OK

Deployment             cilium-operator          Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet              cilium                   Desired: 2, Ready: 2/2, Available: 2/2
Deployment             clustermesh-apiserver    Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium-operator          Running: 2
                       clustermesh-apiserver    Running: 1
                       cilium                   Running: 2
Cluster Pods:          4/4 managed by Cilium
Helm chart version:    1.14.2
Image versions         cilium                   quay.io/cilium/cilium:v1.14.2@sha256:6263f3a3d5d63b267b538298dbeb5ae87da3efacf09a2c620446c873ba807d35: 2
                       cilium-operator          quay.io/cilium/operator-generic:v1.14.2@sha256:52f70250dea22e506959439a7c4ea31b10fe8375db62f5c27ab746e3a2af866d: 2
                       clustermesh-apiserver    quay.io/coreos/etcd:v3.5.4@sha256:795d8660c48c439a7c3764c2330ed9222ab5db5bb524d8d0607cac76f7ba82a3: 1
                       clustermesh-apiserver    quay.io/cilium/clustermesh-apiserver:v1.14.2@sha256:0650beac6633a483261640b6539c9609f5a761f4ab4504fd1e6ffe7f2bb82e9a: 1

$ cilium status --context admin@c2-cluster
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    disabled (using embedded mode)
 \__/¯¯\__/    Hubble Relay:       disabled
    \__/       ClusterMesh:        OK

Deployment             cilium-operator          Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet              cilium                   Desired: 2, Ready: 2/2, Available: 2/2
Deployment             clustermesh-apiserver    Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium                   Running: 2
                       cilium-operator          Running: 2
                       clustermesh-apiserver    Running: 1
Cluster Pods:          4/4 managed by Cilium
Helm chart version:    1.14.2
Image versions         cilium                   quay.io/cilium/cilium:v1.14.2@sha256:6263f3a3d5d63b267b538298dbeb5ae87da3efacf09a2c620446c873ba807d35: 2
                       cilium-operator          quay.io/cilium/operator-generic:v1.14.2@sha256:52f70250dea22e506959439a7c4ea31b10fe8375db62f5c27ab746e3a2af866d: 2
                       clustermesh-apiserver    quay.io/cilium/clustermesh-apiserver:v1.14.2@sha256:0650beac6633a483261640b6539c9609f5a761f4ab4504fd1e6ffe7f2bb82e9a: 1
                       clustermesh-apiserver    quay.io/coreos/etcd:v3.5.4@sha256:795d8660c48c439a7c3764c2330ed9222ab5db5bb524d8d0607cac76f7ba82a3: 1
```

## Connect the clusters

```
$ cilium clustermesh connect --context admin@c1-cluster --destination-context admin@c2-cluster
```

* Check that the two clusters are connected

```
$ cilium clustermesh status --context admin@c1-cluster
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
  - 192.168.247.220:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
  - cilium-2: 2/2 configured, 2/2 connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]

$ cilium clustermesh status --context admin@c2-cluster
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
  - 192.168.186.220:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
  - cilium-1: 2/2 configured, 2/2 connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
```

## Testing

```
$ kubectl --context admin@c1-cluster run web --image=nginx

$ kubectl --context admin@c1-cluster get po -owide
NAME   READY   STATUS    RESTARTS   AGE   IP            NODE   NOMINATED NODE   READINESS GATES
web    1/1     Running   0          27s   10.11.2.154   w1     <none>           <none>
```

* A pod in cluster c2 can reach the c1 pod directly by its pod IP

```
$ kubectl --context admin@c2-cluster run alp -it --image=alpine
/ # apk add curl
/ # curl 10.11.2.154
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
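* (Optional) The Cilium CLI also ships an automated connectivity test that can exercise the mesh in both directions; it deploys its own test workloads and takes several minutes to run:

```
$ cilium connectivity test --context admin@c1-cluster --multi-cluster admin@c2-cluster
```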
### Test service load balancing and failover

![](https://hackmd.io/_uploads/SJXJyOOAh.png)

```
$ echo '
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s1.dep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s1.dep
  template:
    metadata:
      labels:
        app: s1.dep
    spec:
      containers:
      - name: app
        image: quay.io/flysangel/image:app.golang' | kubectl --context admin@c1-cluster apply -f -

$ echo '
apiVersion: v1
kind: Service
metadata:
  name: s1
  annotations:
    io.cilium/global-service: "true"   # enable global load balancing
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: s1.dep
  type: ClusterIP' | kubectl --context admin@c1-cluster apply -f -
```

```
$ echo '
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s1.dep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s1.dep
  template:
    metadata:
      labels:
        app: s1.dep
    spec:
      containers:
      - name: app
        image: quay.io/flysangel/image:app.golang' | kubectl --context admin@c2-cluster apply -f -

$ echo '
apiVersion: v1
kind: Service
metadata:
  name: s1
  annotations:
    io.cilium/global-service: "true"   # enable global load balancing
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: s1.dep
  type: ClusterIP' | kubectl --context admin@c2-cluster apply -f -
```

* Check the service status in both clusters

```
$ kubectl --context admin@c1-cluster get po,svc
NAME                          READY   STATUS    RESTARTS   AGE
pod/s1.dep-56657c4d58-gp6v5   1/1     Running   0          119s
pod/web                       1/1     Running   0          5h42m

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.12.0.1      <none>        443/TCP   6h27m
service/s1           ClusterIP   10.12.25.212   <none>        80/TCP    79s

$ kubectl --context admin@c2-cluster get po,svc
NAME                          READY   STATUS    RESTARTS        AGE
pod/alp                       1/1     Running   1 (5h42m ago)   5h42m
pod/s1.dep-56657c4d58-gzxgx   1/1     Running   0               72s

NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.22.0.1     <none>        443/TCP   6h10m
service/s1           ClusterIP   10.22.41.26   <none>        80/TCP    41s
```

* Start a test pod; requests to the `s1` service are load-balanced across the backends in both clusters

```
$ kubectl --context admin@c1-cluster run test -it --image=alpine
/ # apk add curl
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gzxgx"}
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gp6v5"}
```

* Delete one of the deployments and verify that the service is not interrupted

```
$ kubectl --context admin@c2-cluster delete deploy s1.dep
deployment.apps "s1.dep" deleted

$ kubectl --context admin@c1-cluster exec -it test -- sh
/ # apk add curl
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gp6v5"}
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gp6v5"}
```
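* (Optional) Cilium's global services also support a service-affinity annotation that prefers endpoints in the local cluster and only spills over to the remote cluster when no local endpoint is available. A sketch using the annotation name documented for Cilium 1.14; verify it against your Cilium version before relying on it:

```
# prefer local endpoints for s1; remote endpoints are only used when no local ones are healthy
$ kubectl --context admin@c1-cluster annotate service s1 io.cilium/service-affinity=local
$ kubectl --context admin@c2-cluster annotate service s1 io.cilium/service-affinity=local
```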