# Talos Linux with Cilium
## Cilium ClusterMesh

## System Architecture

* A total of 6 VMs are created.
* c1 cluster environment
  - 1 master, 1 worker
  - Network ID: 192.168.247.0/24
  - OS: Talos Linux
  - Network mode: host-only
  - route1: provides routing rules and NAT
* c2 cluster environment
  - 1 master, 1 worker
  - Network ID: 192.168.186.0/24
  - OS: Talos Linux
  - Network mode: host-only
  - route2: provides routing rules and NAT
* Each cluster's router (route1/route2) is configured with routing rules and NAT so that the two clusters can reach each other and the internet (see the sketch below).
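* A minimal sketch of the router-side NAT, assuming route1's host-only NIC carries 192.168.247.254 and its internet-facing uplink is `ens160` (the interface name is an assumption); route2 mirrors this for 192.168.186.0/24:
```
# enable IPv4 forwarding (persist via /etc/sysctl.d/ if desired)
$ sudo sysctl -w net.ipv4.ip_forward=1
# NAT traffic from the host-only cluster network out the uplink (ens160 is an assumed name)
$ sudo iptables -t nat -A POSTROUTING -s 192.168.247.0/24 -o ens160 -j MASQUERADE
```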
## Environment Preparation
* Install the following tools on both route1 and route2.
* Install kubectl
```
# install kubectl
$ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
$ sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
$ rm kubectl
```
* Install talosctl
```
$ curl -sL https://talos.dev/install | sh
```
* Install the Cilium CLI
```
$ CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
$ wget https://github.com/cilium/cilium-cli/releases/download/$CILIUM_CLI_VERSION/cilium-linux-amd64.tar.gz
$ tar zxvf cilium-linux-amd64.tar.gz
$ sudo mv cilium /usr/local/bin
$ cilium version --client
cilium-cli: v0.15.7 compiled with go1.21.0 on linux/amd64
cilium image (default): v1.14.1
cilium image (stable): v1.14.1
```
* Install Helm
```
$ wget https://get.helm.sh/helm-v3.8.2-linux-amd64.tar.gz
$ tar zxvf helm-v3.8.2-linux-amd64.tar.gz
$ sudo cp linux-amd64/helm /usr/local/bin/
```
## Notes
* The `podSubnets` and `serviceSubnets` ranges of all clusters must be unique and must not overlap.
* The kubeconfig must be able to access both clusters directly.
* Routing between the two clusters must be configured beforehand (a sketch follows this list).
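* A minimal sketch of the inter-cluster routes, assuming route1 and route2 share a network on which route1 is 192.168.61.254 (the address used in the scp step later); route2's address here, 192.168.61.253, is purely an assumption:
```
# on route1: reach the c2 node network via route2 (next-hop 192.168.61.253 is an assumption)
$ sudo ip route add 192.168.186.0/24 via 192.168.61.253
# on route2: reach the c1 node network via route1
$ sudo ip route add 192.168.247.0/24 via 192.168.61.254
```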
## Create the c1 cluster
> master ip: 192.168.247.11/24
> worker ip: 192.168.247.20/24
>
* The following steps are executed on route1.
### Create c1-m1
```
$ mkdir ~/cilium1; cd ~/cilium1
$ talosctl gen secrets -o secrets.yaml
$ talosctl gen config --with-secrets secrets.yaml c1-cluster https://192.168.247.11:6443
```
* Modify the pod and service subnets
```
$ nano m1.patch
machine:
  network:
    hostname: c1-m1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.247.11/24
        routes:
          - gateway: 192.168.247.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.11.0.0/16
    serviceSubnets:
      - 10.12.0.0/16
  cni:
    name: none
```
```
$ talosctl machineconfig patch controlplane.yaml --patch @m1.patch --output m1.yaml
# apply the config to the node's current DHCP-assigned address; the node reboots itself afterwards
$ talosctl apply-config --insecure --nodes 192.168.247.128 --file m1.yaml
$ talosctl --talosconfig=./talosconfig config endpoint 192.168.247.11
```
* Talos Linux is not fully installed until the cluster is bootstrapped.
```
$ talosctl --nodes 192.168.247.11 --talosconfig=./talosconfig version
# bootstrap Kubernetes
$ talosctl --nodes 192.168.247.11 --talosconfig=./talosconfig bootstrap
$ talosctl --nodes 192.168.247.11 --talosconfig=./talosconfig dashboard
# download the kubeconfig
$ talosctl \
--nodes 192.168.247.11 \
--talosconfig=./talosconfig \
kubeconfig
```
* Check the Kubernetes status; the node shows NotReady because no CNI has been installed yet.
```
$ kubectl get no
NAME STATUS ROLES AGE VERSION
c1-m1 NotReady control-plane 5m43s v1.28.1
$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* admin@c1-cluster c1-cluster admin@c1-cluster default
$ kubectl taint node c1-m1 node-role.kubernetes.io/control-plane:NoSchedule-
```
* Install Cilium
```
$ helm repo add cilium https://helm.cilium.io/
$ helm repo update
$ helm install --kube-context admin@c1-cluster cilium cilium/cilium --version 1.14.2 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set cluster.id=1 \
--set cluster.name=cilium-1 \
--set=kubeProxyReplacement=disabled \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup
```
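* As an optional sanity check (not part of the original flow), wait for Cilium to become ready and confirm the node turns Ready; this assumes the `admin@c1-cluster` context is active:
```
$ cilium status --wait --context admin@c1-cluster
$ kubectl --context admin@c1-cluster get no    # c1-m1 should now report Ready
```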
### Create c1-w1
* Modify the pod and service subnets
```
$ nano w1.patch
machine:
  network:
    hostname: c1-w1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.247.20/24
        routes:
          - gateway: 192.168.247.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.11.0.0/16
    serviceSubnets:
      - 10.12.0.0/16
  cni:
    name: none
```
* Join the worker node and label it
```
$ talosctl machineconfig patch worker.yaml --patch @w1.patch --output w1.yaml
$ talosctl apply-config --insecure --nodes 192.168.247.129 --file w1.yaml
```
```
$ talosctl \
--nodes 192.168.247.20 \
--talosconfig=./talosconfig \
dashboard
$ kubectl label node c1-w1 node-role.kubernetes.io/worker=
$ kubectl get no
NAME STATUS ROLES AGE VERSION
c1-m1 Ready control-plane 24m v1.28.1
c1-w1 Ready worker 2m1s v1.28.1
```
* Back up the kubeconfig file
```
$ cp ~/.kube/config ~/.kube/cluster1
```
## Create the c2 cluster
> master ip: 192.168.186.11/24
> worker ip: 192.168.186.20/24
* The following steps are executed on route2.
### Create c2-m1
```
$ mkdir ~/cilium2; cd ~/cilium2
$ talosctl gen secrets -o secrets.yaml
$ talosctl gen config --with-secrets secrets.yaml c2-cluster https://192.168.186.11:6443
```
* Modify the pod and service subnets
```
$ nano m1.patch
machine:
  network:
    hostname: c2-m1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.186.11/24
        routes:
          - gateway: 192.168.186.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.21.0.0/16
    serviceSubnets:
      - 10.22.0.0/16
  cni:
    name: none
```
```
$ talosctl machineconfig patch controlplane.yaml --patch @m1.patch --output m1.yaml
# apply the config to the node's current DHCP-assigned address; the node reboots itself afterwards
$ talosctl apply-config --insecure --nodes 192.168.186.128 --file m1.yaml
$ talosctl --talosconfig=./talosconfig \
config endpoint 192.168.186.11
```
```
$ talosctl --nodes 192.168.186.11 --talosconfig=./talosconfig version
# bootstrap Kubernetes
$ talosctl \
--nodes 192.168.186.11 \
--talosconfig=./talosconfig \
bootstrap
$ talosctl \
--nodes 192.168.186.11 \
--talosconfig=./talosconfig \
dashboard
$ talosctl \
--nodes 192.168.186.11 \
--talosconfig=./talosconfig \
kubeconfig
$ kubectl --context admin@c2-cluster taint node c2-m1 node-role.kubernetes.io/control-plane:NoSchedule-
```
* Install Cilium
```
$ helm repo add cilium https://helm.cilium.io/
$ helm repo update
$ helm install --kube-context admin@c2-cluster cilium cilium/cilium --version 1.14.2 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set cluster.id=2 \
--set cluster.name=cilium-2 \
--set=kubeProxyReplacement=disabled \
--set=securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set=securityContext.capabilities.cleanCiliumState="{NET_ADMIN,SYS_ADMIN,SYS_RESOURCE}" \
--set=cgroup.autoMount.enabled=false \
--set=cgroup.hostRoot=/sys/fs/cgroup
```
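* As with c1, optionally wait for Cilium to come up before checking the node; a sketch:
```
$ cilium status --wait --context admin@c2-cluster
```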
```
$ kubectl --context admin@c2-cluster get no
NAME STATUS ROLES AGE VERSION
c2-m1 Ready control-plane 27m v1.28.1
```
### Create c2-w1
```
$ nano w1.patch
machine:
  network:
    hostname: c2-w1
    interfaces:
      - interface: eno16777728
        dhcp: false
        addresses:
          - 192.168.186.20/24
        routes:
          - gateway: 192.168.186.254
    nameservers:
      - 168.95.1.1
      - 8.8.8.8
cluster:
  network:
    podSubnets:
      - 10.21.0.0/16
    serviceSubnets:
      - 10.22.0.0/16
  cni:
    name: none
```
* Join the worker node and label it
```
$ talosctl machineconfig patch worker.yaml --patch @w1.patch --output w1.yaml
$ talosctl apply-config --insecure --nodes 192.168.186.129 --file w1.yaml
```
```
$ talosctl \
--nodes 192.168.186.20 \
--talosconfig=./talosconfig \
dashboard
$ kubectl --context admin@c2-cluster label node c2-w1 node-role.kubernetes.io/worker=
$ kubectl --context admin@c2-cluster get no
NAME STATUS ROLES AGE VERSION
c2-m1 Ready control-plane 35m v1.28.1
c2-w1 Ready worker 2m37s v1.28.1
```
* On route2, copy the kubeconfig to route1 and set the KUBECONFIG environment variable so that both clusters can be accessed.
```
$ scp ~/.kube/config 192.168.61.254:~/.kube/cluster2
# log in to route1 and run the following command
$ echo 'export KUBECONFIG="/home/bigred/.kube/cluster1:/home/bigred/.kube/cluster2"' | sudo tee -a /etc/profile
```
* Log out and log back in so the new KUBECONFIG takes effect
```
$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* admin@c1-cluster c1-cluster admin@c1-cluster default
admin@c2-cluster c2-cluster admin@c2-cluster default
```
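* Optionally, the two kubeconfig files can be flattened into a single file; a sketch, assuming the KUBECONFIG variable from the previous step is already set (the merged path is arbitrary):
```
$ kubectl config view --flatten > ~/.kube/merged
$ export KUBECONFIG=~/.kube/merged
$ kubectl config get-contexts
```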
## Install MetalLB
* All of the following commands are executed on route1.
```
$ wget -qO - https://raw.githubusercontent.com/metallb/metallb/v0.13.11/config/manifests/metallb-native.yaml | kubectl --context admin@c1-cluster apply -f -
$ wget -qO - https://raw.githubusercontent.com/metallb/metallb/v0.13.11/config/manifests/metallb-native.yaml | kubectl --context admin@c2-cluster apply -f -
```
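* Before creating the address pools, optionally wait until the MetalLB pods are ready; a sketch:
```
$ kubectl --context admin@c1-cluster -n metallb-system wait --for=condition=Ready pod --all --timeout=120s
$ kubectl --context admin@c2-cluster -n metallb-system wait --for=condition=Ready pod --all --timeout=120s
```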
* Configure the IP ranges MetalLB is allowed to use
```
$ echo '
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mlb1
  namespace: metallb-system
spec:
  addresses:
    - 192.168.247.220-192.168.247.230
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mlb1
  namespace: metallb-system' | kubectl --context admin@c1-cluster apply -f -
$ echo '
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: mlb1
  namespace: metallb-system
spec:
  addresses:
    - 192.168.186.220-192.168.186.230
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: mlb1
  namespace: metallb-system' | kubectl --context admin@c2-cluster apply -f -
```
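* The created pools and advertisements can be checked as follows (optional):
```
$ kubectl --context admin@c1-cluster -n metallb-system get ipaddresspools.metallb.io,l2advertisements.metallb.io
$ kubectl --context admin@c2-cluster -n metallb-system get ipaddresspools.metallb.io,l2advertisements.metallb.io
```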
## Enable ClusterMesh
```
$ cilium clustermesh enable --context admin@c1-cluster --service-type LoadBalancer
$ cilium clustermesh enable --context admin@c2-cluster --service-type LoadBalancer
```
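* Optionally wait for the clustermesh-apiserver LoadBalancer service to receive an IP from MetalLB before connecting the clusters; a sketch:
```
$ cilium clustermesh status --context admin@c1-cluster --wait
$ cilium clustermesh status --context admin@c2-cluster --wait
```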
* Check that ClusterMesh is working correctly
```
$ cilium status --context admin@c1-cluster
/¯¯\
/¯¯\__/¯¯\ Cilium: OK
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: disabled (using embedded mode)
\__/¯¯\__/ Hubble Relay: disabled
\__/ ClusterMesh: OK
Deployment cilium-operator Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet cilium Desired: 2, Ready: 2/2, Available: 2/2
Deployment clustermesh-apiserver Desired: 1, Ready: 1/1, Available: 1/1
Containers: cilium-operator Running: 2
clustermesh-apiserver Running: 1
cilium Running: 2
Cluster Pods: 4/4 managed by Cilium
Helm chart version: 1.14.2
Image versions cilium quay.io/cilium/cilium:v1.14.2@sha256:6263f3a3d5d63b267b538298dbeb5ae87da3efacf09a2c620446c873ba807d35: 2
cilium-operator quay.io/cilium/operator-generic:v1.14.2@sha256:52f70250dea22e506959439a7c4ea31b10fe8375db62f5c27ab746e3a2af866d: 2
clustermesh-apiserver quay.io/coreos/etcd:v3.5.4@sha256:795d8660c48c439a7c3764c2330ed9222ab5db5bb524d8d0607cac76f7ba82a3: 1
clustermesh-apiserver quay.io/cilium/clustermesh-apiserver:v1.14.2@sha256:0650beac6633a483261640b6539c9609f5a761f4ab4504fd1e6ffe7f2bb82e9a: 1
$ cilium status --context admin@c2-cluster
/¯¯\
/¯¯\__/¯¯\ Cilium: OK
\__/¯¯\__/ Operator: OK
/¯¯\__/¯¯\ Envoy DaemonSet: disabled (using embedded mode)
\__/¯¯\__/ Hubble Relay: disabled
\__/ ClusterMesh: OK
Deployment cilium-operator Desired: 2, Ready: 2/2, Available: 2/2
DaemonSet cilium Desired: 2, Ready: 2/2, Available: 2/2
Deployment clustermesh-apiserver Desired: 1, Ready: 1/1, Available: 1/1
Containers: cilium Running: 2
cilium-operator Running: 2
clustermesh-apiserver Running: 1
Cluster Pods: 4/4 managed by Cilium
Helm chart version: 1.14.2
Image versions cilium quay.io/cilium/cilium:v1.14.2@sha256:6263f3a3d5d63b267b538298dbeb5ae87da3efacf09a2c620446c873ba807d35: 2
cilium-operator quay.io/cilium/operator-generic:v1.14.2@sha256:52f70250dea22e506959439a7c4ea31b10fe8375db62f5c27ab746e3a2af866d: 2
clustermesh-apiserver quay.io/cilium/clustermesh-apiserver:v1.14.2@sha256:0650beac6633a483261640b6539c9609f5a761f4ab4504fd1e6ffe7f2bb82e9a: 1
clustermesh-apiserver quay.io/coreos/etcd:v3.5.4@sha256:795d8660c48c439a7c3764c2330ed9222ab5db5bb524d8d0607cac76f7ba82a3: 1
```
## Connect the clusters
```
$ cilium clustermesh connect --context admin@c1-cluster --destination-context admin@c2-cluster
```
* Check that the two clusters are connected
```
$ cilium clustermesh status --context admin@c1-cluster
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 192.168.247.220:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- cilium-2: 2/2 configured, 2/2 connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
$ cilium clustermesh status --context admin@c2-cluster
✅ Service "clustermesh-apiserver" of type "LoadBalancer" found
✅ Cluster access information is available:
- 192.168.186.220:2379
✅ Deployment clustermesh-apiserver is ready
✅ All 2 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- cilium-1: 2/2 configured, 2/2 connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
```
## Testing
```
$ kubectl --context admin@c1-cluster run web --image=nginx
$ kubectl --context admin@c1-cluster get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web 1/1 Running 0 27s 10.11.2.154 w1 <none> <none>
```
* A pod in the c2 cluster can curl a pod in the c1 cluster directly by its pod IP.
```
$ kubectl --context admin@c2-cluster run alp -it --image=alpine
/ # apk add curl
/ # curl 10.11.2.154
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
```
### Test service load balancing and failover

```
$ echo '
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s1.dep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s1.dep
  template:
    metadata:
      labels:
        app: s1.dep
    spec:
      containers:
        - name: app
          image: quay.io/flysangel/image:app.golang' | kubectl --context admin@c1-cluster apply -f -
$ echo '
apiVersion: v1
kind: Service
metadata:
  name: s1
  annotations:
    io.cilium/global-service: "true"   # enable global load balancing
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: s1.dep
  type: ClusterIP' | kubectl --context admin@c1-cluster apply -f -
```
```
$ echo '
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s1.dep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s1.dep
  template:
    metadata:
      labels:
        app: s1.dep
    spec:
      containers:
        - name: app
          image: quay.io/flysangel/image:app.golang' | kubectl --context admin@c2-cluster apply -f -
$ echo '
apiVersion: v1
kind: Service
metadata:
  name: s1
  annotations:
    io.cilium/global-service: "true"   # enable global load balancing
spec:
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: s1.dep
  type: ClusterIP' | kubectl --context admin@c2-cluster apply -f -
```
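* To confirm the global service is known to the datapath, the agent's service table can be inspected; a sketch, assuming `kubectl exec` against the cilium DaemonSet is acceptable in this lab:
```
$ kubectl --context admin@c1-cluster -n kube-system exec ds/cilium -- cilium service list
```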
* Check the service status in both clusters
```
$ kubectl --context admin@c1-cluster get po,svc
NAME READY STATUS RESTARTS AGE
pod/s1.dep-56657c4d58-gp6v5 1/1 Running 0 119s
pod/web 1/1 Running 0 5h42m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.12.0.1 <none> 443/TCP 6h27m
service/s1 ClusterIP 10.12.25.212 <none> 80/TCP 79s
$ kubectl --context admin@c2-cluster get po,svc
NAME READY STATUS RESTARTS AGE
pod/alp 1/1 Running 1 (5h42m ago) 5h42m
pod/s1.dep-56657c4d58-gzxgx 1/1 Running 0 72s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.22.0.1 <none> 443/TCP 6h10m
service/s1 ClusterIP 10.22.41.26 <none> 80/TCP 41s
```
* Start a test pod; requests to the s1 service are load-balanced across both clusters.
```
$ kubectl --context admin@c1-cluster run test -it --image=alpine
/ # apk add curl
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gzxgx"}
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gp6v5"}
```
* Delete one of the deployments and verify that the service keeps working without interruption.
```
$ kubectl --context admin@c2-cluster delete deploy s1.dep
deployment.apps "s1.dep" deleted
$ kubectl --context admin@c1-cluster exec -it test -- sh
/ # apk add curl
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gp6v5"}
/ # curl -w "\n" http://s1/hostname
{"message":"s1.dep-56657c4d58-gp6v5"}
```
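* A cleanup sketch for the test resources created above (optional, not part of the original walkthrough):
```
$ kubectl --context admin@c1-cluster delete pod web test
$ kubectl --context admin@c1-cluster delete deploy s1.dep
$ kubectl --context admin@c1-cluster delete svc s1
$ kubectl --context admin@c2-cluster delete pod alp
$ kubectl --context admin@c2-cluster delete svc s1
```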