# K8s "CoreDNS CrashLoopBackOff": How to Solve It

###### tags: `kubernetes`

---

### Overview

This post explains how to fix the `coredns CrashLoopBackOff` problem that can appear when you install k8s. If you have not installed k8s yet, you can follow this [guide](https://hackmd.io/@110510549/HJ15fWpiD) first.

---

### CoreDNS CrashLoopBackOff

When you check the pod status and coredns shows `CrashLoopBackOff`:

```
root@vm1:/run/flannel# kubectl get pod -n kube-system
NAME                          READY   STATUS             RESTARTS   AGE
coredns-f9fd979d6-29866       0/1     CrashLoopBackOff   10         18h
coredns-f9fd979d6-bdpgc       0/1     CrashLoopBackOff   220        18h
etcd-vm1                      1/1     Running            0          18h
kube-apiserver-vm1            1/1     Running            0          18h
kube-controller-manager-vm1   1/1     Running            0          18h
kube-flannel-ds-fdltm         1/1     Running            0          18h
kube-flannel-ds-m5kw9         1/1     Running            0          18h
kube-flannel-ds-ntwnm         1/1     Running            0          18h
kube-proxy-m7997              1/1     Running            0          18h
kube-proxy-qp8vf              1/1     Running            0          18h
kube-proxy-r6thx              1/1     Running            0          18h
kube-scheduler-vm1            1/1     Running            0          18h
```

you can use `describe` to inspect the coredns pods in detail:

```
kubectl describe pods -n kube-system coredns-f9fd979d6-29866
kubectl describe pods -n kube-system coredns-f9fd979d6-bdpgc
```

When I looked at `coredns-f9fd979d6-29866`, I found the following:

```
root@vm1:/home/user# kubectl describe pod -n kube-system coredns-f9fd979d6-29866
Name:                 coredns-f9fd979d6-29866
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 vm3/192.168.102.145
Start Time:           Tue, 26 Jan 2021 01:51:22 -0800
Labels:               k8s-app=kube-dns
                      pod-template-hash=f9fd979d6
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-f9fd979d6
Containers:
  coredns:
    Container ID:
    Image:         k8s.gcr.io/coredns:1.7.0
    Image ID:
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    Liveness:   http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:  http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-zbhf5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-zbhf5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-zbhf5
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly op=Exists
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                      From     Message
  ----     ------                  ----                     ----     -------
  Normal   SandboxChanged          9m24s (x60268 over 17h)  kubelet  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  4m24s (x60528 over 17h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e6c92e05e425808acf04778aff55258785edf4daabc3e6eafa64f697d60e4c55" network for pod "coredns-f9fd979d6-29866": networkPlugin cni failed to set up pod "coredns-f9fd979d6-29866_kube-system" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24
```

> network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.1.1/24

From this error message I can tell that the IP address flannel originally configured on the node has drifted, so the first step is to check the addresses on the master and the other worker nodes with `ifconfig`.
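A quick way to confirm the mismatch on the affected node is to compare the subnet flannel recorded for that node with the address actually assigned to the bridge. This is a minimal sketch, assuming the default flannel subnet file (`/run/flannel/subnet.env`) and the standard bridge name `cni0`:

```
# Subnet flannel allocated to this node (default flannel path)
cat /run/flannel/subnet.env   # e.g. FLANNEL_SUBNET=10.244.1.1/24

# Address currently configured on the cni0 bridge
ip addr show cni0
```

If the bridge address does not match `FLANNEL_SUBNET`, you get exactly the "already has an IP address different from ..." error shown above.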
**In my case the master did not have a `cni0` interface at all**, so you can create the `cni0` bridge with:

```
ip link add cni0 type bridge
```

and bring it up:

```
ip link set cni0 up
```

However, since flannel has to reassign the IP, we first have to take the `cni0` interface down:

```
ifconfig cni0 down
```

Reset k8s (master & workers):

```
kubeadm reset
```

Reinstall k8s and set up the pod network (master):

```
kubeadm init --apiserver-advertise-address=192.168.102.146 --pod-network-cidr=10.244.0.0/16
```

(Each worker then rejoins the cluster with the `kubeadm join` command printed by `kubeadm init`.)

Apply the flannel CNI configuration:

```
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
```

Check the pod status again: flannel has now placed the IP on the right machine (the master), so coredns is `Running`:

```
root@vm1:~/.kube# kubectl get pods --all-namespaces
NAMESPACE     NAME                          READY   STATUS    RESTARTS   AGE
kube-system   coredns-f9fd979d6-8lqr5       1/1     Running   0          82s
kube-system   coredns-f9fd979d6-w7gpb       1/1     Running   0          82s
kube-system   etcd-vm1                      1/1     Running   0          92s
kube-system   kube-apiserver-vm1            1/1     Running   0          92s
kube-system   kube-controller-manager-vm1   1/1     Running   0          92s
kube-system   kube-flannel-ds-c5lw9         1/1     Running   0          31s
kube-system   kube-flannel-ds-jc929         1/1     Running   0          31s
kube-system   kube-flannel-ds-vd6kb         1/1     Running   0          31s
kube-system   kube-proxy-bqmhs              1/1     Running   0          60s
kube-system   kube-proxy-pxtk4              1/1     Running   0          82s
kube-system   kube-proxy-x8vgq              1/1     Running   0          54s
kube-system   kube-scheduler-vm1            1/1     Running   0          92s
```

---

### Current Best Fix - CrashLoopBackOff

1. Edit `/etc/resolv.conf` on the node so it points at public DNS servers:

```
nameserver 8.8.8.8
nameserver 8.8.4.4
```

2. Reload the daemon configuration and restart the Docker service:

```
systemctl daemon-reload
systemctl restart docker
```
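Once Docker is back up, you can sanity-check that cluster DNS resolves again. The pod name and busybox image below are only illustrative:

```
# Spin up a throwaway pod and resolve the kubernetes service through CoreDNS
kubectl run dns-test --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default
```

A successful lookup that returns the service IP for `kubernetes.default.svc.cluster.local` means coredns is serving queries again.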